Extrinsic Evaluation: Shakespeare Language Models

Compare text generation across all trained models

Generate Shakespearean text using all available models (Classical N-grams, Neural N-grams, and GPT) to compare their outputs side-by-side.

Loaded Models Summary:

  • Classical N-grams (Task 2): 4 models
  • Neural N-grams (Task 3): 2 models
  • GPT Models (Task 4): 3 models
  • Total: 9 models ready for comparison

All models trained on Shakespeare's complete works using BPE tokenization.
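As a rough illustration of how BPE builds its vocabulary, the sketch below runs two merge steps over a toy character-level vocabulary (a simplified Sennrich-style illustration; the tokenizer actually trained on the corpus is not shown here):

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a {word-tuple: frequency} vocab."""
    pairs = Counter()
    for word, freq in vocab.items():
        for i in range(len(word) - 1):
            pairs[(word[i], word[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite every word, fusing occurrences of `pair` into one symbol."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words as character tuples with an end-of-word marker.
vocab = {("t", "h", "e", "</w>"): 5, ("t", "h", "o", "u", "</w>"): 3}
for _ in range(2):  # two merge steps
    counts = get_pair_counts(vocab)
    best = max(counts, key=counts.get)
    vocab = merge_pair(best, vocab)
# After merging ("t","h") and then ("th","e"), "the" is a single token.
```

Each merge greedily fuses the most frequent adjacent pair, so common Shakespearean character sequences become single subword tokens.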

Generation settings: max new tokens (10–100) and sampling temperature (0.1–2).
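The generation controls above include a sampling temperature (an inference from the interface values; names here are illustrative, not the project's code). Lower temperatures sharpen the distribution toward the most likely token; higher temperatures flatten it for more varied output. A minimal sketch:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from `logits` softened by `temperature`.
    temperature -> 0 approaches greedy argmax; large values approach uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```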

About Extrinsic Evaluation:

This interface provides a qualitative comparison of text generation capabilities across your three model types:

  1. Classical N-grams: Statistical models using frequency-based probability estimation
  2. Neural N-grams: Neural networks with embeddings and learned representations
  3. GPT Models: Transformer-based autoregressive language models with self-attention
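To make item 1 concrete, frequency-based probability estimation for a bigram model is just count ratios: P(w2 | w1) = count(w1, w2) / count(w1). A minimal maximum-likelihood sketch (illustrative only, not the project's implementation, which would also operate on BPE tokens):

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum-likelihood bigram estimates:
    P(w2 | w1) = count(w1, w2) / count(w1)."""
    unigrams = Counter(tokens[:-1])          # contexts (all but the last token)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_mle("to be or not to be".split())
# probs[("to", "be")] == 1.0  ("to" is always followed by "be" here)
```

Unseen pairs get zero probability under pure MLE, which is why classical n-gram models are typically paired with smoothing or backoff.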

Evaluation Criteria to Consider:

  • Fluency: How natural and readable is the generated text?
  • Coherence: Does the generated text maintain logical flow and context?
  • Shakespearean Style: How well does it capture Shakespeare's language patterns?
  • Diversity: How varied and creative are the outputs?
  • Context Preservation: How well does each model continue from the given prompt?
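The Diversity criterion above can be quantified with the common distinct-n heuristic: the fraction of unique n-grams among all n-grams in the generated texts. This helper is a sketch, not part of the interface:

```python
def distinct_n(texts, n=2):
    """Distinct-n: unique n-grams / total n-grams over a list of texts.
    Higher values indicate more varied (less repetitive) output."""
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

For example, a model that loops on the same phrase scores near 0, while non-repetitive output scores near 1.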

Note: Validation perplexity scores are shown in parentheses; lower scores indicate better performance on held-out data.
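For reference, perplexity is the exponentiated average negative log-likelihood per token, PPL = exp(-(1/N) * sum(log p_i)); a minimal sketch:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    PPL = exp(-mean(log p)). Lower is better."""
    return math.exp(-sum(log_probs) / len(log_probs))

ppl = perplexity([math.log(0.25)] * 4)  # -> 4.0 (uniform over 4 outcomes)
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens.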