Extrinsic Evaluation: Shakespeare Language Models
Compare text generation across all trained models
Generate Shakespearean text using all available models (Classical N-grams, Neural N-grams, and GPT) to compare their outputs side by side.
Loaded Models Summary:
- Classical N-grams (Task 2): 4 models
- Neural N-grams (Task 3): 2 models
- GPT Models (Task 4): 3 models
- Total: 9 models ready for comparison
All models trained on Shakespeare's complete works using BPE tokenization.
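Since all models share a BPE vocabulary, it may help to recall how BPE builds one: repeatedly merge the most frequent adjacent symbol pair. A minimal sketch (the toy word counts and merge count below are illustrative, not the actual Shakespeare training setup):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn BPE merges from a word-frequency dict (word -> count)."""
    # Represent each word as a tuple of symbols, starting from characters.
    vocab = {tuple(word): count for word, count in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge everywhere it occurs.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = count
        vocab = new_vocab
    return merges, vocab

# Toy corpus: "th" is the most frequent pair, so it is merged first.
corpus = {"thou": 3, "thy": 2, "the": 5}
merges, vocab = learn_bpe_merges(corpus, 3)
```

Because merges are learned from frequency, common Shakespearean fragments end up as single tokens, which all three model families then consume identically.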
Generation settings: max new tokens (10–100) and sampling temperature (0.1–2).
About Extrinsic Evaluation:
This interface provides a qualitative comparison of text generation capabilities across your three model types:
- Classical N-grams: Statistical models using frequency-based probability estimation
- Neural N-grams: Neural networks with embeddings and learned representations
- GPT Models: Transformer-based autoregressive language models with self-attention
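To make the contrast concrete, here is a minimal frequency-based bigram generator of the classical kind (the one-line corpus and unsmoothed sampling are illustrative only, not the trained Task 2 models):

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count bigram frequencies: previous token -> Counter of next tokens."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample a continuation proportionally to bigram frequencies."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        nxt = counts.get(out[-1])
        if not nxt:  # unseen context: classical models simply stop
            break
        tokens, weights = zip(*nxt.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return " ".join(out)

corpus = "to be or not to be that is the question".split()
model = train_bigram(corpus)
print(generate(model, "to", 5))
```

Neural n-grams replace these raw counts with learned embeddings (so unseen contexts still get sensible probabilities), and GPT replaces the fixed-length context with self-attention over the whole prompt.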
Evaluation Criteria to Consider:
- Fluency: How natural and readable is the generated text?
- Coherence: Does the generated text maintain logical flow and context?
- Shakespearean Style: How well does it capture Shakespeare's language patterns?
- Diversity: How varied and creative are the outputs?
- Context Preservation: How well does each model continue from the given prompt?
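Most of these criteria are judged by eye, but diversity also has a simple numeric proxy: distinct-n, the fraction of n-grams across generated samples that are unique. The metric choice here is an assumption for illustration, not part of the evaluation interface:

```python
def distinct_n(texts, n=2):
    """Fraction of unique n-grams across samples (higher = more diverse)."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Repeated outputs drag the score down; varied outputs push it toward 1.
samples = ["to be or not to be", "to be or not to be", "the lady doth protest"]
print(distinct_n(samples, n=2))
```

Low-temperature sampling from any of the models tends to lower this score, so it is worth comparing models at the same temperature.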
Note: Validation perplexity scores are shown in parentheses; lower scores indicate better performance on held-out data.
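As a reminder of what those parenthesized scores mean, perplexity is the exponential of the average per-token negative log-likelihood on held-out text. A minimal sketch (the probabilities below are made up for illustration):

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability per token; lower is better."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning uniform probability 1/50 to every token scores 50:
# it is effectively choosing among 50 equally likely options per token.
print(perplexity([1 / 50] * 10))
```

This is why the GPT models, which condition on much longer contexts, typically report lower validation perplexity than the n-gram baselines.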