Extrinsic Evaluation: Shakespeare Language Models

Compare text generation across all trained models

Generate Shakespearean text using all available models (Classical N-grams, Neural N-grams, and GPT) to compare their outputs side-by-side.

Loaded Models Summary:

  • Classical N-grams (Task 2): 4 models
  • Neural N-grams (Task 3): 2 models
  • GPT Models (Task 4): 3 models
  • Total: 9 models ready for comparison

All models trained on Shakespeare's complete works using BPE tokenization.
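As a rough illustration of how BPE builds its vocabulary, the sketch below runs two merge steps over a toy character-level vocabulary (a simplified Sennrich-style illustration; the tokenizer actually trained on the corpus is not shown here):

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a {word-tuple: frequency} vocab."""
    pairs = Counter()
    for word, freq in vocab.items():
        for i in range(len(word) - 1):
            pairs[(word[i], word[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Rewrite every word, fusing occurrences of `pair` into one symbol."""
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Toy corpus: words as character tuples with an end-of-word marker.
vocab = {("t", "h", "e", "</w>"): 5, ("t", "h", "o", "u", "</w>"): 3}
for _ in range(2):  # two merge steps
    counts = get_pair_counts(vocab)
    best = max(counts, key=counts.get)
    vocab = merge_pair(best, vocab)
# After merging ("t","h") and then ("th","e"), "the" is a single token.
```

Each merge greedily fuses the most frequent adjacent pair, so common Shakespearean character sequences become single subword tokens.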

Generation settings: max new tokens (10–100) and sampling temperature (0.1–2).
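The generation controls above include a sampling temperature (an inference from the interface values; names here are illustrative, not the project's code). Lower temperatures sharpen the distribution toward the most likely token; higher temperatures flatten it for more varied output. A minimal sketch:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from `logits` softened by `temperature`.
    temperature -> 0 approaches greedy argmax; large values approach uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding
```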

About Extrinsic Evaluation:

This interface provides a qualitative comparison of text generation capabilities across your three model types:

  1. Classical N-grams: Statistical models using frequency-based probability estimation
  2. Neural N-grams: Neural networks with embeddings and learned representations
  3. GPT Models: Transformer-based autoregressive language models with self-attention
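To make item 1 concrete, frequency-based probability estimation for a bigram model is just count ratios: P(w2 | w1) = count(w1, w2) / count(w1). A minimal maximum-likelihood sketch (illustrative only, not the project's implementation, which would also operate on BPE tokens):

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum-likelihood bigram estimates:
    P(w2 | w1) = count(w1, w2) / count(w1)."""
    unigrams = Counter(tokens[:-1])          # contexts (all but the last token)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_mle("to be or not to be".split())
# probs[("to", "be")] == 1.0  ("to" is always followed by "be" here)
```

Unseen pairs get zero probability under pure MLE, which is why classical n-gram models are typically paired with smoothing or backoff.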

Evaluation Criteria to Consider:

  • Fluency: How natural and readable is the generated text?
  • Coherence: Does the generated text maintain logical flow and context?
  • Shakespearean Style: How well does it capture Shakespeare's language patterns?
  • Diversity: How varied and creative are the outputs?
  • Context Preservation: How well does each model continue from the given prompt?
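The Diversity criterion above can be quantified with the common distinct-n heuristic: the fraction of unique n-grams among all n-grams in the generated texts. This helper is a sketch, not part of the interface:

```python
def distinct_n(texts, n=2):
    """Distinct-n: unique n-grams / total n-grams over a list of texts.
    Higher values indicate more varied (less repetitive) output."""
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

For example, a model that loops on the same phrase scores near 0, while non-repetitive output scores near 1.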

Note: Validation perplexity scores are shown in parentheses; lower scores indicate better performance on held-out data.
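For reference, perplexity is the exponentiated average negative log-likelihood per token, PPL = exp(-(1/N) * sum(log p_i)); a minimal sketch:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities:
    PPL = exp(-mean(log p)). Lower is better."""
    return math.exp(-sum(log_probs) / len(log_probs))

ppl = perplexity([math.log(0.25)] * 4)  # -> 4.0 (uniform over 4 outcomes)
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as a uniform choice among k tokens.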