How Lexical Analysis Works

A quick guide to understanding your vocabulary analysis results.

What does this tool do?

The Lexical Analysis tool takes any English text or subtitle file and breaks down its vocabulary using the Common European Framework of Reference (CEFR) scale, which ranges from A1 (beginner) to C2 (proficient). It also identifies words from the Academic Word List (AWL), which are commonly found in academic and professional texts.

Think of it as a vocabulary profile of your text. It tells you how much of the content sits at each difficulty level, so you can understand what to expect before you start reading or watching.

Who is it for?

1. Language learners who want to know how challenging a book, article, or movie will be before they commit to it. Upload the subtitles or paste the text, and you'll see exactly what level of vocabulary you'll encounter.

2. Teachers who want to evaluate whether a text is appropriate for their students' level, or who need data to support material selection decisions.

Total Words vs. Unique Words

When you see your results, you'll notice two key numbers:

Total Words

The total number of words in the text, including repetitions. If the word "the" appears 50 times, it counts as 50.

Unique Words

The number of distinct words. If "the" appears 50 times, it only counts once.

Why does unique word count matter?

Unique words tell you the range of vocabulary in the text. A movie might have 10,000 total words but only 1,800 unique words. That unique count tells you how many different words you'll actually need to understand. It's one of the most useful numbers for gauging how rich the vocabulary is.
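To make the difference concrete, here is a minimal sketch of how the two numbers could be counted. It assumes that a "word" is a run of letters (optionally with an apostrophe, as in "don't") and that upper and lower case count as the same word. The function name count_words is made up for this illustration, and the tool's own rules for splitting text into words may differ.

```python
import re

def count_words(text: str) -> tuple[int, int]:
    """Return (total_words, unique_words) for a piece of text.

    Assumption: a word is a run of letters, optionally containing one
    apostrophe ("don't"), and case is ignored.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return len(tokens), len(set(tokens))

total, unique = count_words("The cat saw the dog. The dog saw the cat.")
print(total, unique)  # prints: 10 4
```

The sample sentence has 10 words in total, but only 4 different ones ("the", "cat", "saw", "dog").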

Where should you focus?

In most English texts, the majority of words will fall into the A1 category. These are the most basic, high-frequency words like "the", "is", "have", "go" -- words that almost every learner already knows. That's completely normal and expected.

The real insights come from looking at the levels beyond A1:

A2 (Elementary): everyday vocabulary that early learners are building
B1 (Intermediate): words you'd encounter in news and conversation
B2 (Upper-intermediate): more nuanced, topic-specific words
C1 (Advanced): sophisticated and less common vocabulary
C2 (Proficient): rare, literary, or highly specialized words
AWL (Academic Word List): common in academic and professional texts

These are the levels that reveal the true difficulty of a text and show you which words are worth studying. If a text has a high percentage of B2+ and academic vocabulary, you know it's going to be a challenge, and also a great learning opportunity.
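As a rough illustration of what a vocabulary profile looks like under the hood, the sketch below counts unique words per level and then works out the share that sits at B2 or above, or on the AWL. Everything here is an assumption made for the example: the tiny SAMPLE_LEVELS table, the level labels given to each word, and the function names are placeholders, not the tool's real word list or method. For simplicity, each word gets a single label, so a word is either at a CEFR level or on the AWL, never both.

```python
from collections import Counter

# Hypothetical mini lookup table, for illustration only. The labels are
# placeholders and are not taken from an official CEFR or AWL list.
SAMPLE_LEVELS = {
    "the": "A1", "is": "A1", "have": "A1", "go": "A1",
    "journey": "B1", "reluctant": "B2", "scrutiny": "C1",
    "analyze": "AWL",
}

# Labels treated as "harder" vocabulary in this sketch.
HARDER = {"B2", "C1", "C2", "AWL"}

def level_profile(words, levels=SAMPLE_LEVELS):
    """Count unique words per level; words not in the table are 'unlisted'."""
    unique = {w.lower() for w in words}
    return Counter(levels.get(w, "unlisted") for w in unique)

def harder_share(profile):
    """Fraction of unique words labelled B2, C1, C2, or AWL."""
    total = sum(profile.values())
    hard = sum(n for level, n in profile.items() if level in HARDER)
    return hard / total if total else 0.0

words = "The journey is long and the scrutiny is reluctant".split()
profile = level_profile(words)
print(dict(profile))
print(f"{harder_share(profile):.0%} of unique words are B2+ or AWL")
```

In this toy sentence, 2 of the 7 unique words ("scrutiny" and "reluctant") carry a harder label, while "long" and "and" show up as unlisted only because the sample table is so small.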

Free for everyone

This tool is 100% free and always will be. It was specifically engineered so that it could be offered to everyone at no cost: no sign-up walls, no usage limits, no premium tier.

To make that possible, there is a small trade-off in accuracy compared to tools and processes that are more resource-intensive.

Despite this, the tool achieves approximately 90% accuracy in its level assignments. For understanding the overall difficulty profile of a text, that level of accuracy is more than sufficient, and you get instant results without waiting or paying.

What can you do with it?

  • Upload the subtitles of a movie or TV show to see if the vocabulary matches your level before you watch it.
  • Paste a news article or book excerpt to understand what new words you might learn from reading it.
  • Compare two texts to see which one has more vocabulary at your target level.
  • Use the highlighted text view to spot exactly which words are at each CEFR level within the text.

Comparing Two Texts

What makes one text harder than another?

When you compare two texts side by side, the tool gives you three key dimensions to consider. Each one tells you something different about how demanding the content is.

Total word count

The most straightforward measure. A text with 15,000 words is simply longer than one with 5,000 words. More words mean more reading or listening time and more sustained concentration.

For subtitles, total word count also reflects how dialogue-heavy the content is. An action movie might have far fewer words than a courtroom drama, even if both have the same running time.

A higher total word count doesn't necessarily mean harder vocabulary. It just means there's more of it to get through.

Vocabulary range (unique words)

This is where things get more interesting. Two texts can have a similar total word count but very different unique word counts. The one with more unique words uses a wider range of vocabulary, which usually makes it more demanding.

Text A: 12,000 total / 1,200 unique

Repeats the same words often. Patterns become familiar quickly.

Text B: 12,000 total / 2,800 unique

Uses a much wider variety of words. More new vocabulary to absorb.

A text with more unique words exposes you to more vocabulary, which is great for learning -- but it also means you're encountering new words more frequently, which can make reading or listening harder.
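One simple way to put a number on this is to divide unique words by total words (linguists often call this the type-token ratio). The sketch below applies it to Text A and Text B from the example above. The function name unique_ratio is made up for this illustration, and the tool itself may not report this ratio.

```python
def unique_ratio(total_words: int, unique_words: int) -> float:
    """Share of the text's words that are distinct (unique / total)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return unique_words / total_words

text_a = unique_ratio(12_000, 1_200)
text_b = unique_ratio(12_000, 2_800)

print(f"Text A: {text_a:.1%}")  # prints: Text A: 10.0%
print(f"Text B: {text_b:.1%}")  # prints: Text B: 23.3%
```

The ratio is only a fair comparison when both texts are about the same length, as they are here, because longer texts naturally repeat more words.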

Difficulty distribution

This is the most revealing dimension. It's not just about how many unique words a text has, but where those words sit on the CEFR scale.

The comparison chart and table break down the unique word count by level. When comparing two texts, pay attention to the Diff column, which shows the difference in unique words at each CEFR level.

More B2, C1, and C2 unique words = harder vocabulary

If one text has significantly more unique words at B2 and above, it uses more advanced, less common vocabulary. These are the words that learners at intermediate levels are less likely to know.

More AWL unique words = more academic content

A text with more Academic Word List entries is likely more formal or topic-specific. Documentaries and news tend to score higher here than casual conversation or sitcoms.

For example, if Text B has 20 more C1 unique words and 15 more AWL unique words than Text A, then Text B uses notably more advanced and academic vocabulary -- even if both texts have a similar total word count.
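A Diff column like this can be thought of as a per-level subtraction. The sketch below assumes the difference is Text B minus Text A, so a positive number means Text B has more unique words at that level. The counts are invented for the example (chosen so that C1 and AWL match the +20 and +15 mentioned above), and the name diff_by_level is hypothetical.

```python
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2", "AWL"]

def diff_by_level(text_a: dict, text_b: dict) -> dict:
    """Unique-word difference per level, computed as Text B minus Text A."""
    return {lvl: text_b.get(lvl, 0) - text_a.get(lvl, 0) for lvl in LEVELS}

# Invented unique-word counts per level, for illustration only.
text_a = {"A1": 600, "A2": 250, "B1": 180, "B2": 90, "C1": 30, "C2": 10, "AWL": 40}
text_b = {"A1": 610, "A2": 255, "B1": 185, "B2": 95, "C1": 50, "C2": 12, "AWL": 55}

diff = diff_by_level(text_a, text_b)
for level in LEVELS:
    print(f"{level:>3}  A={text_a[level]:>4}  B={text_b[level]:>4}  Diff={diff[level]:+d}")
```

Here the lower levels differ by only a handful of words, while the C1 and AWL rows stand out, which is the pattern to look for when deciding which text is more advanced.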

Putting it all together

No single number tells the full story. When comparing two texts, consider all three dimensions together:

1. Total words tells you how much content there is to process.

2. Unique words tells you how varied the vocabulary is.

3. CEFR distribution tells you how advanced that vocabulary is.

A text with fewer total words, fewer unique words, and most of its vocabulary at A2-B1 will generally be easier than a text with more words, more variety, and a heavier concentration of B2+ vocabulary. Use the comparison view to see these differences at a glance.