The College Board recently announced that they will add 4 more practice tests to Bluebook on February 3rd. Naturally, students are quite confused and anxious: will the test format change? Will it be harder or easier? And most importantly, is choice D still the most common answer?
Now that practice tests 7-10 have been officially released, I will analyze them to answer all of those questions as well as provide you with deeper insights into how they compare to the previous tests.
First, we will talk about how I got all of the data from Bluebook. Then, I will crunch the numbers and compare them with practice tests 1-4, since the new tests seem to be their replacements. Finally, we will figure out what this means for future SAT tests.
Unlike last time, the questions are not in the SAT Suite Question Bank (or at least, I don't have their IDs). Long story short, with the help of Claude AI, I wrote a script that:
- detects and splits up the passage, the question, and the 4 answer choices
- uses OCR (image to text) to extract their text
- stores all of it in a CSV file.
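The script itself isn't shown in this post, so here is only a minimal sketch of its last step (storing the extracted text in a CSV file) using Python's standard `csv` module. Region detection and OCR are left out, and the `QuestionRecord` schema, including the `answer` column, is a hypothetical layout chosen for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, TextIO


@dataclass
class QuestionRecord:
    """One OCR'd question (hypothetical schema, not the original script's)."""
    test: int        # practice test number, e.g. 7
    passage: str     # text OCR'd from the passage region
    question: str    # text OCR'd from the question-stem region
    choice_a: str
    choice_b: str
    choice_c: str
    choice_d: str
    answer: str      # correct letter "A"-"D" (assumed to be collected separately)


def write_records(records: Iterable[QuestionRecord], out: TextIO) -> None:
    """Write a header row, then one CSV row per question."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(QuestionRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Example: write one record to an in-memory buffer and print the CSV text.
buffer = io.StringIO()
write_records(
    [QuestionRecord(7, "A passage, with commas.", "Which choice...?", "a", "b", "c", "d", "D")],
    buffer,
)
print(buffer.getvalue())
```

`csv.DictWriter` quotes fields automatically, so passages containing commas or line breaks survive the round trip. When writing to a real file, open it with `newline=""`, as the `csv` documentation recommends.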
Here's what the region detection looks like:

And here's what the results look like:

With this data, we can now start crunching the numbers.
Since the metrics are the same as last time, I will only go through them briefly. You can check out the previous blog post, where I explained the metrics in more detail.
Before getting into the advanced stuff, I have a personal beef that needs to be resolved.
With data from the SAT question bank, I couldn't objectively determine whether choice (D) really is the most common answer. Let's look at the new tests instead.

Again, choice (D) seems to be the most common correct answer, along with choice (C). Unfortunately, because the data set is small, the p-value turns out to be way higher than 0.05, so the result is not statistically significant. There goes my dream again. 😤
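The post doesn't say which test produced that p-value. One standard option for "is one answer letter over-represented?" is a chi-square goodness-of-fit test against an even 25% split. Here is a minimal sketch of that test, assuming exactly four answer choices (3 degrees of freedom, which is what makes the closed-form tail probability in `chi2_sf_df3` valid). The counts are made-up placeholders, not the real tallies from Tests 7-10.

```python
import math
from typing import Sequence


def chi2_sf_df3(x: float) -> float:
    """P(X >= x) for a chi-square distribution with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def answer_uniformity_test(counts: Sequence[int]) -> tuple[float, float]:
    """Chi-square goodness-of-fit of (A, B, C, D) counts against an even 25% split."""
    if len(counts) != 4:
        raise ValueError("expected exactly four counts: A, B, C, D")
    expected = sum(counts) / 4
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, chi2_sf_df3(statistic)


# Placeholder counts for A, B, C, D (made up, NOT the real tallies from Tests 7-10).
stat, p = answer_uniformity_test([50, 50, 57, 59])
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

For fixed answer proportions, the chi-square statistic grows linearly with the number of questions, which is why a small data set struggles to reach p < 0.05 even when (C) and (D) come up a bit more often.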
On a more serious note, let's look at what we are here for: the difficulty of the new tests.
Let's cut right to the chase. Here are the figures for practice tests 7-10, placed alongside those of practice tests 1-4.

Overall, the differences between Tests 1-4 and Tests 7-10 are relatively small, indicating minimal changes in the test structure. However, there are a few notable shifts worth mentioning.
The grade level has increased slightly from 13.59 to 13.94, suggesting a modest rise in reading difficulty. While this change is not drastic, it could indicate that the text demands slightly more advanced comprehension skills.
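The post doesn't name the grade-level formula. One widely used formula that reports a U.S. school grade is the Flesch-Kincaid Grade Level: 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59. Here is a minimal sketch assuming that formula; the syllable counter is a rough vowel-group heuristic, so its numbers won't exactly match dedicated readability libraries.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e' (but not '-le')."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level with naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))
```

Longer sentences and more multi-syllable words both push the grade up, so a 0.35-grade increase can come from either one.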
Interestingly, despite this increase in grade level, the text appears to have become somewhat easier for non-native English speakers. The McAlpine EFL score has decreased from 36.17 to 32.77, and since lower McAlpine scores indicate easier text, the language in the new tests is likely more accessible to those learning English as a foreign language. This shift might be due to simpler vocabulary, clearer sentence structures, or less idiomatic phrasing.
Overall, a huge W for the non-native gang.
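The McAlpine EFLAW score is computed as (words + mini-words) / sentences, where mini-words are words of three letters or fewer; lower scores mean text that is easier for EFL readers. Here is a minimal sketch, assuming simple regex-based word and sentence splitting (real readability tools tokenize more carefully).

```python
import re


def mcalpine_eflaw(text: str) -> float:
    """McAlpine EFLAW: (words + mini-words) / sentences; mini-words have at most 3 characters here."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    mini_words = sum(1 for w in words if len(w) <= 3)
    return (len(words) + mini_words) / len(sentences)


print(mcalpine_eflaw("I like it. It is fine."))
```

Because mini-words are counted twice, sentences packed with short function words ("of", "in", "to") raise the score. This is why EFLAW can move in the opposite direction from a syllable-based grade level.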

Another key observation is that questions now take longer to read: the reading time for the whole question increased by about 1.7 seconds on average (from 35.33s to 37.02s), which may indicate slightly longer passages or more complex question wording.
This aligns with the increase in reasoning steps (from 0.648 to 0.727), suggesting that questions may also require more logical processing on top of the extra reading.
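The exact definition of the reading-time metric isn't given here. A common approach is word count divided by an assumed reading speed. Here is a minimal sketch using a hypothetical rate of 200 words per minute (`ASSUMED_WPM` is an illustrative constant, not the value behind the figures above).

```python
import re

# Hypothetical reading speed; the post doesn't state the rate behind its reading-time metric.
ASSUMED_WPM = 200


def reading_time_seconds(passage: str, question: str, choices: list[str],
                         wpm: float = ASSUMED_WPM) -> float:
    """Estimated seconds to read the passage, the question stem, and every answer choice."""
    text = " ".join([passage, question, *choices])
    n_words = len(re.findall(r"[A-Za-z0-9']+", text))
    return n_words / wpm * 60


# Extra words per question implied by a 1.69 s increase at the assumed rate.
print(round(1.69 * ASSUMED_WPM / 60, 1))
```

Under that assumed rate, the 1.69 s increase would correspond to only about 1.69 × 200 / 60 ≈ 5.6 extra words per question.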
Finally, the complexity of distractors has not changed significantly. The distractor complexity only increased slightly from 0.484 to 0.498, meaning that incorrect answer choices did not become notably harder to distinguish. This suggests that while questions may require more reasoning, the challenge of identifying the correct answer among distractors has remained relatively stable.
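These metrics use very different scales, so it helps to compare relative rather than absolute changes. This short script recomputes the percentage change for each average reported above:

```python
# Averages reported above: (Tests 1-4, Tests 7-10).
METRICS = {
    "grade level": (13.59, 13.94),
    "McAlpine EFLAW": (36.17, 32.77),
    "reading time (s)": (35.33, 37.02),
    "reasoning steps": (0.648, 0.727),
    "distractor complexity": (0.484, 0.498),
}


def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


for name, (old, new) in METRICS.items():
    print(f"{name:<22} {old:>7} -> {new:>7}  ({relative_change(old, new):+.1f}%)")
```

In relative terms, reasoning steps (about +12%) and the McAlpine EFLAW score (about -9%) moved the most, while grade level and distractor complexity changed by less than 3%.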
In conclusion, while the overall changes between Tests 1-4 and Tests 7-10 are minimal, there are a few notable trends. The slight increase in grade level suggests a modest rise in reading difficulty, but at the same time, the text appears to have become more accessible for non-native English speakers.
These shifts suggest that the test is becoming slightly more demanding in reasoning but potentially clearer in language, which could make it more accessible to foreign learners.