The SAT Suite Question Bank (SSQB) plays a huge role in understanding the digital SAT. Yet I haven't seen any resources that focus on it. So I spent the whole Lunar New Year analyzing the entire Verbal section (1270 questions).

This blog is a report on my findings.

Specifically, it will first describe the methodology used to get and analyze the data. Then, it will present the calculated metrics and explain what they mean.

The analysis covers several metrics:

  • Readability scores (i.e., how easy the passage is for you to understand)
  • Length of the questions. Specifically, the average time to read the passage + question + 4 answer choices.
  • Cognitive load factors. This is measured by the amount of information one needs to retrieve from the passage in order to answer the question.

Warning: This blog will be extremely nerdy.

Nerd Emoji

Grabbing the data from SSQB is surprisingly easy.

Basically, every time you request a set of questions by Domain Scores, the website makes a request to the server, which responds with the metadata of the questions. Afterwards, if you click on an individual question, the website makes another request to the server. This time, the server returns more detailed data about the question, including the passage, question, and 4 answer choices (named "stimulus", "stem", and "answerOptions", respectively).

Long story short, I filtered for only Reading & Writing questions using the UI. Then, I grabbed the questions with a script that auto-clicks through each question, writes every server response the website receives into a file, and then downloads the file. The data looks something like this:

JSON structure of the data
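For the curious, here is a minimal sketch of how one detailed response could be flattened into plain text before analysis. The field names "stimulus", "stem", and "answerOptions" come from the responses described above. Everything else is an assumption: that the fields hold HTML fragments, that each answer option is an object with a "content" key, and the helper names `strip_html` and `flatten_question`, which are made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment or "")
    parser.close()
    return " ".join("".join(parser.parts).split())


def flatten_question(record):
    """Turn one detailed response into plain-text passage, question, and choices.

    ASSUMPTION: "answerOptions" is a list of objects with a "content" field.
    That inner layout is a guess, not something confirmed by the post.
    """
    options = record.get("answerOptions") or []
    return {
        "passage": strip_html(record.get("stimulus")),
        "question": strip_html(record.get("stem")),
        "choices": [strip_html(option.get("content")) for option in options],
    }
```

Keeping the three parts separate makes it easy to compute metrics on the passage alone or on the passage, question, and choices together.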

With the data obtained, we are finally ready to dive into the numbers.

The analysis is done in Python with several libraries, including textstat, nltk, and spacy.

The age-old question: which answer choice should you pick when you have to guess? Back when I got 1600 on the SAT, I always picked (C) since it's a meme in Vietnam. Now that we have the data, I can finally settle this question.

A table showing the percentage correct of each answer choice

You might look at this and say choice (D) is the best one to pick. However, when we calculate the p-value (roughly, the probability of seeing a split at least this uneven by pure chance), it turns out that p=0.1452 > 0.005, which means the difference is not statistically significant.
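One common way to get a p-value like this is a chi-square goodness-of-fit test against the assumption that all four choices are equally likely to be correct. The post doesn't say which test was actually used, so treat the sketch below as one reasonable option, not the exact method. The function names are made up, and the closed-form tail probability only works for exactly four choices (3 degrees of freedom).

```python
import math


def chi2_sf_3df(x):
    """P(X >= x) for a chi-square variable with 3 degrees of freedom.

    Closed form, valid only for df = 3 (i.e. exactly four answer choices).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def answer_key_test(counts):
    """Chi-square goodness-of-fit test against a uniform answer key.

    counts: how many questions have (A), (B), (C), (D) as the correct choice.
    Returns (statistic, p_value).
    """
    if len(counts) != 4:
        raise ValueError("this sketch only handles four answer choices")
    expected = sum(counts) / 4  # uniform key: each choice correct equally often
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, chi2_sf_3df(statistic)
```

Feeding in the four per-choice counts behind the table gives a statistic and a p-value. A large p-value means the observed split is easy to explain by chance alone.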

I guess we'll have to wait another day to settle this debate.

Sad Emoji Disappearing Meme Screaming Face

Without further ado, here are the numbers for the entire SSQB:

A table containing all of the metrics

Some background information and comments regarding the metrics:

  • score_band_range_cd: College Board uses this internally to label the difficulty of a question. It ranges from 1 to 7. The higher the number, the harder the question.
  • flesch_reading_ease: A metric for how easy the passage is to read. Counterintuitively, the lower the number, the harder the passage. The mean score, 47.08, means that SAT passages are quite difficult to understand (scores between 30 and 50 are conventionally labeled "difficult").
  • grade_level: Another metric for readability. It combines various readability indices to estimate the school grade level required to understand the text.
  • mcalpine_efl: This metric estimates the readability of an English text for a non-native speaker.
  • reading_time metrics: As the name suggests, these are the times required to read the entire passage/question, in seconds. They are calculated assuming that the average reading speed of English-speaking adults is 238 WPM.
  • reasoning_steps: The number of steps you need to take before answering the question. For example, a score of 0 means that the question can be answered based solely on the information in the passage. A score of 1, on the other hand, requires you to combine 2 pieces of information in the passage to infer 1 new piece of information.
  • distractor_complexity: This measures how misleading the incorrect choices are. Specifically, it measures how similar the incorrect choices are to the passage.
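To make two of these metrics concrete, here is a small sketch using only the standard library. `flesch_reading_ease` applies the standard Flesch formula to counts you supply, and `reading_time_seconds` uses the 238 WPM assumption mentioned above. Libraries like textstat do their own sentence splitting and syllable counting, so their numbers will not match a naive word count exactly. Splitting on whitespace to count words is my simplification.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula from raw counts.

    Lower scores mean harder text.
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def reading_time_seconds(text, wpm=238):
    """Estimated silent reading time in seconds at `wpm` words per minute.

    Words are counted by splitting on whitespace (a simplification).
    """
    word_count = len(text.split())
    return word_count / wpm * 60
```

Working backwards, a mean reading time of 35.33 seconds at 238 WPM corresponds to roughly 140 words per question, counting the passage, question, and answer choices.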

Told you this was gonna be nerdy.

Nerd Emoji

A few grains of salt:

  • The data was taken from SSQB. As such, questions from past tests were not included.
  • The reasoning steps are calculated using machine learning. As such, they reflect how a machine, not a human, approaches these questions. Still, I think they provide a useful starting point for quantifying the cognitive complexity of the test.
  • This is also the case with the distractor complexity metric.

Personally, I find these metrics reasonable. The readability scores confirm what we all know - that SAT passages are quite challenging, even for native English speakers.

The grade level metric indicates that on average, the SAT passages are written at a level suitable for college freshmen or sophomores, with a mean grade level of 13.59. This is somewhat surprising, considering the SAT is a test primarily taken by high school juniors and seniors. Gotta make it hard to get 1600 somehow I guess.

High-school me trying to understand 2nd-year-college-level passage

The reading time metrics reveal an interesting insight. On average, it takes about 35.33 seconds to fully read an SAT Reading question, including the passage excerpt, question stem, and all 4 answer choices.

The reasoning steps metric is also intriguing. It implies that most SAT questions can be answered by simply locating the right information, by taking 1 additional logical step, or both.

For the distractor complexity, a score of 0.48 means that the incorrect choices are moderately misleading. I think this somewhat aligns with the fact that in a lot of questions, you can eliminate the first 2 incorrect choices quite easily.
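To give a feel for what "similar to the passage" can mean, here is a deliberately simple stand-in: the share of each incorrect choice's words that also appear in the passage, averaged over the incorrect choices. This is NOT the machine-learning metric used in this post, and its scores are not comparable to the 0.48 above. The function name and tokenization are my own assumptions.

```python
import re


def _tokens(text):
    """Lowercased set of word tokens (letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def distractor_overlap(passage, choices, correct_index):
    """Average share of each incorrect choice's words found in the passage.

    Returns a value between 0 (no shared words) and 1 (every word shared).
    A purely lexical stand-in, not the ML-based metric from the post.
    """
    passage_tokens = _tokens(passage)
    scores = []
    for index, choice in enumerate(choices):
        if index == correct_index:
            continue  # only the incorrect choices are distractors
        choice_tokens = _tokens(choice)
        if not choice_tokens:
            scores.append(0.0)
            continue
        scores.append(len(choice_tokens & passage_tokens) / len(choice_tokens))
    return sum(scores) / len(scores) if scores else 0.0
```

A distractor that recycles the passage's own wording scores high here, which matches the intuition that such choices are harder to rule out at a glance.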

Aside from satisfying our curiosity, I think we can learn quite a lot from these results.

On average, test takers have around 71 seconds per question on the SAT Reading section (32 minutes for 27 questions).

This means you actually have about twice as much time as needed to just read through each question. Of course, this estimate does not account for the time you need to think through the answer.
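The arithmetic behind the time budget is simple enough to check yourself. The three constants come straight from the numbers in this post; the variable names are just for illustration.

```python
SECTION_MINUTES = 32          # time allowed for the section
QUESTIONS = 27                # questions in the section
MEAN_READING_SECONDS = 35.33  # average time to read one full question

seconds_per_question = SECTION_MINUTES * 60 / QUESTIONS
spare_seconds = seconds_per_question - MEAN_READING_SECONDS
ratio = seconds_per_question / MEAN_READING_SECONDS

print(f"{seconds_per_question:.2f} s available per question")  # 71.11
print(f"{spare_seconds:.2f} s left after reading")             # 35.78
print(f"{ratio:.2f}x the reading time")                        # 2.01
```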

So what should you do with the extra ~35 seconds per question? One strategy could be to read the passage more carefully and thoroughly to make sure that you don't need to read it a second time.

As mentioned, you only need 0-2 logical steps to answer any question. So next time you are stuck on an Inferences or Command of Evidence question, don't overthink it. Try to understand the passage and keep it simple.