Why we built Break the Test
Two things kept coming up when we talked to students preparing for the SAT, ACT, and AP exams. The first was that they were running out of material. The second was that running out of material was not really their problem.
Start with the supply side. The College Board publishes a small set of full-length digital SAT practice tests. There are around seven in active circulation at any given time. A student who is serious about the test will get through all of them inside a month, usually less. After that, the options are bad. There are recycled questions floating around tutoring forums, mostly typed up from memory, mostly slightly wrong. There are prep books from the major test prep companies that are written to feel like the real thing, but everyone who has taken the real thing can tell the difference within two questions. There is a quiet rationing problem in test prep that nobody talks about, because the people best positioned to fix it are the same people benefiting from the scarcity.
So the first goal was straightforward: a practice bank that does not run out. Items in the shape, difficulty, and texture of the real test, but not copies of it. Practice that prepares a student the same way the official versions do, without the rationing.
The harder problem is the second one. Even with unlimited practice, most students plateau. They work through a hundred reading questions, get the same percentage wrong as they did at the start, and conclude that they are bad at reading. They are not bad at reading. They are losing points to a small number of hidden patterns, the same ones, over and over, and nobody has named the patterns for them.
We started calling these the hidden tests inside the test. They are not the content. They are the structure underneath the content. A reading question is not really asking whether you understood the passage. It is asking whether you can tell the difference between an answer that overclaims, an answer that quietly changes the scope, an answer that is true but irrelevant to what was asked, and an answer that paraphrases background text instead of the inference the question wanted. Four wrong answers, four different traps. Once you can see the traps, the question becomes easy. Until you can see them, no amount of practice helps.
The same shape shows up in every section. Grammar questions are not really about whether you know commas. They are about whether you can recognize a sentence boundary in two seconds. AP free-response questions are not really about how much you know about the topic. They are about whether you read the prompt verb correctly. Common App supplemental essays are not really about your story. They are about whether you decoded what the school was actually asking, instead of writing the generic essay you had ready.
This is what Break the Test trains. One pattern family at a time, in five-minute rounds, on a phone. After every answer the tool tells you what each of the wrong choices was doing. Not "C was wrong." Why C was wrong. After enough rounds the patterns become visible to you the way they already are to the people who write the questions. Most of the improvement students get from expensive tutoring is this exact transfer. We thought it could be made smaller, faster, and available to anyone with a phone.
Every item we publish is one we would defend to a student who got it wrong. That is an easy bar to state and a hard one to clear at scale. Most of the work that is not visible from the outside goes into getting each item over that bar. The bar is high because the cost of missing it is a student losing trust in the tool.
Nothing we publish is copied from protected exam content. The items are original, in the shape of the pattern family we are trying to train. The College Board does not need our help, and the last thing test prep needs is another product quietly republishing what it should not.
Three modules are live as of this writing. Trapline Reading trains trap-answer recognition on short reading passages. Grammar Switchboard trains sentence-structure intuition on Standard English Conventions items. Rubric Radar trains AP students to read the rubric verb before they read the prompt. Two more modules are in progress: Constraint Slice for the digital SAT math section, where the win is choosing the right representation instead of grinding the algebra, and Prompt Decoder for Common App supplements, where the win is figuring out what the school is actually asking before you write a word.
The thing we keep telling ourselves is that this is not a content product. There are plenty of content products. This is a pattern-recognition product. The content is just the substrate the patterns live in. If a student finishes a few weeks of rounds and starts seeing the hidden test the moment they open a real exam, the tool has worked, even if they could not name a single specific question from inside it.
That is the bet. We will know within a year whether it pays off. Until then it is live, free where we can keep it free, and named after the thing we are actually trying to do.