A first look at Section Analytics.

Most candidates know their score. They do not know where their score is bleeding. A 645 on a full-length mock is a number. It says almost nothing about whether the missed points came from a single weak DI sub-skill, from pacing drift in the back half of Verbal, or from a slow accumulation of small errors across the entire Quant section.

Section Analytics is the surface we have been building to close that gap. v1 enters private alpha this week with fifty candidates drawn from the Pro 4-month and Six-Month cohorts. This is the first look. The beta widens when the alpha tells us which views are worth the candidate's attention.

What the v1 surface shows

Three signals per section, deliberately. We resisted the usual instinct to ship a dashboard. Each of the three was chosen because it drives an actual change in what the candidate should do next.

Quant · predicted band

Predicted: 665 ± 18

Percentile: 62nd

Median time / Q: 2:14

The v1 Quant panel for an alpha-cohort candidate. The interval, not the point estimate, is the headline number. The percentile and pacing readouts sit below because both change in conversation with the band.

Predicted band, with an interval. The engine simulates the candidate's next mock against the current item pool many times and reports the central interval of the result. Early-cohort candidates see wide intervals; candidates a week out from a target date see tight ones. We chose to ship an interval, not a point estimate, because a point estimate of a noisy thing is a guess wearing a tie.

Per-section percentile. Scaled scores are noisy across calibration revisions. Percentiles, normalized against the current test-taker population, are more honest. The panel reports each section's percentile alongside its band, refreshed every session. A candidate whose scaled score sits flat while their percentile drops three points week-over-week is being outpaced by the rest of the population. That is a real signal even though the surface score reads stable.
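To make the flat-score, falling-percentile case concrete, here is a minimal sketch. The function name `percentile_rank`, the "share of the population scoring strictly below" definition, and both sample populations are illustrative assumptions, not the engine's actual normalization.

```python
def percentile_rank(score, population):
    """Percent of the population scoring strictly below `score` (assumed definition)."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for s in population if s < score)
    return 100.0 * below / len(population)


# Hypothetical section scores for the test-taker population in two consecutive weeks.
week_1 = [70, 72, 75, 78, 79, 81, 83, 85, 88, 90]
week_2 = [72, 74, 77, 79, 81, 82, 84, 86, 89, 91]

candidate_score = 80  # unchanged from week 1 to week 2
print(percentile_rank(candidate_score, week_1))  # 50.0
print(percentile_rank(candidate_score, week_2))  # 40.0
```

The candidate's score never moves, but the population around it does, so the percentile falls by ten points in this toy example.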

Median time-per-question. Not the average: the median. Across the last fourteen days of the candidate's sessions, per section. The pacing change in April 2024 made the engine route on time-on-task internally; the v1 surface now shows the candidate the same number the engine has been reading.
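A small sketch of why the median is the better pacing readout. The record layout (a session date paired with seconds spent on one question), the function names, and the sample data are all hypothetical; only the fourteen-day window and the choice of median come from the description above.

```python
from datetime import date, timedelta
from statistics import mean, median


def median_time_per_question(records, today, window_days=14):
    """Median seconds per question over the trailing window, or None if empty.

    `records` is a list of (session_date, seconds_on_question) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [secs for day, secs in records if cutoff < day <= today]
    return median(recent) if recent else None


def format_mmss(seconds):
    """Render a duration in seconds as m:ss, the way the panel shows it."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical data: one stale record and one stalled question.
today = date(2025, 1, 15)
records = [
    (date(2024, 12, 20), 300),  # older than fourteen days, ignored
    (date(2025, 1, 10), 110),
    (date(2025, 1, 11), 125),
    (date(2025, 1, 12), 134),
    (date(2025, 1, 13), 140),
    (date(2025, 1, 14), 600),   # a single stalled question
]

in_window = [secs for day, secs in records if day > today - timedelta(days=14)]
print(format_mmss(median_time_per_question(records, today)))  # 2:14
print(format_mmss(mean(in_window)))                           # 3:42
```

One stalled question drags the mean to 3:42 while the median stays at 2:14, which is closer to how the candidate actually paces a typical item.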

How the band is computed

Standard IRT-based simulation. Specifically: take the candidate's current mastery vector, sample a full section's worth of items at the difficulty distribution the live test serves, score the simulated section under the 3PL (three-parameter logistic) response model, repeat several thousand times, and report the 20th to 80th percentile range of the resulting distribution, the central 60% of simulated outcomes, as the band.
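The loop described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the production engine: the mastery vector is collapsed to a single ability value `theta`, the item pool is a hypothetical list of `(a, b, c)` parameter tuples assumed to already reflect the live difficulty distribution, and the result is a raw correct-count band, not a scaled score.

```python
import math
import random


def p_correct_3pl(theta, a, b, c):
    """3PL response model: P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b))).

    a = discrimination, b = difficulty, c = guessing floor.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_band(theta, item_pool, section_len, n_runs=5000,
                  lo=0.20, hi=0.80, seed=0):
    """Monte Carlo band: (lo, hi) percentiles of simulated correct counts."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        # Draw one section's worth of items without replacement.
        items = rng.sample(item_pool, section_len)
        # Score each item as a Bernoulli trial under the 3PL model.
        correct = sum(rng.random() < p_correct_3pl(theta, a, b, c)
                      for a, b, c in items)
        totals.append(correct)
    totals.sort()
    return totals[int(lo * (n_runs - 1))], totals[int(hi * (n_runs - 1))]


# Hypothetical pool: 41 items, difficulties from -2.0 to 2.0, guessing floor 0.2.
pool = [(1.0, b / 10.0, 0.2) for b in range(-20, 21)]
print(simulate_band(theta=0.5, item_pool=pool, section_len=21))
```

A real implementation would map raw counts onto the scaled score and use the full mastery vector. The shape of the computation stays the same: sample, score, repeat, read off the percentiles.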

This is how serious IRT-based test simulators have worked for decades. We are not claiming a new method. We are claiming a clean implementation of one that most consumer prep platforms either skip entirely or hide behind a single-number readout that no statistician would sign off on.

What is explicitly not in v1

Three views lived in the prototype for months and did not ship this week.

Topic-level mastery deltas. Each sub-skill the candidate's mastery vector tracks has a week-over-week delta the engine already computes internally. Surfacing them in v1 was the most tempting and most discussed line on the panel. We held them because the candidate-facing language for thirty sub-skill deltas is not yet good enough. Surfaced without that language, the column becomes a spreadsheet, and the candidate's attention is the asset we are trying to use carefully.

Peer benchmarking. The engine can compare a candidate's pacing and mastery against candidates targeting the same score band. The privacy review is finished and the candidate-consent flow is not. We will not ship one without the other.

Item-level commentary. A version that surfaces, for every missed item, the engine's read on why the candidate missed it. The diagnoses are crisp on roughly two-thirds of items and generic on the rest. Uneven feedback is worse than no feedback. The view returns when the diagnosis quality is uniform.

Why a fifty-candidate alpha

Two reasons we are not opening this to every Pro subscriber today.

The first is that we do not yet know which of the three views drives behavior. Internal usage is not candidate usage. Our hypothesis is that the median-time readout will be the most-glanced view in the wild. But our hypothesis is not data. A fifty-candidate cohort, instrumented carefully, gives us a behavioral read in a few weeks that the team's intuition will not.

The second is that the v1 surface still calls the engine in a way that does not yet scale cleanly. The Monte Carlo run that produces the band is cheap for a small alpha and expensive at scale. The engineering work to make it cheap at scale is scheduled for the next sprint, not this one. We would rather ship a surface that is right for a small alpha than the same surface, degraded, for everyone.

Who is in the alpha

Fifty candidates, drawn from Pro 4-month and Six-Month subscribers with at least one diagnostic re-take and a target date inside the next 120 days. We selected the cohort to spread across diagnostic bands so the feedback covers the full range of candidate states. The selection email goes out this afternoon. There is no opt-in form for v1; the v1 cohort is invite-only.

The next note will be the beta. When the alpha tells us which view to keep, which to redesign, and which to cut, we will write it down and open the surface more widely. The cohort beyond fifty is the next decision, not this one.

Brightroom Product
