The v1 engine that shipped in March picks the right next question. It does not yet account for how fast the candidate is moving. That gap is the subject of this note, and of the change that went live in the alpha cohort last Wednesday.

Pacing is now a first-class signal inside the selection loop: time-on-question is read alongside the standard error of the candidate's ability estimate, and both shape what the engine does next. The engine is finally allowed to know whether the candidate is running out of clock.

The shape of the problem

Across the four hundred alpha sessions we have logged on v1, the same pattern shows up. The first eight quant items sit close to the two-minute budget. The middle six drift twenty seconds long. The last seven crash: three- to four-minute responses, climbing item-skip rates, and accuracy twelve points below the same difficulty band earlier in the section.

Figure: Median time-per-question across 21 Quant items against the 2:00 target, alpha cohort, March 2024. The back third runs hot. The accuracy drop in that segment is not a difficulty problem.

The accuracy drop in the back third is not driven by item difficulty. v1's Fisher-information selection actually eases the difficulty of the items in the last seven positions, because the candidate's estimated ability has usually settled by then and the engine targets information gain rather than maximum challenge. The drop is driven by time pressure: the candidate is answering items they could answer cleanly with thirty more seconds, and answering them wrong without it.
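Here is a minimal sketch of why information-targeted selection does not chase the hardest item once ability has settled. It assumes a two-parameter logistic (2PL) model, because the note does not say which IRT model v1 fits. Under 2PL, an item's Fisher information is a²·P·(1 − P), which peaks where the item's difficulty b equals θ. The item parameters below are illustrative only.

```python
import math

# Assumption: a 2PL IRT model. The note does not say which model v1 fits.
def p_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under 2PL."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta: float, a: float, b: float) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

theta_hat = 0.5                                 # a settled ability estimate (illustrative)
difficulties = [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0]  # illustrative item difficulties

# Hold discrimination fixed so that only difficulty varies between items.
info = {b: fisher_information(theta_hat, a=1.2, b=b) for b in difficulties}
best_b = max(info, key=info.get)
print(best_b)  # 0.5: the item matched to theta_hat, not the hardest one (3.0)
```

The selected difficulty tracks the ability estimate rather than the top of the pool. That is why the late-section accuracy drop points at time pressure instead of harder items.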

What v1 was doing about it

Nothing. The v1 loop logged time-on-question. It did not consume it. Selection ran on ability and topic balance. Pacing data sat in the response table, available for analytics, invisible to the loop. That is the gap this change closes.

The change

Pacing now enters the loop in two places. First, the candidate's pacing position — a running estimate of how much time-budget remains relative to items remaining — biases the difficulty target. When pacing is comfortable, the engine selects at maximum information. When pacing is tight, it widens the eligible band downward and prefers items the candidate is likely to answer in under a minute, recovering clock without sacrificing measurement.
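The following is a minimal sketch of how a pacing position could bias the difficulty target and widen the eligible band. The function names mirror the selection step in this note, but the signatures are simplified (the sketch drops `se` and reads a `Clock` instead of the full history). `Clock`, the ratio definition, and the `max_shift`, `base` and `extra` constants are hypothetical. The 120-second budget comes from the 2:00 target in the pacing chart. The sketch also assumes a 2PL model, where b = θ is the maximum-information difficulty. It leaves out the preference for sub-minute items.

```python
from dataclasses import dataclass

PER_ITEM_BUDGET_S = 120.0  # 2:00 per Quant item, from the pacing chart

@dataclass
class Clock:  # hypothetical view of the section clock
    seconds_remaining: float
    items_remaining: int

def pacing_position(clock: Clock) -> float:
    """Time available per remaining item, relative to the nominal budget.

    1.0 means on budget; below 1.0 means the candidate is behind.
    (Hypothetical definition.)
    """
    if clock.items_remaining <= 0:
        return 1.0
    return (clock.seconds_remaining / clock.items_remaining) / PER_ITEM_BUDGET_S

def difficulty_target(theta: float, pace: float, max_shift: float = 1.0) -> float:
    """Aim at theta when pacing is comfortable; shift easier as pacing tightens.

    Assumes 2PL, where b == theta maximises information.
    """
    if pace >= 1.0:
        return theta
    shortfall = min(1.0, 1.0 - pace)  # 0 (on budget) .. 1 (no clock left)
    return theta - max_shift * shortfall

def window(pace: float, base: float = 0.5, extra: float = 0.5) -> float:
    """Half-width of the eligible difficulty band; grows as pacing tightens."""
    if pace >= 1.0:
        return base
    return base + extra * min(1.0, 1.0 - pace)

# 600 s left for 10 items is 60 s per item, half the nominal budget.
tight = pacing_position(Clock(seconds_remaining=600, items_remaining=10))
print(tight, difficulty_target(0.5, tight), window(tight))  # 0.5 0.0 0.75
```

With θ = 0.5, comfortable pacing gives a band of [0.0, 1.0]. The tight case above gives [-0.75, 0.75], so the band reaches further down and trims the hardest items.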

Second, the stop condition is now sensitive to pacing collapse. If three consecutive responses each run more than 90 seconds over their predicted time-on-task, and the candidate's standard error has stopped tightening, the engine ends the section early. It does this rather than burning clock on items it can already see will not land cleanly. A short, clean stop is worth more than a long, ragged one.
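Here is a minimal sketch of the two stop predicates. The three-response run and the 90-second overrun come from the rule above. The `Response` record, its per-item `predicted_seconds`, and the SE-stall tolerance (less than 0.01 of shrinkage across the run) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

OVERRUN_S = 90.0     # from the rule: more than 90 s over predicted time
RUN_LENGTH = 3       # from the rule: three consecutive responses
SE_TOLERANCE = 0.01  # hypothetical threshold for "stopped tightening"

@dataclass
class Response:  # hypothetical response record
    seconds_taken: float
    predicted_seconds: float  # hypothetical per-item time-on-task prediction
    se_after: float           # standard error of theta after this response

def pacing_collapse(history: Sequence[Response]) -> bool:
    """True if each of the last RUN_LENGTH responses overran by > OVERRUN_S."""
    if len(history) < RUN_LENGTH:
        return False
    return all(
        r.seconds_taken - r.predicted_seconds > OVERRUN_S
        for r in history[-RUN_LENGTH:]
    )

def se_stalled(history: Sequence[Response]) -> bool:
    """True if SE shrank by less than SE_TOLERANCE across the last RUN_LENGTH responses."""
    if len(history) < RUN_LENGTH:
        return False
    first = history[-RUN_LENGTH].se_after
    latest = history[-1].se_after
    return first - latest < SE_TOLERANCE

history = [
    Response(110, 120, 0.42),
    Response(250, 130, 0.33),
    Response(240, 120, 0.326),
    Response(260, 125, 0.322),
]
print(pacing_collapse(history), se_stalled(history))  # True True
```

Requiring both predicates keeps a slow but still-informative run from ending the section. In that case the overruns are present, but SE is still dropping.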

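```python
# v1.1 selection step (sketch: irt_mle, pacing_position, difficulty_target,
# window, fisher_information, pacing_collapse, se_stalled and end_section
# are engine internals not shown here)
theta, se = irt_mle(history, item_params)        # ability estimate and its SE
pace      = pacing_position(history, section_clock)

# New stop predicate: end early once pacing has collapsed and SE has stalled,
# before spending clock on another item.
if pacing_collapse(history) and se_stalled(history):
    end_section()
else:
    eligible = pool.filter(unseen, topic_balance, exposure_cap)
    target   = difficulty_target(theta, se, pace)

    # New filter: only items inside the pace-dependent difficulty window.
    scored = [
        (item, fisher_information(item, theta))
        for item in eligible
        if abs(item.b - target) <= window(pace)
    ]
    # Highest-information item in the window (assumes the window is non-empty).
    next_q, _ = max(scored, key=lambda pair: pair[1])
```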

Two changes from the March loop. The eligible pool is filtered against a pace-dependent difficulty window before scoring. The stop condition gets a second predicate. Everything else is what we shipped four weeks ago.
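The following is a self-contained sketch of the filter-then-score step. It assumes 2PL items and takes `target` and the window half-width as already computed. The `Item` shape and `select_next` are hypothetical. The fallback (score the whole eligible pool when the window is empty) is also an assumption, because the note does not say what the engine does in that case.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Item:  # hypothetical item record
    item_id: str
    a: float  # discrimination (2PL, assumed)
    b: float  # difficulty

def fisher_information(item: Item, theta: float) -> float:
    """2PL item information at theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a * item.a * p * (1.0 - p)

def select_next(eligible: Sequence[Item], theta: float,
                target: float, half_width: float) -> Optional[Item]:
    """Most informative item whose difficulty lies within the window."""
    in_window = [it for it in eligible if abs(it.b - target) <= half_width]
    candidates = in_window or list(eligible)  # hypothetical fallback
    if not candidates:
        return None
    return max(candidates, key=lambda it: fisher_information(it, theta))

pool = [Item("q1", 1.6, 0.9), Item("q2", 1.0, 0.3), Item("q3", 1.0, -0.3)]

# Comfortable pacing: target at theta, narrow window [0.0, 1.0].
print(select_next(pool, theta=0.5, target=0.5, half_width=0.5).item_id)   # q1
# Tight pacing: easier target, wider window [-0.75, 0.75] excludes q1.
print(select_next(pool, theta=0.5, target=0.0, half_width=0.75).item_id)  # q2
```

Scoring is still pure information at θ. The only thing pacing changes in this sketch is which items are allowed to compete.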

What this fixes, what it does not

The comparison below covers two weeks of v1.1 alpha sessions against the four preceding weeks of v1.0, on a matched cohort of forty-eight candidates.

The back-third accuracy gap closed by nine points. Items in positions 15–21 now land within three points of the accuracy seen at positions 1–8 at the same fitted difficulty. Some of that is pacing recovery. Some is the engine routing around items it would otherwise have served at the maximum-information difficulty, in a position where the candidate has no clock left to think.

Session length tightened by 11%. The median quant section runs 28 minutes instead of 31. Standard error on θ at session end is unchanged within noise. The engine is making the same measurement in less of the candidate's time.

The pacing signal is still noisy at the item level. Time-on-task is a composite of three things (reading speed, working-out speed, and answer commitment), and the loop currently treats it as a single scalar. Candidates whose reading is slow but whose working-out is fast get treated like candidates whose working-out is slow, which is the wrong call about half the time. Separating those signals is the v2 pacing work, and it will need support from the calibration pipeline.

What we are not doing

We are not surfacing the pacing signal directly to the candidate in this release. A "you are running slow" prompt mid-section is the kind of feedback that produces anxiety-driven rushing — the failure mode the pacing change was supposed to address. The engine is making the pacing-aware call inside the loop. The candidate sees the consequence (a slightly easier item, a slightly shorter section) without being told what triggered it. That is the right shape of the feedback at this stage.

We will write a longer note when the pacing signal gets decomposed into its three constituents and the loop starts routing on them separately. That work is on the Q3 roadmap.

— Brightroom Engineering