Calibrate before we ship.
No item enters the live pool until it’s seen at least 400 pre-test responses across a stratified ability range.
Brightroom is built on a quiet, twenty-year-old idea: the test isn’t scoring you, it’s modeling you. We rebuilt that model from first principles. This page is the methodology, the math, and the validation against test-day outcomes.
Every response you give updates a single probabilistic estimate of your ability — your theta, θ — across eight independent skill axes. The engine then asks: which item, of the 21,608 calibrated, carries the most information about θ at this exact moment?
The next item is selected to maximize Fisher information at the current θ̂ — meaning every question you see is the one most diagnostic of your remaining uncertainty. There is no filler.
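As a rough illustration of the selection step described above, here is a minimal sketch that assumes a two-parameter logistic (2PL) item model and a discretized posterior over θ. The 2PL form, the function names, and the three-item pool are assumptions made for the example; this page does not specify the engine's actual item model.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model, not confirmed by the source)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat, pool, administered):
    """Return the id of the not-yet-administered item with maximum information at theta_hat."""
    candidates = [
        (item_id, a, b)
        for item_id, (a, b) in pool.items()
        if item_id not in administered
    ]
    best = max(candidates, key=lambda c: fisher_information(theta_hat, c[1], c[2]))
    return best[0]


def update_theta(grid, posterior, a, b, correct):
    """Bayes update of a discretized posterior over theta; returns (new posterior, posterior mean)."""
    likelihood = [
        p_correct(t, a, b) if correct else 1.0 - p_correct(t, a, b) for t in grid
    ]
    unnormalized = [w * l for w, l in zip(posterior, likelihood)]
    total = sum(unnormalized)
    new_posterior = [u / total for u in unnormalized]
    theta_hat = sum(t * w for t, w in zip(grid, new_posterior))
    return new_posterior, theta_hat


# Hypothetical pool: item id -> (discrimination a, difficulty b)
pool = {"easy": (1.0, -2.0), "medium": (1.2, 0.0), "hard": (1.0, 2.0)}

# Standard-normal prior on a grid from -4 to 4
grid = [x / 10.0 for x in range(-40, 41)]
prior = [math.exp(-0.5 * t * t) for t in grid]
norm = sum(prior)
prior = [w / norm for w in prior]

first_item = select_next_item(0.0, pool, administered=set())
a, b = pool[first_item]
posterior, theta_hat = update_theta(grid, prior, a, b, correct=True)
second_item = select_next_item(theta_hat, pool, administered={first_item})
```

Under these assumptions, information peaks where the item's difficulty sits close to the current θ̂, so the selection naturally tracks the estimate as it moves.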
Bayesian knowledge tracing maintains a posterior probability that you have mastered each of eight latent skills — updated after every response with prior, slip, and guess parameters fitted from cohort data. Below: the live state of one candidate at hour 12.
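The per-response update described above can be sketched with the textbook Bayesian knowledge tracing equations. This is a minimal sketch, not Brightroom's fitted model: the parameter values are invented for illustration, and the learning-rate parameter (p_learn) is the standard fourth BKT parameter, assumed here because the text names only prior, slip, and guess.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    p_init: float   # prior probability the skill is already mastered
    p_learn: float  # probability of moving to mastery after one opportunity (assumed parameter)
    p_slip: float   # probability of answering wrong despite mastery
    p_guess: float  # probability of answering right without mastery


def bkt_update(p_mastery, correct, params):
    """Return the mastery probability after observing one response."""
    if correct:
        numerator = p_mastery * (1.0 - params.p_slip)
        denominator = numerator + (1.0 - p_mastery) * params.p_guess
    else:
        numerator = p_mastery * params.p_slip
        denominator = numerator + (1.0 - p_mastery) * (1.0 - params.p_guess)
    conditioned = numerator / denominator
    # Allow for learning between this response and the next one.
    return conditioned + (1.0 - conditioned) * params.p_learn


# Hypothetical parameters for a single skill; real values would be fitted from cohort data.
params = BKTParams(p_init=0.3, p_learn=0.1, p_slip=0.1, p_guess=0.2)

after_correct = bkt_update(params.p_init, True, params)
after_incorrect = bkt_update(params.p_init, False, params)
```

Tracking eight skills would amount to keeping one such probability per skill and applying the update to whichever skill the answered item targets.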
R(t) = e^(−t/τ̂)
Standards we hold the engine to. Documented in BR-VAL-25-01, Appendix B; reviewed quarterly; never quietly relaxed.
Every model we publish is evaluated against the only outcome that matters: the score on the official report.
Item parameters re-fit on rolling 90-day windows. Drift over Δb > 0.4 triggers manual review.
Every approach we abandoned — multidimensional 4PL, transformer-based scoring, NLP-graded essays — is documented internally.
All validation analyses run on a single seeded notebook. Results are versioned, archived, and re-runnable.
Aggregated calibration data stays on our servers. Individual response patterns are never sold or licensed.
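Two of the standards above are mechanical enough to express as checks: the 400-response calibration gate and the Δb > 0.4 drift rule. The sketch below is a hypothetical illustration only. It assumes drift is measured as the absolute change in an item's difficulty parameter b between consecutive re-fits, and it does not model the stratified ability range requirement.

```python
MIN_PRETEST_RESPONSES = 400  # from the calibration standard
DRIFT_THRESHOLD = 0.4        # from the drift standard (delta b)


def ready_for_live_pool(n_pretest_responses):
    """True once an item has at least the minimum number of pre-test responses."""
    return n_pretest_responses >= MIN_PRETEST_RESPONSES


def needs_manual_review(b_previous, b_refit):
    """True when the difficulty parameter moved by more than the drift threshold.

    Assumes drift is the absolute difference between consecutive re-fits.
    """
    return abs(b_refit - b_previous) > DRIFT_THRESHOLD


# Hypothetical values for illustration
print(ready_for_live_pool(399))        # below the gate
print(needs_manual_review(0.1, 0.6))   # difficulty shifted by 0.5
```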
Run a five-minute diagnostic. Watch the engine fit a curve through your data — and tell you, within 24 points, the score you’d earn next Saturday.