Research

The science of
getting better

Brightroom is built on a quiet, twenty-year-old idea: the test isn’t scoring you, it’s modeling you. We rebuilt that model from first principles. This page is the methodology, the math, and the validation against test-day outcomes.

Cohort: N = 8,432 candidates
Items: 21,608 calibrated
Predictive R²: 0.94 vs. official
Engine: v4.9 · 18 hr ago
i.
The engine

One equation, recomputed every twelve seconds.

Every response you give updates a single probabilistic estimate of your ability — your theta, θ — across eight independent skill axes. The engine then asks: which item, of the 21,608 calibrated, carries the most information about θ at this exact moment?

$$P(u_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i) \cdot \frac{1}{1 + e^{-a_i(\theta_j - b_i)}}$$

where $\theta_j$ is candidate ability, $a_i$ is item discrimination, $b_i$ is item difficulty, and $c_i$ is pseudo-guessing.

The next item is selected to maximize Fisher information at the current θ̂ — meaning every question you see is the one most diagnostic of your remaining uncertainty. There is no filler.
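A minimal sketch of maximum-information selection under the three-parameter logistic model above. The pool layout (dicts with `id`, `a`, `b`, `c`) and the function names are assumptions for illustration, not Brightroom's implementation; the information formula is the standard one for a 3PL item.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta:
    I = a^2 * (Q / P) * ((P - c) / (1 - c))^2, with Q = 1 - P."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta_hat, pool, administered):
    """Return the unseen item with the highest information at theta_hat.
    `pool` is a hypothetical list of dicts; `administered` is a set of ids."""
    candidates = [item for item in pool if item["id"] not in administered]
    return max(
        candidates,
        key=lambda it: fisher_information(theta_hat, it["a"], it["b"], it["c"]),
    )

# Hypothetical three-item pool; a real pool would hold the calibrated items.
pool = [
    {"id": 1, "a": 1.0, "b": -1.0, "c": 0.2},
    {"id": 2, "a": 1.0, "b": 1.5, "c": 0.2},
    {"id": 3, "a": 1.0, "b": 3.0, "c": 0.2},
]
next_item = select_next_item(1.42, pool, administered=set())
```

With θ̂ = 1.42, the item whose difficulty sits closest to the current estimate (here `b = 1.5`) carries the most information, so it is chosen first.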

[Figure: item characteristic curve, P(correct) against ability θ from −3 to +3, marked at θ̂ = 1.42. Readout: θ̂ 1.42 · SE 0.21 · Q 23 / 64]
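One way a running θ̂ and its standard error could be produced is an expected a posteriori (EAP) estimate on a grid. This is a sketch under stated assumptions: the page does not say which estimator the engine uses, and the standard normal prior, the grid bounds, and the function names are all hypothetical.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, grid_min=-4.0, grid_max=4.0, n_points=161):
    """Posterior mean and standard deviation of theta on a fixed grid.

    `responses` is a list of (a, b, c, u) tuples, with u = 1 for a correct
    answer and u = 0 otherwise. The prior is standard normal (an assumption).
    """
    step = (grid_max - grid_min) / (n_points - 1)
    grid = [grid_min + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        log_w = -0.5 * theta * theta  # log of the unnormalised N(0, 1) prior
        for a, b, c, u in responses:
            p = p_correct(theta, a, b, c)
            log_w += math.log(p if u == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

# Before any response the estimate sits at the prior mean with SE near 1.
theta_hat, se = eap_estimate([])
# A correct answer pulls the estimate up; an incorrect one pulls it down.
up, _ = eap_estimate([(1.2, 0.0, 0.2, 1)])
down, _ = eap_estimate([(1.2, 0.0, 0.2, 0)])
```

The posterior standard deviation plays the role of the SE in the readout; it generally shrinks as informative responses accumulate.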
ii.
Knowledge tracing

A live map of every concept you’ve ever almost understood.

Bayesian knowledge tracing maintains a posterior probability that you have mastered each of eight latent skills — updated after every response with prior, slip, and guess parameters fitted from cohort data. Below: the live state of one candidate at hour 12.

[Chart: mastery probability on the eight skill axes]

Skill                     P      Δ 24h
Algebra & equations       0.78   +0.04
Data sufficiency          0.62   +0.09
Geometry & coordinate     0.71   +0.02
Word problems             0.84   +0.01
Critical reasoning        0.55   0.03
Reading comprehension     0.69   +0.05
Sentence correction       0.58   +0.07
Quantitative reasoning    0.74   +0.03
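The per-response update behind a mastery table like this can be sketched with the textbook Bayesian knowledge tracing rule. The slip and guess parameters come from the description above; the `learn` transition probability is part of standard BKT but is not mentioned on this page, so treat it and all numeric values as assumptions.

```python
def bkt_update(p_mastery, correct, slip, guess, learn=0.0):
    """One Bayesian knowledge tracing step for a single skill.

    p_mastery: prior probability the skill is mastered
    correct:   True if the response was correct
    slip:      P(incorrect | mastered)
    guess:     P(correct | not mastered)
    learn:     P(not mastered -> mastered) after the attempt (assumed; 0 disables it)
    """
    if correct:
        evidence_mastered = p_mastery * (1.0 - slip)
        evidence_total = evidence_mastered + (1.0 - p_mastery) * guess
    else:
        evidence_mastered = p_mastery * slip
        evidence_total = evidence_mastered + (1.0 - p_mastery) * (1.0 - guess)
    posterior = evidence_mastered / evidence_total
    # Optional learning transition applied after conditioning on the response.
    return posterior + (1.0 - posterior) * learn

# Illustrative values only: prior 0.5, slip 0.1, guess 0.2.
after_correct = bkt_update(0.5, True, slip=0.1, guess=0.2)
after_wrong = bkt_update(0.5, False, slip=0.1, guess=0.2)
```

A correct answer moves the posterior toward mastery and an incorrect one moves it away, by an amount set by how plausible slips and lucky guesses are.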
iii.
Knowledge graph

Your knowledge as a graph.

Topic prerequisites & live mastery.

13 nodes · 15 edges
Number properties · 92
Algebra · 84
Inequalities · 71
Word problems · 66
Geometry · 58
Combinatorics · 32
Probability · 41
Statistics · 62
CR · Assumption · 74
RC · Inference · 69
Two-Part Analysis · 48
Multi-Source · 55
Graphics Interpret. · 51

Legend:
Mastered · 80%+
Stable · 60–79%
Improving · 40–59%
Weak · < 40%
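As a small illustration, the legend thresholds can be applied to the thirteen node values listed above. This assumes each node's number is a mastery percentage; the `band` function and `NODES` mapping are hypothetical names, and the graph's edges are not reproduced because the page does not list them.

```python
from collections import Counter

# Node values as listed above, read as mastery percentages (an assumption).
NODES = {
    "Number properties": 92, "Algebra": 84, "Inequalities": 71,
    "Word problems": 66, "Geometry": 58, "Combinatorics": 32,
    "Probability": 41, "Statistics": 62, "CR · Assumption": 74,
    "RC · Inference": 69, "Two-Part Analysis": 48, "Multi-Source": 55,
    "Graphics Interpret.": 51,
}

def band(pct):
    """Map a mastery percentage to the legend's four bands."""
    if pct >= 80:
        return "Mastered"
    if pct >= 60:
        return "Stable"
    if pct >= 40:
        return "Improving"
    return "Weak"

band_counts = Counter(band(pct) for pct in NODES.values())
weakest = min(NODES, key=NODES.get)
```

Under these thresholds the thirteen nodes split into 2 Mastered, 5 Stable, 5 Improving, and 1 Weak, with Combinatorics as the weakest node.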
iv.
Cognitive load profiling

The pace your brain actually wants.

Mean response time vs. accuracy, by topic.

Cohort · n = 2,847
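[Figure: mean response time (0:08 to 3:00+) against accuracy (0% to 100%), one row per topic: Algebra, Data sufficiency, Geometry, Word problems, Critical reasoning, Reading comprehension, Sentence correction, Quantitative reasoning]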
v.
Spaced retrieval

The forgetting curve, defeated.

Retention over thirty days, with and without revisits.

Topic · Inequalities · n = 412
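[Figure: retention (0% to 100%) from Day 0 through Days 1, 3, 7, 14, and 30, with a 70% target line and revisit markers; curve annotations read 80% and <5%. Legend: Ebbinghaus baseline · no review; Brightroom scheduler · spaced revisits]

$$R(t) = e^{-t/\hat{\tau}}$$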
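The exponential retention model gives a closed form for when retention reaches the 70% target: solving R(t) = 0.70 yields t = −τ̂ · ln(0.70). The sketch below uses that to space revisits. The rule that each revisit restores retention to 100% and multiplies τ̂ by a fixed `boost` is a hypothetical simplification, not the Brightroom scheduler, and the numeric values are illustrative.

```python
import math

def retention(t_days, tau):
    """Exponential forgetting curve R(t) = exp(-t / tau)."""
    return math.exp(-t_days / tau)

def days_until(target, tau):
    """Time for retention to decay from 1.0 to `target`."""
    return -tau * math.log(target)

def schedule_revisits(tau0, horizon=30.0, target=0.70, boost=2.0):
    """Revisit whenever retention would hit `target`.

    Assumes (hypothetically) that a revisit resets retention to 1.0 and
    multiplies the decay constant tau by `boost`.
    """
    t, tau, revisits = 0.0, tau0, []
    while True:
        t += days_until(target, tau)
        if t > horizon:
            break
        revisits.append(t)
        tau *= boost
    return revisits

revisit_days = schedule_revisits(tau0=3.0)
gaps = [b - a for a, b in zip([0.0] + revisit_days, revisit_days)]
```

Because τ̂ grows after each revisit in this toy rule, the gaps between revisits widen over the thirty-day window, which is the qualitative shape a spaced schedule is meant to have.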
vi.
Validation

Predicted score vs. observed.

Terminal estimate vs. test-day score.

Cohort · n = 8,432 · ≤ 30-day window
r = 0.97

[Figure: scatter of predicted score (Brightroom) against observed test-day score, both axes 405 to 805. Legend: best-fit regression; y = x · parity; one point = 1 candidate · n = 8,432]
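For reference, a correlation like the one reported here can be computed as below. The five score pairs are made-up placeholders, not cohort data, and the function name is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Placeholder pairs for illustration only (predicted, observed).
predicted = [405, 505, 605, 705, 805]
observed = [415, 495, 615, 690, 805]
r = pearson_r(predicted, observed)

# For a simple linear fit, R^2 equals r^2.
implied_r_squared = round(0.97 ** 2, 2)
```

If the headline Predictive R² is the square of this correlation, r = 0.97 implies 0.97² ≈ 0.94, which agrees with the figure quoted at the top of the page.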
vii.
Methodology

Six commitments we won’t break.

Standards we hold the engine to. Documented in BR-VAL-25-01, Appendix B; reviewed quarterly; never quietly relaxed.

I

Calibrate before we ship.

No item enters the live pool until it’s seen at least 400 pre-test responses across a stratified ability range.

II

Ground truth is test-day.

Every model we publish is evaluated against the only outcome that matters: the score on the official report.

III

Calibration drift is monitored weekly.

Item parameters are re-fit on rolling 90-day windows. Drift of Δb > 0.4 triggers manual review.

IV

Negative results published.

Every approach we abandoned — multidimensional 4PL, transformer-based scoring, NLP-graded essays — is documented internally.

V

Reproducibility by default.

All validation analyses run on a single seeded notebook. Results are versioned, archived, and re-runnable.

VI

User data, never sold.

Aggregated calibration data stays on our servers. Individual response patterns are never sold or licensed.
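Commitments I and III are mechanical enough to sketch as checks. The thresholds (400 pre-test responses, Δb > 0.4) come from the text above; the function names, the dict layout, and the use of the absolute shift are assumptions, and the stratified-ability requirement is not modelled here.

```python
MIN_PRETEST_RESPONSES = 400   # commitment I
DRIFT_THRESHOLD = 0.4         # commitment III, in difficulty (b) units

def ready_for_live_pool(n_pretest_responses):
    """True once an item has enough pre-test responses to enter the pool.
    Stratification across ability levels would need a separate check."""
    return n_pretest_responses >= MIN_PRETEST_RESPONSES

def drifted_items(old_b, new_b, threshold=DRIFT_THRESHOLD):
    """Ids of items whose re-fitted difficulty moved by more than `threshold`.

    `old_b` and `new_b` are hypothetical dicts mapping item id to the
    difficulty parameter from the previous and current 90-day window.
    The shift is compared in absolute value (an assumption).
    """
    return sorted(
        item_id
        for item_id in old_b
        if item_id in new_b and abs(new_b[item_id] - old_b[item_id]) > threshold
    )

# Illustrative values only.
flagged = drifted_items({"q1": 0.0, "q2": 1.0}, {"q1": 0.5, "q2": 1.3})
```

Here `q1` moved by 0.5 and is flagged for manual review, while `q2` moved by 0.3 and is not.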

viii.
The room is open

The math is on our side.
Now it’s on yours.

Run a five-minute diagnostic. Watch the engine fit a curve through your data — and tell you, within 24 points, the score you’d earn next Saturday.

Validation cohort N = 8,432 · Engine v4.9 · 18 hr ago · Brightroom Research · Zürich · 2026