Calibration

How well do the probabilities match what actually happened?

Backtest — Euro 2024 + Copa América (n = 83)

0.583 Brier score — lower is better · random ≈ 0.667
53.0% Match accuracy · random ≈ 37%
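
As a rough illustration of how these two headline numbers can be computed, the sketch below assumes the multi-class form of the Brier score (squared error summed over the three outcomes, then averaged over matches). That form is consistent with the stated random baseline of about 0.667, but it is not confirmed as this page's exact implementation. The function names and the (home, draw, away) tuple layout are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean multi-class Brier score (assumed definition).

    probs:    list of (p_home, p_draw, p_away) tuples, each summing to 1.
    outcomes: list of result indices (0 = home win, 1 = draw, 2 = away win).
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        # Squared error against the one-hot vector of the actual result.
        total += sum((p[k] - (1.0 if k == y else 0.0)) ** 2 for k in range(3))
    return total / len(probs)


def match_accuracy(probs, outcomes):
    """Share of matches where the most probable outcome actually occurred."""
    hits = 0
    for p, y in zip(probs, outcomes):
        predicted = max(range(3), key=lambda k: p[k])
        if predicted == y:
            hits += 1
    return hits / len(probs)


# A forecaster that always says 1/3 for every outcome scores 2/3,
# whatever the results are.
uniform = [(1 / 3, 1 / 3, 1 / 3)] * 3
print(round(brier_score(uniform, [0, 1, 2]), 3))  # 0.667
```

Under this definition a perfect forecast scores 0 and a fully confident wrong forecast scores 2, so 0.583 sits between a perfect model and the uniform baseline.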

Full backtest (2024, all competitions, n = 290): Brier 0.495, accuracy 63.8%. Both figures are flattered by friendlies and qualifiers.

Reliability diagram — pooled outcomes

[Figure: reliability diagram. x-axis: predicted probability (0% to 100%); y-axis: observed frequency (0% to 100%).]

11% predicted, 16% actual (n=69)
27% predicted, 29% actual (n=104)
49% predicted, 42% actual (n=38)
69% predicted, 70% actual (n=23)
84% predicted, 67% actual (n=15)

Each dot is a probability bin pooling home-win, draw, and away-win predictions. Dots on the dashed diagonal = perfect calibration. Dot size scales with sample count.
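
The pooling described above can be sketched as follows. This is a minimal illustration, not the page's actual code: it assumes five equal-width bins (0 to 20%, 20 to 40%, and so on), which fits the five dots shown but is not stated in the source. The function name and data layout are hypothetical.

```python
def reliability_bins(probs, outcomes, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Pool home/draw/away predictions and summarise each probability bin.

    probs:    list of (p_home, p_draw, p_away) tuples.
    outcomes: list of result indices (0 = home win, 1 = draw, 2 = away win).
    edges:    bin boundaries (assumed; equal-width by default).
    """
    # Every match contributes three (probability, happened?) pairs.
    pooled = []
    for p, y in zip(probs, outcomes):
        for k in range(3):
            pooled.append((p[k], 1 if k == y else 0))

    bins = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = hi == edges[-1]
        # Half-open bins [lo, hi), except the last one which includes 1.0.
        members = [(q, hit) for q, hit in pooled
                   if lo <= q < hi or (is_last and q == hi)]
        if not members:
            continue
        n = len(members)
        bins.append({
            "mean_predicted": sum(q for q, _ in members) / n,
            "observed": sum(hit for _, hit in members) / n,
            "n": n,
        })
    return bins


example = reliability_bins([(0.5, 0.3, 0.2), (0.7, 0.2, 0.1)], [0, 1])
for b in example:
    print(b)
```

Because each match contributes three pooled predictions, the bin counts add up to three times the number of matches (here 69 + 104 + 38 + 23 + 15 = 249 = 3 × 83).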

Per-outcome summary

Outcome Predicted (avg) Observed Gap
Home win 44.9% 38.6% -6.4pp
Draw 21.9% 31.3% +9.4pp
Away win 33.2% 30.1% -3.0pp
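
A table like this one can be produced with a short aggregation. The sketch below assumes the gap is observed minus predicted, in percentage points, which matches the signs shown. The names are hypothetical. The displayed gaps presumably come from unrounded values, which would explain why they can differ by 0.1pp from the difference of the rounded columns.

```python
OUTCOMES = ("Home win", "Draw", "Away win")


def per_outcome_summary(probs, outcomes):
    """Average predicted probability vs. observed frequency per outcome.

    probs:    list of (p_home, p_draw, p_away) tuples.
    outcomes: list of result indices (0 = home win, 1 = draw, 2 = away win).
    Returns rows of (name, predicted %, observed %, gap in pp).
    """
    n = len(probs)
    rows = []
    for k, name in enumerate(OUTCOMES):
        predicted = 100 * sum(p[k] for p in probs) / n
        observed = 100 * sum(1 for y in outcomes if y == k) / n
        # Positive gap: the outcome happened more often than predicted.
        rows.append((name, predicted, observed, observed - predicted))
    return rows


rows = per_outcome_summary([(0.5, 0.3, 0.2), (0.7, 0.2, 0.1)], [0, 1])
for name, predicted, observed, gap in rows:
    print(f"{name}: {predicted:.1f}% predicted, {observed:.1f}% observed, {gap:+.1f}pp")
```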

These figures come from a backtest on Euro 2024 and Copa América 2024, matches the model never saw during training. The World Cup has not yet begun, so there are no live results to track. This page will switch to live World Cup data once the tournament starts and results are recorded.

Known limitations: on Euro-like fields with clustered Elo ratings (the closest analog to World Cup group play), the model reaches roughly 49% match accuracy and a Brier score of around 1.05 per outcome, only marginally better than random. Draws are systematically under-predicted by about 4 percentage points, and away wins are over-predicted by a matching amount. These biases are corrected in the per-match probability displays; see the methodology for details.