Calibration
How well do the probabilities match what actually happened?
Backtest — Euro 2024 + Copa América (n = 83)
Full backtest (2024, all competitions, n = 290): Brier 0.495, accuracy 63.8%, inflated by friendlies and qualifiers.
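For readers who want to reproduce headline numbers like these, here is a minimal sketch of how a three-way Brier score and match accuracy could be computed. It assumes the multi-class Brier definition (squared error summed over home win, draw, and away win, then averaged over matches) and that "accuracy" means the most probable outcome matched the result. The page does not state either definition, and the names `brier_score`, `accuracy`, and `OUTCOMES` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed outcome order for every probability triple: (home win, draw, away win).
OUTCOMES = ("home", "draw", "away")

Probs = Tuple[float, float, float]


def brier_score(probs: Sequence[Probs], results: Sequence[str]) -> float:
    """Multi-class Brier score: mean over matches of the summed squared error.

    Assumption: errors are summed across the three outcomes, so the score
    ranges from 0 (perfect) to 2 (fully confident and wrong).
    """
    if not probs or len(probs) != len(results):
        raise ValueError("probs and results must be non-empty and equal length")
    total = 0.0
    for p, result in zip(probs, results):
        for k, outcome in enumerate(OUTCOMES):
            hit = 1.0 if outcome == result else 0.0
            total += (p[k] - hit) ** 2
    return total / len(probs)


def accuracy(probs: Sequence[Probs], results: Sequence[str]) -> float:
    """Share of matches where the most probable outcome was the actual result."""
    if not probs or len(probs) != len(results):
        raise ValueError("probs and results must be non-empty and equal length")
    hits = 0
    for p, result in zip(probs, results):
        top = max(range(len(OUTCOMES)), key=lambda k: p[k])  # ties pick the first
        if OUTCOMES[top] == result:
            hits += 1
    return hits / len(probs)
```

Under this definition, a uniform one-third guess on every outcome scores 2/3, which is a useful reference point when reading a Brier value.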
Reliability diagram — pooled outcomes
Each dot is a probability bin pooling home-win, draw, and away-win predictions. Dots on the dashed diagonal = perfect calibration. Dot size scales with sample count.
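The pooling described above can be sketched roughly as follows. Every match contributes three (probability, hit) pairs, one per outcome, and the pairs are grouped into equal-width probability bins. The bin count of 10, the equal-width scheme, and the name `reliability_bins` are assumptions for illustration; the page does not say how its bins are built.

```python
from typing import List, Sequence, Tuple

# Assumed outcome order for every probability triple: (home win, draw, away win).
OUTCOMES = ("home", "draw", "away")


def reliability_bins(
    probs: Sequence[Tuple[float, float, float]],
    results: Sequence[str],
    n_bins: int = 10,
) -> List[Tuple[float, float, int]]:
    """Pool all three outcomes of every match into equal-width probability bins.

    Returns one (mean predicted probability, observed frequency, sample count)
    tuple per non-empty bin, ordered from the lowest bin to the highest.
    """
    if len(probs) != len(results):
        raise ValueError("probs and results must have equal length")
    sums = [0.0] * n_bins    # sum of predicted probabilities per bin
    hits = [0] * n_bins      # number of predictions that came true per bin
    counts = [0] * n_bins    # number of predictions per bin
    for p, result in zip(probs, results):
        for k, outcome in enumerate(OUTCOMES):
            # A probability of exactly 1.0 is folded into the top bin.
            idx = min(int(p[k] * n_bins), n_bins - 1)
            sums[idx] += p[k]
            hits[idx] += 1 if outcome == result else 0
            counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]
```

Each returned tuple corresponds to one dot: the first value is its x-position, the second its y-position, and the count drives its size. A perfectly calibrated model would have the first two values equal in every bin, up to sampling noise.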
Per-outcome summary
| Outcome | Predicted (avg) | Observed | Gap (observed - predicted) |
|---|---|---|---|
| Home win | 44.9% | 38.6% | -6.4pp |
| Draw | 21.9% | 31.3% | +9.4pp |
| Away win | 33.2% | 30.1% | -3.0pp |
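A table like this one can be derived from the per-match probabilities with a few lines of code. The sketch below assumes the gap is observed frequency minus average predicted probability, expressed in percentage points, which matches the signs in the table. The function name `outcome_summary` and the input layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Assumed outcome order for every probability triple: (home win, draw, away win).
OUTCOMES = ("home", "draw", "away")


def outcome_summary(
    probs: Sequence[Tuple[float, float, float]],
    results: Sequence[str],
) -> Dict[str, Tuple[float, float, float]]:
    """Per outcome: (average predicted probability, observed frequency, gap in pp).

    Assumption: gap = observed - predicted, so a negative gap means the
    outcome was over-predicted and a positive gap means it was under-predicted.
    """
    if not probs or len(probs) != len(results):
        raise ValueError("probs and results must be non-empty and equal length")
    n = len(probs)
    summary = {}
    for k, outcome in enumerate(OUTCOMES):
        predicted = sum(p[k] for p in probs) / n
        observed = sum(1 for r in results if r == outcome) / n
        summary[outcome] = (predicted, observed, (observed - predicted) * 100.0)
    return summary
```

When each probability triple sums to 1, the three gaps sum to zero, so an under-predicted outcome must be balanced by over-prediction elsewhere. A gap computed from unrounded values can also differ by 0.1pp from the difference of the rounded columns.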
These figures come from a backtest on Euro 2024 and Copa América 2024, matches the model never saw during training. The World Cup has not yet begun, so there are no live World Cup results to track. This page will switch to live World Cup data once the tournament starts and results are recorded.
Known limitations: on Euro-like fields with clustered Elo ratings (the closest analog to World Cup group play), the model reaches roughly 49% match accuracy and a Brier score of around 1.05 per outcome, only marginally better than random. Draws are systematically under-predicted by ~4 percentage points, and away wins are over-predicted by a matching amount. These biases are corrected in the per-match probability displays; see the methodology for details.