Methodology

How the forecast is built, how accurate it is, and what it can't do.

How it works

Step 1 — Elo ratings. Every team starts from a seeded rating drawn from eloratings.net (January 2018), then the full match history from 2018 through the most recent international break is replayed match by match. Each result updates both teams' ratings with two modifications: a margin-of-victory multiplier (larger wins move ratings more) and a 60-point home-advantage adjustment. K factors vary by match type (20 for friendlies, 30 for qualifiers, 50 for major tournaments, 60 for the World Cup), so a competitive result moves ratings further than a friendly with the same scoreline.
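A single Step 1 update can be sketched as follows. The K factors and the 60-point home advantage come from the description above; the margin-of-victory multiplier is an assumption (an eloratings.net-style scaling), since the exact function is not specified here, and all function names are hypothetical.

```python
# Hypothetical sketch of one Elo update with the Step 1 ingredients.
# K factors and the 60-point home advantage are from the text;
# the margin-of-victory multiplier is an ASSUMPTION (eloratings.net-style).

K_BY_MATCH_TYPE = {
    "friendly": 20,
    "qualifier": 30,
    "major_tournament": 50,
    "world_cup": 60,
}
HOME_ADVANTAGE = 60.0  # Elo points added to the home side's rating


def expected_score(rating, opponent, bonus=0.0):
    """Logistic Elo expectation for `rating` (plus any home bonus) vs `opponent`."""
    diff = rating + bonus - opponent
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def margin_multiplier(goal_diff):
    """Assumed scaling: bigger winning margins move ratings more."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    if n == 3:
        return 1.75
    return 1.75 + (n - 3) / 8.0


def elo_update(home, away, home_goals, away_goals, match_type, neutral=False):
    """Return the (home, away) ratings after one match; the update is zero-sum."""
    bonus = 0.0 if neutral else HOME_ADVANTAGE
    expected_home = expected_score(home, away, bonus)
    if home_goals > away_goals:
        actual_home = 1.0
    elif home_goals == away_goals:
        actual_home = 0.5
    else:
        actual_home = 0.0
    k = K_BY_MATCH_TYPE[match_type] * margin_multiplier(home_goals - away_goals)
    delta = k * (actual_home - expected_home)
    return home + delta, away - delta


# Two equal teams at a neutral World Cup venue, 2-0 result:
# expected 0.5, actual 1.0, K = 60 * 1.5 = 90, so each rating moves 45 points.
print(elo_update(1800.0, 1800.0, 2, 0, "world_cup", neutral=True))  # (1845.0, 1755.0)
```

Whatever one team gains the other loses, which is why total rating points are conserved across the replay.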

Step 2 — Dixon-Coles goal model. Elo ratings are converted into per-match expected-goal parameters using a mapping fit on recent international history. A 2D Poisson probability grid over goal counts (0–8 each side) gives the probability of every possible scoreline; summing the cells where the home side scores more, the same, or fewer goals yields P(home win), P(draw), and P(away win) respectively. The model applies the Dixon-Coles (1997) low-score correction: the 0–0, 1–0, 0–1, and 1–1 cells are adjusted via a single parameter ρ to correct independent Poisson's mild under-prediction of low-scoring draws. ρ was estimated by maximum likelihood on 5,882 competitive matches played before June 2024.
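The scoreline grid and the low-score correction from Step 2 can be sketched like this. The expected-goal inputs (1.4 and 1.1) and ρ = -0.1 are hypothetical values chosen for illustration, the Elo-to-goals mapping is not shown, and renormalising for the mass beyond 8 goals is an assumption.

```python
import math

MAX_GOALS = 8  # grid covers 0-8 goals per side


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles (1997) adjustment; only the four low-score cells change."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def outcome_probs(lam, mu, rho):
    """P(home win), P(draw), P(away win) from a Dixon-Coles scoreline grid.

    lam / mu are the home / away expected goals; x indexes home goals, y away goals.
    """
    grid = [
        [poisson_pmf(x, lam) * poisson_pmf(y, mu) * dc_tau(x, y, lam, mu, rho)
         for y in range(MAX_GOALS + 1)]
        for x in range(MAX_GOALS + 1)
    ]
    # Renormalise for the small probability mass beyond 8 goals (an assumption).
    total = sum(sum(row) for row in grid)
    cells = [(x, y) for x in range(MAX_GOALS + 1) for y in range(MAX_GOALS + 1)]
    home = sum(grid[x][y] for x, y in cells if x > y) / total
    draw = sum(grid[x][y] for x, y in cells if x == y) / total
    away = sum(grid[x][y] for x, y in cells if x < y) / total
    return home, draw, away


# Hypothetical inputs: 1.4 vs 1.1 expected goals, rho = -0.1.
print(outcome_probs(1.4, 1.1, -0.1))
```

With a negative ρ, the 0–0 and 1–1 cells grow and the 1–0 and 0–1 cells shrink by the same total amount, so P(draw) rises while the grid still sums to the same mass.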

Step 3 — Monte Carlo simulation. The entire tournament bracket is simulated 10,000 times. Each run plays out all 104 matches from the group stage through the final, sampling from the Dixon-Coles distributions and applying FIFA's actual tiebreaker rules and the Annex C third-place bracket table. The result is a per-team probability distribution over every round — P(advance from group), P(reach R16), P(reach QF), P(SF), P(Final), P(Champion). Snapshots are saved after each update so probability movements are visible over time.
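Turning simulation runs into probabilities is a matter of counting, and the binomial standard error shows how much sampling noise 10,000 runs leave behind. The sketch below is hypothetical: it assumes each run reports one champion, and the same tally applies to every other round.

```python
import math
from collections import Counter

N_SIMS = 10_000  # runs per update


def champion_probabilities(champions):
    """Turn one champion name per simulated tournament into P(Champion)."""
    counts = Counter(champions)
    n = len(champions)
    return {team: count / n for team, count in counts.items()}


def mc_standard_error(p, n=N_SIMS):
    """Binomial sampling error of a probability estimated from n runs."""
    return math.sqrt(p * (1.0 - p) / n)


# Worst case is p = 0.5: about 0.5 pp of standard error, so roughly
# +/- 1 pp at 95% confidence (1.96 standard errors).
print(round(mc_standard_error(0.5), 4))         # 0.005
print(round(1.96 * mc_standard_error(0.5), 4))  # 0.0098
```

If each update uses a fresh random seed (an assumption), snapshot movements smaller than about one percentage point can be simulation noise rather than a real change.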

Step 4 — Three-way fusion and AI commentary. For every fixture the model probability sits alongside two market-based estimates: sportsbook odds (vig stripped by proportional normalisation across 37–44 books, sourced from The Odds API) and Polymarket prediction-market prices (Polymarket Gamma API, per-match markets posted close to kickoff). Where all three agree, confidence is higher; where they diverge, the gap is a signal worth examining. A divergence threshold of 15 percentage points flags the most notable gaps, categorised as model-over-concentrated, model-under-concentrated, or disagree-on-favourite. Claude writes a match preview for every fixture and a short divergence note for flagged matches — see below.
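Proportional vig removal and the 15-point divergence check can be sketched as below. Averaging the books after per-book normalisation is an assumption, and the rules mapping a gap to the three category labels are a hypothetical reading of those labels, not the published logic. The odds are made up.

```python
DIVERGENCE_THRESHOLD = 0.15  # 15 percentage points


def devig_proportional(decimal_odds):
    """Strip the bookmaker margin: implied probabilities scaled to sum to 1."""
    implied = [1.0 / o for o in decimal_odds]
    total = sum(implied)
    return [p / total for p in implied]


def consensus(books):
    """Average de-vigged (home, draw, away) probabilities across books.

    Averaging after per-book normalisation is an ASSUMPTION.
    """
    devigged = [devig_proportional(odds) for odds in books]
    return [sum(col) / len(devigged) for col in zip(*devigged)]


def classify_divergence(model, market, threshold=DIVERGENCE_THRESHOLD):
    """Return None below the threshold, else a HYPOTHETICAL category label."""
    gap = max(abs(m - k) for m, k in zip(model, market))
    if gap < threshold:
        return None
    model_fav = max(range(3), key=lambda i: model[i])
    market_fav = max(range(3), key=lambda i: market[i])
    if model_fav != market_fav:
        return "disagree-on-favourite"
    if model[model_fav] > market[market_fav]:
        return "model-over-concentrated"
    return "model-under-concentrated"


books = [[1.90, 3.50, 4.20], [1.95, 3.40, 4.00]]  # made-up decimal odds
market = consensus(books)
print(classify_divergence([0.70, 0.18, 0.12], market))  # model-over-concentrated
```

The same comparison applies unchanged to Polymarket prices, which need no vig removal if they already sum to roughly 1.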

How accurate is it

53.0% Match accuracy · random ≈ 37%

0.583 Brier score · lower is better · random ≈ 0.667

0.978 Log loss · lower is better · random ≈ 1.099

Backtest on Euro 2024 and Copa América 2024 — n = 83 matches the model never saw during training. See the calibration page for the full reliability diagram.
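The three headline metrics can be computed as below. This assumes the multi-class Brier score (squared error summed over home, draw, and away), which is the form whose uniform-forecast value is 0.667; the function names are hypothetical.

```python
import math


def brier_score(probs, outcome):
    """Three-outcome Brier score: squared error summed over home/draw/away."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2 for i, p in enumerate(probs))


def log_loss(probs, outcome, eps=1e-15):
    """Negative log of the probability given to the outcome that happened."""
    return -math.log(max(probs[outcome], eps))


def evaluate(forecasts, outcomes):
    """Mean accuracy, Brier score and log loss over a backtest."""
    n = len(forecasts)
    hits = sum(max(range(len(f)), key=lambda i: f[i]) == o
               for f, o in zip(forecasts, outcomes))
    brier = sum(brier_score(f, o) for f, o in zip(forecasts, outcomes)) / n
    loss = sum(log_loss(f, o) for f, o in zip(forecasts, outcomes)) / n
    return hits / n, brier, loss


# A uniform 1/3-1/3-1/3 forecast scores 0.667 Brier and ln 3 = 1.099 log loss
# whatever happens, which is where those two baselines come from.
uniform = [1 / 3, 1 / 3, 1 / 3]
print(round(brier_score(uniform, 0), 3), round(log_loss(uniform, 1), 3))  # 0.667 1.099
```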

Euro 2024 alone — the closest analog for World Cup group-stage matches, where many teams have similar Elo ratings — produces 49.0% match accuracy. Good international-football models top out around 55–60% in ideal conditions; the World Cup group stage is not ideal conditions. The value here is calibrated probabilities and tournament simulation, not nailing every individual game.

Known limitations

Model is more concentrated at the top than markets. Pure Elo + Dixon-Coles assumes peak fitness every game, with no injury or squad-rotation uncertainty. Small per-match over-confidence on heavy favourites compounds over an 8-match title path (three group matches plus five knockout rounds in the 48-team format). The two top-rated teams together hold roughly 50% of the model's title probability versus ~34% in Polymarket, so treat the exact title-odds figures as directional signals rather than precise quantities, especially at the very top of the table.
A ~4pp draw bias exists and is corrected on match pages. The raw model under-predicts draws by roughly 4 percentage points and over-predicts away wins by a similar amount. Per-match probability bars apply a correction derived from the backtest. The calibration page shows the raw model's reliability; the match pages show corrected probabilities. The correction is designed to close the draw under-prediction visible on the reliability diagram.
Residual confederation-pool drift. Every match updates Elo ratings, but rating points move between confederations only when teams from different confederations play each other; a within-confederation match only redistributes points inside that pool. Teams that rarely face top-rated opposition from other confederations therefore have ratings shaped largely by within-confederation results, which can leave some CAF, AFC, and CONCACAF teams slightly over- or under-rated relative to global public Elo rankings. This is a structural limit of pure Elo with sparse inter-confederation data.
Host advantage is modelled for group-stage home matches only. USA, Mexico, and Canada each receive a 60-Elo home-advantage term in their three group-stage matches (nine fixtures in total, the ones where they appear as home team). This shifts their win probability by roughly 2–4 pp per match. All knockout matches are simulated as neutral-venue games: host-nation venues are not guaranteed in the knockout stage, so we make no assumption there.
No player-level data. Injuries, suspensions, squad rotation, and short-term form are not inputs. When a key player is ruled out before a match, the forecast doesn't know. Market prices often move on such news before we re-run, which is one reason a large gap between sportsbook prices and the model is worth examining.
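The over-concentration limitation above comes from multiplying per-match probabilities along the title path. The sketch below is a simplification: it assumes independent matches and uses only the five knockout rounds of the 48-team format (R32, R16, QF, SF, Final); the 80% and 75% per-tie figures are hypothetical.

```python
def path_probability(per_tie_probability, ties=5):
    """Chance of winning `ties` consecutive knockout matches, assuming independence."""
    return per_tie_probability ** ties


# Hypothetical favourite: model says 80% per knockout tie, market says 75%.
model = path_probability(0.80)
market = path_probability(0.75)
print(round(model, 3), round(market, 3), round(model / market, 2))  # 0.328 0.237 1.38
```

A 5-point gap per tie becomes a roughly 1.4x gap in the probability of clearing the knockout bracket, before the group stage is even counted.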
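For the draw-bias limitation above, one simple form of correction is to move probability from away win to draw. This is a hypothetical sketch: the actual correction is derived from the backtest and its exact form is not described here, so the flat 4-point shift and the function name are assumptions.

```python
DRAW_SHIFT = 0.04  # hypothetical: the ~4 pp bias quoted above


def correct_draw_bias(home, draw, away, shift=DRAW_SHIFT):
    """Move up to `shift` probability from away win to draw, then renormalise."""
    moved = min(shift, away)  # never push P(away) below zero
    draw += moved
    away -= moved
    total = home + draw + away
    return home / total, draw / total, away / total


print(correct_draw_bias(0.45, 0.22, 0.33))  # approximately (0.45, 0.26, 0.29)
```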
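The confederation-pool limitation above follows from Elo being zero-sum. The sketch below uses a plain Elo update (no home advantage or margin multiplier) and hypothetical team names to show that within-confederation matches leave each pool's total unchanged, while a cross-confederation match moves points between pools.

```python
def expected(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def play(ratings, a, b, score_a, k=30.0):
    """Plain zero-sum Elo update; score_a is 1, 0.5 or 0 for team a."""
    delta = k * (score_a - expected(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta


def pool_totals(ratings, confederation):
    """Sum of ratings per confederation."""
    totals = {}
    for team, rating in ratings.items():
        conf = confederation[team]
        totals[conf] = totals.get(conf, 0.0) + rating
    return totals


# Hypothetical teams and ratings.
confederation = {"caf_1": "CAF", "caf_2": "CAF", "uefa_1": "UEFA", "uefa_2": "UEFA"}
ratings = {"caf_1": 1700.0, "caf_2": 1600.0, "uefa_1": 1900.0, "uefa_2": 1800.0}

before = pool_totals(ratings, confederation)
play(ratings, "caf_1", "caf_2", 0.0)    # upset inside CAF
play(ratings, "uefa_1", "uefa_2", 1.0)  # result inside UEFA
within = pool_totals(ratings, confederation)
play(ratings, "caf_2", "uefa_2", 1.0)   # cross-confederation upset
after = pool_totals(ratings, confederation)
print(before, within, after)
```

Only the last match changes the CAF and UEFA totals, so a pool whose teams rarely play outside it keeps roughly the total it started with.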

The AI commentary experiment

Every match page includes a three-paragraph preview written by Claude (Anthropic's AI), generated from the numerical inputs — the model probabilities, sportsbook odds, and Polymarket prices. The 14 most statistically divergent fixtures also carry a short commentary note explaining the shape of the gap between sources.

This is explicitly an experiment. The goal is to see whether AI-written synthesis adds genuine value — distilling what the numbers collectively imply about a match — or whether it is decorative. The forecasts themselves are not AI-generated; the probabilities come entirely from Elo ratings, Dixon-Coles, and Monte Carlo simulation. AI writes prose, not predictions.

Known limitation: the commentary can produce imprecise comparatives and occasionally uses language that isn't grounded in the data inputs. It is not a substitute for reading the actual probability numbers. Judge for yourself whether the prose adds anything.

Data & sources

International Football Results (Kaggle) — historical match results from 1872, maintained through the March 2026 international break. The model's training data (the Elo replay starts from 2018).
eloratings.net — source for the January 2018 seed ratings that initialise the Elo system and prevent confederation-pool drift in the cold start.
The Odds API — aggregated sportsbook odds for WC group-stage and outright-winner markets, refreshed before each update run. Vig is stripped by proportional normalisation across books.
Polymarket Gamma API — prediction-market prices for WC title and group-winner markets. Per-match h2h markets appear close to kickoff and are updated as they become available.
Data is refreshed manually before each update run, typically after each match day during the tournament.