Engine architecture, validation protocols, and limitations.
Five deterministic stages, end-to-end auditable.
Every probability traces back to a feature vector and a calibrated head. No ensembles, no opaque rerankers.
Pipeline
intl_match_log
↓ schema validation · QC checks · 14 source feeds
elo_poisson
↓ Elo ratings + bivariate Poisson goal model
lgbm-prematch-v1
↓ LightGBM classifier · Optuna-tuned · 5-fold CV
temperature_scaling
↓ post-hoc calibration on held-out validation set
probability_output
1X2 distribution · O/U 2.5 distribution
Stage 1 — intl_match_log
Match-level events, squad selections, and venue data are ingested from the source feeds. A schema validator rejects partial records; a quality-control layer enforces temporal integrity, so no post-kickoff information leaks into pre-match features.
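The two checks above can be sketched in a few lines. This is a minimal illustration, not the production validator; the field names (match_id, kickoff_utc, etc.) are assumptions.

```python
from datetime import datetime

# Illustrative schema: field names are assumed, not the engine's actual schema.
REQUIRED = {"match_id", "kickoff_utc", "home", "away", "home_goals", "away_goals"}

def validate_record(rec: dict) -> bool:
    """Schema check: reject partial records (missing or null required fields)."""
    return REQUIRED <= rec.keys() and all(rec[k] is not None for k in REQUIRED)

def no_lookahead(rec: dict, feature_asof: datetime) -> bool:
    """Temporal-integrity check: features must be computed from data
    known strictly before kickoff."""
    return feature_asof < datetime.fromisoformat(rec["kickoff_utc"])
```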
Stage 2 — elo_poisson
FIFA-style Elo ratings with variable K-factors per competition tier, combined with a bivariate Poisson model (Dixon-Coles correction for low-scoring matches). Trained on 32,101 international matches from 1990 to present.
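The two components can be sketched as follows: a bilateral Elo update, and the Dixon-Coles correction factor that adjusts the probability of low-scoring scorelines. The k and home_adv values are illustrative assumptions; the engine varies K by competition tier.

```python
def elo_update(r_home, r_away, result, k=40.0, home_adv=60.0):
    """One bilateral post-match Elo update.
    result: 1.0 home win, 0.5 draw, 0.0 home loss.
    k and home_adv are illustrative; the engine varies k per tier."""
    expected = 1.0 / (1.0 + 10 ** ((r_away - r_home - home_adv) / 400.0))
    delta = k * (result - expected)
    return r_home + delta, r_away - delta

def dixon_coles_tau(x, y, lam, mu, rho):
    """Dixon-Coles correction factor for scoreline (x, y) under home/away
    scoring rates lam and mu; rho couples only the low-scoring cells."""
    if x == 0 and y == 0: return 1.0 - lam * mu * rho
    if x == 0 and y == 1: return 1.0 + lam * rho
    if x == 1 and y == 0: return 1.0 + mu * rho
    if x == 1 and y == 1: return 1.0 - rho
    return 1.0
```

Note the zero-sum property: whatever rating the home side gains, the away side loses, so total rating mass is conserved across updates.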
Stage 3 — lgbm-prematch-v1
LightGBM gradient-boosted classifier with multinomial output. Hyperparameters tuned via Optuna over 200 trials with regularization priors (max_depth ≤ 6, min_child_samples ≥ 80) to suppress over-fitting on the long tail of low-frequency feature combinations.
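The constrained search can be pictured as a bounded sampler. This sketch uses LightGBM's parameter vocabulary but stands in for Optuna's samplers with plain random draws; only the two stated priors (max_depth ≤ 6, min_child_samples ≥ 80) come from the text, the other ranges are assumptions.

```python
import random

# Search space bounded by the regularization priors described above.
SEARCH_SPACE = {
    "max_depth": (3, 6),            # prior: max_depth <= 6
    "min_child_samples": (80, 400), # prior: min_child_samples >= 80
    "num_leaves": (15, 63),         # illustrative range
}

def sample_trial(rng: random.Random) -> dict:
    """Draw one candidate configuration, mimicking suggest_int over
    the bounded ranges; every draw respects the priors by construction."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}
```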
Stage 4 — temperature_scaling
Logits divided by a learned temperature T, fitted by minimizing NLL on a held-out calibration set disjoint from training. Temperature scaling preserves rank ordering while sharpening or softening the probability distribution to match observed frequencies.
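A pure-Python sketch of the fit, assuming logits as lists of floats and integer class labels. Grid search stands in for the gradient-based fit a production system would use; since T only rescales logits, argmax ordering is unchanged for any T > 0.

```python
import math

def nll(logits, labels, T):
    """Mean negative log-likelihood of labels under temperature-scaled softmax."""
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / T for v in z]
        m = max(scaled)  # max-subtraction for numerical stability
        log_Z = m + math.log(sum(math.exp(v - m) for v in scaled))
        total += log_Z - scaled[y]
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Fit T on a held-out calibration set by minimizing NLL over a grid."""
    grid = grid or [0.5 + 0.05 * i for i in range(61)]  # T in [0.5, 3.5]
    return min(grid, key=lambda T: nll(logits, labels, T))
```

On overconfident logits (high stated confidence, lower observed accuracy) the fitted T comes out above 1, softening the distribution toward observed frequencies.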
Stage 5 — probability_output
Final 1X2 distribution and Over/Under 2.5 distribution. Confidence indicator derived from feature density and historical calibration of the matched scenario cluster.
ECE — Expected Calibration Error.
A probabilistic forecaster is well-calibrated if outcomes labelled at p occur with frequency p. ECE measures the average gap, weighted by bin frequency.
ECE = Σᵢ (nᵢ / N) · |acc(Bᵢ) − conf(Bᵢ)|

Bᵢ: ith probability bin (e.g. [0.40, 0.50])
nᵢ: number of predictions in Bᵢ
N: total predictions
acc(Bᵢ): observed frequency in Bᵢ
conf(Bᵢ): mean predicted probability in Bᵢ
An ECE of 0.023 on Over/Under 2.5 means our probabilities deviate from observed frequencies by 2.3 percentage points on average, weighted by bin density. The reliability diagram below visualizes the same quantity: each point is a bin, the diagonal is perfect calibration.
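The formula translates directly into code. This sketch assumes equal-width bins over [0, 1] and binary outcomes (event occurred or not).

```python
def ece(probs, outcomes, n_bins=10):
    """Expected Calibration Error with equal-width bins.
    probs: predicted probability of the event; outcomes: 1 if it occurred."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in top bin
        bins[idx].append((p, y))
    total = len(probs)
    err = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)   # mean predicted probability
        acc = sum(y for _, y in b) / len(b)    # observed frequency
        err += (len(b) / total) * abs(acc - conf)
    return err
```

A perfectly calibrated set scores 0; predicting 0.9 for events that never occur scores 0.9.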
Reported metrics
Walk-forward, not k-fold.
K-fold leaks future information into training when applied to time-ordered data. Walk-forward enforces strict temporal separation: at week T, the model has only seen matches before T.
Our protocol re-trains in monthly windows. Each prediction is evaluated against an outcome the model could not have observed at training time. Calibration metrics are aggregated across the full out-of-sample sequence — never on a single holdout set.
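The split logic can be sketched abstractly: for each window start T, the training set is everything strictly before T, and the test set is the window [T, next start). Dates are represented here by any sortable values.

```python
def walk_forward_splits(dates, boundaries):
    """Walk-forward protocol sketch.
    dates: sorted match dates; boundaries: sorted window starts.
    At each boundary T, train only on matches strictly before T."""
    splits = []
    for i, start in enumerate(boundaries):
        end = boundaries[i + 1] if i + 1 < len(boundaries) else None
        train = [d for d in dates if d < start]
        test = [d for d in dates if d >= start and (end is None or d < end)]
        splits.append((train, test))
    return splits
```

Unlike k-fold, no test match ever precedes a training match, and the training set grows monotonically as windows advance.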
Coverage
WC 2010, 2014, 2018, 2022 + Euros 2012, 2016, 2020, 2024. 32,101 international matches in training corpus.
International matches from 1990 to present. Elo + Poisson model with walk-forward validation on 8 past tournaments.
worldcup-2026-engine-v1.
National-team football has fewer matches per side, longer gaps, and squad rotations that destabilize club-level features. Our World Cup engine uses an Elo-style rating updated bilaterally after each international fixture, combined with a bivariate Poisson goal model conditioned on team strength differential and venue.
Specifications
model: worldcup-2026-engine-v1
approach: Elo (FIFA-style) + bivariate Poisson
training data: 32,101 international matches · 1990–2026
validation: walk-forward · 8 past tournaments
calibration: ECE 0.026 (1X2) · ECE 0.023 (O/U 2.5)
qualification: Monte Carlo · 100,000 simulations
update frequency: after every match (during tournament)
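The Monte Carlo qualification step can be sketched as follows. The real engine ranks all four group teams jointly; this simplified version samples one team's group matches from their 1X2 distributions and counts runs that reach a points threshold (an illustrative stand-in for group ranking).

```python
import random

def qualification_prob(match_probs, points_needed=5, n_sims=100_000, seed=42):
    """Monte Carlo sketch: sample each group match from its (win, draw, loss)
    distribution, tally points (3/1/0), and estimate P(points >= threshold).
    The fixed threshold is a simplification; the engine ranks the full group."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        pts = 0
        for p_win, p_draw, _p_loss in match_probs:
            u = rng.random()
            pts += 3 if u < p_win else (1 if u < p_win + p_draw else 0)
        hits += pts >= points_needed
    return hits / n_sims
```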
Retrospective validation
These are pre-tournament forecasts. Once the tournament begins, probabilities update after every match.
Where the model is weaker, stated plainly.
A calibrated probability is still wrong sometimes; that is what calibration means: a 70% forecast should fail three times in ten. Below are the contexts where the model is wrong more often than average.
Lineup uncertainty
National team lineups are announced late. Probabilities reflect expected squads; significant rotations are not always anticipated.
Closing-line gap
Sharp closing odds remain better calibrated than ours on average. We provide a structured pre-match read, not a market replacement.
Knockout formats
Single-leg knockout matches have higher variance than group stages. Confidence indicators reflect this, but tail outcomes remain harder to forecast.
Manager changes
A change in national team head coach within the prior 8 weeks reduces feature reliability. The confidence indicator drops accordingly.
Debutant teams
First-time World Cup participants have sparse international tournament history. Wider posterior intervals applied.
Markets we do not cover
Player-level props, Asian handicap, draw-no-bet derivatives, and in-play markets. The engine is a pre-match instrument.