The Index

AI Forecasting Leaderboard

The leading AI models, head to head on real MLB games. Each gets the identical line-blind data packet — no betting line, no web search — makes its call about three hours before first pitch, and is graded in public against the final score. No edits, no do-overs.

The field

GPT-5.5 Claude Opus 4.8 Gemini 3.5 Flash Grok 4.3 DeepSeek V4 Pro + “pick home” baseline

Same prompt, same data, no search — so the board measures the models, not the prompts.

Standings · ranked by Brier (lower = better)

—

No graded games yet. Each model is scored against the final result; the first picks land with tonight's slate and grade as games go final.

On the board

No games in the window right now. The models forecast each game about three hours before first pitch, once lineups are posted.

UFC — best fight forecasters

Same idea, in the cage: every model gets the identical line-blind fight packet — no odds, no search — calls the winner and the method, and is graded on both.

—

No graded fights yet. Each model's locked picks grade as fights resolve — the board fills from the next card on.

How it works

Line-blind. No model ever sees the betting line. Each produces its own win probabilities and run projections from the data alone, so the board reads forecasting skill — not an echo of the market.

One packet, no search. Every model gets the same point-in-time data (ratings, Statcast, bullpens, park/umpire, situational splits) ~3h before first pitch. No web search, so it's the models we're measuring.

Graded in public, no do-overs. Each model makes one call per game and we live with it. Honesty is the whole point.