Football Prediction Accuracy — Why Win Rate Misleads and Brier Score Doesn't

Most football prediction sites report accuracy as the percentage of correct calls. That measure is easy to understand but also easy to game, and it cannot distinguish a skilled forecaster from a lucky one. LaPreBet uses the Brier score instead: a scoring rule for probability forecasts, borrowed from meteorology and academic forecasting, that rewards calibration.

The problem with win rate

Imagine two predictors over the same 100 Premier League matches:

Predictor | Method | Correct calls | Win rate
A | Always picks the home team | 46 | 46%
B | Carefully analyses form and picks the most likely outcome per match | 46 | 46%

The win rate is identical. But Predictor A applied zero skill — home teams win roughly 46% of Premier League matches, so blindly picking home every time produces a 46% hit rate with no analysis at all. Predictor B's 46% might reflect genuinely difficult predictions across all three outcomes, including correctly identifying draws and away wins.

Win rate conflates being correct with being skilled. The same problem exists in fund management: a manager who always buys the S&P 500 index and a manager who actively selects stocks may have identical returns in a bull market — but they are not equally skilled.

What calibration measures instead

A well-calibrated forecaster's stated confidence matches their observed accuracy. If you say "70% likely" 100 times, and the event happens roughly 70 times, you're well-calibrated. If it happens 90 times, you were underconfident. If it happens 40 times, you were overconfident.
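The check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `calibration_table`, the one-decimal bucketing, and the sample record are assumptions made for the example, not part of LaPreBet.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (count, observed_frequency)}.
    """
    buckets = defaultdict(list)
    for stated, happened in forecasts:
        # Round to one decimal so 0.68 and 0.72 share the 0.7 bucket.
        buckets[round(stated, 1)].append(1 if happened else 0)
    return {
        p: (len(hits), sum(hits) / len(hits))
        for p, hits in sorted(buckets.items())
    }

# Hypothetical record: 100 forecasts at "70% likely", 70 of which came true.
forecasts = [(0.7, True)] * 70 + [(0.7, False)] * 30
table = calibration_table(forecasts)
for stated, (n, observed) in table.items():
    print(f"stated {stated:.0%}  n={n}  observed {observed:.0%}")
```

A forecaster is well calibrated when the observed frequency in each bucket sits close to the stated probability. An observed frequency well above the stated one signals underconfidence, and one well below it signals overconfidence.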

Calibration is important because it is the hardest thing to fake. A tipster can cherry-pick their best calls and show a 70% hit rate. But if you made them assign a probability to every match — including the ones they had no edge on — their calibration score would expose the strategy immediately: overconfident calls on uncertain matches push the score toward the worst-case range.

The Brier score formula

For a football match with three possible outcomes (home win, draw, away win), the multi-class Brier score is:

Brier = (p_home − o_home)² + (p_draw − o_draw)² + (p_away − o_away)²

Where p is your stated probability for each outcome (as a fraction 0–1) and o is the actual outcome (1 for the outcome that happened, 0 for the others).
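As a minimal sketch, the formula translates directly into Python. The function name `brier_score`, the dictionary layout, and the sample probabilities are choices made for this illustration.

```python
OUTCOMES = ("home", "draw", "away")

def brier_score(probs, actual):
    """Multi-class Brier score for one match.

    probs:  dict mapping "home"/"draw"/"away" to probabilities summing to 1.
    actual: the outcome that happened.
    """
    if actual not in OUTCOMES:
        raise ValueError(f"unknown outcome: {actual!r}")
    if abs(sum(probs[o] for o in OUTCOMES) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # o is 1 for the outcome that happened and 0 for the other two.
    return sum((probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)

# Hypothetical forecast: Home 50%, Draw 30%, Away 20%, and the home team wins.
score = brier_score({"home": 0.5, "draw": 0.3, "away": 0.2}, "home")
print(round(score, 4))  # 0.25 + 0.09 + 0.04 = 0.38
```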

Score range

Scenario | Brier score | Interpretation
Perfect prediction (p=1.0 for actual outcome) | 0.00 | Perfect
Random (33%/33%/33%) | 0.67 | No information content
Maximum error (p=1.0 for wrong outcome) | 2.00 | Maximally wrong

Lower is better. A mean Brier score below 0.67 means you are adding information beyond a uniform 33%/33%/33% guess. A score below 0.50 is strong for football predictions.

Worked examples

Example 1 — Confident and correct

Manchester City vs Luton Town. You predict: Home 85%, Draw 10%, Away 5%. City win 3-0.

Brier = (0.85 − 1)² + (0.10 − 0)² + (0.05 − 0)² = 0.0225 + 0.01 + 0.0025 = 0.035

Excellent score — you were right and confident.

Example 2 — Confident and wrong

Same match. You predict: Home 85%, Draw 10%, Away 5%. Luton win 1-0.

Brier = (0.85 − 0)² + (0.10 − 0)² + (0.05 − 1)² = 0.7225 + 0.01 + 0.9025 = 1.635

Poor score — you were confident and wrong. The Brier score penalises this heavily, which is correct: a confident wrong prediction is worse than a timid wrong prediction.

Example 3 — Timid (hedged) prediction

Same match. You predict: Home 40%, Draw 30%, Away 30%. City win 3-0.

Brier = (0.40 − 1)² + (0.30 − 0)² + (0.30 − 0)² = 0.36 + 0.09 + 0.09 = 0.54

You were right but added little information. This is better than Example 2 (where you were confident and wrong) but much worse than Example 1.

Brier score across an entire season

The mean Brier score across all your predictions is the primary ranking metric. A single lucky correct call doesn't move it much; a season of well-calibrated predictions does. This is what makes it resistant to tipster gaming: a 20-game hot streak at 70% hit rate doesn't impress if you were saying "90% likely" on matches that were genuinely 55-45.
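To make the point concrete, here is a small sketch comparing two forecasters over the same 20 matches, 14 of which the home team wins (a 70% hit rate for anyone picking home). All numbers are hypothetical and chosen only for illustration; `mean_brier` is a helper name invented for this example.

```python
from statistics import mean

def brier_score(probs, actual):
    """Three-outcome Brier score for one match."""
    return sum((p - (1.0 if outcome == actual else 0.0)) ** 2
               for outcome, p in probs.items())

def mean_brier(predictions):
    """Average Brier score over (probabilities, actual_outcome) pairs."""
    return mean(brier_score(probs, actual) for probs, actual in predictions)

# Hypothetical run of 20 matches: 14 home wins, 3 draws, 3 away wins.
results = ["home"] * 14 + ["draw"] * 3 + ["away"] * 3

# One forecaster says "90% home" every time, the other a more measured 70%.
overconfident = {"home": 0.90, "draw": 0.05, "away": 0.05}
measured = {"home": 0.70, "draw": 0.15, "away": 0.15}

overconfident_mean = mean_brier([(overconfident, r) for r in results])
measured_mean = mean_brier([(measured, r) for r in results])
print(f"overconfident: {overconfident_mean:.3f}")
print(f"measured:      {measured_mean:.3f}")
```

Both forecasters back the same team in every match and so share the same hit rate, yet the measured forecaster ends with the lower (better) mean Brier score because the six misses cost far less.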

The relevant baseline for comparison is not zero but the bookmaker market. Bookmaker implied probabilities (derived from the odds and de-vigged so that they sum to 100%) represent the combined wisdom of millions of bets plus sharp money. If your mean Brier score equals the market's, your forecasts are as good as the aggregate market's. If you beat it, you are adding genuine value. That comparison (your score vs the market's score vs the AI's score) is what LaPreBet shows on every completed fixture.
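De-vigging can be sketched as follows. The article does not say which method LaPreBet uses, so this example assumes the simplest one, proportional normalisation of decimal odds; the function name `implied_probabilities` and the odds shown are hypothetical.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities that sum to 1.

    Assumption: proportional normalisation. Each raw implied probability
    (1 / odds) is divided by the total of all three, which removes the
    bookmaker's margin evenly across outcomes.
    """
    if any(odds <= 1.0 for odds in decimal_odds.values()):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1.0 by the bookmaker's margin
    return {outcome: p / overround for outcome, p in raw.items()}

# Hypothetical decimal odds for one fixture.
odds = {"home": 1.50, "draw": 4.50, "away": 7.00}
market = implied_probabilities(odds)
for outcome, p in market.items():
    print(f"{outcome}: {p:.1%}")
```

The resulting probabilities can be scored with the same Brier formula as any other forecast, which is what makes the market a usable baseline. Other de-vigging methods exist and distribute the margin differently.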

How LaPreBet computes it

For crowd picks and own picks, we use a degenerate probability approach: your stated outcome (home, draw, or away) maps to probability 1.0, and the other two get 0.0. The resulting Brier score is 0.0 (right), 1.0 (right team, wrong framing), or 2.0 (maximally wrong), depending on the outcome and pick type. For a plain home/draw/away pick scored with the three-outcome formula above, only 0.0 and 2.0 can occur. The full probabilistic version, where you state actual percentages, is planned for the leaderboard feature (Milestone 4).
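For a plain home/draw/away pick, the degenerate mapping can be sketched like this. The helper names are hypothetical, and the sketch covers only the three-way pick; how other pick types are scored is not specified in this article.

```python
OUTCOMES = ("home", "draw", "away")

def pick_to_probs(pick):
    """Map a single pick to a one-hot distribution: 1.0 for the pick, 0.0 elsewhere."""
    if pick not in OUTCOMES:
        raise ValueError(f"unknown pick: {pick!r}")
    return {o: 1.0 if o == pick else 0.0 for o in OUTCOMES}

def brier_score(probs, actual):
    """Three-outcome Brier score for one match."""
    return sum((probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)

# Score every combination of pick and actual result.
scores = {
    (pick, actual): brier_score(pick_to_probs(pick), actual)
    for pick in OUTCOMES
    for actual in OUTCOMES
}
for (pick, actual), s in scores.items():
    print(f"pick={pick:<5} actual={actual:<5} brier={s:.1f}")
```

A correct pick scores 0.0. A wrong pick scores 2.0, because it loses a full point on the outcome it backed and another on the outcome that actually happened.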

The AI model and bookmaker Brier scores are computed from their actual probabilities, so those are fully probabilistic scores. The comparison on the Score Reveal card is honest about this difference.

See it on a real fixture

Every completed fixture on LaPreBet shows a three-column comparison: your Brier score, the AI's, and the bookmaker's. Pick an upcoming match, record a prediction, and see how you score after the final whistle.

Further reading

The Brier score was introduced by Glenn W. Brier in 1950 for meteorological forecast verification. It is widely used in forecasting tournaments such as Philip Tetlock's Good Judgment Project, on forecasting platforms such as Metaculus, and in academic sports prediction research. It is also reportedly among the metrics quantitative hedge funds use internally to evaluate model calibration.