A few years ago I got frustrated with the Pythagorean Theorem for predicting won-loss records in baseball, for a very simple reason:
it projects expected wins and losses without including actual wins and losses in the formula.
For those of you not aware, this calculation (and variations of it) is what’s used on baseball-reference.com, ESPN and various other sites when calculating the “Pythagorean W-L” (baseball-reference) or “EXWL” (ESPN.com).
Just plug a team’s runs scored and runs given up into the formula and it spits out the expected winning percentage which you can use to calculate expected wins.
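For anyone who wants to see the plug-and-chug part spelled out, here is a minimal sketch in Python. It assumes the classic form of the formula with an exponent of 2; the sites mentioned above use variations with slightly different exponents (1.83 is a common one), which is why the exponent is a parameter. The function names and the season totals are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Classic form: RS^x / (RS^x + RA^x), with x = 2 by default.
    Assumes at least one of the two run totals is greater than zero.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins: expected winning percentage times games played."""
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


# Hypothetical full-season totals, not any real team's numbers.
print(round(pythagorean_wins(800, 700, 162), 1))  # 91.8
```

Swapping in `exponent=1.83` gives the flavor of one of the common variations without changing anything else.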
It turns out the formula is pretty good and can get you within a few games of the actual record each year.
For a nerd like me, that’s a pretty cool thing.
Still, I was bothered by the fact that the theorem ignores the results on the field (wins and losses), so at times the number of expected wins is one that is not possible given the wins and losses that have already occurred.
Let me explain with a simple example.
Suppose the Astros open 2024 with a three-game series against the Rangers and Texas wins the first game 1-0, but Houston wins the next two games 10-0 and 11-1.
The Astros would then be 2-1, but because they outscored the Rangers 21-2 in the series, the Pythagorean expected record would be 3-0 (when rounded).
The problem is that in the scenario above, the Astros will NEVER be 3-0. It’s impossible.
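Here is the arithmetic behind that 3-0, as a quick Python sketch. It assumes the classic exponent of 2; the variable names are mine.

```python
# Astros runs in the three games: lost 1-0, then won 10-0 and 11-1.
runs_scored = 0 + 10 + 11   # 21
runs_allowed = 1 + 0 + 1    # 2
games = 3

# Classic Pythagorean form with an exponent of 2 (an assumption here).
win_pct = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
expected_wins = win_pct * games

print(round(win_pct, 3), round(expected_wins, 2), round(expected_wins))
# 0.991 2.97 3
```

So the formula says about 2.97 wins in three games, which rounds to 3, even though one of those games is already in the loss column.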
In most cases, this issue resolves itself over the course of 162 games and that’s how the Pythagorean Theorem works.
In short, the theorem says it’ll all work out in the end and is much ado about nothing.
But not always, and that’s why I developed a calculation that uses both run differential and actual results to calculate expected wins.
Below is a chart showing actual wins, Pythagorean expected wins, the difference between actual wins and Pythagorean expected wins, Astros Projections (AP) expected wins and the difference between actual wins and AP expected wins.
The results are clear, but here are a few notes:
1. Exact match: AP 8, Pythagorean 5
2. Closer to the actual number of wins head-to-head: 23-0-7 in favor of AP
3. Total games off (either way): Pythagorean 107, AP 53
4. AP summary:
   8 teams with a 0-game difference
   7 teams with a 1-game difference (either way)
   6 teams with a 2-game difference (either way)
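For anyone who wants to run the same comparison on their own numbers, the tallies above (exact matches, head-to-head, total games off) can be computed along these lines. This is only a sketch: the function name is mine, and the three-team sample is invented for illustration. It is NOT the 2023 chart, and it says nothing about how the AP expected wins themselves are calculated.

```python
def compare_systems(actual, system_a, system_b):
    """Compare two lists of expected wins against actual wins, team by team."""
    # Games off (either way) for each team under each system.
    err_a = [abs(a - w) for a, w in zip(system_a, actual)]
    err_b = [abs(b - w) for b, w in zip(system_b, actual)]
    return {
        "exact_a": sum(e == 0 for e in err_a),
        "exact_b": sum(e == 0 for e in err_b),
        "a_closer": sum(x < y for x, y in zip(err_a, err_b)),
        "b_closer": sum(y < x for x, y in zip(err_a, err_b)),
        "ties": sum(x == y for x, y in zip(err_a, err_b)),
        "total_off_a": sum(err_a),
        "total_off_b": sum(err_b),
    }


# Made-up numbers for three imaginary teams.
actual_wins = [90, 82, 75]
ap_wins = [91, 82, 74]
pythag_wins = [93, 82, 71]

print(compare_systems(actual_wins, ap_wins, pythag_wins))
```

With the made-up sample, system A (AP) is closer for two teams, tied on one, and off by 2 total games against 7 for system B.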
While I’m proud of No. 3 and 4 above, which mean the AP formula was more than twice as accurate for 2023 as the Pythagorean formula and I came within 2 games for 21 of 30 teams, No. 2 may be my favorite.
Simply put, this formula performed better for 23 teams and as well on the other 7. For no team did the Pythagorean Theorem do a better job of projecting wins than the AP formula.
Both systems struggle with teams that are outliers in one-run games (either winning or losing a high percentage), like the Marlins and Padres, but because the AP formula incorporates the actual game results, it fares better (though still not great) with these teams.
I’m not a mathematician and have no idea if my formula is mathematically sound, though I suspect it’s not and there will be no Fields Medal in my future.
All I know is it’s like Jose Altuve vs. the Yankees - it just works.