Assessing Expectations in EFL League 2 with the Pythagorean Formula

Jack T.
4 min read · Nov 10, 2021

Bill James, the OG of baseball sabermetrics, came up with a model he called the Pythagorean Formula to predict a team’s winning percentage for the coming season. The model can also be applied periodically throughout the season to gauge whether a team is performing above or below expectations. For baseball, winning percentage is typically predicted by comparing the square of runs scored against the squares of runs scored and runs allowed. The formula looks like this:

Wpct = (R^2) / (R^2 + RA^2)

R = runs scored and RA = runs allowed. It’s not so simple with soccer. Yes, you could evaluate a team’s goal differential against their spot in the league table, but you still have soccer’s points system to account for: 3 points for a win, 1 for a draw and 0 for a loss. We can still compare table position with goal differential, though. I just don’t believe it’s the best way to evaluate expectations. In 2016, StatsBomb came up with an “improved” version of the Pythagorean Formula, so I’ll be working with that (link to the article below).
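For anyone who wants to play with the baseball version, here is a minimal Python sketch of the formula above. The function name and the 800/700 run totals are made up for illustration; they are not real team data.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage: R^2 / (R^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 runs allowed.
wpct = pythagorean_win_pct(800, 700)
print(f"Expected winning percentage: {wpct:.3f}")
```

The exponent is left as a parameter only so that other values are easy to try; the formula above uses 2.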

I’ll start by comparing the actual EFL League 2 table with how the teams stack up on goal differential. Essentially, we’ll be comparing total points with goal differential. My working assumption is that teams higher in the table will have a better overall goal differential than the teams below them. Pretty standard. Let’s see.

EFL League 2 Table as of 11.7.21

Fairly straightforward table. Now let’s take a look at goal differential.

Goal Differential for EFL League 2 as of 11.7.21

First, not sure why this graph is so much larger than the one above! But there are some interesting observations here. Forest Green Rovers not only lead the table but also have the best goal differential in the league. Leyton Orient are right behind them in goal differential, though, and have a better goal differential than the teams above them in the table. Goal differentials for the teams at the bottom of the table are what you’d expect... terrible.

One might expect that having the second-best goal differential in the league would put the club at the top of the table. Are Leyton Orient just on the wrong side of Lady Luck, or are they performing below expectations?

That’s where the Pythagorean Expectation can come in. Originally designed to predict winning percentage over the course of the baseball season, the Pythagorean Expectation works well in sports where teams rarely draw, like American football and basketball. Sports like soccer and hockey are more difficult because draws are more likely, so winning percentages are typically lower. From what I’ve read, attempting to calculate the predicted points taken by a club is the way to go.

From StatsBomb, the linear formula looks like this:

PredictedPoints = 0.677 * GD + 52.29 * (GP / 38)

There’s a lot of fancy math involved in getting 0.677 and 52.29 that I’m not doing, but a potential issue/gripe I have with this model is that maybe it’s too simple? Predicted points rise in a straight line with goal differential, so if I run this for every team in the table, I’ll get the same ordering as the goal differential chart above (at least for teams that have played the same number of games). Maybe that’s ok, I’m not sure.
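Here is a rough Python sketch of the linear model, just to make that point concrete. It reads the formula as 0.677 × GD + 52.29 × (GP / 38), meaning only the 52.29 baseline is pro-rated by games played; that grouping is an assumption, as is keeping the 38 exactly as quoted. The club names and goal differentials in the ordering check are hypothetical.

```python
def linear_predicted_points(goal_diff, games_played, season_games=38):
    """Linear points model: 0.677 * GD + 52.29 * (GP / season_games).

    Assumption: only the 52.29 baseline is scaled by games played.
    """
    return 0.677 * goal_diff + 52.29 * (games_played / season_games)


# Leyton Orient: +13 goal differential after 15 games.
orient = linear_predicted_points(13, 15)
print(f"Orient predicted points: {orient:.1f}")

# Hypothetical clubs, all on 15 games played.
goal_diffs = {"Club A": 14, "Club B": 13, "Club C": 2, "Club D": -9}
by_gd = sorted(goal_diffs, key=goal_diffs.get, reverse=True)
by_points = sorted(
    goal_diffs,
    key=lambda club: linear_predicted_points(goal_diffs[club], 15),
    reverse=True,
)
print(by_gd == by_points)  # same ordering when games played are equal
```

With equal games played, sorting by predicted points and sorting by goal differential give the same list, which is why this model on its own can’t tell us more than the goal differential chart does.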

Another model I found from Simple Soccer Stats is fairly similar but it looks at points per game, which could still be useful. The author used data from 10 seasons for their analysis. I plan to run the model as is, and incorporate total games played from the StatsBomb model, as well. Simple Soccer Stats’ model looks like:

PtsPerGame = 1.7 * (GS - GA) / (GS + GA) + 1.35

The author doesn’t get into the math on how they came up with 1.7 and 1.35, so I’m just going to use the model as is. I’m going to run this first with only data available from the current 2021–22 season.

Running this formula on one team in the table, Leyton Orient, gives us this:

LeytonOrientExpPts = (1.7 * 13 / 37 + 1.35) * 15
≈ 29 Points

GS - GA is goal differential, and Orient have a +13 GD, so that’s where the 13 comes from. 37 is the sum of goals scored and goals allowed by Orient. The part in parentheses is the model’s points per game, and I multiplied it by 15, the number of games played, to see how many points the model predicts Orient should have at this point in the season. It comes out to roughly 29 points. So if we take this model and compare it to real-life table results, it looks as if Orient are under-performing expectations right now by about 6 points based on their goal differential.
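A small Python sketch of this calculation, assuming the Simple Soccer Stats formula exactly as quoted earlier. The function names are mine, and the 25 goals scored and 12 allowed are not stated directly; they are backed out from the +13 goal differential and the 37 total goals.

```python
def points_per_game(goals_scored, goals_allowed):
    """PtsPerGame = 1.7 * (GS - GA) / (GS + GA) + 1.35."""
    total_goals = goals_scored + goals_allowed
    return 1.7 * (goals_scored - goals_allowed) / total_goals + 1.35


def expected_points(goals_scored, goals_allowed, games_played):
    """Scale the per-game figure up to the games played so far."""
    return points_per_game(goals_scored, goals_allowed) * games_played


# Leyton Orient: 25 - 12 = +13 and 25 + 12 = 37, after 15 games.
grouped = expected_points(25, 12, 15)

# Same numbers, but with only the 1.35 multiplied by 15 (a grouping slip).
ungrouped = 1.7 * 13 / 37 + 1.35 * 15

print(f"Whole per-game figure times 15: {grouped:.1f}")
print(f"Only 1.35 times 15:            {ungrouped:.1f}")
```

The grouping matters a lot here: multiplying the whole per-game figure by 15 gives about 29.2 points, while multiplying only the 1.35 by 15 gives about 20.8, so it’s worth double-checking the parentheses before comparing against the table.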

Now ANOTHER Pythagorean Expectation formula is out there (This shows how difficult it’s been to find one that works just as cleanly as baseball’s). This one is:

predictedpoints = (GF^1.22777 / (GF^1.072388 + GA^1.127248)) * 2.499973 * numberofgamesplayed

All this math and all these numbers are making my brain spin. Let’s see what this formula spits out for Leyton Orient... hmmm. According to this model, after 15 games Orient’s expected points are up around 40. That seems pretty high to me, given that the most any team can take from 15 games is 45. The bottom of the table is a little more accurate, but once we get to mid-table I start seeing expected points totals 7–10 points off the actual points. In this model, every team is under-performing!
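For reference, here is a minimal Python sketch of that third formula, using the exponents and the 2.499973 multiplier exactly as written above. The function name is mine, and Orient’s 25 goals for and 12 against are again backed out from the +13 goal differential and 37 total goals rather than stated directly.

```python
def pythagorean_points(goals_for, goals_against, games_played):
    """Predicted points from the three-exponent soccer Pythagorean formula."""
    ratio = goals_for ** 1.22777 / (
        goals_for ** 1.072388 + goals_against ** 1.127248
    )
    return ratio * 2.499973 * games_played


# Leyton Orient after 15 games: 25 goals for, 12 against.
orient_points = pythagorean_points(25, 12, 15)
max_points = 3 * 15  # three points per win, 15 games

print(f"Predicted: {orient_points:.1f} of a possible {max_points}")
```

Fed with goals to date, this lands a little above 40 for Orient, which matches what I was seeing and sits close to the 45-point ceiling.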

I’m going to end it here while I continue to ponder Pythagorean Expectation models and their applicability to soccer. The links to my GitHub with my (messy) code and the articles I used are below. Next time, I’m going to run some visualizations on the results I’m getting.

https://github.com/firstpitchstrike/Soccer-Analytics/blob/PythagoreanExpectation/README.md

http://www.simplesoccerstats.com/blog/2017/10/25/applying-the-pythagorean-expectation-to-soccer/

https://pena.lt/y/2012/12/03/applying-the-pythagorean-expectation-to-football-part-two/

https://statsbomb.com/2016/04/improving-soccers-version-of-the-bill-james-pythagorean/
