Some Soccer Data

Jack T.
3 min read · Aug 26, 2021

I’ve never been a HUGE fan of soccer, but my brother is, and I definitely enjoy watching the World Cup every 4 years. Recently, I’ve been watching much more soccer than usual, and it’s hard to put a finger on exactly why I am enjoying it more. I’ve watched Liverpool’s and Tottenham’s fixtures and saw Leverkusen put the screws on Gladbach. There’s so much fluidity in soccer: give and take, both teams probing the opponent for the weakness where an expertly placed through ball can lead to a score. When beautiful passes end in a brilliant goal, it’s clear why soccer is called “The Beautiful Game”.

My brother’s favorite club is Liverpool. They have been juggernauts the last few seasons, so I decided to compare their total season points with their expected goal differential (xGD) for the Premier League seasons ending in 2018, ’19, ’20 and ’21. The plots also include every other Premier League team from those seasons, which should show the distance between the top teams in the league and the middling ones.

I got the data using the worldfootballR package, with some help from a tutorial I found on Medium: https://medium.com/analytics-vidhya/expected-goals-and-liverpool-an-intro-to-worldfootballr-da1f02c17622.

Here’s the code to get the data for our plots:

library(worldfootballR)

# League tables for the four Premier League seasons ending 2018 through 2021
end_season_summary <- get_season_team_stats(country = "ENG",
                                            gender = "M",
                                            season_end_year = 2018:2021,
                                            stat_type = "league_table",
                                            tier = "1st")

The new end_season_summary data frame includes key info such as the number of matches played, number of goals for and against each team, number of points and the expected goal differential for each team. I’ll be plotting the total number of season points against the expected goal differential for that season. Using facet_wrap I can generate 4 plots, 1 for each of the seasons.
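Before plotting, it can help to trim the data frame down to the columns the plot needs and confirm they are all present. Here is a minimal sketch in base R. The helper name select_plot_columns is hypothetical (my own, not part of worldfootballR), and the column names are the ones used in the plotting code in this post.

```r
# Columns used by the plot: season, team name, points, expected goal differential.
plot_columns <- c("Season_End_Year", "Squad", "Pts", "xGD")

# Hypothetical helper (not part of worldfootballR): keep only the plot columns
# and stop with a readable error if any of them is missing.
select_plot_columns <- function(df, cols = plot_columns) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}
```

Calling select_plot_columns(end_season_summary) should return a four-column data frame, assuming the league table contains those column names.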

library(ggplot2)
library(ggrepel)  # provides geom_text_repel()

# One panel per season: season points against expected goal differential
ggplot(end_season_summary, aes(x = xGD, y = Pts, label = Squad)) +
  geom_point(color = "red") +
  xlab("Expected Goal Differential") + ylab("Season Points") +
  labs(title = "Premier League Season Points and Expected Goal Differential") +
  geom_text_repel() +
  facet_wrap(~ Season_End_Year)

Manchester City, a perennial powerhouse in the Premier League, ruled the roost in 2018, well ahead of Manchester United, Tottenham and Liverpool in both total season points and expected goal differential. In 2019, Man City still took home the Premier League title, but Liverpool made a huge jump in the table, both in season points and in expected goal differential. Liverpool absolutely dominated in 2020, winning 32 matches while drawing only 3 and losing only 3. Interestingly, Man City still had a larger expected goal differential than the Reds but was nowhere close in season points. Man City was back on top to end the 2021 season, but the teams were clustered much more tightly that season.
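The same comparison can be read off the data rather than the plots. The sketch below finds, for each season, the squad with the most points and the squad with the largest expected goal differential. The helper name season_leaders is hypothetical (my own), and it assumes the Season_End_Year, Squad, Pts and xGD columns used in the plotting code.

```r
# Hypothetical helper: for each season, report the squad with the most points
# and the squad with the largest expected goal differential (xGD).
# On ties, which.max() keeps the first squad it encounters.
season_leaders <- function(df) {
  seasons <- sort(unique(df$Season_End_Year))
  rows <- lapply(seasons, function(season) {
    one_season <- df[df$Season_End_Year == season, ]
    data.frame(
      Season_End_Year = season,
      Pts_Leader = as.character(one_season$Squad[which.max(one_season$Pts)]),
      xGD_Leader = as.character(one_season$Squad[which.max(one_season$xGD)]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

Running season_leaders(end_season_summary) should give one row per season. A season where Pts_Leader and xGD_Leader differ is one where the champion did not have the best expected goal differential.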

It will be interesting to track how Liverpool does in the 2021–22 season. They’ve had two clean sheets to start the new season and they’re looking strong. I expect Man City and Tottenham to be right there with them, especially as it now looks like Harry Kane will stay with Spurs at least through this season.

This was a fun first look at some soccer analytics and I’m really looking forward to diving deeper into other leagues throughout the world!

Written by Jack T.

Data enthusiast. Topics of interest are sports (all of them!), environment, and public policy.
