Using Statcast

Jack T.
3 min read · Jul 4, 2021


One of the cooler things about working through “Analyzing Baseball Data with R” is getting to create interesting visualizations using location data from Statcast and MLB’s PITCHf/x. Plotting where each pitch was located and whether or not the batter swung at it can tell you a lot about a player’s performance. One frustrating thing about using the book as a template for my own analyses, though, is that the datasets sometimes get updated, and I have to work out for myself what I need or want from the data. I guess I’d better get used to that! It’s just strange to spend 3+ hours researching different datasets and which variables I need without doing much coding. It almost feels like I have nothing to show for my effort, but all this research and investigation is part of the learning process.

In one of the exercises I worked on, I plotted Miguel Cabrera’s swing tendency over four years of his career, looking both at how often he swings in general and at how he adjusts depending on the count he’s facing.
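Breaking swing tendency down by count comes down to grouping pitches by (balls, strikes) and taking the share of each group the batter swung at. Here is a minimal sketch in plain Python, not the book’s R code. It assumes each pitch is a dict with `balls`, `strikes`, and `description` keys, the way rows from a Statcast CSV export would look, and the set of `description` values treated as swings is my own assumption (bunt attempts are left out).

```python
from collections import defaultdict

# Assumed Statcast `description` values that mean the batter swung.
# Bunt attempts are deliberately not counted as swings here.
SWING_DESCRIPTIONS = {
    "swinging_strike",
    "swinging_strike_blocked",
    "foul",
    "foul_tip",
    "hit_into_play",
}


def swing_rate_by_count(pitches):
    """Return {(balls, strikes): fraction of pitches swung at}."""
    swings = defaultdict(int)
    totals = defaultdict(int)
    for pitch in pitches:
        # CSV readers hand back strings, so convert to int explicitly.
        count = (int(pitch["balls"]), int(pitch["strikes"]))
        totals[count] += 1
        if pitch["description"] in SWING_DESCRIPTIONS:
            swings[count] += 1
    return {count: swings[count] / totals[count] for count in totals}


# Tiny made-up rows for illustration only, not real Cabrera data.
example_pitches = [
    {"balls": "0", "strikes": "0", "description": "called_strike"},
    {"balls": "0", "strikes": "0", "description": "ball"},
    {"balls": "0", "strikes": "2", "description": "foul"},
    {"balls": "0", "strikes": "2", "description": "swinging_strike"},
]
rates = swing_rate_by_count(example_pitches)
```

With the made-up rows, the 0-0 count has a swing rate of 0.0 and the 0-2 count has a swing rate of 1.0, which is the kind of early-count versus two-strike contrast the plot is meant to show.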

There were over 6,000 pitches in the dataset, so I just took a sample of 500 pitches. Obviously, Cabrera was a stellar hitter in his prime, and you can see in the visualization above that he’s not very prone to swinging at pitches outside of the strike zone.
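Cutting 6,000+ pitches down to 500 keeps the plot readable. A hedged sketch of one way to do it: draw the sample at random, without replacement, using a fixed seed so the same 500 pitches come back on every run. The function name, the seed value, and the placeholder data are all made up for illustration.

```python
import random


def sample_pitches(pitches, n=500, seed=2021):
    """Return up to n pitches chosen at random without replacement."""
    rng = random.Random(seed)  # fixed seed: the same sample on every run
    if len(pitches) <= n:
        # Nothing to cut down; return a copy of everything.
        return list(pitches)
    return rng.sample(pitches, n)


# Stand-in for the real data: 6,000 placeholder pitch ids.
all_pitches = list(range(6000))
subset = sample_pitches(all_pitches)
```

Using a private `random.Random` instance rather than the module-level functions means the seed does not interfere with any other random draws in the same script.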

Here’s where I ran into some issues trying to replicate this analysis with (for now) Kris Bryant and Jose Abreu, of the Cubs and White Sox respectively. The data I downloaded from MLB’s Statcast page wasn’t exactly like the data I was working with from the book. I had to improvise a little bit, and the charts I ended up making show a sample of 500 swinging strikes from each player for the 2021 season.

These visualizations aren’t as pretty as the Cabrera plot above, but they can tell us a lot, too. All of these plots are from the catcher’s point of view, and Bryant and Abreu are both right-handed batters. You can see that pitchers are getting both batters to swing at pitches low and outside of the strike zone. Bryant and Abreu both have really good power, and I’m willing to bet that most of the swinging strikes on these low and outside pitches come on off-speed stuff that would likely induce weak contact and a soft ground ball.
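To put a label like “low and outside” on each pitch instead of eyeballing the plot, you can compare the pitch location against the batter’s strike zone. This is a rough sketch, assuming Statcast-style fields `plate_x`, `plate_z`, `sz_top`, and `sz_bot` (all in feet) and `stand` for the batter’s side. It also assumes positive `plate_x` points toward the catcher’s right, which is away from a right-handed batter. The function name and the sample coordinates are hypothetical.

```python
PLATE_HALF_WIDTH_FT = 17 / 12 / 2  # home plate is 17 inches wide


def locate_pitch(plate_x, plate_z, sz_top, sz_bot, stand="R"):
    """Describe a pitch location relative to the batter's strike zone.

    Returns a (vertical, horizontal) pair such as ("low", "outside").
    Ignores the ball's radius, so edge pitches are judged strictly.
    """
    if plate_z < sz_bot:
        vertical = "low"
    elif plate_z > sz_top:
        vertical = "high"
    else:
        vertical = "in_zone"

    # Assumption: positive plate_x is toward the catcher's right, which is
    # away from a right-handed batter and in on a left-handed batter.
    away = plate_x if stand == "R" else -plate_x
    if away > PLATE_HALF_WIDTH_FT:
        horizontal = "outside"
    elif away < -PLATE_HALF_WIDTH_FT:
        horizontal = "inside"
    else:
        horizontal = "in_zone"
    return vertical, horizontal


# Made-up coordinates, not taken from Bryant's or Abreu's data.
print(locate_pitch(plate_x=1.1, plate_z=1.2, sz_top=3.5, sz_bot=1.6))
# ('low', 'outside')
```

Counting how many of a player’s swinging strikes land in each (vertical, horizontal) bucket would turn the visual impression from the plots into a number.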

I’m going to do some more exploration of Statcast and PITCHf/x to see if I can find data that also lets me analyze the pitches the batters didn’t swing at. This is actually one of the parts of doing this stuff that I find most rewarding: searching for a while to find what I need, wrangling it, and then using it to figure something out.

Written by Jack T.

Data enthusiast. Topics of interest are sports (all of them!), environment, and public policy.
