Doubles and Triples and Homers, Oh My!

Jack T.
5 min read · Jun 20, 2021

Yep, I’m back in the magical year of 2011 and this time I’m interested in learning about the average run value produced by a double during the season. I swear, one day, I’ll stop living in the past.

During the 2011 season, about 53% of doubles came with no runners on base. The majority of these doubles came with nobody on and no outs. So how valuable were doubles that season? To figure this out, I use code similar to what I used to find the value of stolen bases. I’m expecting doubles to have a higher average run value than a stolen base, but a lower one than a triple or a home run.
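A share like that 53% can be read straight off the STATE column. Here is a minimal sketch, assuming STATE is encoded the way it appears in the output further down (three base flags for 1st/2nd/3rd, a space, then the outs). The `doubles_toy` data frame is made up for illustration and is not the real Retrosheet data.

```r
library(dplyr)

# Hypothetical toy data, NOT the real 2011 doubles.
# STATE format assumed: "<1st><2nd><3rd> <outs>", e.g. "000 0" = bases empty, no outs.
doubles_toy <- data.frame(
  STATE = c("000 0", "000 1", "100 2", "000 2", "011 1", "000 0"),
  stringsAsFactors = FALSE
)

empty_share <- doubles_toy %>%
  mutate(bases = substr(STATE, 1, 3),   # first three characters: base flags
         outs  = substr(STATE, 5, 5)) %>%  # fifth character: number of outs
  summarise(
    share_bases_empty   = mean(bases == "000"),
    share_empty_no_outs = mean(bases == "000" & outs == "0")
  )
empty_share
```

Taking the mean of a logical column gives the proportion of TRUE values, which is why no explicit counting is needed.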

I’ve filtered the Retrosheet 2011 data down to doubles only and computed the average run value of those plays. It turns out that the average double in the filtered data frame is worth 0.736 runs.

mean_double <- doubles %>%
  summarise(mean_double_value = mean(run_value))
mean_double
# A tibble: 1 x 1
  mean_double_value
              <dbl>
1             0.736
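The filtering step itself isn’t shown above, so here is one way it might look. This is only a sketch: it assumes the play-by-play data carries Retrosheet’s `EVENT_CD` column (20 = single, 21 = double, 22 = triple, 23 = home run) alongside `run_value`, and `season_toy` is a made-up stand-in for the full 2011 data frame.

```r
library(dplyr)

# Hypothetical stand-in for the full-season play-by-play data.
# The real data frame name and contents are not shown in this post.
season_toy <- data.frame(
  EVENT_CD  = c(20, 21, 21, 22, 23, 3),
  run_value = c(0.45, 0.62, 1.10, 1.00, 1.00, -0.25)
)

# Keep only doubles (Retrosheet event code 21).
doubles_toy <- season_toy %>%
  filter(EVENT_CD == 21)

# Same summary as in the post, applied to the toy subset.
mean_double_toy <- doubles_toy %>%
  summarise(mean_double_value = mean(run_value))
mean_double_toy
```

Swapping 21 for 22 or 23 would give the triples and home run subsets in the same way.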

Now I’ll construct a histogram of the run values for all doubles hit in 2011.

Plot6 <- ggplot(doubles, aes(run_value)) +
  geom_histogram() +
  geom_vline(data = mean_double, aes(xintercept = mean_double_value),
             colour = "red", size = 2) +
  annotate("text", 1.7, 2000,
           label = "Mean Run \nValue", colour = "red")
Plot6

The graph tells us that most of the doubles hit have run values between 0 and 1, which suggests that nobody scored on most of those hits. Either the bases were empty, or maybe the runner on first was slow, or something else happened. With the majority of the histogram’s mass between 0 and 1, it makes sense that the average run value of a double is 0.736, as most doubles hit during the 2011 season came with nobody on base.

But there’s still a good cluster of bins at the 1 value and beyond, with a couple past 2. Let’s find out which runner/out situations led to the most valuable doubles hit in 2011.

doubles %>%
  arrange(desc(run_value)) %>%
  select(STATE, NEW.STATE, run_value) %>%
  head(3)
# A tibble: 3 x 3
  STATE NEW.STATE run_value
  <chr> <chr>         <dbl>
1 111 2 001 2          2.56
2 111 2 001 2          2.56
3 111 2 010 2          2.55

The most valuable doubles of 2011 came with the bases loaded and two outs. This makes sense: it’s a high-pressure situation, the inning is one out from ending, and the batter comes through with a clutch double that clears the bases. In the top two rows the new state is 001, so the batter ended up on third, presumably advancing on the throw home.

Now on to triples: how valuable were they in 2011? We’ll follow the exact same steps to find out. Just over 55% of triples in 2011 came with nobody on base, regardless of the number of outs. Seems to be a trend, huh?

mean_triples <- triples %>%
  summarise(mean_triple_value = mean(run_value))
mean_triples
# A tibble: 1 x 1
  mean_triple_value
              <dbl>
1              1.06

The average value of a triple in 2011 was 1.06 runs, which is greater than the average value of a double. That makes sense: a triple scores every runner on base and leaves the batter 90 feet from home.

Plot7 <- ggplot(triples, aes(run_value)) +
  geom_histogram() +
  geom_vline(data = mean_triples, aes(xintercept = mean_triple_value),
             colour = "blue", size = 2) +
  annotate("text", 2.0, 450,
           label = "Mean Run \nValue", colour = "blue")
Plot7

Most of the triples hit in 2011 had run values between 0 and 1, with a lot of triples (nearly 300) landing in the bin right around 1 on the run_value axis. There’s still a good chunk of triples that are more valuable, though.

triples %>%
  arrange(desc(run_value)) %>%
  select(STATE, NEW.STATE, run_value) %>%
  head(3)
# A tibble: 3 x 3
  STATE NEW.STATE run_value
  <chr> <chr>         <dbl>
1 111 1 000 1          2.78
2 011 2 000 2          2.56
3 111 2 001 2          2.56

The three most valuable triples came either with the bases loaded and 1 or 2 outs, or with runners on 2nd and 3rd and 2 outs. In the first row, the play started with the bases loaded and 1 out but ended with the bases empty and still only 1 out. There was likely an error committed on the play that allowed the batter-runner to score, a sort of Little League home run. The same thing happened in the second row. The third row ends how we’d expect, with the bases cleared and the batter standing on third, celebrating his triple.
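That “did the batter come all the way around?” check can be done from the state strings alone. Here is a rough sketch using the three rows printed above. The column names `third_occupied`, `outs_added`, and `batter_likely_scored` are my own, and the rule is a heuristic: it assumes that if third base is empty after a triple and no out was added, the batter must have scored.

```r
library(dplyr)

# The top three triples, copied from the output above.
top_triples <- data.frame(
  STATE     = c("111 1", "011 2", "111 2"),
  NEW.STATE = c("000 1", "000 2", "001 2"),
  run_value = c(2.78, 2.56, 2.56),
  stringsAsFactors = FALSE
)

top_triples <- top_triples %>%
  mutate(
    # Third character of the state string is the flag for 3rd base.
    third_occupied = substr(NEW.STATE, 3, 3) == "1",
    # Fifth character is the number of outs, before and after the play.
    outs_added = as.integer(substr(NEW.STATE, 5, 5)) -
                 as.integer(substr(STATE, 5, 5)),
    # Heuristic: nobody on third and nobody put out => batter kept running home.
    batter_likely_scored = !third_occupied & outs_added == 0
  )
top_triples
```

The first two rows get flagged and the third does not, matching the reading of the table given above.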

Finally, let’s look at the run value of what so many people love — the long ball, the dinger, the grand salami — Home Runs.

Over 58% of home runs hit in 2011 came with nobody on base. Let’s get deeper into the weeds of home runs and their values.

Mean_HR <- Home_Runs %>%
  summarise(mean_HR_value = mean(run_value))
Mean_HR
# A tibble: 1 x 1
  mean_HR_value
          <dbl>
1          1.39
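As an aside, the three separate averages (doubles, triples, home runs) could also come out of one grouped summary instead of three filtered data frames. The sketch below assumes an `EVENT_CD` column with Retrosheet’s codes (21 = double, 22 = triple, 23 = home run). The `hits_toy` values are invented for illustration, so the numbers will not match the real 2011 averages.

```r
library(dplyr)

# Hypothetical toy data, not the real 2011 plays.
hits_toy <- data.frame(
  EVENT_CD  = c(21, 21, 22, 22, 23, 23),
  run_value = c(0.6, 1.0, 1.0, 1.2, 1.0, 2.0)
)

# Lookup from Retrosheet event code to a readable label.
hit_labels <- c("21" = "double", "22" = "triple", "23" = "home run")

hit_means <- hits_toy %>%
  mutate(hit_type = unname(hit_labels[as.character(EVENT_CD)])) %>%
  group_by(hit_type) %>%
  summarise(mean_run_value = mean(run_value), n = n()) %>%
  arrange(mean_run_value)
hit_means
```

Keeping the count `n` next to each mean is handy because triples are much rarer than doubles or home runs, so their average rests on far fewer plays.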

We see that home runs in 2011 were worth an average of 1.39 runs. This is what it looks like in histogram form.

Plot8 <- ggplot(Home_Runs, aes(run_value)) +
  geom_histogram() +
  geom_vline(data = Mean_HR, aes(xintercept = mean_HR_value),
             colour = "green", size = 2) +
  annotate("text", 2, 2000,
           label = "Mean Run \nValue", colour = "green") +
  xlab("Run Value") + ylab("Number of HRs")
Plot8

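This confirms that the majority of home runs hit in 2011 came with nobody on base. A solo home run leaves the runners and outs unchanged, so it should be worth exactly 1 run, and that is where the histogram piles up. Still, there’s a good number of home runs that have values over 1.5. Let’s find out the most valuable home runs hit during the 2011 campaign.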

Home_Runs %>%
  arrange(desc(run_value)) %>%
  select(STATE, NEW.STATE, run_value) %>%
  head(1)
# A tibble: 1 x 3
  STATE NEW.STATE run_value
  <chr> <chr>         <dbl>
1 111 2 000 2          3.34

The most valuable home runs, not surprisingly, are grand slams that come with 2 outs in the inning.
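Why is a grand slam worth 3.34 runs and not 4? Here is a sketch of the arithmetic, assuming the usual definition: run value = runs scored on the play + (run expectancy of the new state - run expectancy of the old state). Runs scored can be counted from the state strings, since every batter or runner either stays on base, is put out, or scores. The helper names below are my own, and the rows are the top plays printed in this post.

```r
# Number of runners in a state string such as "111 2".
count_runners <- function(state) {
  sum(strsplit(substr(state, 1, 3), "")[[1]] == "1")
}

# Number of outs in a state string (fifth character).
count_outs <- function(state) as.integer(substr(state, 5, 5))

# Runs scored = (runners before + batter) - runners after - outs made on the play.
# This only holds when NEW.STATE records the actual runners left on base.
runs_scored <- function(state, new_state) {
  (count_runners(state) + 1) - count_runners(new_state) -
    (count_outs(new_state) - count_outs(state))
}

# Top plays copied from the outputs in this post.
top_plays <- data.frame(
  hit       = c("double", "triple", "triple", "home run"),
  STATE     = c("111 2", "111 1", "011 2", "111 2"),
  NEW.STATE = c("001 2", "000 1", "000 2", "000 2"),
  run_value = c(2.56, 2.78, 2.56, 3.34),
  stringsAsFactors = FALSE
)

top_plays$runs <- mapply(runs_scored, top_plays$STATE, top_plays$NEW.STATE,
                         USE.NAMES = FALSE)
# Whatever is left after subtracting the runs is the change in run expectancy.
top_plays$re_change <- top_plays$run_value - top_plays$runs
top_plays
```

For the grand slam this gives 4 runs and a run expectancy change of -0.66: the slam cashes in the runners, but it also leaves the bases empty with two outs, which is a much less promising spot than bases loaded.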

So there’s some more decade-old baseball data analysis for ya! Stay tuned, because soon I’ll bring it back to the current time.

Written by Jack T.

Data enthusiast. Topics of interest are sports (all of them!), environment, and public policy.
