Nathan Ray Seebeck-USA TODAY Sports
Have you ever had a friend enthusiastically recommend that you watch a TV show and then say, “It takes a few episodes to get going, and the timeline gets weird at the end, and one or two of the main characters can be kind of annoying, but other than that it’s SO GOOD”? And initially you might be put off, thinking that a truly good show wouldn’t require that many qualifiers. Sometimes you’re right about that, but sometimes it turns out the show is Parks and Recreation and even though the first season is about as appealing as living in a pit, the rest of the show is an absolute treat.
Sometimes small components of a larger body of work do a poor job of representing the work as a whole. The oddities that occur in small samples are likely not a new concept to FanGraphs readers, nor will it shock anyone when I note that what constitutes a small sample depends on what exactly we want to measure.
Recently, the fine folks at MLB Advanced Media gifted us with a handful of new metrics that make use of Statcast’s bat tracking technology. Every time we dig into a new metric, we must consider the appropriate serving size to satiate our hunger for knowledge, lest we find ourselves hangrily generating takes that we later regret.
For this article, we’ll attempt to determine appropriate sample thresholds for measuring a hitter’s average bat speed; so that players without bats don’t feel left out, we’ll do the same for sword rate from the pitcher’s perspective. For many metrics, the sample size is measured in pitches or plate appearances, but since both bat speed and sword rate are tied specifically to bat movement, their samples will be composed of swings. To determine reasonable sample sizes, I used the split-half correlation method. The idea is to randomly select two samples of size X from a player’s collection of swings, calculate the player’s average bat speed or sword rate for both samples, lather/rinse/repeat for a bunch of players, then take the full set of two-sample pairs for all players and see how well they correlate. We complete the experiment by repeating the process for progressively larger sample sizes. And just to be super thorough, we’ll re-run the experiment several times and average the correlation values.
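For readers who like to see the gears turn, here is a minimal Python sketch of the split-half procedure described above. It is not the code behind this article: the function names, the input layout (a dict mapping each player to a list of per-swing values), the choice to make the two samples non-overlapping, and the decision to skip players with fewer than 2X swings are all assumptions made for illustration. The per-swing value can be a bat speed in mph or a 0/1 sword flag.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if undefined)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # no spread across players, so no correlation
    return cov / sqrt(var_x * var_y)


def split_half_correlation(swings_by_player, sample_size, rng):
    """One experiment: two random samples per player, then correlate the pairs.

    Assumption: the two samples do not overlap, so a player needs at least
    2 * sample_size swings to be included.
    """
    first_means, second_means = [], []
    for values in swings_by_player.values():
        if len(values) < 2 * sample_size:
            continue
        drawn = rng.sample(values, 2 * sample_size)
        first_means.append(sum(drawn[:sample_size]) / sample_size)
        second_means.append(sum(drawn[sample_size:]) / sample_size)
    return pearson(first_means, second_means)


def averaged_curve(swings_by_player, sample_sizes, n_runs=20, seed=0):
    """Repeat the experiment n_runs times per sample size and average."""
    rng = random.Random(seed)
    curve = {}
    for size in sample_sizes:
        runs = [
            split_half_correlation(swings_by_player, size, rng)
            for _ in range(n_runs)
        ]
        curve[size] = sum(runs) / n_runs
    return curve
```

Calling `averaged_curve(swings_by_player, range(5, 105, 5))` would return one averaged correlation per sample size, which is the kind of curve plotted in green later in this piece. Note that this sketch averages raw per-swing values and does not apply any filtering of non-competitive swings.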
The theory behind the method is that with large enough samples, the metric will contain more signal and less noise, thus representing the player more accurately. Therefore, two samples of sufficient size should look comparable to one another. Once we hit a sample size where the correlation is strong enough that the metric is considered to be what statisticians term “reliable,” that sample size becomes our minimum threshold for relying on the descriptive power of the metric.
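Turning a correlation-by-sample-size curve into a minimum threshold can be sketched as follows. This is an illustrative helper, not the article's code: the function name, the dict input, and the requirement that the correlation stay at or above the cutoff for every larger sample size are assumptions, and the 0.8 default is simply the reliability cutoff used later in this piece.

```python
def minimum_reliable_sample(curve, threshold=0.8):
    """Smallest sample size from which the correlation stays >= threshold.

    `curve` maps sample size -> correlation. Returns None if the largest
    sample size tested still falls short of the threshold.
    """
    answer = None
    # Walk from the largest sample size down, stopping at the first dip
    # below the threshold so a one-off early spike is not mistaken for
    # genuine stability.
    for size in sorted(curve, reverse=True):
        if curve[size] >= threshold:
            answer = size
        else:
            break
    return answer
```

With a made-up curve such as `{10: 0.55, 20: 0.74, 30: 0.81, 40: 0.85}`, the helper returns 30. For a curve that never reaches the cutoff, it returns `None`, which is a useful reminder that some metrics may not stabilize within the samples available.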
The poor six-episode showing from Parks and Recreation in its first season did not wind up providing a large enough sample to accurately depict the series’ overall episode quality. We needed to see more from the folks in Pawnee.
Starting with average bat speed, the chart below depicts the results of each experiment (in gray) and the average of all experiments (in green), with the sample sizes on the horizontal axis and the corresponding correlation coefficient on the vertical axis. Statistical standards dictate that once the correlation rises above 0.8, we’re in good shape. With that in mind, the output suggests that average bat speed becomes a reliably descriptive metric around 30 swings, which most players accumulate over 20ish plate appearances. To emphasize the importance of the 30-swing minimum, I decided to find the wackiest 20-swing stretches in…
[Images HTML]
The Vargas example highlights an important aspect of the average bat speed calculation. Per Baseball Savant: “The fastest 90% of a player’s swings, plus any 60+ MPH swings resulting in an exit velocity of 90+ MPH, are deemed to be his ‘competitive’ swings. The average of these swings are his seasonal average.” It’s possible that more complex logic is used on the backend, but from what I could find, no omissions are made for check swings, bunts, foul tips, etc. Furthermore, a spot check of the season-long averages I calculated against Savant’s bat speed leaderboard matched up nicely. To me, this says that the calculation relies heavily on throwing out the bottom 10% of swings to remove these less earnest offerings. And in a sample of 50 swings, a bunting spree à la Vargas would get lopped off (admittedly this concentration of bunting is unusual), but 10%…
[Images HTML]
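The Savant definition quoted above can be sketched in a few lines of Python. This is only a reading of the published description, not Savant's actual backend logic: the function name, the `(bat_speed, exit_velocity)` tuple layout, and the way the bottom 10% is rounded (integer floor of the swing count) are all assumptions.

```python
def competitive_average(swings, drop_percent=10, min_speed=60.0, min_exit_velo=90.0):
    """Average bat speed over 'competitive' swings.

    `swings` is a list of (bat_speed_mph, exit_velocity_mph or None) tuples.
    Keeps the fastest (100 - drop_percent)% of swings, plus any slower swing
    of at least `min_speed` that produced at least `min_exit_velo`.
    """
    if not swings:
        raise ValueError("need at least one swing")
    ranked = sorted(swings, key=lambda swing: swing[0], reverse=True)
    # Assumption: the slowest 10% is rounded down to a whole number of swings.
    n_drop = len(ranked) * drop_percent // 100
    n_keep = len(ranked) - n_drop
    kept = ranked[:n_keep]
    for speed, exit_velo in ranked[n_keep:]:
        hard_hit = exit_velo is not None and exit_velo >= min_exit_velo
        if speed >= min_speed and hard_hit:
            kept.append((speed, exit_velo))
    return sum(speed for speed, _ in kept) / len(kept)
```

Under these assumptions, a hypothetical 20-swing sample of 18 full swings at 70 mph and two bunts at 10 mph averages 70 mph, because both bunts fall in the discarded bottom 10%. Swap in five bunts and only two get discarded, so the average drops to 60 mph. That is the mechanism at work in small samples: once the low-effort swings outnumber the 10% allowance, they leak into the average.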
Judging Vargas based on this 20-swing stretch would be a bit like judging The Wire based solely on season two (which I liked, but many didn’t). Vargas temporarily went all-in on bunting, while The Wire went all-in on the stevedores storyline, patterns of behavior that ultimately wouldn’t last. While Vargas was hurt by a high volume of bunt attempts, others got dinged by their check swing habits.
Juan Soto is famous for his knowledge of the strike zone and patience at the plate, but this means he likes to gather as much information as possible before committing to a swing, frequently pulling his bat back at the last second. During two games against the Mariners and their excellent pitching in late May, Soto pulled his bat back seven times, logging partial swings with low bat speeds, and dragging his 20-swing average 15 mph below his full-season number. The “swing” below registered a bat speed of 10 mph, and because he…
[Images HTML]
The TV comp for Soto’s rough 20-swing stretch might be a Ross-heavy episode of Friends, which is to say, an overall good show/hitter that occasionally gives too much emphasis to an annoying character or particular habit.
Fernando Tatis Jr. is also a big check-swinger, but during a series against the Mets in mid-June, a few abandoned swings buddied up with a smattering of oddly hit foul balls to drag his small sample bat speed 19 mph below his full-season mark. The foul ball shown below resulted from a swing clocked at 43 mph:
[Images HTML]
The weirdness of the Tatis 20-swing sample could be considered akin to an episode from the gas leak season of Community, which, after parting ways with the original creator, still looked like the same show only with poorer execution, leading to mishits and uncertain decision-making.
Moving on to sword rate, finding an adequate sample size turned out to be a tough ask, mostly because the correlation graph (which you can see below) resembles television static from back when TVs were big boxy things; if the cable cut out, you were left with nothing to watch but squiggly black and white chaos. Here we see no gradual improvement as the sample expands; the correlation tops out around 0.2, well shy of the 0.8 target:
[Images HTML]
This analysis suggests that getting swords at a consistent rate is not a reliable skill for pitchers, at least not given the currently available samples. Perhaps if we have full-season samples to work with, the measurement will stabilize, but the lack of any distinct upward trend in the…