I’ve been thinking a lot about handedness recently. I have two boys who I hope will play a little baseball. Both are tracking to be tall-ish and may be athletic. I really want them to pitch. It’s become a pretty accepted point of view that left-handed pitchers are at a premium in MLB. The one case that really makes that clear is a pitcher named J.A. Happ. Happ, by all accounts, is a slightly below-average left-handed starting pitcher. This offseason, Happ signed a 3-year deal with the Blue Jays for 36 million dollars. When the news broke someone on Twitter wrote, “My God. Parents, put a baseball in your child’s left hand and hope for the best.”
So, I want to test that assumption. The Lahman database is great, but the information I need is spread across several data files. I need the master file for the pitching hand, salaries (obviously), and pitching stats to compare apples to apples.
Data Cleaning
I don’t really need to look back at pitching salaries from 1910. I am going to pick an arbitrary cut point (2005). Salaries really started to explode after that.
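For anyone following along, the join and the year cut can be sketched roughly like this. It is a minimal illustration in plain Python, not the code I actually ran: the rows are made up, the function name `build_pitcher_seasons` is hypothetical, the column names (`playerID`, `yearID`, `throws`, `salary`, `ERA`) are my assumption about how the files are laid out, and I am assuming 2005 itself is kept.

```python
# Toy rows standing in for the master, salary, and pitching files.
# Every value here is invented for illustration.
master = [
    {"playerID": "p1", "throws": "L"},
    {"playerID": "p2", "throws": "R"},
]
salaries = [
    {"playerID": "p1", "yearID": 2003, "salary": 400_000},
    {"playerID": "p1", "yearID": 2010, "salary": 3_000_000},
    {"playerID": "p2", "yearID": 2010, "salary": 2_500_000},
]
pitching = [
    {"playerID": "p1", "yearID": 2003, "ERA": 4.80},
    {"playerID": "p1", "yearID": 2010, "ERA": 3.90},
    {"playerID": "p2", "yearID": 2010, "ERA": 4.10},
]

CUTOFF_YEAR = 2005  # arbitrary cut point; assumed inclusive


def build_pitcher_seasons(master, salaries, pitching, cutoff=CUTOFF_YEAR):
    """Attach throwing hand and salary to each pitching season from `cutoff` on."""
    throws_by_player = {row["playerID"]: row["throws"] for row in master}
    salary_by_key = {(r["playerID"], r["yearID"]): r["salary"] for r in salaries}

    merged = []
    for row in pitching:
        key = (row["playerID"], row["yearID"])
        if row["yearID"] < cutoff:
            continue  # drop seasons before the cut point
        if key not in salary_by_key or row["playerID"] not in throws_by_player:
            continue  # inner join: keep only rows present in all three files
        merged.append({
            **row,
            "salary": salary_by_key[key],
            "throws": throws_by_player[row["playerID"]],
        })
    return merged


seasons = build_pitcher_seasons(master, salaries, pitching)
print(seasons)
```

The join is on player and year for salaries, but on player only for throwing hand, since handedness does not change from season to season.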
Visualization
Okay, I’ve got the data in a format that I can use. Let’s visualize. Let’s create a dataframe of just lefties and just righties.
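The split itself is simple. Here is a rough sketch in plain Python, assuming each row carries a `throws` value of "L" or "R" and a `salary` (both names are assumptions, and the four rows are invented, so the resulting gap means nothing):

```python
from statistics import mean

# Invented pitcher-seasons, for illustration only.
seasons = [
    {"throws": "L", "salary": 3_000_000},
    {"throws": "L", "salary": 1_200_000},
    {"throws": "R", "salary": 2_500_000},
    {"throws": "R", "salary": 1_500_000},
]

# One list of just lefties, one of just righties.
lefties = [s for s in seasons if s["throws"] == "L"]
righties = [s for s in seasons if s["throws"] == "R"]

mean_left = mean(s["salary"] for s in lefties)
mean_right = mean(s["salary"] for s in righties)
gap = mean_left - mean_right  # positive means lefties earn more on average

print(f"lefties: {mean_left:,.0f}  righties: {mean_right:,.0f}  gap: {gap:,.0f}")
```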
So, there’s nothing there. The difference between the two samples is less than $10,000. Let’s press onward.
I’m not going to display a lot of what I did behind the scenes but it’s a lot of subsetting and creating color palettes. Let’s go right to visuals.
This is also inconclusive. Just take 2011-2013. In 2011, lefties and righties made basically the same. In 2012, lefties made (on average) more than a million dollars more than righties. However, in 2013, righties made a couple hundred grand more than lefties.
Looking at salaries in the AL vs the NL is interesting. Lefties in the National League made more money than righties for 2010-2013. The story is a little more mixed for righties.
Let’s take a look at a scatterplot for ERA and salary.
I truncated this data on both the x and the y axes. Any ERA over 10 is not going to keep you in the league for a long time so those were dropped. And any salary below 500k is going to be a player that has not reached arbitration and therefore is not really getting paid what the market will bear. So the picture is mixed so far. The next step would be a regression.
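In code, that truncation is just a filter. A small sketch with the two thresholds named above, where the rows are invented and the helper name `in_plot_range` is hypothetical:

```python
MAX_ERA = 10.0        # anything above this is dropped from the plot
MIN_SALARY = 500_000  # anything below this is treated as pre-arbitration pay


def in_plot_range(season):
    """True if a pitcher-season survives both truncation rules."""
    return season["ERA"] <= MAX_ERA and season["salary"] >= MIN_SALARY


# Invented rows, for illustration only.
seasons = [
    {"ERA": 3.45, "salary": 4_000_000},   # kept
    {"ERA": 13.50, "salary": 1_000_000},  # dropped: ERA over 10
    {"ERA": 4.20, "salary": 414_000},     # dropped: salary below 500k
]

plotted = [s for s in seasons if in_plot_range(s)]
print(plotted)
```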
Regression and Matching
I’ve also included a dotwhisker plot that helps to visualize a regression. Each dot is a coefficient estimate and the horizontal line through it is its confidence interval. If neither the dot nor its line touches the vertical dashed line at zero, the coefficient is statistically significant. Or you could read the regression table.
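The reading rule for the plot boils down to one check: does the interval around a coefficient contain zero? Here is a sketch of that check, assuming 95% intervals built with a normal approximation (estimate plus or minus 1.96 standard errors). The coefficients and standard errors in the example are made up, and the function names are mine.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate +/- z standard errors."""
    return estimate - z * std_error, estimate + z * std_error


def crosses_zero(estimate, std_error, z=1.96):
    """True if the whisker touches the dashed line at zero (not significant)."""
    low, high = confidence_interval(estimate, std_error, z)
    return low <= 0 <= high


# Made-up coefficients, for illustration only.
print(crosses_zero(150_000, 230_000))  # wide whisker straddling zero
print(crosses_zero(50_000, 10_000))    # tight whisker well clear of zero
```

A whisker that straddles zero means the data cannot rule out "no effect" at that confidence level.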
So salary is our dependent variable and I’m going to use a lot of the stats that should predict a better pitcher: ERA, wins, losses, etc. Unfortunately this data has a lot of noise in it. A good example? Losses actually predict a higher salary. That may be because losses denote starting pitchers, and starting pitchers are much more likely to take a loss than a reliever. Games pitched predicts lower salary, but that’s probably because relievers can show up in 80 games a year while starters average around 35 or so. Strikeouts drive up salary and walks drive it down. Interestingly enough, throwing right-handed is not statistically significant.
The next thing I want to do is coarsened exact matching. Gary King and some others wrote the package. What it does is essentially this: it finds someone in the treatment group (in our example that’s left-handed pitchers) and finds someone in the control group (righties) who is very close in terms of performance metrics. So this will compare apples to apples. It will help to correct for the problem of some pitchers having more or fewer games than others. It will compare pitchers with lower ERAs to those with lower ERAs and so on. The one thing that needs to be done is that variables need to be binned together. In order for the package to actually find a match it needs ERA to be broken up into several ranges (3.00-3.50, 3.51-4.00). I will do that below.
After all that, the answer is really not exciting at all. There is no statistically significant relationship between throwing hand and a pitcher’s salary. Being left-handed could mean anything from making 600k more to 300k less than a right-hander. In other words? It means nothing.
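To make the idea concrete, here is a stripped-down sketch of coarsen-then-match in plain Python. It is not the package’s implementation, just the core logic as I understand it: bin the performance variables, group pitchers into strata by bin, throw away strata that don’t contain both a lefty and a righty, and average the within-stratum salary gaps weighted by the number of lefties. The bin widths, the column names (`ERA`, `G`, `throws`, `salary`), the half-open bins (3.00 up to but not including 3.50), and the five rows of data are all assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def era_bin(era, width=0.5):
    """Coarsen ERA into half-run bins: 3.00-3.49 -> 6, 3.50-3.99 -> 7, ..."""
    return math.floor(era / width)


def games_bin(games, width=20):
    """Coarsen games pitched so starters and heavy-use relievers separate."""
    return games // width


def matched_salary_gap(pitchers):
    """Average lefty-minus-righty salary gap within matched strata.

    Strata are weighted by their number of lefties (the treated group).
    Returns None if no stratum contains both hands.
    """
    strata = defaultdict(lambda: {"L": [], "R": []})
    for p in pitchers:
        if p["throws"] not in ("L", "R"):
            continue  # skip anything that is not clearly a lefty or righty
        key = (era_bin(p["ERA"]), games_bin(p["G"]))
        strata[key][p["throws"]].append(p["salary"])

    weighted_gap = 0.0
    total_lefties = 0
    for groups in strata.values():
        if not groups["L"] or not groups["R"]:
            continue  # unmatched stratum: no apples-to-apples comparison
        n_left = len(groups["L"])
        weighted_gap += n_left * (mean(groups["L"]) - mean(groups["R"]))
        total_lefties += n_left

    return weighted_gap / total_lefties if total_lefties else None


# Invented pitcher-seasons, for illustration only.
pitchers = [
    {"throws": "L", "ERA": 3.20, "G": 33, "salary": 5_000_000},
    {"throws": "R", "ERA": 3.40, "G": 30, "salary": 4_800_000},
    {"throws": "L", "ERA": 4.10, "G": 70, "salary": 1_000_000},
    {"throws": "R", "ERA": 4.30, "G": 65, "salary": 1_100_000},
    {"throws": "R", "ERA": 6.00, "G": 10, "salary": 600_000},  # no lefty match
]

print(matched_salary_gap(pitchers))
```

The last righty has no lefty in his stratum, so he drops out entirely. That pruning is the point of the method: only comparable pitchers contribute to the estimate.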
Concluding Thoughts
So, if the perception is that left handers make more than right handers why doesn’t the data bear this out? I have a theory, at least. Maybe two.
Baseball has a really weird salary structure. Not to go too far into it but for the first three years that a player is in the majors, he basically makes the league minimum (around 500k). After that he goes through three years of arbitration where his salary rises each of those three years. He is still not receiving his market value. Really, that doesn’t happen until free agency which doesn’t happen for most players until they are 28-30 years old. Many elite pitchers will then sign a huge deal for six or seven years. They really only get one bite at the apple.
Relievers screw everything up. As another Kaggle user found, teams overpay closers. That also means that they underpay middle relievers. If I could break this down to just starting pitchers I might see something different, but I didn’t do that because lefties seem to be more important in the bullpen. A guy like Randy Choate was a LOOGY (a lefty one-out guy). He couldn’t really do much well except get out other left-handers. And he pitched for a long time doing just that. A left-handed starter cannot be a LOOGY.
This data is just noisy. Inflated salaries have not existed long enough to produce a large enough dataset.