Given that the NBA regular season is just about over, it’s time to transition into MLB, an entertaining and unusual change of pace from the relative stability of basketball. One of my favorite things about MLB is that both main DFS sites have late swap options, an element that both reduces the frustration of late injury news and helps your overall edge if applied properly. Baseball is also tremendously fun because of its unpredictability, and we will discuss how to take advantage of that in this article and in more to follow.
What Is the 99th Percentile?
As a very unpolished statistics student, I am fascinated by variance and its impact on DFS and other aspects of business and life. In a normal distribution, about 68% of outcomes fall within one standard deviation (SD) of the average, about 95% fall within two SDs, and about 99.7% fall within three SDs.
In some of my NBA articles, I discussed that a player like Russell Westbrook is a low variance player, as he might average 65 DraftKings points with a SD of 13 points, a seemingly large number, but only 20% of his average points. The way we calculate his first SD range is by adding or subtracting his SD from his average, giving him a 1st SD range of (65-13) to (65+13), or 52-78 points.
Similarly, for his 3rd SD outcomes, we can estimate Westbrook’s scoring range to be (65-(13*3)) to (65+(13*3)), which is 26-104 fantasy points, a range we can count on him landing in (ignoring OT or injury) approximately 99.7% of the time.
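The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the Westbrook figures quoted in this article; the function name `sd_range` is my own and not from any DFS tool.

```python
def sd_range(mean, sd, k):
    """Return the (low, high) band that sits k standard deviations around the mean."""
    return (mean - k * sd, mean + k * sd)

# Westbrook figures from the article: 65-point average, 13-point SD.
for k in (1, 2, 3):
    low, high = sd_range(65, 13, k)
    print(f"{k} SD range: {low} to {high}")
```

The first and third lines of output correspond to the 52-78 and 26-104 ranges discussed above.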
The main difference between calculating scoring ranges in MLB and NBA is that MLB scores are not normally distributed, so the SD rule of 68%/95%/99.7% doesn’t really apply. Take a player like Francisco Lindor, who I am guesstimating averages 9 DK points per game, with a SD of about 8 points. In his case, a “normal” distribution would estimate his 3rd SD (99.7%) results falling between (9-(8*3)) and (9+(8*3)), for a range of -15 to 33 points. He scored 46 points several days ago, well outside of those boundaries.
A quick look at that range would indicate that the distribution is not normal, and is in fact “long-tailed”, meaning that if you were to graph his results, you’d see no results less than 0, with about half his results between 0-7 points, and the other half a wide array of values over 7 points. It means that his bad games (relative to his average) aren’t that bad, but his monster games are a sight to behold.
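To see how badly a normal model fits, here is a rough sketch using Python’s standard `statistics.NormalDist` with the guesstimated Lindor numbers from above (9-point average, 8-point SD). The variable names are my own, and the numbers are only as good as that guesstimate.

```python
from statistics import NormalDist

# Guesstimated Lindor figures from the article: 9-point average, 8-point SD.
lindor = NormalDist(mu=9, sigma=8)

# Share of games a normal model expects to finish below 0 points.
p_negative = lindor.cdf(0)

# Chance a normal model assigns to a game of 46 or more points.
p_46_plus = 1 - lindor.cdf(46)

print(f"P(score < 0) under a normal model:   {p_negative:.1%}")
print(f"P(score >= 46) under a normal model: {p_46_plus:.2e}")
```

Under these assumptions, the normal model puts more than one game in ten below zero and treats a 46-point game as a few-in-a-million event. Neither matches how hitters actually score, which is the long tail in action.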
How to Estimate Floors and Ceilings in MLB
One pro tip I can give you is that if you’re thinking only of average projections as your model in MLB, you might be making a mistake. What we should be concentrating on are median projections: the median is essentially the 50th percentile score, the number a player is equally likely to go over or under. I actually ran some stats on Lindor from last season and got more precise values for him:
AVG DraftKings points 8.2
As you can calculate, his average is actually significantly higher than his median number, meaning that all his poor scores (0-6.5) are compensated for by a couple of absolute monster games that he’ll have a handful of times per season. In fact, his average is almost 26% higher than his median, which is a clear indicator that he (like most hitters) has large upside.
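For readers who want to check the gap between average and median, here is a small sketch. It backs out the median implied by the two figures quoted in this article (the 8.2 average and the roughly 26% gap); the result is an inference from those numbers, not a separately quoted stat.

```python
avg_points = 8.2   # Lindor's average DraftKings points, from the article
pct_gap = 0.26     # "almost 26% higher than his median", from the article

# Median implied by the two figures above (an inference, not a quoted stat).
implied_median = avg_points / (1 + pct_gap)

print(f"Implied median: {implied_median:.1f} DraftKings points")
```

This lands at about 6.5 points, which lines up with the 0-6.5 range of poor scores mentioned above.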
Consequently, calculating accurate ceiling projections is a bit difficult, because if a player hits 3 (or somehow more) home runs in a game, his ceiling values for the season will be skewed by that one game. According to the very simplistic math I described earlier, Lindor’s 99th percentile ceiling is roughly 31 points, which is clearly wrong since he’s already smoked that by scoring 46 points several games into this season, and he only hit two home runs in that game!
Last season, Kris Bryant had one 60 point game and two others above 40, yet according to my data this should be extremely improbable, which tells me that my data is likely undervaluing the upside of hitters.
As a result, if you are basing your projections around ceiling values for hitters, I would recommend boosting them up a notch (even as little as 5-10%), which will likely make your lineups more weighted towards hitters rather than pitchers. For example, one site I’m looking at has Kris Bryant’s 99th percentile ceiling at 35 points tomorrow in a great matchup @MIL vs. Jimmy Nelson, which seems a bit low given that he had three games in 2016 where he absolutely decimated that number. Not only can he hit two or more home runs in any game, he can do so with men on base, a factor that is out of his control.
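As a quick illustration of that boost, here is a minimal sketch applied to the 35-point Bryant ceiling mentioned above. The helper name `boosted_ceiling` is hypothetical, and the 5% and 10% figures are just the two ends of the range I suggested, not a calibrated adjustment.

```python
def boosted_ceiling(ceiling, boost):
    """Scale a projected ceiling up by a fractional boost (0.05 means 5%)."""
    return ceiling * (1 + boost)

# Bryant's 99th percentile ceiling from the site quoted in the article.
site_ceiling = 35

for boost in (0.05, 0.10):
    adjusted = boosted_ceiling(site_ceiling, boost)
    print(f"{boost:.0%} boost: {adjusted:.2f} points")
```

Even at the top of that range the adjusted ceiling stays under 40, still short of the three games Bryant posted last season, so treat the boost as a floor on how much to adjust rather than a fix.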
On the flip side, to show some love for pitchers, this same site has Max Scherzer’s ceiling at 49, but he did put up a 59 last season while tying a major league record. Getting a ceiling game from a 13k Scherzer is always massive, but the tricky part is getting one when he’s not heavily owned!
My general philosophy in the larger GPPs is to target good-to-great hitters more so than pitchers, given that there are a lot more hitters to choose from who can have multi-homer games at low ownership. Furthermore, pitchers have ceiling limits, given that pitch counts keep them from working deep into extra-inning games, a restriction hitters do not have. Hitters in theory can play infinite innings and have infinite plate appearances, thus making their upside a lot higher than what standard data models reflect, in this writer’s opinion.
To summarize, the data suggests that hitters generally have higher upside than what traditional standard deviation models are showing. Keep in mind that all of the top hitters are generally going to be really popular, especially in fantastic matchups where their teams have the larger run lines of the day. This is generally a time where it makes sense either to pay for pitching in GPPs and hope that the expensive hitting chalk busts, or to pivot to some great hitters in mediocre matchups.
Looking at some of the top run lines for the main slate today we see:
This is a spot where in tournaments I’m generally going to try to steer clear of the elite hitters on CLE, TEX, and CHC due to heavy ownership. Instead, I will focus on some of the slightly overpriced, but likely underowned studs such as:
Khris (or Chris) Davis
There are plenty more, but due to the overwhelming value on CLE/TEX/CHC, I expect these types of players to be lower owned, which is a great spot in GPP as they are all hitters with gigantic ceilings. Baseball GPPs aren’t really about getting maximum median value from your lineups so much as having a wide selection of low owned hitters and stacks who are all quite capable of hitting a monster ceiling game.
When Lindor scored 46 points he was 2-3% owned in most of my contests, simply because he was facing Cole Hamels and he wasn’t cheap. These games will be quite rare, but if you can get a few monster games from players who aren’t owned, you should be well on your way to winning a GPP.