Pitcher Prognosis: Using Machine Learning to Predict Baseball Injuries

In the multibillion-dollar world of sports entertainment, we often think of injuries as being chance events.

Although professional players are placed under a high level of medical scrutiny, I reasoned that the information encoded in performance statistics might add a useful leading indicator of injury risk to the medical toolbox.

Then, I would aggregate the player’s statistics from preceding games and use those as features. The idea is that a coach, a medical support staff member, or even a player could enter their accumulated statistics on a given day (the “intervention point”) into my model and see how likely it is that playing on that day would precede an injury.

In my case, the well-structured nature of baseball and prior familiarity with the dataset had assured me that my data were relatively clean, so the most urgent question confronting me was whether game statistics in fact contained any predictive information at all in relation to injuries.

Although in many careers the early forties are a highly productive time, the extreme physical demands of baseball mean that few players can continue to perform at the professional level that long.

Feature Engineering

To hone the predictive power of my features, first I generated new features by applying different aggregation windows: for each player, I created separate features for each performance metric for one game preceding the intervention point, for the average of seven games preceding the intervention point, and for the player’s entire career.
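The three aggregation windows can be sketched in a few lines of plain Python. This is only an illustration: the function name `window_features`, the `pitches` metric, and the sample values are hypothetical, and using a mean for the career window is an assumption, since the article does not say how the career aggregate was computed.

```python
from statistics import mean

def window_features(games, metric, short_window=7):
    """Build last-game, short-window, and career aggregates for one metric.

    `games` is a chronological list of per-game dicts, all of which
    precede the intervention point.
    """
    values = [g[metric] for g in games]
    if not values:
        raise ValueError("need at least one prior game")
    return {
        f"{metric}_last_1": values[-1],                                 # one game back
        f"{metric}_mean_{short_window}": mean(values[-short_window:]),  # recent window
        f"{metric}_career_mean": mean(values),                          # assumed: career mean
    }

# Hypothetical pitch counts for eight prior games (illustrative values only).
prior_games = [{"pitches": p} for p in [90, 100, 95, 105, 110, 85, 100, 115]]
features = window_features(prior_games, "pitches")
```

Here `features` holds one entry per window, so each performance metric contributes three columns to the model input.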

For a relatively casual baseball fan like myself, it is difficult to draw consistent, distinct categories of pitching style from expert commentary or from the statistical data that I had already collected.

I projected the term frequency vectors I had created, which had a dimensionality on the order of the total number of terms present, onto a two-dimensional space using multidimensional scaling, which is meant to preserve the approximate relation of each of the pitcher descriptions to all of the others.

In the way that I set up the term frequency vectors, a single word can occur more than once because I accounted for the frequency of bigrams, or pairs of words occurring together, and trigrams as well as single words.
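To make the counting concrete, here is a minimal sketch of a term frequency vector that includes single words, bigrams, and trigrams. The tokenizer (lowercasing and splitting on whitespace), the function name `ngram_counts`, and the sample description are assumptions for illustration; the article does not show its own implementation, and the multidimensional scaling step is not reproduced here.

```python
from collections import Counter

def ngram_counts(text, max_n=3):
    """Count every run of 1..max_n consecutive words in `text`."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Hypothetical scouting-style description (not taken from the article's data).
description = "hard sinking fastball and a hard slider"
tf = ngram_counts(description)
```

In this example the word "hard" is counted twice on its own and also contributes to the bigrams "hard sinking" and "hard slider", which is how a single word ends up represented more than once.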

I optimized the random forest hyperparameters to maximize the area under an ROC curve, which has two characteristics that make it better than accuracy score for this sort of situation: 1) the value of this metric is still meaningful with greatly imbalanced datasets - and there are many more games preceding noninjuries in baseball than games preceding injuries - and 2) how a risk-predicting application may be used is not necessarily known before deployment: avoiding false positives may matter more than avoiding false negatives, or vice versa.
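The first point can be illustrated with a small sketch. It computes the area under the ROC curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and compares it with accuracy for a "model" that always predicts no injury. The labels and scores are hypothetical, and the function names are illustrative only.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical imbalanced data: 2 injuries among 20 games.
labels = [1, 1] + [0] * 18
constant_scores = [0.0] * 20  # always predicts "no injury"

acc = accuracy(labels, constant_scores)  # looks good despite being useless
auc = roc_auc(labels, constant_scores)   # no better than chance
```

The constant predictor reaches 90 percent accuracy simply because injuries are rare, while its AUC of 0.5 shows it carries no ranking information.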

The hyperparameters I focused on were the number of features each decision tree could choose from at each step in its creation and the maximum depth of those trees, or the total number of features that could be used in the classification of a single point.

Although I saw little increase in performance beyond 300 trees, I settled on 1,000 because compute time was not limiting and having redundancy within the forest would not be expected to harm model performance.

The performance metric I chose to maximize with my grid search was area under the ROC curve, which has two characteristics that make it better than the standard accuracy score for this sort of situation: 1) the value of this metric is still meaningful with greatly imbalanced datasets - and there are many more games preceding noninjuries in baseball than games preceding injuries - and 2) how a risk-predicting application may be used is not necessarily known before deployment: avoiding false positives may matter more than avoiding false negatives, or vice versa.
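A grid search of this kind might look like the sketch below. It assumes scikit-learn, which the article does not name, and uses synthetic imbalanced data from `make_classification` rather than baseball statistics. The grid values are arbitrary, and the forest is kept at 50 trees so the example runs quickly, whereas the article settled on 1,000.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (roughly 10% positives); not baseball data.
X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# The two hyperparameters discussed above; the candidate values are assumptions.
param_grid = {
    "max_features": [2, 4, 8],  # features considered at each split
    "max_depth": [3, 6, None],  # cap on tree depth (None = unlimited)
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="roc_auc",  # maximize area under the ROC curve
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_
```

`best_params` holds the combination with the highest cross-validated AUC, and `best_auc` is that mean score across the three folds.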

The “injury score” output by the random forest model is notionally the probability that a particular set of feature values indicates that an injury will occur, or more precisely the average of this probability across all of the decision trees in the forest, although depending on how one deals with the class imbalance in the injury prediction problem, this interpretation is not necessarily correct.

To avoid forcing baseball players and coaches to deal with the intricacies of random forest output, the web application I designed compares the injury score for a given player’s input to all of the scores in the database used for the modeling and outputs the player’s injury score percentile, which should be readily understandable to many people.
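A percentile of this kind can be computed as the share of reference scores at or below the player's score. The sketch below assumes that definition, which the article does not spell out; the function name and the reference scores are hypothetical.

```python
from bisect import bisect_right

def score_percentile(score, reference_scores):
    """Percentage of reference scores that are at or below `score`."""
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference_scores must not be empty")
    return 100.0 * bisect_right(ordered, score) / len(ordered)

# Hypothetical injury scores standing in for the modeling database.
reference = [0.02, 0.05, 0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
pct = score_percentile(0.35, reference)
```

With these values, six of the ten reference scores are at or below 0.35, so the player lands at the 60th percentile.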

Some users may distrust what seems like a data science black box, and to provide more persuasive analysis or explanation, I also use nearest neighbors analysis to identify games similar to the user’s entered values.
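As a rough sketch of the idea, the function below ranks stored games by Euclidean distance to the user's input and returns the closest ones. The distance measure, the two-feature layout, and the sample games are all assumptions; in practice the features would likely need to be scaled to comparable ranges first, a step omitted here.

```python
import math

def nearest_games(query, games, k=3):
    """Return the k stored games closest to `query` by Euclidean distance."""
    return sorted(games, key=lambda g: math.dist(query, g["features"]))[:k]

# Hypothetical games described by (pitches thrown, walks allowed).
games = [
    {"id": "A", "features": (100, 3)},
    {"id": "B", "features": (60, 1)},
    {"id": "C", "features": (98, 4)},
    {"id": "D", "features": (120, 6)},
]
similar = nearest_games((99, 3), games, k=2)
similar_ids = [g["id"] for g in similar]
```

Showing the user a handful of concrete past games like these, along with whether an injury followed, gives them something tangible to compare against the model's score.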

Incidence of Injuries in High School Softball and Baseball Players

During the 12-week season, the overall injury rates were low for interscholastic softball (5.6 per 1000 athlete-exposures [AEs]) and baseball (4.0/1000 AEs) players.
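For readers unfamiliar with the unit, a rate per 1000 AEs is the number of injuries divided by the number of athlete-exposures, scaled by 1000. The short sketch below shows the arithmetic with hypothetical counts; the underlying counts for this study are not given in the excerpt.

```python
def injury_rate_per_1000(injuries, athlete_exposures):
    """Injuries per 1000 athlete-exposures (one athlete in one game or practice)."""
    if athlete_exposures <= 0:
        raise ValueError("athlete_exposures must be positive")
    return 1000 * injuries / athlete_exposures

# Hypothetical counts, chosen only to illustrate the calculation.
rate = injury_rate_per_1000(injuries=9, athlete_exposures=1500)
```

Nine injuries over 1500 exposures works out to 6.0 injuries per 1000 AEs.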

These rates are higher than those indicated for high school softball and baseball players in recent reports.2,3 When we compared the injury rates between these sports, our results were in agreement with those of several authors3,6,13,15 but disagreed with those observed by Rechel et al4 in a larger, more geographically diverse study of softball and baseball players.

The distinction between the initial injury and subsequent injuries has been reported in only a few previous high school studies reporting injuries.9,13 We described our findings in this manner because the occurrence of an injury may be a risk factor for subsequent injury at the same site.16,17 We found that the initial injury rate was higher than the subsequent injury rate for both softball and baseball players.

Although we are not aware of any authors who have examined the effect of subsequent injury in these 2 sports with respect to AEs, our finding is in direct contrast to findings in high school cross-country runners.9,11 For comparative purposes, we calculated the initial and subsequent injury risks per 100 athletes.

This risk was not statistically significant, probably because of the small sample sizes, yet this finding suggests that softball players may incur additional injuries and should be monitored closely after their initial injury for the remainder of the season.

How The Dodgers Are Using Baseball’s New DL Rules To Get An Edge

In a season when the Los Angeles Dodgers are dominating everything in sight, they also lead the majors in a less praiseworthy category: trips to the disabled list.

And the Dodgers are certainly leading the way in this practice, which became much easier to pull off after a rule change this season shortened the length of a short-term DL stint from 15 days to 10.

So teams have come up with all sorts of ways to overcome roster-size limitations, ranging from sending an endless churn of relievers back and forth between triple-A and the majors to creating potential dual-role position player-pitcher hybrids.

Here’s a chart showing MLB teams’ use of the short-term disabled list by season since 2009 (for comparison’s sake, I included uses for the short-term DL’s longer, 60-day brother).

Since Andrew Friedman left the Tampa Bay Rays to become president of baseball operations for the Dodgers after the 2014 season, the team has led the league in short-term DL stints every year (only once during those three seasons did the Dodgers have the most long-term DL stays —

Increases in injuries described with the words “fatigue,” “tightness” or “strain” have together accounted for almost 50 percent of the total jump in short-term disabled list trips since 2014.

Unless baseball became a lot more tiring and stressful in the past three years, it seems as though teams may be exaggerating small issues in an effort to free up roster spots.

Extended rest (five or more days) seems to reduce the probability of a serious injury by 20 percent, so a smart team might try to frequently rest fragile starters to minimize the risk that they will become severely hurt.

And the Dodgers seem to have gotten their money’s worth: They racked up more disabled list trips than any other team in the league in 2016, even if you focus only on the 60-day list (for which there is no tactical value to overuse).

But despite all that missed time, the Dodgers’ rotation has also been very successful, earning the second-most wins above replacement in baseball since 2014. As if harkening back to his career with the low-budget Rays, Friedman managed to put together one of the league’s best starting units using cheap talent and a clever strategic advantage.

STL@ATL: Winkler exits the game with an arm injury

Daniel Winkler grabs his arm after a pitch and exits the game with an injury in the top of the 7th inning.

ALCS Gm3: Bauer leaves game with finger wound in 1st

Trevor Bauer has to leave the game in the bottom of the 1st inning when his previous finger wound begins to bleed while pitching.

BOS@TOR: Umpire DiMuro leaves game with injury

Devon Travis fouls a ball straight back into the mask of home plate umpire Mike DiMuro, who visits with the trainer and leaves the game.

8 Athletes Who Died On Camera

1. Patrick Ekeng (26 March 1990 – 6 May 2016) Seven minutes after his entrance, with his team leading 3–2, he collapsed. He was transported and resuscitated ...

COL@LAD: Pitcher Gurka plays right field in the 16th

9/15/15: With the Rockies out of position players in the 16th inning, pitcher Jason Gurka plays right field for an injured Carlos Gonzalez.

LAD@MIL: Pederson exits after crashing into the wall

Joc Pederson makes an outstanding catch on the run in center, but immediately crashes into the wall and has to leave the game.

KC@BAL: Tempers flare for Yordano Ventura and Manny Machado

After taking exception to Yordano Ventura's inside pitches in the 2nd, Manny Machado is hit by a pitch in the 5th, leading to an altercation.

Saunders breaks arm while throwing a pitch

5/26/99: A bone in Tony Saunders' arm breaks as he throws a pitch, resulting in him being carted off the field.

BOS@OAK: Norris gets hit on backswing, leaves game

6/22/14: Derek Norris takes a backswing to the head and leaves the game in the 10th.

MIA@ARI: Peralta beaned, retaliation spurs ejection

7/22/15: A D-backs coach is tossed after David Peralta is beaned in the 6th, and Dominic Leone gets tossed for retaliating in the 7th.