Machine Learning-based Detection of Pitching Patterns in Major League Baseball: An Analysis of Pitch Metrics Prior to Ulnar Collateral Ligament Reconstruction

aut.embargo: No
aut.thirdpc.contains: No
dc.contributor.advisor: McGuigan, Mike
dc.contributor.advisor: Whatman, Chris
dc.contributor.author: Ozaki, Ryotaro
dc.date.accessioned: 2026-06-18T21:49:43Z
dc.date.available: 2026-06-18T21:49:43Z
dc.date.issued: 2026
dc.description.abstract: Purpose: Ulnar collateral ligament reconstruction (UCLR) is the most prevalent surgically treated injury among Major League Baseball (MLB) pitchers, yet early detection of pre-surgical biomechanical deterioration remains limited. The purpose of this study was to develop and evaluate an unsupervised machine learning-based anomaly detection framework capable of identifying multivariate pitching metric changes in MLB pitchers in the period preceding UCLR. Methods: Pitch-tracking data from Statcast were collected for 46 MLB pitchers who underwent primary UCLR between 2016 and 2024. Pitcher-specific vanilla autoencoders were trained on game-level aggregated pitching metrics spanning a 400-day baseline window (200–600 days before last appearance) and applied across a 200-day detection window (0–200 days before last appearance). Reconstruction error served as the anomaly detection metric. Six pitch types were analysed: four-seam fastball, sinker, slider, cutter, changeup, and curveball. Five percentile-based reconstruction error thresholds (90th to 99th) were evaluated. As an exploratory validation analysis, a propensity score-matched control group of 45 non-UCLR pitcher pairs was employed to contextualise the specificity of the pre-surgical signal. Results: Mean per-game reconstruction error escalated from 0.877 (151–200 days before last appearance) to 2.326 (0–50 days), representing a 2.7-fold increase with a broadly monotonic trajectory. At the 95th percentile threshold, anomaly rates were 67.2%, 76.4%, 71.0%, and 73.6% across the 151–200, 101–150, 51–100, and 0–50 day bins respectively. Escalation ratios increased monotonically with threshold stringency (p90: 0.98; p99: 1.18), with the most extreme deviations concentrated in the final 50 days. Feature-level analysis identified a two-phase deterioration structure: an early phase characterised by elevated slider movement and cumulative workload errors, followed by a late phase dominated by a 24-fold escalation in changeup usage error and sharp increases in rest interval deviation. In the exploratory matched control comparison, UCLR pitchers showed a statistically significant difference in median reconstruction error relative to matched controls in the proximate pre-surgical window (W = 1245, p = 0.036), with no equivalent directional escalation observed in controls. A Bonferroni-corrected comparison of multivariate and univariate detection identified only marginal gains from multivariate encoding in detection sensitivity, though the autoencoder provided structural interpretability and coherent feature-pattern identification not available through univariate monitoring. Conclusion: Pitcher-specific autoencoder models applied to Statcast pitch-tracking data can identify progressive multivariate deterioration in the period preceding UCLR, with a signal that appears at least partially specific to the pre-surgical period relative to matched controls. These findings suggest that routine monitoring of individualised pitching profiles may provide a clinically actionable detection window prior to UCL failure.
dc.identifier.uri: http://hdl.handle.net/10292/21435
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.title: Machine Learning-based Detection of Pitching Patterns in Major League Baseball: An Analysis of Pitch Metrics Prior to Ulnar Collateral Ligament Reconstruction
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Master of Sport, Exercise and Health
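
Illustrative sketch (not part of the thesis record): the abstract describes flagging games whose reconstruction error exceeds a percentile-based threshold and reporting anomaly rates over 50-day bins of the detection window. The Python below is a minimal sketch of that thresholding step only, under stated assumptions: the threshold is taken from the baseline-window errors, rates are computed per game, and the function names `percentile` and `anomaly_rates` are hypothetical. The autoencoder itself is not shown; the reconstruction errors are assumed to be already computed.

```python
from collections import defaultdict

# Bin edges follow the abstract: days before last appearance.
BINS = [(0, 50), (51, 100), (101, 150), (151, 200)]


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def anomaly_rates(baseline_errors, detection_games, q=95):
    """Return (threshold, {bin: share of games flagged}).

    baseline_errors: per-game reconstruction errors from the baseline window
        (assumption: the threshold is derived from these).
    detection_games: iterable of (days_before_last_appearance, error) pairs.
    """
    threshold = percentile(baseline_errors, q)
    flagged = defaultdict(int)
    total = defaultdict(int)
    for days, err in detection_games:
        for lo, hi in BINS:
            if lo <= days <= hi:
                total[(lo, hi)] += 1
                flagged[(lo, hi)] += err > threshold  # bool counts as 0 or 1
                break
    # Bins with no games are omitted rather than reported as 0.
    rates = {b: flagged[b] / total[b] for b in BINS if total[b]}
    return threshold, rates
```

Calling `anomaly_rates(baseline, games, q=99)` with a pitcher's own baseline errors would apply a stricter threshold to the same games; the numbers reported in the abstract come from the thesis and cannot be reproduced from this sketch.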

Files

Original bundle

Name: OzakiR.pdf
Size: 3.02 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 890 B
Format: Item-specific license agreed upon to submission