Machine Learning-based Detection of Pitching Patterns in Major League Baseball: An Analysis of Pitch Metrics Prior to Ulnar Collateral Ligament Reconstruction

Authors

Ozaki, Ryotaro

Supervisor

McGuigan, Mike
Whatman, Chris

Item type

Thesis

Publisher

Auckland University of Technology

Abstract

Purpose: Ulnar collateral ligament reconstruction (UCLR) is the most prevalent surgically treated injury among Major League Baseball (MLB) pitchers, yet early detection of pre-surgical biomechanical deterioration remains limited. The purpose of this study was to develop and evaluate an unsupervised machine learning-based anomaly detection framework capable of identifying multivariate pitching metric changes in MLB pitchers in the period preceding UCLR.

Methods: Pitch-tracking data from Statcast were collected for 46 MLB pitchers who underwent primary UCLR between 2016 and 2024. Pitcher-specific vanilla autoencoders were trained on game-level aggregated pitching metrics spanning a 400-day baseline window (200–600 days before last appearance) and applied across a 200-day detection window (0–200 days before last appearance). Reconstruction error served as the anomaly detection metric. Six pitch types were analysed: four-seam fastball, sinker, slider, cutter, changeup, and curveball. Five percentile-based reconstruction error thresholds (90th to 99th) were evaluated. As an exploratory validation analysis, a propensity score-matched control group of 45 non-UCLR pitcher pairs was employed to contextualise the specificity of the pre-surgical signal.

Results: Mean per-game reconstruction error escalated from 0.877 (151–200 days before last appearance) to 2.326 (0–50 days), representing a 2.7-fold increase with a broadly monotonic trajectory. At the 95th percentile threshold, anomaly rates were 67.2%, 76.4%, 71.0%, and 73.6% across the 151–200, 101–150, 51–100, and 0–50 day bins respectively. Escalation ratios increased monotonically with threshold stringency (p90: 0.98; p99: 1.18), with the most extreme deviations concentrated in the final 50 days. Feature-level analysis identified a two-phase deterioration structure: an early phase characterised by elevated slider movement and cumulative workload errors, followed by a late phase dominated by a 24-fold escalation in changeup usage error and sharp increases in rest interval deviation. In the exploratory matched control comparison, UCLR pitchers showed a statistically significant difference in median reconstruction error relative to matched controls in the proximate pre-surgical window (W = 1245, p = 0.036), with no equivalent directional escalation observed in controls. A Bonferroni-corrected comparison of multivariate and univariate detection identified only marginal gains from multivariate encoding in detection sensitivity, though the autoencoder provided structural interpretability and coherent feature-pattern identification not available through univariate monitoring.

Conclusion: Pitcher-specific autoencoder models applied to Statcast pitch-tracking data can identify progressive multivariate deterioration in the period preceding UCLR, with a signal that appears at least partially specific to the pre-surgical period relative to matched controls. These findings suggest that routine monitoring of individualised pitching profiles may provide a clinically actionable detection window prior to UCL failure.
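
The detection logic summarised in the Methods (fit on a baseline window, score the detection window by reconstruction error, flag games above a percentile threshold) can be illustrated with a minimal sketch. It is not the thesis implementation. A PCA-style linear encoder/decoder stands in for the neural autoencoder so that the example needs only NumPy. The six features, the simulated games, and the injected drift are all synthetic. Taking the threshold from the baseline-window error distribution is an assumption, since the abstract does not state where the percentile is computed. All function and variable names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(baseline, n_components):
    """Fit a PCA-style linear encoder/decoder on baseline games.

    Stand-in for a neural autoencoder: rows of the returned components
    span the subspace that games are projected onto and rebuilt from.
    """
    mean = baseline.mean(axis=0)
    std = baseline.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    z = (baseline - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]


def reconstruction_error(games, mean, std, components):
    """Mean squared reconstruction error per game, in z-score units."""
    z = (games - mean) / std
    rebuilt = z @ components.T @ components
    return ((z - rebuilt) ** 2).mean(axis=1)


def anomaly_rate_by_bin(errors, days_before, threshold, bins):
    """Fraction of games in each (low, high) day bin with error above threshold."""
    rates = {}
    for low, high in bins:
        in_bin = (days_before >= low) & (days_before <= high)
        rates[(low, high)] = float((errors[in_bin] > threshold).mean())
    return rates


# Synthetic data only: six hypothetical game-level features driven by two
# latent factors; the last feature is pure noise during the baseline.
rng = np.random.default_rng(0)
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 1, 1, 1, 0]], dtype=float)


def simulate_games(n_games):
    latent = rng.normal(size=(n_games, 2))
    return latent @ loadings + 0.3 * rng.normal(size=(n_games, 6))


baseline = simulate_games(60)          # stands in for the 400-day baseline window
days_before = np.arange(200, -1, -5)   # stands in for the 200-day detection window
detection = simulate_games(days_before.size)
# Inject a drift in the last feature that grows as the last appearance approaches.
detection[:, 5] += 6.0 * (200 - days_before) / 200

mean, std, components = fit_linear_autoencoder(baseline, n_components=2)
baseline_errors = reconstruction_error(baseline, mean, std, components)
detection_errors = reconstruction_error(detection, mean, std, components)

# Assumed rule: pitcher-specific cut-off at the 95th percentile of baseline error.
threshold = np.percentile(baseline_errors, 95)
bins = [(151, 200), (101, 150), (51, 100), (0, 50)]
rates = anomaly_rate_by_bin(detection_errors, days_before, threshold, bins)
for (low, high), rate in rates.items():
    print(f"{low}-{high} days before last appearance: anomaly rate {rate:.2f}")
```

Because the threshold is fitted per pitcher on that pitcher's own baseline, a flagged game means "unusual for this pitcher", not "unusual for the league". Under this assumed rule, about 5% of baseline games exceed a 95th percentile cut-off by construction, which gives a reference level for reading anomaly rates in the detection window.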
