Feature Engineering for Sports ML: From Raw Stats to Injury Predictions
Domain expertise encoded as code: 8 feature categories powering 12 ML models, from rolling windows to the cumulative fatigue score that predicts NBA injuries.
8 Feature Categories
1. Categorical Encoding: Position (Guard=0 to Center=4).
2. Rolling Windows: 7/14-day averages for minutes, points, rebounds.
3. Trend Features: Differenced values showing trajectory.
4. Schedule Density: Back-to-back flags, games in 5-day window.
5. Physical Load: speed × distance from tracking data (the key innovation).
6. Efficiency/Usage: Usage per efficiency = usage_rate / true_shooting_pct.
7. Game Context: Point differential, home/away, opponent strength.
8. Interaction Features: age×position, veteran_back_to_back, cumulative fatigue score.
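To make categories 2 to 4 concrete, here is a minimal sketch of how rolling windows, a differenced trend, and schedule-density flags could be derived from a per-player game log. The data, the function names (rolling_mean, build_features), and the output keys are all hypothetical and chosen for illustration; the original pipeline may compute these differently.

```python
from datetime import date, timedelta

# Hypothetical game log for one player: (game_date, minutes_played).
games = [
    (date(2024, 1, 1), 30.0),
    (date(2024, 1, 2), 36.0),
    (date(2024, 1, 5), 28.0),
    (date(2024, 1, 9), 34.0),
]

def rolling_mean(log, as_of, days):
    """Mean minutes over games in the `days`-day window ending at `as_of` (inclusive)."""
    start = as_of - timedelta(days=days - 1)
    window = [m for d, m in log if start <= d <= as_of]
    return sum(window) / len(window) if window else 0.0

def build_features(log):
    """Return one feature dict per game; `log` must be sorted by date."""
    rows = []
    prev_date, prev_avg7 = None, None
    for d, _minutes in log:
        avg7 = rolling_mean(log, d, 7)
        avg14 = rolling_mean(log, d, 14)
        # Days since the previous game (None for the first game in the log).
        rest = (d - prev_date).days if prev_date is not None else None
        # Schedule density: games played in the 5-day window ending today.
        games_5d = sum(1 for g, _ in log if d - timedelta(days=4) <= g <= d)
        rows.append({
            "date": d,
            "minutes_7d_avg": avg7,
            "minutes_14d_avg": avg14,
            # Trend: first difference of the 7-day average between games.
            "minutes_7d_trend": None if prev_avg7 is None else avg7 - prev_avg7,
            "back_to_back": rest == 1,
            "games_in_5d": games_5d,
        })
        prev_date, prev_avg7 = d, avg7
    return rows

rows = build_features(games)
```

Note that the windows in this sketch include the current game. When the target is something that happens during or after that game, the features would normally be shifted by one game so the model only sees past information.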
The Cumulative Fatigue Score
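The key composite feature. It combines five signals: minutes spike (scaled 0-1), back-to-back count, rest days (inverted, so less rest means more fatigue), games density, and shooting decline. A derived interaction flag, high_risk_veteran_workload = (age>32) * (14d_minutes>400) * (fatigue>0.5), multiplies three boolean indicators, so it equals 1 only when all three conditions hold. Adding the score significantly improves injury risk prediction.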
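One way the score and the veteran flag could fit together is sketched below. The text names the five components but not how they are weighted or scaled, so the equal weights and the caps (3 back-to-backs, 3 rest days, 4 games in 5 days, a 0.10 drop in shooting percentage) are assumptions made purely for illustration. The argument name minutes_14d stands in for 14d_minutes, which is not a valid Python identifier.

```python
def clip01(x):
    """Clamp a value into the range [0, 1]."""
    return max(0.0, min(1.0, x))

def cumulative_fatigue(minutes_spike, b2b_count, rest_days, games_in_5d, shooting_decline):
    """Equal-weight mean of five components, each scaled to [0, 1].

    ASSUMPTION: the equal weighting and every cap below are illustrative;
    the source does not specify them.
    """
    components = [
        clip01(minutes_spike),            # described as already 0-1 scaled
        clip01(b2b_count / 3),            # assumed cap: 3 back-to-backs
        1.0 - clip01(rest_days / 3),      # inverted: less rest, more fatigue
        clip01(games_in_5d / 4),          # assumed cap: 4 games in 5 days
        clip01(shooting_decline / 0.10),  # assumed cap: 0.10 drop in shooting pct
    ]
    return sum(components) / len(components)

def high_risk_veteran_workload(age, minutes_14d, fatigue):
    """Product of three boolean indicators, using the thresholds given in the text."""
    return int(age > 32) * int(minutes_14d > 400) * int(fatigue > 0.5)

# Hypothetical player: elevated minutes, two back-to-backs, one rest day.
score = cumulative_fatigue(
    minutes_spike=0.8, b2b_count=2, rest_days=1, games_in_5d=3, shooting_decline=0.05
)
flag = high_risk_veteran_workload(age=34, minutes_14d=420, fatigue=score)
```

Because the flag is a product of indicators, a single failed condition (for example a 30-year-old with the same workload) zeroes it out. That is what makes it an interaction feature rather than an additive one.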