Sharp Retriever

Transparency

What the model is, what data it uses, and how the record is kept.

Model overview

The model is an XGBoost classifier trained on Retrosheet game logs from 2015–2024. There are separate models for moneyline (winner prediction) and totals (over/under on total runs). Models are retrained when meaningful new signal is added, and training history is kept in version control.

The market line is not an input feature. The model is built on baseball signal only. This is a deliberate design choice: we believe durable performance comes from knowing something the market underweights, not from comparing prices.

Data sources

Game results: Retrosheet game logs (2015–2024)

Pitcher stats: Baseball Savant Statcast (xwOBA, CSW%, swinging-strike %, barrel %, hard-hit %)

Bullpen state: Retrosheet play-by-play (appearances, IP, days of rest over 1/2/3 days)

Park factors: Multi-year run and HR park factors from publicly available data

Umpire: Home-plate umpire run-scoring and strikeout tendencies from historical data

Weather: Temperature, wind speed, wind direction, precipitation, roof type

Handedness: Starter hand from MLB Stats API; team batting splits vs LHP/RHP from Statcast

What we do NOT use

We do not use betting market prices, opening lines, line movement, public betting percentages, sharp/square money data, or any information derived from sportsbook markets as a model input. The model's output is independent of the market.
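
One way an exclusion like this could be enforced is a guard that rejects market-derived feature names before training or scoring. The sketch below is illustrative only: the function names, the token list, and the example feature names are assumptions, not the actual pipeline.

```python
# Hypothetical guard: the token list and all names here are illustrative
# assumptions, not the real feature schema.
MARKET_TOKENS = (
    "odds",
    "opening_line",
    "line_move",
    "public_bet",
    "sharp_money",
    "square_money",
    "sportsbook",
)


def find_market_features(feature_names):
    """Return the feature names that contain a market-related token."""
    return [
        name
        for name in feature_names
        if any(token in name.lower() for token in MARKET_TOKENS)
    ]


def assert_no_market_features(feature_names):
    """Raise ValueError if any feature name looks market-derived."""
    offenders = find_market_features(feature_names)
    if offenders:
        raise ValueError(f"market-derived features not allowed: {offenders}")


# Example: a baseball-only feature list passes silently.
assert_no_market_features(["starter_xwoba", "park_factor_runs", "wind_speed"])
```

A name-based check like this only catches columns that are labeled honestly; it is a safety net, not proof of independence from the market.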

How picks are published

The pipeline runs six times per day between 11:30 AM and 10:00 PM UTC. Each run assembles the latest game-day features and scores today's games. Model picks are locked and published. Once published, a pick is never altered or removed — the side, the market, and the game date are immutable. Market odds shown on the card are the open-market price at publication time, not a projected line.
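
The immutability rule can be sketched as a frozen record plus an append-only ledger. This is a minimal illustration under stated assumptions: the class names, field names, and the one-pick-per-game-and-market key are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class PublishedPick:
    """A pick as published. frozen=True blocks attribute reassignment."""
    game_id: str               # hypothetical identifier format
    game_date: date
    market: str                # "moneyline" or "totals"
    side: str                  # e.g. "home", "away", "over", "under"
    odds_at_publication: int   # price shown on the card, captured once
    published_at: datetime


class PickLedger:
    """Append-only store: a pick can be added once, never changed or removed."""

    def __init__(self):
        self._picks = {}

    def publish(self, pick):
        # Assumption: at most one pick per (game, market); later pipeline
        # runs must not overwrite an earlier publication.
        key = (pick.game_id, pick.market)
        if key in self._picks:
            raise ValueError(f"pick already published for {key}")
        self._picks[key] = pick
        return pick

    def all(self):
        # Return a tuple so callers cannot mutate the ledger's contents.
        return tuple(self._picks.values())


ledger = PickLedger()
pick = ledger.publish(
    PublishedPick(
        game_id="example-game-1",
        game_date=date(2025, 5, 1),
        market="moneyline",
        side="home",
        odds_at_publication=-120,
        published_at=datetime(2025, 5, 1, 11, 30, tzinfo=timezone.utc),
    )
)
```

Attempting `pick.side = "away"` raises `FrozenInstanceError`, and publishing a second pick for the same game and market raises `ValueError`.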

How the record is kept

Results are graded after game completion. Win / Loss / Push is determined by the published side against the final score. No retroactive exclusions, no sample trimming, no “all-time” numbers that omit difficult windows.
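
The grading rule can be sketched as two small functions. This is an illustration, not the production grader: the function names and the side labels are assumptions, and `total_line` is assumed to be the totals line recorded with the pick at publication.

```python
def grade_moneyline(side, home_runs, away_runs):
    """Grade a moneyline pick. side is 'home' or 'away'."""
    if side not in ("home", "away"):
        raise ValueError(f"unknown moneyline side: {side!r}")
    if home_runs == away_runs:
        # Defensive assumption: completed MLB games are not normally tied.
        return "push"
    home_won = home_runs > away_runs
    return "win" if (side == "home") == home_won else "loss"


def grade_total(side, total_line, home_runs, away_runs):
    """Grade a totals pick. side is 'over' or 'under'."""
    if side not in ("over", "under"):
        raise ValueError(f"unknown totals side: {side!r}")
    total_runs = home_runs + away_runs
    if total_runs == total_line:
        return "push"
    went_over = total_runs > total_line
    return "win" if (side == "over") == went_over else "loss"
```

For a 5-4 final, `grade_moneyline("home", 5, 4)` returns `"win"`, `grade_total("over", 8.5, 5, 4)` returns `"win"`, and `grade_total("under", 9, 5, 4)` returns `"push"`.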

What we will not publish

No win-rate or ROI claims until we have a meaningful forward sample (minimum 200 graded picks from the current model version). No backdated backtests presented as live results. No simulated performance. If the model underperforms, the record will show it — positive or negative, the forward ledger is public.
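
The 200-pick threshold can be expressed as a gate that withholds the win rate until the sample is large enough. This is a sketch under assumptions: the function name is hypothetical, pushes are assumed to count toward the graded-pick minimum, and pushes are assumed to be left out of the win-rate denominator.

```python
MIN_GRADED_PICKS = 200  # threshold stated in the text above


def publishable_win_rate(results, min_sample=MIN_GRADED_PICKS):
    """Return the win rate, or None if the sample is too small to publish.

    results: iterable of 'win' / 'loss' / 'push' strings for graded picks
    from the current model version only.
    """
    results = list(results)
    if len(results) < min_sample:
        return None  # not enough graded picks yet: publish nothing
    # Assumption: pushes are excluded from the win-rate denominator.
    decided = [r for r in results if r != "push"]
    if not decided:
        return None
    return sum(r == "win" for r in decided) / len(decided)
```

With 199 graded picks the function returns `None` regardless of the record. The gate depends only on sample size, never on whether the results are favorable.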

