Not at this time. I've spent so many hours investigating this in my free time. ROI is our north star, with log loss and accuracy as secondary metrics. This isn't a perfect test because betting markets shift over time. For example, 2016 saw Vegas favorites win only 61% of the time, while 2024 saw favorites win 70% of the time, and the UFC-assigned red corner is usually the betting odds favorite. So this isn't the end-all, be-all conclusion, but it is absolutely the point-in-time conclusion.
The long and short of a massive amount of trial and error is this backtested ROI, measured on the most recent year of fights the model has never seen:
ROI Comparison: Balanced vs Unbalanced Training
ROI of balanced 50/50 fighter1 win/lose:

ROI of unbalanced 59/41 fighter1 (UFC assigned red corner) win/lose:

Based on how I do feature engineering and model tuning, balancing the fighters' win rate before training the model is terrible. With certain combinations of feature and model tuning I can get the balanced model to perform closer to the unbalanced one, but on average it is basically always lower ROI. I have interrogated my code a thousand times with Claude 4 Opus and Gemini 2.5 Pro to try to suss out any logical errors and I cannot find any.
Why Does Balanced Training Destroy Performance?
*Begin speculation*
Distribution shift between training and inference creates systematic probability miscalibration. When you balance the dataset to 50/50, you're teaching the model that P(fighter1_wins) = 0.5 across all feature combinations. But in reality, fighter1 (red corner) wins ~60% of the time because the UFC systematically assigns the red corner to champions in title fights and generally more experienced/favored fighters in regular bouts.
The calibration mismatch:
- Training: Model learns P(fighter1_wins | features) where the base rate is artificially 0.5
- Inference: Model predicts on data where the true base rate is 0.6
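To make the mismatch concrete, here's a minimal sketch of the standard prior-shift correction, which maps a probability learned under an artificial 50/50 base rate back to the real ~60/40 base rate (the numbers are illustrative, not outputs from my pipeline):

```python
def correct_for_base_rate(p_balanced: float, train_prior: float = 0.5, true_prior: float = 0.6) -> float:
    """Re-weight a probability learned under an artificial class prior back to the real-world prior."""
    # Scale each class by the ratio of its true prior to its training prior, then renormalize.
    num = p_balanced * (true_prior / train_prior)
    den = num + (1 - p_balanced) * ((1 - true_prior) / (1 - train_prior))
    return num / den

# A balanced model outputs 0.60 for the red corner; under the real ~60/40
# corner base rate, that same evidence corresponds to roughly 0.69.
print(correct_for_base_rate(0.60))  # ~0.692
```

In principle a correction like this can partially undo the damage at inference time, but it can't recover the feature interactions the model never learned under the true base rate, which is part of why I simply train on the real distribution.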
This specifically destroys betting ROI because:
- Systematic underestimation of favorites: When a red corner fighter should win with 70% probability, your balanced model might predict 60%, causing you to miss profitable favorite bets
- Systematic overestimation of underdogs: When a blue corner fighter should win with 30% probability, your model might predict 40%, leading to negative EV underdog bets
- Market inefficiency amplification: The higher ROI without balancing implies that the model is learning this pattern of corner bias more efficiently than the market is.
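A quick illustrative EV calculation (made-up prices, not my backtest) shows how that miscalibration flips a betting decision:

```python
def ev_per_unit(p_win: float, decimal_odds: float) -> float:
    """Expected profit per 1 unit staked: win (odds - 1), or lose the stake."""
    return p_win * (decimal_odds - 1) - (1 - p_win)

decimal_odds = 1.55                 # a typical red-corner favorite price
true_p, balanced_p = 0.70, 0.60     # real win probability vs. the balanced model's estimate

print(ev_per_unit(true_p, decimal_odds))      # +0.085 -> a profitable favorite bet
print(ev_per_unit(balanced_p, decimal_odds))  # -0.070 -> the miscalibrated model passes on it
```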
The model's probability outputs are fundamentally miscalibrated for the real world's corner assignment bias. You're essentially training a model for a balanced fantasy UFC and then applying it to the systematically biased real UFC, where corner assignments carry predictive information about fight outcomes.
Bottom line: The balanced training throws away valuable signal (red corner = usually stronger fighter) and teaches the model incorrect base rates, leading to systematically poor probability estimates that destroy betting performance.
Doesn't Calibrating the Winrate Just Create a Proxy for the Odds?
No, and here's why that concern misses the point:
The model isn't learning "red corner = bet favorite." It's learning complex feature interactions from granular performance statistics that happen to correlate with corner assignments. The red corner correlation exists because the UFC assigns corners based on ranking, championship status, and experience - the same underlying factors that drive fight outcomes.
Key distinctions:
- Betting odds reflect public perception, line movement, and bookmaker risk management
- The model analyzes actual performance metrics: strike accuracy trends, takedown defense patterns, cardio indicators, opponent-adjusted statistics, etc.
- The value comes from identifying divergences between the model's estimates and the market, using as much statistical information as possible. The corner bias is simply real-world information that the model can learn to incorporate better than the average bettor.
The unbalanced training preserves the signal that corner assignments carry meaningful information about fighter quality, information that's already baked into the real-world problem you're trying to solve. Throwing away that signal artificially handicaps the model's ability to make properly calibrated predictions.
What About Including the Odds?
This is a hotly debated topic in the sports betting community. Bill Benter, one of the fathers of algorithmic betting, argued that the odds should be included because they encode so much information. Having tested this hypothesis for many thousands of hours, my conclusion is that if you don't have extremely high quality engineered features, then yes, you should include the odds. But at a certain point, the features you engineer will encode more information in combination than the odds do. At that point, including the odds simply lowers the ROI.
Mathematical Mechanism
When you include odds as a feature, you're introducing a variable that represents the market's aggregated probability estimate: P_market(fighter1_wins). This creates several mathematical problems:
1. Feature Dominance and Prediction Convergence
Tree-based models will heavily weight the odds feature because it exhibits high mutual information with the target across the entire dataset. The model's predicted probabilities become:
P_model(fighter1_wins) ≈ α·P_market + (1-α)·P_stats where α > 0.5
This forces convergence toward market consensus.
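A toy sketch of that convergence effect (α is illustrative, not something I measured):

```python
def blended_prediction(p_market: float, p_stats: float, alpha: float = 0.7) -> float:
    """Rough behavior of a model whose odds feature dominates: a convex blend of market and stats."""
    return alpha * p_market + (1 - alpha) * p_stats

# The market prices the favorite at 70%, but the performance stats only support 55%.
p_market, p_stats = 0.70, 0.55
print(blended_prediction(p_market, p_stats))  # 0.655 -> dragged back toward the market consensus
```

The divergence between stats and market is exactly where the +EV lives, and the blend compresses it.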
2. Outcome Distribution Skew
Including odds biases the model toward predicting high-frequency, low-profit outcomes (favorites). Your engineered statistics without odds bias toward identifying low-frequency, high-profit outcomes (underdog value).
3. +EV Prediction Accuracy Inversion
The critical insight: ROI optimization requires accuracy specifically on profitable bets, not overall accuracy.
- With odds included: Model achieves ~74% accuracy but concentrates correct predictions on favorites (odds = 1.2-1.8, profit margin = 20-80%)
- Without odds included: Model achieves ~71% accuracy but concentrates correct predictions on underdogs (odds = 2.5-4.0, profit margin = 150-300%)
ROI Mathematics
Consider two hypothetical models:
- Model A: 75% accuracy, predicts favorites 85% of the time → Expected ROI ≈ -2% (high accuracy, low margins, vig erosion)
- Model B: 50% accuracy, predicts underdogs 60% of the time → Expected ROI ≈ +15% (lower accuracy, far higher margins)
The fundamental equation:
ROI = Σ(accuracy_i × frequency_i × profit_margin_i)
My empirical results demonstrate that excluding odds increases accuracy_underdog dramatically while slightly decreasing accuracy_overall. Since profit_margin_underdog >> profit_margin_favorite, ROI optimization occurs through maximizing performance on the subset of predictions with the highest profit potential.
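A back-of-the-envelope sketch of why accuracy on the high-margin subset dominates ROI (all inputs are illustrative; unlike the shorthand formula above, this version also subtracts the lost stake on misses):

```python
def segment_roi(hit_rate: float, decimal_odds: float) -> float:
    """Expected profit per unit staked within one segment of bets."""
    return hit_rate * (decimal_odds - 1) - (1 - hit_rate)

def breakeven_hit_rate(decimal_odds: float) -> float:
    """Hit rate needed for a segment to return exactly zero."""
    return 1 / decimal_odds

# Favorites around 1.50: you must win 66.7% of them just to break even.
print(breakeven_hit_rate(1.50))   # 0.667
print(segment_roi(0.70, 1.50))    # +0.05 -> thin margin even at 70% accuracy

# Underdogs around 3.00: 33.3% is break-even.
print(breakeven_hit_rate(3.00))   # 0.333
print(segment_roi(0.45, 3.00))    # +0.35 -> 45% accuracy is hugely profitable
```

That asymmetry is how a 50%-accurate underdog-leaning model can beat a 75%-accurate favorite-leaning one.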
Information Compression Loss
Odds represent market consensus based on public information flow. Your engineered features capture orthogonal signals that specifically identify cases where statistical analysis diverges from public perception—exactly the scenarios that generate +EV opportunities. Including odds suppresses these divergent signals in favor of consensus alignment, destroying the edge that creates profitable betting opportunities.
The Vanity Metrics Problem in Machine Learning
You will see many novice machine learning engineers practice their skills on sports prediction. In most cases they calibrate their models against vanity metrics like accuracy. They see an evaluation like 85% accuracy, have no motivation to figure out why their model will fail in the real world, and then make a YouTube or Medium post about it.
This represents the fundamental disconnect between academic machine learning and profitable real-world application. The sports prediction space is littered with impressive-sounding accuracy claims that evaporate when subjected to actual betting markets. These engineers optimize for metrics that sound impressive in blog posts rather than metrics that generate alpha.
The vanity metrics obsession creates a particularly insidious blind spot: data leakage and overfitting become invisible when you're chasing high accuracy numbers. Consider a classic example from a YouTube tutorial (https://www.youtube.com/watch?v=LkJpNLIaeVk) where a model achieved impressive accuracy predicting UFC fights using Elo ratings... except it used post-fight Elo ratings for training, meaning the model literally saw fight outcomes during training. This is textbook data leakage: using future information to predict past events.
Data Leakage Examples in Sports Prediction:
- Training on post-game statistics to predict game outcomes
- Including betting line movements that occurred after the event
- Using season-end rankings to predict mid-season games
- Including opponent-adjusted metrics calculated after the fight
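A minimal sketch of the guard rails I mean (the column names here are hypothetical): only use features computed strictly before the fight, and split chronologically rather than randomly.

```python
import pandas as pd

# Hypothetical fight history, one row per fight.
fights = pd.read_csv("fights.csv", parse_dates=["date"]).sort_values("date")

# Leakage-safe Elo: use each fighter's rating *entering* the fight,
# i.e. shift the post-fight rating back by one bout per fighter.
fights["f1_elo_pre"] = fights.groupby("fighter1")["f1_elo_post"].shift(1)
fights["f2_elo_pre"] = fights.groupby("fighter2")["f2_elo_post"].shift(1)

# Chronological split: train on the past, evaluate on fights the model never saw.
cutoff = fights["date"].max() - pd.DateOffset(years=1)
train, test = fights[fights["date"] <= cutoff], fights[fights["date"] > cutoff]
```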
The overfitting trap compounds this problem. Hyperparameters are the configuration settings that control how a model learns: things like learning rate, tree depth, regularization strength. I can easily tune these hyperparameters to achieve 78% accuracy on my validation set by optimizing for that specific data slice. But this creates a model that memorizes the validation set's quirks rather than learning generalizable patterns.
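The antidote is to tune against several chronological folds instead of one static slice. Here's a rough sketch using scikit-learn's TimeSeriesSplit; the model, features, and numbers are placeholders, not my actual setup:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# Placeholder data: in reality, pre-fight feature rows sorted by fight date.
rng = np.random.default_rng(0)
X, y = rng.random((500, 10)), rng.integers(0, 2, 500)

tscv = TimeSeriesSplit(n_splits=5)   # every fold trains on the past and validates on the future
scores = []
for train_idx, val_idx in tscv.split(X):
    model = GradientBoostingClassifier(max_depth=3, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    scores.append(log_loss(y[val_idx], model.predict_proba(X[val_idx])[:, 1]))

# Judge a hyperparameter set by its average across folds, not one lucky slice.
print(np.mean(scores))
```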
The feedback loop is broken: High accuracy on test data → immediate gratification → publish results → never discover real-world failure. There's no motivation to dig deeper because the vanity metric has been satisfied. The engineer never learns that their 85% accuracy model would lose money consistently because they never actually test it against betting markets.
Real-World Adversarial Markets
Real-world sports betting is an adversarial market where you're competing against:
- Professional odds compilers with decades of experience
- Sophisticated betting syndicates with proprietary data
- Market makers who adjust lines in real-time based on money flow
- The inherent vig that makes break-even betting a losing proposition
Simply achieving high accuracy on historical data means nothing if your predictions can't consistently identify mispriced markets. The difference between academic exercise and profitable application is the difference between predicting outcomes and finding edges.
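To make the vig point concrete, here's a quick sketch of the margin baked into a typical two-way line (illustrative prices):

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to the bookmaker's implied probability."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

# A standard -110 / -110 line on a pick'em fight.
p1, p2 = implied_prob(-110), implied_prob(-110)
print(p1)        # 0.524 -> you must win 52.4% of coin-flip bets just to break even
print(p1 + p2)   # 1.048 -> the extra ~4.8% is the bookmaker's margin (the vig)
```

Any edge the model finds has to clear that margin before it earns a cent.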
Lessons from Four Years of Iteration
I'm four years into this space and I've made all the same mistakes. I've built models that achieved 78% accuracy and lost money consistently. I've spent months optimizing for log loss improvements that translated to worse ROI. I've fallen into every trap outlined above because the feedback loop between model performance and real-world profitability is opaque until you actually start tracking betting results over extended periods.
This is a far more complex problem than I initially thought four years ago, but I believe what's outlined above is one of the reasons I've been seeing almost 20% ROI over the past few years in my free, public predictions. The key insights around dataset balancing, odds inclusion, and ROI-focused optimization came from years of iterative failure and debugging, not from following standard ML tutorials.
The engineering rigor required to build profitable models demands treating accuracy as a vanity metric and ROI as the only metric that matters, with the knowledge that betting markets shift over time and you must constantly retrain the model to match the current zeitgeist. Hence why I'm on version 6.3 right now: I've redone 100% of the code six times. If you want to avoid the same mistakes I've made, feel free to reach out. I'm an open book on exactly how I built this model, no secrets. And, as usual, shoutout to Chris from Wolftickets.ai for being one of the very, very few people in this ML-for-sports-prediction field who actually shares his technical knowledge; he saved me endless hours of wasted time. I like paying that forward to others who are interested in this space.