After countless requests from the community, I'm finally pulling back the curtain on how our UFC fight prediction algorithm works. This isn't just another black-box system—it's a carefully engineered, multi-level approach to fight analysis that's consistently outperforming Vegas odds.
Why Share Our Secret Sauce?
Many have asked why I'd reveal the inner workings of a profitable system. The truth is simple: I don't believe in keeping this knowledge secret. Much of this algorithm was developed with the assistance of AI tools that generated about 90% of my code. As someone who's been an open source developer for 15 years, I firmly believe that open knowledge sharing accelerates human progress. I don't have a moat to protect—and frankly, the more people who understand these concepts, the faster our collective knowledge will advance. Sports prediction is hard enough that even with this blueprint, you'll still need to invest thousands of hours to truly master it.
The Four Levels of Fight Prediction
Level 1: Foundation Stats + Bayesian Gamma-Poisson Smoothing
We start with raw fight statistics—strikes thrown, takedowns landed, control time, and dozens more. But raw numbers can be misleading, especially for rare events like submissions. This is where Bayesian Gamma-Poisson smoothing comes in.
Imagine a fighter who's attempted only one submission in their career and landed it. Is their submission accuracy really 100%? Probably not. Gamma-Poisson smoothing helps us balance observed data with prior knowledge, preventing outliers from skewing predictions. For high-volume stats like significant strikes, the smoothing effect is minimal. But for rare events like submissions or knockdowns, it provides crucial stability by "pulling" extreme values toward more realistic expectations based on the fighter's division and overall UFC averages.
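Here's a minimal sketch of the idea in Python. The prior strength and division-average rates are illustrative assumptions, not our production values:

```python
def smooth_rate(events: float, minutes: float,
                prior_rate: float, prior_strength: float = 60.0) -> float:
    """Gamma-Poisson smoothing: shrink an observed per-minute rate
    toward a prior (e.g. the division average).

    Equivalent to a Gamma(alpha, beta) prior with
    alpha = prior_rate * prior_strength and beta = prior_strength,
    updated with Poisson counts over `minutes` of cage time.
    """
    alpha = prior_rate * prior_strength
    beta = prior_strength
    return (alpha + events) / (beta + minutes)

# Rare event: 1 submission in 5 career minutes looks like 0.20 subs/min raw,
# but shrinks hard toward an assumed division average of 0.01 subs/min.
print(smooth_rate(events=1, minutes=5, prior_rate=0.01))     # ~0.025

# High-volume stat: 400 significant strikes in 100 minutes barely moves.
print(smooth_rate(events=400, minutes=100, prior_rate=4.0))  # ~4.0
```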
Level 2: Comparative Analysis
Next, we calculate a fighter's efficiency metrics: accuracy (how often attacks land), defense (how often they avoid opponents' attacks), output per minute, and the ratios between fighters across all statistics. This gives us a clearer picture of how fighters perform relative to each other, beyond just counting actions. A fighter might land fewer strikes overall but have significantly higher accuracy—a crucial distinction our model captures.
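As a rough illustration (the stat names here are made up for the example), the Level 2 metrics for one fighter in one fight might be computed like this:

```python
def efficiency_metrics(f1: dict, f2: dict, minutes: float) -> dict:
    """Per-fight efficiency metrics for fighter 1 against fighter 2.
    `f1` and `f2` hold landed/attempted counts for a single fight."""
    eps = 1e-9  # guard against division by zero
    return {
        # how often fighter 1's attacks land
        "accuracy": f1["sig_landed"] / (f1["sig_attempted"] + eps),
        # how often fighter 1 avoids fighter 2's attacks
        "defense": 1.0 - f2["sig_landed"] / (f2["sig_attempted"] + eps),
        # raw output normalized by fight time
        "output_per_min": f1["sig_attempted"] / minutes,
        # fighter-vs-fighter ratio for the same stat
        "landed_ratio": f1["sig_landed"] / (f2["sig_landed"] + eps),
    }

print(efficiency_metrics({"sig_landed": 45, "sig_attempted": 90},
                         {"sig_landed": 60, "sig_attempted": 160}, minutes=15))
```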
Level 3: Time-Weighted Averages + Variability
MMA evolves rapidly, and a fighter today isn't the same as they were three years ago. We calculate both standard averages and time-decayed averages (with a 1.5-year half-life) for all statistics. This means a fight from three months ago has far more impact on our predictions than one from three years ago.
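A sketch of the decay weighting, assuming per-fight arrays of stat values and fight ages in days:

```python
import numpy as np

HALF_LIFE_DAYS = 1.5 * 365.25  # the 1.5-year half-life described above

def decayed_average(values: np.ndarray, days_ago: np.ndarray) -> float:
    """Time-decayed average: each fight's weight halves every 1.5 years."""
    weights = 0.5 ** (days_ago / HALF_LIFE_DAYS)
    return float(np.average(values, weights=weights))

# A fight ~3 months old keeps weight ~0.89; one 3 years old only ~0.25.
print(decayed_average(np.array([80.0, 40.0]), np.array([90.0, 1096.0])))
```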
Additionally, we measure the variability of these stats using Median Absolute Deviation (MAD) instead of standard deviation. MAD is less affected by extreme outliers, providing a more stable measure of how consistent a fighter's performance has been.
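To see why MAD matters, consider some made-up numbers: one freak performance barely moves the MAD, while it inflates the standard deviation badly.

```python
import numpy as np

def mad(values: np.ndarray) -> float:
    """Median absolute deviation: a robust measure of spread."""
    med = np.median(values)
    return float(np.median(np.abs(values - med)))

strikes = np.array([50, 55, 48, 52, 200])  # 200 = one-off outlier performance
print(mad(strikes), np.std(strikes))       # MAD ~3.0 vs std ~59.5
```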
Level 4: Adjusted Performance (AdjPerf)
Our most sophisticated metric answers a critical question: how does a fighter perform against a specific opponent compared to how that opponent's previous adversaries performed? This robust z-score normalization (using MAD in place of standard deviation) looks like:
stat_adjperf = (fighter1_stat - fighter2_stat_prev_opp_avg) / fighter2_stat_prev_opp_mad
In plain English: we're measuring how much better or worse a fighter performed against their opponent compared to what we'd expect based on the opponent's history. If a fighter lands more strikes against someone who's historically difficult to hit, that's far more impressive than landing the same number against someone who's easily hit. AdjPerf captures this crucial context.
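A sketch of the calculation from the formula above; the example numbers are hypothetical:

```python
import numpy as np

def adjperf(fighter1_stat: float, prev_opp_stats: np.ndarray) -> float:
    """AdjPerf per the formula above: how far fighter 1's stat sits from
    what the opponent's previous adversaries managed, in MAD units."""
    avg = prev_opp_stats.mean()
    med = np.median(prev_opp_stats)
    mad_val = np.median(np.abs(prev_opp_stats - med))
    return float((fighter1_stat - avg) / (mad_val + 1e-9))

# Landing 60 strikes on someone whose past opponents averaged 30 (MAD 6)
# scores +5.0, far more impressive than 60 on an easy-to-hit opponent.
print(adjperf(60, np.array([20, 30, 26, 38, 36])))
```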
Building the Prediction Model
With thousands of engineered features from these four levels, we use Autogluon (thanks to Chris from Wolftickets.ai for this recommendation) to train our prediction model. Using presets like "experimental" and time-ordered data splitting (typically 80/20 or 90/10 train/test), we run extensive cross-validation to ensure reliability.
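Here's a minimal sketch of what that training setup can look like. The file path, column names, preset, and split ratio are illustrative, not our actual pipeline; the key detail is that fights are sorted by date so the model never trains on the future:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Illustrative file and column names, not our actual schema.
df = pd.read_csv("fight_features.csv").sort_values("fight_date")

cutoff = int(len(df) * 0.8)  # 80/20 time-ordered split, never random
train, test = df.iloc[:cutoff], df.iloc[cutoff:]

predictor = TabularPredictor(label="fighter1_won", eval_metric="log_loss")
predictor.fit(train.drop(columns=["fight_date"]), presets="best_quality")

print(predictor.evaluate(test.drop(columns=["fight_date"])))
```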
There's no definitive guide to sports prediction—we've conducted thousands of hours of testing to find what consistently beats Vegas odds. Everything from feature selection to hyperparameter tuning requires constant experimentation and refinement. What works for NFL might not work for UFC, and what worked last year might not work next year.
The final model combines roughly the 30 strongest of these features, weighted according to their predictive power. The result is a system that consistently identifies value bets where our predicted win probability exceeds what Vegas odds imply.
The Practical Limitations of ML Probabilities
One of the humbling lessons I've learned over the years is about the relationship between machine learning and betting strategy. Despite the sophisticated nature of our model, I've had to accept that ML algorithms have inherent limitations when it comes to probability calibration. While traditional betting approaches rely heavily on expected value (EV) calculations, I've discovered that applying these same principles directly to ML outputs can be problematic.

Our tabular ML models excel at binary classification—essentially determining which fighter is more likely to win—but their confidence scores aren't necessarily true probabilities in the statistical sense. Even when optimizing for log loss (which theoretically improves probability calibration), there remain subtle biases and distortions in how these models estimate probabilities.

Through trial and error, I've found that treating the model's outputs as relative confidence levels rather than exact win probabilities leads to more consistent results. Instead of rigidly applying EV formulas, we use confidence thresholds as a filtering mechanism to identify promising bets. This pragmatic approach acknowledges the model's strengths in pattern recognition while respecting its limitations in precise probability estimation—a balance that has proven more effective than assuming perfect calibration.
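In practice, the filtering step can be as simple as this sketch (the 0.65 cutoff and field names are purely illustrative):

```python
def promising_bets(predictions: list[dict], threshold: float = 0.65) -> list[dict]:
    """Treat model outputs as relative confidence, not true probabilities:
    keep only fights where confidence in either corner clears a fixed bar."""
    return [p for p in predictions
            if max(p["p_fighter1"], 1.0 - p["p_fighter1"]) >= threshold]

picks = promising_bets([{"fight": "A vs B", "p_fighter1": 0.71},
                        {"fight": "C vs D", "p_fighter1": 0.55}])
print(picks)  # only A vs B survives the filter
```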

[Figure: our model's prediction distribution compared to actual outcomes, showing the effectiveness of our approach despite calibration challenges.]
Why Current LLM-Based Fight Predictions Fall Short
I've extensively tested Large Language Models (LLMs) like GPT-4 for fight predictions, and the results have been consistently disappointing. The fundamental limitation is clear: today's LLMs lack the ability to analyze fight footage. They're trained primarily on text data, which means they miss crucial visual information—a fighter's movement patterns, subtle defensive vulnerabilities, changes in stance, or signs of fatigue that only appear on video. Statistical data can tell us a lot, but the visual dimension of fighting contains irreplaceable insights that no spreadsheet can capture.

The good news? This limitation is temporary. Within the next 1-2 years, we'll see multimodal AI systems trained on vast libraries of video footage, capable of analyzing thousands of fights frame-by-frame. When that happens, AI fight prediction will undergo a revolutionary leap forward. Until then, the most effective approach remains combining sophisticated statistical modeling with human expertise for context and interpretation.
Conclusion: The Never-Ending Journey
MMA prediction remains as much art as science. While our technical approach provides an edge, the sport constantly evolves, and so must our model. Every event brings new data, every fighter brings new patterns, and our system continually adapts to these changes.
Whether you're a casual fan or a serious bettor, I hope this behind-the-scenes look helps you appreciate the depth of analysis that goes into each prediction you see on MMA-AI.net. And if you're inspired to build your own model? Even better—innovation thrives when knowledge is shared.