New Key Feature: Adjusted Performance

Alright, so I'm finally ready to talk about my new key feature: Adjusted Performance. This has been one of those "why didn't I think of this sooner" moments, but also "holy crap this is a nightmare to implement." Let me give you a quick taste of what it looks like mathematically:

fighter1_stat_adjperf = (fighter1_stat − fighter2_stat_opp_avg) / fighter2_stat_opp_sdev

We'll call this f1_stat_adjperf for short. The big idea is to measure how much a fighter's performance in any given fight exceeded or fell short of what we'd expect their opponent to allow. If you go in there against some unstoppable jab machine who normally forces everyone to eat 50 jabs per round, but you manage to hold them to only 20, well, that's huge. But if you never do the math to figure out what your opponent's "baseline" is, you'll just record "Fighter1 absorbed 20 jabs" and maybe think it's not that great. Meanwhile, that's incredibly good compared to the 50 jabs everyone else took. Hence, adjusted performance.

Understanding the _opp Suffix

Before going any further, let's talk about the _opp suffix. In short, _opp refers to the post-fight stats that an opponent recorded against a fighter. So if you see something like f2_stat_opp_avg, that means the stat is referencing "what your opponent's opponents did against them." For example:

  • f1_stat_opp_avg: The average of what your opponents have done against you across all of your previous fights.
  • f2_stat_opp_avg: The average of your opponent's opponents' performances against your opponent in all their previous fights.
  • f2_stat_opp_sdev: The standard deviation of your opponent's opponents' performance in those same fights, giving us a measure of the volatility or variability in their performance.

So, if your opponent has averaged 40 strikes landed against them across their previous fights, then f2_strikes_opp_avg is 40.

This means that to measure your adjusted performance, you compare your post-fight stat from the current fight to your opponent's pre-fight _opp_avg, and then scale the difference by their pre-fight _opp_sdev. If you land 60 strikes against someone who averages absorbing 30 with a standard deviation of 30, you're 1 standard deviation above what that person typically allows.
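
To make that concrete, here's a minimal sketch of the calculation (variable names are illustrative, not my actual pipeline code):

def adjusted_performance(f1_stat, f2_stat_opp_avg, f2_stat_opp_sdev):
    # Z-score of fighter 1's output against what fighter 2 typically allows.
    if f2_stat_opp_sdev == 0:
        return 0.0  # no variability on record, so treat the performance as exactly average
    return (f1_stat - f2_stat_opp_avg) / f2_stat_opp_sdev

# Example: land 60 strikes on someone who usually absorbs 30 (sdev of 30)
print(adjusted_performance(60, 30, 30))  # -> 1.0, one standard deviation above expectation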

Why It's So Valuable

Let's be honest: raw stats can lie, or at least mislead. If a fighter's output is just "I landed 30 strikes," that doesn't tell me how many strikes they should have been able to land. If they were fighting an iron-clad defensive wizard who typically only allows 10 strikes, then landing 30 is insane. On the flip side, if your opponent is basically an open punching bag who gives up 60 strikes on average, then landing 30 is actually pretty weak.

Adjusted performance changes the game: it says, "30 strikes might be good or bad, but let's see how it compares to what your opponent usually allows." Then, for even more nuance, it's scaled by the opponent's historical standard deviation—so you don't artificially inflate your stats just because you faced someone with wide variability on a certain stat.

The Complexity: Where Do You Even Get These Numbers?

The problem with pulling off something like f1_stat_adjperf is that you actually have to calculate your opponent's _opp_avg and _opp_sdev. In other words:

  • Grab all your opponent's previous fights.
  • For each of those fights, figure out the stats they allowed to their opponents.
  • From that, compute the average allowed stats and the standard deviation of those stats.
  • Then bring that back into your current fight to see how well or poorly you did.

This is where the data pipeline can get insane, because you might have a fighter with 15 or 20 fights, each against a different opponent. Some of those opponents have 30 fights apiece. Doing this for every fighter means you need to traverse a huge web of fight stats.
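
As a rough sketch, the naive per-fight lookup looks something like this in pandas (column names like opponent_id, fight_date, and strikes_landed are illustrative, not my actual schema):

import pandas as pd

def opp_allowed_stats(fights: pd.DataFrame, fighter_id, cutoff_date, stat="strikes_landed"):
    # Everything fighter_id's past opponents did against them before cutoff_date,
    # i.e. rows where fighter_id appears as the opponent.
    allowed = fights[(fights["opponent_id"] == fighter_id) &
                     (fights["fight_date"] < cutoff_date)][stat]
    return allowed.mean(), allowed.std()

# For fighter 1's current fight, look up what fighter 2 historically allowed:
# f2_opp_avg, f2_opp_sdev = opp_allowed_stats(fights, fighter2_id, current_fight_date)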

If you run a naive approach—like just using pandas groupby and merges all day—it becomes unbelievably slow at scale. This is why I had to do some major refactoring, rewriting a chunk of my data pipeline to pull from a properly indexed database (Postgres, in my case). Once your data is properly structured, it's much faster to do these calculations in a single pass or via specialized queries, rather than stumbling around with in-memory merges.

Time-Decayed Averages & Time-Decayed StdDev

But wait—there's more. I decided to do a time-decayed average (and corresponding standard deviation) with a 1.5-year half-life. That means that a fight 3 months ago is given a lot more weight than a fight 5 years ago, which is basically ancient history in fight years.

Now the complexity is multiplied by, like, a factor of 10. Because to get f2_stat_opp_avg, I can't just average everything your opponent's opponents have done against them; I have to:

  • Grab each fight's stat.
  • Weight it exponentially based on how recently it happened.
  • Sum up those weighted stats.
  • Divide by the sum of the weights.
  • Then do it all over again for the standard deviation.
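
Here's a minimal sketch of those steps, assuming a half-life expressed in days and plain NumPy arrays (not the production code):

import numpy as np

def time_decayed_mean_std(values, ages_days, half_life_days=1.5 * 365):
    # values: the stat from each of the opponent's past fights
    # ages_days: how long ago each of those fights happened, in days
    values = np.asarray(values, dtype=float)
    ages_days = np.asarray(ages_days, dtype=float)
    weights = 0.5 ** (ages_days / half_life_days)  # weight halves every half-life

    mean = np.sum(weights * values) / np.sum(weights)
    var = np.sum(weights * (values - mean) ** 2) / np.sum(weights)
    return mean, np.sqrt(var)

# A fight ~90 days ago keeps about 89% of its weight; one from ~5 years ago keeps about 10%.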

And let's not forget: we do this for every single fight, across thousands of fights, across hundreds of fighters. That's why I always say data engineering is half the battle.

Why Bother?

So why go through this code-wrangling fiasco when your standard raw stats might be "good enough?" Because fights are context-dependent. If you have a stand-up specialist with insane takedown defense, but no one has tested it in years, your raw stats might not reflect the real picture of how they handle a brand-new style. By blending in adjusted performance stats, you're no longer stuck just describing how many strikes or takedowns a fighter landed; you're describing how well they did compared to what their opponent typically experiences—and you're discounting or boosting older fights according to their recency.

This is how we start to see nuanced differences that no raw stat alone can show. It's the difference between "Fighter A landed 40 strikes" vs. "Fighter A forced a 1.5 standard-deviation drop in Fighter B's typical striking output." That second statement captures so much more power. If you can integrate these insights into your model, you get a far more realistic prediction of how a matchup might turn out.

MMA-AI.net v5 New Years Updates

v5 beta *might* be done in time for the next event. I'm hundreds of hours in. The data processing and model training are basically done; I just need to figure out the final feature set and do some tuning. Then I need to write the future-prediction code for creating and cleaning the data for upcoming fights. Finally, we're beating Vegas on accuracy and log loss without including the odds or rating/ranking stats like an Elo score. That's not to say I won't include those in the future, but it's a great sign that the fundamentals of the model are improving. I would like to thank my $200/mo ChatGPT o1 Pro subscription for making this possible. That model absolutely rules at complex code, especially statistics and math.

Here's a recent training run to give you an idea of current levels of performance. This isn't the final model performance, it's just from tinkering with the feature set and training parameters:

Model Performance:
Training log loss: -0.611
Validation log loss: -0.575
Test log loss: -0.612
Training accuracy: 0.671
Validation accuracy: 0.698
Test accuracy: 0.691

Basically, we're seeing somewhere around 68% to 69% accuracy on last year's fights, which the model has never seen before, with excellent log loss (Vegas is currently about 67% accurate). Log loss measures how well the model predicts the chance a fighter will win. For example, if the model predicts 50 fighters to each have a 70% chance to win, and those fighters win about 68% of the time, then its probabilities are nicely calibrated. This is what allows us to check the EV of AI predictions versus Vegas odds.
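
For reference, log loss itself is cheap to compute on held-out fights; here's a quick sketch with made-up probabilities:

from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1, 0, 1]                    # 1 = the fighter in question won (made-up data)
y_prob = [0.72, 0.35, 0.64, 0.58, 0.41, 0.69]  # model's predicted win probability for that fighter

print(log_loss(y_true, y_prob))  # lower is better; ~0.693 is the coin-flip baseline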

Changes:

  • Total rewrite
    • Switched from just pandas dataframes (SLOW) to Postgres SQL database (FAST)
    • Greatly improved code readability, design, and efficiency for easier future updates
  • Features
    • Total feature overhaul
    • Per minute
      • Uses a Bayesian posterior mean to smooth outliers and zeros and reduce noise
    • Accuracy/defense
      • Uses Bayesian smoothing with a Beta prior to smooth outliers and zeros and reduce noise
      • Priors are calculated on a historical pre-fight per-weightclass, per-stat basis
    • Ratio
      • Now does bounded ratio to prevent division by zero and outlier ratios
    • Average
      • Now includes time decayed average rather than recent average

The reason we do all this smoothing is that when a fighter attempts 0 submissions and lands 0 submissions, giving them 0% accuracy is an unrealistic measure of what their submission accuracy would have been had they attempted any. With smoothing it looks more like this:

Subs_land / subs_attempted = accuracy
0 / 0 = 20% acc (similar to historical weightclass average submission accuracy)
0 / 1 = 18% acc
0 / 10 = 2% acc
These are just examples, not the actual numbers, but you get the idea. It punishes fighters who never land any of their attempts without overpunishing them, which means the stats layered on top of these later will be more realistic to their actual performance, even with sparse data.

Second, no more recent averages over an arbitrary number of fights or dates. Time-decayed averages are where it's at. Set a half-life, like 1 year: fights from the past year account for, let's say, 50% of the time-decayed average, fights from the year before that account for 25%, and so on. This gives a much more precise measure of the fighter as he stands today, rather than letting how he stood 5 years ago skew his current stats.

All of these changes above were nice and showed a moderate increase in the model's reliability. However, the real coup de grâce was a final layer of statistical analysis over the stats that solved the following problem: if you fought nothing but cans 20 fights in a row, your stats would make you look better than Jon Jones. But if you fought nothing but top-flight competition 20 fights in a row, your stats would look average to below average, despite the fact that you'd crush the can crusher. The main way people have solved this has been ranking or rating systems like Elo scores. While this is pretty effective, it doesn't solve the core problem: a fighter's individual fight stats are a kind of isolated measurement that lacks perspective. How do you turn these core, fight-by-fight stats into an interconnected web where each individual fight's stats can inform and affect the stats of other, related fights?

I'll go into detail about the exact math I did to solve that problem along with the mathematical implementation of the Bayesian smoothing on my Patreon for subscribers here soon. See you next event, hopefully!

The Art of Not Sucking at AI: A Post-Mortem of My Model's Spectacular Face-Plant

After watching UFC 309 systematically demolish my model's predictions with the precision of a prime Anderson Silva, I've spent the past week in a caffeine-fueled debugging frenzy. Between muttering obscenities at my terminal and questioning my life choices, I've had some genuine epiphanies about AI development that might save others from my particular flavor of statistical hell.

The Problem with Yes-Man AI

Large Language Models are like that friend who encourages your 3 AM business ideas. "A blockchain-based dating app for cats? Brilliant!" This becomes particularly dangerous when you're knee-deep in feature engineering and looking for validation rather than criticism.

After extensive testing, I've discovered something fascinating about GPT o1's mathematical capabilities. While most LLMs give basic statistical approaches, GPT o1 can dive deep into complex statistical problems. But the real breakthrough came from building an AI feedback loop: get statistical approaches from GPT o1, feed them to Claude for implementation (it writes cleaner code), then feed Claude's code back to GPT o1 for validation.

Even debugging has improved. When Claude's code throws an exception, feeding the error back works once. But for persistent issues, asking Claude "what do you need to debug this error?" is far more effective. It responds with diagnostic code that, once fed with real data, leads to actual fixes rather than band-aids.

This iterative process, combined with extensive prompt engineering and lots of sample data to help GPT o1 truly understand the problem domain, has led to the first major mathematical breakthrough in V5's development: our new Bayesian approach to handling fight statistics.

Bayesian Beta Binomial: The Zero Division Solution

This is only one of many many improvements in V5 but I find it super interesting so I'm writing about it and fuck you if you don't want to hear about it. Let's dive deep into how we handle the dreaded divide-by-zero problem in fight statistics. When calculating success rates (like submission accuracy or strike accuracy), we use Bayesian Beta Binomial analysis to provide meaningful priors that smoothly handle edge cases.

The approach works like this: Instead of naive division that breaks on zeros, we model each ratio as a Beta distribution where:

  • α (alpha) represents successful attempts plus prior successes
  • β (beta) represents failed attempts plus prior failures
  • The posterior mean (α / (α + β)) gives us our smoothed estimate

For example, with submission accuracy:

submission_accuracy = (submissions + α) / (submission_attempts + α + β)

We determine our priors (α, β) through empirical Bayes by analyzing the historical distribution of success rates across all fighters. These priors vary by stat type, reflecting the different base rates we see in MMA:

  • Submissions: Lower α and higher β values reflecting their relative rarity
  • Strikes: More balanced α and β values reflecting higher occurrence rates
  • Takedowns: Intermediate values based on historical success rates

This approach elegantly handles three critical cases:

  • Zero attempts: Returns the prior mean (α/(α+β))
  • Small sample sizes: Heavily weights the prior
  • Large sample sizes: Converges to the empirical rate
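
Here's a minimal sketch of that posterior mean in code; the prior values below are placeholders for illustration, not the empirically fit per-weight-class priors:

def smoothed_rate(successes, attempts, alpha=1.0, beta=4.0):
    # Beta-Binomial posterior mean; alpha and beta here are illustrative priors only.
    return (successes + alpha) / (attempts + alpha + beta)

print(smoothed_rate(0, 0))     # zero attempts  -> prior mean alpha / (alpha + beta) = 0.20
print(smoothed_rate(0, 2))     # small sample   -> still close to the prior (~0.14)
print(smoothed_rate(60, 200))  # large sample   -> converges toward the empirical 60/200 = 0.30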

To understand why this matters, consider how submission accuracy was traditionally handled: A fighter attempting 10 submissions and landing none would be assigned 0% accuracy. This creates two problems: it skews averages downward, and when comparing fighters (fighter1_sub_acc / fighter2_sub_acc), we risk another divide-by-zero error.

Our Bayesian approach instead provides more nuanced, realistic estimates. For example:

  • 10 attempts, 0 successes = 3.5% accuracy
  • 9 attempts, 0 successes = 3.8% accuracy
  • 8 attempts, 0 successes = 4.1% accuracy

This prevents over-punishing unsuccessful attempts while ensuring we never hit true zero. The accuracy gradually increases as sample size decreases, reflecting our increasing uncertainty with smaller sample sizes.

The V5 Redemption Arc

For V5, we're continuing to embrace AutoML (specifically AutoGluon) to eliminate the uncertainty in model optimization. Through V1-V3, I spent countless hours manually tuning gradient boosted algorithms like XGBoost and CatBoost. While I learned an enormous amount about hyperparameter optimization and tuning in general, I was never entirely confident I was meeting professional machine learning engineering standards.

AutoML removes that uncertainty. It systematically explores model architectures and hyperparameters in ways that would take me months to do manually. I still do significant tuning, but now it's faster and more reliable. No more wondering if I missed some crucial optimization or made rookie mistakes in the model architecture.
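
For context, a bare-bones AutoGluon run looks something like this; the CSV path and the 'winner' label column are placeholders, not my actual setup:

from autogluon.tabular import TabularDataset, TabularPredictor

# Hypothetical training table: one row per fight with engineered features plus a binary label.
train_data = TabularDataset("train_fights.csv")

predictor = TabularPredictor(label="winner", eval_metric="log_loss").fit(
    train_data,
    presets="best_quality",  # let AutoGluon search model types and hyperparameters
)

print(predictor.leaderboard())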

What This Means For Users

MMA-AI.net will continue hosting Model V4 predictions while V5 is under development. However, there won't be any improvements to the current model during this transition. If you want to take a step back and not ride with me for a month or two while I finish V5, I completely understand. This isn't about quick fixes - it's about building something that actually works.

The Bottom Line

Sometimes you need to lose six parlays in a row to light a fire under your ass and actually learn some math. But at least we're failing forward, and V5 will be built on more solid foundations.

P.S. To the UFC 309 fighters: those 3 AM tweets weren't personal. It was just the model and the Monster Energy talking.

Success! Then Failure?

I'm absolutely furious. UFC 309 just wiped out 6 units with 6 consecutive parlay losses. After four years of development, countless sleepless nights, and what I thought was a breakthrough in betting strategy, reality hit hard.

The Journey to Version 4

The evolution of MMA-AI.net has been a marathon. Version 1 took two years of meticulous development. When ChatGPT emerged, I saw an opportunity to rebuild everything from the ground up. Version 3 emerged after three months of intense work—I'm talking 5 AM bedtimes, night after night.

Then came Version 4, inspired by Chris from wolftickets.ai. His analysis revealed something fascinating: parlays could be more profitable than single picks when accounting for odds. This insight led to another complete overhaul, this time leveraging autoML to eliminate inefficiencies and using ChatGPT for guidance. The results were stunning: 50-60% ROI over six months.

Success Breeds Complacency

Those six months of success? They made me soft. I coasted. My initial $270 investment had grown to $13,000 purely through profit reinvestment—no additional capital needed. The model was working so well that I focused instead on rebuilding this website using Cursor, an AI-powered IDE. Despite knowing nothing about web development, HTML, or CSS, I managed to transform MMA-AI.net from an ad-riddled eyesore into what you see today.

Traffic surged to all-time highs. Everything seemed perfect. Then UFC 309 happened.

The Wake-Up Call

Six parlays. Six losses. 100% drawdown. The rage I'm feeling isn't just about the money—it's about the realization that I've been resting on my laurels. Those six months of coasting caught up with me in the most painful way possible.

This isn't the end of MMA-AI.net. Honestly, it's just what I needed to get my ass in gear and learn some math. I'm already planning the next evolution.

Stay tuned for my next post about where we go from here.

Welcome to the new MMA-AI.net

After three years of development, thousands of hours of feature engineering, and endless testing, I finally redid the stupid website. Goodbye ads, hello pretty new design. This platform represents what I believe to be one of the most sophisticated sports prediction models available today.

The past 5 months have been particularly exciting as we've cracked one of the final pieces of the profit-maximization puzzle: the betting strategy. Chris from wolftickets.ai and I knew there had to be a better approach than simply betting straight picks to maximize ROI. Chris was the first to discover it: parlays.

Since both of our models maintain significant accuracy and log loss advantages over Vegas, we can multiply that edge using parlays. Why? Because parlay odds aren't additive—they're multiplicative. Through extensive testing, we've found that 3-leg parlays offer the optimal balance of risk and reward. While 4-leg parlays actually showed higher ROI in testing, their boom-or-bust nature led to more extreme bankroll swings and higher bankruptcy risk. Since implementing the 3-leg strategy five months ago, the model has achieved a 50% ROI.
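
To see the multiplicative effect, here's a quick sketch with made-up decimal odds and model probabilities (treating the legs as independent):

legs = [
    {"decimal_odds": 1.80, "model_prob": 0.62},
    {"decimal_odds": 1.65, "model_prob": 0.66},
    {"decimal_odds": 2.10, "model_prob": 0.55},
]

parlay_odds = 1.0
parlay_prob = 1.0
for leg in legs:
    parlay_odds *= leg["decimal_odds"]  # payouts multiply...
    parlay_prob *= leg["model_prob"]    # ...and so does the chance of hitting every leg

ev_per_unit = parlay_prob * parlay_odds - 1
print(parlay_odds, parlay_prob, ev_per_unit)  # ~6.24x payout, ~22.5% hit rate, positive EV if the edge is real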

The Parlay Strategy

Our approach is straightforward: randomly selected AI picks combined into parlays, with no fighter appearing more than twice to prevent single-fighter dependency. The most common question I get is "Why not just parlay the +EV fighters together?" Well, I've tested hundreds of parlay permutations: underdogs only, favorites only, +EV only, 1-5 leg combinations—every variation imaginable—against a year of unseen fight data. Surprisingly, the +EV-only strategies consistently underperformed compared to randomly selected AI pick parlays.

This might seem counterintuitive, but it likely stems from how we're solving a binary classification problem. Our models excel at distinguishing wins (1) from losses (0), but may be less refined at setting precise win probabilities. I train on log loss, not accuracy—log loss being a metric that heavily penalizes confident mistakes while rewarding confident correct predictions. You can see evidence of this in the calibration curve on our About page.

But honestly? I don't care about the "why." Too many people get tunnel vision focusing on EV, which makes sense if you're working without a mathematical model like MMA-AI.net. But when you have a proven model with demonstrable advantages over Vegas odds, the only metric that matters is ROI. And our testing shows that random AI pick parlays consistently deliver the highest returns.

So here's to the new site, the new strategy, and the new model. You can find my predictions posted here on the home page and occasionally on Twitter/X or https://reddit.com/r/mmabetting before each event.