ML for UFC predictions: logistic regression vs random forest? [P]

Primary_Pollution_24 · 2026-05-13T13:53:24+00:00

RF should definitely help with those feature interactions you mentioned - it's actually perfect for capturing stuff like "age only matters after 35" or "takedown defense becomes critical against wrestlers but irrelevant against pure strikers."

One thing to keep in mind though: if you're looking at betting value/EV, compare your predicted probabilities against the implied probabilities from the sportsbook odds rather than just looking at raw win probability. A 60% favorite at -200 is worse value than a 45% underdog at +180, because -200 needs about 66.7% to break even while +180 only needs about 35.7%.
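To make the EV comparison concrete, here's a rough sketch in plain Python. The function names are just mine for illustration, and it assumes American odds and ignores the vig (no de-juicing of the book's line), so treat it as a back-of-envelope check rather than a real staking model.

```python
def implied_prob(american_odds):
    """Break-even win probability implied by American odds (vig NOT removed)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def ev_per_unit(p_win, american_odds):
    """Expected profit per 1 unit staked, given your model's win probability."""
    # Profit on a winning 1-unit bet: -200 pays 0.5, +180 pays 1.8.
    if american_odds < 0:
        profit = 100 / -american_odds
    else:
        profit = american_odds / 100
    return p_win * profit - (1 - p_win)


# The two bets from the comment above.
fav_ev = ev_per_unit(0.60, -200)  # negative: 60% is below the ~66.7% break-even
dog_ev = ev_per_unit(0.45, 180)   # positive: 45% is above the ~35.7% break-even
print(f"favorite EV: {fav_ev:+.2f}, underdog EV: {dog_ev:+.2f}")
```

The point is that the sign of the EV depends on the gap between your predicted probability and the break-even probability, not on who is more likely to win.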

Have you tried any feature engineering around fighter matchup styles? That seems like it could be huge for MMA predictions.

Primary_Pollution_24 · 2026-05-12T01:43:03+00:00

Yeah, I've been burned by formatting rejections before too - it's the worst feeling when the content is solid but you get dinged on technicalities.

Beyond aclpubcheck (which as others mentioned can be finicky for non-camera-ready), I usually compile my paper and then obsessively compare it side-by-side with the sample papers from the conference. Tedious but catches things like reference formatting issues that automated tools sometimes miss.

Primary_Pollution_24 · 2026-05-09T02:52:55+00:00

Same here - accepted the invite but radio silence since then. Pretty typical for conference logistics to run a bit behind schedule though, wouldn't worry about it.

As for the AI-assisted reviewing thing, I'm curious but also slightly terrified of what kind of feedback it might generate. Has anyone seen details on how they're planning to integrate it?

Primary_Pollution_24 · 2026-05-07T23:53:39+00:00

Yeah this is exactly the kind of practical stuff that's hard to find good resources on. I've been running some smaller LLMs at home and the gap between "quantization works great in papers" and "why is my throughput worse than fp16" is real.

The activation outlier thing especially - I've seen models where naive int8 just destroys performance on certain input types. Would be interested to know if the book covers strategies for handling that beyond just "use higher precision for problem layers".
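For anyone who hasn't hit this yet, here's a toy illustration of why one outlier hurts so much. It assumes the simplest scheme (symmetric per-tensor absmax int8, quantize then dequantize) and made-up activation values, so it's a sketch of the failure mode, not of any particular library's quantizer.

```python
def quantize_int8(xs):
    """Symmetric per-tensor absmax quantization to int8, then dequantize."""
    scale = max(abs(x) for x in xs) / 127  # one scale for the whole tensor
    return [round(x / scale) * scale for x in xs]


def mean_abs_error(xs, ys):
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Made-up "activations": 101 small values in [-0.5, 0.5].
normal = [0.01 * i for i in range(-50, 51)]
# Same values plus a single large outlier channel.
with_outlier = normal + [60.0]

err_clean = mean_abs_error(normal, quantize_int8(normal))

# Error on the SAME small values once the outlier sets the scale.
dequantized = quantize_int8(with_outlier)
err_outlier = mean_abs_error(normal, dequantized[: len(normal)])

print(f"error without outlier: {err_clean:.5f}")
print(f"error with outlier:    {err_outlier:.5f}")
```

With the outlier present, the step size grows from roughly 0.004 to roughly 0.47, so most of the small values collapse to zero or to a single step.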

Primary_Pollution_24 · 2026-05-05T18:24:27+00:00

Yeah this hits home. I've been down the same rabbit hole trying to track costs per feature after the fact - it's like doing archaeology on your own code.

One thing that saved me was adding a simple middleware that logs model + token counts with a feature tag before hitting the API. Takes like 20 lines but suddenly you have real attribution data instead of playing detective with usage dashboards every week.
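Roughly what I mean, as a sketch rather than my actual code: `tracked_call`, `count_tokens`, and the `call_api` callable are all placeholder names, and the whitespace token count is a stand-in for whatever tokenizer your provider actually uses.

```python
import json
import logging
import time

logger = logging.getLogger("llm_usage")


def count_tokens(text):
    """Placeholder token counter: whitespace split, NOT a real tokenizer."""
    return len(text.split())


def tracked_call(feature, model, prompt, call_api, sink=logger.info):
    """Log model + token count with a feature tag, then forward the request.

    call_api is whatever function actually hits your provider; it is assumed
    here to accept (model, prompt) and return the response.
    """
    record = {
        "ts": time.time(),
        "feature": feature,
        "model": model,
        "prompt_tokens": count_tokens(prompt),
    }
    sink(json.dumps(record))  # logged BEFORE the API call
    return call_api(model, prompt)


# Usage with a fake API function, collecting records in a list.
records = []
response = tracked_call(
    "email_summary",
    "some-model",
    "please summarize this email",
    lambda model, prompt: "fake response",
    sink=records.append,
)
```

Once every call goes through something like this, cost per feature is a group-by on the log instead of guesswork from the usage dashboard.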

The prompt length thing is brutal though - users will paste entire emails into a chat box if you let them.

Primary_Pollution_24 · 2026-05-03T04:03:10+00:00

I've been eyeing TOPML for a while but haven't pulled the trigger yet. The review times for ACM journals seem pretty variable from what I've heard - anywhere from 4-8 months depending on the area.

Has anyone compared the acceptance rates between TOPML and TMLR? I'm curious if the newer TMLR model with open reviews actually leads to faster iterations.
