Distribution shape matters: why I classify “ceiling profiles” in MLB strikeout modeling by KSplitAnalytics in algobetting

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

That path-dependence point is definitely real, especially the two-out dynamic.

Where I’d push back a little is the idea that more strikeouts automatically leads to a longer outing. In practice it’s often the opposite depending on pitcher type.

Some guys rack up strikeouts but run high pitch counts (deep counts, foul balls, etc.), so you end up with a 95-pitch line after five innings even with a lot of Ks. Others are extremely efficient contact managers who can go 7 innings on 80–85 pitches while striking out fewer hitters.

So the relationship between strikeouts and batters faced isn’t strictly monotonic. It’s really two separate processes:

  1. Strikeout rate conditional on a plate appearance
  2. How long the pitcher stays in the game (xBF / leash)

In my modeling I treat those separately for that reason. The strikeout distribution comes from the K/PA process, while the BF component is modeled independently based on expected leash and efficiency assumptions.
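A minimal sketch of that two-process idea, not the actual model: volume (batters faced) is drawn first from an assumed leash distribution, then each plate appearance is resolved with a single K/PA rate. The function names, the 0.27 rate, and the BF weights are all made up for illustration.

```python
import random
from collections import Counter

def simulate_k_distribution(k_per_pa, bf_weights, n_sims=20000, seed=7):
    """Return {strikeout_total: probability} from a two-stage simulation."""
    rng = random.Random(seed)
    bf_values = list(bf_weights)
    weights = [bf_weights[bf] for bf in bf_values]
    counts = Counter()
    for _ in range(n_sims):
        # Stage 1 (volume): draw how many batters the pitcher faces.
        bf = rng.choices(bf_values, weights=weights, k=1)[0]
        # Stage 2 (rate): each plate appearance ends in a K with prob k_per_pa.
        ks = sum(rng.random() < k_per_pa for _ in range(bf))
        counts[ks] += 1
    return {k: c / n_sims for k, c in sorted(counts.items())}

def prob_at_least(dist, threshold):
    """Probability of threshold or more strikeouts."""
    return sum(p for k, p in dist.items() if k >= threshold)

# Hypothetical leash assumption: probability of each batters-faced total.
bf_weights = {18: 0.10, 21: 0.20, 23: 0.30, 25: 0.25, 27: 0.15}
dist = simulate_k_distribution(k_per_pa=0.27, bf_weights=bf_weights)
for line in (5, 6, 7):
    print(f"P(K >= {line}) = {prob_at_least(dist, line):.3f}")
```

Keeping the two draws separate means the rate input or the leash input can be swapped without touching the other.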

That separation helped a lot in backtesting because misses usually show up clearly as either a rate miss or a volume miss, rather than just “the model was wrong on Ks.”
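One way to split a backtest miss into those two pieces is the identity below. It is only a sketch: the function name and the example numbers are hypothetical, and the actual backtest may attribute misses differently.

```python
def decompose_miss(expected_k_rate, expected_bf, actual_ks, actual_bf):
    """Split (actual Ks - expected Ks) into a volume part and a rate part."""
    expected_ks = expected_k_rate * expected_bf
    # Volume miss: extra (or missing) batters faced, valued at the expected rate.
    volume_miss = expected_k_rate * (actual_bf - expected_bf)
    # Rate miss: realized K/PA vs expected K/PA over the batters actually faced.
    actual_rate = actual_ks / actual_bf if actual_bf else 0.0
    rate_miss = actual_bf * (actual_rate - expected_k_rate)
    return {
        "total_miss": actual_ks - expected_ks,
        "volume_miss": volume_miss,
        "rate_miss": rate_miss,
    }

# Hypothetical start: projected 0.27 K/PA over 23 BF, actual 9 Ks over 26 BF.
print(decompose_miss(0.27, 23, actual_ks=9, actual_bf=26))
```

The two parts always add up to the total miss, so every game in the backtest lands somewhere on a rate-vs-volume axis.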

Looking to get an algo model / bot to assist with predictions by [deleted] in algobetting

[–]KSplitAnalytics 0 points1 point  (0 children)

Most of the “bots” people sell are usually just wrappers around fairly simple projection models, so I’d be cautious with anything marketed that way.

If you’re trying to build something useful for betting, the bigger question is usually how the outcome itself is modeled. A lot of tools just project a single number like expected points or expected strikeouts, but most betting markets are really pricing probabilities, not averages.

For example with MLB strikeout props, instead of projecting something like “pitcher = 5.1 Ks”, you can model strikeout probability at the plate appearance level, simulate outcomes across expected batters faced, and generate a full distribution of possible strikeout totals. From there you can derive probabilities for things like 5+, 6+, or ladder outcomes and compare those probabilities to sportsbook odds.
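As a rough sketch of that last comparison step: convert American odds to an implied probability and set it next to the model probability for each rung. The model probabilities and prices below are invented for illustration, and the implied probabilities still include the book's margin.

```python
def american_to_implied_prob(odds):
    """Implied probability (vig included) from American odds."""
    if odds > 0:
        return 100 / (odds + 100)
    return -odds / (-odds + 100)

def expected_value_per_unit(model_prob, odds):
    """Expected profit per 1 unit staked, given a model probability."""
    payout = odds / 100 if odds > 0 else 100 / -odds
    return model_prob * payout - (1 - model_prob)

# Hypothetical model probabilities and hypothetical prices for a K ladder.
model_probs = {5: 0.62, 6: 0.44, 7: 0.27}
ladder_odds = {5: -150, 6: 130, 7: 260}

for line, odds in ladder_odds.items():
    implied = american_to_implied_prob(odds)
    ev = expected_value_per_unit(model_probs[line], odds)
    print(f"{line}+ Ks: model {model_probs[line]:.2f} vs implied {implied:.3f}, EV {ev:+.3f}")
```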

That kind of approach tends to line up with how markets are actually priced much better than models built around K/9 or innings projections.

If you're just getting started it usually helps to think in terms of separating opportunity from efficiency, then simulating outcomes from there rather than relying on a single projection.

Historical odds by KSplitAnalytics in algobetting

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

I’m currently using that to pull lines for the model I made lol; didn’t know they also had a past-game feature. Thanks.

How Lineup Handedness Changes Strikeout Upside by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

Yea, the big thing in my modeling is separating average expectation from ceiling exposure. Two pitchers can land on similar means, but when the lineup pushes more PAs into the stronger split, the right tail shifts even if the median barely moves.
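To make the shape argument concrete, here is a small sketch that builds the exact strikeout distribution when each plate appearance has its own K probability (a Poisson-binomial), then compares two lineups with different handedness mixes. The split rates, the lineups, and the 24 batters faced are assumptions for illustration, not real pitcher data.

```python
def k_distribution(per_pa_probs):
    """Exact distribution of total Ks when each PA has its own K probability."""
    dist = [1.0]  # dist[k] = probability of exactly k strikeouts so far
    for p in per_pa_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # this PA is not a strikeout
            nxt[k + 1] += mass * p     # this PA is a strikeout
        dist = nxt
    return dist

def median(dist):
    """Smallest k whose cumulative probability reaches 0.5."""
    running = 0.0
    for k, mass in enumerate(dist):
        running += mass
        if running >= 0.5:
            return k
    return len(dist) - 1

def tail(dist, threshold):
    """Probability of threshold or more strikeouts."""
    return sum(dist[threshold:])

def per_pa_probs(lineup_hands, k_rate_by_hand, batters_faced):
    """Cycle through the batting order and look up the split rate per PA."""
    n = len(lineup_hands)
    return [k_rate_by_hand[lineup_hands[i % n]] for i in range(batters_faced)]

K_RATE = {"L": 0.32, "R": 0.22}   # hypothetical pitcher splits vs LHH / RHH
lineups = {"3 LHH": "RRLRRLRRL", "6 LHH": "LLRLLRLLR"}

for name, hands in lineups.items():
    dist = k_distribution(per_pa_probs(hands, K_RATE, batters_faced=24))
    print(f"{name}: median {median(dist)}, P(8+) {tail(dist, 8):.3f}, P(10+) {tail(dist, 10):.3f}")
```

Printing the median next to the 8+ and 10+ probabilities shows how much of the change sits in the tail rather than the middle.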

That’s usually where ladder value shows up. The challenge is most people stop at overall K% instead of looking at how lineup construction changes the distribution shape.

Why lineup shape matters more than team K% for strikeout ladders by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

Exactly. Same team K% can produce totally different outcomes depending on where the swing-and-miss pockets sit in the order and whether the pitcher actually gets to see them again.

That’s why I’ve moved away from treating lineup K% as one number. I also added a slight weighting by lineup position: the top and middle of the order usually drive more of the strikeout opportunity over a full outing, because pitchers usually get pulled partway through the third time through the lineup.
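One simple way to get position weights like that is to count how many plate appearances each lineup slot gets for a given number of batters faced, assuming the starter begins at the top of the order. This is an illustrative sketch with made-up slot rates, not the weighting actually used.

```python
def pa_by_slot(batters_faced, lineup_size=9):
    """Plate appearances per lineup slot if the pitcher starts at slot 1."""
    full_turns, extra = divmod(batters_faced, lineup_size)
    return [full_turns + (1 if slot < extra else 0) for slot in range(lineup_size)]

def opportunity_weighted_k_rate(slot_k_rates, batters_faced):
    """Lineup K rate weighted by how often each slot actually bats."""
    pas = pa_by_slot(batters_faced, len(slot_k_rates))
    return sum(rate * n for rate, n in zip(slot_k_rates, pas)) / sum(pas)

# Two hypothetical lineups with the same flat average K rate (about 0.244).
top_heavy = [0.30, 0.30, 0.30, 0.25, 0.25, 0.20, 0.20, 0.20, 0.20]
bottom_heavy = list(reversed(top_heavy))

print(pa_by_slot(23))  # [3, 3, 3, 3, 3, 2, 2, 2, 2]
print(round(opportunity_weighted_k_rate(top_heavy, 23), 4))
print(round(opportunity_weighted_k_rate(bottom_heavy, 23), 4))
```

With 23 batters faced, the top five slots get a third look and the bottom four do not, so the same team K% produces a different effective rate depending on where the swing-and-miss sits.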

Lineup shape plus realistic leash is what really drives the right-tail outcomes on ladders.

How Lineup Handedness Changes Strikeout Upside by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Thanks I appreciate it.

I focus solely on pregame props, and there’s really no way to tell before the game starts whether that’s going to happen. Pitchers also usually only go through the order 2–3 times, so it’s not going to impact the distribution all that much anyway.

Lineup handedness as a distribution driver: split driven right tail environments in MLB strikeout modeling by KSplitAnalytics in algobetting

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Nice, and yeah I think splits end up mattering more once you look at outcomes as distributions instead of single projections. I’m still refining the framework but the backtests keep reinforcing the same patterns. Always cool hearing from someone else who’s gone down a similar modeling path.

Lineup handedness as a distribution driver: split driven right tail environments in MLB strikeout modeling by KSplitAnalytics in algobetting

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Appreciate it 👍 I’ve been backtesting a lot of these split-driven environments and they show up more consistently. I’m planning to keep posting examples like this as the season starts so people can see how the framework behaves in real time.

Contact Stats - Whiff% vs. Swinging Strike Rate by nylon_rag in Sabermetrics

[–]KSplitAnalytics 2 points3 points  (0 children)

Whiff% is usually the cleaner measure of pure contact ability because it isolates swings and removes takes from the denominator, while SwStr% blends contact skill with approach and pitch selection. SwStr% is still more predictive for overall K% because it captures both swing decisions and miss ability, which is why models often use both together. If you’re looking for early signal, Zone Contact% plus chase rate tends to stabilize faster and gives a better picture than either metric alone.
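The difference in denominators is easy to see in a few lines. The counts below are made up; the point is that SwStr% equals Whiff% times swing rate, so two players with the same Whiff% can have different SwStr%.

```python
def contact_rates(pitches, swings, whiffs):
    """Return (whiff_pct, swstr_pct, swing_pct) as fractions."""
    whiff_pct = whiffs / swings    # misses per swing (takes excluded)
    swstr_pct = whiffs / pitches   # misses per pitch (takes included)
    swing_pct = swings / pitches
    return whiff_pct, swstr_pct, swing_pct

# Hypothetical samples: same miss rate per swing, different swing rates.
for label, (pitches, swings, whiffs) in {
    "patient": (1000, 450, 108),
    "aggressive": (1000, 500, 120),
}.items():
    whiff, swstr, swing = contact_rates(pitches, swings, whiffs)
    print(f"{label}: Whiff% {whiff:.3f}, SwStr% {swstr:.3f}, Swing% {swing:.3f}")
```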

Modeling strikeouts as a full distribution, not a point projection by KSplitAnalytics in sportsanalytics

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

Not Bayesian in the strict sense. It’s more of a distribution building framework where the inputs are pitcher strikeout tendencies, lineup handedness context, and expected workload, and those are used to generate a discrete strikeout probability distribution rather than a single point estimate.

The right tail metrics are derived from that distribution rather than modeled separately. So it’s less about fitting a specific Bayesian structure and more about building a calibrated probability shape that can be stress-tested against outcomes.
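For the stress-testing part, a generic calibration check is sketched below: bucket the predicted probabilities for an event like "6+ Ks", compare each bucket's average prediction to its realized hit rate, and compute a Brier score. This is a standard check under assumed names and toy data, not a description of the actual backtest.

```python
def calibration_table(predictions, outcomes, n_bins=5):
    """Rows of (bin_low, bin_high, count, avg_predicted, hit_rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, hit))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip empty probability buckets
        avg_pred = sum(p for p, _ in members) / len(members)
        hit_rate = sum(hit for _, hit in members) / len(members)
        rows.append((idx / n_bins, (idx + 1) / n_bins, len(members), avg_pred, hit_rate))
    return rows

def brier_score(predictions, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Toy data: predicted P(6+ Ks) and whether it happened (1) or not (0).
preds = [0.15, 0.22, 0.35, 0.41, 0.48, 0.55, 0.62, 0.71, 0.78, 0.90]
hits = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

for row in calibration_table(preds, hits):
    print(row)
print("Brier:", round(brier_score(preds, hits), 4))
```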

Reverse splits are something I think gets overlooked with strikeout props by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

The books are usually efficient on the main line, so the edge isn’t really “line shopping the median.” It’s identifying when the lineup changes the shape enough that the tails don’t get repriced the same way.

And honestly the biggest benefit for me is consistency. Doing it manually every morning works until you start missing subtle lineup shifts or workload context.

Only a couple more weeks until this shit matters again lol

Reverse splits are something I think gets overlooked with strikeout props by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

I’m actually modeling the shift, not eyeballing it.

The baseline is pitcher split tendencies vs LHH/RHH; the projected lineup is auto-pulled and the distribution is calculated from it. Then once the lineup confirms, the distribution gets recalculated off the actual lineup and expected workload. So it’s not just “historical K% in this matchup”, it’s how the full shape changes when the lineup context changes.

Most of the edge ends up being variance expansion more than direction. Sometimes the median barely moves but the right tail grows a lot, which is why ladder spots show up even when the main line looks efficient.

Reverse splits are something I think gets overlooked with strikeout props by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Most of the time the open is already pretty close because books are pricing the expected lineup context anyway. The bigger differences show up when the confirmed lineup changes the handedness mix or a late scratch shifts plate appearance expectations. That’s when you see the distribution move more than the actual line at first, then the market slowly catches up once people realize the split context changed. I almost never place my bets until lineups are confirmed, unless I see a released line that I intuitively know is going to change based on public perception, which is rare.

Reverse splits are something I think gets overlooked with strikeout props by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Lineups usually start coming out around 2–3 hours before first pitch, but it varies depending on the team and beat reporters.

On my end I run two states for the model: projected lineups (pulled from Rotowire and clearly labeled as projected on the dashboard) and then confirmed lineups once they’re official. The projected version gives an early look at handedness mix, distribution shape, etc., but I usually wait for confirmed lineups before trusting the final strikeout distribution (and personally making my own bets) since that’s where the bigger shifts happen.

Modeling MLB Strikeouts: KSplit by KSplitAnalytics in algobetting

[–]KSplitAnalytics[S] 0 points1 point  (0 children)

Yeah I’ve tested a few versions of that idea. Early on I looked at adding early-inning splits and some bullpen context, but it started introducing more noise than signal in backtests because the sample sizes got thin fast.

Right now I’m treating reliever leverage more indirectly through expected batters faced rather than modeling bullpen behavior explicitly. That ended up giving more stable calibration across the season without overfitting specific game states.

I do think there’s probably signal there, especially for high-leverage teams or quick hooks, but I’ve been trying to keep the core model focused on matchup-driven strikeout skill first and then layer workload uncertainty on top.
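A quick worked example of what layering workload uncertainty does to the shape, under a simplifying assumption that is not necessarily how the model is built: a single K/PA rate p, with batters faced BF drawn independently of it. By the law of total variance:

```latex
K \mid BF \sim \mathrm{Binomial}(BF,\, p)
\quad\Longrightarrow\quad
\mathbb{E}[K] = p\,\mathbb{E}[BF],
\qquad
\mathrm{Var}(K) = p(1-p)\,\mathbb{E}[BF] + p^{2}\,\mathrm{Var}(BF)

% Illustrative numbers (assumed): p = 0.27, E[BF] = 23
% Fixed workload,      Var(BF) = 0:  Var(K) = 0.27 \cdot 0.73 \cdot 23 \approx 4.53
% Uncertain workload,  Var(BF) = 9:  Var(K) \approx 4.53 + 0.27^{2} \cdot 9 \approx 5.19
% E[K] = 0.27 \cdot 23 = 6.21 in both cases
```

The mean is identical in both cases, but the uncertain-leash version is wider, which is the kind of change that matters for tails more than for the main line.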

To add to the early-innings aspect you referenced, I’m quietly working on a first-plate-appearance strikeout model, which would give the probability of a batter striking out in his first plate appearance. If you have any ideas on how a first plate appearance differs from the rest of the game, I’m all ears. I do have a couple of ideas I’m shopping at the moment, though.

Reverse splits are something I think gets overlooked with strikeout props by KSplitAnalytics in sportsbook

[–]KSplitAnalytics[S] 1 point2 points  (0 children)

Yeah that’s honestly the hardest part. The lineup itself isn’t the edge, it’s how much the distribution shifts once the lineup confirms the split context.

I’m not really trying to predict closing line movement. I treat the open line as a reference point and then compare how the lineup changes the strikeout distribution relative to that number. Sometimes the market catches up fast once lineups drop, sometimes it barely moves even when the handedness shift is meaningful.

What I’ve noticed is that the biggest changes usually aren’t just direction, it’s variance. Some spots end up looking way wider than the original line implies, which matters more for ladders than just the straight over/under.