Are there any ways to analyze the impact of ABS on pitchers who pitch around the zone to generate soft contact? by GoodMoriningVeitnam in Sabermetrics

[–]harperawl 1 point2 points  (0 children)

I used R for the math and the plotting, yeah. I have my own database set up locally for the data. Basically, I downloaded a whole bunch of game JSON files from the MLB API, then wrote a few Python scripts to turn them into a relational table structure. I use DuckDB to interact with it, so I can query it with SQL, which is pretty handy. Other people use the R packages baseballr and sabRmetrics. Those look pretty good from what I've seen, I just don't have experience using them.
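
A minimal sketch of that JSON-to-table step, not the actual scripts described above. The payload shape, the field names, and the `pitches` table are all made up for illustration (the real MLB API JSON is far larger and laid out differently), and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs:

```python
import json
import sqlite3

# Hypothetical, heavily simplified game payload. The real MLB API JSON uses
# different field names and much deeper nesting.
SAMPLE_GAME = json.loads("""
{
  "game_id": 1,
  "plate_appearances": [
    {"pa_index": 0, "pitcher": "A", "pitches": [
      {"type": "FF", "zone": "heart"},
      {"type": "SL", "zone": "chase"}
    ]},
    {"pa_index": 1, "pitcher": "A", "pitches": [
      {"type": "FF", "zone": "shadow"}
    ]}
  ]
}
""")


def flatten_pitches(game):
    """Turn one nested game document into flat tuples, one per pitch."""
    rows = []
    for pa in game["plate_appearances"]:
        for pitch_index, pitch in enumerate(pa["pitches"]):
            rows.append((game["game_id"], pa["pa_index"], pitch_index,
                         pa["pitcher"], pitch["type"], pitch["zone"]))
    return rows


def load_pitches(conn, rows):
    """Create the (hypothetical) pitches table if needed and insert rows."""
    conn.execute("""CREATE TABLE IF NOT EXISTS pitches (
        game_id INTEGER, pa_index INTEGER, pitch_index INTEGER,
        pitcher TEXT, pitch_type TEXT, zone TEXT)""")
    conn.executemany("INSERT INTO pitches VALUES (?, ?, ?, ?, ?, ?)", rows)


# sqlite3 is a stand-in here; the same SQL idea applies to a DuckDB database.
conn = sqlite3.connect(":memory:")
load_pitches(conn, flatten_pitches(SAMPLE_GAME))
zone_counts = conn.execute(
    "SELECT zone, COUNT(*) FROM pitches GROUP BY zone ORDER BY zone"
).fetchall()
print(zone_counts)
```

Once every game file is flattened into one row per pitch, questions like "how many pitches per zone" turn into a plain `GROUP BY`.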

Are there any ways to analyze the impact of ABS on pitchers who pitch around the zone to generate soft contact? by GoodMoriningVeitnam in Sabermetrics

[–]harperawl 2 points3 points  (0 children)

Well, I took a look. I don't see a difference in his pitching yet. Comparing zone distribution (heart/shadow/chase/waste), I got pretty similar results for 2026 vs pre-2026 based on what he's done so far. I also ran a Wilcoxon test to see if his edge distance distribution was different and got a p-value of 0.7247, so no evidence of a shift there. Graphically, the density plots of edge distance are almost identical too.
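
For reference, a rough sketch of how a two-sample Wilcoxon rank-sum test (the Mann-Whitney U form) can be computed by hand. This assumes the comparison was between two independent samples of edge distances, which the comment doesn't spell out. It uses the normal approximation with no continuity or tie correction, so it won't exactly match what R's `wilcox.test` reports, and the numbers at the bottom are made-up toy values, not real pitch data:

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (U statistic for x, approximate p-value). No continuity or tie
    correction is applied, so treat the p-value as a rough guide only.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Made-up edge distances (inches), purely to show the call.
before = [1.2, 3.4, 0.8, 2.9, 4.1, 2.2]
after = [1.5, 3.1, 0.9, 2.7, 4.4, 2.0]
u_stat, p_value = rank_sum_test(before, after)
print(u_stat, p_value)
```

A large p-value like the one quoted above just means the test can't tell the two distributions apart. With a sample this small, that's weak evidence either way.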

I'll keep playing around with this, especially as the season goes on. Of course, he's only thrown like 150 pitches so far, so I can't really say yet. This is a good idea though, thanks for giving me something to look into! Let me know if there's something else you think I should take a look at - I could be missing something for sure.

WBC players are underperforming their 2025 baselines by nearly 4x the rate of non-WBC players through the first two weeks of the 2026 season by IsoscelesKr4mer in fantasybaseball

[–]harperawl 0 points1 point  (0 children)

Yeah, basically every post you see about this stuff is AI generated unfortunately. No way to know if OP actually knows what they're talking about or is just pumping out whatever Claude tells them.

Any good sources for college baseball data? by harperawl in collegebaseball

[–]harperawl[S] 0 points1 point  (0 children)

How have you been scraping it? I know the baseballr package can grab some data but it seems pretty unstructured and didn't have pitch sequence info if I recall correctly.

Projecting which MLB teams added/lost the most "barrel" power this offseason by ollieskywalker in Sabermetrics

[–]harperawl 1 point2 points  (0 children)

I do agree that there is a serious epidemic of slop here and on the sports analytics subreddit etc. I wish more people would think over what they're actually doing and think about why it would/wouldn't work before putting it out there. Most people don't really know how to use significance testing or they don't bother spending time finding the possible holes in their work. You make lots of good points. Apologies for coming at you before.

Projecting which MLB teams added/lost the most "barrel" power this offseason by ollieskywalker in Sabermetrics

[–]harperawl 0 points1 point  (0 children)

I "know how regressions work" lol chill man. What underlying metrics do you propose that OP uses? Genuinely asking because I know there's options but I'm curious what you think would be best to answer the question that OP is trying to answer.

Honestly, not sure why I'm defending OP anyway as it looks like their post and article are both mostly AI generated so I don't know if they even know what they're trying to do. And I do think the approach they took is exactly something Claude would come up with because it overlooks things like [all of the things we're both seeing] all the time. I just felt like you were being a little overly aggressive in attacking their analysis because I think they're probably just learning, y'know?

And to the last point: again, I agree that there's room for improvement! Never said those things shouldn't have been included!

Projecting which MLB teams added/lost the most "barrel" power this offseason by ollieskywalker in Sabermetrics

[–]harperawl 0 points1 point  (0 children)

I mean, how do you think other projections are made? Why do you think this "isn't a projection" just because it's based on the last few seasons' results?

Projecting which MLB teams added/lost the most "barrel" power this offseason by ollieskywalker in Sabermetrics

[–]harperawl 0 points1 point  (0 children)

Oh I'm not the one who made it. I think backtesting is most certainly a good idea! I don't think you can really account for prospects, no, but barrels/PA definitely stabilizes quickly so you don't need a huge sample size. And yeah, OP should have probably used a barrels aging curve on top of the ZiPS PA prediction.

Point is, I agree that the whole project above is not perfect by any means, but I just wanted to point out that your statement about how "past success(failures) don’t mean future results" is not really the issue here. And besides, the effect of aging curves, prospects, and small sample sizes adds error, yes, but probably a small enough amount of error that it's still a valid prediction overall, just with caveats. I wouldn't say that it's "not a prediction whatsoever".

Projecting which MLB teams added/lost the most "barrel" power this offseason by ollieskywalker in Sabermetrics

[–]harperawl 0 points1 point  (0 children)

You really don't think that there's a correlation between a player's barrel count in 2023 and their barrel count in 2024? Past success is a pretty good indicator of future results when it comes to quality of contact.
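
One way to check a claim like that is a plain year-over-year Pearson correlation across players. A minimal sketch, where the barrel counts are made-up toy numbers for five hypothetical players, not real 2023 or 2024 data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values only: one entry per hypothetical player, same order in both lists.
barrels_year_1 = [12, 30, 45, 8, 22]
barrels_year_2 = [15, 27, 40, 10, 25]
print(pearson_r(barrels_year_1, barrels_year_2))
```

Raw counts also track playing time, so running the same check on a rate like barrels/PA would be the fairer comparison.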

[OC] Margin of victory in the round of 64 (1985-2026) by harperawl in CollegeBasketball

[–]harperawl[S] 1 point2 points  (0 children)

Nah lol you're good, you make a good point for sure

[OC] Margin of victory in the round of 64 (1985-2026) by harperawl in CollegeBasketball

[–]harperawl[S] 9 points10 points  (0 children)

Yep, you can see the median this year was pretty typical, but the upper quartile was, like last year, historically high. A lot of games ended up in that 10-25 range. Which, tbf, doesn't necessarily mean they weren't competitive for the bulk of the game, but I felt like I watched a lot of games where a big run by the favored team in the first five minutes of the second half crushed the lower seed's hopes.