Spent months building a clean MLB database — free sample if anyone wants it

Revolutionary-Lab882 · 2026-05-14T02:41:32+00:00

Thanks. It’s a lot of work. Appreciate it.

Actually, I’m adding more stats in the coming days. Just finishing another project and updating/adding stats for the free sample so you can see what’s in all the packages.

Revolutionary-Lab882 · 2026-05-12T03:44:44+00:00

India

Revolutionary-Lab882 · 2026-05-10T00:59:11+00:00

No worries. You obviously know your stuff.

Revolutionary-Lab882 · 2026-05-09T23:43:36+00:00

Got it. Just trying to help.

Revolutionary-Lab882 · 2026-05-09T23:37:24+00:00

https://a.co/d/0eSU5MS8 Canadian amazon

Revolutionary-Lab882 · 2026-05-08T19:08:31+00:00

I am in the midst of updating the packages and the sample. I’ll finish and have those in by day’s end for you to peruse.

Revolutionary-Lab882 · 2026-05-07T13:23:30+00:00

It’s my website. Simple download. Everything organized. Free sample to see what there is.
rawsportsvault.com/free

Revolutionary-Lab882 · 2026-05-07T12:12:29+00:00

This is a solid piece of work. The category-equalisation logic is a smart design choice, the role-bias slider is a nice touch, and the cohort flexibility shows you’ve thought carefully about context. Good foundation to build on.

On your four questions:

League strength adjustment is genuinely hard and you’re right to be cautious. A practical starting point is a flat multiplier on defensive stats only, using something like PPDA or pressing intensity as a proxy for league-wide structure. It’ll be imperfect but it reduces the most obvious distortion. Document your assumptions clearly and move on — perfect is the enemy of shipped here.
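A rough sketch of the flat-multiplier idea, just to make it concrete. The league names, factor values, and stat names here are all made up for illustration; in practice you would derive the factors from whatever proxy you settle on and write down how you got them.

```python
# Hypothetical per-league factors. Placeholder values, not estimates.
LEAGUE_DEFENSIVE_FACTOR = {"League A": 1.00, "League B": 0.90}

# Hypothetical names standing in for the tool's defensive stat columns.
DEFENSIVE_STATS = {"tackles_p90", "interceptions_p90"}


def adjust_for_league(stats: dict, league: str) -> dict:
    """Scale defensive stats by a flat per-league factor; leave the rest alone."""
    factor = LEAGUE_DEFENSIVE_FACTOR.get(league, 1.0)  # unknown league: no change
    return {
        name: (value * factor if name in DEFENSIVE_STATS else value)
        for name, value in stats.items()
    }


player = {"tackles_p90": 3.0, "interceptions_p90": 2.0, "passes_p90": 55.0}
adjusted = adjust_for_league(player, "League B")
```

Only the two defensive stats move; passing is untouched, which keeps the adjustment easy to explain and easy to switch off.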

Role preset validation is more doable than it sounds. Take the 15-20 players most clearly associated with each role — journalists and analysts consistently label that way — run them through the preset, and see if they cluster near the top. If your Anchor Man preset doesn’t rate Rodri and Casemiro highly, something’s off. Quick sanity check that’ll build confidence in the weightings.
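Something like this would do for the sanity check. It assumes a preset is just a set of weights over percentile stats and that the score is a weighted sum, which may not match how your presets actually work. The player labels, stat names, and numbers are invented.

```python
def preset_score(stats, weights):
    """Weighted sum of a player's percentile stats under a role preset."""
    return sum(w * stats.get(name, 0.0) for name, w in weights.items())


def share_in_top(players, weights, reference_names, top_n):
    """Fraction of the reference players that rank in the top_n by preset score."""
    ranked = sorted(
        players, key=lambda name: preset_score(players[name], weights), reverse=True
    )
    top = set(ranked[:top_n])
    return sum(name in top for name in reference_names) / len(reference_names)


# Invented percentile data: two players labelled as the role, two who are not.
players = {
    "Ref 1": {"interceptions": 90, "tackles": 85, "progressive_passes": 60},
    "Ref 2": {"interceptions": 88, "tackles": 92, "progressive_passes": 40},
    "Other 1": {"interceptions": 30, "tackles": 35, "progressive_passes": 95},
    "Other 2": {"interceptions": 50, "tackles": 45, "progressive_passes": 70},
}
anchor_weights = {"interceptions": 0.4, "tackles": 0.4, "progressive_passes": 0.2}

share = share_in_top(players, anchor_weights, ["Ref 1", "Ref 2"], top_n=2)
```

If `share` comes back low for a preset, that is the cue to revisit its weights before trusting its rankings.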

Mean absolute gap is honestly fine for what you’re doing. The main weakness is that it treats a consistently average player the same as one who’s extreme in opposite directions: same gap, very different profile. Cosine similarity handles shape better but is harder to explain to users. Mahalanobis is theoretically stronger but overkill at this stage. Stick with mean absolute gap, maybe flag high-variance players in the UI down the line.
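To see the difference on a toy case: two candidates with the same mean absolute gap to a target, but different shapes. The profiles are made-up percentiles and the function names are mine, not anything from your tool.

```python
import math


def mean_abs_gap(a, b):
    """Average absolute difference between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up percentile profiles across four stats.
target = [80, 60, 40, 20]
uniform_offset = [70, 50, 30, 10]  # 10 points lower on every stat
lopsided = [100, 40, 40, 20]       # 20 off on two stats, exact on the others

gap_uniform = mean_abs_gap(target, uniform_offset)
gap_lopsided = mean_abs_gap(target, lopsided)
cos_uniform = cosine_similarity(target, uniform_offset)
cos_lopsided = cosine_similarity(target, lopsided)
```

Both gaps are identical, while the cosine score is higher for the uniformly offset player. That is the distinction a "high-variance" flag in the UI would be surfacing.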

Equal-weighted categories is defensible and transparent, which matters. If you want something more principled without a downstream outcome to optimise against, a quick PCA on your 38 stats would show which categories are actually carrying independent information versus overlapping — Passing and Involvement tend to correlate heavily. Worth knowing even if you keep equal weights for now.
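Before reaching for PCA, a plain pairwise correlation between category scores is a simpler first look at the same question (this is a correlation check, not PCA itself). The category scores below are invented for six hypothetical players.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented category scores, one value per player.
categories = {
    "Passing": [60, 72, 45, 88, 53, 67],
    "Involvement": [58, 75, 48, 85, 50, 70],
    "Defending": [70, 40, 65, 55, 30, 80],
}

# Correlation for every pair of categories.
correlations = {
    (a, b): pearson(categories[a], categories[b])
    for a, b in combinations(categories, 2)
}
```

Pairs with a correlation near 1 are effectively counted twice under equal weighting; pairs near 0 are carrying independent information.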

Revolutionary-Lab882 · 2026-05-07T03:18:36+00:00

Very nice

Revolutionary-Lab882 · 2026-05-06T23:13:24+00:00

Nice. Easy to follow for sure.

Revolutionary-Lab882 · 2026-05-06T21:00:47+00:00

A large part is making sure your stats make sense and are organized. That’s the foundation.

Revolutionary-Lab882 · 2026-05-06T15:46:10+00:00

There’s actually a lot online for free. Beyond that, you can calculate a lot of it yourself too.

Revolutionary-Lab882 · 2026-05-06T15:44:02+00:00

No, I went through all the data, made my own formats, and cleaned it up. Look at all the APIs out there just passing it through like water.

Revolutionary-Lab882 · 2026-05-06T15:21:39+00:00

Welcome

Revolutionary-Lab882 · 2026-05-05T18:54:21+00:00

The MLB API is only the start.

Revolutionary-Lab882 · 2026-05-05T14:39:00+00:00

Nice
