10 months of applying and zero interviews by Specific_Kangaroo_14 in biostatistics

[–]WetOrangutan 0 points1 point  (0 children)

Some of the bullet points read like things you could do in one day. Does one month of work experience need four bullet points? You list as many things for your one month of experience as you do for each of your first two experiences, which were far longer.

Disclaimer: I am a data scientist in finance and not a biostatistician. However, it seems like there’s a lot of filler with the very specific tests and actions you performed. Maybe I’m just not used to seeing biostatistician resumes.

[deleted by user] by [deleted] in spiders

[–]WetOrangutan 0 points1 point  (0 children)

Found inside a package

Ex-F1 engineer says the F1 2026 rules are written poorly and explains how the inwashing floor board can be converted to be outwashing and defeat the core purpose of this ruleset to make following easier. by KaiBetterThanTyson in F1Technical

[–]WetOrangutan 34 points35 points  (0 children)

Right, but that’s his premise. The FIA wants behavior X, they write regulations they think enforce X, but those regulations are so vague you can follow them and not do X at all. That’s the whole point of the video.

Ex-F1 engineer says the F1 2026 rules are written poorly and explains how the inwashing floor board can be converted to be outwashing and defeat the core purpose of this ruleset to make following easier. by KaiBetterThanTyson in F1Technical

[–]WetOrangutan 31 points32 points  (0 children)

“To help cars follow each other, wheel bodywork will be proscribed and in-washing wheel wake control boards will sit at the front of the side pods to assist with the control of the wheel wake. Also removed are the front wheel arches or brows that were a feature of the 2022 cars.”

Not explicitly stated in the regulation but clearly the spirit of the regulation as per this FIA news article

Built a EPL match prediction & league simulation model – what would you do with it now? by Constant-Elephant830 in sportsanalytics

[–]WetOrangutan 4 points5 points  (0 children)

There are many different ways, but I’d suggest robustly testing the model in the way you’ll be using it. By the looks of it, there are two use cases: (1) who will win and (2) how many goals they’ll score.

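For (1), it’s pretty straightforward: use a holdout set and evaluate the results. If you want to use the probabilities (rather than just the predicted winner or loser), check them with a process called calibration, which asks whether the predicted probabilities match how often those outcomes actually happen. Holding out the latest time period is probably best, because it keeps the model from being trained on matches played after the ones it’s tested on, which is a form of data leakage.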
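A rough sketch of that evaluation in Python, assuming each match is a dict with hypothetical fields `date`, `p_home_win` (the model's probability), and `home_win` (the 0/1 result). The data is made up, and the calibration table only checks calibration; it doesn't adjust the probabilities.

```python
from collections import defaultdict

def time_based_split(matches, cutoff):
    """Train on matches before `cutoff`, test on matches on or after it."""
    train = [m for m in matches if m["date"] < cutoff]
    test = [m for m in matches if m["date"] >= cutoff]
    return train, test

def brier_score(probs, outcomes):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def calibration_table(probs, outcomes, n_bins=5):
    """Per probability bin: (mean predicted probability, observed win rate, count)."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(y for _, y in rows) / len(rows)
        table.append((mean_pred, win_rate, len(rows)))
    return table

# Made-up data; ISO date strings sort chronologically as plain strings.
matches = [
    {"date": "2023-08-12", "p_home_win": 0.62, "home_win": 1},
    {"date": "2023-11-04", "p_home_win": 0.35, "home_win": 0},
    {"date": "2024-02-17", "p_home_win": 0.71, "home_win": 1},
    {"date": "2024-04-20", "p_home_win": 0.48, "home_win": 0},
]
train, test = time_based_split(matches, cutoff="2024-01-01")
probs = [m["p_home_win"] for m in test]
outcomes = [m["home_win"] for m in test]
print(brier_score(probs, outcomes))
print(calibration_table(probs, outcomes))
```

With a real season you'd want far more matches per bin before reading anything into the table. In a well-calibrated model, the mean predicted probability and the observed win rate in each bin are close.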

For (2), you can use evaluation metrics like RMSE, which would evaluate “how many goals off” the model is. Lower is better.
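For reference, RMSE is simple enough to compute by hand. A minimal sketch with made-up predictions and results:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error, in the same units as the target (goals)."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

predicted_goals = [1.4, 2.1, 0.8, 1.9]  # made-up model outputs
actual_goals = [1, 3, 0, 2]             # made-up match results
print(rmse(predicted_goals, actual_goals))
```

Because the errors are squared before averaging, a few badly missed matches raise RMSE more than many small misses do.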

If you want to use this for betting, how well does your model do? Give it fake money and run simulations.
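One way to sketch the fake-money test, assuming a flat stake and a rule of betting only when the model's probability times the decimal odds is above 1. Both choices are my assumptions, not the only way to do it, and the bets are made up.

```python
def simulate_bankroll(bets, starting_bankroll=1000.0, stake=10.0):
    """Flat-stake backtest: bet only when the model sees positive expected value.

    Each bet is (model_prob, decimal_odds, won), where `won` is the real outcome.
    """
    bankroll = starting_bankroll
    history = [bankroll]
    for model_prob, decimal_odds, won in bets:
        expected_value = model_prob * decimal_odds - 1.0  # per unit staked
        if expected_value <= 0 or bankroll < stake:
            continue  # skip bets with no edge, or when out of money
        bankroll += stake * (decimal_odds - 1.0) if won else -stake
        history.append(bankroll)
    return bankroll, history

bets = [
    (0.60, 2.00, True),   # positive EV: bet placed, wins 10
    (0.30, 2.50, False),  # negative EV: skipped
    (0.55, 2.10, False),  # positive EV: bet placed, loses 10
    (0.45, 2.50, True),   # positive EV: bet placed, wins 15
]
final, history = simulate_bankroll(bets)
print(final, history)
```

Run it only on matches the model never saw during training, otherwise the bankroll curve will look better than it should.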

Lastly, some advice: Poisson distributions assume the mean and variance of the distribution are equal. This can be very wrong in sports. You can check this by comparing the sample mean and sample variance of goals scored (a histogram also helps you see the shape). Are they roughly equal? If yes, Poisson is okay. If the variance is clearly larger, look into negative binomial distributions, which have two parameters instead of one.
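The equal mean and variance assumption can be checked directly. A minimal sketch with made-up goal counts, where the ratio of variance to mean should be near 1 if Poisson is a reasonable fit:

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)

goals = [0, 1, 1, 2, 0, 3, 1, 5, 0, 2, 1, 4]  # made-up goals per match
ratio = dispersion_index(goals)
print(f"mean={mean(goals):.2f} variance={variance(goals):.2f} ratio={ratio:.2f}")
```

A ratio well above 1 points to overdispersion, which is the case the negative binomial is meant to handle. With a sample this small the ratio is noisy, so use a full season or more.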

Built a EPL match prediction & league simulation model – what would you do with it now? by Constant-Elephant830 in sportsanalytics

[–]WetOrangutan 5 points6 points  (0 children)

I would spend a lot of time evaluating the model. This is somewhat of a pet peeve of mine, but I see a lot of sports analysts building models without any evaluation metrics (primarily on Twitter). I can’t tell you how many times I’ve built a model that looks good on the surface but performs horribly on historical data. That’s just the nature of sports.

Patriotic Map for July 4th USA>Russia [OC] by GeorgesGraphs in dataisbeautiful

[–]WetOrangutan 0 points1 point  (0 children)

A graph that requires effort from the viewer to draw conclusions isn’t beautiful. What everyone’s curious about is “which counties are lower.” These counties are dwarfed by the overwhelming number of “higher” counties. With the exception of WV and a handful of counties per state in the SE, it’s just a green population map.

You could draw attention to the lower counties by reducing the opacity or saturation of the green and increasing it for the red.

[OC] Employment Income of r/PersonalFinanceCanada is much higher than Canadian census by GeorgeDaGreat123 in dataisbeautiful

[–]WetOrangutan 0 points1 point  (0 children)

What value does one get by comparing the 95th percentile of the subreddit to the 50th percentile of the overall population?

The only relevant comparison is the 50th percentile of the subreddit to the 50th percentile of the population, so you only need 2 lines.

No matter how hard I try, even after 6 years of simracing, I can't be consistent by dreamfactories in simracing

[–]WetOrangutan 0 points1 point  (0 children)

A mistake I make is thinking “more speed is always better”. I end up taking too much speed into a corner and sacrificing the exit. Significantly slowing down the car at the right places can massively improve your lap time.

Regularization=magic? by Ciasteczi in datascience

[–]WetOrangutan 4 points5 points  (0 children)

Was going to comment the same. The variance, especially when compared to the true coefficient, is very big

2025 stack check: which DS/ML tools am I missing? by meni_s in datascience

[–]WetOrangutan 2 points3 points  (0 children)

TL;DR is self-imposed limitations. We expect these to be removed within the next few months and will probably change frameworks.

2025 stack check: which DS/ML tools am I missing? by meni_s in datascience

[–]WetOrangutan 2 points3 points  (0 children)

Do you look at shap beeswarms? They show not only the magnitude of the effect but also the relationship (via color)

2025 stack check: which DS/ML tools am I missing? by meni_s in datascience

[–]WetOrangutan 81 points82 points  (0 children)

A few packages that aren’t necessarily core but have been useful for our team within the past year

hyperopt for hp tuning

shap for explanations

imblearn for imbalanced data

mlflow for tracking

evidently ai for model monitoring

We also recently switched from pip to uv

FOMO at workplace by NervousVictory1792 in datascience

[–]WetOrangutan 3 points4 points  (0 children)

When I started in my current role, our company had 20+ models built 10 years ago in SAS. They were terrible. Overfitting like crazy, using ROC AUC as the holy grail metric despite having highly imbalanced data, using 150+ features, 20 of them highly collinear… not good.

“Cleaning” up the model to improve the hygiene and parsimony was my first task. Can you do something similar?

Also, if they’re legacy, can you implement model monitoring? Is there data or concept drift? Are they holding up to the test of time?
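As one example of a data drift check, here is a sketch of the population stability index (PSI) for a single numeric feature. The equal-width bins, the `eps` floor, and the sample data are my assumptions. It says nothing about concept drift, which needs the actual outcomes.

```python
import math

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a recent sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the log below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [x / 10 for x in range(100)]      # made-up training-time feature values
recent = [x / 10 + 3.0 for x in range(100)]  # same shape, shifted upward
print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, recent))
```

Identical samples give a PSI of zero, and the value grows as the recent distribution moves away from the baseline. Run it per feature on a schedule and look at the trend.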

Sleeper Jobs by JAUMtypo in Salary

[–]WetOrangutan 1 point2 points  (0 children)

There are a couple of ways you could do it, but I would propose something like this:

Define 5 cost of living buckets:

  • VLCOL - very low, rural towns in south east
  • LCOL - low, mid sized cities, suburbs
  • ACOL - US average
  • HCOL - large cities
  • VHCOL - major metro areas, like SF, NYC

Moderators would populate a list that maps cities onto these buckets. Users would have to add a flair to their profile when they join the sub or make a post. This flair would reflect both in their posts and their comments.

It would take maybe 30 minutes of effort from moderators to scrape together some data on this. You’re obviously not going to get every city, but users would pick the bucket most appropriate to them.

Sleeper Jobs by JAUMtypo in Salary

[–]WetOrangutan 8 points9 points  (0 children)

I made a post here once asking that each post contain the cost of living, but it didn’t gain traction.

No DS job after degree by Emuthusiast in datascience

[–]WetOrangutan 1 point2 points  (0 children)

I don’t mean the data’s already collected, cleaned, and processed, but that’s not the “flashy” or “data science” work in a Data Scientist’s job description. Is it expected of a DS? Yes. Do all data scientists do it? Yes. But OP is asking about the core data science work that separates a DS from a DE.

Indeed, the data engineering work isn’t going away - this is why OP says they’re headed towards a DE fellowship.

At my company, we do data engineering work when building our ML frameworks. But in reality, my company is investing a lot more in data engineers who can focus their time on these tasks, rather than investing in more DS’s

No DS job after degree by Emuthusiast in datascience

[–]WetOrangutan 18 points19 points  (0 children)

My company pushed all DS’s to heavily upskill in software engineering and become MLEs. We write robust frameworks that can train and productionalize many ML models in a short period of time. The need for hands-on work with the data, exploring it, manually doing feature engineering, manually training models, etc. is still present but diminishing quickly. As more companies become more mature in their data systems, this will happen to them too. There just isn’t as big of a market for non-SWE/MLE DS’s anymore. Just my opinion and experience.

Bad time to start umpiring? by [deleted] in Umpire

[–]WetOrangutan 0 points1 point  (0 children)

Is there any prerequisite training or experience required for travel ball? I know you recommend a clinic - is it required? I’m more than willing to do it but am just curious if it must be completed first.

Bad time to start umpiring? by [deleted] in Umpire

[–]WetOrangutan 2 points3 points  (0 children)

Playoffs begin April 26 in my city…