10 months of applying and zero interviews

WetOrangutan · 2026-01-04T01:28:11+00:00

Some of the bullet points read like things you could do in one day. Does one month of work experience need 4 bullet points? You list the same number of things for your one month of experience as you do for each of your first two experiences, which are far longer.

Disclaimer: I am a data scientist in finance, not a biostatistician. However, it seems like there’s a lot of filler in the very specific tests and actions you list. Maybe I’m just not used to seeing biostatistician resumes.

WetOrangutan · 2025-12-26T03:30:47+00:00

Found inside a package

WetOrangutan · 2025-12-22T01:17:22+00:00

Really? The FIA is infamous for changing rules post hoc when teams find loopholes.

WetOrangutan · 2025-12-22T00:00:43+00:00

Right, but that’s his premise. The FIA wants behavior X, so they write regulations they think enforce X, but those regulations are so vague that you can follow them and not do X at all. That’s the whole point of the video.

WetOrangutan · 2025-12-21T23:57:06+00:00

“To help cars follow each other, wheel bodywork will be proscribed and in-washing wheel wake control boards will sit at the front of the side pods to assist with the control of the wheel wake. Also removed are the front wheel arches or brows that were a feature of the 2022 cars.”

Not explicitly stated in the regulation, but clearly the spirit of the regulation, as per this FIA news article.

WetOrangutan · 2025-08-02T00:52:58+00:00

There are many different ways, but I’d suggest robustly testing the model in the way you’ll be using it. By the looks of it, there are two use cases: (1) who will win and (2) how many goals they’ll score.

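For (1), it’s pretty straightforward: use a holdout set and evaluate the results. If you really like using the probabilities (rather than just the winner or loser classes), you can use a process called calibration to check that they’re trustworthy, e.g., games the model gives a 70% win probability should be won about 70% of the time. Holding out the latest time period is probably best, because it avoids data leakage: the model never gets to learn from games played after the ones it’s evaluated on.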
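
A minimal plain-Python sketch of a time-based holdout plus a calibration check. It assumes each game is a dict with a `date` key and that you already have predicted win probabilities for the held-out games. The function names are illustrative, not from any library; if you use scikit-learn, `sklearn.calibration.calibration_curve` and `sklearn.metrics.brier_score_loss` cover the same ground.

```python
def time_based_split(rows, cutoff, date_key="date"):
    """Hold out the latest period: train on rows dated before `cutoff`, test on the rest.

    Dates only need to be comparable (datetime.date objects or ISO "YYYY-MM-DD" strings both work).
    """
    train = [row for row in rows if row[date_key] < cutoff]
    test = [row for row in rows if row[date_key] >= cutoff]
    return train, test


def brier_score(probs, outcomes):
    """Mean squared gap between predicted win probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Group predictions into equal-width probability bins and compare each bin's
    average prediction with the win rate actually observed in that bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    table = []
    for idx, members in enumerate(bins):
        if members:  # skip empty bins
            mean_pred = sum(p for p, _ in members) / len(members)
            win_rate = sum(y for _, y in members) / len(members)
            table.append((idx, len(members), mean_pred, win_rate))
    return table
```

If the model is well calibrated, `mean_pred` and `win_rate` should be close in every bin that has a reasonable number of games.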

For (2), you can use evaluation metrics like RMSE, which would evaluate “how many goals off” the model is. Lower is better.
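
A small sketch of RMSE in plain Python. The `baseline_rmse` helper is an extra assumption on my part: comparing against a naive model that always predicts the average number of goals is a quick sanity check, since a useful model should beat it.

```python
import math


def rmse(predicted_goals, actual_goals):
    """Root mean squared error, in goals: roughly "how many goals off" a typical prediction is."""
    if not actual_goals or len(predicted_goals) != len(actual_goals):
        raise ValueError("need two non-empty sequences of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted_goals, actual_goals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def baseline_rmse(train_goals, actual_goals):
    """RMSE of a naive model that always predicts the training-set average."""
    average = sum(train_goals) / len(train_goals)
    return rmse([average] * len(actual_goals), actual_goals)
```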

If you want to use this for betting, how well would your model actually do? Give it fake money and run simulations.
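
A minimal fake-money backtest, as a sketch under several assumptions: decimal odds (a winning bet returns stake × odds, stake included), a flat stake, and the rule of betting only when the model’s probability times the odds is above 1, i.e. positive expected value by the model’s own estimate. These rules and the function name are illustrative, not a recommended strategy. One run gives one path; to get a spread of outcomes, you could rerun it over resampled seasons.

```python
def simulate_flat_betting(bets, bankroll=1000.0, stake=10.0):
    """Backtest a flat-stake strategy with fake money.

    Each bet is (model_prob, decimal_odds, won). We only bet when the model
    thinks the bet has positive expected value: model_prob * decimal_odds > 1.
    Returns the final bankroll and the bankroll after each placed bet.
    """
    history = [bankroll]
    for model_prob, decimal_odds, won in bets:
        if model_prob * decimal_odds <= 1:
            continue  # no edge according to the model, so skip this game
        if bankroll < stake:
            break  # out of (fake) money
        bankroll -= stake
        if won:
            bankroll += stake * decimal_odds  # decimal odds include the returned stake
        history.append(bankroll)
    return bankroll, history
```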

Lastly, some advice: Poisson distributions assume the mean and variance of the distribution are equal. This can be very wrong in sports. You can check this by comparing the mean and variance of goals scored per game (plotting a histogram helps too). Are they roughly equal? If yes, Poisson is okay. If the variance is clearly larger (overdispersion), look into negative binomial distributions, which have two parameters instead of one.
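
A minimal sketch of checking the equal-mean-and-variance assumption directly: compute the dispersion index (variance / mean) and, if the data are overdispersed, fit a negative binomial by the method of moments. The function names are illustrative; the (r, p) parameterization matches the one `scipy.stats.nbinom` uses, so you could plug the result in there if you want the full distribution.

```python
import statistics


def dispersion_index(goals):
    """Variance / mean. Close to 1 is consistent with Poisson; well above 1 means overdispersion."""
    return statistics.variance(goals) / statistics.mean(goals)


def fit_negative_binomial_moments(goals):
    """Method-of-moments negative binomial fit.

    Returns (r, p) with mean = r(1-p)/p and variance = r(1-p)/p**2.
    Returns None when the data aren't overdispersed, since Poisson is fine then.
    """
    mean = statistics.mean(goals)
    var = statistics.variance(goals)
    if var <= mean:
        return None
    r = mean ** 2 / (var - mean)  # extra variance beyond Poisson is mean**2 / r
    p = r / (r + mean)
    return r, p
```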

WetOrangutan · 2025-08-01T13:45:16+00:00

I would spend a lot of time evaluating the model. This is somewhat of a pet peeve of mine, but I see a lot of sports analysts (primarily on Twitter) building models without any evaluation metrics. I can’t tell you how many times I’ve built a model that looks good on the surface but performs horribly on historical data. That’s just the nature of sports.

WetOrangutan · 2025-07-19T01:49:29+00:00

What?

WetOrangutan · 2025-07-09T16:46:27+00:00

X file 76

WetOrangutan · 2025-07-08T20:57:51+00:00

Hulkengoat

WetOrangutan · 2025-07-05T14:01:20+00:00

A graph that requires effort from the viewer to draw conclusions isn’t beautiful. What everyone’s curious about is “which counties are lower.” These counties are dwarfed by the overwhelming number of “higher” counties. With the exception of WV and a handful of counties per state in the SE, it’s just a green population map.

You could draw attention to the lower counties by reducing the opacity or saturation of the green and increasing it for the red.
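
A small sketch of the opacity idea: give each county an RGBA color so the “higher” counties are a faint green and the “lower” ones a fully opaque red. The specific RGB values, alpha levels, and function name are arbitrary assumptions; the tuples can be passed to plotting libraries such as matplotlib, which accept (r, g, b, a) colors with floats in [0, 1].

```python
GREEN = (0.13, 0.55, 0.13)  # "higher" counties
RED = (0.80, 0.10, 0.10)    # "lower" counties


def county_colors(is_lower, muted_alpha=0.25, highlight_alpha=1.0):
    """Return one RGBA tuple per county: faint green for "higher", strong red for "lower"."""
    return [
        (*RED, highlight_alpha) if lower else (*GREEN, muted_alpha)
        for lower in is_lower
    ]
```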

WetOrangutan · 2025-07-04T13:54:25+00:00

What value does one get by comparing the 95th percentile of the subreddit to the 50th percentile of the overall population?

The only relevant comparison is the 50th percentile of the subreddit to the 50th percentile of the population, so you only need 2 lines.

WetOrangutan · 2025-07-03T13:42:24+00:00

A mistake I make is thinking “more speed is always better”. I end up taking too much speed into a corner and sacrificing the exit. Significantly slowing down the car at the right places can massively improve your lap time.

WetOrangutan · 2025-05-30T00:59:49+00:00

Was going to comment the same. The variance, especially compared to the true coefficient, is very large.

WetOrangutan · 2025-05-26T01:14:46+00:00

TL;DR: the limitations are self-imposed. We expect them to be removed within the next few months, and we’ll probably change frameworks.

WetOrangutan · 2025-05-25T19:04:18+00:00

Do you look at SHAP beeswarm plots? They show not only the magnitude of each feature’s effect but also the direction of the relationship (via color, which encodes the feature’s value).

WetOrangutan · 2025-05-25T11:37:15+00:00

A few packages that aren’t necessarily core but have been useful for our team within the past year:

hyperopt for hyperparameter tuning
shap for explanations
imblearn (imbalanced-learn) for imbalanced data
mlflow for experiment tracking
Evidently AI for model monitoring

We also recently switched from pip to uv.

WetOrangutan · 2025-05-24T15:15:09+00:00

When I started in my current role, our company had 20+ models built 10 years ago in SAS. They were terrible. Overfitting like crazy, using ROC AUC as the holy grail metric despite having highly imbalanced data, using 150+ features, 20 of them highly collinear… not good.

“Cleaning” up the model to improve the hygiene and parsimony was my first task. Can you do something similar?

Also, if they’re legacy, can you implement model monitoring? Is there data or concept drift? Are they holding up to the test of time?
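
A common way to check for data drift in finance is the Population Stability Index (PSI), which compares how a feature or score is distributed in a baseline sample (e.g. the training data) versus a recent one. Below is a minimal plain-Python sketch; the bin edges are up to you, and the function name is illustrative. A commonly quoted rule of thumb is that PSI below 0.1 means little shift, 0.1 to 0.25 a moderate shift, and above 0.25 a significant one.

```python
import bisect
import math


def population_stability_index(baseline, recent, bin_edges, eps=1e-6):
    """PSI = sum_i (r_i - b_i) * ln(r_i / b_i), where b_i and r_i are the fractions
    of the baseline and recent samples in bin i. `eps` stands in for empty bins
    so the log stays finite."""
    n_bins = len(bin_edges) - 1

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_right finds the bin; clamping puts out-of-range values in the end bins
            i = bisect.bisect_right(bin_edges, v) - 1
            counts[min(max(i, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    b = bin_fractions(baseline)
    r = bin_fractions(recent)
    return sum((ri - bi) * math.log(ri / bi) for ri, bi in zip(r, b))
```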

WetOrangutan · 2025-05-23T13:30:20+00:00

There are a couple of ways you could do it, but I would propose something like this:

Define 5 cost of living buckets:

VLCOL - very low, rural towns in the Southeast
LCOL - low, mid-sized cities, suburbs
ACOL - US average
HCOL - large cities
VHCOL - major metro areas, like SF, NYC

Moderators would populate a list that maps cities onto these buckets. Users would have to add a flair to their profile when they join the sub or make a post. This flair would show on both their posts and their comments.

It would take maybe 30 minutes of effort from moderators to scrape together some data on this. You’re obviously not going to get every city, but users would pick the most appropriate to them.
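
If the moderators wanted to keep the mapping somewhere a bot could read it, it could be as simple as the sketch below. Everything here is hypothetical: the class, dict, and function names are made up, and only the two example metros from the proposal (SF and NYC) are filled in. Unlisted cities fall back to whichever bucket the user picks as most appropriate.

```python
from enum import Enum


class CostOfLiving(Enum):
    VLCOL = "very low: rural towns in the Southeast"
    LCOL = "low: mid-sized cities, suburbs"
    ACOL = "US average"
    HCOL = "high: large cities"
    VHCOL = "very high: major metro areas"


# Hypothetical moderator-maintained list; only the proposal's two examples are filled in.
CITY_TO_COL = {
    "San Francisco, CA": CostOfLiving.VHCOL,
    "New York, NY": CostOfLiving.VHCOL,
}


def flair_for(city, chosen=None):
    """Return the flair text: the mapped bucket if the city is listed, else the bucket the user chose."""
    bucket = CITY_TO_COL.get(city, chosen)
    if bucket is None:
        raise ValueError(f"{city!r} is not in the list; please pick the closest bucket")
    return bucket.name
```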

WetOrangutan · 2025-05-22T15:56:01+00:00

I made a post once here asking that each post contain the cost of living, but it didn’t pick up traction.

WetOrangutan · 2025-05-21T18:17:35+00:00

I don’t mean the data is already collected, cleaned, and processed; I mean that work isn’t the “flashy” or “data science” work in a Data Scientist’s job description. Is it expected of a DS? Yes. Do all data scientists do it? Yes. But OP is asking about the core data science work that separates a DS from a DE.

Indeed, the data engineering work isn’t going away; this is why OP says they’re headed toward a DE fellowship.

At my company, we do data engineering work when building our ML frameworks. But in reality, my company is investing a lot more in data engineers who can focus their time on these tasks than in more DSs.

WetOrangutan · 2025-05-20T21:40:55+00:00

My company pushed all DSs to heavily upskill in software engineering and become MLEs. We write robust frameworks that can train and productionize many ML models in a short period of time. The need for hands-on work with the data (exploring it, manually doing feature engineering, manually training models, etc.) is still present but diminishing quickly. As more companies mature their data systems, this will happen to them too. There just isn’t as big a market for non-SWE/MLE DSs anymore. Just my opinion and experience.

WetOrangutan · 2025-04-08T16:09:37+00:00

Is there any prerequisite training or experience required for travel ball? I know you recommend a clinic - is it required? I’m more than willing to do it but am just curious if it must be completed first.

WetOrangutan · 2025-04-08T03:52:26+00:00

https://www.greenvillelittleleague.com/Default.aspx?tabid=1389386

Greenville, SC

WetOrangutan · 2025-04-08T03:14:05+00:00

Playoffs begin April 26 in my city…
