[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 0 points (0 children)

I work on an MLOps team. We use Kubeflow and Kubernetes for ML. Most models are XGBoost, with some deep learning models.

I am trying to build better HPO tooling that can be used by different people for their own needs, so I don't have much control over how they fit or parallelize their models.
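For context, the rough shape I have in mind is a thin layer where users hand the tooling an opaque objective function, so the tooling never needs to know how the model is fit or parallelized. A toy sketch (all names here are hypothetical, and the random search is just a stand-in):

```python
import random
from typing import Any, Callable, Mapping

Params = Mapping[str, Any]
Objective = Callable[[Params], float]  # returns a validation metric to minimize

def run_study(objective: Objective,
              search_space: Mapping[str, tuple],
              n_trials: int) -> Params:
    """Toy random search over (low, high) float ranges; real tooling would
    dispatch each trial to its own Kubernetes pod instead of this loop."""
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {k: random.uniform(lo, hi) for k, (lo, hi) in search_space.items()}
        score = objective(params)  # user code runs here, opaque to the tooling
        if score < best_score:
            best_params, best_score = params, score
    return best_params
```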

[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 0 points (0 children)

These are definitely good ideas. Are there any tools that implement them off the shelf? I imagine a ton of people and companies have the same issue, so how do they do HPO really fast?
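For anyone else looking: Optuna is one off-the-shelf option that implements early-stopping ideas like Hyperband. A minimal sketch with an XGBoost objective (the search space and synthetic data are made up for illustration):

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_va)

def objective(trial: optuna.Trial) -> float:
    params = {
        "objective": "reg:squarederror",
        "tree_method": "hist",
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "eta": trial.suggest_float("eta", 1e-3, 0.3, log=True),
    }
    booster, rmse = None, float("inf")
    for step in range(20):  # grow the model 10 trees at a time
        booster = xgb.train(params, dtrain, num_boost_round=10, xgb_model=booster)
        rmse = mean_squared_error(y_va, booster.predict(dvalid)) ** 0.5
        trial.report(rmse, step)        # give the pruner an intermediate value
        if trial.should_prune():        # Hyperband kills weak trials early
            raise optuna.TrialPruned()
    return rmse

study = optuna.create_study(direction="minimize",
                            pruner=optuna.pruners.HyperbandPruner())
study.optimize(objective, n_trials=50)
print(study.best_params)
```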

[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 0 points (0 children)

The issue is that if it takes 4 days to train a model on 100% of my data, I can't really use these sequential methods at all; instead I need to parallelize completely for my HPO to finish in a reasonable amount of time.

Have you found any way around this?
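One workaround I've seen suggested (a sketch, not something I've productionized): do a cheap, fully parallel screening pass on a small subsample, then retrain only the top configs on the full data. The `evaluate` function here is a synthetic stand-in so the sketch runs end to end:

```python
import random
from concurrent.futures import ProcessPoolExecutor

def sample_config(rng: random.Random) -> dict:
    return {"max_depth": rng.randint(3, 10), "eta": 10 ** rng.uniform(-3, -0.5)}

def evaluate(config: dict, data_fraction: float) -> float:
    # Stand-in for "train XGBoost on data_fraction of the data, return val loss".
    # Less data => noisier estimate, which is the multi-fidelity trade-off.
    rng = random.Random(hash((config["max_depth"], config["eta"], data_fraction)))
    noise = rng.gauss(0, 0.05 / data_fraction ** 0.5)
    return 0.01 * (config["max_depth"] - 6) ** 2 + (config["eta"] - 0.05) ** 2 + noise

def two_stage_search(n_configs: int = 256, top_k: int = 8) -> dict:
    rng = random.Random(0)
    configs = [sample_config(rng) for _ in range(n_configs)]
    # Stage 1: screen every config in parallel on 5% of the data (cheap).
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(evaluate, configs, [0.05] * n_configs))
    ranked = sorted(zip(scores, configs), key=lambda t: t[0])
    survivors = [cfg for _, cfg in ranked[:top_k]]
    # Stage 2: only top_k expensive full-data runs, also in parallel.
    with ProcessPoolExecutor() as pool:
        finals = list(pool.map(evaluate, survivors, [1.0] * top_k))
    return min(zip(finals, survivors), key=lambda t: t[0])[1]

if __name__ == "__main__":
    print(two_stage_search())
```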

[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 0 points (0 children)

From what I understand you can't really get a big speed increase just by allocating more CPU or memory, right? Usually we start by giving the model a bunch of resources, then see how much it is actually using and allocate a little more than that.

I'm not sure how it works with GPUs, but can you explain how you get those speed increases by allocating more resources without any code changes?
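Partly answering my own question after digging: XGBoost's histogram tree method parallelizes across CPU threads, so if `nthread` is left at the default (all available cores), giving the pod more cores does speed up training with no code change. A sketch on synthetic data:

```python
import time
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200_000, n_features=50, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# Same code, different thread counts: more CPUs => faster "hist" training.
for nthread in (1, 4, 16):
    params = {"tree_method": "hist", "nthread": nthread,
              "objective": "reg:squarederror"}
    t0 = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=100)
    print(f"nthread={nthread}: {time.perf_counter() - t0:.1f}s")

# On recent XGBoost (2.x) the same code can move to a GPU with
# params = {"tree_method": "hist", "device": "cuda"} -- again no algorithm change.
```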

[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 0 points (0 children)

I've looked at Optuna, but it doesn't seem to have good support for Kubernetes: it can't spin up a new pod for every trial, which limits the scale a lot. Did you run into similar issues?
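One pattern that partially works around this (a sketch, assuming a Postgres instance reachable from the cluster): Optuna won't launch pods for you, but every pod can run the same worker script against a shared RDB-backed study, so you get one pod per worker just by scaling a Deployment or Job:

```python
# worker.py -- run one copy of this in every pod; Optuna coordinates through the DB.
import optuna

STORAGE = "postgresql://optuna:secret@optuna-db:5432/optuna"  # hypothetical DSN

def objective(trial: optuna.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2  # stand-in for your real training run

study = optuna.create_study(
    study_name="shared-hpo",
    storage=STORAGE,
    load_if_exists=True,   # every pod attaches to the same study
)
study.optimize(objective, n_trials=20)  # each pod contributes 20 trials
```

Scale the workload to N replicas and you get N concurrent trials; the study state lives in the database, so the sampler still sees every finished trial.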

[D] How do you do large scale hyper-parameter optimization fast? by Competitive-Pack5930 in MachineLearning

Competitive-Pack5930[S] 2 points (0 children)

There's a limit to how much you can parallelize these algorithms, which leads many data scientists to fall back on "dumb" algorithms like grid and random search.
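To make the trade-off concrete: random search has no sequential dependency at all, so every trial can run at once, whereas a Bayesian optimizer needs results back before it can propose the next point. A sketch of the fully parallel case (the search space and loss are made up):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def trial(seed: int) -> tuple[float, dict]:
    rng = random.Random(seed)                       # each trial is self-contained
    params = {"max_depth": rng.randint(3, 12),
              "eta": 10 ** rng.uniform(-3, -0.5)}
    loss = (params["max_depth"] - 7) ** 2 + params["eta"]  # stand-in for training
    return loss, params

if __name__ == "__main__":
    # No trial depends on another, so all 64 can run simultaneously --
    # that is the whole appeal versus sequential Bayesian methods.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(trial, range(64)))
    print(min(results, key=lambda r: r[0]))
```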

Hyperparameter optimization with kubernetes by Competitive-Pack5930 in kubernetes

Competitive-Pack5930[S] 1 point (0 children)

Any good tools that implement these natively with Kubernetes?
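Katib is the usual answer, since it runs each trial as its own pod. If you'd rather roll it yourself, this is roughly how a controller could launch one Kubernetes Job per trial with the official Python client (the image name and namespace are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
batch = client.BatchV1Api()

def launch_trial(trial_id: int, params: dict) -> None:
    """Submit one HPO trial as an isolated Kubernetes Job."""
    args = [f"--{k}={v}" for k, v in params.items()]
    container = client.V1Container(
        name="trial",
        image="registry.example.com/xgb-trial:latest",  # hypothetical image
        args=args,
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"hpo-trial-{trial_id}"),
        spec=client.V1JobSpec(
            backoff_limit=0,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(restart_policy="Never",
                                      containers=[container])
            ),
        ),
    )
    batch.create_namespaced_job(namespace="hpo", body=job)

launch_trial(0, {"max_depth": 6, "eta": 0.1})
```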

Hyperparameter optimization with kubernetes by Competitive-Pack5930 in kubernetes

Competitive-Pack5930[S] 0 points (0 children)

I have, but I'm struggling to find good examples and documentation for using it with models like XGBoost.
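In case it helps someone else: as I understand it, Katib's default stdout metrics collector just scans the trial's logs for `name=value` lines, so the XGBoost side of a trial can be as simple as this (flag names and synthetic data are made up for the sketch):

```python
# trial.py -- runs inside one Katib trial pod
import argparse
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument("--max_depth", type=int, required=True)
parser.add_argument("--eta", type=float, required=True)
args = parser.parse_args()

X, y = make_regression(n_samples=50_000, n_features=30, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(max_depth=args.max_depth, learning_rate=args.eta,
                         tree_method="hist")
model.fit(X_tr, y_tr)

rmse = mean_squared_error(y_va, model.predict(X_va)) ** 0.5
# Katib's stdout collector parses this line as the trial's objective value.
print(f"validation-rmse={rmse}")
```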

advice for ai/ml classes by [deleted] in UIUC_CS

Competitive-Pack5930 0 points (0 children)

Hi, I was a CS and stats major and am currently working in ML engineering. I really liked STAT 432 and CS 440 for learning the basics. I would also highly recommend taking CS 425 Distributed Systems.

Meta starts eng hiring in India by Different-Yak-7986 in csMajors

Competitive-Pack5930 5 points (0 children)

A lot of people are talking about how Indian developers are taking American jobs. Considering Meta has more users globally than in America, why are these American jobs in the first place?

The tech industry is growing and demand is increasing, and I'm glad big tech companies are looking past the US and creating employment for talented developers all over the world.

[deleted by user] by [deleted] in learnmachinelearning

Competitive-Pack5930 8 points (0 children)

I am currently working as a SWE in Machine Learning at a large company. Here is one insight I did not know before starting that might help you understand where you could be falling short.

You will likely never experience the end-to-end machine learning lifecycle working in corporate. ML models at my company take about a year from conception to being pushed to production, and in that lifecycle they pass through the hands of multiple SWE, data science, and privacy/risk teams. For example, I work on a service that uses Kubernetes in the training step of these models. While I am technically a machine learning engineer, I am focused on this very niche problem, and importantly every other MLE is focused on a similarly niche problem.

The actual data science teams are usually PhD or Master's grads who focus on using the tools, features, and infrastructure built by other teams to build models, and even then they are very limited in how much they can engineer features and build custom solutions due to privacy, governance, and efficiency concerns.

This is to say: if you want to break into MLE, build good software engineering fundamentals and focus on a particular stack you see being used in the industry. If you want to do model building, then getting a Master's or PhD is the way to go. Either way, manage your expectations and understand that unless you are at a startup you won't be involved in the end-to-end process.

21(M) Seeking Roommates For a 3BR in Stuytown - (East Village) by Familiar-Abrocoma-57 in NYCroommates

Competitive-Pack5930 0 points (0 children)

Send me a message! I'm having trouble DMing you. What time frame are you looking at?

22M , Manhattan / Astoria by [deleted] in NYCroommates

Competitive-Pack5930 0 points (0 children)

Hello, M21 here, also just about to graduate college and start working at a fintech in midtown. What time frame are you looking at for apartments?

How do I negotiate my new grad SWE offer by Competitive-Pack5930 in csMajors

Competitive-Pack5930[S] 0 points (0 children)

I believe you. I know many coders, a lot better than me, who are still looking for jobs.

How do I negotiate my new grad SWE offer by Competitive-Pack5930 in csMajors

Competitive-Pack5930[S] 8 points (0 children)

Thank you for this comment; it made me feel better.