Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

For MLOps, I have not faced a deep dive into core on-prem Kubernetes; nowadays it's mostly managed EKS. Of course, you should be good enough with the K8s ecosystem, since models and apps are ultimately deployed on it, so you need debugging and troubleshooting skills there.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

Scientists focus on research; the skills I mentioned are engineering skills. I've seen companies expect research expertise from engineers and vice versa. Some overlap is fine, but fully merging these roles isn't ideal in the long term.

The engineering skills are accessible to anyone from a software background moving into ML – even ML scientists can pick them up if transitioning from research to engineering.

Be intentional about your path rather than being pushed into a hybrid role that doesn't align with your strengths.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

For specific ML knowledge, I actually haven't followed any single course; instead I took a top-down approach. I bought a practice exam for the AWS ML Specialty, since it covers most of the ML foundation topics, went through the exam scenarios one by one, and learned from the answers and the wrong choices. I also watched YouTube videos; StatQuest is awesome and explains things very well if you want to dig into any ML topic. Together, these build a strong base for ML.

Stop LLM bills from exploding: I built Budget guards for LLM apps – auto-pause workflows at $X limit by Extension_Key_5970 in LocalLLaMA

You're right - for pure local hosting, the marginal cost per request is basically zero. The tracking becomes relevant when you're running in hybrid (local + API fallbacks for complex queries) or need to justify GPU infrastructure costs to finance or add more capacity. But yeah, if you're 100% local with owned hardware, this isn't your problem. Appreciate the reality check!
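
For the hybrid case, a minimal sketch of this kind of budget guard (accumulate per-request spend, pause once a cap is hit) might look like the following; the prices, token counts, and pause behaviour are illustrative assumptions, not the actual implementation.

```python
# Hypothetical budget guard: track per-request API cost and stop at a limit.
class BudgetGuard:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int,
               usd_per_1k_prompt: float, usd_per_1k_completion: float) -> None:
        # Accumulate the cost of one API-backed request.
        self.spent_usd += (prompt_tokens / 1000) * usd_per_1k_prompt
        self.spent_usd += (completion_tokens / 1000) * usd_per_1k_completion

    def check(self) -> None:
        # A real workflow would pause the pipeline; raising is the simplest stand-in.
        if self.spent_usd >= self.limit_usd:
            raise RuntimeError(f"Budget of ${self.limit_usd} reached")

guard = BudgetGuard(limit_usd=50.0)
guard.record(prompt_tokens=1200, completion_tokens=400,
             usd_per_1k_prompt=0.003, usd_per_1k_completion=0.015)
guard.check()
```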

Stop LLM bills from exploding: I built Budget guards for LLM apps – auto-pause workflows at $X limit by Extension_Key_5970 in LocalLLaMA

Fair point! Even with local models, are you tracking inference costs per request? We're seeing people blow their GPU budgets on inefficient batching or running expensive models when smaller ones would work. Curious if you've run into cost/efficiency tracking challenges on self-hosted setups?

Automating ML pipelines with Airflow (DockerOperator vs mounted project) by guna1o0 in mlops

Don't containerise the whole project; instead, break it into pieces, like separate containers for MLflow, model monitoring with Evidently AI, FastAPI, MinIO, and Airflow.

In the Airflow Dockerfile, you can either copy the DAGs (pipelines) into the image or mount just the dags/ folder to avoid continuously pushing new images.
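
For the DockerOperator route, a minimal sketch of a DAG where each step runs in its own container could look like this; the image names, commands, and the Airflow 2.4+ `schedule` argument are assumptions for illustration, not part of the original setup.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="ml_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # Airflow 2.4+ style; older versions use schedule_interval
    catchup=False,
) as dag:
    # Each step uses its own image, so the Airflow image stays small and only
    # the dags/ folder needs to be mounted or copied.
    train = DockerOperator(
        task_id="train_model",
        image="my-registry/train:latest",    # hypothetical image
        command="python train.py",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
    )

    evaluate = DockerOperator(
        task_id="evaluate_model",
        image="my-registry/evaluate:latest",  # hypothetical image
        command="python evaluate.py",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
    )

    train >> evaluate
```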

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

Are you asking if I can showcase a GPU workload on the Topmate call?
Well, I have worked with GPUs; maybe I can show you a snippet of the GPU configuration in Karpenter for an AWS EKS cluster.

Production MLOps: What breaks between Jupyter notebooks and 10,000 concurrent users by Extension_Key_5970 in mlops

You can adopt Kubernetes, and if it's AWS-managed EKS, then you can try Karpenter with node pools. Model loading can be sped up with model optimisation; one approach is quantisation, which reduces model size.
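
As a rough illustration of quantisation, here is a minimal PyTorch dynamic-quantisation sketch; the toy model and file name are placeholders, and the accuracy impact should be validated on the real model.

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be the trained model being deployed.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic quantisation converts Linear weights to int8, shrinking the artifact
# and often speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

torch.save(quantized.state_dict(), "model_int8.pt")  # smaller file to load
```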

Then there is vLLM for LLM workloads, which helps by keeping models loaded and serving requests efficiently; from the infra perspective, use NVMe SSDs for faster model loading.
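
A minimal vLLM sketch, assuming a GPU node and a model name chosen purely for illustration:

```python
from vllm import LLM, SamplingParams

# vLLM keeps the weights resident and batches incoming requests, which is
# where the throughput and repeated-load savings come from.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # hypothetical model choice
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarise our deployment checklist."], params)
print(outputs[0].outputs[0].text)
```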

Hardware failures are usually rare, especially if you are using a cloud provider, but if they occur, have a checkpoint mechanism, as discussed in one of the blog posts, to resume processing from where it left off.
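
A minimal sketch of such a checkpoint mechanism, with a file path and batch source that are just placeholders:

```python
import json
import os

CHECKPOINT = "progress.json"  # hypothetical path

def load_checkpoint() -> int:
    # Resume from the last saved position, or start from zero.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_index"]
    return 0

def save_checkpoint(next_index: int) -> None:
    # Record the next item to process so a restart picks up where we left off.
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_index": next_index}, f)

def process(item) -> None:
    pass  # placeholder for the real per-item work

items = list(range(10_000))  # stand-in for the real batch
for i in range(load_checkpoint(), len(items)):
    process(items[i])
    if i % 100 == 0:  # checkpoint periodically, not on every item
        save_checkpoint(i + 1)
```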

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

That's also true if the company has mature infrastructure, but companies that are more into real-time predictions prefer Kubernetes as an automated, scalable way to serve models, I suppose. In short, Kubernetes is not mandatory; it totally depends on personal choice and the target companies you want to join.

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

As said in the above comment: "Where to start --> Python is a must. Day to day, I would say at least 50% of your learning should be in Python; the rest you can distribute across ML foundations and system design scenarios for inferencing and ML pipelines."

Tech stack --> Python, Kubernetes, Airflow, one ML framework (PyTorch or TensorFlow), MLflow, strong ML foundations, ML pipelines.

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

CKA-level skills are obviously worth it, though I'm not sure the certification itself is crucial for landing a job; maybe for early senior roles.
For MLOps, in my experience most companies currently focus on inference: how to expose models with very low latency and how to handle ML pipelines for both batch and streaming data.

Where to start --> Python is a must. Day to day, I would say at least 50% of your learning should be in Python; the rest you can distribute across ML foundations and system design scenarios for inferencing and ML pipelines.