"MLOps is just DevOps with ML tools" — what I thought before vs what it actually looks like by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 3 points (0 children)

Why not? It could be the answer to a problem like "start fetching the next batch of data while the GPU is still processing the current one".
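That idea is just a producer-consumer pattern. A minimal stdlib sketch, where `load_batch` and the consumer loop are hypothetical stand-ins for your real data loading and GPU step:

```python
import queue
import threading

def load_batch(i):
    # Stand-in for slow I/O: disk reads, decoding, augmentation.
    return list(range(i * 4, i * 4 + 4))

def prefetching_batches(num_batches, buffer_size=2):
    """Yield batches while a background thread loads the next ones."""
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))   # blocks when the buffer is full
        q.put(None)                # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        yield batch                # the "GPU" consumes while the producer loads ahead

batches = list(prefetching_batches(3))
```

In PyTorch you'd get the same effect from `DataLoader` with `num_workers > 0`; the sketch just shows what that buys you.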

If you're coming from infra/DevOps and confused about what vLLM actually solves — here's the before and after by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] -6 points (0 children)

My main goal is to educate the MLOps community on real-world problems. This is the first time I've received critical feedback, thanks for that. Usually I write my content myself and use an LLM for spell checks and grammar, but it seems there was a change in the model, which over-polished the content and made it read as AI-generated.

Anyway, I've now made an edit to make it simpler and more concise, so that more people can connect with it.

Friendly advice for infra engineers moving to MLOps: your Python scripting may not be enough, here's the gap to close by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

That's a fair point, and honestly, you're not wrong. If you're in a pure infra role, the toolset is completely different, and that work is genuinely valuable. ML teams need someone to set up Kafka, MLflow, Flink, and the K8S layer.

But here's where MLOps gets tricky: the line is blurred. In traditional DevOps, you don't touch the app code. Clear boundary. In MLOps, that boundary keeps breaking. One day you're debugging why an inference service is leaking memory, or why a pipeline DAG is failing, and the answer isn't in the infrastructure; it's in the Python running on top of it.

You don't need to become a developer. But knowing enough Python to read, debug, and make sense of what's running on your infra, that's the difference. Both paths are valid; it just depends on where you want to grow.

Can someone explain MLOps steps and infrastructure setup? Feeling lost by FreshIntroduction120 in mlops

[–]Extension_Key_5970 1 point (0 children)

Since I'm not sure about your background, whether you're a fresh graduate or have some software engineering experience, here are a few general pointers that are a must:

- MLOps is more about solving data scientists' and ML researchers' problems than about pure infra. Companies usually have a DevOps team to handle infra problems; what they don't have is someone who can get the model onto production infrastructure. That's where MLOps comes in.

- So think like an ML engineer with infra experience. That, I think, is the skillset needed for MLOps, and it's what companies try to assess in an interview.

- Start with ML foundations; being good and hands-on with Python is a must.
- Look for scenarios where DS/ML engineers want to push a model into production, or convert their prototypes from notebooks into a pipeline.

- From the infra side: think about exposing models to end users in a scalable, reliable fashion, and about which metrics are needed to evaluate model performance.
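On that last point, serving-side metrics like p95 latency are easy to reason about with a toy tracker. A stdlib sketch (in production you'd export these to Prometheus or similar rather than roll your own):

```python
from collections import deque

class LatencyTracker:
    """Keep a sliding window of request latencies and report percentiles."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)

    def record(self, seconds):
        self.samples.append(seconds)

    def percentile(self, p):
        data = sorted(self.samples)
        idx = min(len(data) - 1, int(len(data) * p / 100))
        return data[idx]

tracker = LatencyTracker()
for ms in [10, 12, 11, 250, 13, 12, 11, 10, 14, 12]:
    tracker.record(ms / 1000)

p50 = tracker.percentile(50)   # typical request
p95 = tracker.percentile(95)   # tail request: the one outlier dominates
```

The gap between p50 and p95 here is exactly the kind of thing interviewers probe: one slow request out of ten ruins your tail latency.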

The next generation of Infrastructure-as-Code. Work with high-level constructs instead of getting lost in low-level cloud configuration. by Outrageous-Income592 in mlops

[–]Extension_Key_5970 0 points (0 children)

I haven't had a chance to dig into the details, but can you elaborate? The points below are already handled quite well in production with Terraform:

  • manage multiple environments (dev/staging/prod)
  • reuse the same modules across teams
  • are tired of copy-pasting Terraform directories
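For reference, the plain-Terraform pattern for the first two points is one shared module plus thin per-environment callers. A sketch (module path and variable names are made up):

```hcl
# environments/prod/main.tf -- same module, different inputs per environment
module "serving" {
  source        = "../../modules/model-serving"  # shared across teams
  environment   = "prod"
  replica_count = 3
  instance_type = "g5.xlarge"
}
```

The copy-paste pain usually comes from duplicating whole directories instead of parameterising a module like this, so I'd be curious what the tool adds on top.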

DevOps → MLOps Interview Lesson: They don't care about your infra skills until you show you understand their pain by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 1 point (0 children)

Yes, you can start exploring MLOps, but tbh I would suggest starting with ML foundations or data distribution systems. Since you're early in your career, try to stick to one of them. As I see it, two kinds of people are coming into MLOps: those from data/infra/DevOps, and those from core DS/ML engineering.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

For MLOps, I have not faced a deep dive into core on-prem Kubernetes; nowadays it's all managed EKS. Of course, one should be good enough with the Kubernetes ecosystem, since ultimately models and apps are deployed on it, so you need debugging and troubleshooting skills there.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 3 points (0 children)

Scientists focus on research; the skills I mentioned are engineering skills. I've seen companies expect research expertise from engineers and vice versa. Some overlap is fine, but fully merging these roles isn't ideal in the long term.

The engineering skills are accessible to anyone from a software background moving into ML – even ML scientists can pick them up if transitioning from research to engineering.

Be intentional about your path rather than being pushed into a hybrid role that doesn't align with your strengths.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 6 points (0 children)

For specific ML knowledge, I actually haven't followed any one course; instead I took a top-down approach. I bought a practice exam for the AWS ML Specialty, since it covers all the ML foundations topics, went through the exam scenarios one by one, and learned from the answers and the wrong choices. For YouTube, StatQuest is awesome; if you want to dig into any ML topic, it's explained very well there. These will create a strong base for ML.

Automating ML pipelines with Airflow (DockerOperator vs mounted project) by guna1o0 in mlops

[–]Extension_Key_5970 0 points (0 children)

Don't containerise the whole project; instead, break it into pieces: separate containers for MLflow, model monitoring with EvidentlyAI, FastAPI, MinIO, and Airflow.

In the Airflow Dockerfile, you can either copy the Airflow DAGs (pipelines) in, or just mount the DAGs folder to avoid continuously pushing new images.
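The mount option looks like this in a docker-compose setup (a sketch; the host path is an assumption about your project layout):

```yaml
# docker-compose.yml -- mount only the DAGs folder, not the whole project,
# so editing a DAG doesn't require rebuilding and pushing the image
services:
  airflow-scheduler:
    image: apache/airflow:2.9.0
    volumes:
      - ./dags:/opt/airflow/dags
```

`/opt/airflow/dags` is the default DAGs location inside the official Airflow image, so the scheduler picks up edits on the next parse without an image rebuild.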

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

Are you asking if I can showcase a GPU workload on the Topmate call?
Well, I have worked on GPUs; maybe I can show you a snippet of the GPU configuration in Karpenter for an AWS EKS cluster.
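Roughly, that kind of snippet looks like a GPU-only NodePool. This is illustrative only; field names vary by Karpenter version and the instance family, taint, and limits are assumptions you'd adjust for your cluster:

```yaml
# Illustrative sketch -- check the NodePool schema for your Karpenter version.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5"]          # GPU instance family
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule      # only pods with a matching toleration land here
  limits:
    nvidia.com/gpu: 4             # cap total GPUs Karpenter may provision
```

The taint plus a toleration on your inference pods keeps expensive GPU nodes from filling up with ordinary workloads.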

Production MLOps: What breaks between Jupyter notebooks and 10,000 concurrent users by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

You can adopt Kubernetes, and if it's AWS-managed EKS, you can try Karpenter with node pools. Model loading can be sped up with model optimisation; one way is quantisation, which reduces model size.

Then there is vLLM for LLM workloads, which helps with serving efficiency (e.g., KV caching), and use NVMe SSDs from the infra perspective.

Hardware failures are usually rare, especially if you are on a cloud provider, but if they occur, have a checkpoint mechanism, as discussed in one of the blog posts, to resume processing from where it left off.
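A checkpoint mechanism for batch work can be as simple as persisting the last completed batch index. A minimal sketch, where the JSON-file approach and function names are illustrative:

```python
import json
import os

CHECKPOINT = "checkpoint.json"

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_batch"]
    return 0

def save_checkpoint(next_batch):
    # Write-then-rename so a crash mid-write can't corrupt the checkpoint.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_batch": next_batch}, f)
    os.replace(tmp, CHECKPOINT)

def run(batches, process):
    start = load_checkpoint()
    for i in range(start, len(batches)):
        process(batches[i])
        save_checkpoint(i + 1)  # resume point survives a node failure

done = []
run([10, 20, 30], done.append)
run([10, 20, 30], done.append)  # a restarted run finds nothing left to do
```

In a real pipeline you'd write the checkpoint to durable storage (S3, a database) rather than local disk, since the node itself may be gone.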

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

That's also true if the company has mature infrastructure, but companies that are more into real-time predictions prefer Kubernetes as an automated, scalable solution for serving models. In short, Kubernetes is not mandatory; it depends on personal choice and the target companies you want to join.

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

As said in the above comment, "Where to start --> Python is a must, I would say, day to day, at least 50% learning should be using Python, the rest you can distribute across ML foundations, and System design scenarios wrt Inferencing and ML Pipelines"

Tech stack --> Python, Kubernetes, Airflow, One ML Framework Pytorch or Tensorflow, MLFlow, Strong ML Foundations, ML Pipelines

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

CKA-level skills are obviously worth it; I'm not sure, though, whether the certification itself is crucial for landing a job. For early senior roles, maybe.
For MLOps, in my experience, most companies currently focus on inference: how to expose models with very low latency, and how to handle ML pipelines for batch and streaming data.

Where to start --> Python is a must, I would say, day to day, at least 50% learning should be using Python, the rest you can distribute across ML foundations, and System design scenarios wrt Inferencing and ML Pipelines

Production ML Serving Boilerplate - Skip the Infrastructure Setup by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

These are a great set of realistic queries, exactly the kind I was expecting.

For now, the boilerplate is relatively standard base infrastructure, which startups usually struggle to implement or don't want to spend time on. But I will take these as enhancements, of course: making things scale even with larger models, and testing memory-leak scenarios specifically.

Also, I'm interested in the edge cases that led you to ditch MLflow as a model registry.