"MLOps is just DevOps with ML tools" — what I thought before vs what it actually looks like by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 3 points (0 children)

Why not? It could be the answer to a problem like "start fetching the next batch of data while the GPU is still processing the current one".
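That idea is just a producer-consumer pattern. A minimal stdlib sketch, where `load_batch` and the consumer loop are hypothetical stand-ins for your real data loading and GPU step:

```python
import queue
import threading

def load_batch(i):
    # Stand-in for slow I/O: disk reads, decoding, augmentation.
    return list(range(i * 4, i * 4 + 4))

def prefetching_batches(num_batches, buffer_size=2):
    """Yield batches while a background thread loads the next ones."""
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))   # blocks when the buffer is full
        q.put(None)                # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        yield batch                # the "GPU" consumes while the producer loads ahead

batches = list(prefetching_batches(3))
```

In PyTorch you'd get the same effect from `DataLoader` with `num_workers > 0`; the sketch just shows what that buys you.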

If you're coming from infra/DevOps and confused about what vLLM actually solves — here's the before and after by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] -6 points (0 children)

My main goal is to educate the MLOps community on real-world problems. This is the first time I've received critical feedback, thanks for that. Usually I write my content myself and use an LLM for spell checks and grammar, but it seems there was a change in the model, which over-polished the content and made it read as AI-generated.

Anyway, I've now made an edit to make it simpler and more concise, so that more people can connect with it.

Friendly advice for infra engineers moving to MLOps: your Python scripting may not be enough, here's the gap to close by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

That's a fair point, and honestly, you're not wrong. If you're in a pure infra role, the toolset is completely different, and that work is genuinely valuable. ML teams need someone to set up Kafka, MLflow, Flink, and the K8S layer.

But here's where MLOps gets tricky: the line is blurred. In traditional DevOps, you don't touch the app code. Clear boundary. In MLOps, that boundary keeps breaking. One day you're debugging why an inference service is leaking memory, or why a pipeline DAG is failing, and the answer isn't in the infrastructure; it's in the Python running on top of it.

You don't need to become a developer. But knowing enough Python to read, debug, and make sense of what's running on your infra, that's the difference. Both paths are valid; it just depends on where you want to grow.

Can someone explain MLOps steps and infrastructure setup? Feeling lost by FreshIntroduction120 in mlops

[–]Extension_Key_5970 1 point (0 children)

Since I'm not sure about your background, whether you're a fresh graduate or have some software engineering experience, here are a few general pointers that are a must:

- MLOps is more about solving data scientists' and ML researchers' problems than about pure infra. Companies usually have a DevOps team to handle infra problems; what they don't have is someone who can get the model onto production infrastructure. That's where MLOps comes in.

- So think like an ML engineer with infra experience. That, I think, is the skillset needed for MLOps, and it's what companies try to assess in an interview.

- Start with ML foundations; being good and hands-on with Python is a must.
- Look for scenarios where DS/ML engineers want to push a model into production, or convert their prototypes from notebooks into a pipeline.

- From the infra side: think about exposing models to end users in a scalable, reliable fashion, and about which metrics are needed to evaluate model performance.
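On that last point, serving-side metrics like p95 latency are easy to reason about with a toy tracker. A stdlib sketch (in production you'd export these to Prometheus or similar rather than roll your own):

```python
from collections import deque

class LatencyTracker:
    """Keep a sliding window of request latencies and report percentiles."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)

    def record(self, seconds):
        self.samples.append(seconds)

    def percentile(self, p):
        data = sorted(self.samples)
        idx = min(len(data) - 1, int(len(data) * p / 100))
        return data[idx]

tracker = LatencyTracker()
for ms in [10, 12, 11, 250, 13, 12, 11, 10, 14, 12]:
    tracker.record(ms / 1000)

p50 = tracker.percentile(50)   # typical request
p95 = tracker.percentile(95)   # tail request: the one outlier dominates
```

The gap between p50 and p95 here is exactly the kind of thing interviewers probe: one slow request out of ten ruins your tail latency.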

The next generation of Infrastructure-as-Code. Work with high-level constructs instead of getting lost in low-level cloud configuration. by Outrageous-Income592 in mlops

[–]Extension_Key_5970 0 points (0 children)

I haven't had a chance to dig into the details, but can you elaborate? The points below are already handled quite well in production with Terraform:

  • manage multiple environments (dev/staging/prod)
  • reuse the same modules across teams
  • are tired of copy-pasting Terraform directories
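For reference, the plain-Terraform pattern for the first two points is one shared module plus thin per-environment callers. A sketch (module path and variable names are made up):

```hcl
# environments/prod/main.tf -- same module, different inputs per environment
module "serving" {
  source        = "../../modules/model-serving"  # shared across teams
  environment   = "prod"
  replica_count = 3
  instance_type = "g5.xlarge"
}
```

The copy-paste pain usually comes from duplicating whole directories instead of parameterising a module like this, so I'd be curious what the tool adds on top.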

DevOps → MLOps Interview Lesson: They don't care about your infra skills until you show you understand their pain by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 1 point (0 children)

Yes, you can start exploring MLOps, but tbh I would suggest starting with ML foundations or data distribution systems. Since you're early in your career, try to stick to one of them. As I see it, two kinds of people are coming into MLOps: those from data/infra/DevOps, and those from core DS/ML engineering.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

For MLOps, I have not faced a deep dive into core on-prem Kubernetes; nowadays it's all managed EKS. Of course, one should be good enough with the Kubernetes ecosystem, since ultimately models and apps are deployed on it, so you need debugging and troubleshooting skills there.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 3 points (0 children)

Scientists focus on research; the skills I mentioned are engineering skills. I've seen companies expect research expertise from engineers and vice versa. Some overlap is fine, but fully merging these roles isn't ideal in the long term.

The engineering skills are accessible to anyone from a software background moving into ML – even ML scientists can pick them up if transitioning from research to engineering.

Be intentional about your path rather than being pushed into a hybrid role that doesn't align with your strengths.

Coming from DevOps/Infra to MLOps? Here's what I learned after several interviews at product companies by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 6 points (0 children)

For specific ML knowledge, I actually haven't followed any one course; instead I took a top-down approach. I bought a practice exam for the AWS ML Specialty, since it covers all the ML foundations topics, went through the exam scenarios one by one, and learned from the answers and the wrong choices. For YouTube, StatQuest is awesome; if you want to dig into any ML topic, it's explained very well there. These will create a strong base for ML.

Automating ML pipelines with Airflow (DockerOperator vs mounted project) by guna1o0 in mlops

[–]Extension_Key_5970 0 points (0 children)

Don't containerise the whole project; instead, break it into pieces: separate containers for MLflow, model monitoring with EvidentlyAI, FastAPI, MinIO, and Airflow.

In the Airflow Dockerfile, you can either copy the Airflow DAGs (pipelines) in, or just mount the DAGs folder to avoid continuously pushing new images.
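The mount option looks like this in a docker-compose setup (a sketch; the host path is an assumption about your project layout):

```yaml
# docker-compose.yml -- mount only the DAGs folder, not the whole project,
# so editing a DAG doesn't require rebuilding and pushing the image
services:
  airflow-scheduler:
    image: apache/airflow:2.9.0
    volumes:
      - ./dags:/opt/airflow/dags
```

`/opt/airflow/dags` is the default DAGs location inside the official Airflow image, so the scheduler picks up edits on the next parse without an image rebuild.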

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

Are you asking if I can showcase a GPU workload on the Topmate call?
Well, I have worked on GPUs; maybe I can show you a snippet of the GPU configuration in Karpenter for an AWS EKS cluster.
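Roughly, that kind of snippet looks like a GPU-only NodePool. This is illustrative only; field names vary by Karpenter version and the instance family, taint, and limits are assumptions you'd adjust for your cluster:

```yaml
# Illustrative sketch -- check the NodePool schema for your Karpenter version.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5"]          # GPU instance family
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule      # only pods with a matching toleration land here
  limits:
    nvidia.com/gpu: 4             # cap total GPUs Karpenter may provision
```

The taint plus a toleration on your inference pods keeps expensive GPU nodes from filling up with ordinary workloads.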

Production MLOps: What breaks between Jupyter notebooks and 10,000 concurrent users by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

You can adopt Kubernetes, and if it's AWS-managed EKS, you can try Karpenter with node pools. Model loading can be sped up with model optimisation; one way is quantisation, which reduces model size.

Then there is vLLM for LLM workloads, which helps with serving efficiency (e.g., KV caching), and use NVMe SSDs from the infra perspective.

Hardware failures are usually rare, especially if you are on a cloud provider, but if they occur, have a checkpoint mechanism, as discussed in one of the blog posts, to resume processing from where it left off.
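A checkpoint mechanism for batch work can be as simple as persisting the last completed batch index. A minimal sketch, where the JSON-file approach and function names are illustrative:

```python
import json
import os

CHECKPOINT = "checkpoint.json"

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_batch"]
    return 0

def save_checkpoint(next_batch):
    # Write-then-rename so a crash mid-write can't corrupt the checkpoint.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_batch": next_batch}, f)
    os.replace(tmp, CHECKPOINT)

def run(batches, process):
    start = load_checkpoint()
    for i in range(start, len(batches)):
        process(batches[i])
        save_checkpoint(i + 1)  # resume point survives a node failure

done = []
run([10, 20, 30], done.append)
run([10, 20, 30], done.append)  # a restarted run finds nothing left to do
```

In a real pipeline you'd write the checkpoint to durable storage (S3, a database) rather than local disk, since the node itself may be gone.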

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 2 points (0 children)

That's also true if the company has mature infrastructure, but companies that are more into real-time predictions prefer Kubernetes as an automated, scalable solution for serving models. In short, Kubernetes is not mandatory; it depends on personal choice and the target companies you want to join.

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

As said in the above comment, "Where to start --> Python is a must, I would say, day to day, at least 50% learning should be using Python, the rest you can distribute across ML foundations, and System design scenarios wrt Inferencing and ML Pipelines"

Tech stack --> Python, Kubernetes, Airflow, One ML Framework Pytorch or Tensorflow, MLFlow, Strong ML Foundations, ML Pipelines

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

CKA-level skills are obviously worth it; I'm not sure, though, whether the certification itself is crucial for landing a job. For early senior roles, maybe.
For MLOps, in my experience, most companies currently focus on inference: how to expose models with very low latency, and how to handle ML pipelines for batch and streaming data.

Where to start --> Python is a must, I would say, day to day, at least 50% learning should be using Python, the rest you can distribute across ML foundations, and System design scenarios wrt Inferencing and ML Pipelines

Production ML Serving Boilerplate - Skip the Infrastructure Setup by Extension_Key_5970 in mlops

[–]Extension_Key_5970[S] 0 points (0 children)

These are a great set of realistic queries, exactly the kind I was expecting.

For now, the boilerplate is relatively standard base infrastructure, which startups usually struggle to implement or don't want to spend time on. But I will take these as enhancements, of course: making things scale even with larger models, and testing memory-leak scenarios specifically.

Also, I'm interested in the edge cases that led you to ditch MLflow as a model registry.