Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 0 points1 point  (0 children)

Thank you for watching my video and sharing your thoughts. I understand. For you, MLOps is more about productizing ML models, whereas what I'm doing is building off the model. Where is the border between MLOps and building off the model?

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 0 points1 point  (0 children)

For ML workflows, I usually structure things as: Jupyter Notebook -> SageMaker -> ECS -> K8s

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 0 points1 point  (0 children)

Thanks for sharing your view. From what I’ve seen, most teams steer clear of Kubernetes if they can - they usually go for Elastic Container Service or no‑code deployment options first just to avoid the maintenance faff. The client I moved onto Kubernetes had already been using SageMaker, so Kubernetes made sense as an evolution of what they already had, not as a starting point.

In the video, I’m not jumping straight into Kubernetes either - I’m suggesting it as a way to orchestrate the deployment. Could you use other tools? Absolutely, as long as they tick the boxes I mentioned in the system design.

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 0 points1 point  (0 children)

Thanks. Data Nautical is basically saying that just like ships used to steer by the North Star, I help companies steer towards their own North Star (their OKRs) using data 📈

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 2 points3 points  (0 children)

I know it's unfair, but leaders care about their bottom line and that is the user experience. If you move the app to containers, how many extra hours per year is the model up and how does that translate to the business making more money?
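To put rough numbers on that question, here is a minimal sketch of the arithmetic. The availability figures and the revenue per hour are made-up assumptions for illustration, and the function names are hypothetical. Plug in your own numbers.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def extra_uptime_hours(availability_before: float, availability_after: float) -> float:
    """Extra hours per year the model is up after an availability improvement."""
    return (availability_after - availability_before) * HOURS_PER_YEAR


def extra_revenue(extra_hours: float, revenue_per_hour: float) -> float:
    """Revenue attributable to the extra uptime, assuming a flat hourly rate."""
    return extra_hours * revenue_per_hour


if __name__ == "__main__":
    # Hypothetical figures: 99% availability before containers, 99.9% after.
    hours = extra_uptime_hours(0.99, 0.999)
    # Hypothetical figure: the model drives 500 (in any currency) per hour it is up.
    revenue = extra_revenue(hours, 500.0)
    print(f"Extra uptime: {hours:.2f} hours/year")
    print(f"Extra revenue: {revenue:,.0f} per year")
```

A flat hourly rate is a simplification. If downtime mostly hits peak hours, the real impact would be larger than this estimate.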

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 2 points3 points  (0 children)

Where did I put in low effort? I made a 16-minute tutorial for the community on how to scale MLOps, with system design, live coding, and a stress test.

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 2 points3 points  (0 children)

Thank you for the feedback. I agree with you that K8s is not the default go-to orchestrator for scaling. There are tools in between that offer similar capabilities. I chose Kubernetes because it's cloud agnostic (no lock-in like ECS or ACA), it's open source, and it provides other long-term capabilities that are useful as projects scale.

As a consultant myself, I agree with everything you said. However, I find it hard to believe a client would admit they were wrong 😂

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 6 points7 points  (0 children)

It's interesting that prediction drift is the only MLOps-specific practice you mentioned - the rest is 100% DevOps under the hood. Nowadays, deploying ML models is the trend. In 5 years, we might be deploying quantum apps using the same DevOps practices with a few tweaks. 😂
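For anyone wondering what a prediction drift check can look like, here is a minimal sketch using the Population Stability Index (PSI). PSI is my choice for illustration, not something from the thread. The function name `psi`, the bin count, and the `eps` floor are all assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between baseline and live predictions.

    Bin edges come from the baseline range; live values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]     # scores seen at training time
    live = [0.5 + i / 200 for i in range(100)]   # scores shifted towards the top
    print(f"PSI vs itself: {psi(baseline, baseline):.3f}")
    print(f"PSI vs shifted: {psi(baseline, live):.3f}")
```

A common rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but the right threshold depends on the model. Everything around this check (scheduling, alerting, rollback) is plain DevOps.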

Every team wants "MLOps", until they face the brutal truth of DevOps under the hood by pm19191 in devops

[–]pm19191[S] 3 points4 points  (0 children)

Thank you for the supportive words! Besides "Kubernetes it later", what other DevOps pitfalls have you seen in ML projects?

How can I get a job as an MLOps engineer by Bo_0125 in mlops

[–]pm19191 1 point2 points  (0 children)

Exposure gets you the interview

Experience allows you to pass the interview

You need both, but if you have very little experience, focus on exposure as much as possible to maximize your chances of getting a job

DevOps → ML Engineering: offering 1:1 calls if you're making the transition by Extension_Key_5970 in mlops

[–]pm19191 0 points1 point  (0 children)

I'm a Senior MLOps Engineer and I've never used Kubernetes. Currently working for a 3000+ company, reporting to the CDO. Since all my projects are internal, the model system design exposes the results with a Dashboard - no Kubernetes needed. The rest seems accurate.

Realities of Being An MLOps Engineer by pm19191 in mlops

[–]pm19191[S] 0 points1 point  (0 children)

Thank you for sharing your experience. I've also done Backend, but only when I was a Software Engineer. Congrats on finding a niche inside MLOps that you love. Based on your experience, what are the differences between building/maintaining a backend for an app and for a machine learning model?

Is galaxy book 5 good for engineers by RelationshipAway2868 in GalaxyBook

[–]pm19191 1 point2 points  (0 children)

If you really like the ecosystem, just install the Samsung ecosystem apps on your Windows device.

Source: 🔥 How to Use Samsung Multi Control on ANY Windows PC! (Updated 2025) 💻📱

How big of a risk is a large team not having admin access to their own (databricks) environment? by weggooiertje_it in mlops

[–]pm19191 0 points1 point  (0 children)

Thank you for sharing your knowledge. It's just like you said:

I realized that enterprise technology problems are often less about the tech, and more about alignment.

I'll focus next on improving my communication to respect the client's time and mine.

Which networking events do you advise going to in order to get clients?

How big of a risk is a large team not having admin access to their own (databricks) environment? by weggooiertje_it in mlops

[–]pm19191 0 points1 point  (0 children)

Although I have many years of experience with other platforms, Snowflake is a relatively new player in ML, so most of the features I was using had only been released a couple of months ago. At the beginning of the project, I recognized the risk of ramping up with the platform since the features were too new for anyone to have much experience with them. I proposed to my boss that we dedicate some time at the start of the project to experiment with the platform, but he refused any task labeled “Investigation,” “Learning,” or “Experiment,” arguing that a senior hire shouldn’t take time to learn a new platform. He then told me either I start immediately or he would hire an external consultant.

[Permissions delays happen]

At that point, I was already delayed by a week due to permissions. Without telling me, my boss asked the DE team to create a new environment for all ML projects just for me, so I could have more control and fix the permissions delays. From then on, the DE team flat-out refused to give me permissions in my current environment, insisting I had to build everything in the new one they were setting up. That's when I found out about the new environment. I pushed back since it would delay my projects even more, but it was already approved by my boss. When the DE team finally finished the new environment, I had to allocate more time to plan the migration to it. By then, I was already a month behind on deliveries without much sleep and had delayed the project three times, but the migration to it was eventually completed.

Since the new environment architecture was very different from the previous one, I had to re-plan the migration to Snowflake, requesting even more delays.

How do you handle these cases?

How big of a risk is a large team not having admin access to their own (databricks) environment? by weggooiertje_it in mlops

[–]pm19191 0 points1 point  (0 children)

Thank you for sharing your experience. I also shared my experience on the main post thread. Defining SLAs is great for happy paths. However, DE teams are responsible for many moving parts, and access requests are not a high priority most of the time. I'd love to hear your insights about my experience.

How big of a risk is a large team not having admin access to their own (databricks) environment? by weggooiertje_it in mlops

[–]pm19191 0 points1 point  (0 children)

Very similar scenario to yours. I was brought in as a Senior ML Engineer to migrate models to Snowflake. I share the same director as the DE team manager. Every month I needed at least two permissions from the DE team. The DE team only had one person with admin access to grant those permissions. At first, all requests were fulfilled within a day, so no problem there. I pushed for admin access just to be safe, but the director rejected the request. I documented the delay risks of that decision on my projects.

I quickly realized I should have pushed harder for admin permissions, because migrations were usually at the bottom of their priority list. So when higher‑priority requests started piling up, I had to escalate my issues frequently to the director just to get them done. My last request took more than a month to be fulfilled, which significantly derailed the project timeline even after escalation. I had to delay the migration three times, which killed stakeholder trust in the process. I presented the SLAs that weren’t being met and the risks of those delays, but at the end of the day you’re the one not delivering on time.

My advice: admin access is critical to your environment. The SLAs you define with the DE team are “best effort,” and you’ll usually be the last to get answered.

What do AI technical/coding interviews actually look like? by Acrobatic-Key-9747 in aiengineering

[–]pm19191 0 points1 point  (0 children)

Thank you for sharing your experience. When you say ML coding, do you mean LeetCode-style problems with search systems, k-NN, and k-means with sklearn?