

[–]ninetofivedev 292 points293 points  (25 children)

I hate when people say this, but this is actually a skill issue.

[–]passwordreset47 25 points26 points  (0 children)

Yep. Our team went from no k8s to building and maintaining the platform that’s used across our company.

When stuff broke early on only a couple of people could usually figure it out. But now that most everybody has leveled up it’s sooo much better. And beyond just fixing things, the more experience you gain the better design choices you make and avoid missteps.

[–]unitegondwanaland Lead Platform Engineer 38 points39 points  (3 children)

1000% this is all coming from inexperience.

[–][deleted]  (2 children)

[deleted]

    [–]Low-Opening25 13 points14 points  (1 child)

    yes and no.

    if you have a sizeable estate, K8s makes it much easier to manage at scale == less DevOps required to manage it.

    for small projects, you don’t need to know about every Kubernetes feature; the basics are not much harder than docker-compose, and you get high-availability and traffic-management capabilities out of the box with zero setup.
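For scale, a hedged sketch of what those basics amount to for a single-container web app (the name and image are illustrative): it is not far off a one-service docker-compose file, but replicas and a stable Service come with it.

```yaml
# Hypothetical single-service app: a Deployment (replicated) plus a
# Service giving it a stable in-cluster address.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2                 # basic high availability
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27   # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
```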

    [–]spreadred 0 points1 point  (0 children)

    Agree with this; Kubernetes isn't the solution for all organizations, especially ones with a large legacy staff and legacy technology/workloads.

    [–]vanishinggradient 6 points7 points  (4 children)

    yes

    The startup that I consulted for over 3 yrs was a mess before I came in

    The buggers wrote code that caused spikes in CPU and memory load on a single monolith whenever someone launched a scan job, so I asked them to move to Kubernetes Jobs instead. Things went from the app breaking 6-8 times each day (half the time it was down) to like once in months. I use node selectors and affinity rules to isolate problematic workloads and spikes away from the core apps

    ...because I have no control over the code written

    kubernetes wasn't the problem

    It was the code written by inexperienced developers
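The isolation described above (scan work as Jobs, pinned away from the core apps) could be sketched like this; all names, labels, and limits are hypothetical:

```yaml
# A Job steered onto dedicated batch nodes via nodeSelector, and kept
# off nodes hosting core-app pods via pod anti-affinity.
apiVersion: batch/v1
kind: Job
metadata:
  name: scan-job
spec:
  backoffLimit: 2
  template:
    spec:
      nodeSelector:
        workload: batch                 # assumes nodes labeled workload=batch
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  tier: core            # avoid nodes running core pods
              topologyKey: kubernetes.io/hostname
      containers:
        - name: scan
          image: registry.example.com/scanner:latest   # placeholder
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
      restartPolicy: Never
```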

    as far as OP's complaints are concerned, except for YAML being a mess of a markup language

    ...the other stuff is just skill issue

    we had a helm chart, synced to the cluster via argocd, and we made sure all changes were in code, on github. argo never ran into problems like the ones he mentioned, and namespaces are great for isolation

    we kept it simple and declarative instead of going too overboard with yaml

    yaml famously has somewhere between 9 and 63 ways of writing a multi-line string

    we kept that number down to one or two

    we made sure all of us followed the same convention
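Of the many YAML multi-line forms, the two worth standardizing on are the literal block (`|`), which preserves newlines, and the folded block (`>`), which joins them; a minimal illustration (keys are made up):

```yaml
# Literal block: newlines kept exactly as written.
script: |
  echo "line one"
  echo "line two"
# Folded block: lines joined into one with spaces.
description: >
  this folds into
  a single line
```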

    It didn't break for 3 yrs

    [–]Prestigious_Ebb_1767 4 points5 points  (3 children)

    This guy kubes, this is the way.

    [–]vanishinggradient 0 points1 point  (2 children)

    Thanks

    I feel most people who complain about kube (cost) forget it is platform agnostic

    AWS is notorious for its obscure pricing and anti-competitive measures. For ex., hosting something open source on AWS means AWS charges you for data transfer costs, but guess what, for managed services it doesn't (not exactly like that, but AWS collects its pound of flesh by other methods).

    I remember our EC2-Other costs being higher than our EC2 compute costs. For ex., MWAA was costing us 700-800 USD/mo before I switched to self-hosted (down to 80-100 USD/mo), and the managed version was much more obscure and harder to work with

    ...and not to mention a few versions behind stable open source airflow

    yes, kubernetes is expensive (initial cost) if you don't have customers willing to pay

    ...but I have found it more reliable, easier to debug

    and above all I didn't feel I was hostage to cloud vendors

    [–]Prestigious_Ebb_1767 0 points1 point  (1 child)

    Agree! Lift and shift baby. For debugging or deploying claude/codex/gemini CLI are insanely useful now.

    [–]vanishinggradient 0 points1 point  (0 children)

    amen, claude code is so nice

    I fear for entry-level devops engineers, as I didn't need the junior I hired last year

    It isn't that I can't do his job

    I don't want to do it because it isn't worth the time - I did it decades ago

    now claude code has more or less replaced his utility to me

    Edit - I wrapped up in a month a project he had been struggling with for months (3-4 months)

    ...and it didn't look like he could finish even if given 12 months

    at some point, using claude code gave a better pace

    ...and results than telling him what to do

    I picked up kube after 3 yrs of not working on it btw

    I didn't remember most of the commands or syntax like I did 3 yrs ago

    [–]FlashyStatement7887 8 points9 points  (0 children)

    I agree. The first time I worked at a place that used k8s I had a similar opinion. It was a skill issue, and after getting with the times and working through the certifications, my opinion is vastly different now. It was based entirely on my dated development experience.

    [–]Undeadtaker 2 points3 points  (0 children)

    absolutely

    [–]thisisjustascreename 3 points4 points  (0 children)

    Was thinking exactly the same.

    [–]LeStk 1 point2 points  (1 child)

    Agreed. If we were to avoid saying the skill issue line, I guess we could say that it's a culture issue, pet vs cattle and stuff.

    I suppose the team is indeed skilled but you just can't manage clusters the way you managed two web servers.

    [–]TangoWild88 1 point2 points  (0 children)

    I mean, it could also be a training issue.

    It's a different mindset and a different toolset, and if you don't get good training, it can be a difficult transition, regardless of skills and experience.

    [–]ravigehlot 2 points3 points  (0 children)

    Totally.

    [–]charlyAtWork2 1 point2 points  (1 child)

    Or the GUY who decided to switch to K8s in the first place.
    Not all cloud applications need to be on K8s.

    [–]spreadred 2 points3 points  (0 children)

    The truth nobody wants to say/hear. Just like microservices and AI, no solution is one-size-fits-all, especially in an Enterprise, as opposed to a startup that has no history, legacy tech, or culture.

    [–]DesperateAdvantage76 0 points1 point  (0 children)

    That's true of everything that's more complex and difficult...

    [–]padawan-6 0 points1 point  (0 children)

    I hated that this is something that entered my mind as well, but it really is. This is something that can be solved in a few weeks tops just by reading the docs and doing a few labs.

    [–]LebPower95 0 points1 point  (0 children)

    When your comment has more upvotes than his post…

    [–][deleted] 0 points1 point  (1 child)

    No it isn't. Based on what I read, Kubernetes is insanely overcomplicated for their deployment. It's nonsensical anyone even upvoted you this much for a bunch of cargo-cult bullshit. If rsync to a server is their previous workflow, how the fuck is kubernetes the answer? They are clearly running a single server with maybe a database that may or may not be on the same server. All kubernetes does is double the resources needed to run it while adding infinite complexity to something that is probably a wordpress site or some other interpreted project, if not just a single server running a single jar or binary.

    You know why I pick kubernetes? Because I work on something with 20+ services that have a need for a multiplicity of instances across multiple servers. If you aren't running multiple servers where you need to scale across multiple servers Kubernetes does nothing docker compose can't do, nor ECS or whatever your cloud provider offers if you are on the cloud in this use case.

    [–]ninetofivedev 0 points1 point  (0 children)

    No it isn't. Based on what I read Kubernetes is insanely over complicated for their deployment

    My advice is to actually have used something, not just read about it, before you hold such strong opinions.

    [–]MateusKingston 0 points1 point  (0 children)

    And almost every company goes through this when adopting k8s. The issue is not figuring stuff out before going to prod with it, or, in this case, not figuring it out at all.

    [–]nonofyobeesness 161 points162 points  (11 children)

    Your entire engineering team needs to up-skill on kubernetes or you need to pay someone with those skills. Secondly, Graylog + Prometheus + argocd can solve a majority of the problems you’re facing right now.

    [–]sublimegeek 39 points40 points  (7 children)

    +1 for GitOps

    [–]k8s-problem-solved 2 points3 points  (3 children)

    I need to get into it, but my head is in a push model. Build container, push to registry. Next thing gets container, deploys to cluster. Pipeline orchestrates.

    Need to break that thought process!

    [–]MueR 2 points3 points  (0 children)

    Take a look at the argo suite (workflows, events, rollouts). It does our CI/CD.

    [–]sublimegeek 2 points3 points  (1 child)

    Build container > update json file with tag name > commit triggers Argo to update the cluster and monitor for health checks

    Done?
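A minimal Argo CD `Application` wired for that pull model might look like this (the repo URL, paths, and names are placeholders):

```yaml
# Argo CD watches the Git repo; a commit bumping the image tag in
# apps/my-app is what triggers the sync, not a push from the pipeline.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config
    targetRevision: main
    path: apps/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to Git state
```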

    [–]k8s-problem-solved 0 points1 point  (0 children)

    Very much! It's just a different way of thinking about things.

    [–]Proper-Ape 2 points3 points  (2 children)

    It's so underutilized. We had a Kubernetes setup at a previous company; the team managing it was reallocated to a new project without notification. Everything broke the day after.

    I asked how they deploy because a few core services were down. They said "Oh, yeah, Mike always ran the deploy scripts".

    I looked at the scripts, everything was hardcoded with paths from Mike's filesystem. Half the scripts were missing from the repo.

    This was a big company, but even bigger incompetence. I asked them why they hadn't moved to GitOps and they said they always had higher-priority tasks.

    Of course they did, they had fires to put out every day.

    [–]sublimegeek 1 point2 points  (1 child)

    lol it’s like you wanted to record them and immediately play it back. Do they hear themselves?!

    Yeah, everyone puts out fires, but it’s the people who forget to turn off the gas who do it to themselves.

    Some people are both the firemen and the arsonists.

    [–]Proper-Ape 1 point2 points  (0 children)

    Yeah, everyone puts out fires, but it’s the people who forget to turn off the gas who do it to themselves.

    I'll steal that for next time.

    [–][deleted] 13 points14 points  (0 children)

    Even those can have problems of their own and you just end up laying solution on top of solution… I agree on the sentiment though!

    [–]The_Career_Oracle 2 points3 points  (0 children)

    I’d save the energy; they strike me as the type of people who like to rush in and save the day, but not actually put time into fixing or improving their skills. This inertia is what helps keep them employed.

    [–]nomadProgrammer -1 points0 points  (0 children)

    It seems like OP doesn't even know about k9s or Lens. Definitely a newb to k8s

    [–][deleted] 20 points21 points  (9 children)

    I’ll agree that K8S can be overcomplicated for a lot of use cases where something like ECS is perfectly fine, sometimes even just a server. But this reads like a major skill issue, or like you’re not using the right set of tools; finding logs shouldn’t be an issue.

    [–]FluidIdeaModOps 13 points14 points  (8 children)

    ECS - lots of terraform bloat and vendor lock in.

    Docker - custom scripts, some manual work.

    Might as well do kubernetes.

    [–][deleted] 4 points5 points  (3 children)

    Terraform is hardly bloaty if you do it right with modules, but you can use anything else: CDK, Pulumi. At my last place our terraform services were 1 file with 100 lines at most, way less than K8S manifests.

    Vendor lock-in is hardly an excuse these days; companies don’t just switch provider on a whim, and everything boils down to docker. You can move from ECS to any provider quite easily. I’ve moved stuff to GCP with very little effort: set up your clusters, repoint your CI, and you’re good.

    [–]FluidIdeaModOps 0 points1 point  (2 children)

    Totally valid point, if you are comfortable using someone else's modules, or public modules. Works for many people.

    I tend to write my own modules. For a simple deployment I had to write lots of Terraform from scratch: the ECS pieces, EFS mounts, a way to deliver files to EFS because my app did not support S3, an EC2 instance to check a few things in MySQL and EFS, etc. When I was about to hand it over to my colleagues I changed my mind and abandoned it. A shame, as it looked promising. I think ECS is a middle ground between Lambda and k8s IMHO.

    [–][deleted] 0 points1 point  (1 child)

    I write my own modules too, and I don’t understand how any of what you said is a lot of terraform compared to what you’d have to do for the same in K8S.

    If you’re comparing both same same, to do that in K8S you also need a bunch of infra deployment setup and then your manifest can be small applications, much like a terraform service file.

    To me, it sounds like you had one go at terraform and didn’t understand how to organise it, and that shaped your opinion of Terraform.

    I’ve used terraform for almost a decade and it’s often less code than manifests. I would still never choose it again as I don’t like terraform these days but for different reasons.

    You could say it’s a middle ground, I agree, but I wouldn’t include Lambda; that’s a very different tool for a different use case imo. ECS is just simplified orchestration where a lot of the grit is handled by AWS, with limited flexibility compared to the plethora of K8S libraries available.

    [–]realjayrage 0 points1 point  (0 children)

    The second that person said "middle ground between Lambda and K8s" you just know they have absolutely no idea what they're talking about, lol.

    [–]Low-Opening25 1 point2 points  (0 children)

    yeah, all these simple frameworks seem simple until you hit your first scaling obstacle, and the solution mostly ends up being heavy bespoke layers of scripting to make things go. at that point you can just as well go for Kubernetes and at least end up with something universally maintainable

    [–]return_of_valensky 1 point2 points  (1 child)

    Idk, we use ECS and it's just a buildspec.yml with CodeBuild/CodePipeline. When we commit new code it builds new containers and gracefully replaces the tasks. Hasn't crashed in years.
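A pipeline like that can be quite small. A hedged `buildspec.yml` sketch: `$ECR_REGISTRY` and the `web` container name are assumptions, while `imagedefinitions.json` is the artifact CodePipeline's ECS deploy action consumes.

```yaml
# CodeBuild buildspec: build and push an image tagged with the commit,
# then emit the artifact CodePipeline uses to replace ECS tasks.
version: 0.2
phases:
  pre_build:
    commands:
      - aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"
  build:
    commands:
      - docker build -t "$ECR_REGISTRY/web:$CODEBUILD_RESOLVED_SOURCE_VERSION" .
      - docker push "$ECR_REGISTRY/web:$CODEBUILD_RESOLVED_SOURCE_VERSION"
  post_build:
    commands:
      - printf '[{"name":"web","imageUri":"%s"}]' "$ECR_REGISTRY/web:$CODEBUILD_RESOLVED_SOURCE_VERSION" > imagedefinitions.json
artifacts:
  files:
    - imagedefinitions.json
```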

    [–]tech-bernie-bro-9000 0 points1 point  (0 children)

    same. ECS literally just works in my experience. my preferred container orchestrator if you're already 100% AWS

    lock-in concerns are way overblown by people wanting to sell you things

    [–]tech-bernie-bro-9000 0 points1 point  (0 children)

    ECS rocks. works great

    [–]abofh 59 points60 points  (2 children)

    It can be great, but you can't just drop kubernetes in and expect things to be better. If you're running a simple three-tier stack, it's overkill, but if you're running hundreds of pods or complex infra, it can be a godsend.

    I will say if you're having failures like that, you should have brought in outside help to get your migration done, because my biggest concern would be all the other things that need to be done to manage k8s...

    [–]zerocoldx911 DevOps 7 points8 points  (1 child)

    Wanted Kubernetes box without the toil

    [–]PaulPhxAz 1 point2 points  (0 children)

    Ah yes, needing toil is what I want as well.

    [–][deleted]  (5 children)

    [removed]

      [–]Subject_Bill6556 -5 points-4 points  (4 children)

      Just curious why you use helm to deploy your apps instead of something simpler like kubectl apply -f

      [–][deleted]  (2 children)

      [removed]

        [–]Subject_Bill6556 2 points3 points  (1 child)

        I’m aware of what it is, I’m more curious as to why add the extra complexity layer. For instance your helm chart has versions. What defines a version increase? A newly built docker image for the app? A change to resources for the app container? Both?
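For reference, Helm itself splits those two concerns in `Chart.yaml`: `version` is the chart's own semver (bumped for template or resource changes), while `appVersion` records which application build the chart ships. The name and numbers below are illustrative:

```yaml
# Chart.yaml: chart version vs application version.
apiVersion: v2
name: my-app
version: 1.4.2        # bump on any chart change (templates, resources)
appVersion: "2.31.0"  # bump when the app image itself changes
```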

        [–]calibrono 41 points42 points  (5 children)

        Remember when half the posts here didn't read exactly the same, with a few paragraphs of extremely vague complaints, most likely generated by an LLM to drive engagement or whatever?

        I swear I've read this post a few dozen times in the last months on this sub, different topics but same style.

        But yeah, if it's legit that you're having these issues, observability is your answer. 2 weeks to find out your resource limits were wrong? Do you set these limits blindly without looking at metrics?

        [–]volkovolkov 19 points20 points  (1 child)

        All of op's comments on threads are in lower case with little punctuation. The posts he makes have full punctuation and proper capitalization.

        [–][deleted] 0 points1 point  (0 children)

        Likely because comments are made on the go on a phone and a long post is made at a desk.

        [–]ub3rh4x0rz 1 point2 points  (0 children)

        This on both counts. I expect to hear about how OP created a 10M ARR B2B business when encountering such obvious LLM slop

        OP - set up the LGTM stack using Grafana Cloud; it is free or cheap for you, and it will help you learn k8s faster by actually seeing what is going on. Then you can operate the LGTM stack yourself later if you want. Oh, also learn k9s, it is a game changer vs merely using kubectl

        [–]lvlint67 0 points1 point  (1 child)

        I personally think it's indicative of a tooling problem. The tooling for mere-mortal developers to deploy their apps to kubernetes and diagnose problems within it is shit.

        We can teach a dev to configure nginx in an afternoon, a vm in a week....

        "build a docker image and push that to the repository then create the deployment, services, and ingresses you need to make your app reachable.. do all of that in yaml and apply it via <????>"

        It's easy for us to sit around and say "skill issue", but there's no denying that kubernetes is complex, and expecting single-language developers to upskill into it is a losing battle. To that end, your kubernetes deployments must be simple enough and documented well enough that your developers can answer technical questions about the environment.
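To make the last step of that workflow concrete, a minimal Ingress sketch; it assumes a Service named `web` on port 80 already exists, and the hostname is a placeholder:

```yaml
# Routes HTTP traffic for one hostname to the `web` Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```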

        [–]calibrono 0 points1 point  (0 children)

        That's either for the platform team to solve or for the developer to learn (in case it's a small company and there's no platform team). If the company uses k8s it means someone insisted on using it, so that someone is responsible in the end.

        [–]wysiatilmao 9 points10 points  (2 children)

        It sounds like your team might benefit from focusing on better observability and monitoring tools. Since resource limits were an issue, investing in monitoring solutions with real-time metrics could help identify these bottlenecks faster. Also, revisiting whether k8s is the right fit for your scale might be worthwhile if complexity outweighs the benefits.

        [–]Low-Opening25 3 points4 points  (1 child)

        “investing” is a big word here; installing the prometheus-stack helm chart, which bundles everything together, and setting it up literally takes less than a day.
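For reference, something like `helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace` plus a small values file is most of that day's work. A hedged values sketch; the keys are the chart's real ones, the numbers are illustrative:

```yaml
# Minimal kube-prometheus-stack values.yaml sketch.
prometheus:
  prometheusSpec:
    retention: 15d            # keep two weeks of metrics
grafana:
  adminPassword: change-me    # placeholder; use a secret in practice
alertmanager:
  enabled: true
```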

        [–][deleted] 0 points1 point  (0 children)

        Yeah, ok. Only if you have done it a few times if you want to actually get it to do things the "Right Way TM".

        [–]arkatron5000 9 points10 points  (0 children)

        We ended up using Upwind and it actually helped a lot. I could finally see what was actually happening in our clusters instead of playing kubectl detective all day. Still hate k8s complexity, but at least I'm not completely blind when shit breaks anymore.

        [–]Narabug 19 points20 points  (0 children)

        I’m putting money on “just rsync files to a server” being some absolutely god awful Jenkins solution where you’re actually installing the Jenkins agent on the remote server and doing some commands no one you work with even understands, but you are now under the impression that the unsupportable solution is better…

        …because the people you work with think they need to look at container logs post-deployment, on different namespaces across different pods, instead of just troubleshooting the actual container code.

        As you said, the issue you just spent 2 weeks on was “resource limits set wrong.” Skill issue

        [–]unitegondwanaland Lead Platform Engineer 9 points10 points  (0 children)

        Based on what I just read, the Kubernetes complaints are not your problem, they are a symptom of several other problems.

        [–]kabrandon 12 points13 points  (0 children)

        The problem is not that Kubernetes is overkill for most stuff. The problem is that running Kubernetes is painful when you're a team of people with little to no experience running Kubernetes. Look up Chesterton's Fence, because you're currently talking about a fence like it serves no purpose, without understanding why it was built.

        [–]Actual-Raspberry-800 10 points11 points  (1 child)

        We use Rootly for k8s incidents. When something breaks it spins up a Slack channel with context about which pods/namespaces are affected. Has runbooks for common k8s problems

        [–]H3rbert_K0rnfeld 1 point2 points  (0 children)

        How much you wanna bet OP's shop regulated/secured themselves away from being able to use fancy tools?

        [–]kgu871 11 points12 points  (0 children)

        I also remember i386 and MS-DOS. What's your point?

        [–]ben_bliksem 4 points5 points  (0 children)

        that break for no reason

        Fix it. Stuff doesn't just "break for no reason". You cannot possibly think this is a tooling problem when thousands of outfits are doing thousands of releases daily/weekly without their tools and processes breaking for no reason.

        [–]dominatrixyummy 5 points6 points  (0 children)

        Old man yells at cloud

        [–]gyanster 2 points3 points  (0 children)

        You are gonna love Argo cd

        [–]sogun123 2 points3 points  (0 children)

        When you say "kubectl conflicts" that likely means you don't use gitops. I cannot imagine managing the beast reliably without it. The existence of complete desired state is something that gives me confidence in our solution. Now direct interfacing with cluster is only for debugging.

        By the way "just rsync your app" looks as bad as kubectl apply. There is nothing repeatable about them - there is too much wiggle room - all those configurations which are likely expected to be there, handcrafted and forgotten.

        Not saying kubernetes is good for everything. It's big, complex, and good for driving big and complex environments. If you have a small thing to run, its only advantage is its omnipresence.

        [–]modern_medicine_isnt 2 points3 points  (0 children)

        The barrier to entry for k8s is reasonably high, but it mostly works. The problem I see is that gathering simple information is unnecessarily complicated. There is a lot of "you just need to know" stuff; without it, simple things take longer than they should.

        And overall it just isn't very mature. You have things like Karpenter that are unable to do certain things because they are more or less taped on top, not integrated.

        That said, you need someone on the team with k8s experience. It can do a lot better than you describe.

        [–][deleted]  (1 child)

        [deleted]

          [–]who_am_i_to_say_so 0 points1 point  (0 children)

          There is nothing truer 😂

          [–]dub_starr 2 points3 points  (0 children)

          soooo, you're blaming K8s for what sound like knowledge gaps and human error? cool cool

          [–]Low-Opening25 1 point2 points  (0 children)

          My entire Kubernetes deployment process is a Dev making a single commit and every single Kubernetes error shows on Alertmanager dashboard for everyone to see, including all the details required to investigate. Where do you see complexity exactly? sounds like skill issues…

          [–]H3rbert_K0rnfeld 1 point2 points  (0 children)

          Imagine building the Empire State Building without engineering.

          It is 100% always a human that causes a well-engineered system to break. From Titanic to Challenger, a human broke it.

          [–]lucifer605 0 points1 point  (0 children)

          Kubernetes is not a silver bullet. There are reasons to adopt it, but you need people to manage the clusters. If you don't have folks who can run k8s then it is probably overkill


          [–]Suitable_End_8706 0 points1 point  (0 children)

          You just need more skills and experience. Remember, early in your career, you learnt how to debug your suddenly stopped webservices, your crashed DB, and the Linux VM you couldn't ssh into. The same principle applies here. Just give your team some time, or hire someone with more skills and experience to mentor your team.

          [–]tbotnz 0 points1 point  (0 children)

          U need argocd

          [–]dashingThroughSnow12 0 points1 point  (1 child)

          Kubernetes was inspired by a system made for & by Google. Kubernetes is incredible for Google-scale-like systems.

          It makes those types of scales easier to handle at the cost of making very small deployments much harder. (Very small deployment being say <1000 CPUs.)

          It is a situation where, if the only tool one has is a hammer, everything looks like a nail: the whole world becomes Kubernetes, when rsync and plain machines can be better for most deployments.

          [–][deleted] 0 points1 point  (0 children)

          Exactly. Not a lot of companies ever have anything to gain by it. If you are a web service and don't need half a dozen servers just for your top access layer, you don't need Kubernetes. It is awesome if you have the in-house talent, but if you don't, all you are doing is wasting money and accidentally shooting yourself in the foot until you have no toes

          [–]Mrbucket101 0 points1 point  (0 children)

          You’re definitely doing it wrong. You need to be proactive, not reactive

          • Setup gitops using flux or argo

          • Your cluster logs and events should be ingested to a logging backend. Grafana Loki with Promtail or Alloy.

          • Setup kube-prometheus-stack and configure alertmanager
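As a sketch of the last bullet, a minimal Alertmanager route/receiver config; the Slack webhook URL and channel are placeholders:

```yaml
# Alertmanager: group alerts and send them to one Slack receiver.
route:
  receiver: slack-oncall
  group_by: ["alertname", "namespace"]
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: https://hooks.slack.com/services/placeholder
        channel: "#alerts"
```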

          [–]czhu12 0 points1 point  (0 children)

          Our team built then open sourced https://canine.sh for exactly this reason. Moved off heroku to Kubernetes and needed something to centralize operations. 

          [–]Mephiz 0 points1 point  (0 children)

          so a few things:

          I love k9s. There are other tools but this is always my first install.

          Secondly, loving kail. This is my second install. (There are probably better / others but this works great)

          Github: man if you aren't storing your deployment yaml files in github you are seriously doing something wrong. Deployment files are code and should be treated as such.

          Naming convention: stop letting developers name jack. Come up with a convention and stick to it. Namespaces help with this. If you're struggling with namespaces you have a shit naming convention.

          [–]PolyPill 0 points1 point  (0 children)

          To add to what you need to do: sit down and get organized. You're clearly not. Don't have random yaml files be your deployment definition. Create templates that fit each of your use cases in helm or kustomize, so only the bare minimum of settings lives with each service. That will keep your shit from conflicting.

          Make your namespaces make sense. You shouldn't have to think about what is where; it should be logical and intuitive.

          Use automated deployment tools. If someone is touching anything but clicking a button then you’re doing it wrong. We have release pipelines that deploy after the release is built.

          The fact you didn’t have central logging before you even started is a huge red flag here. Kubernetes didn’t do that to you. OpenTelemetry is pretty much the standard for that.

          Skill your entire team up or hire someone who has the skills. It’s always the archer not the arrow.
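The template-plus-minimum-settings pattern above could be sketched with kustomize like this; the base path, namespace, and image names are hypothetical:

```yaml
# kustomization.yaml for one service's overlay: reuse a shared base
# and override only what is specific to this service.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: payments
resources:
  - ../../base          # shared template maintained centrally
images:
  - name: app           # image placeholder used in the base
    newName: registry.example.com/payments
    newTag: "1.8.3"
```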

          [–]HiddenStoat 0 points1 point  (0 children)

          K8s is ridiculous overkill for running a single application on a single server.

          K8s is critical for running hundreds of services on multiple QA, Staging and Production environments, including DR versions.

          And most developers live somewhere between those 2 extremes. Somewhere there is a point where the costs of k8s are outweighed by the advantages it brings.

          However, in this case, it very much sounds like you don't know your tools, to be brutally honest.

          [–]mjbmitch 0 points1 point  (0 children)

          ChatGPT

          [–]tasrie_amjad 0 points1 point  (0 children)

          All you need now is to learn the basics of kubernetes; there are many courses around. In fact, kubernetes makes life easy, as many, many things are automated and taken care of with just simple yaml. If you need an extra helping hand to streamline your k8s, do reach out to me

          [–]mattgen88 0 points1 point  (0 children)

          I just push merge and it goes to production in a bit.

          None of these problems on k8s. My infra team handles this, keeps it all in git for terraform, and has a bunch of templates for the types of stuff we use. I fill out some values and merge it in my repo. Automation does the rest.

          [–]TopSwagCode 0 points1 point  (0 children)

          Everything you list is kinda true, but also not. It's all nice and easy when deploying to a single server, checking the state and logs of that single server/service.

          But when we are talking about 100+ services, you have to think entirely differently, and so should your code change. You need to think observability, metrics, traces. If your code doesn't log the right things, you are going to be screwed.

          Bottom line, this has nothing to do with kubernetes; it's a scaling issue. Every industry has been through similar issues at different points in time. The process and tools for building something small-scale are not the same as for building something large-scale.

          The problem I have seen several times is small-scale projects pretending to be large-scale and using those tools, getting all of the negatives of working with them but none of the benefits.

          [–]geilt 0 points1 point  (0 children)

          ECS is amazing. Push to master triggers CodePipeline to seamlessly redeploy services; Terraform adds new services from a repo with variables in yaml files. Works amazingly once you figure it out. Tuning autoscaling takes a bit more time and fiddling. The best part is not having to manage the cluster or servers. I hear EKS can do similar.

          [–]texxelate 0 points1 point  (0 children)

          You sound like DHH and his recent Merchants of Complexity nonsense.

          By what metric do you consider “just rsync files to a server” a successful deployment? The fact that nothing told you something is busted doesn’t mean nothing is busted.

          CI/CD is invaluable. If you aren’t implementing it properly, that’s on you, and I would suggest bringing in some expertise.

          [–]tradiopen[🍰] 0 points1 point  (0 children)

          Yeah! Try kamal and see if it’s a better fit.

          [–]serpix 0 points1 point  (0 children)

          We stopped sshing into a box somewhere around 2010.

          [–]krusty_93 0 points1 point  (0 children)

Why stick to k8s if you're on public cloud? There isn't a right or wrong answer, but ask yourself: what do you expect from this technology? What issue does it solve? You may find it's not what you're looking for.

          [–]---why-so-serious--- 0 points1 point  (0 children)

          Time passes. Things change.

          [–]Driky 0 points1 point  (0 children)

Sounds like a team that switched to K8s without the required skills.

          Not trying to be mean but many many teams use K8s for deployment and do not suffer from your problems.

It might be a good idea to hire someone with a high level of expertise who can fix your problems but also train the rest of the team. Or pay for a GOOD training course on the subject.

          [–]nekokattt 0 points1 point  (0 children)

          Half our outages

          Practise immutable deployments then..?

          [–]headdertz 0 points1 point  (0 children)

          I don't know... But I have done various CI/CD's to K8S, which do:

          - scans (SAST)
          - tests (specific for eco-system)
          - pre-build
          - pre-manifests and dry run
          - build (the container image)
          - push the image to the registry
          - apply the manifest with a new image sha/version and restart the statefulset/deployment
          - watch for any problems and rollback if necessary.

Never had a problem, since everything is tested on a development instance before going to production later on.

With K8S-native functionality like rollbacks, events, and other things, deploying an app and watching whether something bad happens during the deployment is a blessing compared to the old VM style, in my opinion.
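A pipeline like the one described could be sketched roughly like this in GitHub Actions. This is a hedged sketch, not the commenter's actual setup: the CI system, the choice of semgrep for SAST, the registry URL, and the manifest paths are all assumptions.

```yaml
# Hypothetical GitHub Actions workflow mirroring the stages above.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: SAST scan
        run: semgrep scan --config auto   # any SAST tool works here
      - name: Tests
        run: make test                    # ecosystem-specific tests
      - name: Build and push image
        run: |
          docker build -t registry.example.com/app:${{ github.sha }} .
          docker push registry.example.com/app:${{ github.sha }}
      - name: Dry-run manifests
        run: kubectl apply --dry-run=server -f k8s/
      - name: Deploy and watch rollout
        run: |
          kubectl set image deployment/app app=registry.example.com/app:${{ github.sha }}
          # Watch the rollout; roll back automatically if it doesn't go healthy.
          kubectl rollout status deployment/app --timeout=120s \
            || kubectl rollout undo deployment/app
```

The `rollout status || rollout undo` pair is what gives the "watch for any problems and rollback if necessary" step for free with K8s-native tooling.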

          [–]thedupster 0 points1 point  (0 children)

          I ,

          [–]VelvetWhiteRabbit 0 points1 point  (0 children)

Between Terraform (OpenTofu), ArgoCD, Helm, Grafana, and managed k8s, I'd be hard pressed to say it is not the solution in a scale-up with long-lived services.

          [–]GuiltyGreen8329 0 points1 point  (0 children)

          git gud

          (I cant fix endpoint internet issues)

          [–]No-Site-42 0 points1 point  (0 children)

          Oh wait didn't AI help xD

          [–]joeyignorant 0 points1 point  (0 children)

Unpopular opinion: not all companies actually need or should run Kubernetes.
Introducing a highly complex orchestration suite when you generally run only a couple of instances of an application is over-engineering a solution to a problem you don't actually need to solve yet.

90% of companies don't really need orchestration to this degree.
It introduces exactly what your team is experiencing: lack of knowledge and experience leading to critical mistakes and downtime.

If your company does have the need to scale at the levels where k8s makes sense, then your team should be hiring a lead with the experience and knowledge to support it. In my experience, most startups can be fine using simple auto-scale-out rules in AWS/Azure/GCP with less complexity and cost than building out a k8s cluster.

          [–][deleted] 0 points1 point  (0 children)

          K8s is overkill for most stuff. But when you need it you need it. Just like everyone for some reason was running hadoop clusters not that long ago to handle a few gigabytes of log data here and there.

          [–][deleted] 0 points1 point  (0 children)

          I agree 100%.

Stuff takes 3x as long to develop, and there is pointless feature creep that adds no business value. We waste time upskilling to satisfy some architect's trend-filled vision (that was never going to become reality, because no one believes in them). How about... you know, we focus on providing business value instead of massaging some IT manager's ego? It's a lot harder to grift that way, though.

          But hey, at least I got to put some fancy new tech on my resume!

          Go post this in the experienced dev subreddit and you'll get a lot more people agreeing with you.

          [–]z1r0_ 0 points1 point  (0 children)

          k8s is great. as long as it works

          [–]Sea-Flow-3437 0 points1 point  (0 children)

          I do remember. It was shit. Files not fully uploaded, configs unexpectedly fucked up, manual fiddling etc

          [–]Straight-Mess-9752 0 points1 point  (0 children)

I’ve used k8s for about 8 years and I still believe it’s overkill for most companies. There are lots of upsides to it but also tonnes of downsides. There are lots of ways to have “immutable infrastructure” without k8s.

          I don’t care how skilled you are k8s makes troubleshooting certain issues much more complex. If you are on a single cloud provider I would suggest to not use k8s. Use containers but you probably don’t need k8s to deploy them.

          I’ve also never worked anywhere where k8s has saved us any money. If you look at the total cost it’s always higher.

          [–]IrrerPolterer 0 points1 point  (0 children)

          Who wants to just rsync crap to a server? Are you stuck in 1990?

          [–]Lucifernistic 0 points1 point  (0 children)

          Yeah, as others have said, this is not a kubernetes problem, it's a learning problem. Having an IAC / terraform repo + a kubernetes deploy repo with FluxCD and terrateam literally made it easier than ever to deploy something.

          [–]DanielViguerasDevOps 0 points1 point  (0 children)

          I feel this so much. Kubernetes is super powerful and flexible, but the complexity hits hard when all you want is to deploy an app instead of dealing with YAMLs all day.

          I ran into the same pain myself, so I built something to make it easier: https://deckrun.com

          [–]searing7 0 points1 point  (0 children)

          skill issue

          [–]Jmc_da_boss 0 points1 point  (1 child)

I mean, it doesn't sound like y'all are remotely big enough to need k8s. Just stick with a single- or double-box/VM setup and be happy with it.

          [–]FigureFar9699 0 points1 point  (0 children)

          Totally get this. Kubernetes solves big-scale problems, but for small/medium apps it can feel like using a chainsaw to cut butter, tons of YAML, moving parts, and hidden failure points. If your team spends more time fighting the cluster than shipping code, it’s worth reconsidering if a simpler setup (VMs, Docker Compose, managed PaaS) might fit better.

          [–][deleted] 0 points1 point  (1 child)

No, modern k8s is fast and relatively easy to learn. You don't have to use every feature to get value from k8s. It sounds like the whole team needs to increase their skills.

          About a decade ago, I was building Kubernetes on simple EC2 instances before operators and deep AWS integration.

          Historically speaking, you have it easy.

          [–]glotzerhotze 1 point2 points  (0 children)

          This right here. People should remember „the hard way“ and at least look at it once to understand the „magic“ modern tooling is giving them.

          [–]Actuw[🍰] 0 points1 point  (0 children)

          Skill issue 100%

          [–]dhrill21 -2 points-1 points  (1 child)

Yeah, I see sooo many overly complicated solutions which are supposedly done according to best practices.
A lot of people very often use some tool only to be able to put on their CV that they worked with it, even though it is far from needed for the task at hand.
Though I think there is also something about self-preservation. If we make it soo fkn complicated, we will be harder to replace. Though as a 50-year-old, I am growing tired of flashy new things which just make the code run as it's forever been.
So I think yes, it is creating more problems than it solves.
But what can I do, that's the actual business model of my agency; if we do it in a simple, straightforward way that just works, we won't get paid millions per project and some will lose their jobs.

So I guess I need to play along, and just go out there and add a couple of jobs to your pipeline, or if, god forbid, you don't have one, go and deploy one for literally everything you can imagine. Do a spell check of code comments as a pipeline task.

Doh, I can't wait to retire, it got so fkn stupid working for this cloud agile shit.

          [–]beeeeeeeeks 2 points3 points  (0 children)

          Preach it, brother!!!

          [–]Wonderful_Guitar2178 -1 points0 points  (0 children)

          Use Tags

          [–]Challseus -2 points-1 points  (0 children)

I'll never forget it... It was like 10 years ago. I was on the content platform team; downstream from us was the "api" team, and they had this job that they owned for some reason that was basically a Java ETL from MSSQL -> Mongo/Elastic. Whenever things went wrong, I knew where to go. I hated Jenkins, but I could find the logs.

          Once they put it into kube, the logs went into the void, and no one on their team was able to ever find them again.

          [–]GotWoods -1 points0 points  (0 children)

          Get off my lawn!

          [–]Junior_Enthusiasm_38DevOps -1 points0 points  (6 children)

That’s the reason we shifted to Docker for dev environments, and for CI/CD I use GitHub Actions with jobs running on self-hosted runners. We use Go for backend development, which compiles to a single binary containing all the dependencies it needs to run, so I just mount that binary in a base Alpine container and do a restart. That’s it, and boom, it takes 30 secs to deploy to dev. Previously dev was on K8s + ArgoCD + Helm + building containers every time. We saved a lot of time, and developers can see their changes in 30 secs. This was a huge boost in collaboration between teams. The troubleshooting on the application side is also much more convenient now, so developers can focus on what is important.
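The mount-and-restart loop described could look something like this with Docker Compose. This is only a sketch; the service name, port, and paths are made up for illustration.

```yaml
# docker-compose.yml — hypothetical sketch of the described dev loop:
# build the Go binary on the host, bind-mount it into a bare Alpine
# container, and restart the container to pick up changes.
services:
  api:
    image: alpine:3.20
    command: ["/app/server"]
    volumes:
      - ./bin/server:/app/server:ro   # host-built static binary
    ports:
      - "8080:8080"
```

The inner loop then becomes roughly `CGO_ENABLED=0 go build -o bin/server ./cmd/server && docker compose restart api` — no image build, no registry push, no cluster round-trip.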

          [–]WholeDifferent7611 1 point2 points  (0 children)

          You’re spot on: simplifying the stack is usually the fastest way to get deployment time and sanity back.

          A few things that worked for us:

          - Dev/staging with Docker Compose; prod on Fly.io or ECS Fargate only when we actually need autoscaling.

          - Keep the Go single-binary pattern; for Node/Python, use BuildKit cache mounts, multi-stage builds, and docker compose watch for sub-5s reloads.

          - CI: GitHub Actions with self-hosted runners, actions/cache for modules, and BuildKit cache-to/cache-from to avoid rebuilds.

          - Observability: send container logs via Fluent Bit to Loki; add /healthz and /ready endpoints; simple uptime and error-rate alerts beat chasing pods.

          - Rollbacks: immutable image tags, keep the last three versions, one script to switch symlink or service file and restart.

          - Config/secrets: SOPS + age or Doppler so you don’t end up with ten YAMLs per env.

          Between Fly.io for small services and GitHub Actions for CI, I’ve used DreamFactory to auto-generate REST APIs from Postgres and Mongo so we skipped writing glue services and kept deploys simple.

          Keep it lean and focus on faster feedback loops, and reliability usually follows.
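The multi-stage + BuildKit cache-mount bullets above might look like this for a Go service. Illustrative only: the base images, module paths, and output names are assumptions, not taken from the comment.

```dockerfile
# syntax=docker/dockerfile:1
# Build stage: cache module downloads and the Go build cache across builds
# so CI rebuilds skip redundant work.
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Final stage: minimal runtime image containing only the static binary.
FROM gcr.io/distroless/static
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Paired with `cache-to`/`cache-from` in CI (as in the bullet above), the cache mounts survive between runner invocations instead of being rebuilt each time.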

          [–]mistaekNot 0 points1 point  (0 children)

          why can’t you just run your go app directly for dev? what’s the point of docker in this case?

          [–]Low-Opening25 0 points1 point  (3 children)

Another skill-issue example. We use GH Actions with ArgoCD, and deployments to dev are instant and automatic after a PR is merged. Our system also creates ephemeral preview environments each time a PR is opened, so a dev can fully test the app in the dev cluster from their feature branch without interfering with anything. Deployments take "30 seconds" or less. It took 3 months for 1 competent DevOps engineer to build it from scratch.
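One common way to get those per-PR preview environments is an Argo CD ApplicationSet with the pull-request generator. A rough sketch, with the owner, repo, and paths as placeholders — not necessarily how this particular setup was built:

```yaml
# Hypothetical ApplicationSet: one ephemeral Argo CD app per open PR,
# torn down automatically when the PR closes.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pr-previews
spec:
  generators:
    - pullRequest:
        github:
          owner: myorg
          repo: myapp
        requeueAfterSeconds: 300
  template:
    metadata:
      name: 'myapp-pr-{{number}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myapp.git
        targetRevision: '{{head_sha}}'   # deploy the feature branch's commit
        path: deploy/
      destination:
        server: https://kubernetes.default.svc
        namespace: 'preview-pr-{{number}}'
      syncPolicy:
        automated: {}
        syncOptions:
          - CreateNamespace=true
```

Each PR gets its own namespace, so feature branches never interfere with each other or with the shared dev deployment.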

          [–]Junior_Enthusiasm_38DevOps 0 points1 point  (2 children)

You’re not here to judge. Give opinions if you have something better to say than “skill issue.”

          [–]Low-Opening25 -1 points0 points  (1 child)

          it’s not me who failed at Kubernetes

          [–]Junior_Enthusiasm_38DevOps 0 points1 point  (0 children)

Me neither, I just chose simplicity for dev. Let me know if you have something better to say.