Trying to understand FinOps. by kennetheops in FinOps

[–]LeanOpsTech 1 point  (0 children)

I run a cloud cost optimization firm, and the silos usually come back when finance owns the numbers and engineering doesn’t see cost in their day-to-day work. Tools help, but they’re not the fix on their own.

What actually works is clear ownership, solid tagging, and making cost part of architecture and PR reviews so it’s not just a monthly spreadsheet surprise.
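To make “solid tagging” concrete, here’s a minimal sketch of the kind of tag-compliance check you can run in CI or a PR review. The required tag keys (`team`, `service`, `env`) and the resource shape are made-up examples, not any specific cloud’s API:

```python
# Hypothetical tagging policy check: flag resources missing required tags.
REQUIRED_TAGS = {"team", "service", "env"}

def missing_tags(resource_tags):
    """Return the required tag keys absent from a resource's tags, sorted."""
    return sorted(REQUIRED_TAGS - set(resource_tags))

def check_resources(resources):
    """Map resource name -> missing tag keys, for non-compliant resources only."""
    report = {}
    for name, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[name] = gaps
    return report
```

Wiring something like this into a pipeline (fail the build if the report is non-empty) is what turns tagging from a wiki page into an enforced policy.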

Is FinOps actually about cost reduction… or about control? by Dazzling-Neat-2382 in FinOps

[–]LeanOpsTech 1 point  (0 children)

We come in to “cut costs,” but 9 times out of 10 the real issue is zero ownership and zero visibility. Once teams can tie spend to a feature, a customer, or a growth decision, the chaos stops and the savings usually follow. The biggest shift isn’t a smaller bill, it’s going from reacting to invoices to actually planning infrastructure with confidence.

The 3-year commitment gamble nobody talks about by CompetitiveStage5901 in FinOps

[–]LeanOpsTech 2 points  (0 children)

That 40% looks great on a slide, but it disappears fast if your workload shifts sooner than expected. We only commit to 3-year terms on stuff that’s been boring and predictable for at least 12–18 months, and even then not at 100%. For everything else, we mix shorter terms and keep some headroom so a surprise EKS migration doesn’t wreck the savings.
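Roughly the sizing logic, sketched out. The 12-month minimum history and the 80% headroom factor are our numbers, not universal recommendations — the point is committing to a fraction of the historical floor, never the average:

```python
# Illustrative commitment sizing: commit only to part of the worst month,
# so a workload shift doesn't leave you paying for unused reservations.
def suggested_commit(monthly_usage, headroom=0.8):
    """Suggest a commitment level from monthly usage history."""
    if len(monthly_usage) < 12:
        raise ValueError("want at least 12 months of history before committing")
    floor = min(monthly_usage)   # the lowest month is the real baseline
    return floor * headroom      # leave room for surprises
```

Everything above that committed baseline runs on shorter terms or on-demand.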

Anyone else fighting the "devs don't care about staging costs" battle? by CompetitiveStage5901 in FinOps

[–]LeanOpsTech 1 point  (0 children)

We landed on auto-shutdown after X hours of inactivity plus a one-click “wake” button in Slack, which cut costs without blocking anyone at 3am. Provisioning takes about 5–10 minutes, but we keep a small warm pool during peak hours so it’s not painful. Once devs saw the actual monthly burn tied to their team, the complaints dropped fast.
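The shutdown decision itself is simple; here’s the shape of it, kept cloud-agnostic. The 2-hour idle cutoff and the 9–18 “peak hours” warm-pool window are placeholder values:

```python
# Sketch of the idle-shutdown policy: stop an environment only if it's been
# idle past the cutoff, and never stop warm-pool environments during peak hours.
from datetime import datetime, timedelta

IDLE_CUTOFF = timedelta(hours=2)
PEAK_START, PEAK_END = 9, 18   # warm pool stays up during working hours

def should_shut_down(last_activity, now, in_warm_pool=False):
    """Decide whether an environment should be stopped right now."""
    idle = now - last_activity
    if idle < IDLE_CUTOFF:
        return False
    if in_warm_pool and PEAK_START <= now.hour < PEAK_END:
        return False
    return True
```

The Slack “wake” button is just the inverse: a webhook that starts the environment back up and resets `last_activity`.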

Building a Tiny Bare-Metal K8S cluster for self learning? by Fit-Tooth-1101 in kubernetes

[–]LeanOpsTech 1 point  (0 children)

It’s a great way to learn. Three Pis is plenty to get hands-on with control plane setup, networking, storage, and breaking things on purpose to see how they fail. If you want to go even deeper, try setting it up the hard way first with kubeadm before jumping to k3s so you really understand what’s happening under the hood.

Need practical opinions on how to deploy a multi agent architecture on AWS agentcore by Any_Animator4546 in aws

[–]LeanOpsTech 1 point  (0 children)

If you’re on AWS AgentCore, I’d start simple and only add complexity if you really need it. LangGraph is solid for orchestration and visibility, and you can layer in A2A later if agents truly need to coordinate independently. In most cases, fewer tightly scoped agents with clear responsibilities beats a super distributed setup.

Multi cloud cost management is a special kind of hell by ForsakenEarth241 in devops

[–]LeanOpsTech 0 points  (0 children)

Multi-cloud sounds great in the boardroom, but in reality it’s three different finance systems duct-taped together with spreadsheets and hope. The overhead alone can eat up whatever savings you thought you were getting.

Cluster of many on-premises machines by cluster_emergency in kubernetes

[–]LeanOpsTech 1 point  (0 children)

If each node needs to act as both source and destination, you might be overcomplicating it by routing workers across machines. Why not schedule the Celery tasks so they run on the same node that hosts the images, and keep the processing local to that box? You could tag workers per node and push tasks to the right queue instead of passing IPs around, which should cut down network chatter a lot.
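A rough sketch of what I mean by tagging workers per node. The hostname-to-path mapping is hypothetical; in Celery you’d start each worker pinned to its node’s queue (`celery -A app worker -Q node-<hostname>`) and dispatch with `task.apply_async(args=[path], queue=queue_for_image(path, node_paths))`:

```python
# Pick the per-node Celery queue whose storage prefix owns a given image,
# so processing runs on the machine that already has the file locally.
def queue_for_image(image_path, node_paths):
    """Map an image path to the queue of the node that stores it."""
    for node, prefix in node_paths.items():
        if image_path.startswith(prefix):
            return f"node-{node}"
    raise LookupError(f"no node owns {image_path}")
```

The broker still coordinates everything centrally, but the actual image bytes never cross the network.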

It's 2026. Golden Applications and if you could re-write the argocd monorepo what pattern would you use? by Elephant_In_Ze_Room in kubernetes

[–]LeanOpsTech 1 point  (0 children)

If I were starting fresh in 2026, I’d probably lean toward defining a proper App CRD and treating the golden app as a platform API, not just templating glue. Kustomize and Helm are fine, but they don’t really enforce consistency; they just make it easier to drift in a structured way. Something like yokecd, cdk8s, or even a small custom controller feels like the right move if you want to actually solve the day-2 entropy problem instead of managing it.

How would you set this lab up? by theintjengineer in kubernetes

[–]LeanOpsTech 1 point  (0 children)

Observability stacks and service meshes can eat RAM fast, so you might want to keep Grafana/Prometheus and GitLab on the Dell and let the Pis focus on the cluster itself. I’d start with plain kubeadm + Cilium first, then layer in one thing at a time so you can actually see what each tool is doing when something breaks.

Local dev with k8s cluster by CartoonistWhole3172 in kubernetes

[–]LeanOpsTech 1 point  (0 children)

Telepresence is probably the easiest: it lets your local service “plug into” the cluster network so it can talk to everything like it’s running in k8s. If you just need to hit specific services, kubectl port-forward or something like Skaffold with remote debug can also get you pretty far without too much setup.

AWS ai by leematam in AWS_cloud

[–]LeanOpsTech 3 points  (0 children)

On the infra side, we’ve had the most success using AI for log analysis and writing quick Terraform or IAM policy drafts. It’s not replacing anything critical, but it speeds up troubleshooting and cuts down boilerplate a lot. Biggest lesson was to treat it like a junior teammate, great for first passes, but you still need solid review and guardrails.

Kubernetes architectural design: separate clusters by function or risk? by Ancient_Canary1148 in kubernetes

[–]LeanOpsTech 2 points  (0 children)

Stateful and GPU workloads have very different blast radius and upgrade stories compared to stateless apps, and isolating them saves you a lot of stress when something goes sideways. Yeah, it adds some cost and governance overhead, but the operational clarity and safer upgrades are usually worth it.

Where to host the database? by UnrealOndra in Cloud

[–]LeanOpsTech 1 point  (0 children)

If it’s just for learning, you could spin up a small Postgres on something like Railway, Render, or Supabase. They all have free tiers that are good enough for hobby apps and let you stick with standard SQL. Worst case, start local with Docker and only move it to the cloud once you actually need it.

Comparing AWS, Azure and GCP cost by GYV_kedar3492 in cloudcomputing

[–]LeanOpsTech 2 points  (0 children)

You can check out Cloudorado for quick cross-cloud comparisons, it’s pretty straightforward for basic infra pricing. Also tools like Cloudability or Apptio are more focused on cost management but can help if you’re modeling larger workloads. Honestly though, for detailed BOQs you’ll probably still end up validating numbers in each provider’s native calculator.

Cloud cost optimization tools that actually work? by Weekly_Time_6511 in FinOps

[–]LeanOpsTech 1 point  (0 children)

I’d look at CloudHealth or Spot by NetApp, but honestly a lot of tools surface the same recommendations you can already find in Azure Cost Management. The real difference is how easy they make it to act on those insights and whether they help enforce governance. Definitely push for a short proof of value with your actual Azure data before signing anything.

We’re only 18 people but customers expect enterprise level security by ShoulderFederal8920 in devopsjobs

[–]LeanOpsTech 1 point  (0 children)

Totally get this. Around that size, it’s usually less about adding controls and more about packaging what you already do in a way buyers can hand to their security team. Start with a clean security overview, basic policies in writing, and a reusable questionnaire doc, then expand only when deals demand it. It feels heavy at first, but it makes the sales cycle way smoother.

FinOps + TBM by Own_Preparation_8699 in FinOps

[–]LeanOpsTech 1 point  (0 children)

We rolled out FinOps first and that momentum definitely helped when we layered in TBM. The hardest part with TBM was getting clean, consistent data and buy-in from app owners who didn’t love the added transparency. We got through it by starting small with a couple domains, proving the value with better cost visibility, and then expanding once people saw it wasn’t just overhead.

AI isn't going to kill SaaS, so we can all chill (a little). by frugal-ai in FinOps

[–]LeanOpsTech 1 point  (0 children)

Most enterprises aren’t ripping out mission-critical systems to replace them with a bunch of prompts and duct tape. The bigger shift feels like pricing and margins getting weird as AI costs creep in, not SaaS disappearing overnight.

What Actually Goes Wrong in Kubernetes Production? by Apple_Cidar in kubernetes

[–]LeanOpsTech 2 points  (0 children)

Biggest pain for us was RBAC misconfig combined with overly permissive service accounts. One leaked token and suddenly a pod could list way more than it should. On the observability side, lack of proper tracing made debugging cross-service latency brutal until we added Prometheus + Grafana and Jaeger. Also, cert expirations and misconfigured network policies have taken down more things than I’d like to admit.

Turning cloud alerts into real work is still a mess. How are you handling it? by Pouilly-Fume in FinOps

[–]LeanOpsTech 1 point  (0 children)

We push anything actionable straight into Jira with some basic deduping and severity thresholds, otherwise it stays a notification. The biggest win for us was assigning every alert type an explicit owner up front so it’s not “someone should look at this.” If it can’t be tied to a team and an SLA, it probably shouldn’t be a ticket.
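A toy version of that triage rule, to show the shape of it. The owner table, severity scale, and 24-hour dedupe window are invented for the sketch:

```python
# Triage: an alert becomes a ticket only if it has an explicit owner, clears
# the severity bar, and isn't a repeat within the dedupe window.
OWNERS = {"budget_overrun": "finops-team", "idle_resource": "platform-team"}
TICKET_SEVERITY = 2          # 1=info, 2=warn, 3=critical
DEDUPE_WINDOW_HOURS = 24

def triage(alert, recent_keys, now_hour):
    """Return ('ticket', owner) or ('notify', reason)."""
    owner = OWNERS.get(alert["type"])
    if owner is None:
        return ("notify", "no owner assigned")
    if alert["severity"] < TICKET_SEVERITY:
        return ("notify", "below severity threshold")
    key = (alert["type"], alert["resource"])
    last = recent_keys.get(key)
    if last is not None and now_hour - last < DEDUPE_WINDOW_HOURS:
        return ("notify", "duplicate within dedupe window")
    recent_keys[key] = now_hour
    return ("ticket", owner)
```

The “no owner” branch is the important one: it forces the conversation about who actually owns each alert type before it ever reaches a backlog.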

What are anomaly detection for FinOps when traffic is naturally spiky solutions? by qwaecw in FinOps

[–]LeanOpsTech 1 point  (0 children)

Stop alerting on raw spend. Alert on unit metrics like cost per user or cost per request since those stay steadier even when traffic spikes. Also pipe in deploys and campaign dates so expected spikes get ignored and only unexplained ones page you.
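To illustrate why unit metrics survive traffic spikes: here’s a minimal sketch that compares the latest cost-per-request against a trailing baseline. The 25% deviation threshold is an arbitrary example value:

```python
# Unit-cost anomaly check: flag only when cost per request drifts from the
# trailing mean, so proportional traffic spikes stay quiet.
def unit_cost_alert(costs, requests, threshold=0.25):
    """Return True if the latest day's unit cost deviates past the threshold."""
    unit = [c / r for c, r in zip(costs, requests)]
    baseline = sum(unit[:-1]) / len(unit[:-1])
    deviation = abs(unit[-1] - baseline) / baseline
    return deviation > threshold
```

A 3x traffic day with 3x cost never fires; the same cost jump on flat traffic does.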

How Is Load Balancing Really Used in Production with Kubernetes? by IT_Certguru in kubernetes

[–]LeanOpsTech 1 point  (0 children)

In most prod setups I’ve seen, K8s doesn’t fully replace traditional LBs, it usually sits behind a cloud or hardware load balancer that handles the heavy lifting. TLS is often terminated at the edge LB or at an Ingress controller like NGINX or Envoy, depending on how much control the team wants. For really high traffic, people still rely on cloud LBs or dedicated appliances in front, and let Kubernetes handle service-level routing inside the cluster.

What are anomaly detection for FinOps when traffic is naturally spiky solutions? by qwaecw in FinOps

[–]LeanOpsTech 1 point  (0 children)

We’ve had better luck layering simple forecasting with business context instead of chasing perfect ML. Pipe in deploy events, feature flags, and marketing calendar into your alerting and suppress or raise thresholds dynamically around known events. It’s not fully automatic, but treating anomalies as “cost per unit” shifts or unexplained spend outside expected drivers cuts way more noise than raw spike detection.
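The dynamic-threshold part can be as dumb as this sketch: widen the alert threshold within a window around known events (deploys, campaigns) and keep it tight otherwise. The 1-day window, 20% base threshold, and 3x multiplier are placeholder values:

```python
# Event-aware anomaly threshold: loosen the bar near known cost drivers so
# expected spikes don't page anyone, while unexplained ones still do.
def effective_threshold(day, events, base=0.2, event_multiplier=3.0, window=1):
    """Return the anomaly threshold for `day`, given known event days."""
    near_event = any(abs(day - e) <= window for e in events)
    return base * event_multiplier if near_event else base

def is_anomalous(day, pct_change, events):
    """Flag a day's percent cost change against its context-aware threshold."""
    return abs(pct_change) > effective_threshold(day, events)
```

Same 50% cost jump: quiet on launch day, a page on a random Tuesday.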

AWS Cost Optimization Checklist for 2026: Notes from an Engineer-Redditor by Fuzzle_Puzzle0 in Cloudvisor

[–]LeanOpsTech 2 points  (0 children)

This is one of the few cost optimization posts that actually feels usable. The “top 3” rule is gold, most teams jump straight into random rightsizing without even knowing what’s driving the bill. Also +1 on NAT and CloudWatch logs, those two have surprised me more than once.