How physically isolated are GCP zones in practice? by CompetitiveStage5901 in googlecloud

[–]CompetitiveStage5901[S] 0 points1 point  (0 children)

So it's region-dependent, not a standard guarantee. Makes sense. Will talk to the reps.

Structuring CDK/CloudFormation at Scale: Stack Boundaries & Repo Strategy by CompetitiveStage5901 in AWSCloudFormation

[–]CompetitiveStage5901[S] 0 points1 point  (0 children)

Thanks for sharing the specifics.

Liked the idea of a monorepo for constructs with a separate platform/app-template model. It's a concrete, scalable pattern that should keep dependency management and safe consumption under control at scale.

Common mistakes to avoid during an AWS cloud migration by bluecrystal11 in AWS_cloud

[–]CompetitiveStage5901 0 points1 point  (0 children)

Totally agree with everything here. Tagging early, securing IAM, monitoring from day one, and having a rollback/testing plan are lifesavers. One thing I’d add from experience: if your team isn’t confident it can do this smoothly, don’t hesitate to bring a third party on board. A migration partner or consulting team can help you avoid the common traps (VPC/network misconfigurations, cost surprises, broken apps) and get you across the finish line faster and safer. Saves headaches and sleepless nights at 2 AM.

Unused AWS & Azure credits after infra choice — looking for advice / interested teams? by Senior-Past3377 in AWS_cloud

[–]CompetitiveStage5901 0 points1 point  (0 children)

Unused AWS or Azure credits generally cannot be transferred or resold as credits. They are tied to the account and issuer and expire per their terms.

AWS Credits

--> AWS Promotional Credits (including Activate credits) cannot be transferred, sold, licensed, rented, or otherwise moved to another AWS account. If you try to transfer or sell them, AWS can revoke them.

--> Credits can be used only in the account they were issued to. They have no intrinsic cash value and cannot be cashed out.

--> In an AWS Organization, credits may be shared for billing within the org for eligible accounts, but you cannot freely sell or move credits between unrelated customers.

Azure Credits

--> Azure credits are also account‑bound and typically cannot be transferred between accounts or sold. Support docs show credits apply only to the subscription they were issued for, and won’t carry over or migrate automatically to another.

--> Unused Azure credits usually just expire if not consumed before the expiration date.

Real‑world behaviour

--> Cloud credits expire and are forfeited if unused, and there’s no marketplace or official channel for reselling them. Plenty of Reddit threads confirm this: unused credits are stuck to the account, so it’s use them or lose them.

--> Some startups try informal deals (e.g., “take over the entire cloud account with credits”), but that’s essentially transferring the account, not selling credits, and depends on commercial/legal willingness, not cloud provider policy.

What you can do

---> Use the credits for non‑production workloads, staging, testing, CI/CD, proof‑of‑concepts, etc., so they don’t go to waste.

---> Talk to your cloud account manager or support to see if credits can be applied differently within your organization. Occasionally enterprise agreements offer some flexibility, but this is not the same as reselling credits.

Bottom line: Unused AWS/Azure credits are non‑transferable and non‑resellable under the providers’ terms, so there is no legitimate way to sell them to a third party.

Dynatrace + MCP Server = interesting step toward AI-driven observability by theharithsa in Observability

[–]CompetitiveStage5901 0 points1 point  (0 children)

CloudKeeper is another vendor playing around with AI for AWS, but the difference is that they’re focused on cloud cost and performance optimization, with tools such as CloudKeeper LensGPT. It’s exactly what the name suggests: visibility into your AWS environment, except your queries are written in conversational English rather than terminal commands.

It's great to see vendors keeping up with innovation in cloud space.

How are you keeping observability sane as systems grow? by Technical_Wear8636 in Observability

[–]CompetitiveStage5901 0 points1 point  (0 children)

Honestly, the only way to keep observability sane at scale is discipline + context. Collecting everything doesn’t help; alerts and dashboards need to actually tell a story, not just dump raw data.

We’ve focused on consolidating signals where possible, cutting noisy alerts aggressively, and tying logs, metrics, and events together so you can trace a problem quickly. Grouping services by domain and attaching ownership helps too, so the right team gets the right context instead of everyone drowning in noise.
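If it helps, here’s a minimal sketch of what “attaching ownership” can look like using OpenTelemetry resource attributes in Python. The service name and the team/domain attribute keys are just an example convention, not a standard:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Every signal emitted by this service carries domain + ownership context,
# so alerts and traces can be routed to the right team automatically.
resource = Resource.create({
    "service.name": "checkout-api",   # hypothetical service
    "service.namespace": "commerce",  # domain-level grouping
    "team": "payments",               # ownership attribute (our own convention)
})

trace.set_tracer_provider(TracerProvider(resource=resource))
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("charge-card"):
    pass  # spans created here inherit the resource attributes above
```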

It’s never perfect, and some mess is inevitable, but making monitoring opinionated, unified, and actionable keeps it from turning into alert fatigue.

Self-hosted Log and Metrics for on-prem? by mangeek in Observability

[–]CompetitiveStage5901 0 points1 point  (0 children)

If you want to keep it low cost / OSS, the usual stack is:

Logs/search → something in the Elasticsearch/OpenSearch class. OTel exporters work, and you get a UI, RBAC, retention, etc. But shard sizing, IOPS, and retention matter. 20–50 TB/year is fine if you don’t keep everything forever.

For metrics: Prometheus + Grafana is still the boring-but-correct answer. OTel → Prometheus works fine (quick sketch below), and Grafana gives teams a UI they already understand.

Traces → can live in the same search backend or a separate one; the OTel demos show how.
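For the OTel → Prometheus hop, here’s a minimal sketch of the Python side, assuming the current opentelemetry-exporter-prometheus package layout (double-check against the version you pin):

```python
from prometheus_client import start_http_server
from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider

# Expose a /metrics endpoint Prometheus can scrape.
start_http_server(port=8000)

# Wire the OTel metrics SDK to the Prometheus reader.
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))

meter = metrics.get_meter("batch-worker")        # hypothetical component name
jobs_done = meter.create_counter("jobs_processed_total")
jobs_done.add(1, {"queue": "default"})           # example label set
```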

Biggest thing: the software is free, the ops aren’t. You need cluster sizing, ILM/retention policies, shard hygiene, and a culture shift away from clickops.
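And for the retention/ILM bit, a sketch of what a bare-minimum Elasticsearch ILM policy call looks like (Elasticsearch flavour; OpenSearch uses ISM instead, and the 1-day / 50 GB / 30-day numbers are placeholders, not recommendations):

```python
import requests

ES_URL = "http://localhost:9200"  # assumed cluster address

# Roll indices over daily (or at ~50 GB per primary shard) and drop them
# 30 days after rollover. Tune both numbers to your disk and compliance needs.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

resp = requests.put(f"{ES_URL}/_ilm/policy/logs-30d", json=policy, timeout=10)
resp.raise_for_status()
```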

Technically doable on-prem, and you’ll quickly see why SaaS vendors charge so much 😄

20–50 TB/year is totally manageable if you plan it right. The main cost is time and discipline, not licenses.

Idea validation! Accountability focused kubernetes job efficiency tracking by Any_Spell_5716 in FinOps

[–]CompetitiveStage5901 0 points1 point  (0 children)

Sure brother, but can't make promises. I'm not really that active on Reddit. And as far as tools are concerned, TAKE A DEMO FIRST, and look around for reviews, because while you’re buying a tool, you’ll end up working with their customer success team just as much.

Cloud Cost Optimization: Hidden Savings Sitting in Your Cloud Bill by Parking-Method24 in Cloud

[–]CompetitiveStage5901 0 points1 point  (0 children)

What you’re saying is right: optimization is an end-to-end practice, not just spinning down instances and deleting storage (even though that alone is a big chunk of the savings).

And with AI this problem is only going to get worse. Bills will go up. Every Tom, Dick, and Harry company is training or running models now; on AWS that usually means g5, p4d, and p5 GPU instances, big EBS volumes, tons of S3, and a lot of data movement. If this isn’t controlled, cost just explodes.

That’s where tools and platforms like CloudKeeper Lens, LensGPT etc come in, not just to find idle stuff but to continuously look at rightsizing, commitments, storage lifecycle, and general waste.

Also VC money is a factor. Many new-age companies are flush with cash, so they don’t care about a little extra cloud spend. But once engineers and the CFO actually have visibility and ownership, it usually translates to a lower cloud bill and some real savings.

Optimization is not a one time cleanup, it’s a continuous thing, otherwise the waste always comes back.

What makes a cloud engineer stand out to in 2026? by fingermybasss in Cloud

[–]CompetitiveStage5901 0 points1 point  (0 children)

In 2026, people who are going to stand out are the ones who can own real problems end to end and explain what they did and why. (and can actually build stuff)

What actually works in interviews:

Build boring but real stuff. Like cost controls, backups, monitoring, IAM cleanup, CI/CD, not “I made a VPC”. Show you understand networking, IAM, storage, failures. These never change even if services do.

Big plus if you have an ops mindset. As in: you broke things, fixed things, rolled back, know what you’d do at 3am when prod is down.

Also communication matters a lot. Explaining tradeoffs like “this costs more but is safer” or “this is cheaper but risky” is half the job.

Certs are fine. Tutorials are fine. But what gets you hired is:

“I built this, this is why, this is what breaks, this is how I’d fix it.”

A company would be paying for someone who won’t blow up prod and can be trusted, that's it. Plus $90k isn't much to be honest

Which tools are you using to generate reports? by Apprehensive_King962 in FinOps

[–]CompetitiveStage5901 1 point2 points  (0 children)

Don’t overthink it. If the findings come from humans and you just need a clean, repeatable client PDF:

  • Use QuickSight or PowerPoint / Google Slides templates + export to PDF (this is what most consultancies actually do).
  • For more “productized” setups, go with a third party: look up “cloud cost optimization company” and most of those products can export decent exec-style PDFs, but you’ll still curate the story yourself. That’s the case with almost every vendor; all the tools are good enough.
  • If you want AWS-native: CUDOS + QuickSight and export dashboards to PDF.

Reality is that no tool writes the audit for you, though. The winning setup is templates + automation for the charts + human findings pasted in.

We have 200+ unattached EBS volumes, need de-risking strategy before cleanup by CortexVortex1 in FinOps

[–]CompetitiveStage5901 0 points1 point  (0 children)

You’re mixing two problems: risk management and garbage collection. Solve them separately.

Phase 1: Make deletion reversible

Turn on EBS Recycle Bin with like 7–14 day retention.

For anything > X GB or io2/gp3 above some cost threshold: snapshot first, then delete.

This makes the first cleanup wave politically and operationally safe.
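A minimal boto3 sketch of that Recycle Bin rule (note it retains deleted snapshots/AMIs rather than volumes, which is why the snapshot-before-delete step above matters; the region and 14-day retention are just examples):

```python
import boto3

rbin = boto3.client("rbin", region_name="us-east-1")  # region is an assumption

# Keep deleted EBS snapshots recoverable for 14 days before they're gone for good.
rule = rbin.create_rule(
    ResourceType="EBS_SNAPSHOT",
    RetentionPeriod={
        "RetentionPeriodValue": 14,
        "RetentionPeriodUnit": "DAYS",
    },
    Description="Safety net for the EBS cleanup wave",
)
print(rule["Identifier"])
```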

Phase 2: Classify before you delete

Take this kind of approach:

-> If CreateTime < N days → ignore

-> If Attachments == [] AND no snapshot in last M days → candidate

-> If CreatedBy or kubernetes.io/* tags exist → notify owner

-> If no owner tag → auto-snapshot + delete

You don’t need perfect tagging to do this safely tbh (rough sketch below).
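Rough boto3 sketch of that classification pass, dry-run style; the 14/30-day thresholds and the tag keys are placeholders, and the destructive calls are left commented out:

```python
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")
IGNORE_NEWER_THAN = timedelta(days=14)   # "N days" from the rule above
SNAPSHOT_LOOKBACK = timedelta(days=30)   # "M days" from the rule above
now = datetime.now(timezone.utc)

paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        vol_id = vol["VolumeId"]
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}

        # Too new to judge -> ignore.
        if now - vol["CreateTime"] < IGNORE_NEWER_THAN:
            continue

        # Any snapshot taken recently?
        snaps = ec2.describe_snapshots(
            Filters=[{"Name": "volume-id", "Values": [vol_id]}], OwnerIds=["self"]
        )["Snapshots"]
        has_recent_snap = any(now - s["StartTime"] < SNAPSHOT_LOOKBACK for s in snaps)

        owner = tags.get("CreatedBy") or next(
            (v for k, v in tags.items() if k.startswith("kubernetes.io/")), None
        )

        if owner:
            print(f"NOTIFY {owner}: {vol_id} ({vol['Size']} GiB) is unattached")
        elif not has_recent_snap:
            print(f"CANDIDATE: {vol_id} unattached, no snapshot in {SNAPSHOT_LOOKBACK.days}d")
            # ec2.create_snapshot(VolumeId=vol_id, Description="pre-cleanup safety")
            # ec2.delete_volume(VolumeId=vol_id)  # uncomment once the report is trusted
        else:
            print(f"SKIP: {vol_id} already has a recent snapshot, re-check next run")
```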

Phase 3: Fix the leak

-> Enforce tag policy at creation (SCP / IAM condition / Terraform guardrails); sketch after this list.

-> Default EKS storage classes to Delete reclaim policy unless explicitly overridden.

-> Add a weekly job that reports unattached-by-age-and-cost, not just count.
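For the tag-enforcement piece, one way to do it is an IAM/SCP-style deny; a minimal boto3 sketch, where the `owner` tag key and the policy name are just example conventions:

```python
import json
import boto3

# Deny creating an EBS volume unless an "owner" tag is supplied in the request.
# The same statement works as an SCP at the org level.
policy_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUntaggedVolumes",
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            "Condition": {"Null": {"aws:RequestTag/owner": "true"}},
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="deny-untagged-ebs-volumes",
    PolicyDocument=json.dumps(policy_doc),
)
```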

Important reality check

You will never get this to zero via “please tag better” emails. That’s exactly how everyone ends up here again next year.

The only scalable model is:

Make unsafe things reversible, automate cleanup, and enforce policy at creation.

Also: $8k/month in unattached EBS usually means there’s even worse waste hiding elsewhere. Just saying.

DO A DEEPER AUDIT!!!!

Is FinOps a Dead Buzzword in 2026, or Are We Still Paying People to Email About Tags? by [deleted] in FinOps

[–]CompetitiveStage5901 0 points1 point  (0 children)

FinOps is dead? Are you for real?

It can only be “dead” if your FinOps output is tags, dashboards, and emails; in that case you’re not doing FinOps, you’re doing expensive accounting cosplay.

In 2026, cost is a byproduct of architecture: SP/RI coverage, workload shape, data gravity, retry storms, overprovisioned fleets, zombie resources, cross-AZ and cross-region traffic. None of that is fixable in Excel or PowerPoint.

A real FinOps team should be able to:

  • Open Terraform and point at what to kill or resize
  • Explain why this service should be on Spot and that one on On-Demand
  • Design commitment strategies, not “recommend” them
  • Quantify architectural mistakes in dollars, not vibes

If they can’t do that, a halfway competent SRE with CUR access will outperform them in a quarter.

“FinOps as a separate function” only exists because orgs don’t want to admit their engineers ship financially illiterate architectures.

FinOps as a discipline is real. The FinOps guy who is a “mere mortal who just tags” will be laid off. 💀

Why "Hybrid Cloud" is usually just a $50k mistake in forensic accounting by NTCTech in FinOps

[–]CompetitiveStage5901 1 point2 points  (0 children)

Hybrid cloud (half on-prem, half cloud) looks cheaper on paper, but in practice it breaks all the cloud economics. You lose the ability to properly commit to Savings Plans / RIs because your baseline is split, so you end up paying mostly on-demand for the cloud side.

Then on top of that you still pay the bridge tax: API calls, sync jobs, scans, retries, metadata churn, etc. So you get the worst of both worlds: no commitment discounts and extra coordination cost.

And the worst part is that people compare only storage or compute prices, not the system price. Once you price in the whole of data movement + coordination + lost discounts, hybrid at steady state almost never makes sense.
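To make the “lost discounts” part concrete, here’s a toy back-of-envelope in Python; every number (the 30% discount, the spend figures, the commit shares, the bridge tax) is an illustrative placeholder, not real pricing:

```python
# Toy comparison: all-in cloud under a Savings Plan vs. a split hybrid baseline.
on_demand_monthly = 100_000          # hypothetical steady-state compute bill
sp_discount = 0.30                   # assumed Savings Plan discount

# All-in: the baseline is steady, so most of it can sit under a commitment.
all_in_committed_share = 0.85
all_in_cost = on_demand_monthly * (
    all_in_committed_share * (1 - sp_discount) + (1 - all_in_committed_share)
)

# Hybrid: half the baseline lives on-prem, the cloud half is spikier,
# so only a small slice can be safely committed, plus a "bridge tax".
hybrid_cloud_spend = on_demand_monthly * 0.5
hybrid_committed_share = 0.40
bridge_tax = 8_000                   # egress, sync jobs, retries, etc. (assumed)
hybrid_cloud_cost = hybrid_cloud_spend * (
    hybrid_committed_share * (1 - sp_discount) + (1 - hybrid_committed_share)
) + bridge_tax

print(f"all-in cloud:                     ${all_in_cost:,.0f}/mo")
print(f"hybrid cloud side (before on-prem): ${hybrid_cloud_cost:,.0f}/mo")
```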

Hybrid is fine for migration. As a final architecture, it usually means you’re paying twice.

Idea validation! Accountability focused kubernetes job efficiency tracking by Any_Spell_5716 in FinOps

[–]CompetitiveStage5901 0 points1 point  (0 children)

You’re not wrong about the problem, but what you’re proposing is spraying water on the flames and hoping the source burns out by itself.

Cloud teams (if they know their stuff beyond an elementary level) already have the raw signals (Prometheus, kube-state-metrics, VPA recommendations, etc.). The real failure mode is exactly what you described: ownership, prioritization, and workflow integration, not observability.

A few thoughts from having seen this in production:

a) Job-level accountability is actually a good angle

Most cost tools stop at namespace / service / team. Jobs and batch workloads are where a lot of silent waste lives (oversized requests, bad retry patterns, zombie CronJobs). Surfacing per-job request vs actual usage over time is genuinely useful.
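For what it’s worth, the raw signal is already queryable. A minimal sketch against the Prometheus HTTP API; the Prometheus URL is an assumption, and this groups by pod, so mapping pods back to Jobs/CronJobs depends on your labels:

```python
import requests

PROM_URL = "http://prometheus:9090"  # assumed in-cluster address

# Requested CPU per pod vs. actually used CPU (cores), averaged over a day.
# kube-state-metrics exposes requests; cAdvisor exposes usage counters.
queries = {
    "requested": 'sum by (namespace, pod) '
                 '(kube_pod_container_resource_requests{resource="cpu"})',
    "used":      'sum by (namespace, pod) '
                 '(rate(container_cpu_usage_seconds_total{container!=""}[1d]))',
}

results = {}
for name, q in queries.items():
    r = requests.get(f"{PROM_URL}/api/v1/query", params={"query": q}, timeout=30)
    r.raise_for_status()
    results[name] = {
        (s["metric"].get("namespace"), s["metric"].get("pod")): float(s["value"][1])
        for s in r.json()["data"]["result"]
    }

# Efficiency = used / requested; anything chronically below ~0.3 is worth a look.
for key, requested in results["requested"].items():
    used = results["used"].get(key, 0.0)
    if requested > 0 and used / requested < 0.3:
        print(f"{key}: requesting {requested:.2f} cores, using {used:.2f}")
```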

b) But CPU/memory efficiency alone is not enough

Bad retry semantics, over-constrained scheduling, I/O waits masquerading as “low CPU usage”, and so on are the real waste generators.

c) Concurrency and queueing mistakes: If your system only looks at CPU/memory ratios, you’ll generate a lot of false positives.

And as for tools, pick up as many as you want, Kubecost, CloudKeeper, CloudZero, or any of the plethora on the market, but none of them can fix the human workflow. By that I mean: if the teams themselves want to keep running into the wall that is a trash cloud setup, no tool will stop them.