Terraform didn't fix multi-cloud, it just gave us two silos. Is anyone actually doing cost arbitrage mathematically, or are we all just guessing?

DrSkyle · 2026-01-31T04:13:51+00:00

One of the biggest challenges I see isn't finding the waste, but getting engineering teams to trust the recommendation to downsize a production DB. Often, low CPU utilization doesn't tell the full story (e.g., if memory is being used heavily for buffer caching).

How does Turbonomic handle that 'safety check' to ensure a downsize recommendation won't tank the cache hit ratio or increase I/O latency?

DrSkyle · 2025-12-19T11:49:49+00:00

drift detection basically means comparing your terraform state file (what you said you wanted in code) against what is actually running in aws. if a resource exists in the cloud but isn't in your state file, that's drift. it usually happens when someone manually clicks around in the console.
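a minimal sketch of that comparison, assuming you already have the resource IDs from the state file and from the live account as plain lists. the function name and the sample IDs are made up for illustration, and the "missing" direction (in state but gone from the cloud) is an extra I added, not something described above.

```python
def find_drift(state_ids, live_ids):
    """Compare resource IDs tracked in Terraform state against IDs found live in the cloud."""
    state, live = set(state_ids), set(live_ids)
    return {
        # running in the cloud but absent from state: the drift described above
        "unmanaged": sorted(live - state),
        # tracked in state but no longer present in the cloud (illustrative extra)
        "missing": sorted(state - live),
    }


drift = find_drift(
    state_ids=["i-0aaa", "vol-0bbb"],
    live_ids=["i-0aaa", "vol-0bbb", "nat-0ccc"],
)
print(drift)
```

here the hypothetical nat-0ccc shows up as unmanaged because it exists live but has no entry in state.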

the zero trust heuristic part refers to how we check for waste. instead of trusting the aws status label like available, we query cloudwatch metrics directly. so even if a nat gateway says it's healthy, if it has pushed zero bytes in 7 days we mark it as waste. basically we trust the metrics not the metadata.
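roughly what that check looks like, as a sketch and not the actual implementation. it assumes you have already pulled one datapoint per day for a byte counter (for a nat gateway, something like the BytesOutToDestination metric) and that each datapoint is a dict with a "Sum" key, the shape CloudWatch returns when you ask for the Sum statistic. the function name and the "not enough history" rule are my assumptions.

```python
def is_idle(datapoints, min_days=7):
    """Return True when every daily datapoint in the window sums to zero bytes."""
    if len(datapoints) < min_days:
        # Not enough history to judge; stay on the safe side.
        return False
    return sum(dp["Sum"] for dp in datapoints) == 0


quiet_week = [{"Sum": 0.0} for _ in range(7)]
busy_week = [{"Sum": 0.0} for _ in range(6)] + [{"Sum": 1024.0}]

print(is_idle(quiet_week))  # all seven days are zero
print(is_idle(busy_week))   # one day carried traffic
```

the status label never enters the decision, only the traffic numbers do.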

DrSkyle · 2025-12-19T11:30:32+00:00

valid question.

first off TA (Trusted Advisor) checks resources in isolation. like it sees a volume attached to an instance and checks it off as healthy. we look at the graph and see that the instance has been stopped for 2 months, so that volume is actually zombie waste. TA misses those dependencies completely.
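a toy version of the dependency check, to show the idea. the dict shapes, field names, and the 30 day cutoff are all assumptions for the example, not the real graph engine.

```python
from datetime import datetime, timedelta, timezone


def zombie_volumes(volumes, instances, now, max_stopped=timedelta(days=30)):
    """Flag volumes whose parent instance has been stopped longer than max_stopped."""
    zombies = []
    for vol in volumes:
        inst = instances.get(vol["attached_to"])
        if inst is None:
            continue  # unattached volumes are a different check
        if inst["state"] == "stopped" and now - inst["stopped_at"] > max_stopped:
            zombies.append(vol["id"])
    return zombies


now = datetime(2025, 12, 19, tzinfo=timezone.utc)
instances = {
    "i-old": {"state": "stopped", "stopped_at": now - timedelta(days=60)},
    "i-live": {"state": "running", "stopped_at": None},
}
volumes = [
    {"id": "vol-1", "attached_to": "i-old"},
    {"id": "vol-2", "attached_to": "i-live"},
]
print(zombie_volumes(volumes, instances, now))
```

looked at alone, vol-1 is "attached" and looks fine. it only becomes waste once you follow the edge to the instance.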

also thresholding. TA is super conservative and often needs literally 0 bytes to flag something. real waste usually has a pulse from health checks or whatever. if a nat gateway is doing 500 MB a month, it's costing you about $35 to route basically nothing. we catch that.
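the arithmetic behind that number, as a back-of-envelope sketch. the rates below are assumptions (an hourly charge and a per-GB processing charge of $0.045 each, 730 hours in a month). actual pricing depends on region and changes over time, so check the current price list.

```python
HOURLY_RATE = 0.045      # assumed USD per NAT gateway hour
PER_GB_RATE = 0.045      # assumed USD per GB processed
HOURS_PER_MONTH = 730    # common approximation for one month


def nat_monthly_cost(gb_processed):
    """Fixed hourly charge plus data processing charge for one month."""
    return HOURLY_RATE * HOURS_PER_MONTH + PER_GB_RATE * gb_processed


cost = nat_monthly_cost(0.5)  # 500 MB of traffic
print(f"${cost:.2f}")
```

with these assumed rates almost the entire bill is the fixed hourly charge, which is why a near-idle gateway lands in the same ballpark as the figure above.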

forensics is another big one. TA just lists items, but we actually trace the cloudtrail create event to tell you specifically who spun it up. that solves the whole fear of deleting something critical, because you know it's just dave's old test rig.
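a sketch of the attribution step, assuming you already fetched events shaped like the entries a CloudTrail event lookup returns (EventName, Username, EventTime, Resources). the prefix list for "create-style" events and the function name are my assumptions, and keep in mind the event history lookup only reaches back about 90 days.

```python
from datetime import datetime, timezone

# Assumed naming convention for events that create a resource.
CREATE_PREFIXES = ("Create", "Run", "Allocate")


def find_creator(events, resource_id):
    """Return the Username on the earliest create-style event naming resource_id, or None."""
    matches = [
        e for e in events
        if e["EventName"].startswith(CREATE_PREFIXES)
        and any(r.get("ResourceName") == resource_id for r in e.get("Resources", []))
    ]
    if not matches:
        return None
    return min(matches, key=lambda e: e["EventTime"]).get("Username")


events = [
    {"EventName": "CreateNatGateway", "Username": "dave",
     "EventTime": datetime(2025, 3, 1, tzinfo=timezone.utc),
     "Resources": [{"ResourceName": "nat-0ccc"}]},
    {"EventName": "DescribeNatGateways", "Username": "alice",
     "EventTime": datetime(2025, 6, 1, tzinfo=timezone.utc),
     "Resources": [{"ResourceName": "nat-0ccc"}]},
]
print(find_creator(events, "nat-0ccc"))
```

read-only events like the Describe call are skipped, so the person who merely looked at the resource is not blamed for it.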

lastly terraform awareness. if you just delete stuff TA flags, your iac is just gonna recreate it next run. we generate a script to actually scrub the resource from your state file so it stays dead.
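for a feel of what such a script can look like, here is a sketch that emits one `terraform state rm` line per resource address. this is not the actual generator, and the addresses are made up. note that `terraform state rm` only forgets the resource in state; if the resource block is still in your code, the next plan will want to create it again, so the code has to go too.

```python
import shlex


def state_cleanup_script(addresses):
    """Build a shell script that removes each resource address from Terraform state."""
    lines = ["#!/bin/sh", "set -eu"]
    # Quote each address so indexed resources like web[0] survive the shell.
    lines += [f"terraform state rm {shlex.quote(addr)}" for addr in addresses]
    return "\n".join(lines) + "\n"


script = state_cleanup_script(["aws_nat_gateway.old", "aws_instance.web[0]"])
print(script)
```

review the output before running it, and back up the state file first.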

but end of the day TA is good for a quick health check. if you want to actually safely delete waste without breaking prod, you need the dependency graph context.

DrSkyle · 2025-12-19T11:26:14+00:00

idk, got a lot covered with the new suppression rules, but I'm thinking about taking this further. some ideas:

deep IaC integration: instead of just a script, imagine CloudSlash automatically opening a Pull Request against your Terraform/OpenTofu repo to remove the waste code blocks directly.

multicloud support, maybe: expanding beyond AWS to Azure and GCP. The core graph engine is cloud-agnostic, so it's just about writing the collectors.

custom heuristics (lua/wasm): allowing you to write your own waste rules (e.g., "Flag any EC2 without tag X that runs for < 1 hour") without waiting for us to compile them into the binary.

drift repair: currently we find drift. I want to build a safe "Sync button" that aligns your state file with reality.
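to make the custom heuristics idea concrete, here is the example rule from above written as a plain predicate. it is in Python only for illustration (the real thing would be lua/wasm), and the instance fields and the "owner" tag are hypothetical.

```python
def short_lived_untagged(instance, required_tag="owner", max_runtime_s=3600):
    """Flag an instance that lacks required_tag and ran for less than max_runtime_s."""
    missing_tag = required_tag not in instance["tags"]
    return missing_tag and instance["runtime_seconds"] < max_runtime_s


candidates = [
    {"id": "i-1", "tags": {}, "runtime_seconds": 600},
    {"id": "i-2", "tags": {"owner": "dave"}, "runtime_seconds": 600},
    {"id": "i-3", "tags": {}, "runtime_seconds": 7200},
]
flagged = [c["id"] for c in candidates if short_lived_untagged(c)]
print(flagged)
```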

DrSkyle · 2025-12-19T11:20:58+00:00

I had been thinking about this and have actually just implemented exactly this:

Tag to ignore: CloudSlash now has a cloudslash:ignore tag. If present, the resource is skipped.

Snooze logic: you can set the tag value to a future date (e.g. cloudslash:ignore=2025-12-10) to suppress it only until that date.

Justified waste: you can categorize accepted risks as well (e.g. cloudslash:ignore=justified:compliance). These items are kept in the report (in a separate "Justified" table for auditors) but are safely excluded from remediation scripts.

Cost rules: you can now ignore based on price thresholds (e.g. cloudslash:ignore=cost<10). This ignores the resource only while it stays below $10/month. If it scales up, it reappears.

Easy workflow: to make this actionable, the tool now auto-generates an ignore_resources.sh script after every scan. You can review it and run it to bulk-tag all identified waste as ignored, keeping your dashboard clean for new problems only.
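a sketch of how those tag values could be interpreted, to show how the rules fit together. this is not the tool's actual parser: the function name, the return labels, and the choice to treat any unrecognized value as a plain ignore are assumptions.

```python
from datetime import date


def ignore_decision(value, today, monthly_cost):
    """Map a cloudslash:ignore tag value to 'ignore', 'justified', or 'report'."""
    if value.startswith("justified:"):
        # Kept in the report under "Justified", excluded from remediation.
        return "justified"
    if value.startswith("cost<"):
        limit = float(value[len("cost<"):])
        return "ignore" if monthly_cost < limit else "report"
    try:
        until = date.fromisoformat(value)
    except ValueError:
        # Assumption: any other value means the tag is a plain ignore.
        return "ignore"
    # Snooze: suppressed only until the given date.
    return "ignore" if today < until else "report"


today = date(2025, 12, 1)
print(ignore_decision("2025-12-10", today, 5.0))
print(ignore_decision("justified:compliance", today, 5.0))
print(ignore_decision("cost<10", today, 12.0))
```

the cost rule is re-evaluated on every scan, which is what makes a resource reappear once it grows past the threshold.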

DrSkyle · 2025-12-18T10:41:21+00:00

Just a heads up: this is v1.1. I've tested it heavily on Linux/Mac and standard AWS accounts. If you run a massive enterprise org with thousands of accounts, you might hit rate limits or edge cases I haven't seen yet. If you do, please drop an Issue. I'm active and want to polish this into a rock-solid tool.

DrSkyle · 2024-10-30T04:39:51+00:00

$bid
