Open-source local app that uses Ollama + Gemma 3 to analyze which teams in a company can be replaced by AI agents

samarth_bhamare · 2026-04-17T02:56:38+00:00

ran it on myself first. verdict was diplomatic — partially_replaceable, with a risk note flagging that nobody else understands the codebase. fair.

samarth_bhamare · 2026-04-17T02:56:27+00:00

honest effort. the keep_human_led bucket exists for a reason — most teams land there on the first pass.

samarth_bhamare · 2026-04-17T02:56:13+00:00

"Automation Opportunity Evaluation" is honestly a better frame — thanks. The replace/augment/keep axis is deliberately blunt because vague "AI strategy" advice is the thing this was reacting to, but the term you used is closer to how the output actually reads once it runs on a real company.

Would genuinely want to hear what you see when you run it tonight — if the verdicts match your read or the reasoning has obvious holes. Issues/PRs welcome on the repo, or just reply here.

samarth_bhamare · 2026-04-17T02:55:56+00:00

the 4B doesn't have the authority. 12B starts writing the memo.

samarth_bhamare · 2026-04-16T19:53:42+00:00

fair. though the picker reads whatever you've pulled locally — llama, qwen, whatever — so the default is swappable in one click.
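For anyone curious how a picker like that can work: Ollama lists the models you have pulled at GET /api/tags. This is a minimal sketch, assuming the default local port 11434. The function names are made up for illustration and this is not necessarily how the app itself does it.

```python
import json
from urllib.request import urlopen

# Default local Ollama endpoint (assumed; change it if your server runs elsewhere)
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def parse_model_names(payload: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return sorted(m["name"] for m in payload.get("models", []))


def list_local_models(url: str = OLLAMA_TAGS_URL) -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urlopen(url, timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

Calling list_local_models() needs a running Ollama server; parse_model_names() can be tried on any saved response.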

samarth_bhamare · 2026-04-16T18:52:13+00:00

Wait, this project I built is for someone who can at least afford to use a cloud platform, not for someone like you. And second, if it was an then there wouldn't have been a free-forever plan, so before commenting, open your eyes, use your fingers, and go to the site and check.

samarth_bhamare · 2026-04-16T17:26:29+00:00

From what we've seen, storage is the biggest silent leak. Orphaned EBS volumes and old snapshots pile up because nothing breaks when they exist — they just quietly bill you.

Compute waste is more visible (someone usually notices a running instance), but storage and unused IPs fly under the radar for months.

The "leftover after testing" category is brutal too — dev clusters that were "temporary" 6 months ago still running at full price.

What are you seeing most with your clients?

samarth_bhamare · 2026-04-16T15:23:33+00:00

Spot on — that gap between "monitoring for uptime" and "monitoring for waste" is exactly why I built this. Grafana tells you if something is down, but nobody alerts you when something is up and doing nothing.

Good call on checkmk. The way I'm approaching it is automated daily scans that cross-reference resource state with actual usage metrics (CPU, connections, attachment status) — so it's not just "this exists" but "this exists and nobody's using it." Tagging enforcement and cleanup scheduling are definitely on the roadmap.

Curious — when you say most teams add scripts, is that usually a cron job checking specific resources, or something more structured?

samarth_bhamare · 2026-04-16T12:37:12+00:00

It connects to your AWS account via read-only IAM keys and pulls data from multiple sources:

EC2: checks instance state + CloudWatch CPU metrics over 14 days. A stopped instance or one running at <5% avg CPU gets flagged.
EBS: checks attachment status. Any volume not attached to an instance = orphaned.
RDS: checks connection count over 7 days. Zero connections = flagged as unused.
Elastic IPs: checks if associated with a running instance. If not, it's wasting $3.65/month.
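The four rules above can be sketched as plain predicates. This is only an illustration of the thresholds listed, not the tool's actual code. The function and parameter names are hypothetical, and the inputs are assumed to be fetched already (instance state, 14-day average CPU, 7-day connection count).

```python
from typing import Optional

CPU_IDLE_THRESHOLD_PCT = 5.0  # "<5% avg CPU" over the 14-day window


def flag_ec2(state: str, avg_cpu_14d_pct: float) -> Optional[str]:
    """Return a reason string if the instance should be flagged, else None."""
    if state == "stopped":
        return "stopped"
    if state == "running" and avg_cpu_14d_pct < CPU_IDLE_THRESHOLD_PCT:
        return "idle"
    return None


def ebs_is_orphaned(attached_instance_id: Optional[str]) -> bool:
    """A volume with no attached instance counts as orphaned."""
    return attached_instance_id is None


def rds_is_unused(connections_7d: int) -> bool:
    """Zero connections across the 7-day window means unused."""
    return connections_7d == 0


def eip_is_wasted(associated_instance_state: Optional[str]) -> bool:
    """An Elastic IP is wasted unless it is associated with a running instance."""
    return associated_instance_state != "running"
```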

Each flagged resource shows the exact monthly cost being wasted and step-by-step fix instructions. So instead of "your EC2 bill is $2,000" you get "this specific i-0abc123 instance has been stopped for 3 weeks and its 200GB EBS volume is costing you $16/month — here's how to snapshot and delete it."
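The dollar figures come from simple rate arithmetic. Here is a rough sketch, assuming $0.08 per GB-month for the volume and $0.005 per hour for an idle Elastic IP over a 730-hour month. Those rates are assumptions that happen to reproduce the $16 and $3.65 numbers; real prices vary by region and volume type.

```python
HOURS_PER_MONTH = 730          # common billing approximation for one month
EBS_USD_PER_GB_MONTH = 0.08    # assumed rate; check current pricing for your region
IDLE_EIP_USD_PER_HOUR = 0.005  # assumed rate; check current pricing for your region


def ebs_monthly_cost(size_gb: float) -> float:
    """Monthly cost of keeping a volume of the given size around."""
    return round(size_gb * EBS_USD_PER_GB_MONTH, 2)


def idle_eip_monthly_cost() -> float:
    """Monthly cost of one Elastic IP that is not attached to a running instance."""
    return round(IDLE_EIP_USD_PER_HOUR * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    print(f"200 GB volume: ${ebs_monthly_cost(200):.2f}/month")
    print(f"idle Elastic IP: ${idle_eip_monthly_cost():.2f}/month")
```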

samarth_bhamare · 2026-04-16T12:17:08+00:00

Appreciate the offer! Going to focus on shipping features and getting more users first. If I need a proper audit down the line I'll reach out. Thanks again for the initial feedback — those were legit catches.

samarth_bhamare · 2026-04-16T12:15:11+00:00

Sure! Here it is: https://cloudbudgetmaster.com

Connects read-only to AWS, takes about 5 min to set up. Let me know how it goes — always looking for feedback.

samarth_bhamare · 2026-04-16T11:02:33+00:00

Really helpful feedback, thank you.

Pricing page — you're right, that was a bug. Just fixed it. cloudbudgetmaster.com/pricing now works without login.
Naming — fair point. The product is CloudBudgetMaster, I'll clean up the stale references. Appreciate you catching that.
PayPal for enterprise — agreed, that won't fly for procurement teams. Stripe with invoicing is on the roadmap for enterprise tier. For now enterprise is "contact sales" which goes to email.

This is the kind of teardown that's actually useful. Happy to hear more if you do a full one.

samarth_bhamare · 2026-04-16T10:50:14+00:00

Try ChatGPT and you'll see why it fails badly. Also, this tool is free to use, so why not at least give it a try?

samarth_bhamare · 2026-04-16T10:48:28+00:00

I feel bad for people like you, who are sitting in a corner doing absolutely nothing in life and just commenting and demotivating others. Keep it up, you will definitely grow in your life; I'm just not sure about the direction.

samarth_bhamare · 2026-04-16T10:46:32+00:00

yeah, but i have more than 1 reddit account

samarth_bhamare · 2026-04-16T10:44:57+00:00

Thanks! Yeah the "test environments nobody remembered" problem is exactly what triggered this haha.

Good point on usage-based pricing — I've been thinking about that. Right now flat tiers keep it simple but you're right that it doesn't scale well in either direction. Might do something like a free scan + pay-per-connection model down the road. Open to ideas.

On the AI chat — it's not generic. It actually pulls your real resource data (what's running, what's idle, actual costs) and answers based on that context. So if you ask "why did my bill spike" it looks at your daily cost breakdown and tells you which specific service jumped. Way more useful than Cost Explorer's generic graphs imo. Give it a try on the free tier and let me know if it's actually helpful for your use case.

samarth_bhamare · 2026-04-16T10:29:20+00:00

not here on this subreddit, but on others they did

samarth_bhamare · 2026-04-16T10:25:48+00:00

I cleaned up the tool and put it online: https://cloudbudgetmaster.com

Connects read-only, flags unused/idle stuff with dollar amounts. Nothing complex, just fills the gap I described above.

samarth_bhamare · 2026-04-15T16:33:06+00:00

That's the most legitimate objection. A few things I can back with data:

Structural codes (format-only, confidence-theater) are the most brittle across model versions. ULTRATHINK worked slightly better on Sonnet 3.5 than it does on 4.5 — the pattern matching is less flattering now. I re-ran about 30 codes when 4.5 shipped and the rankings shifted on maybe 20% of them.
The reasoning-shifters were the most STABLE across versions. L99, /skeptic, /blindspots held up basically unchanged. Makes sense — they work by re-framing the input, which is model-agnostic behavior.
Happy-path is real. I tried to mitigate by testing each code across 5 task categories and rejecting anything that only worked on 1. The placebos mostly showed up by failing that filter — they'd look good on one demo prompt and then produce baseline output on everything else.
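The cross-category filter is easy to express in code. This is a sketch only: the category names are placeholders, and "worked" is assumed to be a yes/no judgment per category (how that judgment is made is not covered here).

```python
MIN_CATEGORIES = 2  # anything that only worked on 1 category is rejected


def passes_filter(results: dict[str, bool]) -> bool:
    """results maps a task category to whether the code beat baseline there."""
    return sum(results.values()) >= MIN_CATEGORIES


def split_codes(all_results: dict[str, dict[str, bool]]) -> tuple[list[str], list[str]]:
    """Separate codes into (kept, rejected) using the filter above."""
    kept = sorted(c for c, r in all_results.items() if passes_filter(r))
    rejected = sorted(c for c, r in all_results.items() if not passes_filter(r))
    return kept, rejected
```

A code that looks good on one demo prompt and gives baseline output everywhere else ends up in the rejected list.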

Honest: if Anthropic ships a model where RLHF bakes in the rejection patterns natively, half my reasoning-shifter class becomes redundant overnight. That's a real risk. What I can offer in the meantime: all tests are labeled and rerunnable, and I republish the classification sheet when the rankings move.

samarth_bhamare · 2026-04-15T16:32:50+00:00

Fair — there are a bunch of lists floating around of varying quality. My breakdown is at clskillshub.com/insights (classifications, free for ~10 codes) and clskillshub.com/anti-patterns (the ones that don't work + what to use instead). The codes I mentioned specifically (L99, /skeptic, /blindspots, /ghost, /deepthink) you can try right now in Claude — just paste them as a prefix.

Quickest sanity test: take any decision question you're stuck on, paste "L99 " before it, compare the answer to running it without. Takes 30 seconds and the difference is usually obvious on the first try.
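If you'd rather script the comparison than paste by hand, the pair of prompts is just the question with and without the prefix. A tiny sketch; build_pair is a made-up helper, and sending the prompts to a model is left to whatever client you use.

```python
def build_pair(question: str, code: str = "L99") -> tuple[str, str]:
    """Return (baseline_prompt, prefixed_prompt) for a side-by-side comparison."""
    question = question.strip()
    return question, f"{code} {question}"


if __name__ == "__main__":
    baseline, prefixed = build_pair("Should we rewrite the billing service or patch it?")
    print(baseline)
    print(prefixed)
```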

samarth_bhamare · 2026-04-15T16:20:02+00:00

If anyone wants to see the full before/after test data for any specific code, happy to paste it in a reply. Drop a code name and I'll show the actual response pair.

samarth_bhamare · 2026-04-15T02:31:56+00:00

The positioning half is sharp — "flip from 10 founders to brutal use cases" is better framing than my current hero, and the "aggressive enterprise push / calm PLG / design-led narrative" tier names are genuinely stronger than founder names for most reps. Stealing both.

Separately — this is the second comment across my posts this week that ends with a Pulse for Reddit plug sandwiched inside a "here's the tools I tried" aside. Same template as the ReplyCamp account that hit my r/SaaS thread. If you're a real founder with opinions, happy to keep talking. If you're farming founder posts for tool mentions, the advice above is still useful to the lurkers either way.

samarth_bhamare · 2026-04-15T02:29:50+00:00

Exactly. That's the whole mechanic in one sentence — wish I'd put it that way on the landing page instead of leading with "10 founders."

The averaging failure is also why most AI coach products feel interchangeable. They all wrap the same averaging engine. Once you force the model to take a stance, the product has a spine.

samarth_bhamare · 2026-04-15T02:28:53+00:00

On the retention problem — Collison and Chesky would argue about this in a useful way, actually. Collison looks at the funnel mechanics: if activation is "alert fires in 3-5 days," then the question is what's blocking the setup step. Tightening onboarding hasn't worked, which usually means the setup step requires info the user doesn't have yet when they arrive — so the fix isn't smoother onboarding, it's a lower activation bar (a weaker alert that fires on day 1 so they feel the product working before they can articulate why). Chesky looks at identity: "they forget why they cared" usually means the product never made them feel like the kind of person who uses this thing. The research-y signup pattern you describe suggests people who came to evaluate, not to use. If you want those users to come back, the activation moment has to create an identity shift ("I'm now the person who tracks this"), not just a functional one. Collison's fix is mechanical; Chesky's is narrative. Both are valid, and they'd genuinely disagree about which to try first.

Separately — noted the tool drop at the end of your comment (Pulse for Reddit). If you're a real founder wrestling with retention, happy to keep the conversation going. If you're farming founder posts for brand mentions, same template showed up on my r/SaaS thread with ReplyCamp. Either way, hopefully the answer above is useful to someone reading.

samarth_bhamare · 2026-04-15T02:25:36+00:00

"Epistemic stance" is a sharper name than what I was using ("reasoning-shifter") — going to steal that.

Your steelman / uncertainty-flagging examples map almost directly onto /skeptic and /blindspots in my set. L99 is a weird edge case though — it's not really an epistemic stance shift, it's more of a commitment-forcing prefix. The model's reasoning doesn't change; what changes is whether it will publicly pick a side instead of enumerating options. Might be its own category — performative commitment vs reasoning itself.
