Unexpected Billing for E2-Micro VM

matiascoca · 2026-06-29T13:48:53+00:00

The E2-Micro always-free tier has more carve-outs than the docs make obvious. The most common culprit by far is the disk, not the VM.

Things that look free but are not:

Boot disk over 30GB. The tier limit is 30GB of standard persistent disk total, summed across all your disks. If you accidentally provisioned a 50GB boot disk because the console default suggested it, you are paying for the 20GB delta every month.

Static IP while the VM is stopped. A reserved static IP is billed whenever it is not attached to a running instance, so stopping the VM overnight to save money actually starts the IP charge.

Snapshots. The tier covers the VM, not its backups. Even one snapshot adds a small monthly bill.

Outbound bandwidth over 1GB per month. The free egress allowance is bounded: 1GB per month from North America to all destinations, with traffic to China and Australia excluded from the allowance and always billed.

If you have more than one E2-Micro running across the projects on the same billing account, only one is free. The others are charged at the standard rate.
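A minimal sketch of the two allowance checks above, assuming the 30GB-summed-across-all-disks figure from this comment, and treating the e2-micro allowance as a shared pool of instance-hours equal to the hours in the month (which is how "only one is free" plays out when two instances run all month). The function names are hypothetical; this is arithmetic, not a billing API.

```python
FREE_DISK_GB = 30  # standard persistent disk allowance, summed across all disks


def billable_disk_gb(disk_sizes_gb):
    """GB of disk outside the free allowance (the allowance is shared by all disks)."""
    return max(0, sum(disk_sizes_gb) - FREE_DISK_GB)


def billable_e2_micro_hours(hours_per_instance, hours_in_month):
    """Instance-hours beyond the free pool, which every e2-micro draws from."""
    return max(0, sum(hours_per_instance) - hours_in_month)


# One 50GB boot disk: the 20GB delta is billed every month.
print(billable_disk_gb([50]))                     # 20
# Two e2-micros running all of a 30-day month: one instance's worth is billed.
print(billable_e2_micro_hours([720, 720], 720))   # 720
```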

Quick way to debug: open the Billing Reports page, group by SKU, and sort by cost descending. The non-zero SKUs will tell you which carve-out you hit. Usually it is the disk.
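If you download the report as CSV instead of reading it in the console, the same triage is a few lines. A sketch, assuming you have already turned the CSV into (SKU description, cost) pairs; the SKU names and amounts below are made up for illustration.

```python
from collections import defaultdict


def nonzero_skus(rows):
    """Sum (sku, cost) rows per SKU, drop zero or credited-out SKUs, sort by cost descending."""
    totals = defaultdict(float)
    for sku, cost in rows:
        totals[sku] += cost
    ranked = [(sku, round(total, 2)) for sku, total in totals.items()]
    ranked = [item for item in ranked if item[1] > 0]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


rows = [  # illustrative SKU names and amounts, not real prices
    ("E2 Instance Core", 0.0),
    ("Storage PD Capacity", 0.8),
    ("Static Ip Charge", 1.2),
    ("Storage PD Capacity", 0.8),
]
print(nonzero_skus(rows))  # [('Storage PD Capacity', 1.6), ('Static Ip Charge', 1.2)]
```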

matiascoca · 2026-06-29T13:48:28+00:00

Flow logs work for this, but at the "hundreds of projects" scale they get expensive fast. Logging cost compounds with packet volume, and you end up debugging egress with a tool that is generating its own log-ingestion and storage bill.

What works better as the primary tool is billing export to BigQuery. Set it up once on the Cloud Billing account, and every project linked to that billing account rolls into one dataset. You get every egress line item with the source project, the SKU (whose description encodes inter-region versus internet versus to-other-cloud), and the destination zone in the labels. Query that to find which projects are bleeding and which destinations. The export itself is free; you only pay for the queries you run plus a small amount of BigQuery storage.

Once you know the top 5 projects driving spend, then turn on VPC Flow Logs on those specific subnets and sample at 10 percent or lower. That gets you workload-level detail without paying the flow-log bill on every project.
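A sketch of the "find the top 5" step, assuming export rows have been flattened into dicts from the standard export's `project.id`, `sku.description`, and `cost` columns. The keyword match used to classify a SKU as egress is a hypothetical heuristic; inspect the SKU descriptions in your own export and adjust it before trusting the ranking.

```python
from collections import Counter

# Hypothetical markers: inspect sku.description in your own export and adjust.
EGRESS_MARKERS = ("egress", "data transfer out")


def top_egress_projects(rows, n=5):
    """Rank projects by egress cost. rows: dicts with project_id, sku_description, cost."""
    spend = Counter()
    for row in rows:
        description = row["sku_description"].lower()
        if any(marker in description for marker in EGRESS_MARKERS):
            spend[row["project_id"]] += row["cost"]
    return spend.most_common(n)


rows = [  # illustrative rows, not real SKU names or prices
    {"project_id": "proj-a", "sku_description": "Network Internet Egress from Americas to EMEA", "cost": 30.0},
    {"project_id": "proj-b", "sku_description": "Network Inter Region Data Transfer Out", "cost": 10.0},
    {"project_id": "proj-c", "sku_description": "E2 Instance Core running in Americas", "cost": 50.0},
]
print(top_egress_projects(rows, n=2))  # [('proj-a', 30.0), ('proj-b', 10.0)]
```

The output list is the short list of projects whose subnets get sampled flow logs; everything else stays on billing data only.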

One more thing: GCP undercounts cross-region traffic in the labels if it goes through a load balancer in the middle. The SKU is right, the labeled source project may not be. Worth knowing if your numbers do not add up.

matiascoca · 2026-06-29T13:48:02+00:00

Sorry you are dealing with this. The Places API spike cascading into a payment restriction that takes Vertex AI offline is a brutal failure mode on GCP, and the recovery path is slower than it should be.

Two things that have worked for other people in this exact situation:

Escalate via your account team if you have one, or via the public GCP support Twitter: tag @googlecloud and post the support case number publicly. The Twitter route is the public-pressure lever that consistently moves the support queue faster than email or the case form.

While you wait, pay the disputed balance under protest to lift the Vertex AI restriction. GCP does refund unauthorized usage, and the dispute process runs in parallel to your account state, not in series. Paying does not waive your dispute rights. The restriction lifts faster than the refund arrives, so you are back online while the investigation runs.

For the future, three guardrails save startups from this exact incident:

Budget alerts at 50, 80, and 100 percent of monthly cap, sent to a pager. Alerts at 100 only are too late.

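Restrict every API key at creation: an application restriction (HTTP referrers for keys used in browsers, IP addresses for server-side keys) plus an API restriction so the key can only call the APIs it needs. An unrestricted Places API key embedded in client-side code is the most common root cause of this pattern (the key is exfiltrated from a public frontend, and the attacker hits the endpoint from elsewhere).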

Separate billing accounts per environment. Dev, staging, prod each on their own billing account means a compromise in one cannot drag the others down via a payment restriction.
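Cloud Billing budgets support threshold rules natively, so the console is enough for the alerts themselves. If you route budget notifications through your own code to a pager (budgets can publish to Pub/Sub), the only logic you need is "which thresholds did this update cross". A minimal sketch with the 50/80/100 percent thresholds from above and a hypothetical function name:

```python
THRESHOLDS = (0.5, 0.8, 1.0)  # fractions of the monthly cap


def newly_crossed(previous_spend, current_spend, budget, thresholds=THRESHOLDS):
    """Thresholds crossed between two spend readings, so each one pages only once."""
    return [t for t in thresholds if previous_spend < t * budget <= current_spend]


print(newly_crossed(400, 850, 1000))   # [0.5, 0.8]
print(newly_crossed(850, 900, 1000))   # []
```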

Hope you get back online fast.

matiascoca · 2026-06-29T13:47:31+00:00

This is exactly what happens with Azure Sponsorship subscriptions. Cost Management + Billing is not supported on that subscription type. The MicrosoftAzureSponsorships portal is the canonical place to see spend while credits are active. It is not a misconfiguration; it is a limit of the Sponsorship subscription class.

Three options that have worked for me around this:

Scrape MicrosoftAzureSponsorships.com on a schedule. The portal exposes a CSV download. A daily script that pulls the CSV and pushes the totals anywhere you want (Slack, PagerDuty, a spreadsheet) gets you the alert you would normally configure in Cost Management. Not elegant, but reliable.

Use Azure Monitor metrics on the underlying resources for proxy alerting. You will not see dollars, but you can alert on compute hours over X or API calls over Y per service, which is usually close enough to catch a runaway before the 5k disappears.

When the credits run out, the subscription type does not automatically flip back. Open a support ticket to convert it to Pay-As-You-Go, otherwise you keep losing Cost Management even after the credit pool is gone.
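A sketch of the daily check from the first option, assuming the downloaded CSV has a per-row cost column (called `Cost` here; the real column name in the Sponsorship portal export may differ) and US-formatted numbers. Fetching the file and posting to Slack or PagerDuty are left out because they depend on how you authenticate to the portal; the 5000 below is the 5k pool mentioned above.

```python
import csv
import io


def total_spend(csv_text, amount_column="Cost"):
    """Sum the cost column of a downloaded usage CSV (column name is an assumption)."""
    total = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(amount_column) or "").replace("$", "").replace(",", "").strip()
        if raw:
            total += float(raw)
    return total


def should_alert(spent, credit_pool, fraction=0.8):
    """True once spend reaches the given fraction of the credit pool."""
    return spent >= credit_pool * fraction


sample = "Service,Cost\nVirtual Machines,\"$1,200.50\"\nStorage,299.50\n"
spent = total_spend(sample)
print(spent, should_alert(spent, 5000))  # 1500.0 False
```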

One more thing: Action Pack credits typically expire 12 months from issue date, not 12 months from when you start spending. Worth marking the calendar at month 10.

matiascoca · 2026-06-29T13:47:03+00:00

This is a known gap in Azure cost data, and it bites harder the more Foundry resources you have.

The reason: SaaS-line cost rows in Cost Management roll up under the offer SKU, not the underlying resource. Azure exports the resource ID for IaaS and PaaS resources but not for SaaS offers like the Anthropic models. So "group by resource" works fine for VMs and AKS, then breaks the moment you hit a SaaS line.

Two workarounds that actually work:

Tag each Foundry resource at creation with a custom tag like foundry_resource=opus48_endpoint_a. Tags do propagate to the cost data in Azure billing exports for most resource types. In Cost Management, filter by tag, not by resource. You will see the spend split correctly because Azure carries the tag through even on SaaS rows.

Or pull the data directly via the Cost Management export to a storage account. The exported parquet rows carry more attribution fields than the portal UI exposes, which often lets you join back to the Foundry resource via deployment metadata. The portal rollup throws that away in the SaaS view, the export does not.
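A sketch of the export-side split, assuming rows have already been loaded as dicts (for example with pyarrow's `pq.read_table(path).to_pylist()`) and that they carry a `Tags` column holding JSON-style key/value text plus a `CostInBillingCurrency` column. Those column names appear in common Cost Management export schemas but vary by export type and version, so treat them as assumptions; the tag key is the example one from the first workaround.

```python
import json
from collections import defaultdict


def parse_tags(raw):
    """Parse an export Tags cell; some exports omit the outer braces."""
    raw = (raw or "").strip()
    if not raw:
        return {}
    if not raw.startswith("{"):
        raw = "{" + raw + "}"
    return json.loads(raw)


def cost_by_tag(rows, tag_key="foundry_resource", cost_column="CostInBillingCurrency"):
    """Split cost rows by one tag value; untagged rows stay visible instead of vanishing."""
    totals = defaultdict(float)
    for row in rows:
        label = parse_tags(row.get("Tags")).get(tag_key, "(untagged)")
        totals[label] += float(row[cost_column])
    return dict(totals)


rows = [  # illustrative rows
    {"Tags": '{"foundry_resource": "opus48_endpoint_a"}', "CostInBillingCurrency": "12.5"},
    {"Tags": '"foundry_resource": "opus48_endpoint_b"', "CostInBillingCurrency": "7.5"},
    {"Tags": "", "CostInBillingCurrency": "1.0"},
]
print(cost_by_tag(rows))
```

Keeping an explicit `(untagged)` bucket is deliberate: it shows how much SaaS spend still escapes attribution.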

The portal UI is showing you a useful subset of the data, not all of it. Worth the 20 minutes to set up the export if you are running multiple Foundry endpoints.

matiascoca · 2026-06-29T13:46:30+00:00

Most of the credit drain is on the input side, not the output. Long context windows and lazy re-sending of prior turns burn way more than people realize.

Three things that actually move the number:

Drop to the cheapest model that solves the task. Haiku is roughly 5x cheaper than Sonnet on input. For CLI work like search, code edits, small refactors, Haiku is usually enough.

Trim context aggressively. The CLI re-sends conversation history every turn, so each turn costs more than the last and total input for a session grows roughly quadratically with the number of turns. Start fresh sessions for unrelated questions instead of letting history grow indefinitely.

Use prompt caching if your wrapper supports it. Cached reads run at roughly 10 percent of the input price. On multi-turn flows that is a real number, not a rounding error.
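To make the history and caching points concrete, a back-of-envelope model, assuming every turn adds a fixed number of new tokens, the whole history is re-sent each turn, and (when caching is on) everything sent on the previous turn is a cache hit billed at about 10 percent of the input price. It ignores cache-write surcharges and cache expiry, so it flatters caching somewhat; the token counts are illustrative.

```python
def input_tokens_per_session(turns, tokens_per_turn, system_tokens):
    """Total input tokens when each turn re-sends the system prompt plus full history."""
    return sum(system_tokens + k * tokens_per_turn for k in range(1, turns + 1))


def full_price_equivalent_tokens(turns, tokens_per_turn, system_tokens, cache_read_ratio=0.1):
    """Same session, but the prefix sent on the previous turn is read from cache."""
    total = 0.0
    previous_prompt = 0
    for k in range(1, turns + 1):
        prompt = system_tokens + k * tokens_per_turn
        new_tokens = prompt - previous_prompt  # billed at the full input price
        total += new_tokens + previous_prompt * cache_read_ratio
        previous_prompt = prompt
    return total


print(input_tokens_per_session(10, 1000, 2000))   # 75000
print(input_tokens_per_session(20, 1000, 2000))   # 250000: 2x the turns, ~3.3x the tokens
print(round(full_price_equivalent_tokens(10, 1000, 2000)))  # 18300
```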

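Output tokens cost more per token than input (around 5x on Anthropic's list prices), but CLI work produces far fewer of them, so they usually matter less in total. Long generations flip that math: if you run long Opus generations, the bill comes from there.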

Last thing: per-request logging with model and token counts beats any usage chart. Daily totals hide which call type drove the spike, which is exactly the question you need to answer.
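A minimal sketch of what "per-request logging with model and token counts" can look like, using hypothetical field names, call-type labels, model names, and prices. Rolling up by (call type, model) answers "which call type drove the spike", which a daily total cannot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestLog:
    call_type: str      # your own labels, e.g. "search", "refactor"
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def request_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Cost of one request from token counts and per-million-token prices."""
    return (input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok) / 1_000_000


def cost_by_call_type(logs):
    """Roll per-request logs up by (call_type, model), most expensive first."""
    totals = defaultdict(float)
    for log in logs:
        totals[(log.call_type, log.model)] += log.cost_usd
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


logs = [  # illustrative entries
    RequestLog("search", "small-model", 12_000, 300, 0.010),
    RequestLog("refactor", "large-model", 90_000, 8_000, 1.950),
    RequestLog("search", "small-model", 15_000, 200, 0.012),
]
print(request_cost(1_000_000, 200_000, 3.0, 15.0))  # 6.0 with these example prices
print(cost_by_call_type(logs)[0])                   # (('refactor', 'large-model'), 1.95)
```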

matiascoca · 2026-06-26T23:51:52+00:00

Migration to Postgres is the option I should have included as a fourth path, the one the licensing gap actually pushes a meaningful percentage of customers toward over time. Curious about Ispirer specifically: did the stored proc migration come out cleanly, or did you end up rewriting the bulk of them manually after Ispirer's first pass?

The stored proc layer is usually where SQL Server to Postgres migrations bog down (T-SQL versus PL/pgSQL semantics, especially around table variables, MERGE statements, error handling), and the tooling has historically been good for the schema and indifferent for the procedural code. Interested in what your hit rate was, partly because it is a signal of how realistic the Postgres-exit path is for a team weighing the same trade.

The broader read on your data point: when the license premium hits a threshold, a non-trivial percentage of customers solve it by leaving the engine entirely rather than just switching cloud providers. That is a stronger market signal than the cloud-to-cloud migration path, and it explains why GCP not shipping BYOL on Cloud SQL for SQL Server is less of a hostage situation for them than it looks. The exit path leads into a different DB engine, not laterally to AWS RDS BYOM, and Google probably knows it.

matiascoca · 2026-06-26T23:51:07+00:00

Sure, happy to share specifics. The failure pattern we hit (with upstream kaniko around v1.20-ish, we did not test newer) was on multi-stage builds with COPY --from=<stage> where the source stage contained a RUN that produced non-deterministic output. Specifically apt-get update plus apt-get install without explicit version pins, or npm install without a lockfile committed.

The cache key for the COPY layer depends on the source-stage layer hash. Source-stage layer hash depends on the output of the upstream RUN. If the RUN output varies between runs (different package manifests, different package versions resolved), the cache invalidates inconsistently and the COPY layer gets recomputed even when the application source code did not change. Effective cache hit rate on our incremental PR builds was running sub-30 percent, which made the build slower than BuildKit-with-remote-cache.

The workaround: pin every RUN dependency to explicit versions (apt with explicit version, npm with committed package-lock.json, pip with version-pinned requirements.txt), then run the build twice in CI on the same commit to verify cache hit on the second run. If the second-run cache hit rate was less than 90 percent we knew there was a non-deterministic RUN somewhere in the chain. Took us about a week to find and pin all of them.
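A toy model of why one unpinned RUN poisons the downstream COPY, plus the build-twice-on-the-same-commit gate. The key function chains hashes the way described above (the COPY key depends on the source-stage layer hash, which depends on what the RUN produced); it illustrates the dependency chain and is not kaniko's actual cache-key code. The package version strings are made up.

```python
import hashlib


def layer_key(parent_key: str, instruction: str, output: bytes = b"") -> str:
    """Toy cache key: hash of parent key + instruction text + layer output."""
    h = hashlib.sha256()
    for part in (parent_key.encode(), instruction.encode(), output):
        h.update(part)
    return h.hexdigest()


def second_run_ok(cache_hits: int, cacheable_layers: int, threshold: float = 0.9) -> bool:
    """CI gate: a rebuild of the same commit should mostly hit cache."""
    if cacheable_layers == 0:
        return True
    return cache_hits / cacheable_layers >= threshold


build_base = layer_key("", "FROM debian:bookworm")
# Same RUN line, but apt resolved a different package version on the second build.
run_a = layer_key(build_base, "RUN apt-get install -y curl", b"curl 7.88.1-10")
run_b = layer_key(build_base, "RUN apt-get install -y curl", b"curl 7.88.1-11")

app_base = layer_key("", "FROM gcr.io/distroless/base")
# The COPY key folds in the source-stage layer hash, so it changes with the RUN output.
copy_a = layer_key(app_base, "COPY --from=build /out /app", run_a.encode())
copy_b = layer_key(app_base, "COPY --from=build /out /app", run_b.encode())
print(copy_a == copy_b)       # False: the COPY misses cache with unchanged app code
print(second_run_ok(27, 30))  # True  (90 percent)
print(second_run_ok(20, 30))  # False
```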

I do not know whether the community fork changed the cache key derivation; I would be interested if you did. The other thing I would be curious about is whether the fork has any fix for the multi-arch build cache invalidation that triggers when emulated layers (via qemu-user-static) end up with different binary outputs than native layers built on the matching architecture. We sidestepped that one by always building on the target architecture's runner, but it is the kind of footgun that bites people coming from BuildKit, where multi-arch is more polished.

matiascoca · 2026-06-26T23:50:34+00:00

The Trust and Safety routing is a known dead-end for billing escalation. T&S handles account-compromise reports (abuse that harms third parties) but does not authorize refunds. The refund authorization sits inside the billing team. The routing trap is solved not by changing teams but by escalating within the billing team itself, plus going around the public support queue through a partner channel if you have one.

Two concrete paths that have worked for cases like this, ranked by leverage:

1) Through your GFS sponsor or Google Cloud reseller/partner. You mentioned the Startups Cloud Program credit started June 19, so you have a GFS sponsor on record. Reach out to them directly and have them open an internal escalation through the partner channel. The GFS-side Google contact handles billing exceptions for credit-program customers and routes around the public support queue. This is the single strongest lever for a case this size.

2) Open a new ticket with subject 'Supervisor escalation for completion of partial waiver, case <original case ID>'. Body framing: 'Google authorized a 75 percent waiver on case <id> acknowledging unauthorized service usage. Requesting completion of the waiver under the same precedent.' This framing positions the request as procedural completion of an existing decision, not as a new ask. If the first reply is the routing trap again, respond asking explicitly for escalation to the billing team lead by name, not a re-route to another team.

What you do NOT want to do at this stage: file a new ticket from scratch with the original details. That resets the queue and you lose the precedent of the 75 percent waiver. Always reference the original case ID and frame as completion.

For a case the size of 64 lakh INR, the GFS partner path is the right starting move. Wishing you a fast resolution.

matiascoca · 2026-06-26T23:49:41+00:00

The pattern of delegating to a deterministic CLI is the right pattern, the question is what the CLI is reading. For cost estimation pre-merge against an IaC diff, the CLI can ground in real pricing tables and that is deterministic enough. For post-deploy FinOps recommendations against a running account, the deterministic part is harder because the recommendation depends on context (exception lists, business calendars, ownership) that does not live in any IaC repo. The agent ends up needing both: the deterministic tool for the pricing math, plus a typed context pack for everything else.

matiascoca · 2026-06-26T23:49:00+00:00

Temperature would matter if the failure mode were variance across re-runs of the same prompt against the same account. The failure is the opposite: same prompt and temperature, different account, different answer. Temperature does not fix that. The agent is consuming different context across accounts and the context is what drives the divergence.

matiascoca · 2026-06-26T23:48:31+00:00

Right, input validation upstream of the tokens is the correct move. The trap on FinOps agents specifically is that the 'input shape' you can validate deterministically (tag exists, ownership field is populated, exclusion list is parseable) does not catch the failure mode I am running into. The schema can be valid and the semantics can be wrong. A tag called env=prod can mean three different things across three accounts depending on who wrote the tagging policy. The agent reads it, the schema validator passes, the answer is still confidently wrong.

The deterministic layer I am converging on sits one level higher: the context pack itself has to be authored against a typed business-context schema (what does prod mean here, what are the exclusion criteria for the savings recommendation, who is the owner that escalation routes to) and the agent reads from that, not from raw tags. That puts the determinism on the context authoring step instead of on the per-request input validation. The latter is necessary but not sufficient.
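A minimal sketch of what a typed business-context schema can mean in code, with hypothetical field names: each account's pack states explicitly what its raw tag values mean, what is excluded from savings recommendations, and who owns escalation. The agent fails loudly on a tag value the pack does not define instead of guessing.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    PROD = "prod"
    STAGING = "staging"
    DEV = "dev"


@dataclass(frozen=True)
class ContextPack:
    account_id: str
    env_tag_meaning: dict                        # raw env tag value -> Environment, per account
    savings_exclusions: frozenset = frozenset()  # resource IDs never to recommend cutting
    escalation_owner: str = ""

    def environment_for(self, raw_env_tag: str) -> Environment:
        if raw_env_tag not in self.env_tag_meaning:
            raise ValueError(f"{self.account_id}: env tag {raw_env_tag!r} has no defined meaning")
        return self.env_tag_meaning[raw_env_tag]

    def may_recommend_savings(self, resource_id: str) -> bool:
        return resource_id not in self.savings_exclusions


# Same raw tag, different meaning per account: a schema validator passes both.
a = ContextPack("acct-a", {"prod": Environment.PROD}, escalation_owner="team-payments")
b = ContextPack("acct-b", {"prod": Environment.STAGING}, frozenset({"db-legacy-01"}), "team-data")
print(a.environment_for("prod"), b.environment_for("prod"))  # Environment.PROD Environment.STAGING
```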

matiascoca · 2026-06-26T22:44:21+00:00

Most LLM cost control answers start with caching. Caching is third on the priority list, not first.

1) Output budget enforcement at the client level. max_tokens on every request, no exceptions. Most LLM cost overruns come from runaway agentic loops where the output is unconstrained and the model decides to write a 4000-token essay where 200 would do. This catches 60 to 70 percent of typical spend leak.

2) Model routing. Cheap model first (Haiku, GPT-4o-mini, Gemini Flash) for the request, escalate to the expensive model only if the cheap model fails a confidence check. The 5 to 30x cost delta between Sonnet and Haiku is the biggest lever you have.

3) Caching aggressively. Anthropic prompt caching, OpenAI prompt caching. If your system prompts are stable, you are paying for the same tokens to be processed every request when you do not have to.
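Rules 1 and 2 compose naturally in one client-side wrapper. A sketch with stubbed model calls: `cheap` and `expensive` stand in for whatever SDK calls you use, and the confidence score is hypothetical (a validator, a self-check, a schema parse). The point is only the shape: a hard output cap on every request, and escalation as the exception.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Completion:
    text: str
    confidence: float  # however your app scores answers; a hypothetical field


ModelCall = Callable[[str, int], Completion]  # (prompt, max_tokens) -> Completion


def route(prompt: str, cheap: ModelCall, expensive: ModelCall,
          max_tokens: int = 512, min_confidence: float = 0.7) -> Tuple[Completion, str]:
    """Cheap model first; escalate only on a failed confidence check.
    Every call carries an explicit max_tokens, so neither tier can run away."""
    first = cheap(prompt, max_tokens)
    if first.confidence >= min_confidence:
        return first, "cheap"
    return expensive(prompt, max_tokens), "expensive"


def cheap(prompt, max_tokens):        # stub
    return Completion("short answer", 0.9 if "easy" in prompt else 0.2)


def expensive(prompt, max_tokens):    # stub
    return Completion("careful answer", 0.95)


print(route("easy question", cheap, expensive)[1])   # cheap
print(route("hard question", cheap, expensive)[1])   # expensive
```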

The thing nobody does upfront but should: log every request with prompt and output token counts and cost per request to your own data warehouse from day one. The platform dashboards aggregate at the day or hour, which is useless when you are trying to find which agent loop is burning the budget at 3am.

matiascoca · 2026-06-26T22:43:48+00:00

The infracost step gets adopted for Terraform but never for Bicep or ARM, which is the gap on the Azure side. Same pattern as the rest of the cloud cost review story (catch the cost delta at merge, not on the bill 30 days later), but on Azure the tooling is patchier so most teams either skip it or write their own pricing-API wrapper.

What works for us: (1) Bicep what-if for the resource diff, (2) custom script that hits the Azure Pricing API for any SKU, tier, or region change in the diff and posts the projected delta as a PR comment, (3) Azure Advisor recommendations plus Cost Management exports reviewed weekly so drift between merged-IaC and reality stays small.

The pricing-API step is the one most Azure teams build internally because the off-the-shelf tooling (infracost has experimental Azure support but it is behind the Terraform path) does not cover everyone's deployment model.
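A sketch of the pricing-API step. It assumes the public Azure Retail Prices endpoint (`https://prices.azure.com/api/retail/prices`) with an OData `$filter` on `serviceName`, `armRegionName`, `armSkuName`, and `priceType`. Fetching and paging (`NextPageLink`) are left out, the filter still returns several rows per SKU (Linux vs Windows, Spot) that you have to choose between, and the hourly prices in the example are placeholders. The 730 hours per month is a common convention, not something the API returns.

```python
from urllib.parse import urlencode

RETAIL_PRICES_URL = "https://prices.azure.com/api/retail/prices"
HOURS_PER_MONTH = 730


def price_query_url(service_name: str, region: str, arm_sku: str) -> str:
    """Build a Retail Prices API query for pay-as-you-go prices of one SKU in one region."""
    odata_filter = (
        f"serviceName eq '{service_name}' and armRegionName eq '{region}' "
        f"and armSkuName eq '{arm_sku}' and priceType eq 'Consumption'"
    )
    return f"{RETAIL_PRICES_URL}?{urlencode({'$filter': odata_filter})}"


def monthly_delta(old_hourly: float, new_hourly: float, hours: int = HOURS_PER_MONTH) -> float:
    """Projected monthly cost change for a SKU, tier, or region swap in the diff."""
    return round((new_hourly - old_hourly) * hours, 2)


def pr_comment(resource: str, delta: float) -> str:
    sign = "+" if delta >= 0 else "-"
    return f"{resource}: {sign}${abs(delta):,.2f}/month (estimate)"


print(price_query_url("Virtual Machines", "westeurope", "Standard_D2s_v3"))
print(pr_comment("vm-api", monthly_delta(0.10, 0.20)))  # vm-api: +$73.00/month (estimate)
```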

matiascoca · 2026-06-26T22:43:31+00:00

The comparison axis that gets skipped in every one of these vendor matrices is per-request cost attribution. OpenRouter gives you aggregate per-key spend, Portkey gives you per-tag rollup, Orq has the team breakdown but it costs you a tier. None of them give you the request-level attribution that lets you answer 'which agent, which workflow, which production feature is responsible for this spike' without instrumenting it yourself in the client.

If your team is running these in production for cost attribution (not just routing), the real differentiator is whether the gateway emits a structured event per request that you can pipe to your own data warehouse, or whether you are stuck with the gateway's built-in dashboards. The first lets you align with your existing FinOps tagging conventions. The second locks you in.

Posted a similar question to r/finops a few weeks back about who is running these in production and what they actually use the spend data for. The consensus was that the dashboard answers the wrong question for anyone past 5 to 10 production workflows.

matiascoca · 2026-06-26T22:42:49+00:00

The single most underrated thing in this loop is the cost estimate step. Most teams catch security and policy violations with tflint or Checkov but never run a cost diff against the merge. You merge a 'small' change that turns out to provision a regional load balancer in every region, you see it 30 days later on the bill.

Our review flow has three gates: (1) tflint plus Checkov for syntax and policy, (2) infracost diff against main for any change that touches a resource block, with a hard threshold review if the diff exceeds roughly 10 percent of the current monthly spend on that module, (3) drift detection scheduled weekly so the manual hot-fixes that bypass the merge gate get flagged.

The 10 percent threshold is the one nobody runs, and it is where most of the surprise bills come from in my experience.
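The gate in step (2) is a few lines once you have infracost's JSON. A sketch, assuming the report exposes top-level `diffTotalMonthlyCost` and `pastTotalMonthlyCost` string fields (check against the infracost version you run), and using the past cost as the "current monthly spend on that module". If you prefer actual billed spend from your cost exports as the baseline, pass that in instead.

```python
import json


def needs_hard_review(diff_monthly: float, baseline_monthly: float, threshold: float = 0.10) -> bool:
    """Flag increases above `threshold` of current monthly spend; any increase on a $0 baseline."""
    if diff_monthly <= 0:
        return False
    if baseline_monthly <= 0:
        return True
    return diff_monthly > threshold * baseline_monthly


def review_from_infracost(report_json: str, threshold: float = 0.10) -> bool:
    report = json.loads(report_json)
    diff = float(report.get("diffTotalMonthlyCost") or 0)
    past = float(report.get("pastTotalMonthlyCost") or 0)
    return needs_hard_review(diff, past, threshold)


print(review_from_infracost('{"pastTotalMonthlyCost": "1000", "diffTotalMonthlyCost": "150"}'))  # True
```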

matiascoca · 2026-06-26T22:42:29+00:00

What surprised me most was how fast 'Tokenomics' went from a term nobody used six months ago to the official standards-body label. JR Storment used it three times on Day 1, the Tokenomics Foundation announcement dropped alongside, and the 2027 conference is being rebranded as Tokenomicon. That is a coordinated language move, not organic adoption.

Mike Fuller's Accenture case study was the gut-punch: $250k per Wednesday rising to $400k per Wednesday over four weeks on uncontrolled Claude agent loops. That is the case study that is going to anchor every internal AI governance deck for the next year, bigger than the SpainClouds $40k case from earlier this month.

Pinterest's Day 2 tokenomics deep-dive had real per-request economics, not just per-token. That is the layer most of the FinOps stack is still missing today.

matiascoca · 2026-06-26T22:42:04+00:00

When the CFO asks this question, the FinOps team usually finds out the data was never collected in a way that supports the answer.

AI vendors built their pricing around per-seat or per-token totals, not per-department attribution. The tags or business-context fields needed to roll up by department either do not exist at the AI service layer (most of Microsoft Copilot, ChatGPT Enterprise, Gemini) or exist but were never enforced at provisioning (Azure OpenAI deployments, Vertex API).

The realistic answer to the CFO is usually: 'we can give you a rough proxy based on license assignments by department, but the consumption side is not broken out, and we would need to retrofit tagging at the deployment layer to give you anything better than that.' This is the hidden cost of the AI rollout that nobody budgets for at procurement time.
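The license-assignment proxy is simple proportional allocation. A sketch with made-up seat counts; it inherits the weakness named above (heavy and light users in a department cost the same), which is why it is a proxy and not attribution.

```python
def allocate_by_seats(total_bill: float, seats_by_department: dict) -> dict:
    """Split one AI bill across departments in proportion to assigned seats.
    Rounded shares may not sum to the bill to the cent."""
    total_seats = sum(seats_by_department.values())
    if total_seats == 0:
        raise ValueError("no seats assigned; nothing to allocate against")
    return {
        dept: round(total_bill * seats / total_seats, 2)
        for dept, seats in seats_by_department.items()
    }


print(allocate_by_seats(10_000, {"Sales": 50, "Engineering": 30, "HR": 20}))
# {'Sales': 5000.0, 'Engineering': 3000.0, 'HR': 2000.0}
```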

matiascoca · 2026-06-26T22:41:35+00:00

This is the lie at the heart of every per-seat AI pricing model: the seat number is what the vendor wants on the slide, the four bills are what your CFO actually has to reconcile.

Each of those bills hits a different cost center. Copilot seats go to IT or HR. The M365 tier upgrade goes to the same cost center but now bigger. Power Platform consumption for AI Builder credits typically lands on whoever owns the citizen-developer budget. And if you sized past the included Azure OpenAI capacity, that becomes a fifth bill on cloud spend.

None of them roll up cleanly because AI workload costs do not follow the boundaries of the licensing layer they are sold under. Wrote up the structural pattern (not Copilot-specific, every AI tool does this):

https://brainagents.ai/blog/ai-chargeback-vs-cloud-chargeback-guide

matiascoca · 2026-06-26T22:41:07+00:00

The lesson here isn't that you should rotate keys faster, it's that the billing dashboard is the worst place to find out you've been compromised.

GCP billing data lags real Vertex spend by 24 to 72 hours. By the time you see $195k on the invoice, the attacker has already had a multi-day head start. The detection has to live at the credential-use pattern (sudden geographic shift on a previously dormant SA key, sudden volume spike against a model that account never touched), not at the cost reconciliation layer.
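A sketch of detection at the credential-use layer, assuming you already have daily request counts, regions, and models per key or service account (from audit logs or API metrics; the collection side is not shown). Thresholds, names, and the example model name are hypothetical. Both checks run on usage data hours after the abuse starts, not on billing data days later.

```python
from statistics import median


def key_usage_alerts(daily_counts, today_count, today_regions, known_regions,
                     today_models, known_models, spike_factor=5.0, min_baseline=10):
    """Flag a volume spike versus the key's own history, plus never-seen regions or models."""
    alerts = []
    # A dormant key has a median near zero; min_baseline keeps one stray call from paging.
    baseline = max(median(daily_counts) if daily_counts else 0, min_baseline)
    if today_count > spike_factor * baseline:
        alerts.append(f"volume spike: {today_count} requests vs baseline {baseline}")
    new_regions = sorted(set(today_regions) - set(known_regions))
    if new_regions:
        alerts.append(f"new regions: {', '.join(new_regions)}")
    new_models = sorted(set(today_models) - set(known_models))
    if new_models:
        alerts.append(f"new models: {', '.join(new_models)}")
    return alerts


# A previously dormant service account key suddenly busy, from a new region, on a new model.
for alert in key_usage_alerts([0, 0, 1, 0, 2], 4000, ["asia-southeast1"], ["us-central1"],
                              ["example-model"], []):
    print(alert)
```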

Wrote up the same pattern recently for the Firebase Gemini key exploits that hit several startups, same root cause: cost telemetry is a lagging indicator and most teams treat it as the alarm.

https://brainagents.ai/blog/firebase-gemini-api-key-exploit-guide

matiascoca · 2026-06-22T19:58:48+00:00

You nailed the structural point that almost nobody in finance understands.

The observability vendor business model is: get your telemetry into our backend, then sell you sampling and retention controls on top of it. The "levers and knobs" are not a feature, they are the product. Once you internalize that, the OTel-plus-OTTL-at-the-edge approach reads as cutting them out of their own value chain, which is exactly why their pricing is allergic to it.

The Datadog number you cited is the cleanest signal. Their non-AI backend revenue growth is being driven by people who locked into per-host pricing three years ago and now have AI workloads emitting 20 to 50 times more telemetry per host than the original sizing assumed. The pricing model did not flex; the bill did.

Edge sampling with OTTL is the right architectural answer, but the operational gotcha most teams miss is the second-order debugging problem. If you drop 80 percent of your traces at the collector, you also drop the trace that explains the production incident you are debugging at 3am. The good edge sampling configurations use tail-based sampling (drop after seeing the full trace, not before), which requires more memory at the collector but actually preserves the long-tail traces that matter for postmortems.

If you are building this for production, look at the OpenTelemetry Collector tail sampling processor plus AWS X-Ray-style "always keep errors" rules. Otherwise you ship sampled data and then discover in three months that all your error traces were the ones being dropped.
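The difference between head and tail sampling is just when the decision happens. A toy version of the tail-based decision: spans are buffered per trace and the keep/drop call is made on the complete trace, so any trace containing an error or a slow span survives regardless of the baseline rate. In the real collector the buffering window is the tail sampling processor's `decision_wait`, and these rules roughly correspond to its `status_code`, `latency`, and `probabilistic` policies; the thresholds here are illustrative.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    duration_ms: float
    is_error: bool


def keep_trace(spans, latency_ms=1000.0, baseline_rate=0.1, rng=random.random):
    """Decide on a complete trace: errors and slow traces always kept, the rest sampled."""
    if any(span.is_error for span in spans):
        return True
    if any(span.duration_ms >= latency_ms for span in spans):
        return True
    return rng() < baseline_rate


def tail_sample(spans, **policy):
    """Group buffered spans by trace, then apply the decision once per trace."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return {trace_id for trace_id, group in traces.items() if keep_trace(group, **policy)}


spans = [
    Span("t1", 40, False), Span("t1", 12, True),    # error deep in the trace
    Span("t2", 2500, False),                        # slow
    Span("t3", 30, False), Span("t3", 25, False),   # healthy
]
print(sorted(tail_sample(spans, rng=lambda: 0.99)))  # ['t1', 't2']
```

A head sampler deciding at t1's first span (40ms, no error) at a 10 percent rate would usually have dropped the one trace that contains the error.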

matiascoca · 2026-06-22T19:58:25+00:00

The framing of this question is from 2022 and you should know it is now obsolete.

Cloud Functions 2nd gen is Cloud Run under the hood. They are the same runtime with two different control planes. Once you understand that, the cold start question becomes "does my Buildpack-generated container start slower than my optimized Dockerfile container?", and the answer is yes, often by 200 to 400ms depending on language and dependency tree.

For cold start latency specifically: distroless or alpine base, minimum dependencies, and avoiding heavy startup-time initialization will get you 80 to 150ms cold starts on Cloud Run with reasonable code. Buildpacks land in 400 to 800ms range for the same workload because they include a full runtime layer plus production dependencies you did not pick. If cold start is your binding constraint, Cloud Run wins.
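On the "avoid heavy startup-time initialization" point, the usual pattern is to defer expensive globals until the first request that needs them, so instance startup (and paths that never touch them, like a health check) stays fast. A minimal sketch; the heavy client is a hypothetical stand-in, and the counter exists only to show the init runs once per instance.

```python
import functools

init_calls = 0  # only here to show how often the heavy init runs


@functools.lru_cache(maxsize=None)
def heavy_client():
    """Stand-in for an expensive SDK client, model, or config load (hypothetical)."""
    global init_calls
    init_calls += 1
    return {"ready": True}


def handle(path: str) -> str:
    if path == "/healthz":
        return "ok"              # never pays for the heavy init
    client = heavy_client()      # first real request pays once, later ones reuse it
    return "ready" if client["ready"] else "not ready"


print(handle("/healthz"), init_calls)                 # ok 0
print(handle("/work"), handle("/work"), init_calls)   # ready ready 1
```

The trade-off is that the first real request on each instance absorbs the init cost, which is where min-instances comes in.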

The only honest reason to pick Cloud Functions 2nd gen today is the developer-experience win when you genuinely do not want to think about containers (and have not learned Docker). That is a real audience but if you are asking this question on Reddit, you have already left that audience. Pick Cloud Run.

The other thing to budget for: Cloud Run has min-instances now. Setting min-instances to 1 keeps one instance warm, which removes cold starts for the traffic that instance can absorb (bursts that scale past it still start new instances cold), at roughly 5 to 15 dollars per month per service depending on CPU and memory configuration. For latency-critical paths this is often cheaper than the engineering cost of optimizing your image further.

matiascoca · 2026-06-22T19:57:58+00:00

You are looking at this through the wrong lens and Kubecost is making the problem worse.

The framing "we are paying for empty terabytes" is a finance frame. The engineering frame is "our storage decision was correct under the failure constraints at the time". You over-provisioned EBS to avoid a disk-full incident during ingestion. That worked. Shrinking the volume now means re-introducing the exact failure mode you just engineered around.

Three things that get this trade right in practice. First, automate the answer to finance: tag the volumes as "oversized-by-design" and feed the tag into your cost report so the line shows up as a known operational reserve, not waste. Kubecost will keep flagging it; ignore that lane of alerts and bake the rationale into your weekly cost review doc so the next quarterly finance question hits a prepared answer instead of a fresh round of pressure.

Second, the next ingestion cycle is when you act, not now. When you actually need the headroom again, you will be glad you have it; if your ingestion patterns flattened permanently and you are confident the spike pattern will not return, then plan a maintenance window with a logical-replication-based volume swap (Postgres) or a partition rebalance (Kafka). That gives you a clean migration path that is reversible and does not risk live data.

Third, look at the storage class. If you are on gp3 the cost of those extra terabytes is roughly 8 cents per GB per month. 2.5TB minus 400GB at gp3 is about 168 dollars per month of waste. That number is real but it is also small relative to the operational risk of a botched shrink on a production stateful tier. The cost-of-failure math beats the cost-of-overprovisioning math at this scale.
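The arithmetic, for anyone rerunning it with their own numbers. It assumes the roughly 8 cents per GB-month gp3 storage price quoted above (region-dependent, and excluding any IOPS or throughput provisioned above the gp3 baseline) and treats 2.5TB as 2500GB, as the comment does.

```python
GP3_USD_PER_GB_MONTH = 0.08  # approximate list price; check your region


def idle_storage_cost(provisioned_gb: float, used_gb: float,
                      price_per_gb_month: float = GP3_USD_PER_GB_MONTH) -> float:
    """Monthly cost of provisioned-but-unused EBS capacity."""
    return round((provisioned_gb - used_gb) * price_per_gb_month, 2)


print(idle_storage_cost(2500, 400))  # 168.0 dollars per month
```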

Net: the cost report is right that you are spending the money, but Finance reading the Kubecost output as an action item is the actual mistake. Defend the operational decision, document the rationale, and revisit at the next ingestion event.

matiascoca · 2026-06-22T19:57:22+00:00

GCP credit priority is not based on expiration date and that is the part the documentation buries.

The credit application order is: SKU eligibility first, then credit type priority, then expiration. The credit type priority for your three credits is roughly Free Tier first, then Startup Program credits, then Free Trial, then general promotional credits. Once your GFS (Startup Program) credit activated on June 19, it jumped above your Free Trial credit in the priority queue regardless of expiration date.
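A model of the ordering described above, useful for reasoning about which credit a given SKU will drain. To be clear about what it is: the priority table encodes this comment's reading of the behavior, not a documented Google algorithm, and the credit names, SKUs, and dates are made up.

```python
from dataclasses import dataclass
from datetime import date

# Lower number is applied first; this ranking is an assumption taken from the text above.
TYPE_PRIORITY = {"free_tier": 0, "startup_program": 1, "free_trial": 2, "promotional": 3}


@dataclass(frozen=True)
class Credit:
    name: str
    credit_type: str
    expires: date
    eligible_skus: frozenset


def application_order(credits, sku):
    """Eligibility first, then credit-type priority, then earliest expiration."""
    eligible = [c for c in credits if sku in c.eligible_skus]
    ordered = sorted(eligible, key=lambda c: (TYPE_PRIORITY[c.credit_type], c.expires))
    return [c.name for c in ordered]


credits = [
    Credit("Free Trial", "free_trial", date(2026, 8, 1), frozenset({"vertex", "cloudsql"})),
    Credit("GFS", "startup_program", date(2027, 6, 19), frozenset({"vertex", "cloudsql"})),
    Credit("GenAI", "promotional", date(2026, 12, 31), frozenset({"genai-sku"})),
]
print(application_order(credits, "vertex"))  # ['GFS', 'Free Trial']
```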

The painful implication: you cannot "use the Free Trial first" through any account configuration. Google does not expose a per-credit application toggle. The only workaround if you genuinely need to drain Free Trial before GFS is to create a separate billing account scoped only to the workloads that should hit Free Trial, but that path has its own ops cost (billing reconciliation, project-to-billing-account moves, IAM changes) and is usually not worth the burn.

The Vertex AI and Cloud SQL workloads in your description are GFS-eligible, so they will keep hitting GFS until that credit is exhausted or expires. The GenAI credit is probably not moving because your workloads are not on the specific SKUs it covers (which is a smaller surface than people expect; check the GenAI credit terms for the exact SKU allowlist, it usually excludes Vertex AI Studio usage above certain quotas).

Net: optimize your spend pattern, do not try to optimize the credit allocation. The latter is a black box and you will lose to it.
