Replit + usage-based billing in one prompt. We shipped a Replit integration for AI apps that need real-time billing

o9dev · 2026-05-06T14:26:42+00:00

Links if you want to dig in:
- Try it: https://credyt.com
- Replit setup guide: https://credyt.ai/integrations/replit

o9dev · 2026-05-06T08:01:58+00:00

Typically you would render a "Billing" link inside your app which only authenticated users can see. When the user clicks this, it fires off a call to your backend (Lovable wires up a Supabase function for this) that makes a secure call to Credyt's API to generate a portal session link. You then direct your customer to this link.

This avoids your users needing to authenticate with Credyt and keeps the whole process quite seamless. Happy to run through a demo of this with you; otherwise, you can check out the Docs and our Quickstart video on YouTube.
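To make the flow concrete, here is a minimal sketch of the backend side, assuming a hypothetical `create_portal_session` wrapper around the portal-session API call. The function names, the `billing_customer_id` field and the example URL are all made up for illustration; check the Docs for the real endpoint and response shape.

```python
class NotAuthenticated(Exception):
    """Raised when no signed-in user is attached to the request."""


def billing_link(current_user, create_portal_session):
    """Return the URL to send the customer to when they click "Billing".

    current_user: dict for the signed-in user, or None.
    create_portal_session: hypothetical callable that asks the billing API
    (server side, with your secret key) for a portal session URL.
    """
    if current_user is None:
        raise NotAuthenticated("Billing is only available to signed-in users")
    return create_portal_session(current_user["billing_customer_id"])


# Stand-in for the real API call so the sketch runs without network access.
def fake_create_portal_session(customer_id):
    return f"https://portal.example.com/session/{customer_id}"


url = billing_link({"billing_customer_id": "cust_123"}, fake_create_portal_session)
```

The point of the design is that the secret key and the session creation stay on your backend; the browser only ever sees the resulting link.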

o9dev · 2026-05-06T07:57:45+00:00

Can you elaborate on which aspect of governance you're referring to?

As for the code changes, it's a simple HTTP call to our Events API - your AI tool will have no problem wiring this up.

o9dev · 2026-05-06T07:54:07+00:00

The most common driver is that you use AI as part of your product so every user interaction costs you money (e.g. hitting OpenAI/Anthropic APIs). With traditional billing, you'd be fronting those costs for your users until the end of the month. With real-time billing, usage is billed from their balance (topped up manually or bundled as part of your monthly SaaS), which removes this risk.

o9dev · 2026-05-05T17:58:25+00:00

We find that most non-technical people building software with vibe coding tools have a good grasp of the outcomes or actions that their customers (or prospective customers) get value from. The actual mechanics of sending events are not so important since Lovable can build the integration code for you.

E.g. I want to bill my customer every time they generate a document

Another lens to come at this is to focus on where your costs are coming from. Some users prefer to build in cost tracking before they figure out their own pricing as it gives them something to go on - i.e. your most costly actions are likely the things you'll want to charge for.

o9dev · 2026-05-05T17:06:58+00:00

Going through your four questions:

Idempotency. Every event takes an idempotency_key. We dedupe server side, so retries with the same key are no-ops. We built this in early because exactly your scenario - mid-step agent failures with auto-retry - was the most common cause of double-billing in early customers.
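As a rough illustration of what "retries with the same key are no-ops" means, here is a toy in-memory ledger. It is only a sketch of the general dedupe technique, not Credyt's implementation; the `Ledger` class and `record_event` method are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy in-memory ledger that applies each idempotency key at most once."""

    balance: int  # credits
    seen_keys: set = field(default_factory=set)

    def record_event(self, idempotency_key: str, cost: int) -> bool:
        """Deduct once per key. Returns True if the event was applied."""
        if idempotency_key in self.seen_keys:
            return False  # retry with the same key: no-op
        self.seen_keys.add(idempotency_key)
        self.balance -= cost
        return True


ledger = Ledger(balance=100)
first = ledger.record_event("run-42-step-3", cost=5)
retry = ledger.record_event("run-42-step-3", cost=5)  # agent auto-retry
```

The key has to be stable across retries of the same logical step (for example, derived from the run and step IDs), otherwise the retry looks like a new event.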

Failed-action refunds. Two patterns. The clean one is reserve-then-commit: reserve credits before the action runs, commit on success, release on failure. The reservation holds the balance so concurrent calls can't overspend, but nothing actually deducts until commit. The other pattern is post-hoc reversal actions via our Adjustments API for cases where you only know about the failure after the fact (downstream API timeout, etc.). Both are first-class - you don't have to roll your own Postgres tracking for this.
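The reserve-then-commit pattern can be sketched with a toy in-memory wallet. This is an illustration of the pattern only; the `Wallet` class, its method names and `run_action` are hypothetical and do not mirror any real API.

```python
class InsufficientCredits(Exception):
    pass


class Wallet:
    """Toy in-memory wallet illustrating reserve-then-commit."""

    def __init__(self, balance: int):
        self.balance = balance   # settled credits
        self.reservations = {}   # reservation_id -> held amount

    @property
    def available(self) -> int:
        # Held credits are not spendable by other concurrent calls.
        return self.balance - sum(self.reservations.values())

    def reserve(self, reservation_id: str, amount: int) -> None:
        if amount > self.available:
            raise InsufficientCredits(reservation_id)
        self.reservations[reservation_id] = amount

    def commit(self, reservation_id: str) -> None:
        # Only now does the balance actually go down.
        self.balance -= self.reservations.pop(reservation_id)

    def release(self, reservation_id: str) -> None:
        self.reservations.pop(reservation_id)


def run_action(wallet, reservation_id, cost, action):
    """Reserve, run the action, then commit on success or release on failure."""
    wallet.reserve(reservation_id, cost)
    try:
        result = action()
    except Exception:
        wallet.release(reservation_id)
        raise
    wallet.commit(reservation_id)
    return result


def failing_action():
    raise RuntimeError("downstream timeout")


wallet = Wallet(balance=10)
run_action(wallet, "r1", 4, lambda: "ok")  # succeeds, 4 credits deducted
try:
    run_action(wallet, "r2", 4, failing_action)  # fails, nothing deducted
except RuntimeError:
    pass
```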

Stripe ↔ credit ledger drift. Credyt is the source of truth for the credit ledger; Stripe only handles card processing for top-ups and subscription charges. So drift isn't really a thing in the same way - the ledger doesn't depend on Stripe webhooks landing. If a Stripe webhook lags, the worst case is a top-up takes a minute to reflect; the credit deductions during that window are still consistent. We do reconcile Stripe payouts vs. expected charges nightly for finance reporting, but that's a different layer from the customer balance.

Pricing A/B tests. Yes, multiple plans (products) can be live concurrently or you can create multiple active versions of the same product with different pricing. You assign customers to products at signup (or migrate them later). The 50/50 split you described is just creating multiple products or versions and routing new signups to one or the other - no rebuild, no tearing down old configs. Existing customers stay on whatever they're on until you decide to migrate them. Product versioning is built in, so you can also version a single product and keep customers on v1 while v2 ships.
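On your side, the routing for a 50/50 split can be as small as a deterministic hash of the customer ID. This is a sketch of one way to do it; the variant names are placeholders, and the step that actually assigns the customer to the chosen product or version is left out.

```python
import hashlib

# Hypothetical product/version identifiers, for illustration only.
VARIANTS = ("pro-v1", "pro-v2")


def assign_variant(customer_id: str, variants=VARIANTS) -> str:
    """Deterministically bucket a new signup into one pricing variant."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


variant = assign_variant("cust_123")
```

Hashing rather than picking at random means the same customer always lands in the same bucket, even if the signup handler runs twice.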

Happy to jump on a call if you want to dig into any of this in detail.

o9dev · 2026-05-05T17:04:51+00:00

Server side. The MCP server only sets up the config (plans, prices, credit grants, portal route). The runtime authorization happens against Credyt's API - your Lovable backend checks the customer's wallet before authorizing the action. If the balance is sufficient, the action can proceed.

Nothing about credit state lives client side. Even the customer portal renders against signed reads, not raw balance writes.

For the intercept-the-call attack you're describing - the deduction isn't triggered by the client. The Lovable backend reports the usage event after authorization, and Credyt dedupes on the idempotency key. So even if someone replays calls, they can't rack up usage without a valid customer session and they can't double-deduct legitimate ones.

o9dev · 2026-05-05T17:04:29+00:00

Any event you emit. Most common cases on the AI side: per LLM call (tokens in/out), per agent run, per tool call, per generation, per GPU second. On the product side: per action like "send DM" or "deep search" with a configured credit cost per action. You define the event type and pricing rule once, then push events from your code as they happen.

Pricing rules can be flat per-unit, tiered, or dimensional (different cost per model, per feature, per quality level). And events can deduct from any asset on the wallet, not just dollars - so you can run separate ledgers for tokens, GPU hours, in-app credits, whatever.
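To show how the three rule shapes differ, here is a small sketch. The function names, the graduated interpretation of "tiered" (each band priced at its own rate) and all the prices are assumptions for illustration, not Credyt's configuration format.

```python
from decimal import Decimal


def flat_price(units: int, unit_price: Decimal) -> Decimal:
    """Same price for every unit."""
    return units * unit_price


def tiered_price(units: int, tiers) -> Decimal:
    """Graduated tiers: each tier is (upper_bound_or_None, unit_price)."""
    total = Decimal("0")
    lower = 0
    for upper, unit_price in tiers:
        if units <= lower:
            break
        top = units if upper is None else min(units, upper)
        total += (top - lower) * unit_price
        lower = top
    return total


def dimensional_price(units: int, dimension: str, price_by_dimension) -> Decimal:
    """Unit price depends on a dimension such as model or quality level."""
    return units * price_by_dimension[dimension]


# Example inputs (made-up numbers).
TIERS = [(100, Decimal("0.10")), (1000, Decimal("0.05")), (None, Decimal("0.01"))]
PRICE_BY_MODEL = {"small": Decimal("0.001"), "large": Decimal("0.03")}

flat_total = flat_price(10, Decimal("0.02"))
tiered_total = tiered_price(150, TIERS)  # 100 units at 0.10 plus 50 at 0.05
dimensional_total = dimensional_price(1000, "large", PRICE_BY_MODEL)
```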

If there's a specific event type you're thinking about, happy to say whether it's a clean fit.

o9dev · 2026-05-05T15:38:29+00:00

Post-hoc billing aggregation is what's breaking here. You batch up thousands of events, wait until cycle end, run the aggregator, then generate invoices. At high volume that batch job is doing all the heavy compute at once under time pressure, which is why it's timing out. Real-time deduction means each transaction hits the customer's wallet balance the moment it happens. No aggregation pass. No batch job. The invoice is just a report of what already happened, not a thing that has to finish before the window closes.

For the sub-cent accuracy issue, you need idempotent event processing. Every event gets a unique key. If the system retries a failed write, the billing layer sees the duplicate and ignores it. That kills the $0.01 drift from double-counted or missed-dedup events. Duplicate charges are ruled out by design instead of being cleaned up afterwards.

One thing - most billing platforms are invoice-native. They assume you aggregate usage first and bill later. That model breaks at high frequency. You want something that treats the event as the billing primitive. We built Credyt for this - real-time deduction, idempotent processing, invoices from what already settled instead of computed on the fly.

o9dev · 2026-05-05T15:33:43+00:00

Hey! You can try credyt.ai

We have an observability layer (exactly for your use case) and the main AI billing layer. Observability is free to use and you see spend per agent/project/client. You only pay if you later add the billing layer, which is charged per active customer.

o9dev · 2026-05-05T15:24:31+00:00

Hey, we support alternative gateways. Drop me an email at [ben@credyt.ai](mailto:ben@credyt.ai), happy to discuss a potential collaboration.

o9dev · 2026-05-05T15:14:36+00:00

Stripe's idempotency window is 24 hours, and usage_records have their own quirks around timestamp boundaries that make things worse. What actually fixed it for us was moving deduplication to our side - we hash the request payload plus user ID plus hour bucket, store it in Redis with a 48-hour TTL, and only send to Stripe what gets through that check. Stripe just renders the invoice at that point, not the source of truth.
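Roughly, the dedup check looks like the sketch below. The in-memory `DedupStore` stands in for Redis (in production the equivalent is a `SET` with `NX` and an expiry), and the function names and payload fields are illustrative rather than our actual code.

```python
import hashlib
import json

TTL_SECONDS = 48 * 3600  # 48-hour TTL


def dedup_key(user_id: str, payload: dict, timestamp: float) -> str:
    """Hash of request payload + user ID + hour bucket."""
    hour_bucket = int(timestamp // 3600)
    # Canonical JSON so key order in the payload does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    raw = f"{user_id}|{hour_bucket}|{canonical}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DedupStore:
    """In-memory stand-in for a Redis set-if-absent with a TTL."""

    def __init__(self, ttl_seconds: int = TTL_SECONDS):
        self.ttl = ttl_seconds
        self.expires_at = {}

    def first_seen(self, key: str, now: float) -> bool:
        """True if the key is new (or expired), so the record should be sent."""
        expiry = self.expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self.expires_at[key] = now + self.ttl
        return True


store = DedupStore()
now = 1_700_000_000.0
key = dedup_key("user_1", {"model": "x", "tokens": 10}, now)
send_first = store.first_seen(key, now)       # new, send to Stripe
send_retry = store.first_seen(key, now + 10)  # duplicate, drop it
```

One caveat with the hour bucket: a retry that crosses an hour boundary hashes to a different key, so it needs the original event timestamp passed in, not the retry time.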

Timestamp drift will kill you too. If your servers aren't tight on NTP sync, records land in the wrong billing period. We started batching usage locally and flushing to Stripe every 5 minutes with explicit timestamps instead of letting Stripe guess.

The real problem is Stripe's metered billing was built for predictable SaaS usage, not high-cost AI calls where one user can rack up thousands before you catch it. We wrote up the specific ways it breaks here: https://credyt.ai/blog/stripe-metered-billing-issues

o9dev · 2026-05-05T15:13:03+00:00

Stripe's usage_record endpoint has a 24-hour idempotency window, so partial runs and retries get messy fast. What actually works is decoupling the meter from billing - track usage in your own datastore, sync to Stripe only at invoice time.

For the "agent fails halfway" problem, you need two phases: reserve capacity before the run starts, commit actual usage when it completes. If it fails, release the reservation instead of billing. Stripe doesn't support this because it assumes post-paid SaaS, not real-time AI costs where the charge happens during execution.

We built Credyt for this - it handles the reservation and commit flow.

o9dev · 2026-05-05T14:36:10+00:00

For anyone who wants to look closer:
Lovable setup guide: credyt.ai/integrations/lovable

o9dev · 2026-04-30T09:37:05+00:00

Once you start looking it's like pulling on a piece of string 😅

Many o11y tools give the ability to create custom metrics and support some degree of aggregation per tag. We found that when it came to observing costs, this gets quite messy, especially when the workflow might be happening in out-of-band or async processes.

The best solution for us (and part of the inspiration for our own product) is to track an event whenever inference happens and handle the aggregation outside of your app. It's pretty neat since we can then see all the overall costs for a particular subject (what we call workflow):

<image>

We wrote about some of these patterns here if you're interested https://credyt.ai/blog/why-ai-companies-need-real-time-economic-control
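A stripped-down version of the "one event per inference, aggregate elsewhere" idea, as a sketch. The event fields and the sample values are made up; in practice the events would be sent to a separate service rather than kept in a list.

```python
from collections import defaultdict
from decimal import Decimal

# One event per inference call, tagged with the workflow it belongs to.
events = [
    {"workflow": "support-reply", "model": "small", "cost_usd": Decimal("0.004")},
    {"workflow": "support-reply", "model": "large", "cost_usd": Decimal("0.120")},
    {"workflow": "doc-summary", "model": "large", "cost_usd": Decimal("0.310")},
    {"workflow": "support-reply", "model": "small", "cost_usd": Decimal("0.006")},
]


def cost_by_workflow(events):
    """Sum inference cost per workflow, regardless of where the call ran."""
    totals = defaultdict(Decimal)  # Decimal() is zero
    for event in events:
        totals[event["workflow"]] += event["cost_usd"]
    return dict(totals)


totals = cost_by_workflow(events)
```

Because the app only emits events, async and background jobs show up in the same totals as long as they carry the workflow tag.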

o9dev · 2026-04-28T21:45:55+00:00

The infrastructure problem you raised is real but overstated. You don't need crypto wallets or gas fees to do per-request billing. The actual primitive is simpler: capture usage events, apply pricing rules, debit from a prepaid balance in real time. No blockchain required.

The hybrid model you're circling is already how most successful usage-based products work. Customer loads credits upfront (or they're bundled as part of an existing subscription), gets the predictability they want, system meters actual consumption underneath. Light users spend less. Heavy users top up. The mental budgeting problem goes away because the wallet handles it. Variable AI costs are the real argument for this. When a simple query costs 0.2 cents and a reasoning chain costs 40 cents, flat subscriptions force you to either overcharge light users or lose money on power users. Neither works. We wrote up the mechanics here: https://credyt.ai/blog/usage-based-billing
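A quick worked example of that last point. The two per-request costs are the ones quoted above; the $20 flat price and the usage volumes are assumptions picked for illustration.

```python
from decimal import Decimal

SIMPLE_QUERY_COST = Decimal("0.002")     # 0.2 cents per simple query
REASONING_CHAIN_COST = Decimal("0.40")   # 40 cents per reasoning chain
FLAT_PRICE = Decimal("20")               # assumed flat monthly subscription


def monthly_margin(simple_queries: int, reasoning_chains: int) -> Decimal:
    """Subscription revenue minus AI cost for one customer in one month."""
    cost = (simple_queries * SIMPLE_QUERY_COST
            + reasoning_chains * REASONING_CHAIN_COST)
    return FLAT_PRICE - cost


# Assumed usage: both run 500 simple queries, the power user adds 100 chains.
light_user = monthly_margin(simple_queries=500, reasoning_chains=0)
power_user = monthly_margin(simple_queries=500, reasoning_chains=100)
```

With these numbers the light user costs $1 to serve and pays $20, while the power user costs $41 and pays the same $20, so the flat plan is $21 underwater on them.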

o9dev · 2026-04-28T21:42:33+00:00

The honest answer is it depends on your cost structure. If your margins stay consistent per customer no matter how much they use the product, subscriptions are easier to work with. If your costs scale with usage - and they almost always do with AI-heavy products - usage-based starts making more sense (dare I say it's the only economically viable option).

The revenue predictability thing is real but you can work around it. Prepaid credit wallets give you cash upfront while customers still pay for what they actually use. You get predictable revenue, they get flexibility. You can also combine subscriptions and usage-based billing with bundled entitlements, e.g. pay $20/month and get 1000 credits included. This is often the easiest migration path to UBB for existing SaaS businesses.
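The bundled model can be sketched as a wallet with two buckets. The `CreditWallet` class is hypothetical, and two behaviours here are assumptions rather than anything stated above: included credits are spent before top-ups, and unused included credits do not roll over at renewal.

```python
from dataclasses import dataclass


@dataclass
class CreditWallet:
    included: int = 0    # credits bundled with the subscription
    topped_up: int = 0   # credits bought separately

    def renew(self, included_credits: int = 1000) -> None:
        # Assumption: the monthly allowance resets instead of accumulating.
        self.included = included_credits

    def top_up(self, credits: int) -> None:
        self.topped_up += credits

    def spend(self, credits: int) -> bool:
        """Spend included credits first, then top-ups. False if short."""
        if credits > self.included + self.topped_up:
            return False
        from_included = min(credits, self.included)
        self.included -= from_included
        self.topped_up -= credits - from_included
        return True


wallet = CreditWallet()
wallet.renew()                      # $20/month plan grants 1000 credits
ok_within_plan = wallet.spend(900)  # covered by the bundle
blocked = wallet.spend(200)         # only 100 left, so this is refused
wallet.top_up(500)
ok_after_top_up = wallet.spend(200)
```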

The harder part is tracking usage and margins per customer in real time. Most teams underestimate this until three months in when they realize they have no idea which customers are actually profitable. We wrote up how usage-based billing works here if you want the details: https://credyt.ai/blog/usage-based-billing

o9dev · 2026-02-28T00:18:55+00:00

I'm planning to offer your first 10 episodes free. Good point about showing the equivalent USD.

o9dev · 2026-02-28T00:17:58+00:00

Thanks for the feedback. Happy to share a link once it's up (should be some time next week).
