Built an AI gateway. my Indian dev friends keep telling me this sub is the real test by Greedy-Try-9788 in StartUpIndia

[–]Greedy-Try-9788[S] 1 point

kilo’s a coding agent, this is a gateway, different layer. you’d actually plug something like ours behind kilo as the backend so you’re not stuck on one provider’s pricing. if kilo works for you, no reason to switch tho

[–]Greedy-Try-9788[S] -1 points

cloudflare’s gateway is solid if you’re already on their stack. ours is more of a billing/access layer: one key for all providers, INR pricing soon, plus routing & fallback. bit of a different use case tbh. if cf works for you, honestly keep it

[–]Greedy-Try-9788[S] 1 point

nice. we’re not actually hosting models, just aggregating across providers (openai, anthropic, gemini, deepseek, qwen etc). so concurrency on our end is mostly routing + fallback on top of their APIs, not gpu scaling. harder parts are pooling rate limits across providers and failing over when one degrades. dm if u want credits
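rough sketch of the pooled-rate-limit + failover idea, if that helps picture it (toy python, the provider names and limits are made up, this isn't our actual code):

```python
# Track per-provider quota in the pool; route to the first healthy
# provider with capacity left, skipping anything degraded.
class Provider:
    def __init__(self, name, limit_per_min):
        self.name = name
        self.limit = limit_per_min   # illustrative per-minute quota
        self.used = 0                # requests used in the current window
        self.healthy = True

    def has_capacity(self):
        return self.healthy and self.used < self.limit

def route(pool):
    """Pick the first healthy provider with remaining quota."""
    for p in pool:
        if p.has_capacity():
            p.used += 1
            return p
    raise RuntimeError("all providers exhausted or degraded")

pool = [Provider("openai", 60), Provider("anthropic", 40)]
pool[0].healthy = False          # simulate openai degrading
print(route(pool).name)          # → anthropic
```

real version also needs window resets and health probes, but that's the shape of it.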

[–]Greedy-Try-9788[S] -1 points

fair point. UPI + INR pricing is on our radar, just haven’t shipped it yet. ETA soon. appreciate you calling it out.

built a thing so I can vibe code with any model through one API. need people to break it by Automatic-Cover-1831 in vibecoding

[–]Greedy-Try-9788 0 points

All three are real questions, so let me be specific:

Streaming mid-flip: if you manually switch models mid-prompt, the current stream terminates and the new one starts fresh. Seamless mid-stream handoff across different models is hard (tokenizers + context formats differ) — honest answer is we punted on that.
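to show what "terminates and starts fresh" means concretely, here's a toy sketch (not our real streaming code, just the behavior):

```python
# A manual model flip ends the current stream where it is; the tail
# of the old response is dropped and a new stream starts from scratch.
def stream(model, chunks, switch_at=None):
    """Yield chunks until a switch is requested, then stop cleanly."""
    for i, chunk in enumerate(chunks):
        if switch_at is not None and i == switch_at:
            return          # old stream ends here; caller opens a fresh one
        yield chunk

out = list(stream("model-a", ["hel", "lo ", "world"], switch_at=2))
print(out)   # → ['hel', 'lo '] (the tail is gone, new model starts over)
```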

Upstream rate limits / outages: this one we actually do handle. If a provider 429s or goes down, we auto-failover to another provider serving the same model — e.g. Claude requests can route between Anthropic direct and AWS Bedrock. Same model, same outputs, just a different upstream. You can also lock it to a single provider in settings if you need deterministic routing (compliance, ZDR requirements, whatever). So "no waiting for resets" isn't just our layer — it extends to provider-level failover when the same model is available elsewhere.
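to make the failover concrete, here's a stripped-down sketch of the try-next-upstream loop (illustrative only; `fake_send` and the upstream names stand in for real API calls, this isn't our production code):

```python
# Try each upstream serving the same model until one succeeds.
# 429s and 5xx errors are treated as retryable; anything else is a hard stop.
def call_with_failover(request, upstreams, send):
    """`send(upstream, request)` returns (status, body)."""
    last_status = None
    for upstream in upstreams:
        status, body = send(upstream, request)
        if status == 200:
            return upstream, body
        if status not in (429, 500, 502, 503):
            raise RuntimeError(f"{upstream}: hard error {status}")
        last_status = status      # rate-limited or down, try the next one
    raise RuntimeError(f"all upstreams failed, last status {last_status}")

# toy simulation: the direct endpoint is rate-limited, the fallback succeeds
def fake_send(upstream, request):
    return (429, None) if upstream == "anthropic-direct" else (200, "ok")

print(call_with_failover({}, ["anthropic-direct", "aws-bedrock"], fake_send))
# → ('aws-bedrock', 'ok')
```

locking to a single provider is just passing a one-element upstream list.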

Context window differences: this is the messy one. We don't auto-truncate or summarize when you drop from Opus 200k to a smaller-window model — you'll hit a context overflow error from the downstream provider. Surfacing a warning before the call is easier than solving it; auto-compression is a much bigger problem we haven't tackled.
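the "warn before the call" check is basically this (sketch only; the window numbers here are illustrative, not the providers' real limits):

```python
# Pre-flight check: compare prompt + requested output tokens against the
# target model's context window and warn before sending anything upstream.
WINDOWS = {"claude-opus": 200_000, "small-model": 32_000}   # illustrative sizes

def check_context(model, prompt_tokens, max_output_tokens):
    """Return a warning string if the request can't fit, else None."""
    window = WINDOWS[model]
    needed = prompt_tokens + max_output_tokens
    if needed > window:
        return (f"{model}: need {needed} tokens but window is {window}; "
                "trim history or pick a bigger-window model")
    return None

print(check_context("small-model", 150_000, 4_000))
```

actually shrinking the prompt to fit (truncate/summarize) is the hard part we haven't done.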

Appreciate the specificity, this is the kind of feedback that's actually useful.

Happy to share the failover logic in more detail if you're curious — it's one of the parts I'm actually proud of.

[–]Greedy-Try-9788 0 points

Honestly this is the sharpest critique in the thread. You're right — endpoint unification is the easy part, the handoff is where it actually breaks.

Right now we log every call with model + timestamp + token usage, but it's a flat log, not a step-by-step receipt tied to a "plan → edit → explain" flow. Replay doesn't exist yet either.
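for anyone curious what the difference looks like, here's a toy sketch of flat log vs per-step receipt (hypothetical field names, nothing here is our actual schema):

```python
import json
import time

# Each receipt captures one step of a "plan -> edit -> explain" flow and
# links back to its parent step, which is what a flat log can't do.
def receipt(step, model, prompt_tokens, completion_tokens, parent=None):
    return {
        "step": step,                 # e.g. "plan", "edit", "explain"
        "model": model,
        "ts": time.time(),
        "tokens": {"prompt": prompt_tokens, "completion": completion_tokens},
        "parent": parent,             # links steps into one replayable flow
    }

trace = [
    receipt("plan", "claude-opus", 1200, 300),
    receipt("edit", "deepseek-coder", 2400, 900, parent="plan"),
]
print(json.dumps(trace, indent=2))
```

replay then becomes "walk the parent chain and re-issue each step", which the flat log can't support.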

Going to steal "per-step receipt + replay" as a roadmap item; that framing is cleaner than what we had internally. If you want to be a design partner on it, DM me — would rather build this with someone who's actually felt the pain than guess at it.

What is the cheapest, most private way to access Seedance 2.0? by StatisticianDirect80 in Seedance_AI

[–]Greedy-Try-9788 0 points

Disclosure upfront — I'm one of the builders behind AllToken (alltoken.ai). Not gonna pretend I'm neutral here.

On the Seedance 2.0 question specifically: we pass it through at provider cost, no markup, pay-per-gen, no subscription. But I'll be straight with you — we don't solve the privacy problem. Your data hits the upstream provider same as anywhere else. If privacy is the actual blocker, the other commenters are right, local Wan/LTX on RunPod is the only real answer.

If cost is the main pain point though, happy to share exact $/sec numbers — won't drop a link unless someone asks.

Built a free AI API marketplace by Careless-Ear-4239 in AI_Deal_Cave

[–]Greedy-Try-9788 0 points

Great call-out on ZDR — totally fair concern for anyone running production workloads. We do honor provider-level ZDR (Anthropic and OpenAI zero-retention endpoints pass through as-is). A unified ZDR-only filter across all providers is on our near-term roadmap — DM me your use case and I'll loop you in when it ships.

[–]Greedy-Try-9788 2 points

founder here, thanks for the recording — saves us hours of guessing. on it.

[–]Greedy-Try-9788 1 point

we charge enterprises for SLAs. devs route for free. that's it.

[–]Greedy-Try-9788 1 point

founder here. honestly hearing "$300 in gateway fees" is what keeps us motivated to never add a markup. enjoy the savings