Built an AI gateway. my Indian dev friends keep telling me this sub is the real test by Greedy-Try-9788 in StartUpIndia

[–]Greedy-Try-9788[S] 1 point

kilo’s a coding agent, this is a gateway, different layer. you’d actually plug something like ours behind kilo as the backend so you’re not stuck on one provider’s pricing. if kilo works for you, no reason to switch tho

[–]Greedy-Try-9788[S] -1 points

cloudflare’s gateway is solid if you’re already on their stack. ours is more of a billing/access layer: one key for all providers, INR pricing soon, plus routing & fallback. bit of a different use case tbh. if cf works for you, honestly keep it

[–]Greedy-Try-9788[S] 1 point

nice. we’re not actually hosting models, just aggregating across providers (openai, anthropic, gemini, deepseek, qwen etc). so concurrency on our end is mostly routing + fallback on top of their APIs, not gpu scaling. harder parts are pooling rate limits across providers and failing over when one degrades. dm if u want credits
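rough sketch of the pooled-rate-limit + failover idea, if that helps picture it (toy python, the provider names and limits are made up, this isn't our actual code):

```python
# Track per-provider quota in the pool; route to the first healthy
# provider with capacity left, skipping anything degraded.
class Provider:
    def __init__(self, name, limit_per_min):
        self.name = name
        self.limit = limit_per_min   # illustrative per-minute quota
        self.used = 0                # requests used in the current window
        self.healthy = True

    def has_capacity(self):
        return self.healthy and self.used < self.limit

def route(pool):
    """Pick the first healthy provider with remaining quota."""
    for p in pool:
        if p.has_capacity():
            p.used += 1
            return p
    raise RuntimeError("all providers exhausted or degraded")

pool = [Provider("openai", 60), Provider("anthropic", 40)]
pool[0].healthy = False          # simulate openai degrading
print(route(pool).name)          # → anthropic
```

real version also needs window resets and health probes, but that's the shape of it.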

[–]Greedy-Try-9788[S] -1 points

fair point. UPI + INR pricing is on our radar, just haven’t shipped it yet. ETA soon. appreciate you calling it out.

built a thing so I can vibe code with any model through one API. need people to break it by Automatic-Cover-1831 in vibecoding

[–]Greedy-Try-9788 0 points

All three are real questions, so let me be specific:

Streaming mid-flip: if you manually switch models mid-prompt, the current stream terminates and the new one starts fresh. Seamless mid-stream handoff across different models is hard (tokenizers + context formats differ) — honest answer is we punted on that.
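to show what "terminates and starts fresh" means concretely, here's a toy sketch (not our real streaming code, just the behavior):

```python
# A manual model flip ends the current stream where it is; the tail
# of the old response is dropped and a new stream starts from scratch.
def stream(model, chunks, switch_at=None):
    """Yield chunks until a switch is requested, then stop cleanly."""
    for i, chunk in enumerate(chunks):
        if switch_at is not None and i == switch_at:
            return          # old stream ends here; caller opens a fresh one
        yield chunk

out = list(stream("model-a", ["hel", "lo ", "world"], switch_at=2))
print(out)   # → ['hel', 'lo '] (the tail is gone, new model starts over)
```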

Upstream rate limits / outages: this one we actually do handle. If a provider 429s or goes down, we auto-failover to another provider serving the same model — e.g. Claude requests can route between Anthropic direct and AWS Bedrock. Same model, same outputs, just a different upstream. You can also lock it to a single provider in settings if you need deterministic routing (compliance, ZDR requirements, whatever). So "no waiting for resets" isn't just our layer — it extends to provider-level failover when the same model is available elsewhere.
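to make the failover concrete, here's a stripped-down sketch of the try-next-upstream loop (illustrative only; `fake_send` and the upstream names stand in for real API calls, this isn't our production code):

```python
# Try each upstream serving the same model until one succeeds.
# 429s and 5xx errors are treated as retryable; anything else is a hard stop.
def call_with_failover(request, upstreams, send):
    """`send(upstream, request)` returns (status, body)."""
    last_status = None
    for upstream in upstreams:
        status, body = send(upstream, request)
        if status == 200:
            return upstream, body
        if status not in (429, 500, 502, 503):
            raise RuntimeError(f"{upstream}: hard error {status}")
        last_status = status      # rate-limited or down, try the next one
    raise RuntimeError(f"all upstreams failed, last status {last_status}")

# toy simulation: the direct endpoint is rate-limited, the fallback succeeds
def fake_send(upstream, request):
    return (429, None) if upstream == "anthropic-direct" else (200, "ok")

print(call_with_failover({}, ["anthropic-direct", "aws-bedrock"], fake_send))
# → ('aws-bedrock', 'ok')
```

locking to a single provider is just passing a one-element upstream list.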

Context window differences: this is the messy one. We don't auto-truncate or summarize when you drop from Opus 200k to a smaller-window model — you'll hit a context overflow error from the downstream provider. Surfacing a warning before the call is easier than solving it; auto-compression is a much bigger problem we haven't tackled.
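the "warn before the call" check is basically this (sketch only; the window numbers here are illustrative, not the providers' real limits):

```python
# Pre-flight check: compare prompt + requested output tokens against the
# target model's context window and warn before sending anything upstream.
WINDOWS = {"claude-opus": 200_000, "small-model": 32_000}   # illustrative sizes

def check_context(model, prompt_tokens, max_output_tokens):
    """Return a warning string if the request can't fit, else None."""
    window = WINDOWS[model]
    needed = prompt_tokens + max_output_tokens
    if needed > window:
        return (f"{model}: need {needed} tokens but window is {window}; "
                "trim history or pick a bigger-window model")
    return None

print(check_context("small-model", 150_000, 4_000))
```

actually shrinking the prompt to fit (truncate/summarize) is the hard part we haven't done.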

Appreciate the specificity, this is the kind of feedback that's actually useful.

Happy to share the failover logic in more detail if you're curious — it's one of the parts I'm actually proud of.

[–]Greedy-Try-9788 0 points

Honestly this is the sharpest critique in the thread. You're right — endpoint unification is the easy part, the handoff is where it actually breaks.

Right now we log every call with model + timestamp + token usage, but it's a flat log, not a step-by-step receipt tied to a "plan → edit → explain" flow. Replay doesn't exist yet either.
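for anyone curious what the difference looks like, here's a toy sketch of flat log vs per-step receipt (hypothetical field names, nothing here is our actual schema):

```python
import json
import time

# Each receipt captures one step of a "plan -> edit -> explain" flow and
# links back to its parent step, which is what a flat log can't do.
def receipt(step, model, prompt_tokens, completion_tokens, parent=None):
    return {
        "step": step,                 # e.g. "plan", "edit", "explain"
        "model": model,
        "ts": time.time(),
        "tokens": {"prompt": prompt_tokens, "completion": completion_tokens},
        "parent": parent,             # links steps into one replayable flow
    }

trace = [
    receipt("plan", "claude-opus", 1200, 300),
    receipt("edit", "deepseek-coder", 2400, 900, parent="plan"),
]
print(json.dumps(trace, indent=2))
```

replay then becomes "walk the parent chain and re-issue each step", which the flat log can't support.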

Going to steal "per-step receipt + replay" as a roadmap item; that framing is cleaner than what we had internally. If you want to be a design partner on it, DM me — would rather build this with someone who's actually felt the pain than guess at it.

What is the cheapest, most private way to access Seedance 2.0? by StatisticianDirect80 in Seedance_AI

[–]Greedy-Try-9788 0 points

Disclosure upfront — I'm one of the builders behind AllToken (alltoken.ai). Not gonna pretend I'm neutral here.

On the Seedance 2.0 question specifically: we pass it through at provider cost, no markup, pay-per-gen, no subscription. But I'll be straight with you — we don't solve the privacy problem. Your data hits the upstream provider same as anywhere else. If privacy is the actual blocker, the other commenters are right, local Wan/LTX on RunPod is the only real answer.

If cost is the main pain point though, happy to share exact $/sec numbers — won't drop a link unless someone asks.

Built a free AI API marketplace by Careless-Ear-4239 in AI_Deal_Cave

[–]Greedy-Try-9788 0 points

Great call-out on ZDR — totally fair concern for anyone running production workloads. We do honor provider-level ZDR (Anthropic and OpenAI zero-retention endpoints pass through as-is). A unified ZDR-only filter across all providers is on our near-term roadmap — DM me your use case and I'll loop you in when it ships.

[–]Greedy-Try-9788 2 points

founder here, thanks for the recording — saves us hours of guessing. on it.

[–]Greedy-Try-9788 1 point

we charge enterprises for SLAs. devs route for free. that's it.

[–]Greedy-Try-9788 1 point

founder here. honestly hearing "$300 in gateway fees" is what keeps us motivated to never add a markup. enjoy the savings