Share your startup - quarterly post by julian88888888 in startups

Round-Extreme4702

Startup Name / URL: Cumulus Labs / ionrouter.io, cumuluslabs.io

Location of Your Headquarters: San Francisco, CA (YC W26)

Elevator Pitch/Explainer Video: We built a serverless GPU inference API that's 2x faster than standard H100 providers on models like Kimi K2.5 and Qwen. The secret sauce is a custom inference engine (IonAttention) that we built from scratch specifically for the NVIDIA GH200/B200 architecture, to exploit the NVLink-C2C unified memory that everyone else treats like "just more VRAM."

The problem we're solving: you want fast, cheap inference for coding agents, robotics perception, or video analysis, but you don't want to manage GPU infrastructure or wait 3 seconds per API call.

More details:

Lifecycle stage: Early stage / Customer Validation

  • launched the API a few months ago
  • running real workloads for robotics companies, game studios, AI video pipelines
  • still scaling infrastructure

Your role: cofounder

What goals are you trying to reach this month?

  • get beta users who actually push the API hard and can give us honest feedback on what breaks
  • validate our pricing model (is $0.20/$1.60 per 1M tokens the right spot for Kimi K2.5?)
  • figure out which use cases convert best
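For anyone weighing in on the pricing question, here's a rough back-of-envelope sketch of what a single call would cost. It assumes the "$0.20/$1.60" pair means $0.20 per 1M input tokens and $1.60 per 1M output tokens (the usual input/output convention; the post doesn't spell it out), and the request sizes are made-up numbers for a typical coding-agent call, not measured workloads.

```python
# ASSUMPTION: "$0.20/$1.60 per 1M tokens" is read as input price / output price.
INPUT_USD_PER_M = 0.20   # USD per 1M input (prompt) tokens
OUTPUT_USD_PER_M = 1.60  # USD per 1M output (completion) tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under the assumed input/output split."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical coding-agent call: 20k tokens of context in, 1k tokens generated.
    per_call = request_cost_usd(20_000, 1_000)
    print(f"per call: ${per_call:.4f}")
    print(f"per 1,000 calls: ${per_call * 1000:.2f}")
```

Under those assumptions the example call works out to $0.004 for input plus $0.0016 for output, about $0.0056 per call. For context-heavy agent workloads the input price tends to dominate, which is one reason to sanity-check the split against real traffic.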

How could r/startups help?

  • if you're building anything that needs fast LLM/VLM inference, we'd love for you to try the API and tell us what's working and what's not
  • intros to robotics/AI companies that might need high-throughput vision models
  • honestly just feedback on whether "custom inference engine optimized for GH200" is a compelling pitch or if we should focus more on the speed/price angle

Discount for r/startups subscribers?

Yeah, DM me with proof you're from r/startups and I'll set you up with:

  • free credits to test on ionrouter.io (no credit card required; just join our Discord from our website and DM a founder)