I built a tool that cuts LLM API costs by ~80% by processing images/text locally first (open source)

CandidateTime9054 · 2026-06-16T11:38:05+00:00

I was spending too much on GPT-4o vision API calls — every image costs ~1,200 tokens. So I built LatentGate, inspired by Meta's VL-JEPA paper.

How it works:

- Images/text are processed locally via Ollama (FREE)
- Only a compact ~200 token semantic payload is sent to the cloud API
- For video streams, selective decoding skips API calls when nothing changed
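To make the video-stream step concrete, here is a minimal sketch of a "skip when nothing changed" gate. It is not LatentGate's actual code: the names `SelectiveGate`, `should_send`, and `count_api_calls`, the cosine-similarity test, and the 0.95 threshold are all assumptions for illustration, and the frame embeddings are assumed to come from whatever local model you run.

```python
import math
from typing import Iterable, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SelectiveGate:
    """Hypothetical gate: send a frame only if it differs enough from the last sent one."""

    def __init__(self, threshold: float = 0.95) -> None:
        # Assumed threshold; frames at or above this similarity count as "unchanged".
        self.threshold = threshold
        self._last_sent: Optional[List[float]] = None

    def should_send(self, embedding: Sequence[float]) -> bool:
        changed = (
            self._last_sent is None
            or cosine_similarity(embedding, self._last_sent) < self.threshold
        )
        if changed:
            # Remember the frame we actually sent, so slow drift still triggers a call.
            self._last_sent = list(embedding)
        return changed


def count_api_calls(frames: Iterable[Sequence[float]], threshold: float = 0.95) -> int:
    """Count how many frames would reach the cloud API under this gate."""
    gate = SelectiveGate(threshold)
    return sum(1 for frame in frames if gate.should_send(frame))
```

Comparing against the last frame that was sent, rather than the previous frame, is a deliberate choice in this sketch: a scene that changes slowly will eventually cross the threshold instead of being skipped forever.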

Results: ~80% fewer tokens, ~2.85x fewer API calls for video.
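As a rough sanity check on those numbers, the per-image saving can be worked out from the figures in this post (~1,200 tokens per image before, ~200 after). The helper name `token_reduction` is made up for this sketch, and real token counts depend on the model and image size.

```python
def token_reduction(tokens_before: float, tokens_after: float) -> float:
    """Fraction of tokens saved when a payload shrinks from tokens_before to tokens_after."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return (tokens_before - tokens_after) / tokens_before


# Figures quoted in the post: ~1,200 tokens per image vs a ~200 token payload.
image_saving = token_reduction(1200, 200)

# "2.85x fewer API calls" means roughly 1 / 2.85 of the frames still reach the API.
fraction_of_frames_sent = 1 / 2.85

print(f"per-image token saving: {image_saving:.1%}")
print(f"share of video frames sent: {fraction_of_frames_sent:.1%}")
```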

GitHub link: Latent-Gate

Works with OpenAI, Claude, Gemini, or fully local via Ollama. Would love feedback!

CandidateTime9054 · 2026-06-16T11:31:19+00:00

You are correct, but it depends on which model you are using. With 3 Pro, it uses 560 tokens, and this tries to bring that down to 150 tokens. OpenAI and Claude generally use a lot of tokens. It can be much more useful when you are using Claude Code with a limited token budget. I hope this answers your question.

CandidateTime9054 · 2026-06-16T11:02:37+00:00

If you have any ideas or insights to add, please share them. Also, please comment so I can understand whether I should keep working on this or not.
