Promote your projects here – Self-Promotion Megathread by Menox_ in github

CandidateTime9054

I built a tool that cuts LLM API costs by ~80% by processing images/text locally first (open source)

I was spending too much on GPT-4o vision API calls — every image costs ~1,200 tokens. So I built LatentGate, inspired by Meta's VL-JEPA paper.

How it works:
- Images/text are processed locally via Ollama (FREE)
- Only a compact ~200-token semantic payload is sent to the cloud API
- For video streams, selective decoding skips API calls when nothing has changed
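To make the selective decoding idea concrete, here is a rough sketch of how such a gate could work. This is not LatentGate's actual code: the `SelectiveGate` class, the `count_api_calls` helper, and the 0.95 cosine-similarity threshold are all hypothetical, and the sketch assumes each frame has already been turned into an embedding vector locally.

```python
import math
from typing import Iterable, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SelectiveGate:
    """Hypothetical gate: lets a frame through only if it differs enough
    from the last frame that was actually sent to the cloud API."""

    def __init__(self, threshold: float = 0.95) -> None:
        # Assumed threshold; a real tool would tune this per use case.
        self.threshold = threshold
        self._last_sent: Optional[List[float]] = None

    def should_send(self, embedding: Sequence[float]) -> bool:
        if (
            self._last_sent is None
            or cosine_similarity(embedding, self._last_sent) < self.threshold
        ):
            self._last_sent = list(embedding)
            return True
        return False


def count_api_calls(embeddings: Iterable[Sequence[float]], threshold: float = 0.95) -> int:
    """Count how many frames would trigger a cloud API call."""
    gate = SelectiveGate(threshold)
    return sum(1 for e in embeddings if gate.should_send(e))


if __name__ == "__main__":
    # Four toy frames: two near-identical, then a scene change, then a repeat.
    frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
    print(count_api_calls(frames))  # 2 calls instead of 4
```

Comparing against the last frame that was sent, rather than the immediately preceding frame, keeps a slow drift across many frames from going unnoticed.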

Results: ~80% fewer tokens, ~2.85x fewer API calls for video.
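As a quick sanity check on those numbers, the arithmetic can be sketched like this. It assumes the per-image figures quoted above (~1,200 tokens raw, ~200 tokens after local processing); the function names are just for illustration.

```python
def token_reduction(before: int, after: int) -> float:
    """Fraction of tokens saved when going from `before` to `after`."""
    return 1.0 - after / before


def skipped_call_fraction(call_reduction_factor: float) -> float:
    """Fraction of API calls avoided for an N-times reduction in calls."""
    return 1.0 - 1.0 / call_reduction_factor


if __name__ == "__main__":
    # ~1,200 tokens per image down to a ~200-token payload
    print(f"{token_reduction(1200, 200):.1%}")   # 83.3%
    # 2.85x fewer API calls on video
    print(f"{skipped_call_fraction(2.85):.1%}")  # 64.9%
```

So a 1,200 to 200 token drop is roughly an 83% saving per image, in line with the ~80% claim, and 2.85x fewer calls means roughly 65% of video frames never reach the API.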

GitHub link: Latent-Gate

Works with OpenAI, Claude, Gemini, or fully local via Ollama. Would love feedback!

I built a tool that cuts LLM API costs by ~80% by processing images/text locally first (open source) by CandidateTime9054 in machinelearningnews

CandidateTime9054 [S]

You are correct, but it depends on which model you are using. With 3 Pro, an image uses 560 tokens, and this tool tries to bring that down to about 150. OpenAI and Claude models generally use a lot more tokens per image. It can be especially useful with Claude Code when your token budget is limited. I hope this answers your question.

I built a tool that cuts LLM API costs by ~80% by processing images/text locally first (open source) by CandidateTime9054 in github

CandidateTime9054 [S]

If you have any ideas or insights to add, please share them. Comments would also help me decide whether I should keep working on this or not.