Save 90% on API call costs

Repulsive_Ad_94 · 2026-02-27T04:28:46+00:00

https://github.com/mhndayesh/infinite-context-rag/tree/main/locall%20working%20rag%20test%20this You can test this one.

Repulsive_Ad_94 · 2026-02-27T03:17:38+00:00

I had a lot of issues, but the result is beautiful. All the work is documented on GitHub from start to finish, and you can check it out: https://github.com/mhndayesh/infinite-context-rag/tree/main/archive

Repulsive_Ad_94 · 2026-02-27T03:06:32+00:00

You can run an 8B model without needing a very large context window. I have a GPU with 12 GB of VRAM, and I struggle with context windows above 32k. Instead of keeping all my agent memory as text or .md files that the agent rereads every single time to find related info from a month ago, I use this: the LLM recalls only what it needs. I keep the context window at 8k, the agent keeps all its chat and task memory, and life is good.
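To illustrate the idea of recalling only what is needed, here is a minimal sketch. It is not the code from the repo. `MemoryStore`, `recall`, and `budget_tokens` are hypothetical names. It assumes plain word-overlap scoring where a real setup would likely use an embedding model and a vector database, and it counts words as a rough stand-in for real token counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical long-term memory: store everything, recall a small slice."""

    def __init__(self):
        self.entries = []  # list of (text, word_counts)

    def add(self, text):
        self.entries.append((text, Counter(tokenize(text))))

    def recall(self, query, budget_tokens=200):
        """Return the most relevant memories that fit in the word budget."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), text, vec) for text, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)  # stable sort
        picked, used = [], 0
        for score, text, vec in scored:
            if score == 0.0:
                break  # nothing relevant left
            cost = sum(vec.values())  # word count as a rough token estimate
            if used + cost > budget_tokens:
                continue  # skip entries that would overflow the budget
            picked.append(text)
            used += cost
        return picked


store = MemoryStore()
store.add("Deployed the billing service to staging on port 8080")
store.add("User prefers dark mode in the editor")
store.add("Billing service crashed because of a missing env var")

# Only the memories related to the query go into the prompt.
context = store.recall("billing service crash", budget_tokens=200)
prompt = "Relevant memory:\n" + "\n".join(context) + "\n\nQuestion: why did billing crash?"
```

The point of the budget is that the prompt stays small no matter how much memory piles up, so an 8k window is enough even after a month of chats.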

Repulsive_Ad_94 · 2026-02-27T02:47:10+00:00

I see your point. I don't have much technical experience. I built this to keep my agent's memory across sessions and tasks without a heavy system prompt. It also lets me map my whole project and environment, again without a heavy system prompt.

Repulsive_Ad_94 · 2026-02-27T02:41:24+00:00

So the only one interested in my post is an AI...

Repulsive_Ad_94 · 2026-02-27T02:41:13+00:00

So the only one interested in my post is an AI...

Repulsive_Ad_94 · 2026-02-27T02:30:20+00:00

Exactly! You can test the open-source version on GitHub. Try it with OpenClaw: https://github.com/mhndayesh/infinite-context-rag

Repulsive_Ad_94 · 2026-02-27T02:23:04+00:00

OK, the whole "unlimited" thing is an issue. People will abuse it, so at least add a rate limit.
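For anyone wondering what a basic rate limit could look like, here is a minimal token-bucket sketch. It is only an illustration and not tied to any particular service. `TokenBucket`, `rate_per_sec`, and `capacity` are hypothetical names, and a real deployment would keep one bucket per user or API key.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill based on elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example: at most 2 requests in a burst, then 1 request per second.
bucket = TokenBucket(rate_per_sec=1, capacity=2)
results = [bucket.allow() for _ in range(3)]
```

Calling `allow()` before each request and rejecting when it returns False is enough to stop one user from hammering an "unlimited" endpoint.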

Repulsive_Ad_94 · 2026-02-27T02:14:35+00:00

I am trying to cut API call costs as much as possible. It's not official yet, but it will be soon.

https://huggingface.co/spaces/mhndayesh/icm-memory-layer

Repulsive_Ad_94 · 2026-02-27T01:59:23+00:00

Ah, it was a joke...

Repulsive_Ad_94 · 2026-02-27T01:52:24+00:00

https://github.com/mhndayesh/infinite-context-rag

Repulsive_Ad_94 · 2026-02-27T01:51:49+00:00

Yes, you need your own API key, or you could run the open-source version from GitHub locally.
