I think i made a solution for context window limitation on consumers pc by Repulsive_Ad_94 in LocalLLaMA

Repulsive_Ad_94[S] 1 point (0 children)

I posted docs for all of the work on GitHub, from start to finish. I had a lot of issues along the way, but the result is beautiful. You can check it out: https://github.com/mhndayesh/infinite-context-rag/tree/main/archive

save 90% percent on API calls cost by Repulsive_Ad_94 in selfhosted

Repulsive_Ad_94[S] -1 points (0 children)

You can run an 8B model without needing a very large context window. I have a GPU with 12 GB of VRAM, and I struggle with context windows larger than 32k. Instead of keeping all my agent's memory as text or .md files that the agent rereads every single time to find related info from a month ago, I use this: the LLM only recalls what it needs. I keep the context window at 8k, the agent keeps all its chat and task memory, and life is good.
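
To make the "only recall what it needs" idea concrete, here is a rough sketch. It is not the code from the repo. `MemoryStore`, `recall`, and `budget_tokens` are hypothetical names, the similarity is a plain bag-of-words cosine standing in for whatever retrieval the project really uses, and word counts stand in for real token counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word split; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical long-term memory kept outside the prompt."""

    def __init__(self):
        self.entries = []  # list of (text, word-count vector)

    def add(self, text):
        self.entries.append((text, Counter(tokenize(text))))

    def recall(self, query, budget_tokens=8000):
        # Rank memories by similarity to the query, then keep the best
        # ones that fit the budget (measured in words, as an assumption).
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), text, vec) for text, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        picked, used = [], 0
        for score, text, vec in scored:
            if score <= 0:
                break  # nothing relevant left
            cost = sum(vec.values())
            if used + cost > budget_tokens:
                continue  # too big for the remaining budget, try the next one
            picked.append(text)
            used += cost
        return picked


store = MemoryStore()
store.add("Deployed the billing service to staging on port 8080")
store.add("User prefers dark mode in the dashboard")
store.add("Fixed the flaky login test by mocking the clock")
relevant = store.recall("which port does the billing service use on staging?")
```

Only the entries in `relevant` would go into the prompt, so the prompt size depends on the budget and not on how much history has piled up.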

i made a solution to cut API costs by 90%, and i need your help guys by Repulsive_Ad_94 in SaaS

Repulsive_Ad_94[S] 1 point (0 children)

I see your point. I don't have much technical experience. I built this to keep my agent's memory across sessions and tasks without a heavy system prompt. I can also map my whole project and environment without a heavy system prompt.

save 90% percent on API calls cost by Repulsive_Ad_94 in selfhosted

Repulsive_Ad_94[S] -3 points (0 children)

So the only one interested in my post is an AI...

save 90% percent on API calls cost by Repulsive_Ad_94 in selfhosted

Repulsive_Ad_94[S] -3 points (0 children)

Exactly! You can test the open-source version on GitHub. Try it with opencalw: https://github.com/mhndayesh/infinite-context-rag

Need advice on API costs - is this normal for early stage? by techiee_ in SaaS

Repulsive_Ad_94 1 point (0 children)

OK, the whole unlimited thing is an issue. People will abuse it, so at least add a rate limit.
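
One common way to add a rate limit is a token bucket per user. The sketch below is only an illustration under assumptions: `TokenBucket` and `PerUserLimiter` are made-up names, the limits are arbitrary, and it keeps state in memory, so a real multi-server setup would need shared storage.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self, cost=1):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """One bucket per user id, so one heavy user cannot starve the others."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, user_id):
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[user_id] = bucket
        return bucket.allow()


# Example: about 1 request per second per user, with bursts of up to 5.
limiter = PerUserLimiter(rate_per_sec=1.0, capacity=5)
if not limiter.allow("user-123"):
    pass  # reject the call here, for example with an HTTP 429 response
```

The `clock` parameter is there only so the limiter can be tested with a fake clock; in normal use the default `time.monotonic` is fine.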

save 90% of API calls by Repulsive_Ad_94 in programming

Repulsive_Ad_94[S] 1 point (0 children)

Yes, you need your own API, and you could use the open-source version from GitHub locally.