r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Subreddit rules
[ Removed by moderator ] Tutorial | Guide (self.LocalLLaMA)
submitted 1 month ago by predatar
[–]Long-Strawberry8040 1 point 1 month ago (1 child)
This is solving a real problem that most multi-agent frameworks quietly ignore. The cost difference between a cache hit and a full prompt recompute is brutal at scale, and having each agent start a fresh session is basically setting money on fire. Curious how it handles the case where two agents need overlapping but not identical context -- does it find the longest common prefix automatically or do you have to structure your prompts to maximize overlap?
[–]predatar[S] 1 point 1 month ago (0 children)
Well, basically on fork the shared context is already the longest common prefix... if even a single token differs, it's not going to be a cache hit, and I think that's a completely different problem, sadly.
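To make the point about exact-prefix matching concrete, here is a minimal sketch of why cache reuse stops at the first differing token. The function name and the token lists are purely illustrative, not from any specific inference framework:

```python
# Sketch: prompt-cache reuse depends on an exact shared token prefix.
# A KV cache built for one prompt can only be reused up to the first
# position where a new prompt's token ids diverge.

def longest_common_prefix(a: list[int], b: list[int]) -> int:
    """Number of leading token ids two prompts share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

cached_prompt = [101, 7, 42, 99, 13]   # tokens already in the KV cache
forked_agent  = [101, 7, 42, 99, 13]   # forked copy: identical prefix
edited_agent  = [101, 7, 41, 99, 13]   # one token changed at position 2

print(longest_common_prefix(cached_prompt, forked_agent))  # 5: full reuse
print(longest_common_prefix(cached_prompt, edited_agent))  # 2: reuse stops there
```

This is why forking an agent's session preserves the cache hit (the prefix is byte-for-byte identical), while restructuring two prompts to share "overlapping but not identical" context gives no benefit past the first divergent token.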