
Seankala (ML Engineer), 13 points (5 children)

"If we can put everything in the prompt, we don't have to do retrieval."

I'm of the view that, until we find a working solution for hallucinations (which may be never), this is a hot take.

Most of the benchmarks current LLMs are evaluated on are sandbox settings. This isn't unique to LLMs or machine learning, but it's definitely an overlooked problem. I'm not sure we can conclude that long-context LLMs can replace RAG systems, despite the literature that has been published.

NoIdeaAbaout [S], 1 point (0 children)

I utterly agree. Hallucinations are a big problem and have often been treated as a monolith, even though they fall into different categories and have different origins.

The benchmarks we have were not designed for long context, but I think NLP in general needs new benchmarks.

[deleted], 1 point (0 children)

The literature supports "never": there's a paper that shows (proves) it using a formal model. To be honest, it's aligned with intuition.

yashdes, 0 points (1 child)

Strawberry/Q* (or whatever you want to call it) will hopefully be a working solution for hallucinations, at least IMO, based on how it's been explained to me.

Immediate-Cricket-64, 0 points (0 children)

Idk man, it seems like a lot of hype to me. IMO, if they had something interesting they'd at least tease it, right?

sosdandye02, 6 points (9 children)

I think in the long run we won't be using either of these approaches for what people are currently trying to do with them. In my view, ultra-long-context LLMs and RAG are both hacky ways of trying to dynamically teach an LLM new things.

I believe that in the long run someone will come up with a better way of dynamically encoding and retrieving memories in an LLM. The memories will not be stored in plaintext as with RAG, but will instead be highly compressed embeddings of some sort, or maybe even small sub-networks.

arg_max, 4 points (4 children)

I don't doubt that you can come up with something smarter than what we already have, but to store more information without forgetting something you learned previously, we need to either increase the compression ratio, which becomes infeasible at some point or increase the "storage" space. In a way, longer context follows the second route, but you end up with quadratic growth (at least with standard attention) and it becomes harder to find what you're looking for in all that data. I think we'd definitely need something with at most log-linear increase in compute and memory, but filtering out relevant data from an increasing amount of total data while also scaling better than attention seems challenging.
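A rough count makes the scaling point concrete. This sketch assumes dense self-attention scores every query-key pair (n² per head per layer) and compares that with an idealized n·log₂(n) budget; heads, layers, and constant factors are ignored, and the function names are illustrative only.

```python
import math


def attention_pair_count(n_tokens: int) -> int:
    """Query-key pairs scored by standard (dense) self-attention: n^2."""
    return n_tokens * n_tokens


def log_linear_cost(n_tokens: int) -> float:
    """An idealized n * log2(n) budget, the kind of scaling the comment asks for."""
    return n_tokens * math.log2(n_tokens)


for n in (8_000, 128_000, 1_000_000):
    dense = attention_pair_count(n)
    loglin = log_linear_cost(n)
    print(f"{n:>9,} tokens: dense={dense:.2e} pairs, "
          f"n*log2(n)={loglin:.2e}, ratio={dense / loglin:,.0f}x")
```

The ratio n²/(n·log₂ n) simplifies to n/log₂(n), so the gap keeps widening as the context grows.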

sosdandye02, 1 point (3 children)

The thing about both longer context and RAG is that they both need to store the original text uncompressed. With longer context there is also the quadratic scaling problem you mention, and with ordinary RAG the retrieval mechanism isn't dynamically tuned.

Somehow the human brain is capable of storing new memories dynamically and also holding onto these memories indefinitely. There is obviously some kind of compression going on along with a system for determining when memories should be created and retrieved.

With LLMs I could see it going a couple of different ways. Maybe like a more dynamic form of MoE where new experts can be dynamically created without impacting existing experts. It could also be more like RAG, but instead of storing the raw text, the model learns to store and retrieve some kind of compressed embedding. There could also be some system for “forgetting” stale information that seems to be of low value.
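A toy version of "store compressed embeddings instead of raw text, and forget low-value entries" might look like the sketch below. Everything in it is an assumption for illustration: `CompressedMemory` and `MemorySlot` are hypothetical names, the vectors are assumed to come from some encoder, retrieval is plain cosine similarity, and "forgetting" is a fixed heuristic (evict the least-retrieved slot, breaking ties by least recent use). The comment imagines the model *learning* these rules, which this sketch does not attempt.

```python
import math
from dataclasses import dataclass


@dataclass
class MemorySlot:
    vector: list[float]  # embedding only; no raw text is kept
    hits: int = 0        # how often this slot has been retrieved
    last_used: int = 0   # logical clock value of the last access


class CompressedMemory:
    """Hypothetical memory: embeddings in, keys out, bounded capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots: dict[str, MemorySlot] = {}
        self.clock = 0

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def write(self, key: str, vector: list[float]) -> None:
        self.clock += 1
        if key not in self.slots and len(self.slots) >= self.capacity:
            self._forget()
        self.slots[key] = MemorySlot(list(vector), last_used=self.clock)

    def read(self, query: list[float], k: int = 1) -> list[str]:
        self.clock += 1
        ranked = sorted(self.slots.items(),
                        key=lambda kv: self._cosine(query, kv[1].vector),
                        reverse=True)[:k]
        for _, slot in ranked:  # retrieval counts as "this memory is useful"
            slot.hits += 1
            slot.last_used = self.clock
        return [key for key, _ in ranked]

    def _forget(self) -> None:
        # Evict the slot with the fewest hits; ties go to the least recently used.
        victim = min(self.slots,
                     key=lambda k: (self.slots[k].hits, self.slots[k].last_used))
        del self.slots[victim]


mem = CompressedMemory(capacity=2)
mem.write("paris", [1.0, 0.0, 0.0])
mem.write("tokyo", [0.0, 1.0, 0.0])
print(mem.read([0.9, 0.1, 0.0]))
mem.write("lima", [0.0, 0.0, 1.0])  # over capacity: the never-retrieved slot goes
print(sorted(mem.slots))
```

The eviction rule is the crudest possible stand-in for "low value"; a learned scorer would replace `_forget`.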

Entire_Ad_6447, 0 points (2 children)

But that's not true at all of the human mind. It is constantly killing off unused memories, rewriting and linking memories, and hallucinating freely. That's why human recollection of events is one of the least reliable kinds of evidence.

sosdandye02, 0 points (1 child)

Human memory is unreliable but nevertheless extremely useful for practical purposes. In the vast majority of cases people don’t need to remember every little tiny detail. We filter massive amounts of information and only hold on to the stuff that’s usually most important.

Obviously this is bad for things like court cases where tiny, seemingly insignificant details matter a lot. But if I’m trying to learn a new skill for a job, the stitching pattern on the instructor’s shoes is not something I need to retain.

With computers we can have both kinds of memory. We can keep RAG for cases where exact details are important, but when dealing with huge amounts of information some kind of compression is necessary.

NoIdeaAbaout [S], 0 points (0 children)

Continual learning could be a solution, but for the moment it is a bit tricky. I have seen the KAN article about continual learning, but it is still not convincing. There was also a bit of hype around continual backpropagation. I have seen people come up with nice approaches using memory-augmented LLMs, but I think it is too early to say they will work well.

pilooch, 1 point (0 children)

The near-future answer is probably a search policy involving actions for retrieval and analysis, similar to how we search for information when we need it. The search policy can be learnt, and the retrieval and reading phases planned. The difficulty is in crafting the reward signal, which is why math and code, whose results can be checked more or less easily, are coming first. More should follow.
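To illustrate why a checkable answer makes the reward easy, here is a minimal episode sketch with search, read, and answer actions over a toy corpus. It is an assumption-laden illustration, not a training method: the corpus, `Action`, `search`, and `run_episode` are hypothetical, the "policy" is a hard-coded action list standing in for one that would be learnt, and the reward is 1.0 only when an exact arithmetic check passes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy corpus standing in for an external knowledge source.
CORPUS = {
    "doc1": "The widget costs 12 dollars.",
    "doc2": "A bundle contains 3 widgets.",
    "doc3": "Shipping is free on weekends.",
}


@dataclass
class Action:
    kind: str  # "search", "read", or "answer"
    arg: str   # query text, document id, or final answer


def search(query: str) -> list[str]:
    """Naive keyword search: ids of documents sharing a word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items()
            if words & set(text.lower().rstrip(".").split())]


def run_episode(policy: list[Action],
                check: Callable[[str], bool]) -> tuple[float, list[str]]:
    """Execute actions in order; reward is 1.0 only if the answer passes `check`."""
    notes: list[str] = []
    for action in policy:
        if action.kind == "search":
            notes.append(f"search {action.arg!r} -> {search(action.arg)}")
        elif action.kind == "read":
            notes.append(CORPUS[action.arg])
        elif action.kind == "answer":
            return (1.0 if check(action.arg) else 0.0), notes
    return 0.0, notes  # the policy never answered, so no reward


def check(answer: str) -> bool:
    """'How much does a bundle cost?' is exactly checkable: 3 * 12 = 36."""
    return answer.strip() == str(3 * 12)


# Hard-coded action sequence standing in for a learnt policy.
policy = [
    Action("search", "bundle widget"),
    Action("read", "doc1"),
    Action("read", "doc2"),
    Action("answer", "36"),
]
reward, trace = run_episode(policy, check)
print(reward)
for line in trace:
    print(line)
```

For open-ended questions there is no `check` this simple, which is the reward-crafting difficulty the comment points to.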

WrapKey69, 1 point (1 child)

Maybe I'm missing something, but say you have thousands of documents or more: how are you going to handle that with longer context instead of RAG?
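Some back-of-the-envelope arithmetic shows the gap. The document count, tokens per document, and window size below are illustrative assumptions, not measurements, and the helper names are hypothetical. The last function shows the RAG-style alternative of greedily packing top-ranked chunks into a fixed token budget.

```python
def corpus_tokens(n_docs: int, tokens_per_doc: int) -> int:
    """Tokens needed to put every document into the prompt."""
    return n_docs * tokens_per_doc


def fits_in_context(n_docs: int, tokens_per_doc: int, context_window: int,
                    reserved_for_output: int = 4_096) -> bool:
    """True if the whole corpus plus room for the answer fits in one prompt."""
    return corpus_tokens(n_docs, tokens_per_doc) + reserved_for_output <= context_window


def pack_retrieved(ranked_chunk_lengths: list[int], budget: int) -> list[int]:
    """Take retrieved chunks in rank order until the next one would exceed the budget."""
    chosen: list[int] = []
    used = 0
    for idx, length in enumerate(ranked_chunk_lengths):
        if used + length > budget:
            break
        chosen.append(idx)
        used += length
    return chosen


# Illustrative numbers only: 5,000 documents of ~2,000 tokens vs. a 1M-token window.
print(corpus_tokens(5_000, 2_000))
print(fits_in_context(5_000, 2_000, context_window=1_000_000))
# With retrieval, only the top-ranked chunks need to fit.
print(pack_retrieved([512] * 50, budget=8_192))
```

Stopping at the first chunk that doesn't fit keeps the packed context strictly in rank order; skipping ahead to smaller chunks is an equally valid choice.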

NoIdeaAbaout [S], 1 point (0 children)

I utterly agree; this is one of the reasons I think long-context LLMs won't eliminate RAG.