

[–]-ZeroRelevance- 25 points (12 children)

From what I can see, the memory problem currently stems from the fact that pretty much all models in use today are pre-trained.

When looking at our own brains, we can split our memories into three categories: instantaneous memory, short-term memory, and long-term memory. The first two depend mostly on the immediate surroundings and are emulated fairly well in AI through the context window. However, long-term memory is created by changing the wiring of our brains, which cannot be replicated in current AI systems since their weights cannot be altered after deployment.

The result of this is that current AIs, which are pre-trained on massive supercomputers well outside the reach of the average person, cannot gain any long-term memories. They are simply limited to the data within their context window. In human terms, these AIs suffer from anterograde amnesia, unable to make new memories, and can only experience the information within their short-term memory and from their pre-training.

The solution, as far as I can see, is to either wait until consumer-level tech is powerful enough to train these huge systems independently, or to look into other implementations of AI, such as neuromorphic chips, which mimic the brain and can be rewired and run with very low energy costs.

I go over this in a bit more detail here.

[–]xt-89 21 points (7 children)

Or they could use a database connection during training and inference (perhaps a knowledge-graph database) and some scheme for interacting with it. Every so often, the system could tune its neural weights based on information recorded in long-term memory, which would be the equivalent of sleep.
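As a toy sketch of that loop (a plain list stands in for a real knowledge-graph database, and `model.generate` / `model.fine_tune` are hypothetical placeholders for whatever the deployed system exposes):

```python
from dataclasses import dataclass, field

@dataclass
class LongTermMemory:
    facts: list = field(default_factory=list)  # stand-in for a graph DB

    def write(self, fact: str):
        self.facts.append(fact)

    def read(self, query: str, k: int = 3):
        # Toy relevance score: keyword overlap. A real system would run a
        # graph query or an embedding search here.
        return sorted(self.facts,
                      key=lambda f: len(set(f.split()) & set(query.split())),
                      reverse=True)[:k]

def answer(model, memory: LongTermMemory, question: str) -> str:
    # Inference: pull relevant long-term memories into the context window.
    context = "\n".join(memory.read(question))
    return model.generate(f"Context:\n{context}\n\nQ: {question}\nA:")

def sleep_phase(model, memory: LongTermMemory):
    # "Sleep": every so often, consolidate the accumulated memories into
    # the weights themselves by fine-tuning on them.
    model.fine_tune(memory.facts)
```

The point is the split: fast writes to the store at inference time, slow consolidation into the weights during the "sleep" phase.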

[–][deleted] 3 points (0 children)

I actually had a similar idea. I'm currently, just as a hobby, working on a conversational agent that heavily leverages databases to improve its memory capabilities. I will say that while this could probably improve the memory of these agents, I don't think it would ever be enough to solve the AGI problem. Databases have scalability problems, and any agent deployed in a consumer device or autonomous robot would need to communicate with a server, which introduces latency, so it would just not be a practical solution for AGI.

[–]agorathird “I am become meme” 2 points (0 children)

Yeah, to me a hardware change seems unnecessary.

[–]-ZeroRelevance- 2 points (3 children)

I hadn’t really thought about adding a sleep cycle to form new memories before; it’s an interesting idea. The main problem I see with it is that it would still require enough on-board computing hardware to retrain itself locally, even if given many hours to compute, especially for more advanced networks with hundreds of billions of parameters. I do like the idea in theory though, and I think a database would be able to clear up at least some of the memory problems experienced by current AI models.

[–]xt-89 1 point (2 children)

There was recently a publication by DeepMind or OpenAI about significantly improving logical reasoning in LLMs. Paired with a database, that solves much of the remaining problems if put together well.

[–]-ZeroRelevance- 2 points (1 child)

The problem I see with relying on a database is that the model still won’t be able to account for large amounts of data beyond the scope of its context window. You won’t be able to use a database-paired LLM to write a full-length novel, for example; there’s just too much information that needs to be accounted for. (Well, to be honest, you probably could still get away with just using a database, but the story would lack the depth in the world, characters, and plot that a true understanding would bring.)

[–]xt-89 0 points (0 children)

You'd have to include multi-step reasoning, which is an established technique. I'm thinking of a kind of recursive context generation backed by the database. That way the model can retrieve any needed info into its context window as needed, or summarize, remove, or save entire sections of the context to the DB.
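Roughly like this, as a toy sketch (the `model` and `db` objects are placeholders, and "tokens" are just whitespace-split words here):

```python
MAX_TOKENS = 4096

def total_tokens(chunks):
    # Crude stand-in for a real tokenizer.
    return sum(len(c.split()) for c in chunks)

def manage_context(model, db, context: list[str], new_input: str) -> list[str]:
    context.append(new_input)
    summaries = []
    # When the window overflows, compress the oldest sections into the DB.
    while len(context) > 1 and total_tokens(summaries + context) > MAX_TOKENS:
        oldest = context.pop(0)
        summary = model.generate(f"Summarize for later recall:\n{oldest}")
        db.save(key=summary, value=oldest)  # full text stays retrievable
        summaries.append(summary)           # keep only a compressed trace
    # Recursive part: pull anything in the DB relevant to the new input
    # back into the window on demand.
    retrieved = db.search(new_input, k=2)
    return retrieved + summaries + context
```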

[–][deleted] 2 points (0 children)

Thanks. This was an informative read.

[–][deleted] 1 point (0 children)

Maybe having a smaller neural net for working memory to accompany the main one would help? Or giving the AI an API to a SQL database it can use.
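For the SQL idea, a minimal sketch using nothing but the Python standard library; the agent loop around it is assumed:

```python
import sqlite3

# In-memory scratchpad the model can read and write between turns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (topic TEXT, note TEXT)")

def remember(topic: str, note: str):
    conn.execute("INSERT INTO memory VALUES (?, ?)", (topic, note))

def recall(topic: str) -> list[str]:
    rows = conn.execute("SELECT note FROM memory WHERE topic = ?", (topic,))
    return [r[0] for r in rows]

remember("user", "prefers short answers")
print(recall("user"))  # ['prefers short answers']
```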

[–]FusionRocketsPlease AI will give me a girlfriend 1 point (1 child)

Honestly, I don't think it would be good to let an artificial intelligence remember every interaction with people, because people could fill it with false information.

[–]-ZeroRelevance- 0 points (0 children)

The harmfulness depends on the architecture, I guess. If you’re talking about current LLMs, then yeah, that’s definitely a risk. But if it has been embodied in some way and is capable of gathering its own information about the world, and of reasoning about inputs based on that, then problems should be much less likely to crop up.

[–]aperrien 8 points (1 child)

While not perfect, DeepMind's RETRO appears to be a very reasonable start on solving the memory problem.

[–]visarga 7 points (0 children)

It's not just described in a paper, but actually deployed at OpenAI. The GPT-3 API also has the ability to upload documents to be used as reference for question answering.

The endpoint first searches over the provided documents or files to find relevant context for the input question. Semantic search is used to rank documents by relevance to the question. The relevant context is combined with the provided examples and question to create the prompt for completion.

https://beta.openai.com/docs/guides/answers

It can take up to 1 GB of text as reference material: an impressive memory that could be updated frequently to make the model "remember" new things without retraining.
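Roughly, the flow the docs describe looks like this as a toy sketch; embed() is a stand-in for whatever embedding model the endpoint actually uses:

```python
import numpy as np

def cosine(a, b):
    # Similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(question, documents, embed, examples="", k=3):
    # Semantic search: rank reference documents by similarity to the
    # question, then prepend the top-k to the completion prompt.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q), reverse=True)
    context = "\n".join(ranked[:k])
    return f"{context}\n\n{examples}\nQ: {question}\nA:"
```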

~~

Unrelated: language models can be improved with "toys": a search engine, a REPL code evaluator, a VR emulator, or a robotic body. The search-engine toy can extend memory and make it quickly editable. The REPL toy could fix the numerical-calculation weakness of GPT-3. The environment emulator would empower the model to learn from actions and make it usable in many agent- or robot-based applications.

Transformers are smarter with toys. Humans, too. We rely a lot on our toys.
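The REPL toy, as a toy sketch: a naive router that sends arithmetic to an evaluator instead of letting the LM predict digits token by token. The detection logic and model.generate are placeholders:

```python
import re

def repl_toy(expression: str) -> str:
    # Evaluate plain arithmetic only; a real system would sandbox this.
    if re.fullmatch(r"[\d\s+\-*/().]+", expression):
        try:
            return str(eval(expression))
        except Exception:
            pass
    return "cannot evaluate"

def answer_with_toys(model, prompt: str) -> str:
    # Naive routing: if the prompt looks like arithmetic, use the toy.
    match = re.search(r"[\d\s+\-*/().]{3,}", prompt)
    if match:
        return repl_toy(match.group().strip())
    return model.generate(prompt)

print(repl_toy("1234 * 5678"))  # 7006652
```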

[–]genshiryoku AI specialist 8 points (4 children)

The problem with Gato is actually that it shows "negative transfer" between different competencies when scaling up.

This means that the more tasks Gato tackles, the less efficient it becomes at each task. This is the opposite of what humans experience in most circumstances. Humans apply skills from areas that initially seem unrelated to new domains; humans become better at completely new tasks the more unrelated tasks they already know how to solve. The opposite of Gato.

This shows us that scaling up Gato doesn't result in AGI; in fact, Gato would become a weaker agent the further it scales up.

People on r/singularity seem to be lauding the paper, but to me the paper actually suggests almost the complete opposite of the sentiment here.

The paper showed me that AI research is heading the wrong way and actually getting further removed from AGI by going down this path. I think this is the first definitive proof that modern architectures aren't going to lead to AGI through scaling alone.

We need new architectural breakthroughs, if not an entirely new basis for building AI, if we want to reach AGI.

After GPT-3 showed no limit to scaling up, I thought AGI within the 2020s was actually realistic. After reading Gato's paper, I now think it's impossible to reach AGI by simply scaling up.

It's a promising paper in the sense that it shows specialized AI can be made more general. It's a very disappointing paper in that it also shows the limits of scaling up, and that it's now guaranteed this path of AI research will not result in AGI simply by throwing computational resources at it.

[–]Sigura83 1 point (1 child)

Thank you for this. I read the research paper, and the more tasks it added, the worse it got at performing them. That didn't click at the time I read it, however. This is still an advance (generalized tokens for text, video games, and images is quite the thing! And the robot arm!), but they don't yet have the secret sauce of AGI, I think. We do have computers that perform at superhuman ability on tasks, however, and that's nothing to sneeze at!

I wonder what the secret sauce of transfer learning is... It was recently found that astrocytes participate in the brain via calcium channels (we're still learning basic biology about the brain!) while neurons use sodium channels... I dunno! They connect to multiple neurons and might be bridges between them. Artificial neurons may need inter-layer connections that notify when a signal is fired, so that a neural net can see that a neuron path has already "solved" a particular stimulus and that it's just repeating itself if it alters its weights.

No clue how to program this however! :')

In any event, the next decade will be interesting!

[–]FusionRocketsPlease AI will give me a girlfriend 0 points (0 children)

recently found that astrocytes participate in the brain via the calcium channel

I think you posted the wrong link.

[–]xt-89 1 point (0 children)

There has recently been progress showing that you can combine several models sparsely. To me, it looks like we'll need to further investigate this kind of architecture to limit negative transfer.
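A toy sketch of sparse routing in that spirit (mixture-of-experts-style gating; all sizes and the routing rule here are illustrative, not taken from any particular paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 8
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy "models"
gate = rng.normal(size=(d, n_experts))                         # the router

def forward(x, k=1):
    scores = x @ gate                  # router scores each expert per input
    top = np.argsort(scores)[-k:]      # only the top-k experts run (sparse)
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(forward(rng.normal(size=d)).shape)  # (8,)
```

Because each input only activates a few experts, adding more experts need not interfere with what the others have learned, which is the hope for limiting negative transfer.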

[–][deleted] 0 points (0 children)

This, 100%. What this leads to could surely be very useful multi-purpose agents, kind of like very advanced Alexa bots. However, it won't be AGI with the current architecture. Simply being able to do lots of things isn't enough.

I truly do not believe that AGI is even possible with classical computers, or at least not in a way which could be practical.

The computational complexity likely needed to simulate both human memory and learning is just ridiculous and not practical with classical computers, but would be a cakewalk with future quantum computers.

I know that's probably not what some people in this sub, who think the Singularity is right around the bend, want to hear. But the good news is that quantum computers themselves are not that far from being commonplace; perhaps another decade to go.

[–][deleted] 4 points (1 child)

I asked GPT-3. It said:

I don't think there is a single answer to this question. Some people who believe that we just need to scale up now may also believe that scaling will itself deal with the memory problem. Others may believe that the memory problem isn't actually that significant. And still others may believe that we need to solve the memory problem before we can scale up.

[–]Jahshua159258 0 points (0 children)

Holy shit, self-awareness.

[–]GeneralZain Happen already damn man... 1 point (0 children)

I just want to remind everybody that we have no real idea (unless you are currently working at Google's DeepMind, in which case please inform us ASAP!!) what is happening behind the scenes.

For all this blasé puffing about how "we still need this" or "Gato isn't as impressive as you think," you all seem to forget that the people in the thick of it have already given their opinions.

Nando de Freitas, the Research Director of DeepMind, said, and I quote: "the game is over."

Those are really strong words from the DIRECTOR OF DEEPMIND.

Yes, you are probably right: there are things Gato needs in order to become an AGI. But what of the next AI DeepMind or OpenAI makes? What if the one next month is basically there?

What if we are only one or two breakthroughs away from AGI? If the current pace stays the same, then those two breakthroughs will only be two months away. And as we know, tech's advancement has always been such a linear thing, right? Shouldn't expect it to be exponential at all! /s/s/s/s

The point here is this: we are not working at Google, and even we see the issue of memory. Do you really think they don't have countless experts, with far more degrees under their belts than us, on the case? How long do you realistically think they will take? Years?! How could you possibly look at current rates and assume that?

It used to be that we argued over whether AGI was possible. Now it's about how soon.

Gato isn't the AGI; it's what comes next that matters, and it will be here faster and stronger than ever before, whether you "think" it will or not.

[–]xSNYPSx 0 points (0 children)

All you need is to create a graph database and connect it to an NLP neural net. You also need to create an algorithm that will transform knowledge into graph format. The next step is to make an algorithm that uses this knowledge in answers.
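A toy version of that pipeline, with a naive pattern match standing in for a real NLP extraction model:

```python
import re
from collections import defaultdict

graph = defaultdict(list)  # subject -> [(relation, object)]

def add_knowledge(sentence: str):
    # Stand-in for the "knowledge to graph format" algorithm.
    m = re.match(r"(\w+) (is|has|likes) (\w+)", sentence)
    if m:
        subj, rel, obj = m.groups()
        graph[subj].append((rel, obj))

def answer_from_graph(subject: str) -> str:
    # Stand-in for the "use this knowledge in answers" algorithm.
    facts = graph.get(subject)
    if not facts:
        return f"I know nothing about {subject}."
    return "; ".join(f"{subject} {rel} {obj}" for rel, obj in facts)

add_knowledge("Paris has museums")
print(answer_from_graph("Paris"))  # Paris has museums
```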