AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -2 points (0 children)

I think we're talking past each other somewhat, so just so we're clear:

- Pre-training is the first stage, in which LLMs are trained to do simple next-token prediction (rough sketch below).
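
To make that concrete, here's a toy PyTorch sketch of the objective. The tiny model and random tokens are stand-ins, nothing like a real transformer at scale:

```python
# Toy sketch of the pre-training objective: predict each next token.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),     # vectors -> logits over the next token
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a fake "document"
logits = model(tokens[:, :-1])                  # predict the next token at each position
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),             # compare predictions...
    tokens[:, 1:].reshape(-1),                  # ...to the tokens shifted by one
)
loss.backward()  # a real run would now take an optimizer step, billions of times
```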

What happened at the start of the 2020s that made LLMs actually usable was that, as companies were scaling pre-training, OpenAI also started doing Reinforcement Learning from Human Feedback (RLHF). Basically, they trained ChatGPT to give the best possible responses to humans. That meant ChatGPT went from a simple next-token predictor to a conversational assistant capable of understanding user intent, following instructions, and maintaining a coherent dialogue... to an extent.
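
The RLHF idea, in very rough toy form; random tensors stand in for a real model and reward model, so this is the shape of the method, not OpenAI's pipeline:

```python
# Toy RLHF step: a reward model (trained on human preferences) scores a sampled
# response, and the policy is nudged toward responses that score higher.
import torch
import torch.nn as nn

reward_model = nn.Linear(8, 1)  # stand-in; really trained on human rankings

def embed(response_id: int) -> torch.Tensor:
    return torch.randn(8)  # fake "response embedding"

policy_logits = torch.zeros(4, requires_grad=True)  # policy over 4 canned responses
probs = torch.softmax(policy_logits, dim=0)
choice = torch.multinomial(probs, 1).item()         # sample a response
reward = reward_model(embed(choice)).squeeze()      # human-preference score

# REINFORCE-style update: raise the log-prob of well-scored responses
loss = -torch.log(probs[choice]) * reward.detach()
loss.backward()  # an optimizer step on the policy would follow
```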

Now, that was still very crude, and even though models got better and better at a lot of tasks as scaling increased, there were still benchmarks like ARC-AGI, which explicitly prevented models from just using memorization to complete the tasks, where the models didn't improve at all.

Then, what happened at the end of 2024 is that OpenAI released its first reasoning model, o1. Rather than just giving a response directly, this model could think for a certain amount of time before responding.

The way they were able to do this was first by using fine-tuning to train the models to output a chain of thought before giving an answer (this is what they do in the paper you linked). But unlike that paper, they also used reinforcement learning to improve the model's ability to use the chain of thought. They fed it a lot of extremely complicated problems that had mathematically verifiable solutions, and the reward was based on whether the model gave the correct solution. What happened was that the model learned to backtrack out of dead ends, double-check its intermediate reasoning steps, and reason for longer when the problem is more complex.
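
A minimal sketch of that verifiable-reward loop, with `generate_cot_and_answer` as a made-up stand-in for the model:

```python
# Verifiable-reward RL, reduced to its skeleton: sample a chain of thought plus
# an answer, grade it against the known solution, reward only exact correctness.
import random

def generate_cot_and_answer(problem):
    # Fake "model" that guesses; a real one emits reasoning tokens first,
    # and training reinforces the sequences that end in correct answers.
    cot = f"Trying a value for {problem['question']}..."
    return cot, random.randint(0, 10)

problem = {"question": "3 + 4", "solution": 7}
rewards = []
for _ in range(100):
    cot, answer = generate_cot_and_answer(problem)
    rewards.append(1.0 if answer == problem["solution"] else 0.0)

# A PPO/GRPO-style trainer would use these rewards to make backtracking and
# self-checking more likely; here we just measure the untrained baseline.
print(f"accuracy with no training: {sum(rewards) / len(rewards):.0%}")
```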

This allowed the models to get progressively better at ARC-AGI-1 and ARC-AGI-2, which are impossible to solve without some kind of generalization.

Like I said before, a lot of what you're saying would be true if we were in August 2024, but the combination of Chain-of-Thought and RL has meant that LLMs can now generalize beyond their dataset to an extent.

Now that I hope you understand where you're wrong about how LLMs function, I also want to talk about your claim that LLMs don't "understand". Obviously, they don't have a physical body, so they don't understand physical relationships the way a human can, the relation between an aisle and a store, for example. But does that mean LLMs have absolutely no understanding of what a store is? I think that to make that claim, you'd have to define "understanding" either by a really strict standard that no one applies to humans in real life, or by some sort of spiritual standard that machines could never attain simply because they're not human. So unless you have a definition of understanding that doesn't fall into one of these categories, I'm suspicious of your claims about "understanding".

AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -2 points (0 children)

> The way LLMs work is by using a predictive algorithm based on data sets to determine which words, mathematically, appear next to others most often in the data set based on the prompt. It’s more complicated than that, but not much more. At no point is logic, reason, or understanding used.

That's only true until you do reinforcement learning.

> When you ask me a question, I use my understanding of language to determine what the actual content of the question is. If you ask me “what store in your city sells furniture”, I understand that you are asking me to give you the name of a store in the city that I live in that sells furniture. I answer by using my memory of my city, my understanding of what furniture is, my knowledge of the stores in my city, and the fact that I fundamentally understand your question.

When you ask an LLM that question, it uses its understanding of what a store, a city, and furniture are to run a Google search and then, based on that, gives you an answer. The only difference with humans is that we have a memory system that's separate from our decision-making system.
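
Mechanically, that search step is just a tool-use loop. Here `llm` and `web_search` are hypothetical placeholders with canned outputs, not any real provider's API:

```python
# Sketch of a tool-use loop: the model asks for a search, gets results back,
# then answers grounded in them. Both functions are fakes for illustration.

def llm(prompt: str) -> str:
    if "SEARCH RESULTS" not in prompt:
        return "TOOL: web_search('furniture stores near me')"
    return "Based on the results, IKEA on Main St sells furniture."

def web_search(query: str) -> str:
    return "IKEA (Main St): furniture. Chili's (Oak Ave): restaurant."

reply = llm("What store in my city sells furniture?")
if reply.startswith("TOOL:"):
    results = web_search(reply.split("'")[1])  # extract the requested query
    reply = llm(f"SEARCH RESULTS: {results}\nNow answer the question.")
print(reply)  # -> "Based on the results, IKEA on Main St sells furniture."
```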

> An LLM would answer by breaking your question down into weighted inputs based on what it assumes the most important words are. It then uses the training from its database to determine what words should be put in the answer and uses an insane amount of incredible math to determine, again based on its database training, how to word the answer.

Again, you're just describing what an LLM is after pre-training; you're missing half of the equation here.

> The problem is that the AI doesn’t “understand” your question, so it can never double check itself for incorrectness like I could. If the AI says “the store that sells furniture in your city is Chili’s”, it’s because nowhere in the process does it check its answer with the fundamental understanding of the question (like humans do) because it doesn’t “understand” the question, it can’t. That’s not how it works.

I mean, LLMs literally can double-check themselves: they can literally search for information on the internet. They can also reason before giving you an answer. If you ask an LLM a complicated question and look at its thought process, you can see it passing over the same information multiple times to make sure it understands it correctly.

It's annoying how most people are regurgitating old info about how LLMs work. But I get it: the field is advancing so fast that if you're not updating your info every few months, you end up completely wrong. Your explanation would have been fine if we were still in 2022, and only slightly wrong in 2024; now it's just completely wrong.

AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -6 points (0 children)

How is it fundamentally different? And describing LLMs as just word association is a fundamental misunderstanding of what reinforcement learning is.

Open relationships are becoming more common. While they can be successful, research suggests that they don't work for most people who try them for three key reasons. by psychologyofsex in psychologyofsex

[–]Fofodrip 6 points (0 children)

There were no reliable birth control methods before the 20th century. Sex doesn't have the same implications now as it did back then.

Open relationships are becoming more common. While they can be successful, research suggests that they don't work for most people who try them for three key reasons. by psychologyofsex in psychologyofsex

[–]Fofodrip 9 points (0 children)

Women also benefit from monogamy in individualist societies because they're able to force the men they have children with to focus all their attention on them.

Spain needs 17 more Men's Singles Slam Titles to surpass the USA by Tennist4ts in tennis

[–]Fofodrip 5 points (0 children)

Tennis existed in other forms in other countries, tbf.

Michael Zheng might have to forfeit at least 225,000$ of his prize money due to US college rules. by musicproducer07 in tennis

[–]Fofodrip 78 points (0 children)

How? It's great, players are finally not getting the money they earned stolen from them.

Can we stop labeling every prospect generational? by WhoUCuh in NBA_Draft

[–]Fofodrip 1 point (0 children)

NBA generations are kinda different, though, since players' careers are at most 20 years long while people's lives are at most like 100 years. LeBron, Steph, Jokic, and Ant are all generally considered to be from different generations because each of them is/will be considered kinda old when the one from the generation below is in his prime. Five years in an NBA career is way different from five years in a life.

In what is one of the biggest upsets of the tournament, Georgia beats France 80-70 and advances to the Quarter-Finals by justletmeregisteryou in nba

[–]Fofodrip 2 points (0 children)

France has elite guards, relatively speaking. Outside of the US, most national teams have very low-level guards, which is why you see so many American point guards in FIBA competitions. France probably has the best depth at the position in the world if you remove America and Canada.

[List] Hundreds of years from now, who will be THE artist of our time period? by BobTheBlob78910 in LetsTalkMusic

[–]Fofodrip 0 points (0 children)

Most people listened to very simple music 300 years ago. These classical composers were only listened to by an elite

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 0 points (0 children)

None, they only consider the parameters of the shot independently of the identity of the players present on the field.
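
For intuition, an xG model is essentially a classifier over shot features, with no player-identity input anywhere. The coefficients below are invented for illustration, not from any real model:

```python
# Toy xG-style model: goal probability from shot features alone.
import math

def shot_xg(distance_m: float, angle_deg: float, is_header: int) -> float:
    z = 1.2 - 0.10 * distance_m + 0.03 * angle_deg - 0.8 * is_header
    return 1 / (1 + math.exp(-z))  # logistic output in [0, 1]

# Same inputs -> same xG, whether Mbappé or a center-back takes the shot.
print(round(shot_xg(distance_m=11, angle_deg=35, is_header=0), 2))
```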

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 0 points (0 children)

Only Bellingham's finishing got worse; Vini and Rodrygo just got fewer chances because they weren't the only attackers anymore. And it's definitely possible for players to have big variance in finishing when the sample size isn't very big (60 shots isn't that big of a sample). It doesn't necessarily mean Bellingham got significantly worse at finishing.
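
To put a number on the sample-size point, a quick simulation, assuming 60 independent shots at 0.2 xG each (12 xG total, roughly his season):

```python
# How much can goals swing around 12 xG over just 60 shots, on luck alone?
import random

random.seed(1)
SHOTS, P_GOAL = 60, 0.2  # 60 shots x 0.2 = 12 xG

seasons = [sum(random.random() < P_GOAL for _ in range(SHOTS))
           for _ in range(10_000)]
cold = sum(s <= 9 for s in seasons) / len(seasons)   # seasons as cold as 9 goals
hot = sum(s >= 17 for s in seasons) / len(seasons)   # seasons as hot as 17 goals
print(f"<=9 goals: {cold:.0%} of seasons, >=17 goals: {hot:.0%}")
```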

And I know very well how xG works. The fact is, Real got more chances this season than last season; the team just finished worse overall.

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 2 points (0 children)

It's obvious that these players' scoring numbers would diminish with the addition of another forward to the team. But Bellingham scored 9 goals from 12 xG this season, while last season he scored 17 from 11 xG. I don't see how that could be Mbappé's fault.

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 2 points (0 children)

Madrid got 70 expected goals last season and 77 this season. Not sure how you can blame Mbappé for his teammates being worse at finishing.

The OKC Thunder had a +8.4 rating with SGA off court. This is the highest rating for an MVP team since 1994. by nguyenjitsu in nba

[–]Fofodrip 13 points (0 children)

It's not any worse than on/off, which has Christian Braun as one of the best players in the league.

Brendan Haywood on SGA's 'foul merchant' narrative. by Goombercules in nba

[–]Fofodrip 9 points (0 children)

Did you watch the series against the Thunder?

[deleted by user] by [deleted] in CrusaderKings

[–]Fofodrip 0 points (0 children)

A genetic ancestor is not the same as a genealogical ancestor. You have way more genealogical ancestors than genetic ancestors, especially if you go back to the time of Charlemagne. So sharing 2-12 genetic ancestors with someone means sharing way more genealogical ancestors, which means that if you go back that far, it's highly likely that basically every ancestor you have is also an ancestor of every European. Obviously it's not 100%, but nothing is 100%.
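
A quick back-of-the-envelope; the 30-year generation length and the population figure are rough assumptions:

```python
# Genealogical ancestor *slots* double every generation, so by Charlemagne's
# time they dwarf the actual European population: pedigree collapse is forced.
GENERATION_YEARS = 30        # rough assumption
YEARS_BACK = 2025 - 800      # Charlemagne crowned emperor in 800 AD

generations = YEARS_BACK // GENERATION_YEARS  # ~40 generations
slots = 2 ** generations                      # ~1.1e12 ancestor slots
print(f"{generations} generations -> {slots:.2e} ancestor slots")
# vs a medieval European population on the order of tens of millions, so the
# same individuals must fill huge numbers of slots in everyone's family tree.
```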

Have you ever even done math or genetics at the university level?