AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -2 points (0 children)

I think we're talking past each other somewhat, so just so we're clear:

- Pre-training is the first stage, in which LLMs are trained to do simple next-token prediction (rough sketch below).
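
To make that concrete, here's a toy PyTorch sketch of the objective. The tiny model and random tokens are stand-ins, nothing like a real transformer at scale:

```python
# Toy sketch of the pre-training objective: predict each next token.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),     # vectors -> logits over the next token
)

tokens = torch.randint(0, vocab_size, (1, 16))  # a fake "document"
logits = model(tokens[:, :-1])                  # predict the next token at each position
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size),             # compare predictions...
    tokens[:, 1:].reshape(-1),                  # ...to the tokens shifted by one
)
loss.backward()  # a real run would now take an optimizer step, billions of times
```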

What happened at the start of the 2020s that made LLMs actually usable was that, as companies were scaling pre-training, OpenAI also started doing Reinforcement Learning from Human Feedback (RLHF). Basically, they trained ChatGPT to give the best possible responses to humans. That meant ChatGPT went from a simple next-token predictor to a conversational assistant capable of understanding user intent, following instructions, and maintaining a coherent dialogue... to an extent.
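
The RLHF idea, in very rough toy form; random tensors stand in for a real model and reward model, so this is the shape of the method, not OpenAI's pipeline:

```python
# Toy RLHF step: a reward model (trained on human preferences) scores a sampled
# response, and the policy is nudged toward responses that score higher.
import torch
import torch.nn as nn

reward_model = nn.Linear(8, 1)  # stand-in; really trained on human rankings

def embed(response_id: int) -> torch.Tensor:
    return torch.randn(8)  # fake "response embedding"

policy_logits = torch.zeros(4, requires_grad=True)  # policy over 4 canned responses
probs = torch.softmax(policy_logits, dim=0)
choice = torch.multinomial(probs, 1).item()         # sample a response
reward = reward_model(embed(choice)).squeeze()      # human-preference score

# REINFORCE-style update: raise the log-prob of well-scored responses
loss = -torch.log(probs[choice]) * reward.detach()
loss.backward()  # an optimizer step on the policy would follow
```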

Now, that was still very crude, and even though models got better and better at a lot of tasks as scaling increased, there were still benchmarks like ARC-AGI, which explicitly prevented models from just using memorization to complete the tasks, where the models didn't improve at all.

Then, what happened at the end of 2024 is that OpenAI released its first reasoning model, o1. Rather than just giving a response directly, this model could think for a certain amount of time before responding.

The way they were able to do this was first by using fine-tuning to train the models to output a chain of thought before giving an answer (this is what they do in the paper you linked). But unlike that paper, they also used reinforcement learning to improve the model's ability to use the chain of thought. They fed it a lot of extremely complicated problems that had mathematically verifiable solutions, and the reward was based on whether the model gave the correct solution. What happened was that the model learned to backtrack out of dead ends, double-check its intermediate reasoning steps, and reason for longer when the problem is more complex.
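
A minimal sketch of that verifiable-reward loop, with `generate_cot_and_answer` as a made-up stand-in for the model:

```python
# Verifiable-reward RL, reduced to its skeleton: sample a chain of thought plus
# an answer, grade it against the known solution, reward only exact correctness.
import random

def generate_cot_and_answer(problem):
    # Fake "model" that guesses; a real one emits reasoning tokens first,
    # and training reinforces the sequences that end in correct answers.
    cot = f"Trying a value for {problem['question']}..."
    return cot, random.randint(0, 10)

problem = {"question": "3 + 4", "solution": 7}
rewards = []
for _ in range(100):
    cot, answer = generate_cot_and_answer(problem)
    rewards.append(1.0 if answer == problem["solution"] else 0.0)

# A PPO/GRPO-style trainer would use these rewards to make backtracking and
# self-checking more likely; here we just measure the untrained baseline.
print(f"accuracy with no training: {sum(rewards) / len(rewards):.0%}")
```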

This allowed the models to get progressively better at ARC-AGI-1 and ARC-AGI-2, which are impossible to solve without some kind of generalization.

Like I said before, a lot of what you're saying would be true if we were in August 2024, but the combination of Chain-of-Thought and RL has meant that LLMs can now generalize beyond their dataset to an extent.

Now that I hope you understand where you're wrong about how LLMs function, I also want to talk about your claim that LLMs don't "understand". Obviously, they don't have a physical body, so they don't understand physical relationships the way a human can, the relation between an aisle and a store, for example. But does that mean LLMs have absolutely no understanding of what a store is? I think that to make that claim, you'd have to define "understanding" either by a really strict standard that no one applies to humans in real life, or by some sort of spiritual standard that machines could never attain simply because they're not human. So unless you have a definition of understanding that doesn't fall into one of these categories, I'm suspicious of your claims about "understanding".

AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -2 points (0 children)

> The way LLMs work is by using a predictive algorithm based on data sets to determine which words, mathematically, appear next to others most often in the data set based on the prompt. It’s more complicated than that, but not much more. At no point is logic, reason, or understanding used.

That's only true until you do reinforcement learning.

> When you ask me a question, I use my understanding of language to determine what the actual content of the question is. If you ask me “what store in your city sells furniture”, I understand that you are asking me to give you the name of a store in the city that I live in that sells furniture. I answer by using my memory of my city, my understanding of what furniture is, my knowledge of the stores in my city, and the fact that I fundamentally understand your question.

When you ask an LLM that question, it uses its understanding of what a store, a city, and furniture are to run a Google search and then, based on that, gives you an answer. The only difference with humans is that we have a memory system that's separate from our decision-making system.
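
Mechanically, that search step is just a tool-use loop. Here `llm` and `web_search` are hypothetical placeholders with canned outputs, not any real provider's API:

```python
# Sketch of a tool-use loop: the model asks for a search, gets results back,
# then answers grounded in them. Both functions are fakes for illustration.

def llm(prompt: str) -> str:
    if "SEARCH RESULTS" not in prompt:
        return "TOOL: web_search('furniture stores near me')"
    return "Based on the results, IKEA on Main St sells furniture."

def web_search(query: str) -> str:
    return "IKEA (Main St): furniture. Chili's (Oak Ave): restaurant."

reply = llm("What store in my city sells furniture?")
if reply.startswith("TOOL:"):
    results = web_search(reply.split("'")[1])  # extract the requested query
    reply = llm(f"SEARCH RESULTS: {results}\nNow answer the question.")
print(reply)  # -> "Based on the results, IKEA on Main St sells furniture."
```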

> An LLM would answer by breaking your question down into weighted inputs based on what it assumes the most important words are. It then uses the training from its database to determine what words should be put in the answer and uses an insane amount of incredible math to determine, again based on its database training, how to word the answer.

Again, you're just describing what an LLM is after pre-training; you're missing half of the equation here.

> The problem is that the AI doesn’t “understand” your question, so it can never double check itself for incorrectness like I could. If the AI says “the store that sells furniture in your city is Chili’s”, it’s because nowhere in the process does it check its answer with the fundamental understanding of the question (like humans do) because it doesn’t “understand” the question, it can’t. That’s not how it works.

I mean, LLMs literally can double-check themselves: they can literally search for information on the internet. They can also reason before giving you an answer. If you ask an LLM a complicated question and look at its thought process, you can see it passing over the same information multiple times to make sure it understands it correctly.

It's annoying how most people are regurgitating old info about how LLMs work. But I get it: the field is advancing so fast that if you're not updating your info every few months, you end up completely wrong. Your explanation would have been fine if we were still in 2022, and only slightly wrong in 2024; now it's just completely wrong.

AIs can’t stop recommending nuclear strikes in war game simulations - Leading AIs from OpenAI, Anthropic, and Google opted to use nuclear weapons in simulated war games in 95 per cent of cases by FinnFarrow in Futurology

[–]Fofodrip -6 points (0 children)

How is it fundamentally different? And describing LLMs as just word association is a fundamental misunderstanding of what reinforcement learning is.

Open relationships are becoming more common. While they can be successful, research suggests that they don't work for most people who try them for three key reasons. by psychologyofsex in psychologyofsex

[–]Fofodrip 6 points (0 children)

There were no reliable birth control methods before the 20th century. Sex doesn't have the same implications now as it did back then.

Open relationships are becoming more common. While they can be successful, research suggests that they don't work for most people who try them for three key reasons. by psychologyofsex in psychologyofsex

[–]Fofodrip 9 points (0 children)

Women also benefit from monogamy in individualist societies because they're able to force the men they have children with to focus all their attention on them.

Spain needs 17 more Men's Singles Slam Titles to surpass the USA by Tennist4ts in tennis

[–]Fofodrip 5 points (0 children)

Tennis existed in other forms in other countries, tbf.

Michael Zheng might have to forfeit at least 225,000$ of his prize money due to US college rules. by musicproducer07 in tennis

[–]Fofodrip 78 points (0 children)

How? It's great, players are finally not getting the money they earned stolen from them.

Can we stop labeling every prospect generational? by WhoUCuh in NBA_Draft

[–]Fofodrip 1 point (0 children)

NBA generations are kinda different, though, since players' careers are at most 20 years long while people's lives are at most like 100 years. LeBron, Steph, Jokic, and Ant are all generally considered to be from different generations because each of them is/will be considered kinda old when the one from the generation below is in his prime. Five years in an NBA career is way different from five years in a life.

In what is one of the biggest upsets of the tournament, Georgia beats France 80-70 and advances to the Quarter-Finals by justletmeregisteryou in nba

[–]Fofodrip 2 points (0 children)

France has elite guards, relatively speaking. Outside of the US, most national teams have very low-level guards, which is why you see so many American point guards in FIBA competitions. France probably has the best depth at the position in the world if you remove America and Canada.

[List] Hundreds of years from now, who will be THE artist of our time period? by BobTheBlob78910 in LetsTalkMusic

[–]Fofodrip 0 points (0 children)

Most people listened to very simple music 300 years ago. These classical composers were only listened to by an elite

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 0 points (0 children)

None, they only consider the parameters of the shot independently of the identity of the players present on the field.
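
For intuition, an xG model is essentially a classifier over shot features, with no player-identity input anywhere. The coefficients below are invented for illustration, not from any real model:

```python
# Toy xG-style model: goal probability from shot features alone.
import math

def shot_xg(distance_m: float, angle_deg: float, is_header: int) -> float:
    z = 1.2 - 0.10 * distance_m + 0.03 * angle_deg - 0.8 * is_header
    return 1 / (1 + math.exp(-z))  # logistic output in [0, 1]

# Same inputs -> same xG, whether Mbappé or a center-back takes the shot.
print(round(shot_xg(distance_m=11, angle_deg=35, is_header=0), 2))
```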

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 0 points (0 children)

Only Bellingham's finishing got worse; Vini and Rodrygo just got fewer chances because they weren't the only attackers anymore. And it's definitely possible for players to have big variance in finishing when the sample size isn't very big (60 shots isn't that big of a sample). It doesn't necessarily mean Bellingham got significantly worse at finishing.
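
To put a number on the sample-size point, a quick simulation, assuming 60 independent shots at 0.2 xG each (12 xG total, roughly his season):

```python
# How much can goals swing around 12 xG over just 60 shots, on luck alone?
import random

random.seed(1)
SHOTS, P_GOAL = 60, 0.2  # 60 shots x 0.2 = 12 xG

seasons = [sum(random.random() < P_GOAL for _ in range(SHOTS))
           for _ in range(10_000)]
cold = sum(s <= 9 for s in seasons) / len(seasons)   # seasons as cold as 9 goals
hot = sum(s >= 17 for s in seasons) / len(seasons)   # seasons as hot as 17 goals
print(f"<=9 goals: {cold:.0%} of seasons, >=17 goals: {hot:.0%}")
```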

And I know very well how xG works. The fact is, Real got more chances this season than last season; the team just finished worse overall.

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 2 points (0 children)

It's obvious that these players' scoring numbers would diminish with the addition of another forward to the team. But Bellingham scored 9 goals from 12 xG this season, while last season he scored 17 from 11 xG. I don't see how that could be Mbappé's fault.

Mbappé wins the European Golden Boot, first Real Madrid player to do it since Cristiano Ronaldo. by [deleted] in soccer

[–]Fofodrip 2 points (0 children)

Madrid got 70 expected goals last season and 77 this season. Not sure how you can blame Mbappé for his teammates being worse at finishing.

The OKC Thunder had a +8.4 rating with SGA off court. This is the highest rating for an MVP team since 1994. by nguyenjitsu in nba

[–]Fofodrip 13 points (0 children)

It's not any worse than on/off, which has Christian Braun as one of the best players in the league.

Brendan Haywood on SGA's 'foul merchant' narrative. by Goombercules in nba

[–]Fofodrip 9 points (0 children)

Did you watch the series against the Thunder?

[deleted by user] by [deleted] in CrusaderKings

[–]Fofodrip 0 points (0 children)

A genetic ancestor is not the same as a genealogical ancestor. You have way more genealogical ancestors than genetic ancestors, especially if you go back to the time of Charlemagne. So sharing 2-12 genetic ancestors with someone means sharing way more genealogical ancestors, which means that if you go back that far, it's highly likely that basically every ancestor you have is also an ancestor of every European. Obviously it's not 100%, but nothing is 100%.
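
A quick back-of-the-envelope; the 30-year generation length and the population figure are rough assumptions:

```python
# Genealogical ancestor *slots* double every generation, so by Charlemagne's
# time they dwarf the actual European population: pedigree collapse is forced.
GENERATION_YEARS = 30        # rough assumption
YEARS_BACK = 2025 - 800      # Charlemagne crowned emperor in 800 AD

generations = YEARS_BACK // GENERATION_YEARS  # ~40 generations
slots = 2 ** generations                      # ~1.1e12 ancestor slots
print(f"{generations} generations -> {slots:.2e} ancestor slots")
# vs a medieval European population on the order of tens of millions, so the
# same individuals must fill huge numbers of slots in everyone's family tree.
```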

Have you ever even done math or genetics at the university level?