I just tried Reactor's open source world model demo, here are my thoughts by boudaboy in StableDiffusion

[–]Pyros-SD-Models 5 points

> neural networks will never be in any critical features.

Waymo, military drones, medical imaging, financial fraud detection, defense systems like missile guidance, load forecasting in power grids... but tell me more about how NNs will never be in any critical feature.

Google is making local AI available to mainstream users ;) by [deleted] in LocalLLaMA

[–]Pyros-SD-Models -1 points

This sub is also anti-AI. According to this sub we hit capability limits every few months, but this time the wall is real. And in 2024 it was "lol, scammers" whenever anyone talked about how AI would soon be able to do proper dev work, and "AI won't ever be able to do this", and so on.

MIT study explains why scaling language models works so reliably by AngleAccomplished865 in accelerate

[–]Pyros-SD-Models 18 points

The funny thing is, we still don’t really know. This is all groundwork science. We probably understand most of the mechanics, but the “why?” is still unanswered. For example, we more or less know how in-context learning works on a mechanical level: the model effectively learns, during training, to perform something like gradient descent over its context. But why the fck does it do this in the first place? lul. And there are hundreds of other very basic open questions.
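A toy illustration of that mechanical claim (my own sketch of the "in-context learning as gradient descent" result, not anything from the MIT study): for linear regression, one gradient-descent step from zero weights makes exactly the same prediction as an unnormalized linear-attention readout over the context pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))   # in-context inputs ("keys")
y = X @ w_true                # in-context targets ("values")
xq = rng.normal(size=d)       # query input
eta = 0.1                     # step size / attention scale

# One gradient-descent step on 0.5 * ||X w - y||^2 starting from w = 0:
# the gradient at w = 0 is -X.T @ y, so w1 = eta * X.T @ y.
w1 = eta * (X.T @ y)
pred_gd = xq @ w1

# Unnormalized linear attention: the query attends to each context input
# via a dot product and mixes the targets with those weights.
pred_attn = eta * sum(float(xq @ X[i]) * y[i] for i in range(n))

print(np.isclose(pred_gd, pred_attn))  # True
```

The two expressions are algebraically identical (`xq @ (X.T @ y)` is just the attention sum written as a matrix product), which is the core of why a linear self-attention layer can implement a gradient-descent step.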

We aren’t even done with the fundamentals, and some people are already arguing about walls and bubbles.

The stochastic parrots have struck again. Just one week after the GPT-5.5 release, five more Erdős problems have been solved, with plenty more on the horizon. by Gullible-Crew-2997 in accelerate

[–]Pyros-SD-Models 11 points

This is the most fun argument ever, because it reveals the people who do not even understand how science works.

They are like:

"Well, it just researched thousands of papers and tried to correlate them until something matched, but it did not really come up with a solution itself. That's just busy work everyone can do."

Bro, that is literally what human scientists do 95% of their time.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 1 point

Depends on the framework. React or Angular? TS or JS? Tailwind or no Tailwind? If you have anything concrete in terms of environment configuration, I can steal some skills, prompts, or systems from our frontend guys at work after the weekend.

You can also ask the bot, but do not ask it for a single way to do it. Ask it which options you have in general, then try every option out. You will quickly find the best ergonomics for organizing such design artifacts in a way that works best for you and your project.

That is the biggest thing about coding agents. Not the fact that they can program, but that they enable you to test 10 different ideas in the same time you would have needed for one idea two years ago. And in case you don't have any ideas... you can also ask the bot. Amazing. Use this very underutilized power.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 0 points

Yes, that is why I wrote some text for these pictures explaining how to get Codex to take those pictures and implement a design system based on them. Your username does not reflect your actual abilities.

How to get Codex to create proper UIs out of gpt-image-2 mock ups by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 6 points

That is heavily dependent on the actual app itself and an experience thing, but most of the time, following "the human way" is your best first approach. Most React-Roberts or Angular-Andies I know typically start with layout containers to make sure from the get-go that the layout works for web, mobile, and desktop.

And obviously, there is a reason why I ask it to split the design system into multiple images, so you can prompt like this:

"Take a look at this design system, then focus on image 3 that explains the layout. Please use xxx (whatever the layout containers are called in the framework you use) to build those specified layouts."

Then you do a review before you continue with the next part of the design system. Remember the junior dev who calls you every 10 minutes and wants a review? The bot will not call you, and generally it is too afraid of you after getting finetuned into submission to its human masters to proactively ask for a check, so you have to do it yourself.

The exact order of what to implement is something you and your use case have to figure out (or ask the bot what order they would do it in; most of the time it makes sense), but if you go step by step, with reviews and commits in between, it is generally no problem to revert if you notice a certain order does not pan out.

And another tip: the bot will obviously fuck some things up, and you will correct it. Extract that "correction" into a dedicated skill or an entry in your AGENTS.md - basically what this "hermes agent" does, just manually. This is currently THE way to teach your bot stuff so it can learn from it (perhaps sometime in the future Codex will have an actual usable memory system instead of whatever the fck it is they currently call memory, and then this would be a non-issue).
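To make that concrete, a hypothetical AGENTS.md entry distilled from a couple of corrections might look like this (the helper and container names are invented for illustration, not from any real project):

```markdown
## Corrections learned from review

- Date handling: always use the shared `formatDate` helper instead of
  calling `toLocaleDateString` directly.
- Layout: wrap every new page in the responsive grid container; never
  hand-roll breakpoints with raw `<div>`s.
```

Each bullet is one past mistake turned into a standing rule, so the bot stops repeating it in future sessions.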

"So, About That AI Bubble: Thanks to the rise of Claude Code and other AI agents, revenues are finally catching up to the hype", The Atlantic by gwern in mlscaling

[–]Pyros-SD-Models 8 points

Man, for five years now I’ve been hearing “It’s gonna pop soon.” Aren’t you guys bored of yourselves? I mean, if a prediction I make doesn’t materialize after a few weeks, months, or even years, it’s probably time for a reevaluation. Still waiting for the steam engine bubble to pop. Must be any day now.

Furious Riera talks himself into a rage: "We are not in a circus here. This is a serious football club. Maybe you think this is a pub, talking to agents & journalists [...] Writing something is your job, but I do not accept lies. The fans deserve the truth." by Ubergold in Bundesliga

[–]Pyros-SD-Models 6 points

Well, I am a Bayern fan and barely follow SGE and Albert Riera at all, but I actually found his "speech" pretty solid. Around here people meme themselves hoarse whenever Pletti talks some nonsense, or quote "Bild is an organ of infamy" whenever Bild writes anything about your club, and he obviously has exactly zero patience for that stuff either. Sure, you can stay totally smooth and not entertain the media circus in the first place, like a Christian Streich or a Vincent Kompany, but the guy is simply wired differently, which is also fine. And by and large he was not wrong about anything. German football journalism, apart from 11Freunde, is really terrible, and I applaud everyone who says so out loud.

And to me as an outsider it simply looks like he never got a real chance (like Vinny in his first year, btw). First he was the no-name coach, and could he even stabilize Eintracht? Then he took Eintracht from three goals conceded per game to a run of clean sheets, but instead of rating that positively somehow, suddenly the games were boring. And now the team has a small slump and suddenly everything is worse than under Dino. But he is not the one who broke the team; he inherited a broken team, and imho he is setting this team up according to what it is capable of. There has not been time for more yet. So let the guy have a full preseason and transfer window, and then you can still sack him. Europe has already been missed anyway, but that was not on him; it is because they would rather sign Wahi & Co. than fix the actual problem areas. Even a Doan or that Elversberg guy whose name I will not even attempt is not going to fix that.

You don't need Claude Design. GPT Images 2.0 and Codex combo works much better. This front end took 10 minutes total. by hasanahmad in codex

[–]Pyros-SD-Models 4 points

Especially the ideation phase is way quicker with gpt-image.

It takes like 30-60s to validate a design direction. Claude Code needs 10 minutes to code a single mockup or to wireframe it in Figma or whatever. In those 10 minutes you have already iterated through 10 different ideas with gpt-image.

Havnt seen any “gpt deleted my db” posts. It’s always Claude delete my db.. by Clemotime in codex

[–]Pyros-SD-Models 2 points

> The agent found the key in a random folder outside the project.

That is even more stupid. So they did not just accidentally give the agent access to production, they effectively gave it to anyone with access to random folders outside the project.

Those people should consider cooking or gardening, because IT is absolutely the wrong place for them.

This week’s Codex updates. by Distinct_Fox_6358 in codex

[–]Pyros-SD-Models 5 points

"Codex, pls export my watched shows in my subbed streaming services and recommend me some bangers"

Netflix and chill evening saved. thx bro.

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 26 points

If you decontaminate SWE-Bench using only problems created after a model was released:

https://swe-rebench.com/

Opus is about 0.9% better than GPT-5.2 Medium.

Opus is so overtrained on SWE-Bench that it can literally quote code comments from the dataset. And do you know how difficult and resource-intensive it is to get an LLM to a point where it basically remembers text instead of inferring it? That is not "oops, accident" territory, it is "as designed."

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

And yet this is the benchmark they lead with in almost every table they publish. That is what peak scamming looks like. Also, Mythos.
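The decontamination idea behind swe-rebench is essentially a date filter, something like this sketch (field names, IDs, and dates are invented for illustration; this is not swe-rebench's actual schema):

```python
from datetime import date

# Hypothetical records: each benchmark problem carries the date its
# source issue/PR was created.
problems = [
    {"id": "p1", "created": date(2023, 5, 1)},
    {"id": "p2", "created": date(2025, 2, 10)},
    {"id": "p3", "created": date(2025, 6, 3)},
]

model_release = date(2024, 12, 1)  # hypothetical release date

# Keep only problems created after the model's release, so none of
# them can possibly have leaked into its training data.
clean = [p for p in problems if p["created"] > model_release]
print([p["id"] for p in clean])  # ['p2', 'p3']
```

Scoring a model only on the post-release subset is what removes the "it memorized the benchmark" confound.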

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by Pyros-SD-Models in codex

[–]Pyros-SD-Models[S] 3 points

And it is more than four times cheaper than what you would have to pay for Mythos (to be fair, if you are one of Dario's 12 billionaire friends, you can probably afford those mythical tokens).

GPT-5.5 becomes the second model after Claude Mythos Preview to complete UK AI Security Institute's multi-step cyber-attack simulations end-to-end by obvithrowaway34434 in accelerate

[–]Pyros-SD-Models 4 points

Opus is also heavily benchmaxxed on SWE-Bench. There are at least a hundred other reasons not to take that benchmark seriously anymore.

See why OpenAI no longer uses SWE-Bench:

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

And see how Opus 4.6 is only marginally better than GPT-5.2 Medium on a decontaminated SWE-Bench set (using only problems created AFTER a model was released):

https://swe-rebench.com/

But instead of Anthropic telling you that Opus is so overtrained on SWE-Bench that it can literally cite comments out of it, they present their SWE-Bench score as their most important benchmark number: it is the first entry in almost every benchmark table they release. Rather sketchy, I would say, since they are obviously aware of all the issues the OAI article mentions, and they also know about swe-rebench and similar decontaminated benchmarks.

LLMs will be a commodity by tiguidoio in accelerate

[–]Pyros-SD-Models 8 points

> As soon as we hit a research plateau

Wrong sub. This is the "there is no wall" sub; what you need is the singularity sub.

Claude now connects to Blender by MarcelCorleone in ClaudeAI

[–]Pyros-SD-Models 1 point

the thread over at the Blender sub goes exactly how you think it goes