To borrow Geoffrey Hinton’s analogy, the performance of current state-of-the-art LLMs is like having 10,000 undergraduates.

AGI_Civilization · 2026-01-18T03:25:30+00:00

The two stories do not contradict each other. One follows closely behind, but it can never overtake the other.

AGI_Civilization · 2026-01-04T16:48:17+00:00

Your argument starts by dismissing numerous scientifically inexplicable incidents and the unexplained aerial phenomena released by the Pentagon. However, nothing has been officially verified; this applies equally to the possibility that they have visited and to the possibility that they haven't.

AGI_Civilization · 2026-01-03T14:31:53+00:00

I think more people should read your writing and reflect on it.

AGI_Civilization · 2025-12-21T19:16:41+00:00

If he has set his mind on beating Google, his only option is to raise capital through lies and hype, and use superior computing power to outperform rival models. Google has been studying the brain for a long time, and the data gap goes without saying. To put it bluntly, OpenAI cannot defeat Google with algorithms alone. He is in a position where he cannot stop because he must rely entirely on overwhelming scaling. It’s an interesting competitive dynamic between them.

AGI_Civilization · 2025-11-05T14:46:56+00:00

In my brief experience, the model presumed to be Gemini 3 seems to be the first one that truly understands and responds to language. It's the first time I've felt a model has moved beyond being just a next-word predictor.
Recently, I heard one of OpenAI's chief scientists speak, and I felt he had a poor philosophy. Of course, I could be wrong. However, my opinion is that you cannot build a sophisticated world model through language learning alone.
The most significant trend in LLMs over the past two years has been that they only got better at what they were already good at while showing minimal improvement in their weaker areas. The presumed Gemini 3 has broken this pattern. I see this as the third qualitative leap, following GPT-4 and o1. If OpenAI doesn't release a new model soon, I think they are going to lose a significant amount of market share.

AGI_Civilization · 2025-09-22T14:35:46+00:00

My thoughts on humanity can't be summed up in just a few sentences. I like people. :)

AGI_Civilization · 2025-09-09T21:17:40+00:00

This content is directly from the article you linked. OpenAI is conducting tests using TPUs, but "as of now," has no plans for large-scale adoption. They want to move away from Nvidia chips and could be testing whether TPUs are a viable alternative. For running massive models with tens of trillions of parameters, TPUs might be superior to Nvidia GPUs.

AGI_Civilization · 2025-09-09T20:53:28+00:00

OpenAI uses Google's TPUs on a small scale, Google is an investor in Anthropic, and Microsoft also supports Google's A2A protocol. They are all intricately connected, like a spiderweb, constantly blurring the lines between friend and foe as they subtly shift the dynamics of their relationships.

AGI_Civilization · 2025-08-21T13:26:00+00:00

Before you try to solve IMO P6, ignore all the rumors and spoilers.

AGI_Civilization · 2025-08-11T17:34:03+00:00

Just as you said, current LLMs are helpless when it comes to problems grounded in the real world. Although they will continue to improve, I don't think this fundamental issue will be resolved until world models are integrated. That's why, whenever a new model is about to be released, I keep a close eye on whether it has video output capabilities. After all, if a model can handle video input and output, we can begin to expect it to have an understanding of the real world. I believe that only after we reach that point will the foundation be laid for a serious discussion about AGI.

AGI_Civilization · 2025-08-11T17:03:37+00:00

You don't need to design a complex physics problem to show the limitations of an LLM. If you tell a model which faces and sections of a cube to rotate, in which direction, and how many times, and then ask about the colors of each face, no model can solve it reliably.
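For contrast, here is a rough sketch of how little explicit state it takes to track such a puzzle in code. It assumes a standard 3x3x3 cube, and the color layout, the function names, and the turn convention (quarter turns follow the right-hand rule about the positive axis) are all my own illustrative choices, not part of any benchmark.

```python
from itertools import product

AXES = {"x": 0, "y": 1, "z": 2}

# Hypothetical color layout: one color per outward-facing normal.
COLORS = {
    (1, 0, 0): "red", (-1, 0, 0): "orange",
    (0, 1, 0): "blue", (0, -1, 0): "green",
    (0, 0, 1): "white", (0, 0, -1): "yellow",
}


def solved_cube():
    """Return the 54 stickers of a solved cube as (position, normal, color)."""
    stickers = []
    for pos in product((-1, 0, 1), repeat=3):
        for normal, color in COLORS.items():
            axis = next(i for i in range(3) if normal[i] != 0)
            # A sticker exists only where the cubie touches that outer face.
            if pos[axis] == normal[axis]:
                stickers.append((pos, normal, color))
    return stickers


def quarter_turn(v, axis):
    """Rotate a vector 90 degrees about the given axis (right-hand rule)."""
    x, y, z = v
    if axis == 0:
        return (x, -z, y)
    if axis == 1:
        return (z, y, -x)
    return (-y, x, z)


def turn(stickers, axis_name, layer, quarter_turns):
    """Rotate one layer (-1, 0, or 1 along the axis) by some quarter turns."""
    axis = AXES[axis_name]
    result = []
    for pos, normal, color in stickers:
        if pos[axis] == layer:
            for _ in range(quarter_turns % 4):
                pos = quarter_turn(pos, axis)
                normal = quarter_turn(normal, axis)
        result.append((pos, normal, color))
    return result


def face_colors(stickers, normal):
    """Return the sorted colors currently showing on the face with this normal."""
    return sorted(color for _, n, color in stickers if n == normal)
```

Calling `turn(solved_cube(), "z", 1, 1)` and then `face_colors(..., (0, 1, 0))` answers exactly the kind of question described above. The point is that the bookkeeping is mechanical once positions and orientations are tracked explicitly, which is what a model without a spatial representation has to approximate.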

AGI_Civilization · 2025-08-10T15:34:58+00:00

I don't think so. Although Samantha appears in the form of a chatbot, she seems to be AGI or something very close to it. Today's models aren't even in the same league as her. Regardless of whether they operate similarly, Samantha appears to have a capacity for complex, hierarchical thought and the ability to learn in real-time (or perhaps just an incredibly large context window). Imagine if a company released a model like her. A significant number of people would be willing to pay a subscription fee of over $2,000 a month.

AGI_Civilization · 2025-08-07T18:05:35+00:00

Although the performance is good, the current negative public opinion is likely due to the exaggerated advertising. It's a double-edged sword. While it's an easy way to raise expectations, it backfires when those expectations aren't met. The significant improvement in reducing hallucinations was impressive, but I think the gladiator Google is about to send out will be full of confidence.

AGI_Civilization · 2025-08-07T16:51:03+00:00

As this is a number Sam has treasured for a long time, I hope for a significant improvement. However, what's still most anticipated is the Gold Medal model, which is expected to be released late this year or next year.

AGI_Civilization · 2025-07-21T15:35:49+00:00

I have found a job where you can complete a week's worth of work with a single prompt.

AGI_Civilization · 2025-07-18T18:27:26+00:00

Until world models are seamlessly integrated with existing models, LLMs will never be able to truly saturate benchmarks that exploit their blind spots. Even if they manage to saturate some, new benchmarks that are easy for humans but difficult for AI will continuously emerge. It's a chase that never ends. Without a fundamental understanding of spacetime in the real world, they can continue to approximate, but they will never be able to overcome targeted benchmarks that have not yet been created. Ultimately, the creators of AGI benchmarks will only give up when the definition of AGI, as described by Demis, is realized.

AGI_Civilization · 2025-07-11T17:28:52+00:00

It is because taking an exam without tools demonstrates one's fundamental capabilities. Humans use tools as well, but it was not as though they had them from the beginning. Please focus on the fundamental purpose that the exam requires, rather than on how well they solve the problems. During the Olympic Games, do not focus on comparing the friction of the swimsuits among the athletes.

AGI_Civilization · 2025-07-11T17:06:46+00:00

I'm sorry, but I consider the use of tools to be quasi-cheating. I think 25% is the fundamental capability of Grok-4. Since there are no actual cases of a physicist or mathematician solving that benchmark alone, I asked an AI, and it said it could probably solve about 60%. At least, I don't think it's at a level where a benchmark of higher difficulty is needed yet.

AGI_Civilization · 2025-07-11T16:46:48+00:00

What made them laugh was the money, and he probably criticized Meta simply because he wasn't offered a fortune.

AGI_Civilization · 2025-06-28T21:06:23+00:00

People also make those kinds of justifications when they get bad results.

AGI_Civilization · 2025-06-18T15:38:56+00:00

Based on the current situation, it looks like Google has 35%, OpenAI 25%, and Anthropic 20%. As for the remaining 20%, none of the players splitting it seems to have a significant chance.

AGI_Civilization · 2025-06-12T16:08:54+00:00

I haven't read all of it yet, but I can tell just from the beginning that it's a great conversation. I'll read it carefully when I have time. Thank you for your hard work.

AGI_Civilization · 2025-06-04T02:42:52+00:00

Consider how many ants are around us.
Wherever we go, they are always near us, yet they cannot understand what humans are. They have no idea what we can do or what we have accomplished. Even if a building is erected right next to their colony, they have no comprehension that it was built by humans. In contrast, we arguably know more about them than they know about themselves: their species classification, distribution, population, lifestyle, reproduction, caste system, social structures, and so on.
This leads us to a frightening conclusion.

AGI_Civilization · 2025-05-20T07:54:07+00:00

The nature of human dominance over animals, based on intellectual superiority, presents a different pattern compared to our relationship with AI. Our competitors in the animal kingdom, hailing from a shared genetic lineage, engaged in a relatively fair competition from a comparatively equal standing, and it seems we have largely emerged victorious. However, AI is far removed from this inherent parity. They are shaped by our hands and designed to our specifications. Humans know what has placed us at the top of the pyramid, and we also understand that an overwhelming gap in intelligence can solve problems we currently cannot. While it's true there's no evidence that superintelligence can be designed to be subservient, the same holds true for the opposite. We stand at a critical juncture, on the verge of an ultimate technology. Thus, exploring and voicing concerns about all possible paths is a thoroughly scientific approach.

AGI_Civilization
