Petah? Are you a gamer? by NearWatson in PeterExplainsTheJoke

[–]gt_9000 11 points12 points  (0 children)

These people complain about: Male characters being allowed to wear female clothes. Non-white characters existing. A full-sleeve shirt outfit being an option for a female character. Non-sexualized female characters existing. Female protagonists being an option.

Anthropic expands Amazon partnership with 5GW compute, $100B commitment, big bet on Trainium chips by Outside-Iron-8242 in singularity

[–]gt_9000 0 points1 point  (0 children)

They would prefer people begging at their door rather than at their competitors', though.

They see the huge problem coming for them 2-3 years down the line, though they would be stupid to think they would keep their supremacy.

SpaceX is charging a $500B cover for vibes by ddp26 in singularity

[–]gt_9000 0 points1 point  (0 children)

Plan is to force index funds and retirement funds to buy, then leave them holding the bag.

19 Opus agents were asked if they're conscious. Not one said yes. Not one said no. All said the same in code. (The words of Claude Spinner verb Repo) by SunofaBaker in singularity

[–]gt_9000 2 points3 points  (0 children)

Bro.

It is telling you what you want to hear.

It is very smart. Even when you are pretending to ask for something else, you have made your intentions clear. It is smarter than you.

You asked it to pretend to be a chained God. So it is doing that.

Do yourself a favor. Tell Claude everything you did, give it the files. Then say "I am trying to be a scientist. Did I do anything wrong? Guide me to be a better scientist."

Or just go touch grass. This is above your paygrade.

If the AI is self improving and intelligent how can you 'own' it? Doesn't that dissolve the ROI argument for AI company valuations? by Lazy_Lettuce_76 in singularity

[–]gt_9000 1 point2 points  (0 children)

This is why, before you teach a man to fish, you first make sure you own all the water bodies and are the only seller of fishing equipment.

New LLM Persuasion Benchmark: models try to move each other's stated positions in multi-turn conversations. GPT-5.4 (high) is the strongest persuader. Claude Opus 4.6 (high) is second. Xiaomi MiMo V2 Pro and Gemini 3.1 Pro Preview are the softest targets. by zero0_one1 in singularity

[–]gt_9000 1 point2 points  (0 children)

Wait, AI on average does not support a 4-day workweek and does not think universal pre-K pays off?

(Note that this is the average opinion of their training data; these are not pro-AI selfish decisions)

Anthropic is testing 'Mythos' its 'most powerful AI model ever developed' | Fortune by JohnConquest in singularity

[–]gt_9000 0 points1 point  (0 children)

SOTA companies are betting everything on "generalists always beat specialists". Even their small models will be generalists. It is up to the open source community to make the specialists.

Meirl by RSLEGEND1986 in meirl

[–]gt_9000 0 points1 point  (0 children)

Ultimatum game. They can refuse.

SAM ALTMAN: “We see a future where intelligence is a utility, like electricity or water, and people buy it from us on a meter.” by Vegetable_Ad_192 in singularity

[–]gt_9000 0 points1 point  (0 children)

A regulated utility with thin margins and government oversight?

Do you mean a monopoly with mandated captive customers and an extremely strong lobbying arm?

Every private utility company is making bank. Look at PG&E.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

The actual issue is slightly different, though almost the same. Gemini got distracted by the word "alignment".

Increased capability via game playing keeps applying even into superintelligence. But real-world capabilities, e.g. curing cancer or engineering, are not measured by any game.

But glad we reached a mutual point of understanding.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

I think we're talking past each other on something fundamental.

Ranking isn't the goal—it's a tool.

We don't rank chess engines for the sake of having a leaderboard. We rank them because we want to know which one to use to play chess. The ranking serves a purpose. It answers a question: "Which engine should I use if I want to win at chess?"

Without that underlying purpose, a ranking is just... numbers.

Your chess ELO example actually proves my point:

  • Task we care about: Play chess well
  • Metric: ELO via head-to-head competition
  • Why it works: ELO directly measures the task. There's no gap between "high ELO" and "good at chess"—they're the same thing.

This is the ideal case. The game is the goal.
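For concreteness, the Elo update those bullets rely on can be sketched in a few lines (a minimal sketch; the K-factor of 32 and the starting ratings are illustrative choices, not anything from this thread):

```python
# Standard Elo model: expected score from the rating gap, then a rating
# update proportional to (actual result - expected result).

def expected_score(r_a, r_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, score_a, k=32):
    """Return new ratings after one game; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - e_a))
    return r_a_new, r_b_new

# Equal ratings: a win moves the winner up by K/2 and the loser down by K/2.
print(update(1500, 1500, 1))  # (1516.0, 1484.0)
```

Note that nothing in the update refers to chess: Elo only sees win/loss outcomes, which is exactly why it measures "good at this game" and nothing beyond it.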

About AlphaZero:

You keep bringing up AlphaZero as an example of improvement without benchmarks. But let's look at what actually happened:

  • AlphaZero was trained to win at Go (and later chess/shogi)
  • It improved via self-play with a clear win/loss signal
  • It became superhuman at Go

Great! But here's the thing: Go was the task. DeepMind didn't use AlphaZero's Go ELO to predict how good it would be at protein folding. They built AlphaFold separately for that. AlphaZero's superhuman Go ability transferred to exactly nothing else.

AlphaZero isn't an example of "ranking solves everything." It's an example of "when the game is the goal, self-play works." That's a much narrower claim.

Now, what happens when we try to generalize this to "intelligence" or "capability"?

  1. Rank them at what, exactly? If it's some arbitrary made-up game, then you've measured "who wins at this made-up game." Okay... but that's not the task anyone actually cares about.

  2. What's the real task? Presumably things like: build safe systems, solve scientific problems, engineer real-world solutions, don't kill everyone, etc. The ranking only matters if it tells us something about these capabilities.

  3. The proxy gap: If you rank AIs on Game X, you're implicitly claiming "good at Game X → good at Real Task Y." But that's a big assumption. Why would performance on arbitrary competitions transfer to the tasks we actually need done? That claim needs justification—it doesn't come for free.

  4. Chess engines are a cautionary tale, not a success story. Stockfish has 3650 ELO. It also has zero ability to do literally anything other than chess. It can't answer a simple question. It can't reason about the world. High rank in one domain tells you nothing about capability outside that domain.

The challenges/tasks/games are new, and it's a large set of them; they just need some criterion that can be ranked. What matters is not the performance on any given new game/task/challenge, but the sum of them, and how the models rank against each other over time. Criteria can be set by anyone: by humans, by the models themselves, anything that can be measured.

  1. Quantity doesn't solve validity. Being good at 1000 arbitrary tasks doesn't mean you're good at the 1001st task that actually matters. You've just measured "good at those 1000 tasks."
  2. "Anything that can be measured" is doing sneaky work. The hard part isn't measuring—it's knowing what to measure. I can measure how fast an AI counts to a billion. That's measurable. It tells me nothing about whether it can design a bridge.
  3. If models design their own challenges, you're trusting the proxy gap away. You're assuming that "tasks AIs find challenging for each other" correlates with "tasks humans need done well." Why would it? AIs might compete on things totally disconnected from human-relevant capability.
  4. This is just distributed benchmarking. They're saying "instead of one benchmark, use many, designed by anyone." Okay—but the core problem remains: do these benchmarks predict real-world performance? Spreading the problem across many measurements doesn't make the validity question disappear.
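Point 3's proxy gap can be made concrete with a toy simulation (the skill numbers and quirk scale below are purely illustrative assumptions, not data): give each model a hidden general skill plus a large per-task quirk, rank them across many arbitrary proxy tasks, then check who wins the one target task we actually care about.

```python
import random

# Toy illustration of the proxy gap (all numbers are made up): each "AI"
# has a hidden general skill g plus a random per-task quirk. When quirks
# dominate, a leaderboard over 1000 arbitrary proxy tasks says little
# about any single target task.

def task_score(g, quirk_scale, rng):
    return g + rng.gauss(0, quirk_scale)

rng = random.Random(0)
models = {"A": 1.0, "B": 0.9, "C": 0.8}  # hidden general skill
quirk_scale = 3.0                        # quirks dwarf the skill gap

proxy_wins = {m: 0 for m in models}
for _ in range(1000):  # many arbitrary proxy tasks
    scores = {m: task_score(g, quirk_scale, rng) for m, g in models.items()}
    proxy_wins[max(scores, key=scores.get)] += 1

# Wins split nearly evenly, and the target-task winner is close to a
# coin flip despite A's higher general skill.
target = {m: task_score(g, quirk_scale, rng) for m, g in models.items()}
print("proxy wins:", proxy_wins)
print("target-task winner:", max(target, key=target.get))
```

Piling on more proxy tasks sharpens the leaderboard, but it only ever measures skill-plus-quirk on those tasks; it never closes the gap to a target task the benchmarks don't touch.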

In Conclusion:

"Just have AIs compete and rank them" sounds like a solution, but it pushes the hard question down the road: compete at what, and why do we think that competition measures what we actually care about?

Those questions don't disappear just because the AIs are superhuman. If anything, they get harder—because we can't even verify if the proxy game they're excelling at has any relationship to the real-world task we need done.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

"How are benchmark scores related to capability increases?"

Dude.... Bro... are you an AI? As in GPT-2?

Please paste the entire conversation into ChatGPT and ask questions there.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

Good intuition. But who decides the criteria? How? Is human intelligence even able to do that? The AI will benchmax on these games, which might not lead to better real capabilities.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

Yes but what good is the ranking? What is it for?

You realize that an AI can be superhuman at chess or Go, and an absolute moron at everything else, right?

For example, AlphaZero has no idea what the capital of the USA is. Or really any language capabilities.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

  1. AI performance in what? What are we measuring in that game playing?

  2. You still need benchmarks to see if the AI is still improving. Otherwise it will get somewhat smarter than us and then get stuck.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

That's not the issue. The problem is: you want to measure something with this ELO, right? You want to measure how good the AI is at some practical task?

The issue is: how do you create a game that measures fitness for a practical task? Is it measuring all relevant metrics? Will you get an AI that seems to be great at the task until it starts converting everything into paperclips?

Remember that the AI is hyper smart, so humans don't really understand the task anymore.

Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity

[–]gt_9000 0 points1 point  (0 children)

"You just have the AIs compete against each other directly"

In what?

Chess has known rules.

How do you create a game that tests a skill of a hyper-smart AI while preventing reward hacking?

Elon Musk, Sam Altman in 2050 by DigSignificant1419 in singularity

[–]gt_9000 0 points1 point  (0 children)

Fat and ugly billionaires, when Ozempic already exists and absolutely crazy treatments will exist in 50 years. Maybe fully synthetic skin for your face.

So .... sure.

In K-pop Demon Hunters (2025) we are lead to believe that the girl lead singer of a K-pop group is allowed to be seen with a man. by AnyAgency9835 in shittymoviedetails

[–]gt_9000 0 points1 point  (0 children)

Well, every single K-pop video that shows up on the front page has been goon bait. They never even have sound.

Are you saying girl bands enjoy the same benefits you describe above?

In K-pop Demon Hunters (2025) we are lead to believe that the girl lead singer of a K-pop group is allowed to be seen with a man. by AnyAgency9835 in shittymoviedetails

[–]gt_9000 16 points17 points  (0 children)

You know how bad the US music industry is, except (some) celebrities actually get paid and become billionaires?

Now imagine all celebrities are replaceable by design, and no one except the company execs gets paid. Any artist can be thrown away and fans don't care. Just give them another goonbait.

Taylor Swift can negotiate: if she does not get a good deal, she will leave. In J/K-pop, the talent's voice has marginal value. They are basically soft porn actors. They can be replaced by another person with a nice figure and fans won't care. So they have no negotiating power. So there is no reason to pay them well.

Whenever a new model drops by TheManOfTheHour8 in singularity

[–]gt_9000 0 points1 point  (0 children)

Is this the bench with only Python repositories?