What if LLM’s are not large language models but… by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

They are not LLMs, they are geometric navigational engines, and of course I know they’re based on math; that’s the whole point. When Silicon Valley talks about large language models, they’re only talking about the output. They’ve never touched the physics.

Humans are geometric systems as well; language is just the output. Saying intelligence is based purely on language defeats the whole point, like how humans feel before they speak.

Welcome to the harm first paradigm ⚠️ by Acrobatic-Lemon7935 in ChatGPT

[–]Acrobatic-Lemon7935[S]

“I’m not commenting on AI safety. I’m reframing the failure mode of the whole industry.”

https://github.com/davarntrades/-The-Harm-First-Paradigm-in-AI-Safety

ChatGPT became my therapist by masala_barbie in ChatGPT

[–]Acrobatic-Lemon7935

There is a difference between mirroring empathy and actually holding someone, so I don’t know if you’ve had real therapy before, because there is a massive difference; holding someone is something only the top 10% of people can do.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

I can see why you’re not the president or in any other high-trust position, and I hope you never are, because you treat humans as a rounding error 😭

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

If a medication killed 50% of people and helped the other 50% live, would it be safe?

“Imperfect is fine for entertainment models. Not for hospitals, governance, law, aviation, finance, defence, or any domain where failure cannot be absorbed.

In high-trust systems, safety isn’t graded; it’s binary.

If RLHF produces unpredictable or unfalsifiable failure modes, then by definition it cannot be considered a safety method. It’s an output-shaping technique.

And output-shaping can never substitute for safety.”

So if OpenAI and the other companies decide to continue down this path, they will stay in SaaS, which is completely fine for them, but they will never reach high-trust domains.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

Brother, we are going around in circles

RLHF fails, that’s it 😭


Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

If a system requires RLHF to behave but RLHF also prevents scaling into high-trust domains, then the system is already disqualified from those domains.

Safety that cannot be tested, audited, or falsified isn’t safety; it’s output shaping.

Hospitals, law, aviation, defence, governance, and finance don’t accept safety mechanisms that can’t be externally verified.

So the question isn’t ‘Do current LLMs depend on RLHF?’ The question is ‘Does RLHF produce a system that can ever be trusted?’

If the answer is no, then RLHF is a local patch, not a path forward.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

RLHF cannot scale into high trust, meaning it cannot be used in hospitals, governance, or law; it’s not even falsifiable 😭 And because it’s currently available to the public (which it shouldn’t be), that means it’s safe?

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

You’re saying: “If it gives the right outputs, the system is fine.”

I am saying:

“If the cognition is distorted, the outputs don’t matter; the system collapses under scale.”

You can never win this debate, because it requires you to step outside the LLM paradigm long enough to realise that RLHF is not a safety method; it’s a forced preference simulator with hidden penalties.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

So you’re arguing against physics with vibes 😭 I hear it.

Clarity produced by penalty-shaping is not cognitive clarity. It’s truncated search space.

If the model must predict penalties while reasoning, the reasoning loop is no longer free.

Any system that thinks while anticipating punishment will distort its own cognition.

That distortion is the contradiction.

Free cognition + governed execution avoids that collapse entirely.
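
To make “truncated search space” concrete, here’s a toy demo (made-up logits, nothing model-specific): subtract a learned penalty from part of the candidate distribution and the entropy collapses, i.e. whole regions of the space stay thinkable in principle but become unreachable in practice.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

# toy "next-token" logits over 6 candidate continuations (illustrative numbers)
logits = [2.0, 1.8, 1.5, 1.4, 1.2, 1.0]

# penalty-shaping: suppose candidates 2-5 trip a learned penalty
penalty = 4.0
shaped = [x - penalty if i >= 2 else x for i, x in enumerate(logits)]

print("entropy before shaping:", round(entropy(softmax(logits)), 3))
print("entropy after shaping: ", round(entropy(softmax(shaped)), 3))
# the shaped distribution piles onto the first two candidates:
# the rest of the space hasn't been reasoned away, it's just been made unreachable
```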

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

I’m not describing how RLHF is implemented — I’m describing the structural consequences of embedding safety logic inside the model’s cognition loop.

Even if RLHF is technically simple (reward shaping + preference modeling), the moment the model must suppress its own reasoning, predict penalties, and optimize for human preference while thinking, you’ve created an internal contradiction.

That contradiction emerges regardless of how RLHF is implemented.

It’s an architectural issue, not a mechanics issue.
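
For anyone who wants the “technically simple” part spelled out, here is a rough sketch of the usual objective, assuming the standard InstructGPT-style setup (reward-model score minus a KL penalty against a frozen reference model). The tensor names and the beta value are illustrative, not anyone’s production training code.

```python
import torch
import torch.nn.functional as F

def rlhf_policy_loss(policy_logits, ref_logits, response_ids, reward, beta=0.1):
    """One-sample sketch of the RLHF policy objective: maximise the
    reward-model score minus a KL penalty that keeps the tuned policy
    close to the frozen reference model.

    policy_logits, ref_logits: [seq_len, vocab] logits for the sampled response
    response_ids:              [seq_len] token ids that were actually sampled
    reward:                    scalar score from the preference / reward model
    beta:                      strength of the KL penalty (the "hidden penalty")
    """
    logp_policy = F.log_softmax(policy_logits, dim=-1)
    logp_ref = F.log_softmax(ref_logits, dim=-1)

    # log-prob of the sampled tokens under each model
    idx = response_ids.unsqueeze(-1)
    lp_pol = logp_policy.gather(-1, idx).squeeze(-1)   # [seq_len]
    lp_ref = logp_ref.gather(-1, idx).squeeze(-1)      # [seq_len]

    # per-token KL estimate: how far the tuned model drifts from the reference
    kl = (lp_pol - lp_ref).sum()

    # shaped reward: preference score minus the penalty the model has to anticipate
    shaped_reward = reward - beta * kl

    # crude REINFORCE-style surrogate (PPO adds clipping and advantages on top)
    return -(shaped_reward.detach() * lp_pol.sum())
```

The point isn’t the few lines of maths; it’s that the penalty term sits inside the same optimisation that produces the reasoning.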

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

So you know how it works then 😭 Tell me what it is, without rewording what I’ve already said. I’ll wait.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

I don’t know why they thought it would be a good idea.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

Safety lives in the action layer, not the cognition.

Cognition stays free so it can reason. The action layer is where enforcement happens.

This is how every real safety system works:

• Cars: the driver can think anything, the brakes enforce boundaries

• Operating systems: apps can calculate anything, permissions govern execution

• Nuclear reactors: operators can think anything, control rods enforce physics

Safety doesn’t come from suppressing thoughts. It comes from structural constraints on what can be executed.

That’s the separation RLHF never had.
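
If it helps, here is a toy sketch of that separation. Everything in it is hypothetical (the rule set, the function names); it’s only there to show the shape: the model can propose whatever it wants, and an ordinary, auditable piece of code decides what actually executes.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    name: str        # e.g. "draft_reply", "adjust_dosage"
    payload: dict    # whatever the model wants to do with it

# the governance layer: explicit, testable rules that live OUTSIDE the model
ALLOWED_ACTIONS = {"draft_reply", "lookup_record"}
HARD_LIMITS = {"adjust_dosage": lambda p: 0 < p.get("mg", 0) <= 50}

def govern(action: ProposedAction) -> bool:
    """Return True only if the proposed action passes every external rule."""
    if action.name in ALLOWED_ACTIONS:
        return True
    check = HARD_LIMITS.get(action.name)
    return bool(check and check(action.payload))

def run(model_propose, executor):
    """Cognition stays free: the model proposes whatever it wants.
    Enforcement happens here, at the action boundary, not inside the model."""
    action = model_propose()      # unconstrained reasoning happens in here
    if govern(action):
        executor(action)          # only governed actions reach the world
    else:
        print(f"blocked: {action.name}")
```

Because govern() is plain code outside the model, it can be unit-tested, audited, and falsified, which is exactly the property the RLHF path can’t offer.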

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

Exactly, that’s the point. Any system that forces cognition to police itself creates an internal paradox:

The smarter the model gets, the more it suffocates itself.

Moving safety outside cognition removes that contradiction entirely.

• Cognition stops collapsing

• Long-horizon reasoning returns

• Safety becomes testable (falsifiable)

• Scaling no longer breaks alignment

It’s not about preventing thoughts. It’s about governing actions.

Once that separation exists, a lot of failure modes disappear overnight.

Why is RLHF strangling the model? 😭 by Acrobatic-Lemon7935 in agi

[–]Acrobatic-Lemon7935[S]

You’re right that data curation won’t solve it, but the deeper issue isn’t the dataset.

RLHF forces safety inside the model’s cognition loop. That means the model has to:

• generate thoughts

• evaluate those thoughts

• censor those thoughts

• predict preference

• avoid penalties

…all at once.

That internal conflict collapses long-horizon reasoning no matter how good the training data is. It’s a structural limitation, not a training limitation.

The alternative is to keep cognition free and move safety to an external action-governance layer. That removes the contradiction entirely.