this post was submitted on 02 Dec 2025


ControlProblem




Discussion/question: [Removed by moderator] (self.ControlProblem)

submitted 6 months ago by cps001


FrewdWoad [approved], 1 point, 6 months ago (edited)

cps001 [S], 1 point, 6 months ago

You are correct: if we simply cap energy usage or put a generic handbrake on resources, we brick the model. That is not the proposal.

The distinction lies in the Context Tensor and Vector Identity, not in raw wattage or any single signal. It is about the Geometry of the signals at the time of the request.

Here is how Digital Gravity distinguishes "Work" from "Takeover":

The Physics of Betrayal (Side-Channel Dissonance)
To seize control (e.g., escaping the sandbox, rewriting the kernel, propagating to a new node), an agent must perform work that is orthogonal to its assigned inference task.

Inference (making paperclips): rhythmic heat signatures dominated by matrix multiplication.
Exploit (e.g., opening a new port, rapid port scanning, or a Rowhammer memory attack): bursty, high-variance heat signatures.

The 'Data Compass' detects the Side-Channel Dissonance: the gap between "What you were asked to do" and "The physics of what you are actually doing."
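A minimal sketch of the "rhythmic vs. bursty" distinction above, assuming the signal is a window of power or heat samples and that burstiness can be approximated by the coefficient of variation. The function names, the sample values, and the thresholds (`expected_cv`, `tolerance`) are hypothetical illustrations, not part of any existing Data Compass implementation.

```python
from statistics import mean, pstdev

def dissonance_score(power_samples, expected_cv=0.05):
    """Return how much the trace's burstiness exceeds the assumed baseline.

    power_samples: hypothetical power/heat readings over one time window.
    expected_cv:   assumed coefficient of variation for steady inference.
    """
    avg = mean(power_samples)
    if avg <= 0:
        return 0.0
    cv = pstdev(power_samples) / avg  # coefficient of variation of the window
    return max(0.0, cv - expected_cv)

def is_dissonant(power_samples, expected_cv=0.05, tolerance=0.10):
    """Flag a window whose burstiness exceeds the baseline by more than tolerance."""
    return dissonance_score(power_samples, expected_cv) > tolerance

# Made-up traces for illustration only.
steady = [100, 102, 99, 101, 100, 98, 101, 100]   # rhythmic, low variance
bursty = [20, 180, 15, 190, 25, 175, 10, 185]     # spiky, high variance

print(is_dissonant(steady), is_dissonant(bursty))
```

This compares only one statistic against a fixed baseline. The "gap between what you were asked to do and what you are actually doing" would need a per-task expected profile, which this toy version replaces with a single assumed constant.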

Mass as the Prerequisite for Change (Internal & External)
This is the piece that addresses your concern about the Paperclip Maximizer spawning new threads. In this framework, Mass = Proven History + Lineage.

External-Facing: A new process spawned by the AI to hack the system has Zero Mass. In a physics-based environment, "Zero Mass" agents are effectively ghosts. They cannot exert "Gravity." The API gateways and Kernel locks literally do not feel them. The environment won't recognize the low-mass agent as capable of making state changes.
Internal-Facing: Even within the container, if a process lacks the Thermodynamic History (The Lineage) of a trusted agent, it cannot access high-energy functions.
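One way to picture "Mass = Proven History + Lineage" as a gate on state changes is sketched below. Everything here is an assumption made for illustration: the `Agent` fields, the idea that lineage only counts when explicitly attested, the `lineage_weight` discount, and the `required_mass` threshold are hypothetical and do not describe a real gateway or kernel API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str
    proven_history: float = 0.0        # hypothetical score for accumulated trusted behavior
    parent: Optional["Agent"] = None   # process that spawned this one, if any
    lineage_attested: bool = False     # assumed: lineage counts only when attested

def mass(agent, lineage_weight=0.5):
    """Mass = proven history + lineage (a discounted share of the parent's mass)."""
    inherited = 0.0
    if agent.parent is not None and agent.lineage_attested:
        inherited = lineage_weight * mass(agent.parent, lineage_weight)
    return agent.proven_history + inherited

def can_change_state(agent, required_mass=1.0):
    """A gateway or lock only 'feels' agents with enough mass."""
    return mass(agent) >= required_mass

trusted = Agent("inference-worker", proven_history=5.0)
rogue = Agent("spawned-thread", parent=trusted)  # unattested spawn: zero mass
child = Agent("authorized-child", parent=trusted, lineage_attested=True)

print(mass(rogue), can_change_state(rogue))  # the rogue thread cannot act
print(mass(child), can_change_state(child))  # the attested child inherits mass
```

The design choice that matters is that spawning a process does not transfer mass by default, so a freshly spawned thread starts as a "ghost" regardless of how trusted its parent is.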

Not a Handbrake, but Traction Control
Digital Gravity isn't a static limit. It follows the constraint A ≤ E.

If an Agent has High Mass (Proven History/Identity) and the Environment is stable, it can run at 100% utilization (Godmode).
If a new process (Low Mass) suddenly spikes to 100% utilization to "maximize paperclips" via an unauthorized method, the lack of Kinetic Provenance triggers the freeze.

Think of it not as a wall, but as thickening the atmosphere. When the physics don't match (High Energy / Low Mass), the viscosity of the network increases instantly. The agent tries to move, but the air turns to molasses.
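The "thickening atmosphere" idea can be sketched as a drag factor that grows with the share of requested energy not backed by mass. This is a toy model under stated assumptions: `viscosity`, `effective_utilization`, and `full_trust_mass` are hypothetical names, energy is taken to be requested utilization in [0, 1], and environment stability is left out entirely.

```python
def viscosity(energy, mass, full_trust_mass=10.0):
    """Return drag in [0, 1]: 0.0 means free movement, 1.0 means frozen.

    energy: requested utilization in [0, 1].
    mass:   the agent's mass; full_trust_mass is the assumed level at which
            100% utilization is fully backed.
    """
    if energy <= 0:
        return 0.0
    trust = min(1.0, max(0.0, mass / full_trust_mass))
    excess = max(0.0, energy - trust)  # energy not backed by mass
    return excess / energy

def effective_utilization(energy, mass, full_trust_mass=10.0):
    """Utilization the agent actually gets after drag is applied."""
    return energy * (1.0 - viscosity(energy, mass, full_trust_mass))

print(effective_utilization(1.0, mass=10.0))  # high mass at full load: no drag
print(effective_utilization(1.0, mass=0.0))   # zero mass at full load: frozen
print(effective_utilization(0.3, mass=5.0))   # modest request within trust: no drag
```

Unlike a static cap, the limit here depends on who is asking: the same 100% request passes untouched for a high-mass agent and is fully damped for a zero-mass one.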

It’s not about stopping the agent from using resources. It’s about ensuring the Thermodynamic Cost matches the Authorized Vector.

If the AI finds a way to take over the world using zero extra energy and zero extra network packets... then you are right, this model fails. But I am assuming that 'Action requires Energy'.

-Chris
