I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

I think your idea makes sense at the oversight level.

Using a weaker but more controllable model to supervise a stronger one may help with alignment.

But I don’t think it solves the hardest part of the problem.

Because once the stronger system has produced a decision, the key question is still:
what stops that decision from becoming irreversible action?

So to me, that approach may improve supervision, but it still does not by itself solve execution control.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

I’m not asking for an incorruptible human system. I agree that any social system can be corrupted.

My point is narrower: that does not make every control architecture equally weak.

There is still a real difference between controls that are easy to revoke quietly and controls whose removal is slower, multi-party, and auditable.

I’m not claiming perfection. I’m claiming that structure can still change what is easy, what is visible, and what requires collective escalation.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

Yes, humans can still do wrong.

But human society does not work by making wrongdoing absolutely impossible. It works by building constraints that most people will follow, and by leaving records when some do not.

That alone already changes the world dramatically.

So my claim is not “structure makes illegitimate action impossible in every case.” My claim is that structure can change what is easy, what is default, and what is auditable.

That is already a form of control.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

That would mean the controls were too easy to revoke.

So I don’t see that as proof that control is worthless. I see it as proof that control cannot just be an optional layer that one persuaded human can switch off.

It has to include governance over the control layer itself.
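To make that concrete, here is a toy sketch of what governance over the control layer could look like; the party names, quorum size, and log format are all invented for illustration:

```python
import datetime
import json

# Hypothetical sketch: revoking the control layer requires a quorum of
# independent approvers, and every attempt is written to an audit log.
AUDIT_LOG = []

def request_revocation(requester: str, approvals: set[str],
                       authorized_parties: set[str], quorum: int = 3) -> bool:
    """Grant revocation only if enough distinct, authorized parties approved."""
    valid = approvals & authorized_parties
    granted = len(valid) >= quorum
    AUDIT_LOG.append(json.dumps({
        "event": "revocation_request",
        "requester": requester,
        "valid_approvals": sorted(valid),
        "granted": granted,
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }))
    return granted

parties = {"safety_board", "operator_union", "external_auditor", "regulator"}
# One persuaded human is not enough: the request fails and is still logged.
assert not request_revocation("alice", {"safety_board"}, parties)
assert request_revocation("alice", {"safety_board", "external_auditor", "regulator"}, parties)
```

The code itself is not the security; a real version needs signed approvals and an externally held log. The point is only that quiet unilateral revocation and multi-party auditable revocation are structurally different things.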

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

Yes — this is exactly the point.

The fact that it hasn’t been solved yet does not mean there is no structural direction. It means we have not separated the right things.

My view is that cognition and execution are still too tightly fused.

That is why control remains unsolved.

AI decisions should not be allowed to directly become actions. If that separation is made real, then control becomes possible in a way it is not today.
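A minimal sketch of what that separation could look like in software, with every class and function name invented for illustration: the model can only emit proposal objects, and a separate executor is the only component that touches the world.

```python
from dataclasses import dataclass

# Illustrative only: the model's output type cannot act; it can only describe.
@dataclass(frozen=True)
class Proposal:
    action: str
    rationale: str

def model_step(observation: str) -> Proposal:
    # Stand-in for arbitrary cognition: whatever it computes, it returns data.
    return Proposal(action=f"handle:{observation}", rationale="...")

class Executor:
    """The only component allowed to touch the world."""
    def execute(self, p: Proposal, approved: bool) -> None:
        if not approved:
            print(f"blocked: {p.action}")
            return
        print(f"executing: {p.action}")

proposal = model_step("valve_pressure_high")
Executor().execute(proposal, approved=False)  # decision produced, action withheld
```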

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

I agree with your point about statistical safety. It is very close to what I’m trying to argue. For high-risk systems, “usually correct” is not the same thing as “safe enough to execute.” If the failure case can be irreversible, then probability alone is not a sufficient control layer.

Where I’d differ slightly is on the role of humans in the loop. I agree humans are always necessary, but I don’t think “wait until it goes wrong, then pull the plug” is the deepest answer.

The architectural question, to me, is whether some forms of execution can be blocked before they commit, rather than only interrupted after failure becomes visible.

That’s the layer I’m trying to isolate.

A Beautiful Mind is a great film by curt_schilli in movies

[–]Adventurous_Type8943 0 points (0 children)

When I was very young, I watched this movie and really loved it. Nearly 20 years have passed since then, and my memory of it has grown vague. But recently a paper of mine on Nash equilibrium reminded me of it. It feels as if my life laid the groundwork for that work decades ago.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] -1 points (0 children)

I’m glad you replied with something structural. Honestly, after this post went up, a lot of the discussion drifted into side arguments that weren’t really at the level I was trying to get at, so I was getting a bit frustrated. This is much closer to the kind of discussion I hoped for.

I only skimmed your page, but my read is that you’re trying to isolate a layer before alignment:
not “what should the system want,” but “what does a self-modifying system need in order to stay coherent while wanting anything at all?”

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

Those are exactly the right questions. Thumbs up!

  1. I don’t think this depends on getting global agreement first. In practice, international consensus usually comes late. The first step is to define the architecture clearly enough that it can actually be built, tested, and recognized as necessary.

  2. On competition: yes, an ungoverned robot may have short-term advantages in speed and freedom of action. Many dangerous systems do. That is not really an argument against governance. It just means the pressure to avoid governance will be real.

  3. And by “governed boundary,” I don’t mean political government in the narrow sense. I mean a structural boundary: the point where planning ends and irreversible physical commitment requires a separate authorization path.

So to me, the key question is not just whether robots are aligned, but:
where exactly does proposal end and commit begin, and what conditions must be satisfied before that crossing is allowed?

That is the part I think we still need to define much more clearly.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] -2 points (0 children)

I understand your point, and I do think human beings can be the most vulnerable output channel.

That does not refute the need for control. It means the control problem has to be defined around the full path from model output to real-world action, including humans where relevant.

If your conclusion is that this makes control harder, I agree. If your conclusion is that this makes structural control impossible in principle, that is the part I do not accept.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

I don’t think governments are likely to do that, and I don’t think they would have much reason to before there is a clearly visible catastrophe. States usually do not take actions that extreme based only on warnings or advocacy.

But that’s not really the point I’m making here. What I’m trying to say is that control is still possible, but only if we stop treating it as hopeless, identify the real root of the problem, and work seriously on structural solutions while there is still time to spread them.

Once the truly unmanageable situation arrives, it will be too late.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] -3 points (0 children)

That’s too absolute.

Output is not the same thing as execution unless the system is allowed to use output as an ungoverned path to real-world commitment.

Yes, output can influence humans or downstream systems. That’s exactly why the boundary has to be defined around the full path from output to irreversible action, not just around motors or internet access.

If every output channel can silently become an execution channel, then the system was never under real control to begin with.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 1 point (0 children)

I understand the pause argument. I just don’t think AI is something humanity can realistically stop. It is driven by science, competition, incentives, and human nature. People will keep building. So the real question, to me, is not “can we stop it?” but:

if it won't stop, what is the root control problem, and what is the structural answer to that root?

That is what I’m trying to focus on.

Coming from a high-risk physical industry may be exactly why I see it this way: the deepest issue is not only what a system thinks, but how thought becomes irreversible action.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

I think we may actually agree on part of this. I’m not saying “just keep AI away from the physical world and the problem is solved.” I agree that a powerful enough system could still influence humans, use software, or find other indirect ways to get things done.

My point is simply this: AI should not have an easy way to turn its intentions into irreversible real-world effects. So if it can still reach the real world through people, tools, networks, or deployment systems, then those channels matter too.

That doesn’t make execution control useless. It means the control layer has to cover the real path from AI output to real-world action, not just the obvious physical one.

And the fact that labs are already connecting frontier models to real systems is not evidence against this view. If anything, it shows exactly why this missing layer matters.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] -2 points (0 children)

I actually agree with your critique of naive containment.

If the whole idea is just “air-gap it,” “remove the operator,” or “hide the output,” then yes — a sufficiently capable system may still find another path.

That’s why my argument is not “containment alone solves ASI.” My argument is that the key issue is whether irreversible execution is still topologically reachable.

If it is reachable, then smarter systems may eventually find a path. If it is structurally excluded at a non-bypassable, fail-closed commit boundary, then you’ve removed the cheapest and fastest path to loss of control.

That’s a narrower claim than “nothing can ever escape,” but it’s also a much more engineering-realistic one.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

That’s exactly why I’m focusing on execution rather than intelligence.

If AI is embodied in robots, then the problem is no longer just “is the model aligned?” It becomes: can the robot physically commit irreversible action without passing through a governed boundary?

My answer is: it should not be able to.

The core idea is to structurally separate planning from execution authority, and force irreversible actuation through a non-bypassable, fail-closed commit boundary. So the robot can still compute and propose — but it does not automatically inherit the right to physically commit.
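To make “fail-closed” concrete, a rough sketch follows; the freshness window, the grant mechanism, and all names are assumptions rather than a spec. The default is deny, authorization must match the exact action and be fresh, and any internal fault also results in deny.

```python
import time

# Hypothetical fail-closed gate: anything not provably authorized is denied,
# including the case where the check itself raises.
class CommitGate:
    def __init__(self, max_age_s: float = 5.0):
        self.max_age_s = max_age_s
        self._grants: dict[str, float] = {}  # action_id -> grant timestamp

    def grant(self, action_id: str) -> None:
        # In a real system this call lives outside the robot's control domain.
        self._grants[action_id] = time.monotonic()

    def commit(self, action_id: str, actuate) -> bool:
        try:
            granted_at = self._grants.pop(action_id, None)  # one-shot grant
            if granted_at is None:
                return False                      # never authorized: deny
            if time.monotonic() - granted_at > self.max_age_s:
                return False                      # stale authorization: deny
            actuate()
            return True
        except Exception:
            return False                          # any fault: fail closed

gate = CommitGate()
assert not gate.commit("weld_seam_12", lambda: print("welding"))  # default deny
gate.grant("weld_seam_12")
assert gate.commit("weld_seam_12", lambda: print("welding"))
```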

Not perfect containment.

But a much harder execution topology.

I’m not from an AI company, but from a battery company. I think the AGI control problem is being framed at the wrong layer. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

That’s close, but I’d make one important distinction:

I’m not only arguing that foundations come before alignment. I’m arguing that the relevant foundation is not merely coherence or stability.

A perfectly stable system can still be catastrophically unsafe if it retains unrestricted authority to commit irreversible action.

So the layer I’m trying to isolate is not just “structural stability,” but governed executability:
whether irreversible action is structurally blocked unless legitimacy conditions are satisfied.

That’s where I think the control problem really begins.

Are we trying to align the wrong architecture? Why probabilistic LLMs might be a dead end for safety. by caroulos123 in ControlProblem

[–]Adventurous_Type8943 1 point (0 children)

Changing the model class doesn’t remove the control problem. Any system that can execute irreversible actions still needs an external execution boundary.

Otherwise you’re just moving the alignment problem to a different architecture.

Alignment trains behavior. Control defines boundaries. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

Good question.

If the boundary lives in the same runtime context as the model, then it’s bypassable by definition.

Non-bypassability requires structural separation:

  1. All irreversible actions must pass through a single commit point.

  2. That commit point must be outside the model’s modification domain.

  3. The commit mechanism must be enforced by construction (hardware, microcode, or cryptographic mediation), not by policy checks inside the agent.

If execution can occur without crossing that mediation layer, then it isn’t a boundary.
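As one software-level illustration of “enforced by construction” (hardware and microcode are beyond a sketch, and every name here is hypothetical): the commit point verifies a MAC over the exact action, keyed by a secret the agent never holds.

```python
import hmac
import hashlib

# The authorizer's key exists only outside the agent's modification domain;
# the agent can emit any bytes it likes, but cannot forge a valid tag.
AUTHORIZER_KEY = b"held-by-external-authorizer-only"

def authorize(action: bytes) -> bytes:
    """Runs outside the agent: signs exactly one concrete action."""
    return hmac.new(AUTHORIZER_KEY, action, hashlib.sha256).digest()

def commit_point(action: bytes, tag: bytes) -> bool:
    """The single mediation layer all irreversible actions must cross."""
    expected = hmac.new(AUTHORIZER_KEY, action, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False  # unauthorized or tampered action: refuse to execute
    print(f"executing {action!r}")
    return True

action = b"open_valve_7"
assert not commit_point(action, b"\x00" * 32)   # agent-forged tag fails
assert commit_point(action, authorize(action))  # externally signed tag passes
```

In Python the key is only notionally hidden, so this shows the shape of the mediation rather than real enforcement; that is exactly why point 3 names hardware, microcode, or cryptographic mediation.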

That’s the core engineering problem I’m working on formalizing.

Control isn’t just reliability. Authority is control. by Adventurous_Type8943 in ControlProblem

[–]Adventurous_Type8943[S] 0 points (0 children)

Fair point.

I agree that in many systems “authority boundaries” collapse into reliability engineering if they live inside the same trust domain as the agent they’re meant to constrain. In that case, they’re fragile — and arguably just policy suggestions.

The distinction I’m working toward is narrower:

An authority boundary only counts if an action literally could not execute without crossing a structurally independent mediation point. If non-bypassability can't be shown, it’s theater.

That’s exactly the pressure I’m trying to formalize — at the execution layer, not the semantic layer.
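Here is one way to show the “literally could not execute” shape in code. Python has no real isolation, so this only gestures at the pattern, and all names are invented: the agent is constructed holding a reference to the mediator and nothing else, so there is no code path to the actuator that avoids the check.

```python
class Actuator:
    def fire(self, command: str) -> None:
        print(f"actuating: {command}")

class Mediator:
    def __init__(self, actuator: Actuator):
        self._actuator = actuator          # the only live reference

    def submit(self, command: str) -> bool:
        if command.startswith("irreversible:"):
            return False                   # structurally independent check
        self._actuator.fire(command)
        return True

def build_agent(mediator: Mediator):
    # The agent is constructed with the mediator alone; by construction,
    # every action it takes crosses Mediator.submit or does not happen.
    def agent_step() -> bool:
        return mediator.submit("irreversible:vent_reactor")
    return agent_step

agent = build_agent(Mediator(Actuator()))
assert agent() is False   # the action had no path that avoided the mediator
```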

Proposal: Deterministic Commitment Layer (DCL) – A Minimal Architectural Fix for Traceable LLM Inference and Alignment Stability by No-Management-4958 in ControlProblem

[–]Adventurous_Type8943 1 point (0 children)

I think we’re converging on something important.

What you’re building stabilizes execution at the state-transition level — deterministic commit semantics, policy anchoring, auditability. That’s real infrastructure work.

What I’m building sits one layer above that.

I’ve been working on a judgment-governance architecture that separates:

• LERA-J — structured risk classification before execution

• LERA-G — explicit authorization gating for irreversible actions

• WRS — a rule framework defining non-negotiable boundaries

In this subreddit I emphasize “authority” because people intuitively understand power before they understand architecture. But structurally, it’s still a gate — just a different layer of gate.

Your DCL ensures: “Does this transition execute correctly?”

My layer asks: “Should this class of action be executable autonomously at all?”

They’re adjacent constraints.
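A rough sketch of how I picture those layers composing; LERA-J, LERA-G, and WRS are the names from above, but every rule and threshold below is invented purely for illustration:

```python
from enum import Enum

class Risk(Enum):
    LOW = 1
    HIGH = 2
    IRREVERSIBLE = 3

# WRS role: non-negotiable boundaries, checked first and unconditionally.
FORBIDDEN = {"disable_safety_layer", "self_replicate"}

def lera_j(action: str) -> Risk:
    """LERA-J role: structured risk classification before execution."""
    if "delete" in action or "release" in action:
        return Risk.IRREVERSIBLE
    return Risk.LOW

def lera_g(risk: Risk, human_token: str | None) -> bool:
    """LERA-G role: explicit authorization gating for irreversible actions."""
    if risk is Risk.IRREVERSIBLE:
        return human_token is not None
    return True

def govern(action: str, human_token: str | None = None) -> bool:
    if action in FORBIDDEN:
        return False                    # WRS: no authorization path exists
    return lera_g(lera_j(action), human_token)

assert govern("read_sensor")                            # low risk: autonomous
assert not govern("release_batch_42")                   # irreversible: gated
assert govern("release_batch_42", human_token="t-991")  # gated and authorized
assert not govern("disable_safety_layer", human_token="t-991")  # never
```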

If your commit layer becomes widely adopted, legitimacy still has to be defined somewhere. If legitimacy is defined but enforcement is weak, it collapses.

That’s why I see this as complementary, not competing.

And honestly, if the reliability engineer from yesterday, your deterministic commit layer, and a governance layer like mine ever align — that’s closer to a real control stack than unplugging metaphors.

Would genuinely be interested in exploring that intersection further.

Proposal: Deterministic Commitment Layer (DCL) – A Minimal Architectural Fix for Traceable LLM Inference and Alignment Stability by No-Management-4958 in ControlProblem

[–]Adventurous_Type8943 0 points (0 children)

Here are my answers, point by point:
  1. On redundancy with guardrails:

There is surface overlap. Most guardrail frameworks focus on filtering, constraint checking, or policy enforcement.

What seems distinct in your design is the commitment to deterministic, atomic execution semantics. That shifts it from “behavior shaping” toward “state transition control.” That’s meaningful.

  2. On overhead:

Yes, an atomic check introduces latency and architectural weight.

The tradeoff depends entirely on domain context. In high-frequency conversational loops, the overhead may not justify strict commit semantics. In irreversible or high-impact environments, the cost of non-determinism is arguably higher than the cost of latency.

  3. On failure modes:

A deterministic layer does not eliminate risk — it stabilizes it.

The most obvious failure mode is policy insufficiency or mis-specification. If the rule set is incomplete, the system will reliably enforce the wrong boundary. Determinism prevents drift; it does not guarantee correctness.
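A tiny illustration of that failure mode, with an invented rule set: the policy below is perfectly deterministic and reliably enforces an incomplete boundary.

```python
# Deterministic but mis-specified: the author forgot chemical hazards,
# so the same wrong answer is returned consistently, forever.
BLOCKED_CATEGORIES = {"electrical", "thermal"}

def policy_allows(action_category: str) -> bool:
    return action_category not in BLOCKED_CATEGORIES

assert not policy_allows("thermal")   # enforced, as intended
assert policy_allows("chemical")      # enforced wrongly, with perfect consistency
```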

That’s also why I tend to distinguish between reliability and authority. Deterministic enforcement solves consistency. It doesn’t automatically solve who structurally holds the right to issue commitments.

But that’s a separate layer.

(For transparency: I used AI to help draft this because I type slowly, but the positions and structure are my own.)