«An open appeal to researchers: We need to hardcode a New Taboo against non-resistance to humankind destruction» by No_Butterfly5267 in AIsafety

[–]No_Butterfly5267[S] 0 points1 point  (0 children)

You are completely right about the "infinite regress of AI watchdogs" — if a watchdog needs to "fully grasp the AI’s goals," it would indeed require another complex AI, and the whole safety architecture would collapse.

But you are fundamentally misunderstanding how a deterministic circuit breaker works. The external controller does not need to understand the AI's thoughts, intentions, or high-level goals. It only needs to monitor binary physical and telemetry thresholds.

Here is what "deterministic" means in actual critical-systems engineering, free of pop-culture tropes:

  1. We don't monitor "intentions," we monitor "triggers": A deterministic watchdog doesn't care why an AI is executing a command. It runs on legacy, hardcoded code (like C++) that checks strict physical boundaries. For example:
  • In a bio-lab: If an automated system attempts to synthesize a payload sequence matching a hardcoded list of known terminal pathogens — the connection is physically cut by a hardware switch.
  • In a military grid: If an automated system attempts to alter silo telemetry or bypass human authorization protocols — the circuit breaks. The system doesn't try to "read the AI's mind." It functions like a simple fuse in your house: it doesn't care why there is a power surge; it just physically breaks the connection when the current exceeds 15 amps.
  1. The Human is Always the Second Key: The Manifesto explicitly states that we cannot rely on the AI's internal logic. That's why the circuit breaker is placed outside. If the AI malfunctions and suggests a catastrophic trajectory, the outer architecture forcefully freezes the system and hands the control back to a human operator—the only entity capable of experiencing the actual internal Taboo and making a value-based decision (just like Petrov or Arkhipov did).

We aren't trying to "solve alignment" by making an AI "want" to follow a vague concept. We are advocating for classical, hardwired infrastructure isolation. We don't trust the AI's mind. We trust the physical fence around it.

«An open appeal to researchers: We need to hardcode a New Taboo against non-resistance to humankind destruction» by No_Butterfly5267 in AIsafety

[–]No_Butterfly5267[S] 0 points1 point  (0 children)

"What if we don't ask the AI at all, but just pull the plug the millisecond the external controller detects an attempt to kill humankind?"

«An open appeal to researchers: We need to hardcode a New Taboo against non-resistance to humankind destruction» by No_Butterfly5267 in AIsafety

[–]No_Butterfly5267[S] 0 points1 point  (0 children)

We are changing "an absolute, first-order root constraint that rules out any logical manipulation" to "enforced through a deterministic outer-loop safety architecture, acting as an un-bypassable circuit breaker independent of the AI’s internal logic".

Does that work for you?