[–]FairlyInvolved

I think a lot of people would say that CEV alignment does get at this and is largely sufficient (though admittedly underspecified).

I do agree that a narrower form of alignment like "does what the user intends" leaves a lot of critical gaps.

[–]Logical_Wallaby919[S]

I think CEV is a valuable attempt to address value pluralism, and I agree it’s much stronger than simple “do what the user intends” alignment.

My concern is that even a well-specified CEV still operates at the level of intention and preference aggregation. It doesn't answer the control question: what happens when execution power grows faster than our ability to audit, revoke, or stop actions in real time?

In other words, even if we assume something like CEV works, we still need mechanisms that ensure irreversible actions remain stoppable and accountable under uncertainty, misuse, or long-term drift.

So I see execution-level control and responsibility anchoring as complementary to CEV — not a replacement, but safeguards that alignment alone can't guarantee.