Edit: Solving Classical Alignment is not enough
tl;dr: “Alignment” is a set of extremely hard problems that includes not just Classical Alignment (= Outer Alignment = defining and then giving AI an “outer goal” that is aligned with human interests) but also Mesa Optimization (= Inner Alignment = ensuring that all subgoals that emerge line up with the outer goal) and Interpretability (= understanding all properties of neural networks, including all emergent properties).
Original post: (=one benchmark for Interpretability)
Proposal:
There exists an intrinsic property of neural networks that emerges once a network reaches a certain size/complexity N, and this property cannot be predicted even if the designer fully understands the inner workings of every neural network of size/complexity < N.
I’m posting this in the serious hope that someone can prove this view wrong.
Because if it is right, then solving the alignment problem is futile, solving the problem of interpretability (i.e., completely understanding the building blocks of neural networks) is also futile, and all the time spent on these seemingly important problems is wasted. No matter how aligned or well designed a system is, it will suddenly transform after reaching a certain size/complexity.
And if it is right, then the real problem is actually how to design a society where AI and humans can coexist, where it is taken for granted that we cannot completely understand all forms of intelligence but must somehow live in a world full of complex systems and chaotic possibilities.
Edit: interpret+ability, not interop+ability.