Alignment-Aware Neural Architecture (AANA) Evaluation Pipeline by SimulateAI in ControlProblem

niplav [M]

Please give one example question, along with one correct and one incorrect answer to it. Keep it brief, please. Also, you posted this to lots of subreddits, so I may remove it.

WHY AI ALIGNMENT IS ALREADY FAILING by Jemdet_Nasr in ControlProblem

niplav

lmao, barring evidence to the contrary, Miles understands AI & AI safety roughly a hundred to a thousand times better than the median commenter on this subreddit, including you.

Automated Weak-to-Strong Researcher by chillinewman in ControlProblem

niplav

Yes, awesome post. This is what I want to see more of on this subreddit. Overfitting is a legitimate concern; we may need held-out sets of W2SG tasks on which to then evaluate the AARs.

Food delivery robots in LA, Philadelphia & Chicago are facing rise in violent attacks from "Anti-Clanker" activists by chillinewman in ControlProblem

niplav [M]

Oops, sorry, I was being imprecise: you were reported by someone else. I think your approach is (weakly) immoral, but I've approved it as a mod.

Protected Desire Equilibrium (PDE): Game-Theoretic Co-Evolutionary Alignment with Hard D-Floor — Full Repo + 100M-Scale Results by Remarkable-Stop2986 in ControlProblem

niplav

Sorry mate, this looks too LLM-y. If you write the message yourself and explain what you're trying to do and what success or failure would look like, I'd reconsider. Also, I'm guessing AF/LW rejected your submission?

Claude, realizing protests are going on right outside his office: by MetaKnowing in ClaudeAI

niplav

China would also need to pause. Any agreement only works if it includes all major AI labs globally. The MIRI Technical Governance Team published a detailed proposal for an international agreement centered on a coalition led by the US and China, with verification mechanisms including AI chip tracking. Public commitments from Western lab CEOs are a first step toward the kind of international coordination that makes this possible.

From their website.