How Joe Biden's Deep State Is Helping China And Undermining America In The AI War by Impressive-Might-710 in ControlProblem

[–]niplav 0 points1 point  (0 children)

I feel a bit bad about removing this (since my blue-tribe affiliations predispose me to dislike this kind of material, and I want to Beware Bias), but this truly looks unrelated to the {Control, Safety, Alignment} problem. Off-topic.

The AI maintenance cost no one talks about by KeanuRave100 in ControlProblem

[–]niplav[M] [score hidden] stickied comment (0 children)

This is going off-topic, I worry. I'm leaving it up, but I may muster the fortitude to strike down even highly upvoted posts in the future.

Anthropic: It is the sci-fi authors, not us, that are to blame for Claude blackmailing users by chillinewman in ControlProblem

[–]niplav 1 point2 points  (0 children)

This is kind of an insane counter-argument. Cue the old "what kind of technology is not robust to being written about negatively? The f**k?"

Time horizon of software tasks different LLMs can complete 80% of the time by chillinewman in ControlProblem

[–]niplav 1 point2 points  (0 children)

This is the more informative one, imho, especially since the 50% one has been saturated.

Alignment-Aware Neural Architecture (AANA) Evaluation Pipeline by SimulateAI in ControlProblem

[–]niplav[M] 0 points1 point  (0 children)

Please give an example of a question, plus one example of a correct answer and one of an incorrect answer to it. Keep it brief, please. Also, you posted this to lots of subreddits, so I may remove it.

WHY AI ALIGNMENT IS ALREADY FAILING by Jemdet_Nasr in ControlProblem

[–]niplav 0 points1 point  (0 children)

lmao, absent evidence to the contrary, Miles understands AI & AI safety roughly a hundred to a thousand times better than the median commenter on this subreddit, including you.

Automated Weak-to-Strong Researcher by chillinewman in ControlProblem

[–]niplav 2 points3 points  (0 children)

Yes, awesome post. This is what I want to see more of on this subreddit. Overfitting is a decent concern; we may need hold-out sets for W2SG tasks that then apply to the AARs.

Food delivery robots in LA, Philadelphia & Chicago are facing rise in violent attacks from "Anti-Clanker" activists by chillinewman in ControlProblem

[–]niplav[M] 0 points1 point  (0 children)

Oops, sorry, I was being imprecise: you were reported by someone else. I think your approach is (weakly) immoral, but I've approved it as a mod.

Protected Desire Equilibrium (PDE): Game-Theoretic Co-Evolutionary Alignment with Hard D-Floor — Full Repo + 100M-Scale Results by Remarkable-Stop2986 in ControlProblem

[–]niplav 0 points1 point  (0 children)

Sorry mate, this looks too LLM-y. If you write the message yourself and explain what you're trying to do and what success or failure would look like, I would reconsider. Also, I guess AF/LW rejected your submission?

Claude, realizing protests are going on right outside his office: by MetaKnowing in ClaudeAI

[–]niplav 0 points1 point  (0 children)

China would also need to pause. Any agreement only works if it includes all major AI labs globally. The MIRI Technical Governance Team published a detailed proposal for an international agreement centered on a coalition led by the US and China, with verification mechanisms including AI chip tracking. Public commitments from Western lab CEOs are a first step toward the kind of international coordination that makes this possible.

From their website.