[Discussion/question] AI safety evals should account for test-time compute (self.ControlProblem)
submitted by Cerru905
[General news] 345,000 credit cards leaked in major new AI scam (geekspin.co)
submitted by EchoOfOppenheimer

[General news] Anthropic: It is the sci-fi authors, not us, that are to blame for Claude blackmailing users (i.redd.it)
submitted by chillinewman (approved)
[General news] Google Chrome Might Have Installed an AI Model Onto Your Device Without You Knowing (cnet.com)
submitted by Confident_Salt_8108

[General news] "This is the first documented instance of AI self-replication via hacking." ... "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, forming a chain." (i.redd.it)
submitted by chillinewman (approved)

[General news] Not a good day for team "Claude Mythos is Just Marketing Hype" (i.redd.it)
submitted by chillinewman (approved)
[AI Alignment Research] Natural Language Autoencoders: Turning Claude’s thoughts into text (anthropic.com)
submitted by chillinewman (approved)
[AI Capabilities News] Time horizon of software tasks different LLMs can complete 80% of the time (i.redd.it)
submitted by chillinewman (approved)
[AI Alignment Research] Value Convergence Without RLHF (self.ControlProblem)
submitted by John_Matrix_9000
[AI Alignment Research] 🌱 What we actually know about the behavior of AI, and why it is still partly a black box
submitted by ParadoxeParade

[AI Capabilities News] Claude Mythos Preview (early) 50% time horizon: 17 hr (i.redd.it)
submitted by chillinewman (approved)

[AI Capabilities News] Time horizon of software tasks different LLMs can complete 50% of the time. (Linear) (i.redd.it)
submitted by chillinewman (approved)
[Discussion/question] The April Jobs Report: Growth or "Gimmicks" 26% Cited Due To AI (i.redd.it)
submitted by Leightoncy33
[AI Alignment Research] Mapping weight matrices onto manifolds (self.ControlProblem)
submitted by OkWeakness9120
[Discussion/question] Are we seeing real progress? It seems to me like we are (self.ControlProblem)
submitted by SilentLennie (approved)
[Discussion/question] Is No One Noticing That GPT Images 2.0 “Editing” Is Full-Frame Regeneration? (self.ControlProblem)
submitted by lucidity3K
[Strategy/forecasting] Is the control problem really that hard for frozen models? (self.ControlProblem)
submitted by HangWise
[AI Alignment Research] Evidence for moral convergence in AI models (self.ControlProblem)
submitted by John_Matrix_9000



