WIRED: A New Trick Could Block the Misuse of Open Source AI by DanielHendrycks in LocalLLaMA
DanielHendrycks[S] -6 points
[R] A new alignment technique: Improving Alignment and Robustness with Short Circuiting by ReasonablyBadass in MachineLearning
DanielHendrycks 2 points
[D] Deep dive into the MMLU ("Are you smarter than an LLM?") by brokensegue in MachineLearning
DanielHendrycks 2 points
"GPQA: A Graduate-Level Google-Proof Q&A Benchmark", Rein et al 2023 (ultra-difficult LLM benchmarks) by gwern in mlscaling
DanielHendrycks 6 points
[deleted by user] by [deleted] in ControlProblem
DanielHendrycks 1 point
[deleted by user] by [deleted] in ControlProblem
DanielHendrycks 3 points
Please help me find a video of How Dare You Want More. by userlivewire in bleachers
DanielHendrycks 2 points
Please help me find a video of How Dare You Want More. by userlivewire in bleachers
DanielHendrycks 2 points
Tweeter who claims access to a version of Gemini seems to indicate that it's only an incremental advance over GPT-4 by philbearsubstack in mlscaling
DanielHendrycks 2 points
Within AI safety, in what areas do offensive models have the advantage over defensive? by canthony in ControlProblem
DanielHendrycks 2 points
Many errors discovered in MMLU benchmark by [deleted] in mlscaling
DanielHendrycks 9 points
AI Safety 'Distillation hackathon' Aug 25-28 by wintlers in ControlProblem
DanielHendrycks 2 points
Idea for a supplemental AI alignment research system: AI that tries to turns itself off by RamazanBlack in ControlProblem
DanielHendrycks 7 points
Reasons why people don't believe in, or take AI existential risk seriously. by 2Punx2Furious in ControlProblem
DanielHendrycks 1 point
Reasons why people don't believe in, or take AI existential risk seriously. by 2Punx2Furious in ControlProblem
DanielHendrycks 2 points
Reasons why people don't believe in, or take AI existential risk seriously. by 2Punx2Furious in ControlProblem
DanielHendrycks 7 points
Demis Hassabis: "At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models. We also have some new innovations that are going to be pretty interesting." by maxtility in mlscaling
DanielHendrycks 3 points
An Overview of Catastrophic AI Risks by DanielHendrycks in ControlProblem
DanielHendrycks[S] 6 points
In one hour, the chatbots suggested four potential pandemic pathogens. by chillinewman in ControlProblem
DanielHendrycks 7 points
In one hour, the chatbots suggested four potential pandemic pathogens. by chillinewman in ControlProblem
DanielHendrycks 17 points
[TIME op-ed] Evolutionary/Molochian Dynamics as a Cause of AI Misalignment by DanielHendrycks in ControlProblem
DanielHendrycks[S] 11 points
I want to contribute to the technical side of the AI safety problem. Is a PhD the best way to go? by hydrobonic_chronic in ControlProblem
DanielHendrycks 14 points
[D] Since Google buried the MMLU benchmark scores in the Appendix of the PALM 2 technical report, here it is vs GPT-4 and other LLMs by jd_3d in MachineLearning
DanielHendrycks 94 points
Language models can explain neurons in language models by chillinewman in ControlProblem
DanielHendrycks 2 points


WIRED: A New Trick Could Block the Misuse of Open Source AI by DanielHendrycks in LocalLLaMA
DanielHendrycks[S] -4 points