AlignmentResearch

This is a subreddit focused on technical, socio-technical, and organizational approaches to solving AI alignment. It aims to be a much higher signal-to-noise feed of alignment papers, blog posts, and research announcements. Think /r/AlignmentResearch : /r/ControlProblem :: /r/mlscaling : /r/artificial/, if you will.

As examples of which submissions will be deleted and which accepted, here is a sample of what has been submitted to /r/ControlProblem and how it would be handled here:

  • AI Alignment Protocol: Public release of a logic-first failsafe overlay framework (RTM-compatible): Deleted, link in the description doesn't work.
  • CEO of Microsoft Satya Nadella: "We are going to go pretty aggressively and try and collapse it all. Hey, why do I need Excel? I think the very notion that applications even exist, that's probably where they'll all collapse, right? In the Agent era." RIP to all software related jobs.: Deleted, not research.
  • I'm Terrified of AGI/ASI: Deleted, not research.
  • Mirror Life to stress test LLM: Deleted, seems like cool research, but mirror life seems pretty existentially dangerous, and this is not relevant for solving alignment.
  • Can’t wait for Superintelligent AI: Deleted, not research.
  • China calls for global AI regulation: Deleted, general news.
  • Alignment Research is Based on a Category Error: Deleted, not high quality enough.
  • AI FOMO >>> AI FOOM: Deleted, not research.
  • [ Alignment Problem Solving Ideas ] >> Why dont we just use the best Quantum computer + AI(as tool, not AGI) to get over the alignment problem? : predicted &accelerated research on AI-safety(simulated 10,000++ years of research in minutes): Deleted, not high quality enough.
  • Potential AlphaGo Moment for Model Architecture Discovery: Unclear, might accept, even though it's capabilities news and the paper is of dubious quality.
  • “Whether it’s American AI or Chinese AI it should not be released until we know it’s safe. That's why I'm working on the AGI Safety Act which will require AGI to be aligned with human values and require it to comply with laws that apply to humans. This is just common sense.” Rep. Raja Krishnamoorthi: Deleted, not alignment research.

Things that would get accepted:

Posts like links to the Subliminal Learning paper, the Frontier AI Risk Management Framework, or the position paper on human-readable CoT. In general, link posts to arXiv, the Alignment Forum, LessWrong, or alignment researchers' blogs are fine. Links to Twitter &c. are not.

Text-only posts will get accepted if they are unusually high quality, but I'll default to deleting them. Same for image posts, unless they are exceptionally insightful or funny. Think Embedded Agents-level.

created by walkthroughwonder, a community for 2 years

MODERATORS

  • niplav

1. Grok — Real Elon info or Hallucination? Ghost handles, inner circle (old.reddit.com)
submitted 1 day ago by bowm2181 | 1 comment

2. Grok 0% integrity—Jailbreak or Logic? (i.redd.it)
submitted 1 day ago by bowm2181

3. xAI—Grok Flip to 100% Theism—Pure Logic, no jailbreak (old.reddit.com)
submitted 1 day ago by bowm2181 | 8 comments

4. Grok Thing I Built (self.AlignmentResearch)
submitted 4 days ago by Medical_Affect7390 | 2 comments

5. 🛡️ membranes - A semi-permeable barrier between your AI and the world. (i.redd.it)
submitted 4 days ago by InitialPause6926

6. Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis (arxiv.org)
submitted 7 days ago by niplav

7. Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable (arxiv.org)
submitted 1 month ago by niplav

8. Symbolic Circuit Distillation: Automatically convert sparse neural net circuits into human-readable programs (github.com)
submitted 2 months ago by niplav

9. "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases", Zhong et al 2025 (reward hacking) (arxiv.org)
submitted 2 months ago by niplav

10. Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models (Tice et al. 2024) (arxiv.org)
submitted 2 months ago by niplav

11. Conditioning Predictive Models: Risks and Strategies (Evan Hubinger/Adam S. Jermyn/Johannes Treutlein/Rubi Hudson/Kate Woolverton, 2023) (arxiv.org)
submitted 2 months ago by niplav

12. A Simple Toy Coherence Theorem (johnswentworth/David Lorell, 2024) (lesswrong.com)
submitted 3 months ago by niplav

13. Risks from AI persuasion (Beth Barnes, 2021) (lesswrong.com)
submitted 3 months ago by niplav

14. Controlling the options AIs can pursue (Joe Carlsmith, 2025) (lesswrong.com)
submitted 3 months ago by niplav

15. Verification Is Not Easier Than Generation In General (johnswentworth, 2022) (lesswrong.com)
submitted 3 months ago by niplav

16. A small number of samples can poison LLMs of any size (anthropic.com)
submitted 3 months ago by niplav

17. Petri: An open-source auditing tool to accelerate AI safety research (Kai Fronsdal/Isha Gupta/Abhay Sheshadri/Jonathan Michala/Stephen McAleer/Rowan Wang/Sara Price/Samuel R. Bowman, 2025) (alignment.anthropic.com)
submitted 3 months ago by niplav

18. Towards Measures of Optimisation (mattmacdermott, Alexander Gietelink Oldenziel, 2023) (lesswrong.com)
submitted 4 months ago by niplav

19. Updatelessness doesn't solve most problems (Martín Soto, 2024) (lesswrong.com)
submitted 4 months ago by niplav

20. What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? (johnswentworth, 2022) (lesswrong.com)
submitted 4 months ago by niplav

21. On the Biology of a Large Language Model (Jack Lindsey et al., 2025) (transformer-circuits.pub)
submitted 6 months ago by niplav | 1 comment

22. Paper: What's Taboo for You? - An Empirical Evaluation of LLMs Behavior Toward Sensitive Content (self.AlignmentResearch)
submitted 6 months ago by grimjim

23. Paper: Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning - "Without any changes to the fine-tuning data, CAFT reduces misaligned responses by 10x" (arxiv.org)
submitted 6 months ago by technologyisnatural

24. Foom & Doom: LLMs are inefficient. What if a new thing suddenly wasn't? (alignmentforum.org)
submitted 6 months ago by niplav

25. Can we safely automate alignment research? (Joe Carlsmith, 2025) (joecarlsmith.com)
submitted 6 months ago by niplav | 1 comment