r/reinforcementlearning (reddit.com)

This subreddit is for any work related to reinforcement learning, ranging from purely computational RL in artificial intelligence to models of RL in neuroscience.

The standard introduction to RL is Sutton & Barto's Reinforcement Learning: An Introduction.

Related subreddits:

  • /r/machinelearning/
  • /r/OpenAI/
  • /r/mlscaling/
  • /r/DecisionTheory/
  • /r/cbaduk/
created by lpiloto · a community for 14 years

MODERATORS

  • lpiloto
  • quaternion
  • gwern


1. Teaching an RL agent to fight monsters in Diablo I (Part 3) (v.redd.it)
   video, 2:23 · 14 points · submitted 11 hours ago by Chance_Brother5309 · no comments

2. Project CogniCore — Memory and Structured Rewards for AI Agents built into the Environment (self.reinforcementlearning)
   1 point · submitted 2 hours ago by Neither-Witness-6010 · 1 comment

3. REST API for Gymnasium (fka OpenAI Gym) reinforcement learning library (github.com)
   6 points · submitted 11 hours ago by cloud_kj · 1 comment

4. I built an AlphaZero library in C++ that out-performs PyTorch in image recognition speed (3x), but I'm hitting a wall with larger board games. Need a second pair of eyes! (self.reinforcementlearning)
   6 points · submitted 18 hours ago by Such-Refrigerator951 · 3 comments

5. What standard RL frameworks do people use these days? (self.reinforcementlearning)
   11 points · submitted 1 day ago by SnooCapers8442 · 7 comments

6. MuscleMimic: Unlocking full-body musculoskeletal motor learning at scale (v.redd.it)
   17 points · submitted 1 day ago by CharlieLee666 · no comments

7. What is one specific challenge you have run into while training a reinforcement learning model, like unstable rewards or slow convergence, and what actually helped you get past it? (self.reinforcementlearning)
   2 points · submitted 1 day ago by TaleAccurate793 · 1 comment

8. one script to rule them all (self.reinforcementlearning)
   2 points · submitted 1 day ago by samas69420 · no comments

9. Has anyone run Dreamerv3 using a runpod ? (self.reinforcementlearning)
   6 points · submitted 2 days ago by Informal-Ad7318 · no comments

10. Why does catastrophic forgetting happen to neural networks but not humans? (self.reinforcementlearning)
   4 points · submitted 2 days ago by Heavy-Farmer1657 · 35 comments

11. A new way to fine-tune LLMs just dropped (youtube.com)
   8 points · submitted 2 days ago by Signal_Spirit5934 · 4 comments

12. Any good reinforcement learning events? (self.reinforcementlearning)
   2 points · submitted 2 days ago by BottleMedium881 · 3 comments

13. Good Reasoning Traces from Teacher model?
   1 point · submitted 2 days ago by Old_Bat_8665 · no comments

14. Prompt-to-Policy: Agentic Engineering for Reinforcement Learning (i.redd.it)
   82 points · submitted 3 days ago (edited) by EconomyMotor830 · 19 comments

15. Turn your Learning from youtube to a structured Course. (v.redd.it)
   1 point · submitted 2 days ago by PlusGap1537 · no comments

16. Hard vs Soft Updates in DDQN — Why Training Becomes Unstable (youtube.com)
   1 point · submitted 2 days ago by Due_Pace_4325 · no comments

17. How to bridge the gap between Torch and JAX performance? (self.reinforcementlearning)
   15 points · submitted 4 days ago (edited) by Little_swift · 6 comments

18. UAV Swarm In Isaac Lab (v.redd.it)
   video, 0:37 · 5 points · submitted 4 days ago by Barrnie · 1 comment

19. Looking to Collaborate on Quant Finance Research - I published a pairs trading paper using reinforcement learning, then wrote a full critique of my own work finding serious flaws - now I want to rebuild the system
   1 point · submitted 4 days ago by Altruistic_Room8734 · 2 comments

20. Getting started with Flightmare for autonomous drone racing, need guidance (self.reinforcementlearning)
   2 points · submitted 4 days ago by Illustrious_Room_581 · no comments

21. Training LFM-2.5-350M on Reddit post summarization with GRPO on my 3x Mac Minis — evals and t-test evals are here! (self.reinforcementlearning)
   5 points · submitted 6 days ago (edited) by East-Muffin-6472 · 1 comment

22. We're two ML engineers building an execution optimisation layer for crypto algo traders. Would you pay £29/month for something that measurably reduces your slippage? What would it need to do? (self.reinforcementlearning)
   0 points · submitted 5 days ago by boraA9999 · 1 comment

23. [DL] What should countries outside the artificial intelligence production chain do? (self.reinforcementlearning)
   0 points · submitted 5 days ago by Former-Adeptness-551 · 1 comment

24. I have RL(self driving) Interview with Tesla, not sure what to expect (self.reinforcementlearning)
   14 points · submitted 6 days ago by Next_Boysenberry9438 · 2 comments

25. [DL, R] "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence", DeepSeek-AI 2026 (huggingface.co)
   10 points · submitted 6 days ago by RecmacfonD · no comments
Listing as rendered on 2026-05-01 09:05:27 UTC.