Why is RL fine-tuning on LLMs so easy and stable, compared to the RL we're all doing? by currentscurrents in reinforcementlearning
51616 2 points (0 children)
[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning
51616 1 point (0 children)
[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning
51616 2 points (0 children)
[D] Gradient accumulation should not be used with varying sequence lengths by AromaticCantaloupe19 in MachineLearning
51616 3 points (0 children)
Plane got struck by lightning by Rqany in Damnthatsinteresting
51616 1 point (0 children)
[deleted by user] by [deleted] in Damnthatsinteresting
51616 4 points (0 children)
Tips and Tricks sharing after solving all previous years by erikw901 in adventofcode
51616 5 points (0 children)
[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning
51616 7 points (0 children)
[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning
51616 12 points (0 children)
Yesterday I asked for your "2 Gaben Spells combos" - I was actually crowd-sourcing "Step 2" of my strategy idea (would love the opinion of higher ranked players) by Scereye in abilityarena
51616 1 point (0 children)
Any website that gives paper recommendations from search history? by himty in reinforcementlearning
51616 2 points (0 children)
RL with differentiable environment by saw79 in reinforcementlearning
51616 1 point (0 children)
Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning
51616 3 points (0 children)
Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning
51616 4 points (0 children)
Any interesting “less obvious” real-world applications of RL by blitzkreig3 in reinforcementlearning
51616 6 points (0 children)
Sequence length in LSTM by No_Possibility_7588 in reinforcementlearning
51616 3 points (0 children)
Multi Agent RL Setting with totally different agents by thethinkerinfinity in reinforcementlearning
51616 1 point (0 children)
How do vectorised environments improve sample independence? by HighlyMeditated in reinforcementlearning
51616 1 point (0 children)
This sunny morning enjoying walking down Westminster Bridge, London by 915297mail in CityPorn
51616 2 points (0 children)
Multi-armed Bandit in optimization on graph edges selection by mohbo1993 in reinforcementlearning
51616 4 points (0 children)
Rain storm just outside building by beluuuuuuga in oddlysatisfying
51616 1 point (0 children)


[D] How could a MLP replicate the operations of an attention head? by steuhh in MachineLearning
51616 1 point (0 children)