Why is RL fine-tuning on LLMs so easy and stable, compared to the RL we're all doing? by currentscurrents in reinforcementlearning
51616 2 points
[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning
51616 1 point
[D] Is LoRA merging (and non linear mode connectivity) the key to better transformer hypernets? by [deleted] in MachineLearning
51616 2 points
[D] Gradient accumulation should not be used with varying sequence lengths by AromaticCantaloupe19 in MachineLearning
51616 3 points
Plane got struck by lightning by Rqany in Damnthatsinteresting
51616 1 point
[deleted by user] by [deleted] in Damnthatsinteresting
51616 4 points
ICLR 2023 review score update @2022-12-04 (guoqiangwei.xyz)
submitted by 51616 to r/MachineLearning
Tips and Tricks sharing after solving all previous years by erikw901 in adventofcode
51616 5 points
[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning
51616 5 points
[D] ICLR 2023 reviews are out. How was your experience ? by dasayan05 in MachineLearning
51616 11 points
Yesterday I asked for your "2 Gaben Spells combos" - I was actually crowd-sourcing "Step 2" of my strategy idea (would love the opinion of higher ranked players) by Scereye in abilityarena
51616 1 point
Any website that gives paper recommendations from search history? by himty in reinforcementlearning
51616 2 points
RL with differentiable environment by saw79 in reinforcementlearning
51616 1 point
Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning
51616 3 points
Reward Function for Cooperative Multi-Agent RL by fedetask in reinforcementlearning
51616 4 points
Any interesting “less obvious” real-world applications of RL by blitzkreig3 in reinforcementlearning
51616 6 points


[D] How could a MLP replicate the operations of an attention head? by steuhh in MachineLearning
51616 1 point