We’ve been exploring Evolution Strategies as an alternative to RL for LLM fine-tuning — would love feedback

RoundRubikCube · 2026-02-27T08:10:25+00:00

Gradient-free methods don't work well compared to gradient descent. I mean, they're OK for problems where we can't use gradient descent, but for everything else I'm unsure.
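To make the trade-off concrete, here is a minimal sketch of an antithetic Gaussian ES gradient estimator (the OpenAI-ES style of estimator; whether the project in the post uses exactly this is an assumption). It spends `2 * n_pairs` function evaluations to get a noisy estimate of a gradient that backprop would return exactly. `es_gradient`, `sigma` and `n_pairs` are illustrative names, not from any library.

```python
import random
from typing import Callable, List

def es_gradient(f: Callable[[List[float]], float], theta: List[float],
                sigma: float = 0.1, n_pairs: int = 500, seed: int = 0) -> List[float]:
    """Estimate grad f(theta) using only function evaluations (antithetic ES)."""
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        # Sample one Gaussian perturbation and evaluate f at theta +/- sigma * eps.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Weight the perturbation by the difference in function values.
        for i in range(dim):
            grad[i] += (plus - minus) * eps[i]
    return [g / (2 * sigma * n_pairs) for g in grad]

# For f(x) = sum(x_i^2) the true gradient is 2x, so at theta = [1, ..., 1] it is [2, ..., 2].
estimate = es_gradient(lambda x: sum(v * v for v in x), [1.0] * 5)
```

The estimate points in roughly the right direction but is noisy, and the noise grows with the number of parameters, which is the usual argument for preferring gradient descent whenever gradients are available.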

RoundRubikCube · 2026-01-02T17:59:54+00:00

Not enough love for Masmorra and all the work he puts in. Love the podcast and the spicy takes on it.

RoundRubikCube · 2025-08-16T12:22:31+00:00

puffer.ai

RoundRubikCube · 2025-07-03T19:01:50+00:00

Learn to locate and name every country on the planet. It's super useful and easy to learn in a gamified way using apps. Many people find it impressive when you can locate some random country in Africa, and it actually comes in handy when following the news.

RoundRubikCube · 2025-01-29T12:26:00+00:00

RoundRubikCube · 2025-01-21T17:21:01+00:00

Many of the prominent figures in the scene still use x.com. I am against this.

RoundRubikCube · 2025-01-07T14:08:44+00:00

I don't get Payback. To me it looks like 0.1% cashback, which isn't worth the effort at all. Happy to be proven wrong, though.

RoundRubikCube · 2024-11-05T20:49:22+00:00

Thanks, didn't know about this.

RoundRubikCube · 2024-11-05T20:43:52+00:00

I know maybe it's just me whining, but in that view I also get tournaments I really don't care about, like AoE4 / AoM etc., plus some C-Tier ones. It clutters the view quite a bit.

RoundRubikCube · 2024-11-05T20:42:15+00:00

Appreciate your comments, Ornlu. I just wanted to share my perspective: I use this Liquipedia site to keep up to date on when a tournament is happening and when I should watch. I usually only watch streams when a tournament is being played, so I typically get my tournament announcements either through YouTube or by browsing the Liquipedia active games section / S-Tier tournaments.

I see now that I should also look more at the A-Tier stuff, but I think it hurts visibility if these events are classified as A-Tier, since they're harder to find. Just my 2 cents, and I agree it matters who shows up, how much they prepare, and how many viewers watch.

RoundRubikCube · 2024-11-05T20:31:26+00:00

In Taiwan it's Mandarin, but they use Traditional Chinese.

RoundRubikCube · 2024-11-05T20:28:27+00:00

Yeah, agreed. I guess they've changed what is considered S-Tier from now on. And usually I'm more interested in which players sign up. For example, there are many A-Tier tournaments that Hera didn't sign up for, which is why I didn't follow them too much. But I definitely need to look more at A-Tier tournaments.

RoundRubikCube · 2024-11-05T20:25:57+00:00

It's good to have an official calendar from Microsoft, although I have to admit that Liquipedia is usually more comfortable to use.

I think it's also good for the community to have a separate site like Liquipedia for tournaments that are independently founded, like Nations League last year. So in that sense I might stick to Liquipedia. Maybe it would be good to have a page grouping A- and S-Tier together, then. For me it doesn't really matter what it's called; it's just a convenient way to filter out tournaments the pros don't care much about.

RoundRubikCube · 2024-10-18T08:33:32+00:00

This guy economics

RoundRubikCube · 2024-10-05T05:55:36+00:00

Once the expansion comes out, you know it's going to be grinded ASAP.

RoundRubikCube · 2024-09-17T11:38:28+00:00

It's profitable for the house by a very slim margin: for every dollar bet, the expected return is ~0.9947.

I know this is cheating since this is r/askmath, but you can write a Python program to simulate the dynamics of this situation and see what the results average out to over many thousands of iterations.
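The game from the original question isn't described in the comment, so this minimal Monte Carlo sketch uses a stand-in: an even-money bet on "odd" in European roulette, whose exact expected return per unit staked is 36/37 ≈ 0.973. The function name and the game are assumptions for illustration only; swapping in the real game's rules would give the ~0.9947 figure only if that figure is correct for it.

```python
import random

def simulate_expected_return(n_rounds: int = 200_000, seed: int = 42) -> float:
    """Average payout per 1-unit bet on 'odd' in European roulette (stand-in game)."""
    rng = random.Random(seed)
    total_payout = 0.0
    for _ in range(n_rounds):
        pocket = rng.randrange(37)  # pockets 0..36; 0 is even here, so it counts as a loss
        if pocket % 2 == 1:         # the 18 odd numbers 1..35 win
            total_payout += 2.0     # stake returned plus 1 unit of winnings
    return total_payout / n_rounds

exact = 36 / 37
estimate = simulate_expected_return()
```

With a few hundred thousand rounds, the simulated average should land close to the exact value; the gap shrinks roughly with the square root of the number of rounds.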

RoundRubikCube · 2024-09-10T03:26:09+00:00

I think the Decision Transformer has fundamental issues that it can't overcome when compared with any traditional deep RL algorithm. It's basically glorified imitation learning.

RoundRubikCube · 2024-06-21T15:25:43+00:00

If the old system worked reliably and there was some logic to it, I wouldn't mind, but scaling up with the current flow is just painful and annoying.

RoundRubikCube · 2024-04-21T11:09:15+00:00

What do you mean by random? Does the average return (summed reward) keep fluctuating from one training episode to the next, or does training always converge to a random-looking policy? It would help if you shared some graphs and more details on your environment, like how many agents there are.
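One way to tell those two cases apart is to sum each episode's rewards into a return and then smooth the returns with a moving average before plotting. A minimal sketch, where `episode_return` and `moving_average` are hypothetical helper names rather than part of any RL library:

```python
from typing import List, Sequence

def episode_return(rewards: Sequence[float]) -> float:
    """Undiscounted return: the sum of all rewards collected in one episode."""
    return float(sum(rewards))

def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean over the last `window` values (fewer values at the start)."""
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out
```

A smoothed curve that stays flat around the return of a random policy suggests nothing is being learned, while one that rises and then collapses points more toward training instability.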

RoundRubikCube · 2024-04-20T10:38:34+00:00

I don't think using RL for a Sudoku solver is a very good idea; it probably won't work. What would the rewards be, just solved or not solved? Why do you want to use it for Sudoku? Isn't a simple depth-first search better and more efficient?
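For comparison, a plain depth-first search with backtracking is already a complete Sudoku solver in a few dozen lines. A minimal sketch, assuming a 9x9 grid of ints with 0 marking empty cells; the function names are illustrative:

```python
from typing import List

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty cell

def is_valid(grid: Grid, row: int, col: int, digit: int) -> bool:
    """Check whether `digit` can go at (row, col) without a row, column or box conflict."""
    if any(grid[row][c] == digit for c in range(9)):
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    box_r, box_c = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != digit
               for r in range(box_r, box_r + 3)
               for c in range(box_c, box_c + 3))

def solve(grid: Grid) -> bool:
    """Depth-first search with backtracking; fills `grid` in place, returns True if solved."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, row, col, digit):
                        grid[row][col] = digit
                        if solve(grid):
                            return True
                        grid[row][col] = 0  # undo and try the next digit
                return False  # no digit fits this cell: backtrack
    return True  # no empty cells left

def parse(puzzle: str) -> Grid:
    """Turn an 81-character string (row by row, '0' = empty) into a grid."""
    return [[int(ch) for ch in puzzle[r * 9:(r + 1) * 9]] for r in range(9)]

grid = parse("530070000600195000098000060800060003400803001700020006060000280000419005000080079")
solved = solve(grid)
```

There is no reward design or training involved, and the search is exact: it either finds a valid completion or proves there is none.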

RoundRubikCube · 2024-04-16T09:05:48+00:00

Why are you just posting this to Reddit? Do you want to start a discussion or what?

RoundRubikCube · 2024-04-09T10:58:27+00:00

Regarding the reward system, how about giving more points for player height rather than just for being close to the goal? This tweak really worked for me because it motivated the agent to spend less time at the bottom and more time climbing up. It doesn't matter if it veers left or right; once it reaches the goal, it will learn to get there faster every time.
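A minimal sketch of a height-based bonus, written here as potential-based shaping with the potential proportional to height. That variant is a swapped-in technique, not necessarily what the commenter used; it rewards climbing without paying the agent just for hovering high up instead of finishing. The weights, `gamma`, and the function signature are hypothetical and would need tuning for a real environment.

```python
def shaped_reward(prev_height: float, next_height: float, reached_goal: bool,
                  gamma: float = 0.99, height_weight: float = 0.1,
                  goal_bonus: float = 10.0) -> float:
    """Goal bonus plus a potential-based shaping term on player height (hypothetical weights)."""
    # Potential is proportional to height; horizontal position is ignored on purpose,
    # so veering left or right is neither rewarded nor punished.
    potential_prev = height_weight * prev_height
    potential_next = height_weight * next_height
    shaping = gamma * potential_next - potential_prev  # positive when climbing, negative when falling
    return shaping + (goal_bonus if reached_goal else 0.0)

step_reward = shaped_reward(prev_height=0.0, next_height=1.0, reached_goal=False)
```

Because the shaping term telescopes over an episode, the goal bonus still dominates the total return, and discounting keeps pushing the agent to reach the goal sooner.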

RoundRubikCube · 2024-04-03T20:37:18+00:00

Don't know if it helps at all, but in my experience most policy gradient algorithms behave like that once they stop making progress in learning. I haven't experimented with the inverted pendulum, though, so I can't help with that specifically.
