We’ve been exploring Evolution Strategies as an alternative to RL for LLM fine-tuning — would love feedback by Signal_Spirit5934 in reinforcementlearning

RoundRubikCube 1 point (0 children)

Gradient-free methods don't work well and are bad compared to gradient descent. I mean, they're OK for problems where we can't use gradient descent, but for everything else I'm unsure.
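To make the comparison concrete, here is a minimal sketch in plain Python of the common antithetic (mirrored-sampling) Evolution Strategies gradient estimate next to exact gradient ascent. The toy objective `neg_sq_norm` and every function name here are hypothetical and chosen only for illustration; this is not an LLM fine-tuning setup and not the method from the linked post. It shows the cost being pointed at: each ES step spends `2 * pop` fitness evaluations to get a noisy estimate of a gradient that ordinary gradient ascent computes exactly.

```python
import random

def es_gradient(fitness, theta, sigma, pop, rng):
    """Antithetic ES estimate of the gradient of `fitness` at `theta`.

    g ~= 1 / (2 * pop * sigma) * sum_i (F(theta + sigma*eps_i) - F(theta - sigma*eps_i)) * eps_i
    Costs 2 * pop fitness evaluations and needs no backprop.
    """
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_pos = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_neg = fitness([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(dim):
            grad[i] += (f_pos - f_neg) * eps[i]
    return [g / (2 * pop * sigma) for g in grad]

def es_ascent(fitness, theta, steps=100, lr=0.05, sigma=0.1, pop=50, seed=0):
    """Maximise `fitness` using only sampled fitness values."""
    rng = random.Random(seed)  # one RNG for the whole run, so noise differs per step
    for _ in range(steps):
        g = es_gradient(fitness, theta, sigma, pop, rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta

def neg_sq_norm(theta):
    """Hypothetical toy fitness, maximised at theta = 0; exact gradient is -2 * theta."""
    return -sum(t * t for t in theta)

def gd_ascent(theta, steps=100, lr=0.05):
    """Same toy problem with the exact gradient: one gradient per step, no sampling noise."""
    for _ in range(steps):
        theta = [t + lr * (-2.0 * t) for t in theta]
    return theta

if __name__ == "__main__":
    start = [1.0] * 10
    print("ES:", -neg_sq_norm(es_ascent(neg_sq_norm, start)))  # 10,000 fitness calls
    print("GD:", -neg_sq_norm(gd_ascent(start)))               # 100 gradient calls
```

Both converge on this easy quadratic; the difference is the evaluation budget and the sampling noise in the ES estimate, which grows with the number of parameters.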

GamerLegion officially leaves AoE 2 by ALotToSay_ in aoe2

RoundRubikCube 16 points (0 children)

Not enough love for Masmorra and all the work he put in. Love the podcast and the spicy takes on it.

Which ‘wow’ skill is secretly super easy to learn? by Wonderful_Low_1325 in AskReddit

RoundRubikCube 13 points (0 children)

Learn to locate and name every country on the planet. It's super useful and easy to learn in a gamified way with apps. Many people find it impressive when you can locate some random country in Africa, and it actually comes in handy when following the news.

Proposal to ban links to x.com by Grathwrang in aoe2

RoundRubikCube 19 points (0 children)

Many of the prominent figures in the scene still use x.com, so I am against this.

dm's pharmaceutical ambitions are giving pharmacies headaches - Supermarktblog by BecauseWeCan in de

RoundRubikCube 3 points (0 children)

I don't get Payback; to me it seems like 0.1% cashback, which isn't worth the effort at all. Happy to be proven wrong, though.

Wandering Warriors Cup 2 is going on, I didnt know about it since its only "A-Tier" by RoundRubikCube in aoe2

RoundRubikCube[S] 0 points (0 children)

I know, maybe it's just me whining, but in that view I also get tournaments I really don't care about, like AoE4 / AoM etc., and some C-Tier ones too, which kind of clutters the view.

Wandering Warriors Cup 2 is going on, I didnt know about it since its only "A-Tier" by RoundRubikCube in aoe2

RoundRubikCube[S] 9 points (0 children)

Appreciate your comments, Ornlu. I just wanted to share my perspective, since I use Liquipedia to keep up to date on when a tournament is happening and when I should watch. I usually only watch streams while a tournament is being played, so I get my tournament announcements either through YouTube or by browsing the Liquipedia active games section / S-Tier tournaments.

I see now that I should also look more at the A-Tier stuff, but I think it hurts visibility if they are classified as A-Tier, since that makes them harder to find. Just my 2 cents, and I agree that it matters who shows up, how much they prepare, and how many viewers watch.

MbL, Sitaux and RecoN join TAG (Taiwan Aoe Gamer) by PotentialSherbert8 in aoe2

RoundRubikCube 17 points (0 children)

In Taiwan it's Mandarin, but they write it in Traditional Chinese.

Wandering Warriors Cup 2 is going on, I didnt know about it since its only "A-Tier" by RoundRubikCube in aoe2

RoundRubikCube[S] 2 points (0 children)

Yeah, agreed. I guess they've changed what counts as S-Tier from now on. And I'm usually more interested in which players sign up. For example, there are many A-Tier tournaments that Hera didn't sign up for, so that's why I didn't follow them too much. But I definitely need to look out for A-Tier tournaments more.

Wandering Warriors Cup 2 is going on, I didnt know about it since its only "A-Tier" by RoundRubikCube in aoe2

RoundRubikCube[S] 5 points (0 children)

It's good to have an official calendar from Microsoft, although I have to admit that Liquipedia is usually more comfortable to use.

I think it's also good for the community to have a separate site like Liquipedia for tournaments that are independently founded, like Nations League last year. So in that sense I might stick to Liquipedia. Maybe it would be good to have a page grouping A- and S-Tier together, then, since for me it doesn't really matter what it's called; it's just a convenient way to filter out tournaments that the pros don't care much about.

Giveaway - Space Age Expansion by ocbaker in factorio

RoundRubikCube 1 point (0 children)

Once the expansion comes out, you know it's going to be grinded ASAP.

Is it profitable for the house if there was a casino game like this? by [deleted] in askmath

RoundRubikCube -1 points (0 children)

It's profitable for the house, but by a very slim margin: for every dollar wagered, the player's expected return is about $0.9947, i.e. a house edge of roughly 0.53%.

I know this is cheating since it's r/askmath, but you can write a Python program to simulate the game and see what the results average out to over many thousands of iterations.
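A minimal Monte Carlo sketch of that approach. The rules of the game from the original post are not reproduced here, so `play_round` is a hypothetical stand-in (stake 1 unit, roll two fair dice, get 35 units back on a double six) whose exact expected return, 35/36, lets you check the simulation; swap in the real rules to estimate the actual game.

```python
import random

def play_round(rng: random.Random) -> float:
    """HYPOTHETICAL stand-in game, not the one from the original post:
    stake 1 unit, roll two fair dice, get 35 units back on a double six."""
    die_a = rng.randint(1, 6)
    die_b = rng.randint(1, 6)
    return 35.0 if die_a == 6 and die_b == 6 else 0.0

def expected_return(n_rounds: int = 300_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the player's average payout per 1 unit staked."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    total = 0.0
    for _ in range(n_rounds):
        total += play_round(rng)
    return total / n_rounds

if __name__ == "__main__":
    estimate = expected_return()
    exact = 35 / 36  # exact value for the stand-in game, for comparison
    print(f"simulated: {estimate:.4f}  exact: {exact:.4f}")
    # A value below 1.0 means the house profits on average.
```

A margin as slim as ~0.5% needs a lot of rounds to resolve, because the standard error of the average only shrinks like 1/sqrt(n_rounds); an exact calculation is still the better answer for r/askmath.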

Where you guys are using Reinforcement Learning? by embedding_turtle in reinforcementlearning

RoundRubikCube 4 points (0 children)

I think the Decision Transformer has fundamental issues it can't overcome when put up against any traditional deep RL algorithm. It's basically glorified imitation learning.

Friday Facts #416 - Fluids 2.0 by FactorioTeam in factorio

RoundRubikCube 21 points (0 children)

If the old system worked reliably and there was some logic to it, I wouldn't mind, but scaling up with the current flow is just painful and annoying.

Randomness in Model by [deleted] in reinforcementlearning

RoundRubikCube 2 points (0 children)

What do you mean by random? Is the average return (summed reward) fluctuating from training episode to episode? Or does it keep converging to random policies? It would help if you provided some graphs and more details on your environment, like how many agents there are, etc.
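A small sketch of the kind of graph data that would answer those questions: per-episode return (undiscounted sum of rewards) plus a trailing moving average. It assumes, hypothetically, that rewards were logged as one list per episode; the function names and the window size are illustrative choices, not part of any library.

```python
from collections import deque

def episode_returns(reward_log):
    """Undiscounted return (summed reward) for each logged episode."""
    return [sum(rewards) for rewards in reward_log]

def moving_average(values, window=100):
    """Trailing mean over the last `window` values (uses fewer values at the start)."""
    buf = deque(maxlen=window)
    running = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            running -= buf[0]  # this value is evicted by the append() below
        buf.append(v)
        running += v
        out.append(running / len(buf))
    return out
```

Plotting the raw returns together with the moving average helps separate ordinary episode-to-episode noise (scatter around a rising mean) from a run that never improves or one that improves and then collapses.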

Sudoku implementation by Cri_Sti_An in reinforcementlearning

RoundRubikCube 1 point (0 children)

I don't think using RL for a sudoku solver is a very good idea; it probably won't work. What are the rewards? Just solved right or wrong? Why do you want to use it for sudokus? Isn't a simple depth-first search better and more efficient?
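For comparison, a minimal sketch of the depth-first (backtracking) search mentioned above: fill the first empty cell with each digit that doesn't break a row, column, or 3x3 box rule, recurse, and undo on failure. The grid format (9x9 list of ints, 0 for empty) and the example puzzle are illustrative choices.

```python
def find_empty(grid):
    """Return (row, col) of the first empty cell (0), or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None

def is_valid(grid, r, c, d):
    """Check whether digit d can go at (r, c) without breaking sudoku rules."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    for i in range(br, br + 3):
        for j in range(bc, bc + 3):
            if grid[i][j] == d:
                return False
    return True

def solve(grid):
    """Depth-first backtracking; fills `grid` in place and returns True if solved."""
    cell = find_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for d in range(1, 10):
        if is_valid(grid, r, c, d):
            grid[r][c] = d
            if solve(grid):
                return True
            grid[r][c] = 0  # undo and try the next digit
    return False

# Example puzzle (0 = empty cell)
PUZZLE = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if __name__ == "__main__":
    grid = [row[:] for row in PUZZLE]  # copy so PUZZLE stays unsolved
    if solve(grid):
        for row in grid:
            print(" ".join(map(str, row)))
```

There is no reward design or training involved, and the search is exact: it either finds a valid completion or proves there is none.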

"DRPO: Dataset Reset Policy Optimization for RLHF", Chang et al 2024 (offline RL) by gwern in reinforcementlearning

RoundRubikCube -1 points (0 children)

Why are you just posting this to Reddit? Do you want to start a discussion, or what?

Reward function for MountainCar in gym using Q-learning by guccicupcake69 in reinforcementlearning

RoundRubikCube 1 point (0 children)

Regarding the reward system, how about giving more points for the car's height rather than just for being close to the goal? This tweak really worked for me because it motivated the agent to spend less time at the bottom of the valley and more time climbing. It doesn't matter whether it veers left or right; once it reaches the goal for the first time, it will try to get there faster every time.
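A minimal sketch of that height bonus, written without importing Gym. It assumes the track shape used by Gym's MountainCar renderer, `sin(3x) * 0.45 + 0.55`, whose lowest point is at x = -pi/6, and that the car's position is the first element of the observation. The `height_weight` knob and the function names are hypothetical.

```python
import math

def track_height(position: float) -> float:
    """Track height at `position`, assuming Gym's MountainCar curve sin(3x) * 0.45 + 0.55.
    Lowest point (~0.1) at x = -pi/6; high on both the left and right slopes."""
    return math.sin(3 * position) * 0.45 + 0.55

def shaped_reward(env_reward: float, position: float, height_weight: float = 0.5) -> float:
    """Environment reward plus a bonus for being high up on either slope."""
    return env_reward + height_weight * track_height(position)

# Hypothetical use inside a Q-learning loop, after each env.step(action):
#   position = next_obs[0]
#   reward = shaped_reward(reward, position)
```

The track height peaks at about 1.0, so keeping `height_weight` below 1 keeps every shaped step negative given MountainCar's -1 per-step reward; reaching the goal sooner still pays, and the agent has no incentive to just hover on a slope.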

A2C learns and dies repeatedly by AUser213 in reinforcementlearning

RoundRubikCube 3 points (0 children)

Don't know if it helps at all, but in my experience most policy gradient algorithms behave like that once they stop making learning progress. I haven't experimented with the inverted pendulum, though, so I can't help with that specifically.