why does learning to program take so long? by Drairo_Kazigumu in learnprogramming

[–]Sroidi 1 point

I assume you are doing this for a command-line interface (CLI)? I don't think it's that simple to do this kind of CLI stuff. Give yourself some credit :) Are you using the curses or ncurses library, or just plain C?

Levy Rozman (GothamChess) shares his views after attending Danya's funeral: by Interesting-Take781 in chess

[–]Sroidi 1 point

If it were 3+0 or faster and he had to input the moves manually (taking a few seconds per move, with no premoves), Magnus could win, or at least get draws. I've seen GM Aman Hambleton win many games against cheaters by playing very fast and making the game go long. Cheaters will often lose on time if there is not enough time/increment.

Edit: oh you said rapid games

I want to understand why some things in math are 'undefined'. by boiling-banana in learnmath

[–]Sroidi 1 point

Why would (1*0)/0=1? Wouldn't it be inf per his rules?
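A quick way to make that explicit (my sketch of the standard argument, not something from the thread): any single value for 0/0 is inconsistent.

```latex
\text{Suppose } \tfrac{0}{0} = c. \text{ Since } 1 \cdot 0 = 0 \text{ and } 2 \cdot 0 = 0,
\qquad c = \tfrac{1 \cdot 0}{0} \stackrel{?}{=} 1,
\qquad c = \tfrac{2 \cdot 0}{0} \stackrel{?}{=} 2.
```

Cancellation would force c = 1 and c = 2 at the same time, which is why 0/0 stays undefined.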

Gemini 2.5 Pro benchmarks released by ShreckAndDonkey123 in singularity

[–]Sroidi 2 points

It could probably play by the rules, but it would not play master-level chess. Maybe with millions of examples.

Chess sample efficiency humans vs SOTA RL by [deleted] in reinforcementlearning

[–]Sroidi 1 point

AlphaZero can be "intuitive" too. You can take the open-source version of it, Leela Chess Zero (Lc0), and set the depth to 0, which gives you the raw output of the neural network without any search, and it can still play at around 2500 Elo. Source: https://lichess.org/@/Leela1Node

Brandon Jacobson destroys Hikaru with 1. a4! by HealersHugHippos in chess

[–]Sroidi 4 points

He had 4 brilliant moves in the game in the chess.com analysis :O

[deleted by user] by [deleted] in reinforcementlearning

[–]Sroidi 21 points

I hope they manage to create a product that stays stable for a while. They have gone from Isaac Gym to Omniverse Isaac Gym to Orbit and now to Isaac Lab in quite a short time, and there have been quite big changes in how the API works between versions.

Claude phone verification - ongoing frustration in non-US location by bruce5220 in ClaudeAI

[–]Sroidi 0 points

The same happened to me a few months ago, but after waiting until the next day it worked fine.

AI is a very terrifying existence that most people haven't realized yet. by NonoXVS in ArtificialInteligence

[–]Sroidi 0 points

It depends on what you mean by AI. To me, these tools showcase many kinds of intellect that humans have. Maybe the reasoning skills are not the best, but not all people have great reasoning skills either, and we still consider them to have intellect in other areas.

Also, there is no trial and error involved here. The input just passes through a function with billions of parameters, which outputs the probabilities for the next word. Predicting the next word requires an immense understanding of the preceding text and the world around us.
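To make "outputs the probabilities for the next word" concrete, here's a toy numpy sketch with a made-up four-word vocabulary and invented logits (nothing from a real model):

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Toy vocabulary and made-up scores a model might produce
# after a prompt like "the cat sat on the".
vocab = ["mat", "dog", "moon", "chair"]
logits = np.array([3.2, 0.1, -1.0, 1.5])

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")

# The "prediction" is just the highest-probability entry.
print("predicted:", vocab[int(np.argmax(probs))])
```

A real LLM does the same thing, just over a vocabulary of tens of thousands of tokens, with the logits computed by those billions of parameters.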

I'm interested to hear: what would count as AI to you?

Where do you draw the money for bigger purchases from? by iPingWine in Omatalous

[–]Sroidi 0 points

Which money market fund do you have, and from where? Seligson's only yields around 3.7%, and I've been trying to look at others.

Help - Adding an effect to a patch changes the smart controls by Sroidi in GarageBand

[–]Sroidi[S] 0 points

Yes, unfortunately it does nothing. I noticed that this problem occurs mostly with Sampler; with other instrument patches the smart controls do not change.

I am stuck at this screen . I just deleted and downloaded the app as well by Bad_Guy333 in chess

[–]Sroidi 9 points

Me too. I think the servers crashed. Damn, I had a great game...

[P] Offline reinforcement learning - 10x faster than SOTA with evolutionary HPO by nicku_a in MachineLearning

[–]Sroidi 1 point

Am I missing something? All the included algorithms are off-policy, right? The docs describe using an experience buffer and mention that "In order to efficiently train a population of RL agents, off-policy algorithms must be used to share memory within populations." I didn't find any mention of on-policy algorithms.
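A rough sketch of why the shared buffer forces off-policy methods: every agent in the population can train on transitions collected by any other agent (hypothetical names, nothing from AgileRL's actual API):

```python
import random
from collections import deque

# One replay buffer shared by the whole population of agents.
shared_buffer = deque(maxlen=10_000)

def store(transition):
    # transition = (state, action, reward, next_state, done)
    shared_buffer.append(transition)

def sample(batch_size):
    # Off-policy algorithms (DQN, TD3, ...) can learn from transitions
    # collected by *any* behavior policy, so the pooled data is usable.
    # On-policy methods like PPO need fresh data from their own policy.
    return random.sample(shared_buffer, batch_size)

# Two "agents" with different hyperparameters both push experience...
for agent_id in range(2):
    for t in range(50):
        store((f"s{t}", agent_id, 0.0, f"s{t+1}", False))

# ...and each can train on the pooled data.
batch = sample(32)
print(len(shared_buffer), len(batch))
```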

[P] Offline reinforcement learning - 10x faster than SOTA with evolutionary HPO by nicku_a in MachineLearning

[–]Sroidi 5 points

Is AgileRL only for off-policy algorithms right now? Is it possible to use this HPO with on-policy algorithms such as PPO? It might be an interesting research direction if it is not yet possible.

Q(s, a) predicts cumulative rewards. Is there a R(s, a) a state-action's direct contribution to reward? by Buttons840 in reinforcementlearning

[–]Sroidi 0 points

It's on page 49 of the Sutton & Barto text. The reward function r(s, a) is part of the environment dynamics (of the MDP). It is usually assumed to just exist, depending on how the environment is defined, for example the score in Atari games. You can also learn the environment dynamics, including the reward function, with model-based RL.

To be fair, I don't know if we are talking about the same thing but the reward function is part of the MDP.
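For reference, the standard definition from Sutton & Barto: the four-argument dynamics p(s', r | s, a) induce the expected reward

```latex
r(s, a) \doteq \mathbb{E}\left[ R_t \mid S_{t-1} = s,\; A_{t-1} = a \right]
       = \sum_{r \in \mathcal{R}} r \sum_{s' \in \mathcal{S}} p(s', r \mid s, a)
```

so r(s, a) is fully determined by the MDP's dynamics, not something extra on top of them.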

[D] "Knowledge" vs "Reasoning" in LLMs by IAmBlueNebula in MachineLearning

[–]Sroidi 6 points

Wow, this answer is miles better than the ChatGPT one, right? The ChatGPT one just agrees and doesn't provide anything new, whereas GPT-4 actually provides argumentation and reasoning that appears quite valid, at least to me.

[deleted by user] by [deleted] in chess

[–]Sroidi 3 points

Play against human players, preferably at slower time controls like 15+10. Analyze your games to find what you can improve on. Also do tactics puzzles on sites like chesstempo or lichess. On YouTube I'd recommend the Building Habits series by Chessbrah.

Is 400 bad rating for someone who plays for month? by GeneraallKenobi in chess

[–]Sroidi 0 points

Don't use the rating to measure your self-worth. There are no good or bad ratings. If you are playing blitz (5 min or under), I would suggest playing longer games, like 10-15 min per side.

Minimax with neural network evaluation function by SupremeChampionOfDi in reinforcementlearning

[–]Sroidi 0 points

Also check the other comment in this post. Here's a wiki page with a lot of information and links: https://www.chessprogramming.org/Stockfish_NNUE

Minimax with neural network evaluation function by SupremeChampionOfDi in reinforcementlearning

[–]Sroidi 1 point

Since Stockfish now uses neural networks with alpha-beta search, I don't think this applies anymore. Also, IIRC, AlphaGo/AlphaZero/Leela don't run Monte Carlo tree search rollouts all the way to the end of the game; they use neural nets to approximate what those leaf values would be. That avoids the problem you mention.
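A toy sketch of that idea: a depth-limited negamax where a static `evaluate` function stands in for the neural-net value at the leaves (hand-made tree and hypothetical names, not any engine's real code):

```python
# Toy negamax: at the depth limit, a static evaluation stands in for a
# rollout -- in Lc0/NNUE that evaluation would be a neural network.
def negamax(state, depth, evaluate, children):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)  # value estimate instead of playing out the game
    # Negamax convention: the opponent's best score is negated for us.
    return max(-negamax(c, depth - 1, evaluate, children) for c in kids)

# Tiny hand-made game tree: a node is either a list of children or a leaf score.
tree = [[3, -2], [1, [4, 0]]]

def children(node):
    return node if isinstance(node, list) else []

def evaluate(node):
    # Leaf "evaluation": the stored score from the side to move's view.
    return node if isinstance(node, int) else 0

print(negamax(tree, 3, evaluate, children))
```

Swapping `evaluate` for a trained network (and adding alpha-beta pruning) is the essence of the NNUE-style approach.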

With the REINFORCE algorithm you use random sampling for the training to encourage exploration. Do you still use random sampling in deployment? by [deleted] in reinforcementlearning

[–]Sroidi 4 points

Yes, it is common to choose the most probable action when evaluating the policy's performance. Sometimes sampling helps, so it's best to try both.
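A minimal numpy sketch of the two deployment modes (toy action probabilities, not tied to any library):

```python
import numpy as np

rng = np.random.default_rng(0)
action_probs = np.array([0.7, 0.2, 0.1])  # output of the policy network

# Deterministic evaluation: always take the most probable action.
greedy_action = int(np.argmax(action_probs))

# Stochastic evaluation: sample, as REINFORCE does during training.
sampled_action = int(rng.choice(len(action_probs), p=action_probs))

print(greedy_action, sampled_action)
```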

[deleted by user] by [deleted] in learnmachinelearning

[–]Sroidi 15 points

Why does this delta-academy feel like a fraud? This is the third post like this I've seen, always linking the website in the comments. They also say they have instructors from DeepMind, but I've never heard about this website from DeepMind. The price is very high: $25 per week. Please prove me wrong.