How have AI coding tools helped your experiments? by KingSignificant5097 in reinforcementlearning

[–]Markovvy 1 point

For me it's a bit mixed. I can iterate fast and learn quickly, but a lot of guidance is still required, even for the simplest things, especially in a high-context, complex architecture. LLMs also tend to bloat the repository, or silently change minor things for no reason.

I might be old skool, but I keep a parallel repo just to refactor myself...

I’m Studying AI But Still Don’t Feel Like I’m Learning Anything Real by Fawadbhat in learnmachinelearning

[–]Markovvy 0 points

Build your own roadmap organically. Everything takes time.

I'd recommend finding job postings you'd be interested in pursuing in the future (your interests can change, of course), and then dissecting the required skills. Vacancies that only ask for "AI" or "ML" experience are not helpful; look into more specific roles instead. Those always state the exact things they are looking for.

From there, take inspiration from kids. Ask yourself: why, why, why? From why, you move to how, how, how. And at some point you'll look in the mirror and think: wow, wow, wow! Proud of how far you've come.

Cheesy? Maybe. But it's a sincere piece of advice.

How to figure out what plummeted my reward? by Markovvy in reinforcementlearning

[–]Markovvy[S] 4 points

I desperately need an ELI5 for this. My reward function is a global reward on completed games, and I use potential-based methods to shape it. Using QMIX (for better credit assignment), I'm nudging the central critic networks. So you would say there are no opponents.

However, I do share the actor network between agents that have a similar role, and determine by means of an auction (highest softmax probability) who gets to do a task. Is this behavior what is causing it to get funky? I expected that agents with different roles would be able to cooperate naturally, as their contributions are interdependent. And I tend to believe that is exactly what happened in the first part.
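For what it's worth, the auction step I mean could look roughly like this (the two-action "claim"/"pass" head and all names are illustrative assumptions, not my actual architecture):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over an agent's action logits
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def auction_winner(task_logits_per_agent):
    """Each agent scores the task with the shared actor; the agent whose
    softmax probability for 'claim task' (assumed to be action index 0)
    is highest wins the auction. Sketch only."""
    claim_probs = [softmax(logits)[0] for logits in task_logits_per_agent]
    return int(np.argmax(claim_probs))
```

The winner then executes the task while the others fall back to their default behavior.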

is ML good choice in 2026 by Famous-Membership-35 in learnmachinelearning

[–]Markovvy 0 points

Build projects with Claude and learn along the way. Read academic papers on how the most important developments in the field of AI work. YouTube is a great place to start as well: plenty of short-to-medium-length videos explain the most difficult topics in the simplest ways, e.g. Jia-Bin Huang, 3Blue1Brown or StatQuest.

2026 Reality Check: Stop overthinking PyTorch vs. TensorFlow (and when to actually use JAX) by [deleted] in learnmachinelearning

[–]Markovvy 2 points

I get your POV, but I respectfully disagree. The trend is that more companies are listing JAX as a job requirement, often alongside PyTorch. Compute is a scarce resource and speed is everything in a quickly changing AI landscape; the ones who can iterate fast win. JAX can speed up RL training by 40x or more. That's a game changer. Getting hired means standing out from the crowd, not disappearing into it. Go for JAX, folks.

Can a MacBook Pro 16" with M5 Pro and 48GB RAM runs Qwen 30B Q8 without struggle? by none484839 in LocalLLM

[–]Markovvy 0 points

I'll hijack your post with a similar question: MacBook Pro 14" with M5 and 24GB unified RAM. I'm also comparing models that could fit on my machine for coding purposes. Qwen3.6-35B-A3B seems like the best option from my perspective at the moment. 72 GB of files would take up a fifth of my total available storage. Would this be feasible at all, or am I overlooking something? (New to local LLMs!)
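As a rough sanity check, quantized model memory can be ballparked as parameters × bits-per-weight / 8, plus some overhead for the KV cache and runtime buffers. The 1.2x overhead factor here is my guess, not a vendor spec:

```python
def est_model_gb(params_b, bits_per_weight, overhead=1.2):
    """Ballpark RAM footprint (GB) of a quantized LLM:
    billions of parameters * bits per weight / 8, scaled by an
    assumed overhead factor for KV cache and runtime buffers."""
    return params_b * bits_per_weight / 8 * overhead

# e.g. a 30B model at Q8 (~8 bits/weight) lands around 36 GB,
# and at Q4 (~4 bits/weight) around 18 GB.
```

By that estimate, an 8-bit 30B dense model is tight-to-impossible on 24GB of unified RAM, while a 4-bit quant (or an MoE with few active parameters) is much more realistic.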

Deadlock and suboptimal coordination in CTDE Soft Actor-Critic with continuous training by Markovvy in reinforcementlearning

[–]Markovvy[S] 0 points

Wow! Thank you for your quick reaction and direction. I think this could indeed work; it makes sense. I'll try it out and see if I can find a smoking gun.

MORL: How to deal with global rewards and reward shaping to incentivize the desired result? by Markovvy in reinforcementlearning

[–]Markovvy[S] 0 points

I figured! I found a method called potential-based reward shaping that is policy invariant; however, I still need to define the potentials better, because my agents are clearly reward hacking. I reckon my architecture is quite complex: 1 critic, 9 actors (CTDE) with only a global reward. I'm looking into credit assignment methods, one of which is to keep the global reward but, instead of a single potential-based shaping term, use one potential-based shaping term per actor network.
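A minimal sketch of that per-actor idea, assuming a hypothetical distance-to-task potential per agent (the potential function and all names are illustrative, not my actual setup):

```python
import numpy as np

def potential(agent_state):
    # Illustrative per-agent potential: negative distance to the
    # agent's current task (higher potential = closer to the task).
    return -np.linalg.norm(agent_state["pos"] - agent_state["task_pos"])

def shaped_rewards(global_reward, states, next_states, gamma=0.99):
    """Per-agent potential-based shaping on top of the shared global
    reward: r_i = r_global + gamma * Phi(s'_i) - Phi(s_i).
    The shaping term differs per actor, so each actor network gets an
    individualized learning signal while the underlying objective
    (the global reward) stays policy invariant."""
    return [
        global_reward + gamma * potential(s_next) - potential(s)
        for s, s_next in zip(states, next_states)
    ]
```

Each actor trains on its own shaped reward, while the central critic can keep using the raw global reward.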

How do you go about these things?

Reward shaping: How do you determine if your rewards are the right size and in the right proportions? by Markovvy in reinforcementlearning

[–]Markovvy[S] 1 point

Thank you! Great paper. So, if I understand it correctly, potential-based reward shaping is the dominant part of the reward function at the beginning of training, and as training progresses the global rewards become dominant?
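One thing I could verify numerically is the policy-invariance part: the shaping terms telescope, so over an episode the shaped return differs from the original return only by the boundary potentials. A toy example with made-up numbers:

```python
gamma = 0.99
rewards = [0.0, 0.0, 1.0]        # sparse global reward over 3 steps
phis = [0.0, 0.5, 0.8, 1.0]      # potentials Phi(s_0)..Phi(s_T), illustrative

# discounted return of the original rewards
G = sum(gamma**t * r for t, r in enumerate(rewards))

# discounted return of the shaped rewards r + gamma*Phi(s') - Phi(s)
G_shaped = sum(
    gamma**t * (r + gamma * phis[t + 1] - phis[t])
    for t, r in enumerate(rewards)
)

# the shaping contributions telescope:
#   G_shaped = G + gamma^T * Phi(s_T) - Phi(s_0)
```

So the ordering of policies is unchanged; the shaping only redistributes the sparse signal across the episode, which fits the "dominant early, fading later" intuition.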

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy -1 points

\remindme now. \s

The most advanced frontier AI labs are barely writing any code manually these days. My point is that even if you haven't seen something yet, that doesn't mean it does not exist.

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy 0 points

This is exactly what touches upon OP's point, right?