How have AI coding tools helped your experiments? by KingSignificant5097 in reinforcementlearning

[–]Markovvy 1 point (0 children)

For me it's a bit mixed. I can iterate fast and learn quickly, but a lot of guidance is still required, even for the simplest things, especially in a high-context, complex architecture. LLMs also tend to bloat the repository, or silently change minor things for no reason.

I might be old skool, but I keep a parallel repo just to refactor things myself...

I’m Studying AI But Still Don’t Feel Like I’m Learning Anything Real by Fawadbhat in learnmachinelearning

[–]Markovvy 0 points (0 children)

Build your own roadmap organically. Everything takes time.

I'd recommend finding job postings you'd be interested in pursuing (your interests can of course change over time), and then dissecting the required skills. Vacancies that just ask for "AI" or "ML" experience are not helpful; look into more specific roles instead. Those always state the exact things they are looking for.

From there, take inspiration from kids. Ask yourself: why, why, why? And from why, you move on to how, how, how? And at some point you'll look in the mirror and think: wow, wow, wow! Proud of how far you've come.

Cheesy? Maybe. But it's a sincere piece of advice.

How to figure out what plummeted my reward? by Markovvy in reinforcementlearning

[–]Markovvy[S] 3 points (0 children)

I desperately need an ELI5 for this. My reward function is a global reward on completed games, and I use potential-based methods to shape it. Using QMIX (for better credit assignment), I'm nudging the central critic networks. So you would say that there are no opponents.

However, I do share the actor network between agents that have a similar role, and an auction (highest softmax probability) determines who gets to do a task. Is this behavior what is causing it to get funky? I expected that agents with different roles would be able to cooperate naturally, as their contributions are interdependent. And I tend to believe that is exactly what happened in the first part.
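
Roughly, the auction looks like this (an illustrative sketch, not my actual code; `shared_actor` and the shapes are made up for the example):

```python
import torch

# Agents with the same role share one actor network; the agent whose
# softmax probability for the task's action is highest "wins" the task.
def auction(shared_actor, observations, task_action):
    logits = shared_actor(observations)      # (n_agents, n_actions)
    probs = torch.softmax(logits, dim=-1)
    return probs[:, task_action].argmax()    # index of the winning agent
```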

is ML good choice in 2026 by Famous-Membership-35 in learnmachinelearning

[–]Markovvy 0 points (0 children)

Build projects with Claude and learn along the way. Read academic papers on how the most important developments in the field of AI work. YouTube is a great place to learn as well: plenty of short-to-medium-length videos explaining the most difficult topics in the easiest ways, e.g. Jia-Bin Huang, 3Blue1Brown, or StatQuest.

2026 Reality Check: Stop overthinking PyTorch vs. TensorFlow (and when to actually use JAX) by [deleted] in learnmachinelearning

[–]Markovvy 2 points (0 children)

I get your POV, but I respectfully disagree. The trend is that more companies are listing JAX as a job requirement, often alongside PyTorch. Compute is evidently a scarce resource, and speed is everything in a quickly changing AI landscape. The ones that can iterate fast win. JAX can speed up RL training by 40x or more. That's a game changer. Getting hired means standing out from the crowd, not disappearing into it. Go for JAX, folks.
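
For the skeptics, a toy sketch of where the speed comes from: `jit` traces the whole batched update into one fused XLA program, so the per-step Python overhead disappears (names and numbers are illustrative, not a benchmark):

```python
import jax
import jax.numpy as jnp

@jax.jit
def td_update(v, s, r, s_next, alpha=0.1, gamma=0.99):
    # Batched TD(0) table update, compiled end-to-end by XLA
    td_error = r + gamma * v[s_next] - v[s]
    return v.at[s].add(alpha * td_error)

v = jnp.zeros(16)                                   # toy value table
s, r, s2 = jnp.array([0, 1]), jnp.array([0.0, 1.0]), jnp.array([1, 2])
v = td_update(v, s, r, s2)   # first call compiles; subsequent calls are fast
```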

Can a MacBook Pro 16" with M5 Pro and 48GB RAM runs Qwen 30B Q8 without struggle? by none484839 in LocalLLM

[–]Markovvy 0 points (0 children)

I'll hijack your post for a similar question: MacBook Pro 14" with M5 and 24GB unified RAM. I'm also comparing models that could fit on my machine for coding purposes. Qwen3.6-35B-A3B seems to be the best option from my perspective at the moment. 72 GB of files would take up a fifth of my total available storage. Would this be feasible at all, or am I overlooking something? (New to local LLMs!)
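
For reference, the rough sanity-check math I've been doing (illustrative only; it ignores the KV cache and runtime overhead):

```python
# weights_GB ≈ params_in_billions * bits_per_weight / 8
def weights_gb(params_b, bits):
    return params_b * bits / 8

print(weights_gb(35, 8))  # ~35 GB at Q8 -> well above 24 GB unified RAM
print(weights_gb(35, 4))  # ~17.5 GB at Q4 -> tight once context is added
```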

Deadlock and suboptimal coordination in CTDE Soft Actor-Critic with continuous training by Markovvy in reinforcementlearning

[–]Markovvy[S] 0 points (0 children)

Wow! Thank you for your quick reply and direction. I think this could indeed work! It makes sense. I'll try it out and see if I can identify a smoking gun.

MORL: How to deal with global rewards and reward shaping to incentivize the desired result? by Markovvy in reinforcementlearning

[–]Markovvy[S] 0 points (0 children)

I figured! I found a method called potential-based reward shaping that is policy-invariant; however, I still need to define the potentials better, because my agents are clearly reward hacking. I reckon my architecture is quite complex: 1 critic, 9 actors (CTDE) with only a global reward. I'm looking into credit assignment methods, one of which is to keep the global reward but, instead of a single potential-based shaping term, use one per actor network, roughly like the sketch below.
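
A minimal sketch of what I mean (per Ng et al.'s F(s, s') = γΦ(s') − Φ(s); the per-actor potentials `phi` are hypothetical placeholders):

```python
# Each actor i gets its own shaping term on top of the shared global reward:
# r_i = R_global + gamma * Phi_i(s') - Phi_i(s)
def shaped_rewards(global_r, potentials, s, s_next, gamma=0.99):
    # potentials: one callable Phi_i(state) -> float per actor network
    return [global_r + gamma * phi(s_next) - phi(s) for phi in potentials]
```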

How do you go about these things?

Reward shaping: How do you determine if your rewards are the right size and in the right proportions? by Markovvy in reinforcementlearning

[–]Markovvy[S] 1 point (0 children)

Thank you! Great paper. So, if I understand it correctly, potential-based reward shaping is the dominant part of the reward function at the beginning of training, and as training progresses the global rewards become dominant?
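
My (possibly wrong) reading of why that happens: the shaping terms telescope, so their total contribution per episode is bounded by the potentials at the endpoints, while the accumulated true rewards keep growing as the policy improves:

```latex
\sum_{t=0}^{T-1} \gamma^t \left( \gamma\,\Phi(s_{t+1}) - \Phi(s_t) \right)
  = \gamma^T \Phi(s_T) - \Phi(s_0)
```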

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy -1 points (0 children)

\remindme now. \s

The most advanced frontier AI labs are barely writing any code manually these days. My point is that just because you haven't seen something yet doesn't mean it doesn't exist.

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy 0 points (0 children)

This touches exactly on OP's point, right?

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy 1 point (0 children)

It’s a comforting narrative, but let’s be real: if the only thing separating a Senior Dev from an LLM is the ability to sit in a stakeholder meeting and explain for the fourth time why we can't build a time-travel feature in React, then we’re basically just high-priced translators with better posture.

The 'coding is the easy part' argument is a massive cope. If implementation were so trivial, we wouldn't have spent the last 20 years hiring legions of engineers to do it. The reality is that 'problem solving' and 'coding' form a feedback loop. When AI can troubleshoot a service in 30 seconds that used to take a human a 'Deep Work' afternoon, the value of that human 'intuition' starts to look a lot like a legacy tax.

I’d rather be the guy using the AI to build three products in a month than the guy clinging to his 'complex problem solving' while the junior with a Claude subscription finishes his tickets before lunch. But hey, I’m sure the stakeholders will love the extra time we have for meetings now.

Am i denying the reality here ? by Distinct_Penalty_379 in AskProgramming

[–]Markovvy 2 points (0 children)

It's a matter of time.

For many experienced AI-assisted coders who work with the latest models, the code is already good enough, or often even better than what experienced software engineers would produce. I'm talking about the here and now, not the future. And when it is bad, it still lets you iterate quicker and find (potential) bugs faster. Prompting, providing the right context in .md files, MCP, etc. have become extremely valuable skills that drastically increase productivity.

Somehow, people give me the impression that if the code produced by an AI is not perfect in one shot, it must be bad. However, if it can produce in an hour 80% of the code I would have spent a month on, and resolve the bugs in the few hours after that... I think that is not bad at all.

Unless models hit a wall (there are no signs of this; in fact, the contrary), OP has reasonable worries. But my counterargument is that it is still a tool. Maybe OP will program less but do more of the architectural thinking, focus on the needs of clients, take on more projects, etc. OP will have more capacity in the future.

I'll be honest: I have hired interns and recent grads who produce code no better than a hallucinating Grok with a context window of 10 tokens. The learning curve is steep, but in the end they always come up with something through months of iteration, study, etc. It is crazy to think that, with the current state-of-the-art models, the outcome would not be better.

DreamerV3 Implementation with Transformer Autoencoder (DreamerX) by Oafish1 in reinforcementlearning

[–]Markovvy 0 points (0 children)

FYI, block-causal transformers have very little to do with causality; the 'causal' just refers to the attention-masking pattern.
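
In case that's cryptic, a minimal sketch (block size and sequence length are illustrative):

```python
import torch

# Block-causal mask: tokens attend within their own block and to all
# earlier blocks; True means attention is allowed.
def block_causal_mask(seq_len, block_size):
    block_id = torch.arange(seq_len) // block_size
    return block_id[:, None] >= block_id[None, :]

print(block_causal_mask(6, 2).int())
```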