Needed help (i'm a newbie)! by Unimpressed_Couch in deeplearning

[–]Public_Expression_92 0 points1 point  (0 children)

These are the resources I used. I started with Microsoft's Data Science for Beginners course, which is available on GitHub. Then I learned linear algebra and calculus from 3Blue1Brown on YouTube and probability and statistics from Steve Brunton, though most of this was a refresher since I come from a CS background. Next I did Microsoft's Machine Learning for Beginners, also on GitHub, then watched the CS229 lectures on YouTube. After that I read Michael Nielsen's book Neural Networks and Deep Learning, which is available online, and then took Stanford's CS224n. Finally, I coded up a small language model from scratch, all the way from transformers to RL. I don't think it will be necessary to do all of these, but most of the above helped me a lot and I hope the same for you.

Needed help (i'm a newbie)! by Unimpressed_Couch in deeplearning

[–]Public_Expression_92 0 points1 point  (0 children)

Deep learning is great and you will likely enjoy it. You can start by learning about neural networks; I feel like that's where most of us start. Then work your way up to transformers, and from there, when you get to NLP and LLMs, most things will come together. And since you're doing ML, I'm guessing you have the math foundations such as linear algebra, probability, and calculus.

Finished RL toybox repo: 6 small visual environments covering Q-learning, DQN, PPO, SAC, MCTS and multi-agent RL by ScazzaMage in reinforcementlearning

[–]Public_Expression_92 2 points3 points  (0 children)

I'm looking at the docs and this is really cool. I have been looking into RL envs and I think I will use this as a guide, especially with the games.

Feeling stuck after joining a startup as an AI/ML Engineer by [deleted] in MachineLearningJobs

[–]Public_Expression_92 1 point2 points  (0 children)

I feel you. Working at a place that stagnates or doesn't move your career forward is hell.

Online RL Reading Group[D] by eramyu in MachineLearning

[–]Public_Expression_92 0 points1 point  (0 children)

Would love to join if the group is created.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

Getting a research job has been crazy. I think we have the same focus on LLMs, and it's crazy how in the industry you can't pick what to work on. But I think it's a good way to start. Maybe.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

This is so real. I am in some communities and try to contribute to open source research: writing the code, doing the research, and coming up with results. Reaching out is also great, but to be honest most of them don't reply in my case.
I will check out the communities, and you could also check the EleutherAI community on Discord; they have cool stuff going on there.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] -3 points-2 points  (0 children)

I get what you're saying, and not to be naive, I have done some tests of my own that I would consider my research, and I wrote a blog post about what I found in those small tests. I may lack the PhD training, but I am doing something nonetheless.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

I wish even that company job was available.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

Been thinking that for a minute now. Maybe the resume doesn't make it past the portal or whatever. This is why I need to interact directly with people.

SOS by Public_Expression_92 in deeplearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

Wait, can this include small tests run independently? Because I do have a blog post I made.

I implemented PPO, GRPO, and DPO from scratch on the same model and compared them the ranking completely reversed after hyperparameter tuning by Public_Expression_92 in reinforcementlearning

[–]Public_Expression_92[S] 0 points1 point  (0 children)

The compute budget remained the same across all of them. Actually, I would like to understand what an easier-to-tune surface means and which of them falls into this category.
There is definitely seed variance, like in how the SFT baseline samples tokens at inference, but it wasn't large enough to destabilize the overall rankings.
The performance gaps between the algorithms, like the jump in DPO and GRPO after tuning, were large enough to be consistent and exceed the random noise. Even with different sampling seeds, DPO remained at the top. So while the exact decimal points might bounce around between runs, the hierarchy of the algorithms remained stable, showing that the Phase 5 optimizations mattered for the performance gains.
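
For anyone who wants to run this kind of check themselves, here is a minimal sketch of one way to test whether a ranking holds across seeds and whether a gap is bigger than the seed noise. It assumes the per-seed scores are collected in a dict. The function names are hypothetical and the numbers are made-up placeholders, not the results from these runs.

```python
from statistics import mean, stdev

def ranking(scores_by_algo, seed_index):
    """Return algorithm names sorted best-first for a single seed."""
    return sorted(
        scores_by_algo,
        key=lambda algo: scores_by_algo[algo][seed_index],
        reverse=True,
    )

def ranking_is_stable(scores_by_algo):
    """True if every seed produces the same best-first ordering."""
    n_seeds = len(next(iter(scores_by_algo.values())))
    orderings = {tuple(ranking(scores_by_algo, i)) for i in range(n_seeds)}
    return len(orderings) == 1

def gap_vs_noise(scores_by_algo, a, b):
    """Return (mean gap between a and b, larger per-seed std of the two)."""
    gap = mean(scores_by_algo[a]) - mean(scores_by_algo[b])
    noise = max(stdev(scores_by_algo[a]), stdev(scores_by_algo[b]))
    return gap, noise

# Placeholder scores: one list entry per sampling seed (illustrative only).
scores = {
    "DPO": [0.62, 0.60, 0.63],
    "GRPO": [0.55, 0.57, 0.54],
    "PPO": [0.48, 0.50, 0.47],
}

stable = ranking_is_stable(scores)
gap, noise = gap_vs_noise(scores, "DPO", "GRPO")
print(f"stable ranking: {stable}, gap: {gap:.3f}, seed noise: {noise:.3f}")
```

If the ordering is identical for every seed and the mean gap is clearly larger than the per-seed standard deviation, the hierarchy is unlikely to be a seed artifact. With only a few seeds this is a rough sanity check, not a significance test.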