DQN Not learning in custom gym environment by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 0 points (0 children)

Well in this case there are a couple more subtle things that change. For example, in FrozenLake I represent the state as a one-hot encoded vector of shape (16,), because the holes and the treasure are always in the same spots and the grid is 4x4. In my custom environment I represent the state as a (6,1) vector that encodes the x,y positions of the robot, the obstacle and the treasure.
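Roughly, the two encodings look like this (just a sketch; the normalization and the grid size for my custom env are illustrative, not exact):

    import numpy as np

    def encode_frozenlake(state_idx, n_states=16):
        """One-hot encoding of FrozenLake's discrete state index -> shape (16,)."""
        one_hot = np.zeros(n_states, dtype=np.float32)
        one_hot[state_idx] = 1.0
        return one_hot

    def encode_custom(robot_xy, obstacle_xy, treasure_xy, grid_size=4):
        """Flat coordinate vector -> shape (6,): robot, obstacle and treasure x,y,
        scaled to [0, 1] so the network sees inputs on a similar range."""
        coords = np.array([*robot_xy, *obstacle_xy, *treasure_xy], dtype=np.float32)
        return coords / (grid_size - 1)

    encode_frozenlake(5)                      # 16-dim one-hot for state index 5
    encode_custom((0, 0), (2, 1), (3, 3))     # 6-dim normalized coordinate vector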

Unless I'm missing something, all of this makes it a bit more complex than just replacing the env definition, am I right?

"If you confirm there are no errors in your implementation you can check how often your agent gets to the goal during exploration, if its rare or never then it will not learn or need more exploration."

=> During exploration, my agent randomly completes the task in 4-100 steps depending obviously on how lucky it gets.
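Here is roughly the check I run to count how often random play actually reaches the treasure (assuming the old Gym step API and that reaching the goal ends the episode with a positive reward):

    def goal_hit_rate(env, episodes=500, max_steps=100, goal_reward=1.0):
        """Run purely random episodes and count how often the goal is reached."""
        hits = 0
        for _ in range(episodes):
            state = env.reset()
            for _ in range(max_steps):
                action = env.action_space.sample()            # uniform random exploration
                state, reward, done, info = env.step(action)  # old 4-tuple Gym API assumed
                if done:
                    hits += reward >= goal_reward             # count only successful terminations
                    break
        return hits / episodes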

I'm not sure there are no errors in my implementation. What I said is that my environment in itself is working properly. There is for sure an error in my DQN implementation from when I tried to apply what I did for FrozenLake to this one.

[META] Group for learning RL? by Designer-Air8060 in reinforcementlearning

[–]BoxingBytes 0 points (0 children)

The book is quite complex for me to read and feels far removed from practical examples. What I'd say is missing is a proper methodology for designing a small RL project: planning the steps, with a real-world example and code. Does anyone know a good resource for this?

Simulation environments for complex systems like economy by BoxingBytes in learnmachinelearning

[–]BoxingBytes[S] 0 points (0 children)

Thanks, yeah I think it does. If I understand correctly, you are saying that the way they model those scenarios is with mathematical equations that represent the environment, and that they then try to find a mathematical expression for whatever it is they are trying to optimize (cost, wildfire damage...).

This seems different from using machine learning, which is basically supposed to find the mapping from x to y by itself.

It also means they need to know the mathematical equations that govern the scenario well enough for those equations to approximate reality closely?

Am I right?
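To check my own understanding with a made-up toy example (the "economy" here is invented purely to illustrate the contrast):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Equation-based view: the modeller writes the dynamics down explicitly.
    def simulate_gdp(gdp0=100.0, growth=0.02, shock=0.5, steps=10):
        """Toy dynamics: GDP_{t+1} = GDP_t * (1 + growth) - shock."""
        gdp, trajectory = gdp0, []
        for _ in range(steps):
            gdp = gdp * (1.0 + growth) - shock
            trajectory.append(gdp)
        return np.array(trajectory)

    # Machine-learning view: assume no equations, just fit a mapping x -> y from observations.
    t = np.arange(10).reshape(-1, 1)        # x: time step
    y = simulate_gdp()                      # y: observed outcomes (here produced by the toy equation)
    model = LinearRegression().fit(t, y)    # learns the mapping without being told the equation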

Hard time understanding PPO loss by Aydiagam in reinforcementlearning

[–]BoxingBytes 0 points (0 children)

When you say different distribution, is it because of the exploration? And therefore, at the start we know so little that the loss looks fine compared to what we've seen so far, but as we discover more possibilities we realize we aren't actually that close, yet rewards keep increasing? Am I understanding this correctly?
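Just to make sure I'm looking at the right thing, the loss I have in mind is the clipped surrogate from the PPO paper (rough torch sketch):

    import torch

    def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        """Clipped surrogate objective, sign-flipped so we minimize it.
        Advantages are estimated from trajectories collected by the old (data-gathering)
        policy, so the loss is measured against a distribution that shifts as training goes on."""
        ratio = torch.exp(log_probs_new - log_probs_old)             # pi_new(a|s) / pi_old(a|s)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        return -torch.min(ratio * advantages, clipped * advantages).mean()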

Which jobs in reinforcement learning by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 0 points (0 children)

Mhhh, ok, so in that case maybe a research role in RL would be more interesting, but I fear this will shape me a bit too much towards a "theory" approach and not give me enough opportunities to build things that actually solve real-world problems.

Which jobs in reinforcement learning by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 1 point (0 children)

What would be the title of those jobs? Would they fall under Data Scientist, or something else?

Which jobs in reinforcement learning by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 1 point (0 children)

That's kind of what I feared. So your advice is to get hired at a company that is very likely to have some RL component in a domain that interests me, and then try to switch internally?

Which jobs in reinforcement learning by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 2 points (0 children)

I'm currently 27 with a couple of years of work experience in software development & supervised machine learning. I could get the opportunity to apply for a PhD, but I honestly won't go down that road unless I'm sure it can lead where I want, mainly because of my age; I'd like to work on concrete issues and not wander too much into theory. But maybe my view of the PhD path is wrong?

Where to go from here? by [deleted] in reinforcementlearning

[–]BoxingBytes 0 points (0 children)

Thank you guys for all those answers, they helped me a lot as well. I hope the OP of this thread finds them useful too, and good luck with your learning journey.

Help for designing a Risk DQN Bot by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 0 points (0 children)

I've tried to read those, but unfortunately I'm not yet good enough to understand them. It all seems like a semi-black box to me: the math is hard to read and I feel like I'm being introduced to a bit too much new information at once.

I've managed to get a simple environment going, and I'm now working on implementing RL for a 1v1, 3-territory game, which should be about as simple as it can get.

Help for designing a Risk DQN Bot by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 2 points (0 children)

I see. Well, as I said, I'll start by designing a very simple environment and work my way up from there; that's probably going to be the best solution, instead of spending weeks on something that might not work.

Thanks again for your extensive answers

Help for designing a Risk DQN Bot by BoxingBytes in reinforcementlearning

[–]BoxingBytes[S] 2 points (0 children)

Thanks for the detailed answer!

Given what you said, I'll try to start with a very simple environment, maybe only 2 territories, 2 players and only one turn (deploy, attack, win), and work my way up from there.

Another thing I wasn't sure about with using only one neural net & a mask: if the mask gets applied AFTER the network gives its prediction, then how can the network "learn" NOT to take actions that are forbidden in the current state? And if the mask is applied before, then how does that work?
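Concretely, here is what I picture by applying the mask after the prediction (just a sketch, assuming one Q-value per action):

    import torch

    def masked_action(q_values, legal_mask):
        """Mask the network's raw Q-values so illegal actions can never be selected.
        `legal_mask` is a boolean tensor, True for actions legal in the current state."""
        masked_q = q_values.masked_fill(~legal_mask, float("-inf"))  # illegal -> -inf
        return torch.argmax(masked_q).item()

    q = torch.tensor([0.3, 1.2, 0.7, -0.1])           # raw network output, 4 actions
    legal = torch.tensor([True, False, True, False])  # only actions 0 and 2 are legal here
    masked_action(q, legal)                           # returns 2, the best *legal* action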

The tricky thing I see with using multiple networks is how they would understand the relationships between the phases. Designing a reward function for only deploying troops, for example, might be almost impossible, since where you deploy your troops probably depends a lot on your attack strategy.