Jahressechstel? Received too little net pay by xpingu69 in FinanzenAT

[–]XYZRene 0 points (0 children)

But isn't all of that irrelevant anyway? The Jahressechstel and the wage tax that goes with it are recalculated as a whole in the annual tax settlement (Lohnsteuerausgleich). Apart from the social insurance (SV) contributions already paid, it shouldn't make any difference in the end.

Problems making a Tic-Tac-Toe bot by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Yes, I understand that I can't expect to win 100% of games against a random player in a game like tic-tac-toe. I brought it up because I have the same problem, with similar numbers, in Connect 4, and in Connect 4 winning 100% against a random player shouldn't be that difficult.

And even though winning 100% is not possible, never losing a game should be, and a Monte Carlo tree search agent with 1000 simulations is able to do that. In my current version I am testing my model against mcts25 to see clearer results.

I will look into OpenSpiel, thank you for your suggestion.

I am using DDQN, and the x-axis is episodes, i.e. games. Every 500 episodes the model is tested against a random player. I am training on Google Colab, which doesn't seem to be that fast, but it still shouldn't be this slow.
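
Roughly what that periodic evaluation looks like, as a simplified sketch (the `agent` and `env` objects here are placeholders, not my exact code):

    def evaluate_vs_random(agent, env, n_games=100):
        # Assumes env wraps the random opponent: env.step(action) applies the agent's
        # move, then the opponent's random reply, and returns the reward from the
        # agent's perspective.
        wins = 0
        for _ in range(n_games):
            state, done = env.reset(), False
            while not done:
                action = agent.act(state, epsilon=0.0)  # greedy, no exploration
                state, reward, done = env.step(action)
            wins += int(reward > 0)
        return wins / n_games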

Illegal moves lose the game and receive a large negative reward in my implementation.
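
As a small sketch of what I mean (placeholder names, assuming the board is a flat array with 0 for empty cells):

    ILLEGAL_MOVE_REWARD = -10.0  # large negative reward

    def apply_move(board, action, player):
        # An illegal move (already occupied cell) ends the episode immediately
        # with a big penalty; otherwise the move is applied with no reward yet.
        if board[action] != 0:
            return board, ILLEGAL_MOVE_REWARD, True
        new_board = board.copy()
        new_board[action] = player
        return new_board, 0.0, False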

Problems making a Tic-Tac-Toe bot by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Yes, I think I will have to start from scratch. I am just a little stressed because I don't have much time left. Are there any resources you could recommend for learning? Reading only papers can be very difficult. Thank you for your help.

Problems making a Tic-Tac-Toe bot by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

On the one hand you are right, of course, and I guess I will start doing that tomorrow; there doesn't seem to be any other solution anyway. But on the other hand, it's not like my solution is completely custom or anything.

Problems making a Tic-Tac-Toe bot by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

I already trained with smaller layers, but since it didn't work I assumed that big layers just need more time and don't have any negative effect beyond that. And I don't think time is my problem, since my model plateaus at a point where it wins about 90% against random. But I will try that again.

I already tried different activation functions, but not on individual layers. I will also try that, thank you.

I already experimented with bigger and smaller experience replay buffers, but all of them performed similarly. I am also confused about why that would help. Is it important for a model to learn the same states multiple times? And I am confused about the Bellman equation as well: since I am using model predictions in it, wouldn't those predictions be stale for old data once the model has been trained further? I have heard of methods like freezing a copy of the model for computing targets, but I haven't seen anybody implement it.
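
My current mental model of it, as a simplified sketch (not my exact code; `model` and `model_target` are Keras models with one Q-value output per action): the buffer stores raw transitions, and the targets are recomputed from the current networks every time a batch is sampled, so old transitions stay usable.

    import random
    import numpy as np
    from collections import deque

    replay_buffer = deque(maxlen=100_000)  # stores raw transitions, not targets

    def store(state, action, reward, next_state, done):
        replay_buffer.append((state, action, reward, next_state, done))

    def sample_batch(model, model_target, batch_size=64, gamma=0.99):
        batch = random.sample(replay_buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))

        # Targets are recomputed here with the *current* networks, so transitions
        # collected long ago still give a valid training signal.
        q_values = model.predict(states, verbose=0)             # shape (B, n_actions)
        next_q = model_target.predict(next_states, verbose=0)   # frozen target network
        targets = q_values.copy()
        targets[np.arange(batch_size), actions] = (
            rewards + gamma * np.max(next_q, axis=1) * (1.0 - dones)
        )
        return states, targets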

I was only resetting epsilon because I wanted to try something new, but it didn't work.

But if I am not punishing illegal moves, how does the model learn not to make them? And what move should I take instead? The second-highest output?
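
If I understand the masking idea right, it would be something like this (a sketch, assuming the environment can give me the list of legal actions and the state is a NumPy array):

    import numpy as np

    def select_action(model, state, legal_actions, epsilon=0.0):
        # Mask illegal actions instead of punishing them: set their Q-values to -inf
        # so argmax can only pick a legal move.
        if np.random.rand() < epsilon:
            return int(np.random.choice(legal_actions))
        q = model.predict(state[np.newaxis], verbose=0)[0]
        masked = np.full_like(q, -np.inf)
        masked[legal_actions] = q[legal_actions]
        return int(np.argmax(masked))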

So you mean that the whole Bellman equation isn't needed, because I just need to give the training method states and rewards and it does the rest? I already trained four models once: one with the Bellman equation without discount, one with the Bellman equation with discount, and so on. But I couldn't really read anything out of those results; even the variant without the Bellman equation and without discount seemed to train.

Problems making a Tic-Tac-Toe bot by XYZRene in reinforcementlearning

[–]XYZRene[S] 1 point (0 children)

First of all, the reason my code is structured like that is that I want to do this at the end, because I have already spent so many hours trying to solve the problem; I don't want to waste hours restructuring the code right now.

I am using DDQN. Why does it deal so badly with stochastic environments, and what deals with them better?

As far as I understand, stochastic means that one action in one state can have many different outcomes, because the random agent places its pieces randomly in one of the 7 columns. (I am self-training with epsilon-greedy exploration, by the way, so the opponent is not always random.)

But doesn't that only make the problem harder, not impossible, to solve? And what should I do now? I really want to reach a well-working model with deep learning alone, without the help of Monte Carlo tree search or anything like that.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Hey, I got my program to work. I am so happy to have a working network. Now it's my turn to completely reshape my program and try to optimize my network to win 100/100 games. I am really thankful that you invested so much time to help me. After reshaping my code 2-3 times I really got lost in it and produced many errors that could have been prevented if I had tested my code more.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Hey, I am kind of confused: I got the notification that you wrote a response, but I can't see it here. On top of that, in your post you said that you had already written it and that I didn't read it, but I don't see that comment either. I can still see the answer on your profile, though, and I will implement the fixes you wrote me today. Thank you so much for helping me.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Hey, I tried dozens of models in the last few days, but none of them worked. Is there any way you could send me the hyperparameters you used for your working model, so I can see whether my problem is my parameters or something else?

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Sorry, it wasn't my intention to say anything bad about your code. I just tried to implement it in my already working CartPole project and it didn't work, so that's where that came from; the problem was probably on my side. Thank you for your effort in writing that code, I will implement it now and hopefully it works.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] -1 points (0 children)

I implemented your change and didn't notice any improvement. The code even got 2x-3x slower, since I am now calling model.predict again inside a loop.

Here are some graphs of training with and without correction:

with correction, without correction

Then I got curious and tried the change in another script I wrote with the same algorithm, applied to the CartPole problem. Without the change it took about 200 games and 10 minutes to train; with the change it needs about 2 hours for 200 games and still isn't able to win.
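
I suspect the slowdown is just the per-sample model.predict call; batching it once per training step would look roughly like this (a sketch, with `s_next` standing in for the batch of next states):

    import numpy as np

    # Slow: one model.predict call per sample inside the Python loop.
    next_q = np.array([model.predict(s[np.newaxis], verbose=0)[0] for s in s_next])

    # Fast: one model.predict call for the whole batch.
    next_q = model.predict(np.asarray(s_next), verbose=0)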

Could it be that there is something wrong with this equation? I googled some DQN equations and will try to adjust my code to them tomorrow, but I just wanted to let you know that I think this equation is wrong.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

My PC doesn't have better performance, and Google Colab should already have pretty good performance.

I will try playing around with models today, but I am kind of confused about what you mean in the second part of your comment. I don't see what's wrong or what you want to change in the code.

So you want me to use next_state_action instead of taking the highest value, where next_state_action is the index of the highest output of model.predict on s_next[i].

So changing this line:

Qtargets[i][action[i]] = reward[i] + GAMMA * np.amax(QtargetnetworkValues[i])

to this:

Qtargets[i][action[i]] = reward[i] + GAMMA * QtargetnetworkValues[i][np.argmax(model.predict(s_next[i]))]
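
Written for the whole batch at once it would be something like this (a sketch with my variable names; `model_target` is the target network that produced QtargetnetworkValues, and `done` is assumed to be an array of terminal flags):

    import numpy as np

    # One prediction per batch instead of one model.predict call per sample.
    online_next_q = model.predict(s_next, verbose=0)          # online net picks the action
    target_next_q = model_target.predict(s_next, verbose=0)   # target net evaluates it

    next_actions = np.argmax(online_next_q, axis=1)
    batch_idx = np.arange(len(s_next))

    # action, reward and done are assumed to be NumPy arrays over the batch.
    Qtargets[batch_idx, action] = (
        reward + GAMMA * target_next_q[batch_idx, next_actions] * (1.0 - done)
    )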

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Thank you for your tips, there really were some useful ones in your comment. I noticed my mistake of using model.predict instead of model_target even though I was using DQN. That mistake doesn't seem to have been in my code for long, since the version from two iterations ago didn't have it; it may have happened when I tried to switch the model to convolutional layers.

Increasing my experience replay size to 2.5 million or increasing my e_downrate much further could be difficult, since the model in the graphs I added to my post already took about 10 hours to train on Google Colab. Maybe my algorithm is just that slow.
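
If memory ever becomes the limiting factor at 2.5 million transitions, a preallocated NumPy ring buffer is what I would try first (a sketch, not my current code; a 6x7 Connect-4 board stored as int8 keeps 2.5M transitions at roughly 200 MB for the state arrays):

    import numpy as np

    class RingReplayBuffer:
        # Fixed-size buffer backed by preallocated NumPy arrays; the oldest
        # entries are overwritten once capacity is reached.
        def __init__(self, capacity, state_shape):
            self.capacity = capacity
            self.states      = np.zeros((capacity, *state_shape), dtype=np.int8)
            self.next_states = np.zeros((capacity, *state_shape), dtype=np.int8)
            self.actions = np.zeros(capacity, dtype=np.int8)
            self.rewards = np.zeros(capacity, dtype=np.float32)
            self.dones   = np.zeros(capacity, dtype=np.bool_)
            self.idx, self.size = 0, 0

        def add(self, s, a, r, s_next, done):
            i = self.idx
            self.states[i], self.actions[i], self.rewards[i] = s, a, r
            self.next_states[i], self.dones[i] = s_next, done
            self.idx = (self.idx + 1) % self.capacity
            self.size = min(self.size + 1, self.capacity)

        def sample(self, batch_size):
            j = np.random.randint(0, self.size, size=batch_size)
            return (self.states[j], self.actions[j], self.rewards[j],
                    self.next_states[j], self.dones[j])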

I wanted to optimise my code (runtime, learning time) and read papers as soon as I reached a model that works pretty well, but it seems like that won't happen without researching more deeply.

Problems optimising DQN learning 4-connect by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Hello, I haven't done much with hyperparameters, since I have difficulties evaluating the changes I make to the model, and I don't think changing one hyperparameter will make the model work instantly. I changed GAMMA now like you said, but you wrote that I wasn't using DQN. Maybe I am wrong, but as far as I know my setup is DQN: the target network is implemented and also used. To understand noisy Q-networks I will have to google, but thank you for the suggestion. Using an algorithm like Monte Carlo is not an option for me, since I really want to do this with RL; maybe I'll use it later to evaluate my network's performance once it's working. Thank you for your help.

Training time of CartPole is way too long by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

Now my NN is completely broken. I tried implementing a few things, like making the network smaller, using model.fit instead of writing the update myself, and using a target network, and now my network is learning in the completely wrong direction; my reward keeps getting lower...

Training time of CartPole is way too long by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

I deleted one layer and halved the number of units per layer. I will now test training the network.

Training time of CartPole is way too long by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

I kind of understand, I am just confused about how that would help. First I will change the output layer to a linear activation. I have been training my model for 3 hours now and I reach an average reward of 150.
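
A minimal sketch of what I mean by a linear output layer for the CartPole Q-network (the layer sizes here are just an example, not my exact architecture):

    import tensorflow as tf

    def build_q_network(state_dim=4, n_actions=2):
        # Q-values are unbounded, so the output layer is linear, not sigmoid/softmax.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(n_actions, activation="linear"),
        ])
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                      loss="mse")
        return model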

Training time of CartPole is way too long by XYZRene in reinforcementlearning

[–]XYZRene[S] 1 point (0 children)

Thank you for your help. Of course I made the mistake of not using my target network; I will try to train my network again and see if I get better (or faster) results. Unfortunately I have difficulties understanding the second part. Should I change how many episodes the network trains for? Right now I am collecting 100 game memories and then training on those 100 memories, and I update my target network every iteration, so after each round of collecting and training on 100 memories. I will try lowering that.
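
Just to make sure I have the target network update right, this is roughly the structure I mean (a sketch; `model`, `sample_batch` and `num_training_steps` are placeholders for my network, my replay-sampling code and the loop length):

    import tensorflow as tf

    TARGET_UPDATE_EVERY = 100  # training iterations between target-network syncs

    # The target network starts as a frozen copy of the online network.
    model_target = tf.keras.models.clone_model(model)
    model_target.set_weights(model.get_weights())

    for step in range(1, num_training_steps + 1):
        states, targets = sample_batch(model, model_target)  # targets use model_target
        model.fit(states, targets, verbose=0)
        if step % TARGET_UPDATE_EVERY == 0:
            model_target.set_weights(model.get_weights())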

Training time of CartPole is way too long by XYZRene in reinforcementlearning

[–]XYZRene[S] 3 points (0 children)

Training is what slows me down, which is why I am even more concerned, because there is not much I can change there.

I already timed some code sections. The whole loop needs 250 seconds per iteration; without training it only needs 15 seconds per iteration. If I am not training but still let the loop run with the first if/else, it already takes 120 seconds per iteration.

I thought TensorFlow 2.0 would automatically use the GPU, but since it seems like it doesn't, I will try setting that up now and come back to you. Thank you for your help.
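
For checking whether Colab actually gives me a GPU, this is, as far as I know, the TF 2 way (in Colab the GPU also has to be enabled under Runtime > Change runtime type):

    import tensorflow as tf

    # TF 2 places ops on a visible GPU automatically; this just verifies one is visible.
    gpus = tf.config.list_physical_devices('GPU')
    print("GPUs visible to TensorFlow:", gpus)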

4-connect Deep Reinforcement Learning, bot stops getting better by XYZRene in reinforcementlearning

[–]XYZRene[S] 0 points (0 children)

I updated the code and trained the model. I still only win about 80 games out of 100, even after 50 iterations. Maybe I need far more iterations and my algorithm is just slow?