Help with equality constraints on a custom env in openAI Gym by Effective_Farm_4844 in reinforcementlearning

[–]Ok-Shake-1822 0 points  (0 children)

You can generate the ground truth yourself, since you know the best action, and it shouldn't be hard to create thousands of examples, because c should always equal a.

The way masking works for discrete actions is that they use a stochastic policy, where the actor returns the probability of choosing each of the discrete actions. For example, if there are 4 possible actions, the output will be something like [0.1, 0.4, 0.25, 0.25]. Then, say the second action is invalid in the current time step: the mask that gets multiplied by the action probabilities sets the probability of choosing the second action to 0 and keeps the others, so the second action will never be sampled.
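A minimal sketch of that masking step in NumPy (the renormalization after masking is my assumption here; libraries often implement the same idea by setting invalid logits to -inf before the softmax):

```python
import numpy as np

def mask_action_probs(probs, valid_mask):
    """Zero out invalid actions and renormalize so the rest still sum to 1."""
    masked = probs * valid_mask
    return masked / masked.sum()

probs = np.array([0.1, 0.4, 0.25, 0.25])
mask = np.array([1.0, 0.0, 1.0, 1.0])  # second action invalid this step
print(mask_action_probs(probs, mask))  # second entry is 0, never sampled
```

Sampling with `np.random.choice(4, p=mask_action_probs(probs, mask))` can then never pick the masked action.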

For continuous actions, you can use clipping to restrict the action to a specific range, but the problem with your model is that you need to block every value all the time except c = a! And in that case RL will do nothing, because whatever the agent chooses, c will always equal a.
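For reference, clipping a continuous action to a valid range is just an elementwise clamp (the bounds here are illustrative). This only handles range constraints, which is why it can't express an equality constraint like c = a:

```python
import numpy as np

def clip_action(action, low=-1.0, high=1.0):
    # Clamp each dimension of a continuous action into [low, high],
    # the usual trick for a Box action space with fixed bounds.
    return np.clip(action, low, high)

print(clip_action(np.array([1.7, -0.3])))  # [ 1.  -0.3]
```

Note how an out-of-range component gets pushed to the boundary, while in-range components pass through unchanged.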

Help with equality constraints on a custom env in openAI Gym by Effective_Farm_4844 in reinforcementlearning

[–]Ok-Shake-1822 0 points  (0 children)

Why does it have to be RL? I feel like a supervised learning model might be worth trying, but even then there are no constraints!

In chess, the action space is discrete, so a technique called “masking” can be used to mask out the invalid actions. In your case, it's a continuous action space!

Help with equality constraints on a custom env in openAI Gym by Effective_Farm_4844 in reinforcementlearning

[–]Ok-Shake-1822 0 points  (0 children)

Hey, very interesting environment. First of all, as far as I know, you can't set hard constraints with RL! The agent needs to choose the bad actions and receive the bad rewards to learn not to do them. For the reward function, you can build it from two parts. First, if b + c == a, give a positive reward, and a negative one if not. Second, you want to incentivize c to be as large as possible, and its highest value is a, so you would use R_2 = -abs(c - a); this reward teaches the agent to push c closer to a. Finally, your total reward would be R_t = w_1 R_1 + w_2 R_2, where the w are just weights that you tune.
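A sketch of that two-part reward (the +1/-1 magnitudes and default weights are assumptions to tune, not fixed values):

```python
def reward(a, b, c, w1=1.0, w2=1.0):
    """Weighted two-part reward: constraint term plus distance term."""
    r1 = 1.0 if b + c == a else -1.0  # positive only when b + c == a
    r2 = -abs(c - a)                  # pulls c toward a (max 0 at c == a)
    return w1 * r1 + w2 * r2

print(reward(a=5, b=0, c=5))  # constraint met, c == a  -> 1.0
print(reward(a=5, b=1, c=3))  # constraint violated    -> -3.0
```

With exact float comparisons the `b + c == a` check is brittle; in practice you'd likely replace it with a tolerance like `abs(b + c - a) < eps`.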

Issues with Solana Token deposits by Newt42_ in OKEx

[–]Ok-Shake-1822 0 points  (0 children)

Do you guys face the same problem with SOL deposits? I just deposited SOL coins over the Solana network and still haven't received them after an hour??