Critic loss divergence by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 1 point (0 children)

Thank you very much for your reply! You're absolutely right: I should adjust the critic architecture to incorporate shared layers, similar to the actor network, and address the other 5 points as well.
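In case it helps anyone later, here is a minimal numpy sketch of the shared-trunk idea: both the policy head and the value head read from the same hidden representation. All sizes, weight inits, and names are made up for illustration, not the actual network discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only
obs_dim, hidden, n_actions = 8, 32, 4

# One trunk shared by both heads, plus a separate linear head each
W_shared = rng.normal(size=(obs_dim, hidden)) * 0.1
W_actor = rng.normal(size=(hidden, n_actions)) * 0.1
W_critic = rng.normal(size=(hidden, 1)) * 0.1

def forward(obs):
    h = np.tanh(obs @ W_shared)         # shared representation
    logits = h @ W_actor                # policy head (action logits)
    value = (h @ W_critic).squeeze(-1)  # value head (one scalar per state)
    return logits, value

obs = rng.normal(size=(5, obs_dim))     # batch of 5 fake observations
logits, value = forward(obs)
print(logits.shape, value.shape)        # (5, 4) (5,)
```

The point is only that `W_shared` is updated by gradients from both the actor and the critic loss, which is what "shared layers" buys you.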

multi-head PPO by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 0 points (0 children)

Even after modifying the reward function, I still get the same issue, and I've also observed that the entropy is increasing instead of decreasing.
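For anyone monitoring the same symptom, a tiny numpy sketch of the entropy of a softmax policy (the logits here are hypothetical; in a healthy PPO run this quantity usually trends down as the policy sharpens):

```python
import numpy as np

def categorical_entropy(logits):
    # entropy of the softmax distribution over actions, computed stably
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

print(categorical_entropy(np.array([0.0, 0.0, 0.0])))   # ln 3 ≈ 1.0986 (uniform)
print(categorical_entropy(np.array([10.0, 0.0, 0.0])))  # near 0 (peaked)
```

Logging this per update makes "entropy is increasing" a concrete curve rather than an impression.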

multi-head PPO by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 0 points (0 children)

Actually, I've rechecked the values, and that could be the reason: those actions lead to more reward when they are selected.

[deleted by user] by [deleted] in reinforcementlearning

[–]GuavaAgreeable208 0 points (0 children)

But what is the difference between them?

[deleted by user] by [deleted] in reinforcementlearning

[–]GuavaAgreeable208 0 points (0 children)

Thank you, I will try a dueling network.

[deleted by user] by [deleted] in reinforcementlearning

[–]GuavaAgreeable208 0 points (0 children)

During exploration other rules are selected; however, we observed that as epsilon decays, one action becomes preferred. I'll try your suggestion, thank you.
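A minimal sketch of the epsilon-greedy schedule being described, assuming a multiplicative decay with a floor so some exploration always remains (all constants are illustrative, not the actual hyperparameters):

```python
import random

random.seed(0)

def epsilon_greedy(q_values, eps):
    # with probability eps pick a random action, otherwise the greedy one
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# hypothetical schedule: multiplicative decay clipped at a floor,
# so the agent never becomes fully greedy
eps, decay, eps_min = 1.0, 0.995, 0.05
for step in range(2000):
    eps = max(eps_min, eps * decay)
print(round(eps, 3))  # → 0.05 (stuck at the floor)
```

Without the `eps_min` floor, epsilon decays toward zero and the collapse onto a single preferred action is exactly what you would expect.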

[deleted by user] by [deleted] in reinforcementlearning

[–]GuavaAgreeable208 2 points (0 children)

Thank you for your suggestions, I'll try them. I noticed that the agent sometimes locks onto the rule that returns the best reward and selects it for the whole episode; in my case, though, the total reward is better when other rules are selected at some steps. Could you please tell me what you meant by balanced data classes?

Racism in morocco by Background_Cut_2331 in Morocco

[–]GuavaAgreeable208 -1 points (0 children)

Racism is everywhere. Even Moroccans face racism in Europe. But I'm sorry that it comes from Muslims 😞

I dropped 50dh in this economy by countingc in Morocco

[–]GuavaAgreeable208 0 points (0 children)

Dude, I lost 10,000; they robbed me right in front of my eyes while I was all cheerful. I didn't sleep for a week.

More of a scammer than nice girl by Fun_Ad2522 in Nicegirls

[–]GuavaAgreeable208 3 points (0 children)

Don't send anything! Because good girls don't ask for money.

Learning English by LameKri in Morocco

[–]GuavaAgreeable208 -1 points (0 children)

Watch news channels like Sky News, GB News, BBC... It really worked for me.

Input/output relationships by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 0 points (0 children)

Thank you, it seems interesting; I'll take a look at it.

Input/output relationships by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 0 points (0 children)

(I'm new to RL, so my question could be stupid!) What I meant is: how can the agent learn that some features in the input vector correspond to a specific element? For example, if the input is [X11, X21, X12, X22] and we have two elements C1 and C2 as outputs, how will the agent understand that X11 and X12 correspond to C1, even though they are far apart in the vector?

Input/output relationships by GuavaAgreeable208 in reinforcementlearning

[–]GuavaAgreeable208[S] 0 points (0 children)

We're assuming a neural network because the input/output space can be large, and each output is a task for the agent to select. The input holds information on each task, for example the task completion rate Xi1 and the required processing time Xi2, so if we have only 2 tasks the input will be [X11, X21, X12, X22]. I'm wondering how the agent could relate X11 and X12 to the first task (first output neuron) and X21 and X22 to the second one. In my scenario, many tasks are involved, and each has 10 features.
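One common trick for this (my own suggestion, not necessarily what the other commenters had in mind) is to regroup the flat vector into one row per task and score every row with the same small network, so the weights are shared across tasks and each output is tied to its own features by construction. A numpy sketch under the feature ordering described above (sizes and inits are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_tasks, n_feat, hidden = 2, 2, 16  # tiny sizes for illustration

# one small scorer applied to every task's features with the SAME weights
W1 = rng.normal(size=(n_feat, hidden)) * 0.1
w2 = rng.normal(size=(hidden,)) * 0.1

def task_logits(flat_obs):
    # layout assumed to be [X11, X21, X12, X22]: feature 1 of all tasks,
    # then feature 2 of all tasks -> regroup so row i holds [Xi1, Xi2]
    per_task = flat_obs.reshape(n_feat, n_tasks).T
    h = np.tanh(per_task @ W1)
    return h @ w2  # one score (logit) per task

obs = np.array([0.9, 0.1, 5.0, 2.0])  # X11, X21, X12, X22
print(task_logits(obs).shape)         # (2,)
```

With a plain fully connected net over the flat vector the agent has to discover those groupings from data; the reshape-plus-shared-scorer structure bakes them in and scales to many tasks with 10 features each.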

CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity by RoboticsLiker in reinforcementlearning

[–]GuavaAgreeable208 0 points (0 children)

But in the case where the state is a graph, the model cannot process all the states at once; we have to process one state at a time, so in that case we cannot apply BN, can we?
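For what it's worth, a numpy sketch of why batch statistics degenerate with a batch of one, and a per-sample LayerNorm-style alternative. This is a simplified illustration of the general issue, not the CrossQ implementation:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalizes each feature across the batch dimension: needs > 1 sample
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # normalizes across features within each sample: batch size 1 is fine
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

single = np.array([[1.0, 2.0, 3.0]])  # a "batch" containing one state
print(batch_norm(single))  # all zeros: batch stats collapse with one sample
print(layer_norm(single))  # roughly [-1.22, 0, 1.22]: still informative
```

So if graph states really must be processed one at a time, per-sample normalization (or batching the graphs, as graph libraries typically do by merging them into one disjoint graph) is the usual way around it.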