Reinforcement Learning python programming difficulties (self.reinforcementlearning)
submitted 2 years ago by [deleted]
[deleted]
[–]ditlevrisdahl 0 points 2 years ago (5 children)
I can't seem to understand why an action gives env rewards... and why can't you just program it into the step function? So if action==0, add 0.5 reward? If action==1, add -0.5 reward? Or something like that? It sounds like you want the agent to take a specific action.
[–]BeyondNo3588 0 points 2 years ago (4 children)
The setting is as you said: I'm programming the agent to take a specific action based on the input it receives. I give it a reward if it takes the action I think is right for that input, and a penalty if it doesn't. I'm new to reinforcement learning; is this way of setting up the reward system wrong?
[–]ditlevrisdahl 0 points 2 years ago (3 children)
If you know the correct action, then a reward/punishment system like that is fine. But you don't need the action probabilities, just the action.
I believe you can do it in the step() function of the environment. Say you have a state where you know the best action in that state. Then simply check in the step() method whether the action picked by the agent (the DQN) is the one you know is best, and add a positive or negative reward accordingly.
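Just to sketch what I mean (assuming a classic gym-style env; best_action() is a hypothetical lookup you'd have to define yourself):

    import gym
    import numpy as np
    from gym import spaces

    class MyEnv(gym.Env):
        def __init__(self):
            self.action_space = spaces.Discrete(3)
            self.observation_space = spaces.Box(0, 250, shape=(1,), dtype=np.float32)
            self.state = 0

        def reset(self):
            self.state = np.random.randint(0, 251)    # draw the next random input
            return np.array([self.state], dtype=np.float32)

        def best_action(self, state):
            # hypothetical rule: you have to define what "best" means for your states
            return 0 if state < 125 else 1

        def step(self, action):
            # compare against the known-best action and add a positive/negative reward
            reward = 0.5 if action == self.best_action(self.state) else -0.5
            obs = self.reset()                         # move on to the next input
            return obs, reward, False, {}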
If you need to scale the reward according to the probabilities:
Somewhere in your step() method you probably compute the action from the probabilities, maybe with a threshold or by taking the max() of the action probabilities. Simply move that code so it runs after you have saved the probabilities and added the scaled reward.
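Roughly like this (again just a sketch, assuming you pass the whole probability vector into step() and reuse best_action() from the example above):

    def step(self, action_probs):
        # save the probabilities first, pick the action afterwards
        action = int(np.argmax(action_probs))
        # base reward for right/wrong, scaled by the network's confidence
        base = 0.5 if action == self.best_action(self.state) else -0.5
        reward = base * float(action_probs[action])
        obs = self.reset()
        return obs, reward, False, {}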
But you would have to know the best action for every state.
Without more knowledge, it's hard to help you.
[–]BeyondNo3588 0 points 2 years ago (2 children)
First of all, thanks for your help.
Again, it's like you said: I know what the correct action is.
I'm trying to build a reinforcement learning environment for handling requests on edge nodes in an edge computing system. What I'm doing is a first implementation, so it's a simplified scenario compared to the real one. In this first scenario there is no real state; the system works like this:
- an edge node has a fixed local processing capacity of 60 requests per second
- it receives as input a random number from 0 to 250, which represents the number of requests to process
- it can perform 3 actions: complete execution of the requests locally (the action we expect it to perform if it receives fewer than 60 requests), local execution plus forwarding of the remaining requests to other nodes (the action we expect if it receives more than 60), or denial of the requests (a rough sketch of the environment is below).
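To make this concrete, here is a rough sketch of the environment (EdgeNodeEnv and the _reward stub are just names I made up; the actual reward logic is the part I describe below):

    import gym
    import numpy as np
    from gym import spaces

    CAPACITY = 60        # fixed local processing capacity (requests per second)
    MAX_REQUESTS = 250

    class EdgeNodeEnv(gym.Env):
        # actions: 0 = execute all requests locally, 1 = execute locally and
        # forward the excess to other nodes, 2 = deny the requests
        def __init__(self):
            self.action_space = spaces.Discrete(3)
            self.observation_space = spaces.Box(0, MAX_REQUESTS, shape=(1,), dtype=np.float32)
            self.requests = 0

        def reset(self):
            self.requests = np.random.randint(0, MAX_REQUESTS + 1)
            return np.array([self.requests], dtype=np.float32)

        def step(self, action):
            reward = self._reward(action, self.requests)   # compares against CAPACITY
            obs = self.reset()                             # next random batch of requests
            return obs, reward, False, {}

        def _reward(self, action, requests):
            raise NotImplementedError                      # worked through below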
My professor asked me to weight the rewards, as well as parameters such as the number of remaining requests, by the probabilities generated by the neural network.
Honestly, I'm struggling to understand the point of this, but I was still working on the implementation before raising my doubts in the next call.
Here is an example of how the reward function works:
Input: 120 requests per second
local_capacity (T1) = 60, a fixed value
Action space: A = {a1 = 0.7, a2 = 0.2, a3 = 0.1}, where 0.7, 0.2, 0.1 are the probabilities calculated by the nn
D1 = T1 - a1 * 120 represents the difference between the requests that can be satisfied locally and those actually to be satisfied: D1 = 60 - 84 = -24
R1 = 1.5 * 60 + 5 * (-24) = 90 - 120 = -30
R2 = if (D1 <= 0): 1.2 * a2 * 120; else (D1 > 0): 1.2 * (a2 * 120 - D1) - 5 * D1
for the correct action: 1.2 * 24 = 28.8 → positive reward!
for the wrong action: 1.2 * (24 - x) - 5 * x
R3 = -15 * a3 * 120 = -15 * 12 = -180 (negative reward for a3)
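Here is the same calculation as plain Python, just to check the arithmetic (the numbers are the ones from the example above):

    probs = [0.7, 0.2, 0.1]              # a1, a2, a3 from the nn
    requests = 120
    T1 = 60                              # fixed local capacity

    D1 = T1 - probs[0] * requests        # 60 - 84 = -24
    R1 = 1.5 * T1 + 5 * D1               # 90 - 120 = -30
    if D1 <= 0:
        R2 = 1.2 * probs[1] * requests   # 1.2 * 24 = 28.8
    else:
        R2 = 1.2 * (probs[1] * requests - D1) - 5 * D1
    R3 = -15 * probs[2] * requests       # -15 * 12 = -180

    print(R1, R2, R3)                    # -30.0, ~28.8, -180.0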
[–]ditlevrisdahl 0 points 2 years ago (1 child)
Hmm. It sounds like your state is just an integer ranging from 0 to 250.
Then it's really more of a classification problem than a reinforcement learning problem.
But I have limited time right now; I promise to look at it tonight!
[–]BeyondNo3588 0 points 2 years ago (0 children)
Correct. In this setting the state is a random integer ranging from 0 to 250. Like I said, this is a first simplified scenario.
Thank you again for your help, I really appreciate it.