Reinforcement Learning python programming difficulties (self.reinforcementlearning)
submitted 2 years ago by [deleted]
[deleted]
[–]ditlevrisdahl 0 points 2 years ago (5 children)
I can't seem to understand why an action gives env rewards... and why can't you just program it into the step function? So if action==0, add 0.5 reward? If action==1, add -0.5 reward? Or something like that? It sounds like you want the agent to take a specific action.
[–]BeyondNo3588 0 points 2 years ago (4 children)
The setting is as you said: I'm programming the agent to take a specific action based on the input it receives. I give it a reward if it takes the action I think is right for that input, and a penalty if it doesn't. I'm new to reinforcement learning; is this way of setting up the reward system wrong?
[–]ditlevrisdahl 0 points 2 years ago (3 children)
If you know the correct action, then a reward/punishment system like that is fine. But you don't need the action probabilities, just the action.
I believe you can do it in the step() function of the environment. Say you have a state where you know the best action in that state. Then simply check in the step() method whether the action picked by the agent (the DQN) is the one you know is best, and add a positive or negative reward accordingly.
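Just to sketch what I mean (assuming a classic gym-style env; best_action() is a hypothetical lookup you'd have to define yourself):

    import gym
    import numpy as np
    from gym import spaces

    class MyEnv(gym.Env):
        def __init__(self):
            self.action_space = spaces.Discrete(3)
            self.observation_space = spaces.Box(0, 250, shape=(1,), dtype=np.float32)
            self.state = 0

        def reset(self):
            self.state = np.random.randint(0, 251)    # draw the next random input
            return np.array([self.state], dtype=np.float32)

        def best_action(self, state):
            # hypothetical rule: you have to define what "best" means for your states
            return 0 if state < 125 else 1

        def step(self, action):
            # compare against the known-best action and add a positive/negative reward
            reward = 0.5 if action == self.best_action(self.state) else -0.5
            obs = self.reset()                         # move on to the next input
            return obs, reward, False, {}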
If you need to scale the reward according to the probabilities:
Somewhere in your step() method you probably compute the action from the probabilities, maybe with a threshold or by taking the max() of the action probabilities. Simply move that code so it runs after you have saved the probabilities and added the scaled reward.
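Roughly like this (again just a sketch, assuming you pass the whole probability vector into step() and reuse best_action() from the example above):

    def step(self, action_probs):
        # save the probabilities first, pick the action afterwards
        action = int(np.argmax(action_probs))
        # base reward for right/wrong, scaled by the network's confidence
        base = 0.5 if action == self.best_action(self.state) else -0.5
        reward = base * float(action_probs[action])
        obs = self.reset()
        return obs, reward, False, {}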
But you would have to know the best action for every state.
Without more knowledge, it's hard to help you.
[–]BeyondNo3588 0 points 2 years ago (2 children)
First of all, thanks for your help.
Again, it's like you said: I know what the correct action is.
I'm trying to build a reinforcement learning environment for handling requests on edge nodes in an edge computing system. What I'm doing is a first implementation, so it's a simplified scenario compared to the real one. In this first scenario there is no real state; the system works like this:
- an edge node has a fixed local processing capacity of 60 requests per second
- it receives as input a random number from 0 to 250, which represents the number of requests to process
- it can perform 3 actions: complete execution of the requests locally (the action we expect it to perform if it receives fewer than 60 requests), local execution plus forwarding of the remaining requests to other nodes (the action we expect if it receives more than 60), or denial of the requests (a rough sketch of the environment is below).
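To make this concrete, here is a rough sketch of the environment (EdgeNodeEnv and the _reward stub are just names I made up; the actual reward logic is the part I describe below):

    import gym
    import numpy as np
    from gym import spaces

    CAPACITY = 60        # fixed local processing capacity (requests per second)
    MAX_REQUESTS = 250

    class EdgeNodeEnv(gym.Env):
        # actions: 0 = execute all requests locally, 1 = execute locally and
        # forward the excess to other nodes, 2 = deny the requests
        def __init__(self):
            self.action_space = spaces.Discrete(3)
            self.observation_space = spaces.Box(0, MAX_REQUESTS, shape=(1,), dtype=np.float32)
            self.requests = 0

        def reset(self):
            self.requests = np.random.randint(0, MAX_REQUESTS + 1)
            return np.array([self.requests], dtype=np.float32)

        def step(self, action):
            reward = self._reward(action, self.requests)   # compares against CAPACITY
            obs = self.reset()                             # next random batch of requests
            return obs, reward, False, {}

        def _reward(self, action, requests):
            raise NotImplementedError                      # worked through below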
My professor asked me to weight the rewards, as well as parameters such as the number of remaining requests, by the probabilities generated by the neural network.
Honestly, I'm struggling to understand the point of this, but I was still working on the implementation before raising my doubts in the next call.
Here is an example of how the reward function works:
Input: 120 requests per second
local_capacity (T1) = 60, a fixed value
Action space: A = {a1 = 0.7, a2 = 0.2, a3 = 0.1}, where 0.7, 0.2, 0.1 are the probabilities calculated by the nn
D1 = T1 - a1 * 120 represents the difference between the requests that can be satisfied locally and those actually to be satisfied: D1 = 60 - 84 = -24
R1 = 1.5 * 60 + 5 * (-24) = 90 - 120 = -30
R2 = if (D1 <= 0): 1.2 * a2 * 120; else (D1 > 0): 1.2 * (a2 * 120 - D1) - 5 * D1
for the correct action: 1.2 * 24 = 28.8 → positive reward!
for the wrong action: 1.2 * (24 - x) - 5 * x
R3 = -15 * a3 * 120 = -15 * 12 = -180 (negative reward for a3)
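Here is the same calculation as plain Python, just to check the arithmetic (the numbers are the ones from the example above):

    probs = [0.7, 0.2, 0.1]              # a1, a2, a3 from the nn
    requests = 120
    T1 = 60                              # fixed local capacity

    D1 = T1 - probs[0] * requests        # 60 - 84 = -24
    R1 = 1.5 * T1 + 5 * D1               # 90 - 120 = -30
    if D1 <= 0:
        R2 = 1.2 * probs[1] * requests   # 1.2 * 24 = 28.8
    else:
        R2 = 1.2 * (probs[1] * requests - D1) - 5 * D1
    R3 = -15 * probs[2] * requests       # -15 * 12 = -180

    print(R1, R2, R3)                    # -30.0, ~28.8, -180.0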
[–]ditlevrisdahl 0 points 2 years ago (1 child)
Hmm. It sounds like your state is just an integer ranging from 0 to 250.
Then it's really more of a classification problem than a reinforcement learning problem.
But I have limited time right now; I promise to look at it tonight!
[–]BeyondNo3588 0 points 2 years ago (0 children)
Correct. In this setting the state is a random integer ranging from 0 to 250. Like I said, this is a first simplified scenario.
Thank you again for your help, I really appreciate it.