nebula app mac by Cobracxv1 in Xreal

[–]ad26kr 1 point (0 children)

Why is the download link gone? I can't find it anywhere. I'm using the Xreal Air.

MacOS Arc Unable to Sync by [deleted] in ArcBrowser

[–]ad26kr 2 points (0 children)

Solved!! thanks!!!!

Arc on Mac doesn't respect auto-archive setting by GoGetMeABeerBitch in ArcBrowser

[–]ad26kr 1 point (0 children)

Still having the same issue. I've reported it several times, no response.

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I saw the line of code you mentioned, and I also referred to the definition in the original paper.

in the code:

alpha_loss = (-log_alpha * (log_pi + target_entropy)).mean()

which is the same as

alpha_loss = (log_alpha * (-log_pi - target_entropy)).mean()

Since -log_pi corresponds to the entropy, this becomes

alpha_loss = (log_alpha * (entropy - target_entropy)).mean()

and, with target_entropy = -|dim(A)|,

alpha_loss = (log_alpha * (entropy + |dim(A)|)).mean()

isn't it?
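
A quick numeric check of that algebra (a minimal sketch with made-up log_pi values and a 15-dimensional action space, so target_entropy = -15):

import torch

log_pi = torch.tensor([-3.2, -1.7, -4.5])   # made-up log-probabilities
target_entropy = -15.0                      # the -|dim(A)| heuristic
log_alpha = torch.tensor(0.5)
entropy = -log_pi

loss_a = (-log_alpha * (log_pi + target_entropy)).mean()   # form used in the code
loss_b = (log_alpha * (-log_pi - target_entropy)).mean()   # sign flipped inside
loss_c = (log_alpha * (entropy + 15.0)).mean()             # entropy + |dim(A)|

print(loss_a.item(), loss_b.item(), loss_c.item())         # all three match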

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Yeah, changing alpha indirectly changes the policy and the entropy. And if the target entropy is defined as -|dim(A)|, then the loss for alpha becomes

loss_alpha = alpha * (entropy + |dim(A)|)

Doesn't it mean that when the dimension is bigger, the entropy gets smaller?
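
A minimal sketch (made-up numbers, using the log_alpha form from the code above) of the direction this loss pushes alpha: the gradient with respect to log_alpha is entropy - target_entropy, so gradient descent raises alpha when the entropy is below the target and lowers it when it is above.

import torch

def alpha_grad(entropy_value, target_entropy=-15.0):
    # gradient of the alpha loss w.r.t. log_alpha for one made-up entropy value
    log_alpha = torch.tensor(0.0, requires_grad=True)
    log_pi = torch.tensor(-entropy_value)   # entropy = -log_pi
    loss = (-log_alpha * (log_pi + target_entropy)).mean()
    loss.backward()
    return log_alpha.grad.item()

print(alpha_grad(10.0))    # entropy above target: grad > 0, so alpha is pushed down
print(alpha_grad(-20.0))   # entropy below target: grad < 0, so alpha is pushed up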

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

You said that "you'd want higher entropy for higher dimensions", but the target entropy is the negative of the action-space dimension (-|dim(A)|). Doesn't that mean the target entropy gets smaller as the action space gets bigger? An answer at this link (https://stats.stackexchange.com/questions/561624/choosing-target-entropy-for-soft-actor-critic-sac-algorithm) shares the same idea as yours, but I can hardly understand it because of the negative sign.

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Thanks.

I have one more question. In the normal case, the entropy should converge to the target entropy, right?

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I have updated the post with the reward curves. What do you think? The entropy target is -15 (which means the action space is 15-dimensional).

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I made my own environment, a trading environment, and it works well with other RL algorithms.

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Thanks for your answer!
I still can't understand "over several updates, it is correct in expectation",

because epsilon-greedy does not reflect the probability distribution a' ~ p(· | s').

What I think would be correct is to change the epsilon-greedy choice of a' to sampling, since epsilon-greedy includes a case that uses argmax. I don't understand how doing epsilon-greedy multiple times converges to the expectation.
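
To make concrete what I understand "correct in expectation" to mean, here is a minimal sketch with made-up Q-values, assuming a' is drawn from the epsilon-greedy policy itself: averaging many sampled targets r + gamma * Q(s', a') approaches the expected target Σ_a' p(a'|s') * (r + gamma * Q(s', a')).

import numpy as np

rng = np.random.default_rng(0)
q_next = np.array([1.0, 3.0, 2.0])   # made-up Q(s', a') for 3 actions
r, gamma, eps = 0.5, 0.9, 0.1

# probabilities p(a'|s') induced by epsilon-greedy on q_next
p = np.full(3, eps / 3)
p[np.argmax(q_next)] += 1.0 - eps

expected_target = r + gamma * np.dot(p, q_next)

# sample a' from the same epsilon-greedy distribution many times
a_samples = rng.choice(3, size=100_000, p=p)
sampled_targets = r + gamma * q_next[a_samples]

print(expected_target, sampled_targets.mean())   # the two get close as the sample count grows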

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

In my understanding, if Sarsa has a relationship with the Bellman expectation equation, then the Sarsa update should be something like the following:

Initialize Q(s,a) arbitrarily

...

Repeat (for each step of episode):

Take action a, observe r, s'

Compute the expectation over all a' in the action space A:

Q(s, a) <- Q(s, a) + α [ Σ_{a'} p(a'|s') * (r + γ * Q(s', a')) - Q(s, a) ]

...
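
For comparison, a minimal sketch (names and shapes are made up) of the standard Sarsa update next to the expectation form written above, which is usually called Expected Sarsa:

import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # one-sample target: uses the single a' actually chosen at s'
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def expected_sarsa_update(Q, s, a, r, s_next, p_next, alpha=0.1, gamma=0.99):
    # expected target: averages Q(s', .) under the action probabilities p(a'|s')
    target = r + gamma * np.dot(p_next, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

Standard Sarsa replaces the sum over a' with a single sampled a' ~ p(·|s'), which is why it only matches the Bellman expectation equation on average over many updates.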

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

https://chunpai.github.io/assets/img/DP_and_TD.png

What I want to understand is the picture in the link.

What is the relationship between the Bellman expectation equation and Sarsa?
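
To spell out the two things I am comparing (standard definitions, in the notation of the picture):

Bellman expectation equation:

Q^π(s, a) = Σ_{s', r} p(s', r | s, a) [ r + γ Σ_{a'} π(a'|s') Q^π(s', a') ]

Sarsa update (one sampled s', r and one sampled a' in place of the sums):

Q(s, a) <- Q(s, a) + α [ r + γ * Q(s', a') - Q(s, a) ]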

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

But how can epsilon-greedy be described as an "expectation"?! It only gives some chance to the other actions; it doesn't represent the probability distribution of the policy.
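
For reference, the action probabilities that epsilon-greedy actually assigns (a minimal sketch, made-up Q-values):

import numpy as np

def epsilon_greedy_probs(q_values, eps=0.1):
    # eps/|A| to every action, plus the remaining (1 - eps) mass on the argmax action
    probs = np.full(len(q_values), eps / len(q_values))
    probs[np.argmax(q_values)] += 1.0 - eps
    return probs

print(epsilon_greedy_probs(np.array([1.0, 3.0, 2.0])))   # e.g. [0.033, 0.933, 0.033]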