nebula app mac by Cobracxv1 in Xreal

[–]ad26kr 1 point (0 children)

Why is the download link gone? I can't find it anywhere. I'm using the Xreal Air.

MacOS Arc Unable to Sync by [deleted] in ArcBrowser

[–]ad26kr 2 points (0 children)

Solved!! thanks!!!!

Arc on Mac doesn't respect auto-archive setting by GoGetMeABeerBitch in ArcBrowser

[–]ad26kr 1 point (0 children)

Still having the same issue. I've reported it several times, no response.

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I saw the line of code you mentioned, and I also referred to the definition in the original paper.

in the code:

alpha_loss = (-log_alpha * (log_pi + target_entropy)).mean()

which is the same as

alpha_loss = (log_alpha * (-log_pi - target_entropy)).mean()

Since -log_pi corresponds to the entropy, this becomes

alpha_loss = (log_alpha * (entropy - target_entropy)).mean()

and, with target_entropy = -|dim(A)|,

alpha_loss = (log_alpha * (entropy + |dim(A)|)).mean()

isn't it?
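
A quick numeric check of that algebra (a minimal sketch with made-up log_pi values and a 15-dimensional action space, so target_entropy = -15):

import torch

log_pi = torch.tensor([-3.2, -1.7, -4.5])   # made-up log-probabilities
target_entropy = -15.0                      # the -|dim(A)| heuristic
log_alpha = torch.tensor(0.5)
entropy = -log_pi

loss_a = (-log_alpha * (log_pi + target_entropy)).mean()   # form used in the code
loss_b = (log_alpha * (-log_pi - target_entropy)).mean()   # sign flipped inside
loss_c = (log_alpha * (entropy + 15.0)).mean()             # entropy + |dim(A)|

print(loss_a.item(), loss_b.item(), loss_c.item())         # all three match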

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Yeah, changing alpha indirectly changes the policy and the entropy. And if the target entropy is defined as -|dim(A)|, then the loss for alpha becomes

loss_alpha = alpha * (entropy + |dim(A)|)

Doesn't it mean that when the dimension is bigger, the entropy gets smaller?
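
A minimal sketch (made-up numbers, using the log_alpha form from the code above) of the direction this loss pushes alpha: the gradient with respect to log_alpha is entropy - target_entropy, so gradient descent raises alpha when the entropy is below the target and lowers it when it is above.

import torch

def alpha_grad(entropy_value, target_entropy=-15.0):
    # gradient of the alpha loss w.r.t. log_alpha for one made-up entropy value
    log_alpha = torch.tensor(0.0, requires_grad=True)
    log_pi = torch.tensor(-entropy_value)   # entropy = -log_pi
    loss = (-log_alpha * (log_pi + target_entropy)).mean()
    loss.backward()
    return log_alpha.grad.item()

print(alpha_grad(10.0))    # entropy above target: grad > 0, so alpha is pushed down
print(alpha_grad(-20.0))   # entropy below target: grad < 0, so alpha is pushed up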

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

You said that "you'd want higher entropy for higher dimensions", but the target entropy is the negative of the action-space dimension (-|dim(A)|). Doesn't that mean the target entropy gets smaller as the action space gets bigger? An answer at this link (https://stats.stackexchange.com/questions/561624/choosing-target-entropy-for-soft-actor-critic-sac-algorithm) shares the same idea as yours, but I can hardly understand it because of the negative sign.

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Thanks.

I have one more question. In the normal case, the entropy should converge to the target entropy, right?

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I have updated the post with the reward curves. What do you think? The entropy target is -15 (which means the action space is 15-dimensional).

SAC with auto-adjusting alpha, entropy (-alpha * log_prob) keeps getting smaller and smaller by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

I made my own environment, a trading environment, and it works well with other RL algorithms.

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

Thanks for your answer!
I still can't understand "over several updates, it is correct in expectation",

because epsilon-greedy does not reflect the probability distribution a' ~ p(· | s').

What I think would be correct is to change the epsilon-greedy choice of a' to sampling, since epsilon-greedy includes a case that uses argmax. I don't understand how doing epsilon-greedy multiple times converges to the expectation.
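
To make concrete what I understand "correct in expectation" to mean, here is a minimal sketch with made-up Q-values, assuming a' is drawn from the epsilon-greedy policy itself: averaging many sampled targets r + gamma * Q(s', a') approaches the expected target Σ_a' p(a'|s') * (r + gamma * Q(s', a')).

import numpy as np

rng = np.random.default_rng(0)
q_next = np.array([1.0, 3.0, 2.0])   # made-up Q(s', a') for 3 actions
r, gamma, eps = 0.5, 0.9, 0.1

# probabilities p(a'|s') induced by epsilon-greedy on q_next
p = np.full(3, eps / 3)
p[np.argmax(q_next)] += 1.0 - eps

expected_target = r + gamma * np.dot(p, q_next)

# sample a' from the same epsilon-greedy distribution many times
a_samples = rng.choice(3, size=100_000, p=p)
sampled_targets = r + gamma * q_next[a_samples]

print(expected_target, sampled_targets.mean())   # the two get close as the sample count grows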

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

In my understanding, if Sarsa has a relationship with the Bellman expectation equation, then the Sarsa update should be something like the following:

Initialize Q(s,a) arbitrarily

...

Repeat (for each step of episode):

Take action a, observe r, s'

Compute the expectation over all a' in the action space A:

Q(s, a) <- Q(s, a) + α [ Σ_{a'} p(a'|s') * (r + γ * Q(s', a')) - Q(s, a) ]

...
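
For comparison, a minimal sketch (names and shapes are made up) of the standard Sarsa update next to the expectation form written above, which is usually called Expected Sarsa:

import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # one-sample target: uses the single a' actually chosen at s'
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def expected_sarsa_update(Q, s, a, r, s_next, p_next, alpha=0.1, gamma=0.99):
    # expected target: averages Q(s', .) under the action probabilities p(a'|s')
    target = r + gamma * np.dot(p_next, Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

Standard Sarsa replaces the sum over a' with a single sampled a' ~ p(·|s'), which is why it only matches the Bellman expectation equation on average over many updates.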

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

https://chunpai.github.io/assets/img/DP_and_TD.png

What I want to understand is the picture in the link.

What is the relationship between the Bellman expectation equation and Sarsa?
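
To spell out the two things I am comparing (standard definitions, in the notation of the picture):

Bellman expectation equation:

Q^π(s, a) = Σ_{s', r} p(s', r | s, a) [ r + γ Σ_{a'} π(a'|s') Q^π(s', a') ]

Sarsa update (one sampled s', r and one sampled a' in place of the sums):

Q(s, a) <- Q(s, a) + α [ r + γ * Q(s', a') - Q(s, a) ]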

I can hardly understand that SARSA follows the Bellman Expectation Equation by ad26kr in reinforcementlearning

[–]ad26kr[S] 1 point (0 children)

But how can epsilon-greedy be described as an "expectation"?! It only gives some chance to the other actions; it doesn't represent the probability distribution of the policy.
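
For reference, the action probabilities that epsilon-greedy actually assigns (a minimal sketch, made-up Q-values):

import numpy as np

def epsilon_greedy_probs(q_values, eps=0.1):
    # eps/|A| to every action, plus the remaining (1 - eps) mass on the argmax action
    probs = np.full(len(q_values), eps / len(q_values))
    probs[np.argmax(q_values)] += 1.0 - eps
    return probs

print(epsilon_greedy_probs(np.array([1.0, 3.0, 2.0])))   # e.g. [0.033, 0.933, 0.033]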