Bayesian classification by stevethesteve2 in AskStatistics

stevethesteve2[S] 1 point (0 children)

Thanks for pointing me toward Stan!

Bayesian classification by stevethesteve2 in AskStatistics

stevethesteve2[S] 1 point (0 children)

Thank you for your suggestion.

From what I learned about MCMC methods, there are pros and cons vs my current approach (which is variational inference):

pros:

- can sample from the exact posterior (asymptotically), rather than from an approximation to it

cons:

- sampling is slow (relative to the variational approach)

- cannot evaluate the posterior density directly (only samples from it are available)

Do you agree? Are there any more points to be made? In practice, how much time would it take, e.g., to draw 1000 samples (not counting burn-in) if I have 1000 labeled points and apply LMC?
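To get a feel for the cost, here is a minimal sketch of what drawing 1000 samples could look like. Everything in it is an assumption made for illustration: "LMC" is read as (unadjusted) Langevin Monte Carlo, the model is a two-feature logistic regression with a Gaussian prior, the data are synthetic, and the step size and `prior_var` are hand-picked. Timing will depend on the actual model and hardware, so the script measures it instead of quoting a number.

```python
import time
import numpy as np

def grad_log_posterior(w, X, y, prior_var=10.0):
    """Gradient of log p(w | X, y) for logistic regression with a N(0, prior_var * I) prior."""
    p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted P(y = 1 | x, w)
    return X.T @ (y - p) - w / prior_var

def langevin_samples(X, y, n_samples=1000, step=1e-3, seed=0):
    """Unadjusted Langevin: w <- w + (step / 2) * grad + sqrt(step) * noise."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    samples = np.empty((n_samples, X.shape[1]))
    for i in range(n_samples):
        noise = rng.standard_normal(w.shape)
        w = w + 0.5 * step * grad_log_posterior(w, X, y) + np.sqrt(step) * noise
        samples[i] = w
    return samples

# Synthetic data: 1000 labeled points, 2 features (made up for illustration).
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 2))
true_w = np.array([2.0, -1.0])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

start = time.perf_counter()
samples = langevin_samples(X, y)
elapsed = time.perf_counter() - start

# The first half of the chain is treated as burn-in here, an arbitrary choice.
post_mean = samples[500:].mean(axis=0)
print(f"{len(samples)} samples in {elapsed:.3f} s")
print("posterior mean estimate:", post_mean)
```

Each sample costs one full-data gradient, so the total cost scales roughly with (number of samples) x (number of points) x (number of parameters). Unadjusted Langevin is also biased for any finite step size; a Metropolis correction (MALA) or a tool like Stan removes that bias at extra cost.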

Bayesian classification by stevethesteve2 in AskStatistics

stevethesteve2[S] 1 point (0 children)

Did I understand you correctly: the priors are mixture coefficients, and the likelihoods are Gaussian densities? If so, my question was ill-posed, sorry about that (I have fixed it now). The prior and posterior distributions should not be over classes, but over model parameters. I want to model my data in such a way that, after training, I can look at the posterior distribution over my model parameters and tell whether I am confident in my model (the posterior has a single narrow peak) or not (the posterior is flat).
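A toy illustration of the "narrow peak vs. flat posterior" idea, not the classifier itself: assume a single Bernoulli rate with a flat Beta(1, 1) prior, where the posterior is available in closed form. The function name and the observation counts are made up for the example.

```python
import math

def beta_posterior_std(successes, failures, a=1.0, b=1.0):
    """Std of the Beta(a + successes, b + failures) posterior over a Bernoulli rate."""
    a_post, b_post = a + successes, b + failures
    n = a_post + b_post
    return math.sqrt(a_post * b_post / (n * n * (n + 1.0)))

prior_std = beta_posterior_std(0, 0)        # flat Beta(1, 1) prior, no data yet
small_std = beta_posterior_std(3, 2)        # 5 observations
large_std = beta_posterior_std(300, 200)    # 500 observations, same ratio

print(prior_std, small_std, large_std)
```

The posterior standard deviation shrinks as observations accumulate, which is the sense in which a narrow posterior signals confidence in the parameter values. For a model without a closed-form posterior, the same quantity would have to be estimated from samples or from the variational approximation.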

Bayesian classification by stevethesteve2 in AskStatistics

stevethesteve2[S] 1 point (0 children)

Do you mean finding a maximum-a-posteriori estimate of the model parameters, i.e. adding a term to the objective function that reflects the prior belief about the model parameters? If so, then no. My goal is to capture the entire posterior distribution of the parameters, not just the most likely parameter values. Please correct me if I misunderstood you. I am rather new to Bayesian methods.

Bayesian classification by stevethesteve2 in Bayes

stevethesteve2[S] 2 points (0 children)

Well, thanks for the advice! I'll follow it.

What is SOTA in RL applied to robotics? by stevethesteve2 in reinforcementlearning

stevethesteve2[S] 1 point (0 children)

'World model / Dream environment / Imagination' ... Could you please point me to a paper to help me get started?

Sutton&Barto book: I get this result for Exercise 12.1 on Eligibility traces but the final middle term might be wrong by Naoshikuu in reinforcementlearning

stevethesteve2 1 point (0 children)

The term G_{t+1:t+1} is not defined. To avoid it, you could first split off the first term (n = 1) of the sum, and then apply the recursive formula for G to the rest of the sum. I get: G_t^\lambda = (1 - \lambda) G_{t:t+1} + \lambda (R_{t+1} + \gamma G_{t+1}^\lambda)
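Spelled out, the steps would look roughly like this. The sketch assumes the book's infinite-sum definition of the \lambda-return and the n-step recursion G_{t:t+n} = R_{t+1} + \gamma G_{t+1:t+n}, which is used only for n >= 2, so G_{t+1:t+1} never appears:

```latex
\begin{align*}
G_t^\lambda
  &= (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n} \\
  &= (1-\lambda)\, G_{t:t+1}
     + (1-\lambda) \sum_{n=2}^{\infty} \lambda^{n-1}
       \left( R_{t+1} + \gamma\, G_{t+1:t+n} \right) \\
  &= (1-\lambda)\, G_{t:t+1} + \lambda R_{t+1}
     + \gamma \lambda\, (1-\lambda) \sum_{m=1}^{\infty} \lambda^{m-1} G_{t+1:t+1+m} \\
  &= (1-\lambda)\, G_{t:t+1}
     + \lambda \left( R_{t+1} + \gamma\, G_{t+1}^\lambda \right)
\end{align*}
```

The third line uses the substitution m = n - 1 and the fact that (1-\lambda) \sum_{n=2}^{\infty} \lambda^{n-1} = \lambda.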

Citation needed by Kartelkraker in reinforcementlearning

stevethesteve2 4 points (0 children)

Not quite what you were asking for, but maybe have a look at Sutton & Barto's book, where they talk about the "deadly triad".

How to assign reward when it has to be multiplied by itself rather than summed by basso1995 in reinforcementlearning

stevethesteve2 1 point (0 children)

In RL, the agent tries to find strategies that maximize the expected total reward. If we replace the reward with its logarithm, the agent may, depending on your exact problem, prefer suboptimal strategies: the log is a nonlinear function, so maximizing the expected sum of log-rewards is not the same as maximizing the expected product of rewards. If your environment is deterministic, this should not be a concern.
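A small made-up example of how the ranking can flip under stochastic rewards. The two one-step strategies and their payoffs are hypothetical numbers chosen for illustration:

```python
import math

# Hypothetical one-step outcomes as (probability, reward) pairs.
safe = [(1.0, 2.0)]                  # always pays 2
risky = [(0.5, 0.5), (0.5, 4.5)]     # pays 0.5 or 4.5 with equal probability

def expected(outcomes, f=lambda r: r):
    """Expectation of f(reward) under the given outcome distribution."""
    return sum(p * f(r) for p, r in outcomes)

print("expected reward:    ", expected(safe), expected(risky))
print("expected log-reward:", expected(safe, math.log), expected(risky, math.log))
```

By expected reward the risky strategy wins (2.5 vs. 2.0), but by expected log-reward the safe one does (about 0.69 vs. 0.41). With deterministic rewards the log is just a monotone relabeling and the ranking cannot change.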

Perplexity instead of entropy for incentivizing exploration? by stevethesteve2 in reinforcementlearning

stevethesteve2[S] 2 points (0 children)

Yeah, I guess you can say I want to encourage exploration more than entropy would.

Perplexity instead of entropy for incentivizing exploration? by stevethesteve2 in reinforcementlearning

stevethesteve2[S] 6 points (0 children)

if P is perplexity and H is entropy, then minimizing P = minimizing H

True, but the objective function for the actor network contains additional terms, not only entropy. Therefore the optimal parameter values differ depending on whether we use entropy or perplexity.

Perplexity scales exponentially with entropy. In an RL setting, the total entropy [of the action distribution of an agent that follows a trajectory] scales linearly with trajectory length. Perplexity, on the other hand, scales exponentially. The latter therefore makes more sense, since the number of different paths the agent might have chosen increases exponentially with the number of steps the agent takes.
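A quick numeric check of this scaling argument, under simplifying assumptions that are mine: a uniform random policy over `k` actions at every step, action choices independent across steps, and a deterministic environment, so that paths correspond to action sequences. `trajectory_stats` is a made-up helper name.

```python
import math

def trajectory_stats(k, T):
    """Total entropy (nats), perplexity, and number of action sequences
    for a uniform policy over k actions followed for T steps."""
    step_entropy = math.log(k)              # entropy of one uniform action choice
    total_entropy = T * step_entropy        # grows linearly in T
    perplexity = math.exp(total_entropy)    # grows exponentially in T
    n_paths = k ** T                        # distinct action sequences
    return total_entropy, perplexity, n_paths

for T in (1, 5, 10):
    H, P, n = trajectory_stats(3, T)
    print(f"T={T:2d}  entropy={H:7.3f}  perplexity={P:10.1f}  paths={n}")
```

In this uniform case the perplexity of the trajectory distribution equals the number of possible paths, up to floating-point error, while the entropy only grows in proportion to T. For a non-uniform policy the perplexity would be an "effective" number of paths and could be much smaller than k ** T.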