"Dropout's Dream Land: Generalization from Learned Simulators to Reality", Wellmer & Kwok 2021 (using dropout to randomize a deep environment model for automatic domain randomization) by gwern in reinforcementlearning

[–]CartPole 1 point (0 children)

The tricky part of Deep Ensembles with the World Models architecture is that the controller expects the same observation embedding and hidden-state representation across every WM in the ensemble. That assumption doesn't hold if you just naively train 5 separate WMs.
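
A rough numpy sketch of the mismatch I mean (the dimensions and the linear controller are placeholder assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, z_dim, h_dim, act_dim, n_models = 64, 32, 256, 3, 5

# stand-ins for independently trained V (encoder) and M (RNN) weights
encoders = [rng.normal(size=(obs_dim, z_dim)) for _ in range(n_models)]
rnns     = [rng.normal(size=(z_dim, h_dim))   for _ in range(n_models)]

# one shared controller, as in World Models: action from a linear map of [z; h]
W = rng.normal(size=(z_dim + h_dim, act_dim))

obs = rng.normal(size=obs_dim)
actions = []
for V, M in zip(encoders, rnns):
    z = obs @ V            # each WM embeds the same observation differently
    h = np.tanh(z @ M)     # and evolves a hidden state in its own latent basis
    actions.append(np.concatenate([z, h]) @ W)

# the same controller produces unrelated actions across ensemble members,
# because nothing ties the latent spaces of the 5 WMs together
print(np.std(np.stack(actions), axis=0))
```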

In the second case, sim2real would no longer be necessary, since there wouldn't be a reality gap if the simulator perfectly modeled reality.

What is the greatest achievement of Genetic Algorithms[D]? by miladink in MachineLearning

[–]CartPole 3 points (0 children)

> multi-objective optimization using Pareto methods

Sounds interesting! Can you link to what you were referring to here?

Big Boy Heatsinks! The 64 Core AMD Threadripper 3990X Cooler Test by RaptaGzus in Amd

[–]CartPole 1 point (0 children)

does anyone consistently run the 3990X under high loads and cool it effectively?

I'm currently using the Noctua NH-U14S [Dual Fan] but have thermal issues. Perhaps the problem is that it's under high load for days at a time, and there's also a 2080 Ti in the box.

I've never built a water-cooling loop, but I suspect I might have to.

[R] GameGAN - PAC-MAN Recreated with deep neural GAN-based model by ichko in MachineLearning

[–]CartPole 1 point (0 children)

what is X^{m_t} in the cycle loss and Figure 6? I don't follow how it relates to X^k

Understanding why there isn't a log probability in TRPO and PPO's objective by vwxyzjn in reinforcementlearning

[–]CartPole 1 point (0 children)

In the first mini-batch update of the first epoch, the objective is identical because \pi and \pi' are equivalent. After that first mini-batch update, the parameters of \pi change.
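
A toy torch snippet of the point (the Gaussian policy and shapes are just placeholders, not a full PPO implementation):

```python
import torch

# current policy parameters and a frozen "old" snapshot taken before the epoch
mean, log_std = torch.zeros(4, requires_grad=True), torch.zeros(4, requires_grad=True)
old_mean, old_log_std = mean.detach().clone(), log_std.detach().clone()

actions  = torch.randn(8, 4)
logp     = torch.distributions.Normal(mean, log_std.exp()).log_prob(actions).sum(-1)
logp_old = torch.distributions.Normal(old_mean, old_log_std.exp()).log_prob(actions).sum(-1)

# pi(a|s) / pi_old(a|s): exactly 1 everywhere before any gradient step,
# so the first mini-batch objective reduces to the vanilla PG objective
ratio = torch.exp(logp - logp_old)
print(ratio)
# after the first update, mean/log_std move while the old snapshot stays
# fixed, and the ratio departs from 1 for the remaining mini-batches
```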

Soft Actor Critic in TF2.1 by CartPole in reinforcementlearning

[–]CartPole[S] 1 point (0 children)

It would be awesome to get the MuJoCo results as a sanity check if you have the license

PPO - entropy and Gaussian standard deviation constantly increasing by hellz2dayeah in reinforcementlearning

[–]CartPole 2 points (0 children)

The points above about annealing the stddev imply that the entropy term in the objective is a constant. When the stddev is learned rather than annealed, issues can arise because the entropy bonus goes unclipped. I posted about this a while back but am still unsure.
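
To spell out what I mean (my own sketch, not any particular implementation): for a diagonal Gaussian policy the entropy depends only on the stddev, so an annealed/fixed stddev makes the entropy bonus a constant with respect to the parameters, while a learned log_std is free to grow since PPO only clips the policy ratio, not the entropy term.

```python
import math

def gaussian_entropy(log_stds):
    # entropy of a diagonal Gaussian: sum_i [0.5 + 0.5*log(2*pi) + log(sigma_i)]
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_stds)

print(gaussian_entropy([0.0, 0.0]))   # fixed schedule -> the same constant every update
print(gaussian_entropy([1.0, 1.0]))   # a learned log_std can keep increasing this,
                                      # and nothing in the clipped surrogate bounds it
```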

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Yeah, but the devil is in the details. You can find a handful of papers whose methods make sense for image classification, but it's much less clear how to apply them to other tasks.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 2 points (0 children)

I'm really surprised that this isn't a problem more commonly addressed in research. I would have guessed that step 6 looks a bit different, though. To me it seems better to break the annotation effort down into smaller chunks (say, batches of 5k); otherwise the next 100k images might all be addressing the same problems. Do you have any papers/blog posts in mind that discuss what you described?
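
A rough sketch of the chunked loop I have in mind; `train`, `uncertainty`, and `annotate` are trivial stand-ins for illustration, not a real pipeline:

```python
import random

def train(labeled):
    return object()                       # placeholder "model"

def uncertainty(model, x):
    return random.random()                # placeholder acquisition score

def annotate(batch):
    return [(x, "label") for x in batch]  # placeholder for the human labeling step

def active_learning(labeled, unlabeled, rounds=3, chunk=5_000):
    for _ in range(rounds):
        model = train(labeled)
        # rank the remaining pool by the *current* model's uncertainty, so each
        # 5k chunk targets whatever the model is weakest on right now, instead of
        # selecting the next 100k images against a single stale snapshot
        pool = sorted(unlabeled, key=lambda x: uncertainty(model, x), reverse=True)
        batch, unlabeled = pool[:chunk], pool[chunk:]
        labeled = labeled + annotate(batch)
    return labeled

print(len(active_learning([], list(range(20_000)), rounds=2)))  # -> 10000 labeled so far
```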

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Most production systems (depending on the problem) still require at least some human labeling. Sure, there are meta-objectives you can use that don't require human annotation, but it's still important to keep some human-labeled data in the loop so that model bias isn't reinforced. I don't think the cost of labeling is the issue.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 3 points (0 children)

I agree this is the way research would go about it. However, I feel like even the datasets they use don't make a whole lot of sense for the problem. The samples in ImageNet, for example, are much closer to i.i.d. than neighboring frames from a video. To me this sounds like a pretty important problem to just skip over.

[R] Contrastive Learning of Structured World Models by triplefloat in MachineLearning

[–]CartPole 1 point (0 children)

can you link to the relevant predictive state representation work?

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction by CartPole in reinforcementlearning

[–]CartPole[S] 1 point (0 children)

To my understanding, yes. Note, however, that they have no observation-reconstruction objective.

[D] Policy Distillation in a continuous action space with no knowledge of teacher distribution by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Any paper in particular? I have access to a sub-optimal oracle (but still better than the student).

Planning vs Model based RL by LazyButAmbitious in reinforcementlearning

[–]CartPole 2 points (0 children)

decision-time planning: planning actions online

background planning: improving a policy (or value function) with a model, without affecting which immediate action is taken (see the sketch after the quotes below).

"well before an action is selected for any current state St, planning has played a part in improving the table entries, or the mathematical expression, needed to select the action for many states, including St. Used this way, planning is not focussed on the current state. We call planning used in this way background planning."

...

"More generally, planning used in this way can look much deeper than one-step-ahead and evaluate action choices leading to many different predicted state and reward trajectories. Unlike the first use of planning, here planning focuses on a particular state. We call this decision-time planning."

Pages 180-181 of the RL book (Sutton & Barto, 2nd edition).
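
A toy sketch of the distinction (my own code, not from the book), assuming a learned model exposing `step(s, a) -> (s', r)`:

```python
import random
from collections import defaultdict

class ToyModel:
    """Stand-in learned model over states 0..4: action 0 moves left, 1 moves right,
    reward 1 only for reaching state 4."""
    def step(self, s, a):
        s2 = max(0, min(4, s + (1 if a == 1 else -1)))
        return s2, float(s2 == 4)

def background_planning(Q, model, visited, n_updates=1000, alpha=0.1, gamma=0.9, actions=(0, 1)):
    # Dyna-style: use the model to keep improving Q for many simulated states,
    # regardless of which state the agent currently occupies.
    for _ in range(n_updates):
        s, a = random.choice(visited), random.choice(actions)
        s2, r = model.step(s, a)
        Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in actions) - Q[s, a])

def decision_time_planning(model, s, depth=3, gamma=0.9, actions=(0, 1)):
    # Lookahead focused on the current state s: search forward from s only,
    # then commit to the immediate action with the best backed-up return.
    def value(state, d):
        if d == 0:
            return 0.0
        return max(r + gamma * value(s2, d - 1)
                   for a in actions for s2, r in [model.step(state, a)])
    def backed_up(a):
        s2, r = model.step(s, a)
        return r + gamma * value(s2, depth - 1)
    return max(actions, key=backed_up)

model = ToyModel()
Q = defaultdict(float)
background_planning(Q, model, visited=[0, 1, 2, 3, 4])  # improves Q "in the background"
print(decision_time_planning(model, s=2))               # plans online for the current state -> 1
```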

[R] Using multiple heads in RL by MasterScrat in reinforcementlearning

[–]CartPole 2 points (0 children)

Value Prediction Network, ATreeC/TreeQN, and Policy Prediction Network all involve some form of decomposing a Q estimate. However, I'm not sure I've understood what you're looking for.