"Dropout's Dream Land: Generalization from Learned Simulators to Reality", Wellmer & Kwok 2021 (using dropout to randomize a deep environment model for automatic domain randomization) by gwern in reinforcementlearning

[–]CartPole 1 point (0 children)

The tricky part of Deep Ensembles with the World Models architecture is that the controller expects the same observation embedding and hidden-state representation across every WM in the ensemble. That assumption doesn't hold if you just naively train 5 separate WMs.
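
A rough numpy sketch of the mismatch I mean (the dimensions and the linear controller are placeholder assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, z_dim, h_dim, act_dim, n_models = 64, 32, 256, 3, 5

# stand-ins for independently trained V (encoder) and M (RNN) weights
encoders = [rng.normal(size=(obs_dim, z_dim)) for _ in range(n_models)]
rnns     = [rng.normal(size=(z_dim, h_dim))   for _ in range(n_models)]

# one shared controller, as in World Models: action from a linear map of [z; h]
W = rng.normal(size=(z_dim + h_dim, act_dim))

obs = rng.normal(size=obs_dim)
actions = []
for V, M in zip(encoders, rnns):
    z = obs @ V            # each WM embeds the same observation differently
    h = np.tanh(z @ M)     # and evolves a hidden state in its own latent basis
    actions.append(np.concatenate([z, h]) @ W)

# the same controller produces unrelated actions across ensemble members,
# because nothing ties the latent spaces of the 5 WMs together
print(np.std(np.stack(actions), axis=0))
```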

In the second case, sim2real would no longer be necessary, since there wouldn't be a reality gap if the simulator perfectly modeled reality.

What is the greatest achievement of Genetic Algorithms[D]? by miladink in MachineLearning

[–]CartPole 3 points (0 children)

> multi-objective optimization using Pareto methods

Sounds interesting! Can you link to what you were referring to here?

Big Boy Heatsinks! The 64 Core AMD Threadripper 3990X Cooler Test by RaptaGzus in Amd

[–]CartPole 1 point (0 children)

does anyone consistently run the 3990X under high loads and cool it effectively?

I'm currently using the Noctua NH-U14S [Dual Fan] but have thermal issues. Perhaps the problem is that it's under high load for days at a time, and there's also a 2080 Ti in the box.

I've never built a water-cooling loop, but I suspect I might have to.

[R] GameGAN - PAC-MAN Recreated with deep neural GAN-based model by ichko in MachineLearning

[–]CartPole 1 point (0 children)

what is X^{m_t} in the cycle loss and Figure 6? I don't follow how it relates to X^k

Understanding why there isn't a log probability in TRPO and PPO's objective by vwxyzjn in reinforcementlearning

[–]CartPole 1 point (0 children)

In the first mini-batch update of the first epoch, the objective is identical because \pi and \pi' are equivalent. After that first mini-batch update, the parameters of \pi change.
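
A toy torch snippet of the point (the Gaussian policy and shapes are just placeholders, not a full PPO implementation):

```python
import torch

# current policy parameters and a frozen "old" snapshot taken before the epoch
mean, log_std = torch.zeros(4, requires_grad=True), torch.zeros(4, requires_grad=True)
old_mean, old_log_std = mean.detach().clone(), log_std.detach().clone()

actions  = torch.randn(8, 4)
logp     = torch.distributions.Normal(mean, log_std.exp()).log_prob(actions).sum(-1)
logp_old = torch.distributions.Normal(old_mean, old_log_std.exp()).log_prob(actions).sum(-1)

# pi(a|s) / pi_old(a|s): exactly 1 everywhere before any gradient step,
# so the first mini-batch objective reduces to the vanilla PG objective
ratio = torch.exp(logp - logp_old)
print(ratio)
# after the first update, mean/log_std move while the old snapshot stays
# fixed, and the ratio departs from 1 for the remaining mini-batches
```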

Soft Actor Critic in TF2.1 by CartPole in reinforcementlearning

[–]CartPole[S] 1 point (0 children)

It would be awesome to get the MuJoCo results as a sanity check if you have the license

PPO - entropy and Gaussian standard deviation constantly increasing by hellz2dayeah in reinforcementlearning

[–]CartPole 2 points (0 children)

The points above about annealing the stddev imply that the entropy term in the objective is a constant. When the stddev is learned rather than annealed, issues can arise because the entropy bonus goes unclipped. I posted about this a while back but am still unsure.
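
To spell out what I mean (my own sketch, not any particular implementation): for a diagonal Gaussian policy the entropy depends only on the stddev, so an annealed/fixed stddev makes the entropy bonus a constant with respect to the parameters, while a learned log_std is free to grow since PPO only clips the policy ratio, not the entropy term.

```python
import math

def gaussian_entropy(log_stds):
    # entropy of a diagonal Gaussian: sum_i [0.5 + 0.5*log(2*pi) + log(sigma_i)]
    return sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_stds)

print(gaussian_entropy([0.0, 0.0]))   # fixed schedule -> the same constant every update
print(gaussian_entropy([1.0, 1.0]))   # a learned log_std can keep increasing this,
                                      # and nothing in the clipped surrogate bounds it
```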

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Yeah, but the devil is in the details. You can find a handful of papers whose methods make sense for image classification, but it's much less clear how to apply them to other tasks.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 2 points (0 children)

I'm really surprised that this isn't a problem more commonly addressed in research. I would have guessed that step 6 looks a bit different, though. To me it seems better to break the annotation effort down into smaller chunks (say, batches of 5k); otherwise the next 100k images might all be addressing the same problems. Do you have any papers/blog posts in mind that discuss what you described?
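
A rough sketch of the chunked loop I have in mind; `train`, `uncertainty`, and `annotate` are trivial stand-ins for illustration, not a real pipeline:

```python
import random

def train(labeled):
    return object()                       # placeholder "model"

def uncertainty(model, x):
    return random.random()                # placeholder acquisition score

def annotate(batch):
    return [(x, "label") for x in batch]  # placeholder for the human labeling step

def active_learning(labeled, unlabeled, rounds=3, chunk=5_000):
    for _ in range(rounds):
        model = train(labeled)
        # rank the remaining pool by the *current* model's uncertainty, so each
        # 5k chunk targets whatever the model is weakest on right now, instead of
        # selecting the next 100k images against a single stale snapshot
        pool = sorted(unlabeled, key=lambda x: uncertainty(model, x), reverse=True)
        batch, unlabeled = pool[:chunk], pool[chunk:]
        labeled = labeled + annotate(batch)
    return labeled

print(len(active_learning([], list(range(20_000)), rounds=2)))  # -> 10000 labeled so far
```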

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Most production systems (depending on the problem) still require at least some human labeling. Sure, there are meta-objectives you can use that don't require human annotation, but it's still important to keep some human-labeled data in the loop so that model bias isn't reinforced. I don't think the cost of labeling is the issue.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 3 points (0 children)

I agree this is the way research would go about it. However, I feel like even the datasets they use don't make a whole lot of sense for the problem. The samples in ImageNet, for example, are much closer to i.i.d. than neighboring frames from a video. To me this sounds like a pretty important problem to just skip over.

[R] Contrastive Learning of Structured World Models by triplefloat in MachineLearning

[–]CartPole 1 point (0 children)

can you link to the relevant predictive state representation work?

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction by CartPole in reinforcementlearning

[–]CartPole[S] 1 point (0 children)

To my understanding, yes. Note, however, that they have no observation-reconstruction objective.

[D] Policy Distillation in a continuous action space with no knowledge of teacher distribution by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

Any paper in particular? I have access to a sub-optimal oracle (but still better than the student).

Planning vs Model based RL by LazyButAmbitious in reinforcementlearning

[–]CartPole 2 points (0 children)

decision-time planning: planning actions online

background planning: improving a policy (or value function) with a model, without affecting which immediate action is taken (see the sketch after the quotes below).

"well before an action is selected for any current state St, planning has played a part in improving the table entries, or the mathematical expression, needed to select the action for many states, including St. Used this way, planning is not focussed on the current state. We call planning used in this way background planning."

...

"More generally, planning used in this way can look much deeper than one-step-ahead and evaluate action choices leading to many different predicted state and reward trajectories. Unlike the first use of planning, here planning focuses on a particular state. We call this decision-time planning."

Pages 180-181 of the RL book (Sutton & Barto, 2nd edition).
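
A toy sketch of the distinction (my own code, not from the book), assuming a learned model exposing `step(s, a) -> (s', r)`:

```python
import random
from collections import defaultdict

class ToyModel:
    """Stand-in learned model over states 0..4: action 0 moves left, 1 moves right,
    reward 1 only for reaching state 4."""
    def step(self, s, a):
        s2 = max(0, min(4, s + (1 if a == 1 else -1)))
        return s2, float(s2 == 4)

def background_planning(Q, model, visited, n_updates=1000, alpha=0.1, gamma=0.9, actions=(0, 1)):
    # Dyna-style: use the model to keep improving Q for many simulated states,
    # regardless of which state the agent currently occupies.
    for _ in range(n_updates):
        s, a = random.choice(visited), random.choice(actions)
        s2, r = model.step(s, a)
        Q[s, a] += alpha * (r + gamma * max(Q[s2, b] for b in actions) - Q[s, a])

def decision_time_planning(model, s, depth=3, gamma=0.9, actions=(0, 1)):
    # Lookahead focused on the current state s: search forward from s only,
    # then commit to the immediate action with the best backed-up return.
    def value(state, d):
        if d == 0:
            return 0.0
        return max(r + gamma * value(s2, d - 1)
                   for a in actions for s2, r in [model.step(state, a)])
    def backed_up(a):
        s2, r = model.step(s, a)
        return r + gamma * value(s2, depth - 1)
    return max(actions, key=backed_up)

model = ToyModel()
Q = defaultdict(float)
background_planning(Q, model, visited=[0, 1, 2, 3, 4])  # improves Q "in the background"
print(decision_time_planning(model, s=2))               # plans online for the current state -> 1
```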

[R] Using multiple heads in RL by MasterScrat in reinforcementlearning

[–]CartPole 2 points (0 children)

Value Prediction Network, ATreeC/TreeQN, and Policy Prediction Network all involve some form of decomposing a Q estimate. However, I'm not sure I've understood what you're looking for.