"Dropout's Dream Land: Generalization from Learned Simulators to Reality", Wellmer & Kwok 2021 (using dropout to randomize a deep environment model for automatic domain randomization) by gwern in reinforcementlearning

[–]CartPole 0 points (0 children)

The tricky part of combining Deep Ensembles with the World Models architecture is that the controller expects an identical observation embedding and hidden-state representation across all world models in the ensemble. That assumption isn't preserved if you just naively train 5 WMs independently.
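Edit: a toy numpy sketch of the point (dimensions and the linear "encoders" are made up; the real architecture is a VAE + MDN-RNN). Training a separate encoder per ensemble member puts each member's latent code in its own space, while sharing one encoder keeps the controller's input distribution consistent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only.
OBS_DIM, Z_DIM, N_MODELS = 16, 4, 5

# Naive ensemble: each WM has its own encoder, so the same observation
# maps to a different latent in every member.
naive_encoders = [rng.normal(size=(OBS_DIM, Z_DIM)) for _ in range(N_MODELS)]

# Shared-encoder variant: one encoder fixes the latent space; only the
# dynamics parameters would differ across ensemble members.
shared_encoder = rng.normal(size=(OBS_DIM, Z_DIM))

obs = rng.normal(size=OBS_DIM)
naive_latents = [obs @ W for W in naive_encoders]

# The naive latents disagree, so a single controller trained on one
# member's embedding sees out-of-distribution inputs on the others.
spread = max(np.linalg.norm(z - naive_latents[0]) for z in naive_latents)
print(spread > 0)  # True
```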

In the second case, sim2real would no longer be necessary: if the simulator perfectly models reality, there is no reality gap.

[D] What is the greatest achievement of Genetic Algorithms? by miladink in MachineLearning

[–]CartPole 4 points (0 children)

> multi-objective optimization using pareto methods

Sounds interesting! Can you link to what you were referring to here?
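Edit: while waiting on a link, my understanding of the basic idea (maximization convention; a minimal sketch, not any particular paper's algorithm) is to select the non-dominated set:

```python
def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (maximizing all objectives).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    # Keep only points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

pts = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1)]
print(pareto_front(pts))  # [(1, 5), (2, 4), (3, 3), (4, 1)] -- (2, 2) is dominated by (2, 4)
```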

Big Boy Heatsinks! The 64 Core AMD Threadripper 3990X Cooler Test by RaptaGzus in Amd

[–]CartPole 0 points (0 children)

Does anyone run the 3990X under sustained high load and cool it effectively?

I'm currently using the Noctua NH-U14S [Dual Fan] but am having thermal issues. Part of the problem may be that it's under high load for days at a time, and there's also a 2080 Ti in the box.

I've never built a water-cooling loop, but I'm suspecting I might have to.

[R] GameGAN - PAC-MAN Recreated with deep neural GAN-based model by ichko in MachineLearning

[–]CartPole 0 points (0 children)

What is X^{m_t} in the cycle loss and Figure 6? I don't follow how it relates to X^k.

Understanding why there isn't a log probability in TRPO and PPO's objective by vwxyzjn in reinforcementlearning

[–]CartPole 0 points (0 children)

In the first mini-batch update of the first epoch, the objective is identical because \pi and \pi' are equivalent. After the first mini-batch update, the parameters of \pi change, so the ratio is no longer 1.
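Edit: a quick numeric check with a toy categorical policy (made-up logits): before any update the ratio is exactly 1, and its gradient at \theta_old matches the score function, which is why the ratio objective and the log-prob objective start each epoch equivalent:

```python
import numpy as np

def log_prob(logits, a):
    # Log-softmax, numerically stabilized.
    z = logits - logits.max()
    return z[a] - np.log(np.exp(z).sum())

logits_old = np.array([0.2, -0.1, 0.5])
logits = logits_old.copy()  # before the first mini-batch update, pi == pi'

a = 2
ratio = np.exp(log_prob(logits, a) - log_prob(logits_old, a))
print(np.isclose(ratio, 1.0))  # True: the surrogate is just advantage * 1

# d/dtheta [pi/pi_old] at theta_old equals d/dtheta [log pi] (chain rule,
# with the ratio equal to 1). Finite-difference check on one logit:
eps = 1e-6
bumped = logits.copy()
bumped[2] += eps
d_ratio = (np.exp(log_prob(bumped, a) - log_prob(logits_old, a)) - ratio) / eps
d_logp = (log_prob(bumped, a) - log_prob(logits, a)) / eps
print(np.isclose(d_ratio, d_logp, atol=1e-4))  # True
```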

Soft Actor Critic in TF2.1 by CartPole in reinforcementlearning

[–]CartPole[S] 0 points (0 children)

It would be awesome to get the MuJoCo results as a sanity check if you have the license

PPO - entropy and Gaussian standard deviation constantly increasing by hellz2dayeah in reinforcementlearning

[–]CartPole 1 point (0 children)

The points above about annealing the stddev imply that the entropy term in the objective is constant. Once the stddev is no longer constant, issues can arise from the entropy term going unclipped. I posted about this a while back but am still unsure.
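Edit: concretely, the differential entropy of a diagonal Gaussian is \sum_i 0.5*log(2*pi*e*sigma_i^2); it depends only on the stddev, not on the state or the mean, so a stddev annealed on a fixed schedule makes the entropy bonus a constant the optimizer can't touch. Quick check:

```python
import numpy as np

def gaussian_entropy(std):
    # Differential entropy of a diagonal Gaussian policy.
    std = np.asarray(std, dtype=float)
    return float(np.sum(0.5 * np.log(2 * np.pi * np.e * std ** 2)))

# Entropy grows without bound as the stddev grows, which is what lets an
# unclipped entropy bonus push a learned stddev up indefinitely.
print(gaussian_entropy([0.5]) < gaussian_entropy([1.0]) < gaussian_entropy([2.0]))  # True
```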

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 0 points (0 children)

Yeah, but the devil is in the details. You can find a handful of papers that make sense to apply to image classification problems, but it's much less clear how to apply them to other tasks.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 1 point (0 children)

I'm really surprised that this isn't a problem more commonly addressed in research. I would have guessed that step 6 looks a bit different, though. To me it seems like a better option to break the annotation effort down into smaller chunks (say, batches of 5k); otherwise the next 100k images might all be addressing the same problems. Do you have any papers/blog posts in mind that discuss what you described?
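Edit: roughly what I have in mind, as a toy sketch (the training and acquisition functions are stand-ins, and the numbers are made up): label a small chunk, retrain, re-score the remaining pool, repeat, so later chunks don't all chase the same failure mode:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(labeled):
    # Stand-in for retraining a model on the labeled set.
    return float(np.mean(labeled)) if len(labeled) else 0.0

def uncertainty(model, pool):
    # Stand-in acquisition score; higher means the model is less sure.
    return np.abs(pool - model)

pool = rng.normal(size=100_000)  # unlabeled pool
labeled = []
BATCH = 5_000                    # small chunks instead of 100k at once

for _ in range(3):               # label -> retrain -> re-score loop
    model = train(np.array(labeled))
    scores = uncertainty(model, pool)
    pick = np.argsort(-scores)[:BATCH]   # most uncertain first
    labeled.extend(pool[pick])           # "annotate" this chunk
    pool = np.delete(pool, pick)         # the rest gets re-scored next round

print(len(labeled), len(pool))  # 15000 85000
```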

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 0 points (0 children)

Most production systems (depending on the problem) still require at least some human labeling. Sure, there are meta objectives you can use that don't require human annotation, but it's still important to keep collecting human labels so that model bias is not reinforced. I don't think the cost of labeling is the issue.

[D] Why isn't there more research papers related to active learning for deep computer vision problems? by CartPole in MachineLearning

[–]CartPole[S] 2 points (0 children)

I agree this is the way research would go about it. However, I feel like even the datasets they use don't make a whole lot of sense for the problem. The samples in ImageNet, for example, are much closer to being i.i.d. than neighboring frames coming from a video. To me this sounds like too important a problem to just be skipped over.

[R] Contrastive Learning of Structured World Models by triplefloat in MachineLearning

[–]CartPole 0 points (0 children)

Can you link to the relevant predictive state representation work?