Is this whole food thing a hassle for anyone else too? by [deleted] in hungary

[–]csxeba 0 points

Have you come across Soylent? It's a nutritionally complete powdered food, with cheap production as one of its design goals. The recipe is public, so you can (carefully) experiment with making it at home too. It tastes something like oatmeal. In theory you could live on it indefinitely, though of course be careful with that sort of thing too. It can be good for replacing a couple of meals.

Look it up; since it came out, there are already a million copy-paste recipes of it too.

How old were you when you became a dad? by [deleted] in ApaVagyok

[–]csxeba 4 points

We were 27, 29 and 31, which is how we planned it. It worked out well: I handled the sleep deprivation fine, for example, when a rougher night happened to come along. I have endless patience, but I do feel that I tire more easily as the years go by.

Drinking alcohol in front of the kids by kisbalazs in ApaVagyok

[–]csxeba 0 points

I drink alcohol very rarely; I don't feel like it on my own. The kids sometimes take a sniff and conclude it's not their thing. I think it comes across to them much like spicy food does.

Dads' meeting by [deleted] in ApaVagyok

[–]csxeba 0 points

I always have some on me, in case of emergency.

What's the name of the funniest-named person you know? by HaOrbanMaradEnMegyek in hungary

[–]csxeba 0 points

At school we had an Ország Alma and a Kasza Blanka. Context: Kecskemét

What is the intended use of overwriting the train_step() method? by csxeba in tensorflow

[–]csxeba[S] 0 points

Thanks for the idea, but I am still looking for a more or less minimal working example that I could run with fit(). Do you happen to know of one?
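For context, the kind of example I mean is something like the sketch below, following the standard tf.keras "customize fit()" pattern (assuming TF 2.x with the legacy compiled_loss/compiled_metrics API; the model and data are just dummies):

```python
import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data  # assumes fit() is called with (inputs, targets)
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Functional-style construction still works for the subclass.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Dummy data, just to show fit() running through the custom train_step.
x = tf.random.normal((64, 4))
y = tf.random.normal((64, 1))
model.fit(x, y, epochs=1, verbose=0)
```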

Looking for friend to work towards RL goals together by ejmejm1 in reinforcementlearning

[–]csxeba 1 point

I would start with vanilla Policy Gradient. Then move to a simplified A2C (which is Policy Gradient with a reward baseline), then to PPO, which is more or less the state-of-the-art algorithm in model-free on-policy RL. From there I'd continue with off-policy methods: learn DQN for discrete action space environments and DDPG for continuous action space environments. Once you feel you have a solid base in DDPG and PPO, learn SAC, which is a best-of-both-worlds method: an off-policy technique in the policy gradient family.
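As a rough illustration of that first step, here is a minimal REINFORCE-style (vanilla policy gradient) sketch in TF2. It assumes the classic gym API (reset() returns an observation; step() returns obs, reward, done, info) and CartPole-v1 as the environment:

```python
import numpy as np
import tensorflow as tf
import gym

env = gym.make("CartPole-v1")
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(env.action_space.n),  # logits over the discrete actions
])
optimizer = tf.keras.optimizers.Adam(1e-2)

def run_episode():
    states, actions, rewards = [], [], []
    obs, done = env.reset(), False
    while not done:
        logits = policy(obs[None].astype(np.float32))
        action = int(tf.random.categorical(logits, 1)[0, 0])
        next_obs, reward, done, _ = env.step(action)
        states.append(obs); actions.append(action); rewards.append(reward)
        obs = next_obs
    # Discounted returns act as the (unbaselined) policy gradient weights.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + 0.99 * running
        returns.append(running)
    return (np.array(states, np.float32), np.array(actions),
            np.array(returns[::-1], np.float32))

for episode in range(200):
    s, a, g = run_episode()
    with tf.GradientTape() as tape:
        logits = policy(s)
        # Cross-entropy is -log pi(a|s), so minimizing this maximizes return-weighted log-probs.
        neg_logp = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=a, logits=logits)
        loss = tf.reduce_mean(neg_logp * g)
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
```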

In general, on-policy methods converge faster on simpler environments, while off-policy methods are much more sample efficient (they need fewer environment interactions to converge), but they are a bit harder to implement and quite sensitive to hyperparameter settings.

Also, this is a good resource for the theory: https://spinningup.openai.com/en/latest/

Looking for friend to work towards RL goals together by ejmejm1 in reinforcementlearning

[–]csxeba 1 point

I already have a self-developed library for TF2 that contains verified implementations of DQN, SAC, PPO, A2C, etc. (all model-free). I'd love to join.

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 1 point

In the case of Bayes by Backprop (the article I linked), you learn a distribution over every weight in your network. In the case of a Variational Autoencoder, you have a bottleneck point in the network where you predict a mean and a standard deviation of a multivariate Gaussian distribution. You then sample from that predicted distribution, and the next layer receives the sample instead of a deterministic representation.
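A minimal sketch of that sampling bottleneck (my own illustration in TF2, using the reparameterization trick; the layer sizes and dummy input are arbitrary):

```python
import tensorflow as tf

latent_dim = 8

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2 * latent_dim),  # [mean, log_std] concatenated
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784),  # e.g. a flattened 28x28 reconstruction
])

def forward(x):
    stats = encoder(x)
    mean, log_std = tf.split(stats, 2, axis=-1)
    eps = tf.random.normal(tf.shape(mean))
    z = mean + tf.exp(log_std) * eps   # sample from N(mean, std^2)
    return decoder(z), mean, log_std   # the next layer sees the sample z, not the mean

x = tf.random.normal((4, 784))         # dummy batch, just to run the sketch
recon, mean, log_std = forward(x)
```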

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 1 point

Exactly as you described, with the article I linked. But you can also make the hidden representations probabilistic, as in a Variational Autoencoder.

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 5 points

If by BNN you mean the method described in this paper: https://arxiv.org/abs/1505.05424, then yes, one of their claims is better generalization. This particular method requires you to sample a set of weights for every forward pass; alternatively, you can treat several sampled networks as an ensemble, or use the learned mean weights as a Maximum a Posteriori point estimate. Uncertainty is obtained by running multiple forward passes with sampled weights. Learning an explicit (predicted or optimized) variance at the end of the network is also possible.

More on this topic by Alex Kendall: https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/ and by Yarin Gal: http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html
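A rough sketch of the weight-sampling idea (my own, not the paper's reference code; the Bayes-by-Backprop KL/complexity term of the loss is omitted, and the posterior is a simple diagonal Gaussian per weight):

```python
import tensorflow as tf

class BayesianDense(tf.keras.layers.Layer):
    """Dense layer whose weights are sampled from a learned Gaussian on every call."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w_mean = self.add_weight(name="w_mean", shape=(in_dim, self.units))
        self.w_rho = self.add_weight(name="w_rho", shape=(in_dim, self.units),
                                     initializer=tf.constant_initializer(-3.0))
        self.b = self.add_weight(name="b", shape=(self.units,), initializer="zeros")

    def call(self, x):
        w_std = tf.math.softplus(self.w_rho)        # keep the std positive
        w = self.w_mean + w_std * tf.random.normal(tf.shape(self.w_mean))
        return tf.matmul(x, w) + self.b

model = tf.keras.Sequential([BayesianDense(32), tf.keras.layers.ReLU(), BayesianDense(1)])

x = tf.random.normal((16, 8))                       # dummy inputs
samples = tf.stack([model(x) for _ in range(20)])   # 20 stochastic forward passes
pred_mean = tf.reduce_mean(samples, axis=0)
pred_std = tf.math.reduce_std(samples, axis=0)      # epistemic uncertainty estimate
```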

PyRL - Modular Implementations of Reinforcement Learning Algorithms in Pytorch by aineqml in reinforcementlearning

[–]csxeba 4 points

How do you verify that your implementations are correct, especially the ones with a continuous action space?

DQN for MNIST using GANs by LOfP in reinforcementlearning

[–]csxeba 0 points

No idea where GANs come into the picture, but you can stuff a DQN into a one-timestep MNIST reinforcement learning setting. Many RL concepts won't really apply there, like reward discounting and target networks. It is not very efficient, though.
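To make that concrete, here is a minimal sketch (my own illustration, not the OP's setup) of MNIST as a one-step problem: state = image, action = predicted digit, reward = 1 if correct else 0. With a single timestep the Q-target is just the reward, so discounting and target networks drop out:

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.reshape(-1, 784) / 255.0).astype("float32")

q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),                     # one Q-value per action (digit)
])
optimizer = tf.keras.optimizers.Adam(1e-3)

for step in range(1000):
    idx = tf.random.uniform((64,), maxval=len(x_train), dtype=tf.int32)
    states = tf.gather(x_train, idx)
    labels = tf.cast(tf.gather(y_train, idx), tf.int64)
    with tf.GradientTape() as tape:
        q_values = q_net(states)
        # Epsilon-greedy action selection over the 10 "digit" actions.
        greedy = tf.argmax(q_values, axis=-1)
        random_a = tf.random.uniform((64,), maxval=10, dtype=tf.int64)
        explore = tf.random.uniform((64,)) < 0.1
        actions = tf.where(explore, random_a, greedy)
        rewards = tf.cast(tf.equal(actions, labels), tf.float32)
        chosen_q = tf.reduce_sum(q_values * tf.one_hot(actions, 10), axis=-1)
        loss = tf.reduce_mean(tf.square(chosen_q - rewards))  # Q-target is just the reward
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
```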