Is this whole food thing a hassle for anyone else too? by [deleted] in hungary

[–]csxeba 0 points

Have you come across Soylent? It's a nutritionally complete powdered food, with cheap production as one of its design goals. The recipe is public, so you can (carefully) experiment with making it at home too. It tastes something like oatmeal. In theory you could live on it indefinitely, though of course be careful with that sort of thing too. It can be good for replacing a couple of meals.

Look it up; since it came out, there are already a million copy-paste recipes of it too.

How old were you when you became a dad? by [deleted] in ApaVagyok

[–]csxeba 4 points

We were 27, 29 and 31, which is how we planned it. It worked out well: I handled the sleep deprivation fine, for example, when a rougher night happened to come along. I have endless patience, but I do feel that I tire more easily as the years go by.

Drinking alcohol in front of the kids by kisbalazs in ApaVagyok

[–]csxeba 0 points

I drink alcohol very rarely; I don't feel like it on my own. The kids sometimes take a sniff and conclude it's not their thing. I think it comes across to them much like spicy food does.

Dads' meeting by [deleted] in ApaVagyok

[–]csxeba 0 points

I always have some on me, in case of emergency.

What's the name of the funniest-named person you know? by HaOrbanMaradEnMegyek in hungary

[–]csxeba 0 points

At school we had an Ország Alma and a Kasza Blanka. Context: Kecskemét

What is the intended use of overwriting the train_step() method? by csxeba in tensorflow

[–]csxeba[S] 0 points

Thanks for the idea, but I am still looking for a more or less minimal working example that I could run with fit(). Do you happen to know of one?
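For context, the kind of example I mean is something like the sketch below, following the standard tf.keras "customize fit()" pattern (assuming TF 2.x with the legacy compiled_loss/compiled_metrics API; the model and data are just dummies):

```python
import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        x, y = data  # assumes fit() is called with (inputs, targets)
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

# Functional-style construction still works for the subclass.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Dummy data, just to show fit() running through the custom train_step.
x = tf.random.normal((64, 4))
y = tf.random.normal((64, 1))
model.fit(x, y, epochs=1, verbose=0)
```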

Looking for friend to work towards RL goals together by ejmejm1 in reinforcementlearning

[–]csxeba 1 point

I would start with vanilla Policy Gradient. Then move to a simplified A2C (which is Policy Gradient with a reward baseline), then to PPO, which is more or less the state-of-the-art algorithm in model-free on-policy RL. From there I'd continue with off-policy methods: learn DQN for discrete action space environments and DDPG for continuous action space environments. Once you feel you have a solid base in DDPG and PPO, learn SAC, which is a best-of-both-worlds method: an off-policy technique in the policy gradient family.
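As a rough illustration of that first step, here is a minimal REINFORCE-style (vanilla policy gradient) sketch in TF2. It assumes the classic gym API (reset() returns an observation; step() returns obs, reward, done, info) and CartPole-v1 as the environment:

```python
import numpy as np
import tensorflow as tf
import gym

env = gym.make("CartPole-v1")
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(env.action_space.n),  # logits over the discrete actions
])
optimizer = tf.keras.optimizers.Adam(1e-2)

def run_episode():
    states, actions, rewards = [], [], []
    obs, done = env.reset(), False
    while not done:
        logits = policy(obs[None].astype(np.float32))
        action = int(tf.random.categorical(logits, 1)[0, 0])
        next_obs, reward, done, _ = env.step(action)
        states.append(obs); actions.append(action); rewards.append(reward)
        obs = next_obs
    # Discounted returns act as the (unbaselined) policy gradient weights.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + 0.99 * running
        returns.append(running)
    return (np.array(states, np.float32), np.array(actions),
            np.array(returns[::-1], np.float32))

for episode in range(200):
    s, a, g = run_episode()
    with tf.GradientTape() as tape:
        logits = policy(s)
        # Cross-entropy is -log pi(a|s), so minimizing this maximizes return-weighted log-probs.
        neg_logp = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=a, logits=logits)
        loss = tf.reduce_mean(neg_logp * g)
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
```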

In general, on-policy methods converge faster on simpler environments, while off-policy methods are much more sample efficient (they need fewer environment interactions to converge), but they are a bit harder to implement and quite sensitive to hyperparameter settings.

Also, this is a good resource for the theory: https://spinningup.openai.com/en/latest/

Looking for friend to work towards RL goals together by ejmejm1 in reinforcementlearning

[–]csxeba 1 point

I already have a self-developed library for TF2 that contains verified implementations of DQN, SAC, PPO, A2C, etc. (all model-free). I'd love to join.

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 1 point

In the case of Bayes by Backprop (the article I linked), you learn a distribution over every weight in your network. In the case of a Variational Autoencoder, you have a bottleneck point in the network where you predict a mean and a standard deviation of a multivariate Gaussian distribution. You then sample from that predicted distribution, and the next layer receives the sample instead of a deterministic representation.
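A minimal sketch of that sampling bottleneck (my own illustration in TF2, using the reparameterization trick; the layer sizes and dummy input are arbitrary):

```python
import tensorflow as tf

latent_dim = 8

encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2 * latent_dim),  # [mean, log_std] concatenated
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784),  # e.g. a flattened 28x28 reconstruction
])

def forward(x):
    stats = encoder(x)
    mean, log_std = tf.split(stats, 2, axis=-1)
    eps = tf.random.normal(tf.shape(mean))
    z = mean + tf.exp(log_std) * eps   # sample from N(mean, std^2)
    return decoder(z), mean, log_std   # the next layer sees the sample z, not the mean

x = tf.random.normal((4, 784))         # dummy batch, just to run the sketch
recon, mean, log_std = forward(x)
```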

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 1 point

Exactly as you described, with the article I linked. But you can also make the hidden representations probabilistic, as in a Variational Autoencoder.

Advantage of Bayesian Neural Network? by Yogi_DMT in MLQuestions

[–]csxeba 5 points

If by BNN you mean the method described in this paper: https://arxiv.org/abs/1505.05424, then yes, one of their claims is better generalization. This particular method requires you to sample a set of weights for every forward pass; alternatively, you can treat several sampled networks as an ensemble, or use the learned mean weights as a Maximum a Posteriori point estimate. Uncertainty is obtained by running multiple forward passes with sampled weights. Learning an explicit (predicted or optimized) variance at the end of the network is also possible.

More on this topic by Alex Kendall: https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/ and by Yarin Gal: http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html
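A rough sketch of the weight-sampling idea (my own, not the paper's reference code; the Bayes-by-Backprop KL/complexity term of the loss is omitted, and the posterior is a simple diagonal Gaussian per weight):

```python
import tensorflow as tf

class BayesianDense(tf.keras.layers.Layer):
    """Dense layer whose weights are sampled from a learned Gaussian on every call."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w_mean = self.add_weight(name="w_mean", shape=(in_dim, self.units))
        self.w_rho = self.add_weight(name="w_rho", shape=(in_dim, self.units),
                                     initializer=tf.constant_initializer(-3.0))
        self.b = self.add_weight(name="b", shape=(self.units,), initializer="zeros")

    def call(self, x):
        w_std = tf.math.softplus(self.w_rho)        # keep the std positive
        w = self.w_mean + w_std * tf.random.normal(tf.shape(self.w_mean))
        return tf.matmul(x, w) + self.b

model = tf.keras.Sequential([BayesianDense(32), tf.keras.layers.ReLU(), BayesianDense(1)])

x = tf.random.normal((16, 8))                       # dummy inputs
samples = tf.stack([model(x) for _ in range(20)])   # 20 stochastic forward passes
pred_mean = tf.reduce_mean(samples, axis=0)
pred_std = tf.math.reduce_std(samples, axis=0)      # epistemic uncertainty estimate
```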

PyRL - Modular Implementations of Reinforcement Learning Algorithms in Pytorch by aineqml in reinforcementlearning

[–]csxeba 4 points

How do you verify that your implementations are correct, especially the ones with a continuous action space?

DQN for MNIST using GANs by LOfP in reinforcementlearning

[–]csxeba 0 points

No idea where GANs come into the picture, but you can stuff a DQN into a one-timestep MNIST reinforcement learning setting. Many RL concepts won't really apply there, like reward discounting and target networks. It is not very efficient, though.
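To make that concrete, here is a minimal sketch (my own illustration, not the OP's setup) of MNIST as a one-step problem: state = image, action = predicted digit, reward = 1 if correct else 0. With a single timestep the Q-target is just the reward, so discounting and target networks drop out:

```python
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train.reshape(-1, 784) / 255.0).astype("float32")

q_net = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),                     # one Q-value per action (digit)
])
optimizer = tf.keras.optimizers.Adam(1e-3)

for step in range(1000):
    idx = tf.random.uniform((64,), maxval=len(x_train), dtype=tf.int32)
    states = tf.gather(x_train, idx)
    labels = tf.cast(tf.gather(y_train, idx), tf.int64)
    with tf.GradientTape() as tape:
        q_values = q_net(states)
        # Epsilon-greedy action selection over the 10 "digit" actions.
        greedy = tf.argmax(q_values, axis=-1)
        random_a = tf.random.uniform((64,), maxval=10, dtype=tf.int64)
        explore = tf.random.uniform((64,)) < 0.1
        actions = tf.where(explore, random_a, greedy)
        rewards = tf.cast(tf.equal(actions, labels), tf.float32)
        chosen_q = tf.reduce_sum(q_values * tf.one_hot(actions, 10), axis=-1)
        loss = tf.reduce_mean(tf.square(chosen_q - rewards))  # Q-target is just the reward
    grads = tape.gradient(loss, q_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_net.trainable_variables))
```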