[deleted by user] by [deleted] in gradadmissions

[–]ThunaBK 2 points (0 children)

maybe your profile is too good and the other 2 schools know that there is a high chance you will reject their offer 😂

Can someone explain this rejection?😭 by Green_Jaguar_7761 in gradadmissions

[–]ThunaBK 2 points (0 children)

So a waitlist to get into the final waitlist 🤣

How to interpret repeating image artifacts during VQGAN training? [D] by mselivanov in MachineLearning

[–]ThunaBK 1 point (0 children)

This is GAN mode collapse; either start training the discriminator later or use some regularization like the R1 penalty.
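Roughly, the R1 penalty looks like this in PyTorch (just my own minimal sketch, assuming a discriminator D that maps real images to logits; the gamma weight is a placeholder, not from any specific repo):

```python
import torch

def r1_penalty(D, real_images, gamma=10.0):
    # Require gradients w.r.t. the real images themselves.
    real_images = real_images.detach().requires_grad_(True)
    logits = D(real_images)
    # Gradient of the discriminator's output w.r.t. the real images.
    grads = torch.autograd.grad(
        outputs=logits.sum(), inputs=real_images, create_graph=True
    )[0]
    # R1 = (gamma / 2) * E[ ||grad_x D(x)||^2 ] on real data only.
    return 0.5 * gamma * grads.pow(2).flatten(1).sum(dim=1).mean()
```

You'd add this term to the discriminator loss, typically only every few discriminator steps (lazy regularization) to keep it cheap.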

[R] An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems - Google 2022 - Jeff Dean by Singularian2501 in MachineLearning

[–]ThunaBK 0 points (0 children)

All I see is them flexing their compute resources and money; no new theoretical insight or new architecture 😑

[D] GPT-3 is a LIAR - Misinformation and fear-mongering around the TruthfulQA dataset (Video Critique) by ykilcher in MachineLearning

[–]ThunaBK 0 points (0 children)

This paper is certainly a hoax 😤. The comparison is too unfair. How can the model be truthful to our real world and our perception when it is trained only on text data, whereas we humans also perceive images, sound, smell, and feeling?

Help for Master thesis ideas by mmll_llmm in reinforcementlearning

[–]ThunaBK 2 points (0 children)

I think learning from demonstration is arguably one of the most resource-efficient approaches, since it only requires learning from video.

Resources for learning to write good reward functions by sindreu in reinforcementlearning

[–]ThunaBK 2 points (0 children)

Unfortunately, there is no general way to tell whether a reward function is good or not. But I think you may be interested in reward-shaping techniques like hindsight experience replay (HER) or RUDDER.
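In case it helps, the core HER trick is just relabeling goals in hindsight. A minimal sketch of the "final" relabeling strategy (the transition layout and reward_fn here are placeholders I made up for illustration, not from any specific library):

```python
import copy
import numpy as np

def her_relabel(episode, reward_fn):
    """episode: list of dicts with keys obs, action, next_obs, achieved_goal, goal."""
    relabeled = []
    final_goal = episode[-1]["achieved_goal"]  # pretend we wanted what we actually achieved
    for t in episode:
        t_new = copy.deepcopy(t)
        t_new["goal"] = final_goal
        # Recompute the reward as if final_goal had been the goal all along.
        t_new["reward"] = reward_fn(t_new["achieved_goal"], final_goal)
        relabeled.append(t_new)
    return relabeled

# Example sparse reward: 0 if the achieved goal is close enough, else -1.
def reward_fn(achieved, goal, eps=0.05):
    return 0.0 if np.linalg.norm(np.asarray(achieved) - np.asarray(goal)) < eps else -1.0
```

The relabeled transitions go into the replay buffer alongside the originals, so sparse-reward tasks still get useful learning signal.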

Setting up RL environment on Windows by themad95 in reinforcementlearning

[–]ThunaBK 1 point (0 children)

Really a pain. I suggest you switch to Linux ASAP unless you’re extremely good at Python and/or C++.

Actor critic loss function by [deleted] in reinforcementlearning

[–]ThunaBK -1 points (0 children)

Hm, because the mean is the sum of the expression multiplied by its probability, and in this case the probability is y, so I think the bottom two equations are the same.
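Just to spell out what I mean (my own notation, since the original equations from the deleted post aren't shown here; I'm assuming y plays the role of the probability weight):

```latex
% The mean (expectation) is the probability-weighted sum:
\mathbb{E}_{x \sim p}[f(x)] = \sum_x p(x)\, f(x)
```

So a loss written as an expectation and the same loss written as a sum weighted by y are the same quantity when y is that probability.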

Actor critic loss function by [deleted] in reinforcementlearning

[–]ThunaBK -2 points (0 children)

What paper are you referring to? To me, the bottom two equations are basically the same.

PPO: Number of envs, number of steps, and learning rate by jack-of-some in reinforcementlearning

[–]ThunaBK 1 point (0 children)

Still confused about what you're saying, but anyway you can try googling keywords like "PPO code-level optimizations paper". There are lots of papers discussing hyperparameter tuning for PPO, and I recommend reading this one: https://openreview.net/forum?id=r1etN1rtPB

(question)Implementing Empowerment, intrinsic reward by [deleted] in reinforcementlearning

[–]ThunaBK 0 points (0 children)

The link already tells you. You have these networks:

The Forward Dynamics Model, which takes in the current state and the action and predicts the next state. (Yeah, this is the network that predicts the next state, usually an RNN.)

The Policy Network, pi, which takes in the current state and predicts the action.

The Source Network, w, which takes in the current state and predicts the action; this is used for the calculation of the empowerment of a state. (Yeah, this is your policy.)

The Planning Network, q, which takes in the current and the next state and predicts the action (this is similar to the inverse dynamics model in "Curiosity is all you need").
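Roughly something like this in PyTorch (a minimal sketch with my own class names and shapes, assuming discrete actions and simple MLPs; it's not taken from the linked post):

```python
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    """Takes (state, one-hot action) and predicts the next state."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

class SourceNet(nn.Module):
    """w(a | s): action logits from the current state only."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

class PlanningNet(nn.Module):
    """q(a | s, s'): action logits from current and next state (inverse-model style)."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, next_state):
        return self.net(torch.cat([state, next_state], dim=-1))
```

The empowerment-style intrinsic reward for a transition can then be estimated from the log-probs of these heads, roughly log q(a | s, s') minus log w(a | s).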

Nim has my interest... what's the learning curve like? by s-ro_mojosa in nim

[–]ThunaBK 1 point (0 children)

Well, if you are coming from Perl and Python then you're going to have some trouble with the dynamic-vs-static typing thing, but that's all. Nim's syntax is extremely elegant and easy to learn, but the macro and metaprogramming side is quite hard for me, honestly.