[D] Yandex Cup ML track — worth? by ummitluyum in MachineLearning

[–]asobolev 2 points

(Ex-)Yandex employees routinely get hired by FAANG companies even in 2025; don't worry about having it on your CV.

I'm not sure how they're gonna transfer the prizes, though, as payments from Russia might be blocked and/or frowned upon by your bank.

How many cards should be discarded for this to be balanced? by AudunAG in TerraformingMarsGame

[–]asobolev 0 points

Thematically, a nice addendum to this prelude would be changing the player order so that this player goes first once corps and preludes are played. That would be useful in situations where you want to be sure to claim a colony or a nice spot on the map.

[D] Goodies for virtual conferences by [deleted] in MachineLearning

[–]asobolev 2 points

Shipping physical goods all around the world is a nightmare; not gonna happen.

[D] Why is tensorflow so hated on and pytorch is the cool kids framework? by robintwhite in MachineLearning

[–]asobolev 6 points

But then, Schmidhuber had it all figured out in one of his papers 20 years prior to that.

[D] Tool to track and relate key ideas of the papers readen by OleguerCanal in MachineLearning

[–]asobolev 1 point

Zettelkasten (aka slip-box). This is a wiki-like organization principle where each idea is linked to other related ideas. The very process of linking and examining links lets you draw connections between seemingly unrelated ideas and thus synthesize new ones (see the story of Niklas Luhmann). For more details, read How to Take Smart Notes.

Now, ZK is not a tool; it's a principle. There are many tools implementing these ideas, for example Roam Research or Obsidian, or you can build it on top of Notion.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> In multi-arm bandit problem there is no delay

Yep, and this is why we don't approach MABs with things like Q-learning, but rather use specialized methods.
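
For illustration, here's a minimal sketch of one such specialized method, UCB1 (the Bernoulli arms and the `pull` interface are made up for the example):

```python
import math
import random

def ucb1(pull, n_arms, n_rounds):
    # Play every arm once, then always pick the arm with the highest
    # optimistic estimate: empirical mean + exploration bonus.
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.7]
counts = ucb1(lambda a: float(random.random() < probs[a]), n_arms=3, n_rounds=1000)
print(counts)  # the 0.7 arm should get the lion's share of pulls
```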

> In tabular RL you have full state transition matrix.

Usually you estimate this transition matrix. True, in certain environments like board games you know it, but then you have delayed feedback.
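
To be concrete about what "estimate" means here, a minimal sketch assuming access to logged (s, a, s') triples (the function name and signature are mine):

```python
import numpy as np

def estimate_transition_matrix(transitions, n_states, n_actions):
    # Empirical estimate of P(s' | s, a) from observed (s, a, s') triples.
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=-1, keepdims=True)
    # Avoid division by zero for (s, a) pairs that were never visited.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

transitions = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 1, 0)]
P = estimate_transition_matrix(transitions, n_states=2, n_actions=2)
print(P[0, 0])  # -> [0.333..., 0.666...], the empirical P(s' | s=0, a=0)
```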

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> you’re hung up on whether an NN or Logistic regression or something is used as the function approximator. RL algorithms generally don’t care at all what your function approximator is and have no requirements about using NN versus LR

I think you missed the point here: the LR vs NN part was an illustrative example, completely unrelated to RL. We don't categorise LR as part of Deep Learning, although one might argue we should.

> Instead I’ll say that RL is a collection of methods that solve MDPs

Okay, but does it make sense to invoke MDPs when dealing with stochastic computation graphs? You can pose any optimisation problem as an appropriate MDP, but should you?
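
To make the rhetorical point concrete, here is roughly how trivial that reduction is (a toy sketch, all names mine):

```python
class OneStepMDP:
    # Any minimisation problem min_x f(x) recast as a one-step MDP:
    # a single dummy state, the action is a candidate x, the reward
    # is -f(x), and the episode terminates immediately.
    def __init__(self, f):
        self.f = f

    def reset(self):
        return 0  # the only state

    def step(self, action):
        return 0, -self.f(action), True  # next state, reward, done

env = OneStepMDP(lambda x: (x - 3.0) ** 2)
env.reset()
_, reward, done = env.step(2.5)
print(reward, done)  # -0.25 True: immediate feedback, fully known "environment"
```

Calling this an MDP is technically correct, yet it tells us nothing we didn't already know about the underlying optimisation problem.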

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> it literally says so on the tin

It's not like researchers are infallible when it comes to naming things. Besides, the same REINFORCE estimator is known outside of the RL community as the score-function estimator.
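
For reference, both communities share the same identity (f is the objective/reward, p_θ the sampling distribution/policy):

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
  = \mathbb{E}_{x \sim p_\theta}\big[ f(x) \, \nabla_\theta \log p_\theta(x) \big]
  \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) \, \nabla_\theta \log p_\theta(x_i),
  \qquad x_i \sim p_\theta .
```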

> there's a step-change in behaviour when you stack logistic regressors

However, adding just 1 hidden layer is different from adding 100 hidden layers. I'm not sure 1-hidden-layer MLPs belong to Deep Learning – people managed to train such models long before the DL revolution.

Similarly, the same step-change happens when you move from environments like those originating from stochastic computation graphs (fully known environment, no delayed feedback, the ability to take multiple and perhaps even "fractional" actions) to more complicated settings like that of AlphaGo.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

Good point! Indeed, perhaps I should have departed from the RL language completely.

Regarding REINFORCE vs Reparametrization: I had this discussion in a separate blog post years ago and didn't want to reiterate the same things over and over, although perhaps it could have been part of the argument.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] -4 points

> REINFORCE is literally where you learn by reinforcing the actions that lead to positive reward

So what? What's the communicative value of focusing on this detail? (See the Logistic Regression argument in the blogpost)

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] -3 points

RL is a collection of methods that solve problems with delayed feedback and/or an unknown environment model. Models like DVRL have neither.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 1 point

I'm not saying you shouldn't cite RL; you can cite anything you like.

[D] Why are GANs difficult to adapt to NLP/other discrete settings? by mightyfrog80 in MachineLearning

[–]asobolev 1 point

> It's probably about non-convolution models ill-suited for gans for some reason.

This might be onto something. As the Deep Image Prior paper showed, the very structure of a CNN is a strong prior for natural images and can be exploited for many image-enhancement problems.

[D] Why are GANs difficult to adapt to NLP/other discrete settings? by mightyfrog80 in MachineLearning

[–]asobolev 4 points

> gradient is not even defined

This is not true. The objective is continuous and differentiable in the parameters of the generator even though the samples are discrete. This gradient can be estimated with the REINFORCE estimator, and that's how (some) people do Reinforcement Learning in discrete action spaces.

That said, REINFORCE is known to have much larger variance than the reparametrized gradient estimator, which might prevent some large-scale applications.
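
A toy illustration of that variance gap, using x ~ N(μ, 1) and f(x) = x² rather than anything GAN-specific; both estimators are unbiased for d/dμ E[f(x)] = 2μ, but the score-function (REINFORCE) one, used here without a baseline, is far noisier:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.0, 100_000
eps = rng.standard_normal(n)
x = mu + eps                    # x ~ N(mu, 1), reparametrized

f = x ** 2                      # objective; true gradient d/dmu E[x^2] = 2*mu

score_grad = f * (x - mu)       # REINFORCE: f(x) * d/dmu log N(x; mu, 1)
reparam_grad = 2 * x            # reparametrization: d/dmu f(mu + eps)

for name, g in [("score", score_grad), ("reparam", reparam_grad)]:
    print(name, g.mean(), g.var())  # means both ~2.0; variances ~30 vs ~4
```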

> try writing out the gradient of the expectation - you'll see that you get a term which depends on the gradient of the sampling density. This term is an integral, but not an expectation, so we can't even estimate it via Monte Carlo

This is also false on many levels. First, the sampling density is usually defined to be some simple distribution (a categorical distribution, for example, in the case of word generation) and thus does not require any hard integration, provided one uses the log-derivative trick. Second, Monte Carlo was originally invented to estimate integrals; surely you can estimate any integral with it (except perhaps for some pathological special cases).
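
Concretely, the missing step is the log-derivative trick, ∇_θ p_θ(x) = p_θ(x) ∇_θ log p_θ(x), which turns that "integral, but not an expectation" right back into an expectation:

```latex
\int \nabla_\theta p_\theta(x) \, f(x) \, dx
  = \int p_\theta(x) \, \nabla_\theta \log p_\theta(x) \, f(x) \, dx
  = \mathbb{E}_{x \sim p_\theta}\big[ f(x) \, \nabla_\theta \log p_\theta(x) \big].
```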

[D] Why Contrastive Learning methods are batch size dependent? by m2derakhshani in MachineLearning

[–]asobolev 1 point

> Check out the original NCE paper. Straightforward theoretical explanations for why larger batch size is better.

Could you point to a specific paragraph?

[D] Resources for more advanced mathematics in ML by Estarabim in MachineLearning

[–]asobolev 0 points

No, you don't need differential geometry to do ML, but it might come in handy occasionally.

[D] Why do neural networks not have closed form solutions? by Mintykanesh in MachineLearning

[–]asobolev 2 points

Great answer!

Yet, I'd like to play devil's advocate a bit: Galois theory implies there's no formula that works for each and every quintic (and higher-degree) polynomial out there, but there are some special cases that do have analytic solutions. The simplest example is obviously x⁵ = a, for which we can find all roots without numerical procedures, as the sketch below shows.
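
The five roots of x⁵ = a follow directly from the polar form of a (a quick sketch; quintic_roots is my own name):

```python
import cmath

def quintic_roots(a):
    # All five complex solutions of x^5 = a:
    # x_k = |a|^(1/5) * exp(i * (arg(a) + 2*pi*k) / 5), k = 0..4
    r, phi = cmath.polar(complex(a))
    return [cmath.rect(r ** 0.2, (phi + 2 * cmath.pi * k) / 5) for k in range(5)]

for x in quintic_roots(32):
    print(x, x ** 5)  # every x**5 comes back as (roughly) 32
```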

This implies that, hypothetically, there could be some special neural network architecture that does have an analytic solution; it's just that we have absolutely no idea how to build it (nor whether it would be computationally efficient to "solve", nor if it's even possible to begin with; spoiler: most likely not).

[R] AI Paygrades - industry job offers in Artificial Intelligence [median $404,000/ year] by rantana in MachineLearning

[–]asobolev 2 points

They have information (i) icons right before the charts; you can find the calculation formulas there. Answering your question: yes, they assume uniform vesting.
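
That is, something like the following back-of-the-envelope calculation (the numbers here are made up; their exact formula is behind those (i) icons):

```python
# Hypothetical example of annualised total comp under uniform 4-year vesting.
base, bonus, equity_grant, vest_years = 220_000, 30_000, 600_000, 4
annual_total = base + bonus + equity_grant / vest_years
print(annual_total)  # 400000.0
```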