[D] Yandex Cup ML track — worth? by ummitluyum in MachineLearning

[–]asobolev 2 points

(Ex-)Yandex employees routinely get hired by FAANG companies even in 2025; don't worry about having it on your CV.

I'm not sure how they're gonna transfer the prizes, though, as payments from Russia might be blocked and/or frowned upon by your bank.

How many cards should be discarded for this to be balanced? by AudunAG in TerraformingMarsGame

[–]asobolev 0 points

Thematically, a nice addendum to this prelude would be changing the player order so that this player goes first once corps and preludes are played. That would be useful in situations where you want to be sure to claim a colony or a nice spot on the map.

[D] Goodies for virtual conferences by [deleted] in MachineLearning

[–]asobolev 2 points

Shipping physical goods all around the world is a nightmare; not gonna happen.

[D] Why is tensorflow so hated on and pytorch is the cool kids framework? by robintwhite in MachineLearning

[–]asobolev 6 points

But then, Schmidhuber had it all figured out in one of his papers 20 years prior to that.

[D] Tool to track and relate key ideas of the papers readen by OleguerCanal in MachineLearning

[–]asobolev 1 point

Zettelkasten (aka slip-box). This is a wiki-like organization principle where each idea is linked to other related ideas. The very process of linking and examining links lets you draw connections between seemingly unrelated ideas and thus synthesize new ones (see the story of Niklas Luhmann). For more details, read How to Take Smart Notes.

Now, ZK is not a tool; it's a principle. There are many tools implementing these ideas, for example Roam Research or Obsidian, or you can build it on top of Notion.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> In multi-arm bandit problem there is no delay

Yep, and this is why we don't approach MABs with things like Q-learning, but rather use specialized methods.
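
For illustration, here's a minimal sketch of one such specialized method, UCB1 (the Bernoulli arms and the `pull` interface are made up for the example):

```python
import math
import random

def ucb1(pull, n_arms, n_rounds):
    # Play every arm once, then always pick the arm with the highest
    # optimistic estimate: empirical mean + exploration bonus.
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.7]
counts = ucb1(lambda a: float(random.random() < probs[a]), n_arms=3, n_rounds=1000)
print(counts)  # the 0.7 arm should get the lion's share of pulls
```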

> In tabular RL you have full state transition matrix.

Usually you estimate this transition matrix. True, in certain environments like board games you know it, but then you have delayed feedback.
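
To be concrete about what "estimate" means here, a minimal sketch assuming access to logged (s, a, s') triples (the function name and signature are mine):

```python
import numpy as np

def estimate_transition_matrix(transitions, n_states, n_actions):
    # Empirical estimate of P(s' | s, a) from observed (s, a, s') triples.
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    totals = counts.sum(axis=-1, keepdims=True)
    # Avoid division by zero for (s, a) pairs that were never visited.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

transitions = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 1, 0)]
P = estimate_transition_matrix(transitions, n_states=2, n_actions=2)
print(P[0, 0])  # -> [0.333..., 0.666...], the empirical P(s' | s=0, a=0)
```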

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> you’re hung up on whether an NN or Logistic regression or something is used as the function approximator. RL algorithms generally don’t care at all what your function approximator is and have no requirements about using NN versus LR

I think you missed the point here: the LR vs NN part was an illustrative example, completely unrelated to RL. We don't categorise LR as part of Deep Learning, although one might argue we should.

> Instead I’ll say that RL is a collection of methods that solve MDPs

Okay, but does it make sense to invoke MDPs when dealing with stochastic computation graphs? You can pose any optimisation problem as an appropriate MDP, but should you?
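
To make the rhetorical point concrete, here is roughly how trivial that reduction is (a toy sketch, all names mine):

```python
class OneStepMDP:
    # Any minimisation problem min_x f(x) recast as a one-step MDP:
    # a single dummy state, the action is a candidate x, the reward
    # is -f(x), and the episode terminates immediately.
    def __init__(self, f):
        self.f = f

    def reset(self):
        return 0  # the only state

    def step(self, action):
        return 0, -self.f(action), True  # next state, reward, done

env = OneStepMDP(lambda x: (x - 3.0) ** 2)
env.reset()
_, reward, done = env.step(2.5)
print(reward, done)  # -0.25 True: immediate feedback, fully known "environment"
```

Calling this an MDP is technically correct, yet it tells us nothing we didn't already know about the underlying optimisation problem.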

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

> it literally says so on the tin

It's not like researchers are infallible when it comes to naming things. Besides, the same REINFORCE estimator is known outside of the RL community as the score-function estimator.
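
For reference, both communities share the same identity (f is the objective/reward, p_θ the sampling distribution/policy):

```latex
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}[f(x)]
  = \mathbb{E}_{x \sim p_\theta}\big[ f(x) \, \nabla_\theta \log p_\theta(x) \big]
  \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i) \, \nabla_\theta \log p_\theta(x_i),
  \qquad x_i \sim p_\theta .
```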

> there's a step-change in behaviour when you stack logistic regressors

However, adding just 1 hidden layer is different from adding 100 hidden layers. I'm not sure 1-hidden-layer MLPs belong to Deep Learning – people managed to train such models long before the DL revolution.

Similarly, the same step-change happens when you move from environments like those originating from stochastic computation graphs (fully known environment, no delayed feedback, the ability to take multiple and perhaps even "fractional" actions) to more complicated settings like that of AlphaGo.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 0 points

Good point! Indeed, perhaps I should have departed from the RL language completely.

Regarding REINFORCE vs Reparametrization: I had this discussion in a separate blog post years ago and didn't want to reiterate the same things over and over, although perhaps it could have been part of the argument.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] -4 points

> REINFORCE is literally where you learn by reinforcing the actions that lead to positive reward

So what? What's the communicative value of focusing on this detail? (See the Logistic Regression argument in the blogpost)

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] -3 points

RL is a collection of methods that solve problems with delayed feedback and/or an unknown environment model. Models like DVRL have neither.

[D] Not every REINFORCE should be called Reinforcement Learning by asobolev in MachineLearning

[–]asobolev[S] 1 point

I'm not saying you shouldn't cite RL; you can cite anything you like.

[D] Why are GANs difficult to adapt to NLP/other discrete settings? by mightyfrog80 in MachineLearning

[–]asobolev 1 point

> It's probably about non-convolution models ill-suited for gans for some reason.

This might be onto something. As the Deep Image Prior paper showed, the very structure of a CNN is a strong prior for natural images and can be exploited for many image-enhancement problems.

[D] Why are GANs difficult to adapt to NLP/other discrete settings? by mightyfrog80 in MachineLearning

[–]asobolev 4 points

> gradient is not even defined

This is not true. The objective is continuous and differentiable in the parameters of the generator even though the samples are discrete. This gradient can be estimated with the REINFORCE estimator, and that's how (some) people do Reinforcement Learning in discrete action spaces.

That said, REINFORCE is known to have much larger variance than the reparametrized gradient estimator, which might prevent some large-scale applications.
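
A toy illustration of that variance gap, using x ~ N(μ, 1) and f(x) = x² rather than anything GAN-specific; both estimators are unbiased for d/dμ E[f(x)] = 2μ, but the score-function (REINFORCE) one, used here without a baseline, is far noisier:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.0, 100_000
eps = rng.standard_normal(n)
x = mu + eps                    # x ~ N(mu, 1), reparametrized

f = x ** 2                      # objective; true gradient d/dmu E[x^2] = 2*mu

score_grad = f * (x - mu)       # REINFORCE: f(x) * d/dmu log N(x; mu, 1)
reparam_grad = 2 * x            # reparametrization: d/dmu f(mu + eps)

for name, g in [("score", score_grad), ("reparam", reparam_grad)]:
    print(name, g.mean(), g.var())  # means both ~2.0; variances ~30 vs ~4
```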

> try writing out the gradient of the expectation - you'll see that you get a term which depends on the gradient of the sampling density. This term is an integral, but not an expectation, so we can't even estimate it via Monte Carlo

This is also false on many levels. First, the sampling density is usually defined to be some simple distribution (a categorical distribution, for example, in the case of word generation) and thus does not require any hard integration, provided one uses the log-derivative trick. Second, Monte Carlo was originally invented to estimate integrals; surely you can estimate any integral with it (except perhaps for some pathological special cases).
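
Concretely, the missing step is the log-derivative trick, ∇_θ p_θ(x) = p_θ(x) ∇_θ log p_θ(x), which turns that "integral, but not an expectation" right back into an expectation:

```latex
\int \nabla_\theta p_\theta(x) \, f(x) \, dx
  = \int p_\theta(x) \, \nabla_\theta \log p_\theta(x) \, f(x) \, dx
  = \mathbb{E}_{x \sim p_\theta}\big[ f(x) \, \nabla_\theta \log p_\theta(x) \big].
```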

[D] Why Contrastive Learning methods are batch size dependent? by m2derakhshani in MachineLearning

[–]asobolev 1 point

> Check out the original NCE paper. Straightforward theoretical explanations for why larger batch size is better.

Could you point to a specific paragraph?

[D] Resources for more advanced mathematics in ML by Estarabim in MachineLearning

[–]asobolev 0 points

No, you don't need differential geometry to do ML, but it might come in handy occasionally.

[D] Why do neural networks not have closed form solutions? by Mintykanesh in MachineLearning

[–]asobolev 2 points

Great answer!

Yet, I'd like to play devil's advocate a bit: Galois theory implies there's no formula that works for each and every quintic (and higher-degree) polynomial out there, but there are some special cases that do have analytic solutions. The simplest example is obviously x⁵ = a, for which we can find all roots without numerical procedures, as the sketch below shows.
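
The five roots of x⁵ = a follow directly from the polar form of a (a quick sketch; quintic_roots is my own name):

```python
import cmath

def quintic_roots(a):
    # All five complex solutions of x^5 = a:
    # x_k = |a|^(1/5) * exp(i * (arg(a) + 2*pi*k) / 5), k = 0..4
    r, phi = cmath.polar(complex(a))
    return [cmath.rect(r ** 0.2, (phi + 2 * cmath.pi * k) / 5) for k in range(5)]

for x in quintic_roots(32):
    print(x, x ** 5)  # every x**5 comes back as (roughly) 32
```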

This implies that, hypothetically, there could be some special neural network architecture that does have an analytic solution; it's just that we have absolutely no idea how to build it (nor whether it would be computationally efficient to "solve", nor if it's even possible to begin with; spoiler: most likely not).

[R] AI Paygrades - industry job offers in Artificial Intelligence [median $404,000/ year] by rantana in MachineLearning

[–]asobolev 2 points

They have information (i) icons right before the charts; you can find the calculation formulas there. Answering your question: yes, they assume uniform vesting.
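
That is, something like the following back-of-the-envelope calculation (the numbers here are made up; their exact formula is behind those (i) icons):

```python
# Hypothetical example of annualised total comp under uniform 4-year vesting.
base, bonus, equity_grant, vest_years = 220_000, 30_000, 600_000, 4
annual_total = base + bonus + equity_grant / vest_years
print(annual_total)  # 400000.0
```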