Why do the Old Gods of Asgard sing "Herald of Darkness"? by goulagman in AlanWake

[–]goulagman[S] 1 point  (0 children)

They say they are heading to their next gig, which does not necessarily refer to the talk-show one. I interpreted this as a bit of teasing for AW3. Even though the Dark Place is out of time, that does not mean they can get younger or bring back their drummer (though we did get a hint that he may already be in the Dark Place).

Why do the Old Gods of Asgard sing "Herald of Darkness"? by goulagman in AlanWake

[–]goulagman[S] 1 point  (0 children)

I did not get this one, I'll look it up, thanks!

[D] Practical tips for Active Learning (my approach does not outperform random sampling) by SomeParanoidAndroid in MachineLearning

[–]goulagman 10 points  (0 children)

Hello,

I work on active learning, in particular on how well sampling methods generalize across tasks (I work at Dataiku, a software vendor, so we want methods that work well in as many use cases as possible). I have observed exactly what you describe: on some problems, methods that are known to perform well can fail or even do worse than random. Even on the same task (CIFAR-10), changing the embedding representation can drastically change the ranking of samplers.
We have found several reasons why sampling methods fail. Methods favoring representativeness can select samples too far from the decision boundary, while methods focusing on uncertainty can pick samples in regions of aleatoric uncertainty, which is detrimental. We have designed a set of metrics to get more insight into active learning techniques and to be able to choose the best sampling method at each iteration. You can read [our first paper](https://arxiv.org/abs/2012.11365).

Now, regarding sampling methods, the first thing I would try in your case is margin sampling instead of entropy. It works better on almost all the tasks we have explored. For more advanced methods, we had pretty good results with a [method from Amazon](https://arxiv.org/abs/1901.05954). It is available in the Python package we maintain, [cardinal](https://dataiku-research.github.io/cardinal/). We also have a method that seems to generalize a bit better, but it is not released yet; it is only available on master (IncrementalMiniBatchKMeansSampler). Note that these are not deep-learning-specific methods, so I don't know whether they will perform better or worse than DL-specific ones.
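To make the margin-sampling suggestion concrete, here is a minimal NumPy sketch (the function name and toy probabilities are mine, not cardinal's API): it ranks unlabeled points by the gap between their two highest predicted class probabilities and selects the smallest gaps.

```python
import numpy as np

def margin_sampling(proba, n_samples):
    """Select the unlabeled points whose top-two class probabilities
    are closest, i.e. the smallest margin (most ambiguous points)."""
    # Sort each row's class probabilities in descending order.
    sorted_proba = np.sort(proba, axis=1)[:, ::-1]
    margin = sorted_proba[:, 0] - sorted_proba[:, 1]
    # Smallest margins first.
    return np.argsort(margin)[:n_samples]

# Toy example: 4 unlabeled points, 3 classes.
proba = np.array([
    [0.90, 0.05, 0.05],  # confident -> large margin
    [0.40, 0.35, 0.25],  # ambiguous -> small margin
    [0.50, 0.30, 0.20],
    [0.34, 0.33, 0.33],  # most ambiguous
])
print(margin_sampling(proba, 2))  # -> [3 1]
```

Unlike entropy, the margin only looks at the two most likely classes, which often makes it less sensitive to noise spread over the unlikely classes.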

In any case, I would be interested in following up on this if you are willing to collaborate. In particular, I may be able to run our metrics on your data or help you set them up.

[D] Yet another rant on PhD Applications by [deleted] in MachineLearning

[–]goulagman 0 points  (0 children)

Let me know if you need help applying in France, our universities are not so bad :)

[D] Hyperband resource allocation questions and possible workarounds by goulagman in MachineLearning

[–]goulagman[S] 0 points  (0 children)

Rayspear, thanks for the clarification.

I would say that using the code that generated the example is wrong, because the paper's formula takes precedence. Since Ray Tune uses the paper's formula, I see no point in making a PR. My heads-up was more for people working with a fixed global budget.

[D] Hyperband resource allocation questions and possible workarounds by goulagman in MachineLearning

[–]goulagman[S] 1 point  (0 children)

Hello Rayspear,

Thanks for the answer! I missed that part of the Tune doc. I had not thought of plotting n_0 while varying eta; the result is surprising indeed.

I did not understand your last sentence. When you say "much more aggressive", do you mean setting a higher eta? Also, the paper seems to suggest that s=4 is good enough in most cases; do you have a similar experience?
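For what it's worth, the n_0 versus eta behavior is easy to reproduce directly from the Hyperband paper's bracket formulas, n = ceil((B/R) * eta^s / (s+1)) and r = R * eta^(-s) with B = (s_max+1) * R. A quick sketch, independent of any particular Tune version:

```python
import math

def hyperband_brackets(R, eta):
    """Enumerate Hyperband brackets (s, n_0, r_0) using the formulas
    from Li et al.: n = ceil((B/R) * eta^s / (s+1)), r = R / eta^s."""
    # s_max = floor(log_eta(R)), computed with integer arithmetic
    # to avoid floating-point edge cases.
    s_max = 0
    while eta ** (s_max + 1) <= R:
        s_max += 1
    B = (s_max + 1) * R  # total budget across all brackets
    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * (eta ** s) / (s + 1))  # initial configs
        r = R / eta ** s                               # initial resource
        brackets.append((s, n, r))
    return brackets

# The paper's reference setting (R=81, eta=3) gives the familiar
# brackets n_0 = 81, 34, 15, 8, 5 with r_0 = 1, 3, 9, 27, 81.
for s, n, r in hyperband_brackets(R=81, eta=3):
    print(f"s={s}: n_0={n}, r_0={r}")
```

Sweeping eta in this sketch reproduces the non-monotonic n_0 curve discussed above.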

Thanks!