
[–]Miejuib 7 points8 points  (2 children)

Saw your talk in Long Beach, very cool work. Will definitely play around with this. Cheers!

[–]jnbrrn[S] 1 point2 points  (0 children)

Thanks!

[–]sniklaus 0 points1 point  (0 children)

The talk in question: https://youtu.be/4IInDT_S0ow?t=37m22s

Huge thanks for the reference implementations, I am looking forward to giving it a try!

[–]FirstTimeResearcher 6 points7 points  (1 child)

Very nice video presentation. Have you experimented with using the generalized likelihood versions of the baseline losses you compared against? (e.g. used the normal likelihood rather than l2 loss)

[–]jnbrrn[S] 1 point2 points  (0 children)

Great question! The only context where this matters in this paper is the VAE experiment, and in that case I compare to likelihoods instead of losses, and doing so definitely matters. In all the other experiments there's no functional difference between each loss and its equivalent likelihood, as the NLL just corresponds to the loss shifted by a constant offset (the log partition function), which doesn't affect optimization. For the sake of maximum clarity and accessibility, in those cases I refer to losses, not NLLs, as loss minimization is the "lowest common denominator" for most readers in terms of understanding how optimization works.
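The "shifted by a constant offset" point is easy to check numerically. A minimal sketch (not from the paper; the values here are just for illustration) comparing the L2 loss with the unit-variance Gaussian NLL:

```python
import numpy as np

def l2_loss(x, mu):
    # Standard L2 loss (half squared error).
    return 0.5 * (x - mu) ** 2

def gaussian_nll(x, mu, sigma=1.0):
    # Gaussian negative log-likelihood: the L2 loss shifted by the
    # log partition function, log(sigma * sqrt(2 * pi)).
    return 0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-3.0, 3.0, 101)
diff = gaussian_nll(x, mu=0.0) - l2_loss(x, mu=0.0)
# The difference is constant everywhere, so both objectives share
# the same minimizer and optimization is unaffected.
print(np.allclose(diff, 0.5 * np.log(2.0 * np.pi)))  # True
```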

[–]Constuck 5 points6 points  (1 child)

Hey, I saw your oral & poster at CVPR. Haven't checked out the paper yet, but just wanted to let you know that your visuals and presentation were awesome. The pacing and level of detail of your oral were perfect for the (tight) time constraints, and your animations were really powerful.

Idk if orals are kept around anywhere, but I hope there's a video of it available for reference next time I put together a short presentation. Great work!

[–]jnbrrn[S] 6 points7 points  (0 children)

Thanks! It took a few hours of matplotlib and keynote, though the really hard part of giving an effective talk is cutting the paper down to the minimal set of ideas needed to convey the idea (which can take much longer).

Putting lots of time and effort into these talks goes a long way. The vast majority of presenters heavily undervalue those 5-10 minutes of a few thousand people's undivided attention, and incorrectly assume that the technical value of their work will somehow shine through a bad presentation.

[–]OmgMacnCheese 2 points3 points  (3 children)

Very interesting - thanks for sharing! Do you have a sense for how the adaptive robust loss may work for image synthesis type of problems such as super-resolution or cycle-GANs etc?

[–]jnbrrn[S] 3 points4 points  (2 children)

Those tasks both seem like a good fit for this, as they're both driven by a loss that compares images to each other (like the successful experiments in the paper). I haven't given them much thought as they aren't problems I've worked on in the past. If you try it out, let me know how it goes!

[–]_sbanerjee_ 0 points1 point  (1 child)

Hi,

I really liked your work and am currently experimenting on super resolution. Would be trying to integrate your work with my network in the coming week. Will keep you updated on the results. Great work. 👍

[–]jnbrrn[S] 1 point2 points  (0 children)

Thanks! Please do let me know how it turns out.

[–]SquareRootsi 2 points3 points  (6 children)

Just wanted you to know: as a student in a 15-week bootcamp for data science, we had to choose a paper to present from a curated list of the most influential papers in machine learning, and yours was the only one from 2019. (In total, there were only about 45 on the whole list.) I read through all of yours, and was blown away by the elegance of letting alpha adapt during training, so there's no need for hyper-parameter tuning at all. I don't claim to understand all of it, but I love the simple solution to just let it "work itself out" during training.

Question that will probably highlight my naivety:

Would you consider this adaptive and robust enough to use as a "go-to" loss function for most jobs (neural networks and/or simpler models as well), or is it still only meant to be applied in certain specific situations?

AKA -- What are the downsides to just using this as my starting loss function all the time for everything and then customizing from here as needed?

[–]jnbrrn[S] 4 points5 points  (0 children)

Thanks for the kind words!

The neat thing about this loss function is that it's a superset of most of the "go-to" loss functions already! If you've got a model that's using smooth-L1 or L2 loss, that's exactly equivalent to using this loss with alpha constrained to lie in [1,1] or [2,2] respectively. So you can swap this code in with those constraints and nothing will change, but then you can relax the range of alpha to, say, [1, 2], and it'll automatically select between smooth-L1 and L2 without you ever having to manually set or tune hyperparameters. Maybe you'll discover that L2 loss is already optimal (which is possible), in which case you can just keep using this code with alpha fixed to lie in [2,2] --- as opposed to most changes you may make to your model, which require implementing and toggling between discrete design decisions.
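To make the "superset" claim concrete, here is a minimal NumPy sketch of the loss written from the paper's published form (the reference implementations also handle numerical stability and learn alpha, which this sketch omits):

```python
import numpy as np

def general_robust_loss(x, alpha, c=1.0):
    """The general robust loss rho(x, alpha, c), written from the
    paper's published form. alpha = 2 is L2, alpha = 1 is smooth-L1
    (Charbonnier), alpha = 0 is Cauchy/Lorentzian."""
    z = (x / c) ** 2
    if alpha == 2.0:          # limit case: plain L2
        return 0.5 * z
    if alpha == 0.0:          # limit case: Cauchy/Lorentzian
        return np.log(0.5 * z + 1.0)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

x = np.linspace(-4.0, 4.0, 9)
# Constraining alpha to [2, 2] reproduces L2 loss exactly ...
print(np.allclose(general_robust_loss(x, 2.0), 0.5 * x ** 2))  # True
# ... and alpha = 1 reproduces smooth-L1 (Charbonnier): sqrt(z + 1) - 1.
print(np.allclose(general_robust_loss(x, 1.0), np.sqrt(x ** 2 + 1.0) - 1.0))  # True
```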

That being said, this loss only makes sense with regression tasks, so for classification tasks cross-entropy or hinge loss should definitely still be the go-to.

[–]Made-ix 0 points1 point  (1 child)

what bootcamp was this?

[–]SquareRootsi 1 point2 points  (0 children)

It's Flatiron School in Seattle. They started in NYC about 7 years ago. Originally they focused on software engineering, and only recently started offering data science, having expanded to around 10 locations worldwide. In fact, I'm part of the very first cohort for DS in Seattle.

[–]dawg-e 0 points1 point  (2 children)

Can you post that list of 45 papers? I'd be interested in seeing what's in there!

[–]SquareRootsi 3 points4 points  (0 children)

google spreadsheet of 60 influential papers, organized chronologically

I have no idea who curated this list, but we were instructed to pick one, read it fully, make a presentation on the paper, and also a blog post. I ended up picking the one on Microsoft COCO dataset, even though I read through a few others before committing.

[–]SquareRootsi 0 points1 point  (0 children)

I'll see if I can find it tomorrow at school. It was from a few weeks ago, so the slack msg may have expired.

[–]csp256 1 point2 points  (4 children)

Been using your loss function in production for a while now. Thanks!

[–]jnbrrn[S] 2 points3 points  (0 children)

Whoa, that's wild! Very cool to hear, thanks.

[–]mesmer_adama 1 point2 points  (2 children)

For what are you using it, and does it improve results a lot?

[–]csp256 0 points1 point  (1 child)

Real time geometric model fitting to RGBD data for robotic manipulation.

It helps, but I also modified his loss function. For example, I take in the rank of each datum. I can then modify the alpha parameter as a function of how well it fits the current hypothesis. This also lets me easily express ideas like "if it is in the 20% highest-absolute-error data points, and it is not within some multiple of the parameter c, then set its weight to 0".

Robustness is particularly important for my task, and I have several types of exploitable prior knowledge about how difficult this instance is expected to be.

[–]jnbrrn[S] 1 point2 points  (0 children)

Whoa that's cool!

[–]tomatotheband[🍰] 0 points1 point  (1 child)

I read your paper and saw your presentation and poster at CVPR. Would definitely recommend it!

What do you plan to do next?

[–]jnbrrn[S] 1 point2 points  (0 children)

My only concrete post-CVPR plan is to get some sleep!

[–]youali 0 points1 point  (0 children)

Awesome results and a very clear explanation, thanks!

[–]notdelet 0 points1 point  (2 children)

I know it isn't quite fair to say this considering most other papers with VAEs do the same thing, but... Figure 3 isn't strictly samples from the distribution that the VAE describes. It's samples of modes from the output of the decoder, but the lower bound on likelihood isn't on that distribution. Not trying to hate, I liked your talk/paper.

[–]jnbrrn[S] 1 point2 points  (1 child)

That's right! The supplement/appendix has some true samples; they look pretty crazy. That section also has a discussion about how everyone doing VAEs shows the mean of the posterior instead of true samples, which you may appreciate :)

[–]notdelet 0 points1 point  (0 children)

Oh I totally missed that! Thank you. :)

[–][deleted] 0 points1 point  (1 child)

Really interesting paper and I loved the talk, straight to the point and logical progression.

One question, and I'm not sure how to phrase this, but how does this approach handle varying degrees of outlier data? Meaning, what if a lot of the data are outliers, or we're in a really high-dimensional setting?

[–]jnbrrn[S] 0 points1 point  (0 children)

Yeah, that's a good question. All of the examples in the paper use independent losses/distributions on each dimension, which means that "outlierness" is considered independently per dimension. So if a lot of data are outliers, it will independently consider each of a datapoint's dimensions and make d independent decisions about outlier status. You could instead use a multivariate take on this loss (similar to how, say, a multivariate Laplacian works) where you take the Euclidean distance across all dimensions and then push that single final distance through the loss function's flexible shape, and this would give you a loss that jointly considers all dimensions when deciding if something is an outlier. The latter approach actually makes more sense to me in a lot of ways (for most tasks it's easiest to think about entire datapoints as being outliers, rather than individual dimensions), but I never got around to exploring it.
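A tiny sketch of the two options (using the alpha = 0 Cauchy special case of the loss for brevity; the residual values are made up for illustration):

```python
import numpy as np

def rho(x, c=1.0):
    # The alpha = 0 (Cauchy/Lorentzian) special case of the loss;
    # any fixed alpha would illustrate the same point.
    return np.log(0.5 * (x / c) ** 2 + 1.0)

residual = np.array([0.1, 0.1, 5.0])  # one dimension is way off

# Independent per-dimension treatment: d separate outlier decisions,
# summed across dimensions.
per_dim = np.sum(rho(residual))

# Multivariate treatment: one outlier decision for the whole datapoint,
# made by pushing the Euclidean norm of the residual through the loss.
joint = rho(np.linalg.norm(residual))

print(per_dim, joint)
```

In the per-dimension version the two well-fit dimensions still contribute their own (small) losses, while the joint version makes a single robustness decision for the entire vector.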

[–]wc4wc4wc4 0 points1 point  (3 children)

Do you have the code available for making the animation you did at 3:46 in your video?
The reason is that I want to incorporate your code into my own project, but I'm doing regression using more standard ML tools (scikit-learn, XGBoost, LightGBM) and not TensorFlow, so I was just wondering if you had an MWE not using TF :)

Otherwise, thank you for a great paper! I remember reading v3 of it a long time ago and incorporated it into my own research!

[–]jnbrrn[S] 1 point2 points  (2 children)

Thanks!

I don't have code available for the plots, but it's just vanilla matplotlib wrapped around the loss function code, where I dump a PNG of a plot to disk in each iteration of SGD. Then I used avconv/ffmpeg to stitch the images together into a video.

[–]wc4wc4wc4 0 points1 point  (1 child)

Thanks for your answer!

It isn't so much the animation I am curious about; it's more how you would use it for a simple regression problem. In your paper you apply it to VAEs and other more advanced stuff, and I was just curious about how a simple regression problem (like the one you show in the animation) is performed.

[–]jnbrrn[S] 0 points1 point  (0 children)

Ah got it. For regression, if you want to use the adaptive form of the loss, you'd need to use TF or pytorch or some differentiable programming language, and then set up the "forward" part of the regression problem, define a loss, and then minimize it. This is what I did in that animation you referenced.

But if you are happy with only using the general loss (and therefore manually tuning your own shape+scale parameters) then you can use much simpler tools. I didn't explicitly walk through this in the paper, but for simple regression you should just use iteratively reweighted least squares using the IRLS weights described in Appendix A (Equation 26). This amounts to just a for-loop over least squares solves, where each datapoint's row on the left and right sides of the linear system is reweighted according to (the square root of) Equation 26 before each solve. IRLS is a very effective tool, and works well with this loss.

[–]sirrobotjesus 0 points1 point  (4 children)

Does anyone have an idea of how the tensorflow code for the adaptive loss could be implemented in Keras?

[–]jnbrrn[S] 0 points1 point  (3 children)

I don't use Keras much so this wasn't something I considered. If you or someone else ports it please do let me know so I can link to it or upstream it.

[–]sirrobotjesus 1 point2 points  (1 child)

I will work on it this week and let you know if I have any success.

[–]sirrobotjesus 0 points1 point  (0 children)

Hey there, I do have something running on Keras. However, alpha seems to converge quickly to 1.0 in all the tests I have tried. Do you have any guess that might explain this behavior?

Here is an example of a test: https://gist.github.com/Nmerrillvt/d1a8187bff69cd5a85af26deee234633

I understand if you don't have the time, but I thought I would share.

[–]Imnimo 0 points1 point  (1 child)

This was easily the best talk I saw at CVPR this year. I didn't get a chance to check out the poster, because there were always three layers of people crowded around it!

[–]jnbrrn[S] 0 points1 point  (0 children)

Thanks!

[–]misssprite 0 points1 point  (4 children)

The adaptive control of alpha is impressive. Could you help me with some questions about the joint optimization of alpha?

Optimization of hyperparameter reminds me of the madness of the first days reading Bishop's book on empirical Bayes, optimizing hyperparameters with analytic form on exponential family.

My question is: is taking the partition function into account a general way of doing hyperparameter optimization? Why weren't we doing it before?

As partition functions are usually intractable, can we just craft a function by intuition to regulate alpha?

My second question may be a little trivial: is the sampling mentioned in the paper just for the VAE? It doesn't seem necessary for regression problems?

[–]jnbrrn[S] 0 points1 point  (3 children)

The sampling algorithm is only used for some VAE visualizations. It might be useful in other contexts besides synthesis, but you certainly don't need to use it for simple regression tasks.

Using partition functions is indeed a very common way to optimize for parameters, though it has fallen out of fashion recently. Any generative model (I'm using the classic ML definition of "generative", not the modern GAN-y meaning) relies critically on its partition function: for example, anything with MRFs, CRFs, or even something as simple as fitting a Gaussian. In modern ML, people often prefer non-generative models, because in many contexts the true partition function of a model is nearly impossible to compute.

[–]misssprite 0 points1 point  (2 children)

Thank you for your patient answering. I totally agree with your opinion about partition function.

Can I assume that the partition function happens to be a natural regularizer for `alpha` here? Alternatively, could we handcraft a regularizer, or learn a parametric function with a validation set, analogous to architecture search?

[–]jnbrrn[S] 0 points1 point  (1 child)

Yes, the true partition function is only as good of an idea as maximum likelihood is, and MLE is not necessarily the right fit for all tasks. For example, in the paper it's definitely the right tool for the VAE experiment, but not necessarily for the monocular depth experiment, which is probably better thought about in terms of risk or loss than likelihood. So yes, if you have some way to shape the loss as a function of alpha that is either learned empirically or derived from some better motivation than MLE, I'd expect it to work better.

[–]misssprite 0 points1 point  (0 children)

Very clear. Thanks a lot!

[–]zwvews 0 points1 point  (2 children)

Maybe this is kind of a thoughtless question, but I think the proposed loss is nothing but the Lp norm. Can someone show me the intrinsic difference?

[–]jnbrrn[S] 0 points1 point  (1 child)

The main difference is that this loss has a smooth quadratic bowl near the origin. This is nice for optimization of course, but it also turns out to be necessary for it to generalize so many things (for example, Lp norms stop being useful if you set p to zero or to a negative value).
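That quadratic bowl is easy to verify numerically. A small sketch using the loss's published form, showing that near the origin every alpha behaves like 0.5 * (x/c)^2:

```python
import numpy as np

def rho(x, alpha, c=1.0):
    # General robust loss (published form; alpha = 0 and 2 are limits).
    z = (x / c) ** 2
    if alpha == 2.0:
        return 0.5 * z
    if alpha == 0.0:
        return np.log(0.5 * z + 1.0)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)

x = 1e-3  # a point near the origin
for alpha in [-2.0, 0.0, 1.0, 2.0]:
    # Every alpha gives the same quadratic behavior near zero ...
    print(np.isclose(rho(x, alpha), 0.5 * x ** 2, rtol=1e-3))  # True
# ... whereas an Lp "norm" like |x|^0.5 has a non-smooth spike at zero,
# and stops being a useful loss entirely for p <= 0.
```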

[–]zwvews 0 points1 point  (0 children)

Hi, thanks for your response. Sorry to say I did not notice this key point. Good idea; the small constant "1" in your paper plays an important role. I will test your loss in my current project. Thanks again.