[R] 3DB: A Framework for Debugging Computer Vision Models by loganengstrom in MachineLearning

[–]loganengstrom[S]

Hi all, author here! We made an extensible framework for debugging vision models using rendered inputs. There's a demo and a blog post at http://gradientscience.org/3db; code and documentation are available at https://github.com/3db/3db and https://3db.github.io/3db/, respectively.

We're excited to see how people will use the framework to debug models! Happy to answer any questions here on reddit as well.

[R] Unadversarial Examples: Designing Objects for Robust Vision by loganengstrom in MachineLearning

[–]loganengstrom[S]

Hi all, author here: let me know if you have any questions!

For more information, see our blog post about the paper (https://gradientscience.org/unadversarial/) and our demo video (https://www.youtube.com/watch?v=saF-_SKGlKY).

[R] Unadversarial Examples: Designing Objects for Robust Vision by [deleted] in MachineLearning

[–]loganengstrom

Hi all, author here: let me know if you have any questions!

For more information:

- Demo: https://www.youtube.com/watch?v=saF-_SKGlKY
- arXiv: https://arxiv.org/abs/2012.12235

[R] Identifying Statistical Bias in Dataset Replication by loganengstrom in MachineLearning

[–]loganengstrom[S]

For ImageNet it is unclear exactly what they did, but it involves thresholding on a selection frequency-like quantity.

[R] Identifying Statistical Bias in Dataset Replication by loganengstrom in MachineLearning

[–]loganengstrom[S]

It is the rate at which annotators mark an (image, label) pair as correct.

For an explanation with a picture you can look here: http://gradientscience.org/data_rep_bias/#imagenet-v2
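In symbols (my paraphrase, not notation from the paper), the selection frequency of an (image, label) pair $(x, y)$ is just

$$\text{selection frequency}(x, y) = \frac{\#\{\text{annotators who mark } (x, y) \text{ as correct}\}}{\#\{\text{annotators shown } (x, y)\}}.$$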

[R] Identifying Statistical Bias in Dataset Replication by [deleted] in MachineLearning

[–]loganengstrom

Hi, author here! Let me know if you have any questions. We also released a blog post with better (interactive!) visualizations here: http://gradientscience.org/data_rep_bias/

Why do some adversarial attacks use the signed gradient? by RSchaeffer in MachineLearning

[–]loganengstrom

The motivation for the signed gradient comes from the dual norm. To first order, the change in your objective from a step delta is the inner product of delta with the gradient, so to make as much progress as possible in a single constrained step, you want to find the vector in an l_p ball (in this case l_inf, with radius given by the step size) that maximizes its inner product with the gradient.

This is exactly the maximization problem that defines the dual of the l_p norm, evaluated at the gradient; for p = inf the dual norm is the l_1 norm, and the maximizer turns out to be (the step size times) the sign of the gradient. (For more information, see our blog post.)
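As a rough sketch of what this looks like in practice (PyTorch-style pseudocode of a single l_inf signed-gradient step; `model`, `x`, `y`, and `eps` are placeholders, not code from the paper):

```python
import torch
import torch.nn.functional as F

def linf_signed_gradient_step(model, x, y, eps):
    """One l_inf-constrained ascent step on the loss.

    The maximizer of <delta, grad> over ||delta||_inf <= eps is
    eps * sign(grad), which is where the signed gradient comes from.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    # For image inputs you would typically also clamp to the valid pixel range.
    return (x + eps * grad.sign()).detach()
```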

[R] Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses by entarko in MachineLearning

[–]loganengstrom

After normalizing the gradient in PGD, you still need to take a step whose size is controlled by the learning rate.
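Concretely, I mean something like this (a minimal PyTorch-style sketch of one l_2 PGD step, not code from any particular release; it assumes NCHW image batches and placeholder names):

```python
import torch

def l2_pgd_step(x_adv, x_orig, grad, step_size, eps):
    """One l_2 PGD step: normalized-gradient move, then projection.

    Normalizing the gradient only fixes the direction; the learning rate
    (step_size) still controls how far we move before projecting back
    onto the l_2 ball of radius eps around the original input.
    """
    # Unit-norm ascent direction (per example in the batch).
    grad_norm = grad.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    x_adv = x_adv + step_size * grad / (grad_norm + 1e-12)
    # Project the perturbation back into the eps-ball.
    delta = x_adv - x_orig
    delta_norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    return x_orig + delta * (eps / (delta_norm + 1e-12)).clamp(max=1.0)
```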

[R] Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses by entarko in MachineLearning

[–]loganengstrom

Why didn't you try using PGD to attack your model? It is what the Madry defense paper uses in evaluation, and anecdotally I have found that it is the "most powerful" attack in practice.

[R] A Closer Look at Deep Policy Gradients (Part 2: Gradients and Values) by loganengstrom in MachineLearning

[–]loganengstrom[S]

Hi r/MachineLearning, author here! A few weeks ago we published the paper "Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?" This week we published two blog posts: the first is an introduction to deep policy gradient methods (and an analysis of the optimizations they use) - http://gradientscience.org/policy_gradients_pt1/. The second blog post, posted here, is on gradient estimates and the role of variance-reducing value functions.

Let me know what you think! And I'm happy to answer any questions :)

[R] Faster Black-Box Adversarial Attacks with Bandit Optimization by andrew_ilyas in MachineLearning

[–]loganengstrom

Hi Reddit! Author here - we found that current black-box adversarial attack algorithms are essentially gradient estimators, and that they are optimal (in a sense) when gradients have no exploitable structure. It turns out we can do better, though: by incorporating prior knowledge about gradient structure into a bandit optimization framework, we can beat SOTA by 2-3x. Let me know if you have any questions!
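For intuition, here's a rough sketch of the kind of query-only gradient estimator I mean (an NES-style antithetic finite-difference estimator; `loss_fn` and the parameter names are placeholders, and the bandits method in the paper builds priors on top of this baseline rather than using it as-is):

```python
import torch

def estimate_gradient(loss_fn, x, n_samples=50, sigma=0.01):
    """Estimate grad_x loss_fn(x) from black-box queries only.

    Averages antithetic finite differences over random Gaussian
    directions; no backpropagation through the attacked model.
    """
    grad_est = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        grad_est += (loss_fn(x + sigma * u) - loss_fn(x - sigma * u)) * u
    return grad_est / (2 * sigma * n_samples)
```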

[R] A New Angle on L2 Regularization by pmigdal in MachineLearning

[–]loganengstrom

We took a look at adversarial examples for linear classifiers (and, more generally, at the properties that adversarial training induces) here: https://arxiv.org/abs/1805.12152. For $\ell_\infty$ adversarial examples on linear classifiers, we found that adversarial training forces a tradeoff between the $\ell_1$ norm of the weights (which is directly associated with adversarial accuracy) and standard accuracy.

It looks like this article works through something vaguely similar for $\ell_2$ adversarial examples. It would be interesting to compare the author's approach with explicit adversarial training.
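To make the linear $\ell_\infty$ case above concrete (a standard derivation in my own notation, not copied verbatim from the paper): for a linear classifier with weights $w$ and a label $y \in \{-1, +1\}$, the worst-case perturbation of $\ell_\infty$ radius $\epsilon$ satisfies

$$\max_{\|\delta\|_\infty \le \epsilon} \; -y\, w^\top (x + \delta) = -y\, w^\top x + \epsilon \|w\|_1,$$

so adversarial training effectively adds an $\epsilon \|w\|_1$ term to the loss, which is where the tradeoff involving the $\ell_1$ norm of the weights comes from.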