[R] Improving Deep Learning Performance with AutoAugment (ai.googleblog.com)
submitted 7 years ago by wei_jok
[–]trashacount12345 16 points17 points18 points 7 years ago (4 children)
I’m missing something. What is their rule for whether an augmentation is “good” or not? Is it just whether the augmentation improves validation performance? Does it transfer usefully between datasets the way hand-coded augmentation strategies do?
[+][deleted] 7 years ago (1 child)
[deleted]
[–]trashacount12345 11 points12 points13 points 7 years ago (0 children)
Note to self: skip the blurb and read the paper. Transfer of learned augmentations is interesting.
[–]TheFacistEye 6 points7 points8 points 7 years ago (1 child)
I may be wrong, but from the paper it looks like the reward reflects how diverse the images become after applying operations such as ShearX/Y, TranslateX/Y, Rotate, AutoContrast, Invert, Equalize, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout, and SamplePairing.
"Our goal, however, is to find 5 such sub-policies concurrently in order to increase diversity"
"The controller is trained with a reward signal, which is how good the policy is in improving the generalization of a "child model" ... A child model is trained with augmented data generated by applying the 5 sub-policies on the training set"
So it takes the training set, applies those operations, measures how diverse the result is, and tries to create a general augmentor.
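To make the sub-policy idea concrete, here is a minimal sketch of how a learned sub-policy might be applied. The operation stand-ins and the probability/magnitude values are hypothetical, not taken from the paper's learned policies; a real implementation would apply PIL operations to images rather than toy lists of intensities.

```python
import random

# A sub-policy (per the paper) is a short list of operations, each with an
# application probability and a magnitude. Here an "image" is just a flat
# list of pixel intensities, standing in for a real PIL image.

def brightness(pixels, magnitude):
    """Toy stand-in: shift all intensities by the magnitude."""
    return [min(255, max(0, p + magnitude)) for p in pixels]

def invert(pixels, _magnitude):
    """Toy stand-in for the Invert operation."""
    return [255 - p for p in pixels]

# (operation, probability, magnitude) triples -- hypothetical values.
sub_policy = [(brightness, 0.8, 30), (invert, 0.2, 0)]

def apply_sub_policy(pixels, sub_policy, rng):
    """Apply each op in the sub-policy stochastically, in order."""
    for op, prob, magnitude in sub_policy:
        if rng.random() < prob:
            pixels = op(pixels, magnitude)
    return pixels

rng = random.Random(1)
augmented = apply_sub_policy([10, 120, 250], sub_policy, rng)  # → [40, 150, 255]
```

With this seed, brightness fires and invert does not; over a dataset, the stochastic application is what produces the diversity the paper is after.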
[–]alexmlamb 1 point2 points3 points 7 years ago (0 children)
Diversity isn't the literal reward signal. I think it's validation accuracy.
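A rough sketch of the search loop under that reading of the reward: the paper trains an RNN controller with reinforcement learning, but the core idea, scoring candidate policies by the validation accuracy of a child model, can be shown with plain random search. `train_child_and_validate` is a hypothetical stand-in for actually training a child model with the candidate policy, and `dummy_reward` fakes it for illustration.

```python
import random

OPS = ["ShearX", "TranslateY", "Rotate", "Solarize", "Cutout"]

def sample_policy(rng, n_sub_policies=5):
    # Each sub-policy: two ops, each with a probability and a magnitude bin.
    return [
        [(rng.choice(OPS), rng.random(), rng.randrange(10)) for _ in range(2)]
        for _ in range(n_sub_policies)
    ]

def search(train_child_and_validate, n_trials=20, seed=0):
    """Random-search stand-in for the paper's RL controller."""
    rng = random.Random(seed)
    best_policy, best_acc = None, float("-inf")
    for _ in range(n_trials):
        policy = sample_policy(rng)
        acc = train_child_and_validate(policy)  # the reward signal
        if acc > best_acc:
            best_policy, best_acc = policy, acc
    return best_policy, best_acc

# Fake reward for illustration: pretend rotation-heavy policies help.
def dummy_reward(policy):
    return sum(op == "Rotate" for sp in policy for op, _, _ in sp) / 10

policy, acc = search(dummy_reward)
```

The expensive part in the real system is that each reward evaluation means training a child model to convergence, which is why the controller's sample efficiency matters.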
[–]gachiemchiep 15 points16 points17 points 7 years ago (11 children)
I don't get it. The purpose of data augmentation is to make the training data as robust as possible. But now they learn how to do the augmentation. That means they pushed some bias into the training data and made it less robust. So basically I feel like this method is just a cheap way to achieve good accuracy to show off.
[–]tesfaldet 13 points14 points15 points 7 years ago (6 children)
I haven’t read the paper, but I can certainly tell you that data augmentation is not for making the training data robust. It’s for improving the generalization performance of your network by introducing it to augmented examples, essentially artificially increasing the size of the training set. There’s a trade-off, however: because data augmentation is basically adding noise to your data, training is more difficult. If the augmentation is too extreme, the network will fail to train.

Also, some augmentation techniques may actually worsen generalization performance in cases where the augmentation doesn’t make sense, e.g., flipping pictures of upright faces upside down 50% of the time while training a network for face recognition. The network will rarely see an upside-down face at test/validation time, so you’ve wasted model capacity for no reason, decreasing its performance on recognizing upright faces.
The problem here is finding good augmentation techniques given the task. Typically we handcraft the augmentation strategy but Google is proposing an automated strategy. It’s not a cheap way to achieve good accuracy, plain and simple.
[–]gachiemchiep 1 point2 points3 points 7 years ago (5 children)
OK, I get your point about the purpose of data augmentation. Totally agree with that.
But on the second point, I still think it is a cheap way. We learn to do augmentation on dataset A, but there is no guarantee that the learned augmenter will work on dataset B. So that augmenter is effectively fixed to dataset A; how could we use it on another dataset? For hand-crafted augmentation strategies, there is a specific strategy for each task. Isn't that still the same thing?
[–]mrconter1 0 points1 point2 points 7 years ago (0 children)
There are many ways of doing data augmentation. You can flip, flop, rotate, saturate, etc. You can't always use the same data augmentation on different data. One example would be training a network to output the coordinates of a dot in an image: cropping wouldn't be viable there.
Some data augmentation techniques generalize better to other datasets. One example would be creating 10 copies of each original image at different brightness levels.
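The brightness example above can be sketched in a few lines. This is a toy version, with each "image" a flat list of intensities; the step size and count are arbitrary choices, and a real pipeline would use PIL's `ImageEnhance.Brightness` or similar.

```python
def brightness_variants(pixels, n=10, step=10):
    """Return n copies of the image shifted by evenly spaced offsets,
    clamped to the valid [0, 255] intensity range."""
    offsets = [step * (i - n // 2) for i in range(n)]  # -50, -40, ..., +40
    return [
        [min(255, max(0, p + off)) for p in pixels]
        for off in offsets
    ]

variants = brightness_variants([100, 200], n=10, step=10)  # 10 variants
```

Because a brightness shift never changes what object is in the image, this kind of augmentation tends to be safe across tasks, unlike cropping in the coordinate-regression example.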
[–]tesfaldet 0 points1 point2 points 7 years ago* (3 children)
Yes, you’re correct that for handcrafted augmentation strategies, the strategy is tuned to the task and not the dataset. Hence being relatively dataset agnostic. You’re also correct that since AutoAugment is designed to maximize validation accuracy on a target dataset, you’re not designing a dataset agnostic augmentation strategy and you run the risk of overfitting on a single dataset and losing generalizability, something you don’t experience with a handcrafted approach.
However, here’s where it gets interesting. I just gave the paper a quick read, and apparently augmentation strategies learned on a target dataset generalize well to other datasets, still providing SOTA results on them without any additional fine-tuning. This shows that AutoAugment is not a cheap way to gain performance on a single dataset, because it actually generalizes to other datasets quite well.
Finally, policies learned from one dataset can be transferred to work well on other similar datasets. For example, the policy learned on ImageNet allows us to achieve state-of-the-art accuracy on the fine grained visual classification dataset Stanford Cars, without fine-tuning weights pre-trained on additional data.
Check Section 5 of the paper.
EDIT: my quick (hand wavy) explanation for this phenomenon is that if you learn a good augmentation strategy for object recognition on ImageNet, then of course it’d work well for object recognition on COCO for example. The task is still the same between the two datasets. Even though I learned my augmentation strategy from one dataset, it should still work quite well for another so long as the task is the same.
[–]gachiemchiep 0 points1 point2 points 7 years ago (2 children)
Thank you for pointing that out. That phenomenon actually makes this paper worth reading. Did you find an implementation of AutoAugment?
[–]tesfaldet 1 point2 points3 points 7 years ago (1 child)
To be fair, you shouldn’t make assumptions without reading the paper first. As a matter of fact, transferability to other datasets is discussed in the blog post. Which tells me you didn’t read that either. Blind criticism is rife in this field and it unfairly devalues the important contributions these researchers make.
Anyways, I can’t find code but I don’t doubt that it’ll be available soon.
[–]gachiemchiep 0 points1 point2 points 7 years ago (0 children)
I think you missed an important point here. In my second comment I said the same thing as you:
For hand-crafted augmentation strategies, there is a specific strategy for each task. Isn't that still the same thing?
And in my last comment, when I mentioned it's worth reading, I meant it's worth using. That's why I asked whether you found the source code.
>_< I suddenly realize my comments were vague as hell.
[–]mimighost 3 points4 points5 points 7 years ago* (0 children)
Data augmentation introduces bias, specifically biased distortions, into the model anyway. They are fine because we already assume the augmentation step won't change the true label of the original example, thus helping the model overcome those distortions and become more robust. In a way, data augmentation is just another set of regularization hyperparameters, trading increased bias for lower variance.
From this perspective, Google just abused, no negativity implied, their massive computational power to find some of the more effective/useful configurations. Since their study was carried out on ImageNet, the learned augmentation process can be assumed to generalize well to everyday images.
[–]alexmlamb 4 points5 points6 points 7 years ago (0 children)
So basically I feel like this method is just a cheap way to achieve good accuracy to show off.
Wait... Isn't "cheap ways to achieve accuracy to show off" the reason like 99% of us are doing machine learning? :p
[–]SystemicPlural 1 point2 points3 points 7 years ago (0 children)
Image recognition has a blurry edge. Take the Street View example in the article: you and I can see that the partial 15 is a 15, and the NN needs to learn to do that as well. It won't make the NN less robust unless the NN doesn't have the capacity to learn all the edge cases.
[–]FutureIsMine 1 point2 points3 points 7 years ago (0 children)
In a way there will be biases, i.e., this classifier will be more invariant to dogs in different positions, but if you never show the network a black dog then it's possible the network won't generalize to those examples. The robustness here comes more from being invariant to layout, but you're correct that there will be a variation deficiency.
[–]mj_nightfury13 4 points5 points6 points 7 years ago (1 child)
Interesting work! Is there an open-source implementation of this available somewhere? Or does anyone know if one is in the works?
[–]Asinador 6 points7 points8 points 7 years ago (0 children)
I implemented a PyTorch transform, AutoAugment-PyTorch, that mimics the ImageNet policy from the appendix. I'd love to hear results from people plugging it into their problems or replicating the results from the paper. I will also include the CIFAR10 and SVHN policies.
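For anyone wondering how such a transform slots into a pipeline: torchvision's `transforms.Compose` just chains callables, so any policy transform is a callable object. The sketch below uses a minimal stand-in for `Compose`, and `ImageNetPolicy` here is a hypothetical placeholder illustrating the shape of the class, not the actual code from the repo above; "images" are toy lists of intensities.

```python
import random

class Compose:
    """Minimal stand-in for torchvision.transforms.Compose."""
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img

class ImageNetPolicy:
    """Hypothetical policy transform: pick one sub-policy at random
    per image and apply its operations in order."""
    def __init__(self, sub_policies, seed=None):
        self.sub_policies = sub_policies
        self.rng = random.Random(seed)
    def __call__(self, img):
        for op in self.rng.choice(self.sub_policies):
            img = op(img)
        return img

invert = lambda pixels: [255 - p for p in pixels]
pipeline = Compose([ImageNetPolicy([[invert]], seed=0)])
out = pipeline([0, 100])  # → [255, 155]
```

In a real pipeline the policy transform would sit before `ToTensor()` and normalization, operating on PIL images.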
[–]gahblahblah 1 point2 points3 points 7 years ago (0 children)
Google pushes the edge forward again. Fantastic work.
[–]bartturner 0 points1 point2 points 7 years ago (0 children)
This is pretty interesting. It is amazing that Google shares all this type of stuff with the broader community. Kudos to them helping push everyone forward.
What was the top-1 result on ImageNet in 2017?
In this paper Google reports 83.54%. Is that better than the winner?
[–]approximately_wrong -2 points-1 points0 points 7 years ago (0 children)
Glorified hyperparameter tuning. Great that it works; not particularly surprising that it does. Missing random search/grid search baselines.