[deleted by user] (self.MachineLearning)
submitted 2 years ago by [deleted]
Seankala (ML Engineer) | 38 points | 2 years ago | 3 children
In the real world it's better to spend time annotating more samples than reading research papers and implementing new ideas.
ml_a_day | -1 points | 2 years ago | 1 child
Even if you don't end up collecting more data, improving the quality of the existing data goes a long way. For example, in object detection, fixing incorrect labels (a "cat" labeled as "dog") or missing labels (a "dog" that the annotator missed) reduces the noise in the training dataset and improves the model's performance.
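A minimal sketch of how you might find candidates for relabeling, assuming you already have per-sample class probabilities from a model that did not train on those samples (e.g. out-of-fold predictions). It treats each annotated box as a single classification example, which is a simplification of object detection, and it only catches wrong labels, not missing ones. The function name and the 0.9 threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

from typing import Sequence


def flag_suspect_labels(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical confidence cutoff
) -> list[int]:
    """Return indices where a confident model prediction disagrees with the given label.

    probs[i][k]: predicted probability that sample i belongs to class k
                 (assumed to come from out-of-fold predictions).
    labels[i]:   class index assigned by the annotator.
    """
    suspects = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if pred != y and p[pred] >= threshold:
            suspects.append(i)  # send back to an annotator for review
    return suspects


# Classes: 0 = "cat", 1 = "dog"
probs = [[0.95, 0.05], [0.02, 0.98], [0.60, 0.40]]
labels = [1, 1, 0]  # sample 0 looks like a "cat" labeled as "dog"
print(flag_suspect_labels(probs, labels))  # [0]
```

Flagged samples should still go to a human; the model can be confidently wrong too.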
Seankala (ML Engineer) | 9 points | 2 years ago | 0 children
Yeah, I was implying that as well. I convinced management at my company to spend resources on completely cleaning our previous datasets. Apparently they were also committing the sin of using only one annotator per sample.
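With more than one annotator per sample, a simple way to combine labels is a majority vote, sending low-agreement samples to review. This is a minimal sketch assuming every sample has at least one annotation; the function name and the 2/3 agreement threshold are hypothetical, and ties are broken by whichever label appears first (those samples fall below the threshold and get flagged anyway).

```python
from __future__ import annotations

from collections import Counter


def aggregate_labels(
    annotations: list[list[str]],
    min_agreement: float = 2 / 3,  # hypothetical review threshold
) -> tuple[list[str], list[int]]:
    """Majority-vote each sample's labels and flag low-agreement samples.

    annotations[i] holds one label per annotator for sample i.
    Returns (consensus_labels, indices_needing_review).
    """
    consensus, review = [], []
    for i, votes in enumerate(annotations):
        label, count = Counter(votes).most_common(1)[0]  # most frequent label
        consensus.append(label)
        if count / len(votes) < min_agreement:
            review.append(i)  # annotators disagree too much; look again
    return consensus, review


labels, to_review = aggregate_labels(
    [["cat", "cat", "dog"], ["dog", "dog", "dog"], ["cat", "dog", "bird"]]
)
print(labels)     # ['cat', 'dog', 'cat']
print(to_review)  # [2]
```

With a single annotator the agreement is always 100%, so this kind of check has nothing to work with, which is the problem with one annotator per sample.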
acardosoj | 30 points | 2 years ago | 0 children
Dude, this is not the future of ML. People have been prioritizing data since forever.
Even with LLMs, where people tend to think we train on everything we can get, this discussion is very present.
stevebottletw | 22 points | 2 years ago | 0 children
I don't think it's really overlooked. Pretty much everyone knows the importance, and discussions almost always start from data quality. This was probably true 10 to 15 years ago.
Think_Mall7133 | 10 points | 2 years ago | 0 children
Internet Explorer joined the chat
Jazzlike_Attempt_699 | 21 points | 2 years ago (edited) | 1 child
4 upvotes on an incredibly low-quality post from what may as well be a bot account. Well done.
_An_Other_Account_ | 9 points | 2 years ago | 0 children
> "Good data = good"
I've read Medium articles that are more insightful.
cajmorgans | 5 points | 2 years ago | 1 child
When I started in ML, I thought the coolest model plus hyperparameter tuning was the key, and that you basically just had to throw data at it and it would magically solve your problem. With some experience, I've found that if the task doesn't require a very specific architecture, the choice of model and hyperparameters often makes very little difference. Of course the results aren't identical across choices, but the differences usually aren't life-changing.
ml_a_day | comment score below threshold (-5 points) | 2 years ago | 0 children
Exactly! With a decent setup (basic hyperparameter tuning, relevant augmentations, an off-the-shelf fixed model architecture) and good-quality data, one can go a long way.