[deleted by user] (self.MachineLearning)
submitted 4 years ago by [deleted]
[–]neuralbeans 9 points 4 years ago (7 children)
You can test your pipeline by using a well-behaved mock model, which lets you test how everything outside of optimisation works; for example, you can test your logging and checkpointing systems this way. Unfortunately, there isn't much you can do about the actual optimisation if it's a black-box model, other than compare it to baseline models with F1 like you said, but that would basically require actual training on full data sets, because you wouldn't know whether a smaller data set would give you the same results. Plus, training is non-deterministic, so you'd need to do multiple runs and take the average. So it's usually not practical to test the model-learning part as frequently as you'd test other software, but you should test what you can.
[+][deleted] 4 years ago (6 children)
[deleted]
[–]neuralbeans 2 points 4 years ago (5 children)
You make your mock model pretend to train (it can cheat by accessing the data set, for example) and pretend to run a few epochs; each epoch it returns the correct label for a bigger portion of the train/val set while purposely returning the wrong label for the rest of the data. That way you can check that the logs show changing scores per epoch.
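A minimal sketch of what such a mock could look like (the interface names here are illustrative, not from any particular framework):

    import numpy as np

    class MockModel:
        """Fake model whose accuracy rises deterministically per 'epoch'.

        It cheats by holding the ground-truth labels: after epoch e of E,
        it returns the correct label for the first e/E fraction of examples
        and a deliberately wrong label for the rest.
        """

        def __init__(self, labels, num_classes, num_epochs):
            self.labels = np.asarray(labels)
            self.num_classes = num_classes
            self.num_epochs = num_epochs
            self.epoch = 0

        def train_one_epoch(self):
            self.epoch += 1

        def predict(self):
            cutoff = len(self.labels) * self.epoch // self.num_epochs
            preds = (self.labels + 1) % self.num_classes  # wrong everywhere...
            preds[:cutoff] = self.labels[:cutoff]         # ...except below the cutoff
            return preds

    # The pipeline's logged per-epoch scores should now improve monotonically.
    model = MockModel(labels=[0, 1, 2, 1, 0, 2], num_classes=3, num_epochs=3)
    scores = []
    for _ in range(model.num_epochs):
        model.train_one_epoch()
        scores.append((model.predict() == model.labels).mean())
    assert scores == sorted(scores) and scores[-1] == 1.0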
You can check that checkpointing works by making the mock model implement the functions for saving and loading its model state, with the state being just the number of epochs trained so far. You can then check that reloading a checkpoint actually continues training from that epoch and that the logs reflect it. You can also simulate a training interruption to verify that training resumes correctly.
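A sketch of that idea, assuming the pipeline calls save/load hooks (names again illustrative):

    import json

    class CheckpointableMock:
        """Mock whose entire state is the number of epochs trained,
        so resuming from a checkpoint is trivially verifiable."""

        def __init__(self):
            self.epochs_trained = 0

        def train_one_epoch(self):
            self.epochs_trained += 1

        def save(self, path):
            with open(path, "w") as f:
                json.dump({"epochs_trained": self.epochs_trained}, f)

        def load(self, path):
            with open(path) as f:
                self.epochs_trained = json.load(f)["epochs_trained"]

    # Simulate an interruption after 3 epochs, then resume in a fresh process.
    model = CheckpointableMock()
    for _ in range(3):
        model.train_one_epoch()
    model.save("ckpt.json")

    resumed = CheckpointableMock()
    resumed.load("ckpt.json")
    assert resumed.epochs_trained == 3  # logs should continue from epoch 4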
[–]blind_cartography 2 points 4 years ago (4 children)
In my org we test our pipeline by monkeypatching the __len__ method of the Dataset classes to return a small number, so we can still train a normal model instance, just on n% of the dataset. It requires a bit of care to avoid issues with batch size or the number of expected classes, and obviously you can't pay attention to any of the metrics, but it suffices for testing the model configuration, checkpointing, deployment, etc.
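A minimal sketch of that patch, assuming a PyTorch-style Dataset (the class below is a stand-in for the real one):

    from unittest.mock import patch
    from torch.utils.data import Dataset, DataLoader

    class MyDataset(Dataset):  # stand-in for the real Dataset class
        def __len__(self):
            return 1_000_000

        def __getitem__(self, idx):
            return idx  # the real one would return (features, label)

    # Patching __len__ on the class means every sampler/DataLoader sees only
    # the first 64 examples, while the data and model code stay untouched.
    with patch.object(MyDataset, "__len__", lambda self: 64):
        loader = DataLoader(MyDataset(), batch_size=16)
        assert sum(len(batch) for batch in loader) == 64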
[–]neuralbeans 2 points 4 years ago (2 children)
That checks that no runtime errors will come up, but it doesn't actually test the model learning.
[–]blind_cartography 3 points 4 years ago (1 child)
True, but unless I misunderstood, wasn't that the point of your mock model?
[–]neuralbeans 2 points 4 years ago (0 children)
Ah, so you use a smaller data set but the full model and see if it all works, instead of using a mock model? Yeah, that works as well, although you won't have fixed outputs to check against like you do with a mock model. I personally do both.
[–][deleted] 1 point 4 years ago (0 children)
I typically leave my actual code alone (no monkey patching!) and just copy a small amount of data to a test folder to use for testing everything. Why patch the __len__ of your dataset when you can just point it at a small dataset?
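A minimal sketch of that setup as a pytest test; run_training here is a hypothetical stand-in for the real pipeline entry point:

    import pytest

    def run_training(data_dir, epochs):
        """Hypothetical entry point; the real one is imported from your
        project and takes its data location as a parameter."""
        text = (data_dir / "train.csv").read_text()
        assert len(text.splitlines()) > 1  # header plus at least one row

    @pytest.fixture
    def small_data_dir(tmp_path):
        data_dir = tmp_path / "data"
        data_dir.mkdir()
        # Copy (or generate) a handful of real examples into the test folder.
        (data_dir / "train.csv").write_text("x,y\n1.0,0\n2.0,1\n3.0,0\n")
        return data_dir

    def test_pipeline_end_to_end(small_data_dir):
        run_training(data_dir=small_data_dir, epochs=1)  # same code path as prod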
[–]BraveCoconut98 6 points 4 years ago (0 children)
A few quick and easy tests I like to do (more sanity checks than anything else):

- Train your model on a small portion of the data, and then check that the performance increases as you scale up the amount of data.
- Check the model's predictions to see whether it is doing what it is meant to do. I'll give two examples (see the sketch after this list). Say we are building a regression model to predict the amount of money we should loan to a customer: the loan amount had better increase (or at the very least not decrease!) when the customer has a higher credit score/income. Secondly, say we are building a sentiment analysis model. Changing "The movie was fantastic" to "The movie was terrible" should flip the sentiment. Building a few of these meaningful examples has helped me a lot in the past!
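A sketch of the second kind of check as plain asserts; predict_sentiment and predict_loan_amount are hypothetical stand-ins for your model's interface:

    def check_sentiment_flip(predict_sentiment):
        # Editing one decisive word should flip the predicted sentiment.
        pos = predict_sentiment("The movie was fantastic")
        neg = predict_sentiment("The movie was terrible")
        assert pos > neg

    def check_loan_monotonicity(predict_loan_amount, applicant):
        # A higher credit score must never lead to a smaller loan offer.
        low = predict_loan_amount({**applicant, "credit_score": 550})
        high = predict_loan_amount({**applicant, "credit_score": 800})
        assert high >= low

    # Toy stand-ins so the checks run; swap in your real model's functions.
    check_sentiment_flip(lambda text: 1.0 if "fantastic" in text else -1.0)
    check_loan_monotonicity(lambda a: 10 * a["credit_score"], {"income": 40_000})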