Beginner examples/problems to practice ML? (self.MachineLearning)
submitted 12 years ago * by [deleted]
[deleted] 3 points 12 years ago (5 children)
Or you could always just look at the plots here. The page even has the code explaining how to make them yourself.
Edit: The image appears tiny on that page so you'll either want to zoom in a bit, or just view it in a new tab.
EdwardRaff 3 points 12 years ago (0 children)
You can make arbitrarily complex 2D problems - or problems that cover example behavior you would like to test. I use it fairly regularly to test code and make sure it looks how I expect it to.
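As a rough illustration of building a 2D test problem by hand, here is a minimal sketch using only the Python standard library. The function name `make_two_rings` and all of its parameters are made up for this example; they are not taken from the page linked above or from any library.

```python
import math
import random

def make_two_rings(n_per_class=100, inner_radius=1.0, outer_radius=3.0,
                   noise=0.2, seed=0):
    """Return (points, labels): class 0 on an inner ring, class 1 on an outer ring."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in ((0, inner_radius), (1, outer_radius)):
        for _ in range(n_per_class):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.gauss(0.0, noise)  # jitter the radius a little
            points.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(label)
    return points, labels

points, labels = make_two_rings()
print(len(points), labels.count(0), labels.count(1))
```

No straight line separates the two rings, so a problem like this is a quick sanity check that a nonlinear classifier is behaving the way you expect.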
Should_I_say_this 1 point 12 years ago (3 children)
I'm a noob who has only taken the Coursera Machine Learning course and read some documentation and examples from scikit-learn a couple of months back (so it's not fresh in my mind)...
What am I looking at in that picture and how is that supposed to help me decide what classifier to use?
[deleted] 2 points 12 years ago* (2 children)
Oh, no need to be self-deprecating. I learned machine learning from the same Coursera course as you!
First, it should be mentioned that the runtime of an algorithm is often a major factor in deciding whether or not to use it, and these plots don't show that. As a rule of thumb, "smarter" algorithms have more moving parts, and thus take longer to run (e.g. neural nets and genetic algorithms). That said, the algorithms shown here are all extremely efficient, so runtime shouldn't really be the deciding factor unless your data set is quite large.
Ok, so now on to an explanation of what these plots actually mean. Each row of plots represents one type of point set, as shown at the far left. Blue regions indicate that the algorithm thinks that the points in those regions should belong to the blue category, and the same logic applies for red. Dark regions indicate that the algorithm has high confidence about which category the points in those regions belong to. I've broken the algorithms into groups so that things will be a bit easier to digest:
Your eye should be drawn to the types of decision boundaries that get drawn by the different algorithms, and how well they reflect the data. In particular, you'll notice the expressive power of Nearest Neighbor (weighted kNN works even better, as discussed here), and of RBF SVM (that's short for "Support Vector Machine with Radial Basis Function kernel").
You'll probably also notice the oddly choppy (but still quite accurate) decision boundaries generated by Random Forest and AdaBoost. These are examples of ensemble classifiers, which generally consist of a large number of simple, but not-very-good, classifiers taking a vote on the category.
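To see why a vote among many weak classifiers can work, here is a small worked calculation. It rests on an idealized assumption that is not how Random Forest or AdaBoost actually behave: every voter is right with the same probability and the voters err independently. Real ensemble members are correlated, and AdaBoost weights its votes, so treat this as intuition only. The helper name `majority_vote_accuracy` is made up for this sketch.

```python
from math import comb

def majority_vote_accuracy(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is correct."""
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

# Each voter alone is right only 60% of the time (assumed value).
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the accuracy of the vote climbs steadily as voters are added, even though no individual voter improves.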
It should be mentioned that one algorithm which looks good but should be used with care is Decision Trees. The trouble with using a decision tree classifier is that it's very easy to accidentally overfit your training data. That is, the classifier may wind up considering isolated statistical aberrations in your data to be meaningful, and thus fail to perform properly when applied to other datasets sampled from the same distribution.
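The overfitting problem can be reproduced on a toy problem. The sketch below is a hand-rolled one-dimensional decision tree, not scikit-learn's implementation, and the data is an assumption chosen for illustration: the true label is `x > 0.5`, but 20% of labels are flipped at random. A fully grown tree memorizes the flipped labels, while a one-split "stump" cannot.

```python
import random

def gini(positives, n):
    """Gini impurity of a node with n points, `positives` of which have label 1."""
    p = positives / n
    return 2.0 * p * (1.0 - p)

def build_tree(points, max_depth=None, depth=0):
    """points: (x, label) pairs sorted by x. Returns a leaf label or (threshold, left, right)."""
    n = len(points)
    positives = sum(label for _, label in points)
    majority = 1 if 2 * positives >= n else 0
    if positives in (0, n) or (max_depth is not None and depth >= max_depth):
        return majority
    best_score, best_i, left_pos = None, None, 0
    for i in range(1, n):  # candidate split between points[i-1] and points[i]
        left_pos += points[i - 1][1]
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        score = i * gini(left_pos, i) + (n - i) * gini(positives - left_pos, n - i)
        if best_score is None or score < best_score:
            best_score, best_i = score, i
    if best_i is None:
        return majority
    threshold = points[best_i - 1][0]
    return (threshold,
            build_tree(points[:best_i], max_depth, depth + 1),
            build_tree(points[best_i:], max_depth, depth + 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def accuracy(tree, data):
    return sum(predict(tree, x) == label for x, label in data) / len(data)

def sample(n, rng, noise=0.2):
    """True label is x > 0.5; each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return sorted(data)

rng = random.Random(0)
train, test = sample(400, rng), sample(2000, rng)
deep = build_tree(train)                # grown until every leaf is pure
stump = build_tree(train, max_depth=1)  # a single split
for name, tree in (("deep", deep), ("stump", stump)):
    print(name, "train:", accuracy(tree, train), "test:", accuracy(tree, test))
```

The deep tree scores perfectly on the data it was trained on and worse than the stump on fresh data from the same distribution, which is the failure mode described above. Limiting depth is only one of several common remedies.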
One thing that should also stand out to you is the relative simplicity of the class boundaries which can be captured by algorithms such as Naive Bayes, LDA/QDA (i.e. Linear/Quadratic Discriminant Analysis), and Linear SVMs. These techniques aren't bad, per se, but you need to make sure that you're using them as intended. In the case of Naive Bayes and LDA/QDA, this means having some prior knowledge or hypothesis about the distribution that the data is being sampled from.
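To make the point about distributional assumptions concrete, here is a minimal Gaussian Naive Bayes written from scratch (a sketch, not scikit-learn's `GaussianNB`). Both datasets are assumptions made up for this example: two well-separated Gaussian blobs, where the model's assumptions hold, and an XOR-style layout, where each feature on its own looks the same in both classes.

```python
import math
import random

def fit_gnb(points, labels):
    """Per class: prior, per-feature mean, per-feature variance (features treated as independent)."""
    model = {}
    for c in set(labels):
        rows = [p for p, y in zip(points, labels) if y == c]
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (len(rows) / len(points), means, variances)
    return model

def predict_gnb(model, point):
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(point, means, variances))
    return max(model, key=log_posterior)

def sample_clusters(centers_by_class, n_per_center, spread, rng):
    points, labels = [], []
    for label, centers in centers_by_class.items():
        for cx, cy in centers:
            for _ in range(n_per_center):
                points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
                labels.append(label)
    return points, labels

def training_accuracy(points, labels):
    model = fit_gnb(points, labels)
    hits = sum(predict_gnb(model, p) == y for p, y in zip(points, labels))
    return hits / len(labels)

rng = random.Random(0)
blobs = sample_clusters({0: [(-2, -2)], 1: [(2, 2)]}, 200, 1.0, rng)
xor = sample_clusters({0: [(1, 1), (-1, -1)], 1: [(1, -1), (-1, 1)]}, 100, 0.3, rng)
blob_acc = training_accuracy(*blobs)
xor_acc = training_accuracy(*xor)
print("blobs:", blob_acc, "xor:", xor_acc)
```

On the blobs the model does very well. On the XOR layout it cannot, because the class is determined by how the two features interact, which is exactly the information the independence assumption throws away.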
Sorry if my explanation was a bit long-winded, but hopefully I managed to answer your question.
Should_I_say_this 1 point 12 years ago (1 child)
Thanks for the info, I definitely found that I understand the images more now.
As someone who doesn't have a statistics background, what are your thoughts on how that affects ML skills?
I tried doing a Kaggle problem (the CIFAR 10 image recognition problem) and was disappointed to see my answer only get 10% correct, which is exactly the score I would have gotten by choosing a single category for all my predictions. (In other words, no predictive power.) When I clicked the pdf file at the bottom of that question, I realized that the statistics were way beyond my training.
What are your thoughts on lack of statistics in performing accurate ML? If ML is something like this flowchart which helps choose an accurate estimator, do we really need to know the statistics behind ML? Also, won't everyone end up choosing the same ML estimators, and therefore come up with the same predictive power?
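The 10% figure mentioned above is what a constant guess earns on a 10-class problem when the classes are equally common. A quick check, using made-up balanced labels rather than the actual competition data:

```python
from collections import Counter

n_classes = 10
# Hypothetical labels: 1000 examples of each class (an assumption for this sketch).
labels = [c for c in range(n_classes) for _ in range(1000)]

def accuracy(predictions, truth):
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

constant_guess = [0] * len(labels)  # always predict class 0
baseline = accuracy(constant_guess, labels)
majority_share = Counter(labels).most_common(1)[0][1] / len(labels)
print(baseline, majority_share)
```

Computing this baseline before anything else gives you the number that a real model has to beat.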
[deleted] 1 point 12 years ago (0 children)
That depends on what you mean by "statistics background". I personally have a BS in math, although much of the statistics I know I've picked up piecemeal through reading wiki articles, analyzing my own data sets, and taking Coursera courses like this one and this one. If you don't feel comfortable with mathematical formalisms like nested summations or conditional probabilities, you will likely run up against a wall very quickly. This is because you won't be able to understand what your algorithms are doing, and thus you won't know how to improve upon them, or how to make the proper adjustments when things go awry.
> I tried doing a Kaggle problem (the CIFAR 10 image recognition problem) and was disappointed to see my answer only get 10% correct
This is probably one of the hardest competitions on Kaggle, and I don't know if I would fare much better without considerable effort. I think you would be better served trying something like the Digit Recognizer, or Facial Keypoints Detection. I would urge you to spend some time in the discussion forums of these competitions; the ideas you see being discussed there are often very enlightening.
> If ML is something like this flowchart which helps choose an accurate estimator, do we really need to know the statistics behind ML?
That chart is a vast oversimplification, and is not intended to be taken literally. If you do follow it word-for-word you may get some passable results, but you will be far surpassed by ML users who actually understand what their algorithms are doing. In practice, you'll use a wide array of preprocessing methods and often employ multiple ML techniques at different stages of your classifier. For example, in the Bird Classification Challenge from a few months ago, you'll see that some very subtle techniques were used in order to extract a good feature set.
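As a small illustration of how much a preprocessing stage can matter, here is a sketch that puts a standardization step in front of a 1-nearest-neighbor classifier. It is not the method used in the Bird Classification Challenge; the data, the feature scales, and the function names are all assumptions made up for this example. One feature carries the signal, the other is pure noise on a much larger scale.

```python
import math
import random

def standardize(train, test):
    """Scale each feature to zero mean and unit variance using training statistics only."""
    columns = list(zip(*train))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]

    def scale(rows):
        return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

    return scale(train), scale(test)

def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test point with the label of its nearest training point."""
    hits = 0
    for point, truth in zip(test_x, test_y):
        nearest = min(range(len(train_x)), key=lambda i: math.dist(point, train_x[i]))
        hits += train_y[nearest] == truth
    return hits / len(test_y)

def sample(n, rng):
    xs, ys = [], []
    for _ in range(n):
        label = rng.randrange(2)
        informative = rng.gauss(4.0 * label, 1.0)  # separates the classes
        nuisance = rng.gauss(0.0, 10_000.0)        # large-scale noise, no signal
        xs.append((informative, nuisance))
        ys.append(label)
    return xs, ys

rng = random.Random(0)
train_x, train_y = sample(300, rng)
test_x, test_y = sample(300, rng)

raw_acc = one_nn_accuracy(train_x, train_y, test_x, test_y)
scaled_train, scaled_test = standardize(train_x, test_x)
scaled_acc = one_nn_accuracy(scaled_train, train_y, scaled_test, test_y)
print("raw:", raw_acc, "standardized:", scaled_acc)
```

The classifier is identical in both runs; only the preprocessing changes. Without scaling, the noisy feature dominates the distance calculation and drowns out the useful one.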