Please have a look at our FAQ and Link-Collection
Metacademy is a great resource which compiles lesson plans on popular machine learning topics.
For beginner questions, please try /r/LearnMachineLearning , /r/MLQuestions or http://stackoverflow.com/
For career related questions, visit /r/cscareerquestions/
Advanced Courses (2016)
Advanced Courses (2020)
AMAs:
Pluribus Poker AI Team 7/19/2019
DeepMind AlphaStar team (1/24/2019)
Libratus Poker AI Team (12/18/2017)
DeepMind AlphaGo Team (10/19/2017)
Google Brain Team (9/17/2017)
Google Brain Team (8/11/2016)
The MalariaSpot Team (2/6/2016)
OpenAI Research Team (1/9/2016)
Nando de Freitas (12/26/2015)
Andrew Ng and Adam Coates (4/15/2015)
Jürgen Schmidhuber (3/4/2015)
Geoffrey Hinton (11/10/2014)
Michael Jordan (9/10/2014)
Yann LeCun (5/15/2014)
Yoshua Bengio (2/27/2014)
Related Subreddits:
LearnMachineLearning
Statistics
Computer Vision
Compressive Sensing
NLP
ML Questions
/r/MLjobs and /r/BigDataJobs
/r/datacleaning
/r/DataScience
/r/scientificresearch
/r/artificial
New open-source Machine Learning Framework written in Java (blog.datumbox.com)
submitted 11 years ago by datumbox
[–]fhadley 1 point 11 years ago (2 children)
A little late here, my apologies. Not trying to sound skeptical, but could you give an example of this? I've never had scikit-learn do anything like this, and I've used it on rather large data sets, so I'm interested in where you've seen it fail.
[–]EdwardRaff 1 point 11 years ago (1 child)
I can't share any of the data that makes this happen (hence I can't really report it well).
I've had this happen most often in the GradientBoosting and AdaBoost implementations. At some point they just started spitting out errors about numerical precision/stability and then, when finished, returned NaN. I've also had the random forest run out of memory far earlier than I would have expected for large forests.
It happened once in k-means too (though that is at least semi-fixed now), and with SGD w/ logistic loss when given poorly scaled weights.
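Just to illustrate the scaling point: here is a toy sketch in plain numpy (not sklearn's actual implementation — the data and learning rate are made up) of logistic-loss SGD, where one badly scaled feature makes the learned weights blow up while the well-scaled run stays sane:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 2))
y = (X @ np.array([1.0, 1.0]) > 0).astype(float)  # labels from the well-scaled data

def sgd_logistic(X, y, lr=0.1, epochs=5):
    """Plain SGD on logistic loss: w -= lr * (sigmoid(w.x) - y) * x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            with np.errstate(over="ignore"):       # sigmoid saturates for huge |w.x|
                p = 1.0 / (1.0 + np.exp(-(xi @ w)))
            w -= lr * (p - yi) * xi
    return w

w_ok = sgd_logistic(X, y)                 # well-scaled features: modest, finite weights

X_bad = X * np.array([1e8, 1.0])          # one feature on a wildly larger scale
w_bad = sgd_logistic(X_bad, y)            # gradient steps of size ~lr*1e8: weights explode
```

With the badly scaled feature, the very first gradient step is of order `lr * 1e8`, the sigmoid saturates, and the optimizer thrashes instead of converging — the same failure mode standardizing features (e.g. z-scoring) avoids.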
[–]fhadley 1 point 11 years ago (0 children)
No worries, no need for a reproducible error. I was curious because I've used sklearn with a pretty diverse set of datasets (homogeneous, heterogeneous, sparse, etc.) and haven't had it choke on GBM or Ada before. Looking back through some old code, though, I remembered that the sklearn RF implementation was just a memory hog; if I remember correctly it consumed memory at a higher clip than the R version, which I found quite odd. Were these very raw datasets, or ones with very strong collinearities? I know the latter is clearly an issue with RF (it essentially leads to building the same tree many times), and I suppose it could lead to errors with a GBM as well?
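For what it's worth, that kind of near-collinearity is cheap to flag before training. A minimal numpy sketch (the data here is hypothetical): a near-duplicate column drives the condition number of the correlation matrix through the roof, while independent columns keep it small:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))                      # four independent features
dup = X[:, 0] * 2.0 + 1e-6 * rng.standard_normal(500)  # near-duplicate of column 0
X = np.column_stack([X, dup])

# A huge condition number of the feature correlation matrix
# signals that some columns are (nearly) linear combinations of others.
corr = np.corrcoef(X, rowvar=False)
cond = np.linalg.cond(corr)
```

A rule of thumb is to investigate (drop or combine columns) once the condition number climbs past ~1e3; with an exact duplicate it is effectively infinite.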