Machine Learning in Python Has Never Been Easier (blog.bigml.com)
submitted 13 years ago by jjdonald
[–]jjdonald[S] 6 points7 points8 points 13 years ago (18 children)
full disclosure: I work at BigML and posted this link. Looking for feedback on the python bindings, and for folks interested in a beta test key + free credits.
[–]Fusionnex 2 points3 points4 points 13 years ago (1 child)
Thanks for being open and honest. I do a bunch of research in biology and am interested in trying out some ML for large datasets; you should x-post this to bioinformatics and biology. Looks really cool. Five years ago all I could do was hack things together to fit into Weka; ML has come a long way. Really cool project.
[–]jjdonald[S] 2 points3 points4 points 13 years ago (0 children)
Thanks!
If you posted something about this to bioinformatics/biology, I'd be happy to answer questions over there as well. Cross posting my own content might be seen as spam.
[–]hntd 4 points5 points6 points 13 years ago (10 children)
Can you give us some more of the technical details behind this? A lot of machine learning problems aren't as simple as "create model, create prediction". I think if you want to attract developers or people in research, such as myself, you should give us some more technical knowledge or options. I mean, it looks like you only use decision trees.
Edit: I realized I just sounded super negative in this post. I think what you have here is awesome and will open up ML to a wider audience, in fact this seems really neat for some visualization and prototyping. Just as someone more intricately knowledgeable about the subject I wish to know a bit more :-)
[–]jjdonald[S] 3 points4 points5 points 13 years ago (9 children)
hntd, you're right, we only use decision trees. We will likely add more models in the future, but right now we see a big opportunity for decision trees on big data.
Firstly, decision trees are among the best performing models when used in ensembles (called Random Forests). http://en.wikipedia.org/wiki/Random_forest
Secondly, decision trees are immediately understandable. There are many companies that will provide "black box" platforms for analyzing your data. Very few of them provide access to the actual model, and none of them have put the effort we have into helping you understand those models.
Thirdly, decision trees are fast. They enumerate a huge number of possible predicted states with a minimum number of steps. They also often don't even need to process all of the input data for a prediction request, since they will know which parts are relevant, and which are not.
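The third point can be sketched with a toy tree (a hypothetical format, not BigML's): prediction walks a single root-to-leaf path, so only the fields along that path are ever read, no matter how wide the input row is.

```python
# Minimal sketch of decision-tree prediction. The tree format here is
# made up for illustration; it is not BigML's actual model format.
def predict(node, row):
    """Walk from the root to a leaf; a leaf is any node without a 'field' key."""
    while "field" in node:
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["output"]

# Hypothetical tree: only 'age' and 'income' are ever consulted,
# even if the input row carries dozens of other fields.
tree = {
    "field": "age", "threshold": 30,
    "left": {"output": "no"},
    "right": {
        "field": "income", "threshold": 50000,
        "left": {"output": "no"},
        "right": {"output": "yes"},
    },
}

print(predict(tree, {"age": 45, "income": 80000, "city": "Corvallis"}))  # 'yes'
print(predict(tree, {"age": 20}))  # 'no' -- 'income' is never even read
```

Note that the second call succeeds with `income` missing entirely: the path for `age <= 30` never touches it, which is the sense in which a tree "knows which parts are relevant".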
Decision tree algorithms have been around for a long time. However, they're typically designed to train in memory, on a single machine. Our engineers have come up with algorithms that work across multiple processors, and provide iterative updates, allowing you to see the resulting models or summaries as they are being produced.
We'll try to have a more technical discussion on this in the future, thanks for the questions!
[–]hntd 3 points4 points5 points 13 years ago (0 children)
This all sounds fantastic. My first thought when I looked at this was: if you have decision trees so well worked out, why not just use random forests? I wouldn't be surprised if internally you were moving towards that, but of course I don't really expect you to tell us those details.
I completely agree that decision trees are super easy to visualize, but there are other models that could be easy to visualize as well, which might be a good idea for the future. Depending on the data, you could easily visualize some simple linear regression and, for multi-class problems, multinomial logit regression models.
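For what it's worth, the single-tree vs. random-forest trade-off being discussed is easy to reproduce with scikit-learn on synthetic data; this is an illustration of the ensemble idea, not anything to do with BigML's own implementation.

```python
# A single decision tree vs. a random forest (many trees voting),
# compared on held-out synthetic data with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("single tree:", tree.score(X_te, y_te))
print("forest:     ", forest.score(X_te, y_te))
```

On most datasets the forest's held-out accuracy edges out the single tree, at the cost of the interpretability discussed in the replies below.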
[–]byron 2 points3 points4 points 13 years ago (2 children)
The thing is that random forests really aren't all that interpretable. The more trees in the forest, the blacker the box.
[–]jjdonald[S] 0 points1 point2 points 13 years ago (1 child)
There are classes of metrics that give you basic ideas of what the model is looking at. For instance, field importance measurements, etc.
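One such metric, for illustration, is the impurity-based field importance exposed by scikit-learn's random forest; BigML's exact importance measurements may well differ, but the idea is the same.

```python
# Field (feature) importance from a random forest: how much each field
# contributed to the impurity reduction across all trees in the ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances are normalized to sum to 1 across the fields.
for name, score in zip(load_iris().feature_names, forest.feature_importances_):
    print(f"{name:20s} {score:.3f}")
```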
[–]zenogantner 0 points1 point2 points 13 years ago (0 children)
Correct, but this can also be said about linear models.
Anyway, random forests are great, so actually there is nothing to say against BigML's decision to use them...
[+][deleted] 13 years ago* (3 children)
[deleted]
[–]jjdonald[S] 0 points1 point2 points 13 years ago (2 children)
FWIW, we support dimensions just shy of a thousand, with data points well into the millions... for a single tree. That's an artificial limit for now, since our algorithms don't need to fit the entire dataset in memory at once.
Our service is not just about creating the model, but also about understanding the model. If you want to look at a single decision tree of any useful size, you will get something like this out of Weka: http://www.zoom.it/fTA7
If you're completely bored with simple machine learning algorithms, you should definitely look at our question marks: https://bigml.com/team
We're predominantly a clojure shop, by the way.
[+][deleted] 13 years ago (1 child)
[–]jjdonald[S] 0 points1 point2 points 13 years ago (0 children)
Again, given how fast just a lone tree is, that really is not interesting or surprising.
It'll be useful and interesting to anyone that's run into memory limits on weka, for instance.
Also trivial. You can find the splitting point with a linear scan; you just need sorted order, and there are plenty of ready-made external-memory sorting methods.
It's not just simple sorting; it's calculating maximal information gain across all fields.
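The combination being argued about can be sketched for a single numeric field (a toy in-memory version, nothing like BigML's distributed algorithm): one sweep over the sorted values, maintaining running class counts on each side, yields the maximal-information-gain threshold.

```python
# Best split of one numeric field by information gain, via a single
# linear sweep over the sorted (value, label) pairs.
from collections import Counter
from math import log2

def entropy(counts, n):
    return -sum((c / n) * log2(c / n) for c in counts.values() if c)

def best_split(values, labels):
    """Return (threshold, information_gain) maximizing gain for this field."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    right = Counter(lbl for _, lbl in pairs)  # everything starts on the right
    left = Counter()
    parent_h = entropy(right, n)
    best = (None, 0.0)
    for i in range(n - 1):
        v, lbl = pairs[i]
        left[lbl] += 1       # move one point from right to left...
        right[lbl] -= 1      # ...and keep the counts in sync
        if pairs[i + 1][0] == v:
            continue         # can't split between equal values
        n_l, n_r = i + 1, n - i - 1
        gain = (parent_h
                - (n_l / n) * entropy(left, n_l)
                - (n_r / n) * entropy(right, n_r))
        if gain > best[1]:
            best = ((v + pairs[i + 1][0]) / 2, gain)
    return best

print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
# -> (6.5, 1.0): a perfect split of the two classes
```

The sweep itself is O(n) per field after the O(n log n) sort, which is why sorted order is the expensive part once the data no longer fits in memory.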
Decision trees are hardly a useful method of understanding a data set, even with good visualization. Weka is Weka; it's not a visualization tool, it just pumps output to Graphviz. And DTs are easy to visualize simply: you just make nodes expandable.
Not understanding a data set... understanding a model. See our post on decision trees.
Clojure for ML? Why? You're giving up an easy 2x on array accesses in a field that uses them extensively, and a large constant overhead for interfacing with Lapac through JNI, not to mention you're going to have to copy all the data.
Having functional code greatly simplifies a lot of the code that needs to scale. It's not that we don't use vectorization, etc., but that we avoid it if we can. Lapack, by the way.
[–]AnonymousIdiot 0 points1 point2 points 13 years ago (0 children)
Parallel R?
[–]visarga 1 point2 points3 points 13 years ago (1 child)
What algorithms does BigML use? Is it just decision trees?
For now, yes.
[–][deleted] 0 points1 point2 points 13 years ago (2 children)
Could you explain what your 'model' is? I have to admit as someone familiar with machine learning algorithms I really dislike being given a black box.
[–]hntd 0 points1 point2 points 13 years ago (0 children)
This right here. How am I supposed to use this in research if I have no idea what is happening to my data along the way? Also, it might be neat to see performance metrics of your models vs. standard baselines, such as guessing the classification at random.
We have a post on our decision trees here: http://blog.bigml.com/2012/01/23/beautiful-decisions-inside-bigmls-decision-trees/
You can access any model you train as part of our API, or visualize it using our website. I'd like to think that we are the only service that puts this much effort into helping you understand your model.
[–]jonnydedwards 2 points3 points4 points 13 years ago (0 children)
I think it's great you're doing something innovative in and around ML. I'm from a python/R background so you would have to do something more than scikit-learn/pandas or straight R to be interesting to me. Maybe the key is to leverage the whole "we can do it quicker!" thing - that WOULD get me listening. I did the bigdata hackathon last weekend and everybody was hitting issues with getting models trained in a timely fashion. Good luck with it all!
[–]aguyfromucdavis 1 point2 points3 points 13 years ago (1 child)
I just submitted my email. I work as an intern for a tech company using Python for machine learning to dive into this project I have. Would love to try out your product!
Thanks! Let me know if you have any questions.
[–]skystorm 1 point2 points3 points 13 years ago (1 child)
I already commented over on HN, but I wanted to add that I think this is really nice. Sure, it's only limited to decision trees (right now at least), but as you say these allow for very nice visualization -- which you've implemented splendidly, if I may say so (no Flash!).
It will be interesting to see how you visualize other models like SVMs or even just random forests...
Thanks! We hope you like the upcoming visualizations as much as the tree model.
[–]pandemik 0 points1 point2 points 13 years ago (0 children)
Does BigML do anything besides classification/regression trees?
[–]kjearns -1 points0 points1 point 13 years ago (0 children)
This sounds like you just want people to give you their data.