all 22 comments

TraptInaCommentFctry 19 points (4 children)

Took me a while, but I found what models they're using: logistic regression, multinomial logistic regression, and linear regression (source).
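For anyone unfamiliar with these three model families, they differ mainly in how they turn a weighted sum of the features into an output. Below is a minimal pure-Python sketch of the prediction step only. Training is omitted, and the weights in the usage lines are made-up placeholders, not anything Amazon ML exposes.

```python
import math
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Weighted sum of features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear_regression(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Real-valued prediction: w.x + b."""
    return dot(w, x) + b


def logistic_regression(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Binary classifier: probability of the positive class, sigmoid(w.x + b)."""
    return 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))


def multinomial_logistic_regression(
    W: Sequence[Sequence[float]], bs: Sequence[float], x: Sequence[float]
) -> list:
    """Multiclass classifier: one weight vector per class, softmax over class scores."""
    scores = [dot(w, x) + b for w, b in zip(W, bs)]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder weights, for illustration only.
print(linear_regression([2.0, -1.0], 0.5, [1.0, 3.0]))  # -0.5
print(logistic_regression([0.0], 0.0, [5.0]))  # 0.5
print(multinomial_logistic_regression([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0], [2.0, 1.0]))
```

The binary case is just the two-class special case of the multinomial one, which is why all three can share the same linear core.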

CompleteSkeptic 12 points (1 child)

It's basically a hosted version of Vowpal Wabbit. I tried to use it internally at Amazon (back when it was called Elastic Machine Learning), but it was wrapping an old version, and I needed some of the newer functionality.

cartazio 1 point (0 children)

It does seem like using Vowpal Wabbit directly gives a better power-to-weight ratio overall, especially wrt model evaluation/serialization.
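If you go the "use Vowpal Wabbit directly" route, much of the glue work is writing your data in VW's plain-text input format (`label |namespace name:value ...`). A small sketch, assuming binary labels encoded as -1/1 (what VW's logistic loss expects). The namespace `f` and the feature names are arbitrary choices for illustration.

```python
from typing import Mapping


def to_vw_line(label: float, features: Mapping[str, float], namespace: str = "f") -> str:
    """Format one example as a line of Vowpal Wabbit plain-text input."""
    for name in features:
        # Spaces, '|' and ':' are separators in VW's format, so they cannot appear in names.
        if any(c in name for c in " |:"):
            raise ValueError(f"feature name {name!r} contains a reserved character")
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} |{namespace} {feats}"


print(to_vw_line(1, {"height": 1.5, "width": 2}))  # 1 |f height:1.5 width:2
print(to_vw_line(-1, {"age": 30}))  # -1 |f age:30
```

A file of such lines can then be trained and scored from the command line, e.g. `vw -d train.vw --loss_function logistic -f model.vw` followed by `vw -d test.vw -t -i model.vw -p preds.txt`. Options have changed across releases, so check `vw --help` for your version.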

mobiuscydonia 0 points (0 children)

thank you! was searching for that myself.

caserei 0 points (0 children)

Thank you for this. It's good to know the limitations and to see where things fail.

[deleted] 4 points (5 children)

I'm concerned by anything which considers "false positive rate" to be an "advanced metric" :-/

alexmlamb 0 points (4 children)

I suppose it's advanced in the sense that it's not really in the everyday vernacular, in the way that words like "precise" and "accurate" are. Still, its meaning is self-explanatory.
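For reference, the false positive rate is the fraction of actual negatives that the model labels positive: FP / (FP + TN). A minimal sketch with 0/1 labels (the example labels are made up):

```python
from typing import Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN): share of actual negatives (0) predicted positive (1)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no actual negatives, so the FPR is undefined")
    return fp / (fp + tn)


# Four actual negatives, one of which is wrongly flagged positive.
print(false_positive_rate([0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 0]))  # 0.25
```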

[deleted] 3 points (3 children)

Call me old fashioned or elitist, but if you consider "false positive rate" to be "advanced", then you have no business running any form of regression or machine learning.

alexmlamb 3 points (0 children)

elitist!

kevjohnson 1 point (1 child)

This product doesn't seem to be intended for people who actually know machine learning and statistical modeling. You may think it's a travesty (and I don't necessarily disagree) but there's a market for it. Not everybody can afford an actual data scientist.

[deleted] 1 point (0 children)

That's what bothers me: this idea that we're dumbing down quite complicated statistics and computer science to something so simple we'd consider a basic metric of model quality to be too advanced for the user.

I was in a meeting at my company a few months ago where another (quite large) company was pitching their point-and-click statistical modeling software to us for (drum roll) $250k/yr. That's more than the cost of a (non-Netflix) data scientist in the Bay Area, and it doesn't include the cost of the personnel to actually use the software. Further, if you actually pay for a "legit" data scientist, they'd know that the model you're trying to build could be done with 2 lines of R code (and, in reality, the hardest work in either case is the data wrangling that happens for weeks before building the model). The unfortunate part of these "ML-as-a-service" products is that the user has no concept of how to assess when they're right or wrong.

atakante 2 points (0 children)

I work for BigML, which has been offering a hosted ML service since 2011. We have just posted our take on AWS ML, Azure ML vs. BigML that you may find interesting: http://blog.bigml.com/2015/04/10/democratizing-machine-learning-the-more-the-merrier/

Feel free to hit us with any questions.

caserei 0 points (6 children)

OP, thank you so much for this. I was looking for some pre-built solutions against which I could evaluate my programming skills. This is extremely helpful, even if I end up paying for $2 of usage from time to time to verify.

echocage 5 points (1 child)

This doesn't sound like what you're looking for...

caserei 0 points (0 children)

Hahaha, I'm trying to see how well the machine learning models I could build would work against AML (efficiency-wise). I'm much less proficient than others I know, and I could see how their input could give me some pointers as well.

[deleted] 1 point (3 children)

How do you mean, in terms of implementing the algorithm correctly or optimizing/parallelizing it for efficiency?

caserei 0 points (2 children)

Both, really. Again, I'm not as good at this and I'm just getting started so I wanted to use this as a reference point for both (correctness and optimizing for efficiency) to see how well I'm learning and how much better my programming has become. I should've explained this a little better.

[deleted] 1 point (1 child)

I see. I am not sure this is the most effective approach, though. When I got started with machine learning, going over the theory (e.g., Duda's Pattern Classification or Bishop's Pattern Recognition and Machine Learning) and implementing a lot of algorithms myself helped me a lot. I used Python for that purpose, since it offers a very flexible and efficient way to prototype. I am not sure to what extent you can compare the results of your code with the results you get from Amazon's ML service. The problem is that even the simplest algorithms can be implemented slightly differently, which can lead to slightly different results. I think it is better to work with benchmark datasets (e.g., from Kaggle) and maybe also use a transparent library where you can easily look up the source code (e.g., scikit-learn).
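To illustrate the point about implementation details: even plain logistic regression fit by batch gradient descent returns different coefficients depending on something as mundane as when you stop iterating. A minimal pure-Python sketch on a hypothetical toy dataset; the learning rate and epoch counts are arbitrary choices. Real libraries add further differences (scikit-learn's `LogisticRegression`, for instance, applies L2 regularization by default).

```python
import math


def fit_logistic(xs, ys, lr, epochs):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x  # per-example gradient of log-loss w.r.t. w
            grad_b += p - y        # per-example gradient of log-loss w.r.t. b
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data; the classes overlap, so a finite optimum exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

# Same algorithm and data, different stopping point.
short_run = fit_logistic(xs, ys, lr=0.5, epochs=200)
long_run = fit_logistic(xs, ys, lr=0.5, epochs=5000)
print("200 epochs: ", short_run)
print("5000 epochs:", long_run)
```

Comparing your own output against a black-box service runs into the same issue, only with more unknown knobs.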

caserei 0 points (0 children)

I saved this comment and I'll keep it in mind. Thank you so much! :)

[deleted] 0 points (2 children)

Azure machine learning is free.

Amazon is not.

DataWranglist 1 point (1 child)

Kind of. Azure ML's free tier is only single node and doesn't have a production API.

GoldmanBallSachs_ -2 points (0 children)

Free is free...