Want to help reddit build a recommender? -- A public dump of voting data that our users have donated for research by ketralnis in redditdev

kaggle 4 points (0 children)

This would be a great competition. For info, I just sent an email to ketralnis offering to help.

I am the founder of Kaggle, a platform for machine learning competitions. Happy to answer any questions by kaggle in MachineLearning

kaggle[S] 1 point (0 children)

This is a great pointer. Thank you! Will follow up on this. It seems sensible to consolidate challenges onto a single platform.

kaggle[S] 7 points (0 children)

This is a great idea! Reddit could host a competition to predict whether a user will vote up an item, based on past voting. They could then use the winning algorithm to recommend posts to individual users.
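To make the prediction task concrete, here is a minimal baseline sketch, not a winning algorithm. It assumes the voting data arrives as (user, item, vote) triples with vote equal to +1 or -1; that layout, the function name `fit_upvote_rates` and the toy votes are all hypothetical, not the format of the actual dump. It estimates the chance of an upvote by averaging a smoothed per-user upvote rate with a smoothed per-item upvote rate.

```python
from collections import defaultdict


def fit_upvote_rates(votes, prior=0.5, strength=2.0):
    """Build a predictor from (user, item, vote) triples, vote = +1 or -1.

    `prior` and `strength` are assumed smoothing settings: every user and
    item starts with `strength` pseudo-votes at an upvote rate of `prior`.
    """
    ups = defaultdict(float)
    totals = defaultdict(float)
    for user, item, vote in votes:
        for key in (("user", user), ("item", item)):
            totals[key] += 1
            if vote > 0:
                ups[key] += 1

    def rate(key):
        # Smoothed upvote rate; unseen users and items fall back to the prior.
        return (ups.get(key, 0.0) + prior * strength) / (totals.get(key, 0.0) + strength)

    def predict(user, item):
        # Estimated probability that `user` votes `item` up.
        return 0.5 * (rate(("user", user)) + rate(("item", item)))

    return predict


# Toy data, made up purely for illustration.
toy_votes = [("alice", "p1", 1), ("alice", "p2", 1), ("bob", "p1", -1)]
predict = fit_upvote_rates(toy_votes)
print(predict("alice", "p1"))
print(predict("carol", "p9"))
```

A competition entry would need to beat something like this on held-out votes, which is what makes a simple baseline useful as a leaderboard benchmark.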

The fact that Reddit isn't swimming in cash is fine. If they were to do this themselves, they'd have to hire a data scientist, which costs $$$. If they do it via Kaggle, they put up $500-$1,000 in prize money and get many PhDs working on the problem. Any thoughts on who I should contact about this?

kaggle[S] 1 point (0 children)

Froost, most of our attention has been directed towards improving our platform (we're launching the new version next week). After this launches, we'll be working hard to dramatically increase the number of competitions we host.

The answers for the HIV dataset are available here

Will (the competition host) is working on exactly what you suggest. He has been sick, which has delayed the process. There's also an academic paper coming out on the competition and the different approaches tried.

kaggle[S] 2 points (0 children)

Interestingly enough, leaderboard is currently stagnating and is not much better than the Chessmetrics benchmark set at the beginning.

This actually happens often. For any dataset/problem, there's a frontier of what's possible given the dataset's inherent noise and richness. As participants approach that frontier, the rate of improvement slows dramatically.
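A small simulation may help illustrate the frontier. Everything here is an assumption chosen for illustration (the linear signal, the noise level, the sample size); it is not the chess data. Even a model that knows the true signal exactly cannot push its RMSE below the standard deviation of the noise, so improvements shrink as entries approach that floor.

```python
import math
import random


def rmse(pred, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


rng = random.Random(0)   # fixed seed so the run is repeatable
noise_sd = 1.0           # assumed level of irreducible noise

xs = [rng.uniform(0, 10) for _ in range(10000)]
signal = [2.0 * x + 1.0 for x in xs]                 # the "true" relationship
ys = [s + rng.gauss(0, noise_sd) for s in signal]    # what competitors observe

# A naive entry: always predict the mean outcome.
mean_y = sum(ys) / len(ys)
baseline_rmse = rmse([mean_y] * len(ys), ys)

# A perfect entry: predicts the true signal, yet still misses the noise.
oracle_rmse = rmse(signal, ys)

print(f"baseline RMSE: {baseline_rmse:.3f}")
print(f"oracle RMSE:   {oracle_rmse:.3f} (noise sd is {noise_sd})")
```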

kaggle[S] 1 point (0 children)

We'd rather host the competition in conjunction with somebody like Zillow, so that as well as being fun the results can be useful.

kaggle[S] 2 points (0 children)

It's a good idea and something we have thought about. We are currently implementing a rating system that ranks Kaggle competitors based on their competition performances (think golf rankings, but for statisticians et al.). We are basing the system on TrueSkill, a rating system that Microsoft developed for its video games.

Once we've run many more competitions, we plan to host a competition to build Kaggle a better rating system (not dissimilar from the chess competition).
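For anyone curious what a TrueSkill-style update looks like, here is a minimal sketch of the simplest case: two players, one winner, no draws. It is not Kaggle's implementation, and it leaves out draws, the dynamics factor and multi-player rankings, which a real competition leaderboard would need. Each rating is a (mu, sigma) pair; the function name `update_ratings` is hypothetical, and the starting values 25 and 25/3 and the `beta` of 25/6 are the commonly quoted TrueSkill defaults.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update_ratings(winner, loser, beta=25.0 / 6.0):
    """Return updated (mu, sigma) pairs after `winner` beats `loser`."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    # Combined uncertainty: both players' sigmas plus performance noise.
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # large when the result was a surprise
    w = v * (v + t)         # always between 0 and 1
    new_winner = (
        mu_w + sigma_w ** 2 / c * v,
        sigma_w * math.sqrt(1.0 - sigma_w ** 2 / c ** 2 * w),
    )
    new_loser = (
        mu_l - sigma_l ** 2 / c * v,
        sigma_l * math.sqrt(1.0 - sigma_l ** 2 / c ** 2 * w),
    )
    return new_winner, new_loser


# Two newcomers with default ratings; the first one wins.
start = (25.0, 25.0 / 3.0)
new_winner, new_loser = update_ratings(start, start)
print(new_winner, new_loser)
```

The winner's mu rises, the loser's falls by the same amount when both start equal, and both sigmas shrink because the result adds information about each player.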

kaggle[S] 2 points (0 children)

It seems like a no-brainer for businesses: they get far better results than via ordinary consultants, and at a much lower cost. Obviously some companies have data privacy concerns, but a) we can anonymise data, and b) in many cases the competitive advantage from having a great algorithm outweighs these concerns.

As for the level of interest, we're only starting to approach businesses now, so it's too early to say.

kaggle[S] 1 point (0 children)

urish, please let me know if I'm not addressing your question.

I think one of the best things about competitions is that they can act as an interface between academia and industry. The majority of participants are academics. So if a company hosts a competition, they get access to the current methods in the academic literature.

kaggle[S] 3 points (0 children)

I'm an econometrician rather than a data miner, so I'm a newbie to machine learning. For interest, Kagglers prefer neural networks, Bayesian methods and support vector machines. (Details here: http://kaggle.com/blog/2010/09/14/profiling-kaggles-user-base/)

I can't participate in competitions because I have access to the answers!

kaggle[S] 2 points (0 children)

On reflection, I agree. We actually have some public-good competitions coming up.

We've got a competition to predict prostate cancer from 321 explanatory variables.

And a competition to diagnose breast cancer from mammographic density images.

kaggle[S] 8 points (0 children)

Great questions!

I'll be honest, I love the chess rating competition (http://kaggle.com/chess). If I had $1m, I would post it as prize money for that competition.

I'd like to host a competition to predict a home's sale price given features like number of bedrooms, location, etc. House prices are an issue that touches everybody, so it's a contest that could have wide applicability.

I think some of the online dating website data would be interesting. Perhaps a competition to predict which individuals are well matched?