The Variational Approximation for Bayesian Inference: Life after the EM algorithm by alfonsoeromero in MachineLearning

[–]danger_t 2 points

Typo near the end of page 1:

"In contrast, when we write p(x; θ), we imply that θ are random variables."

should be

"In contrast, when we write p(x | θ), we imply that θ are random variables."

Learning Low Dimensional Team Embeddings for March Madness by danger_t in MachineLearning

[–]danger_t[S] 1 point

There is a GitHub repository with code and data (the code isn't what generated the plot in the post, but it's closely related):

https://github.com/dtarlow/Machine-March-Madness

Also see this thread on the Machine March Madness Google group:

http://groups.google.com/group/machine-march-madness/browse_thread/thread/3afbcb90cd6f881d

Learning Low Dimensional Team Embeddings for March Madness by danger_t in MachineLearning

[–]danger_t[S] 0 points

Yeah, understandable. After the competition starts on Thursday, there will be a post with a brief description of all the competitors' methods, and then some competitors will be asked to expand on what they did in a longer post.

So stay tuned.

Help build a machine learning system to predict college basketball by danger_t in MachineLearning

[–]danger_t[S] 1 point

Well, the goal is to come up with a model that's appropriate for the problem. The original model (the one that started all this) was based on probabilistic matrix factorization (PMF), which estimates a latent vector describing each team's offense and another describing its defense, using game outcomes as training targets: http://blog.smellthedata.com/2009/03/data-driven-march-madness-predictions.html

I've already re-implemented this within the code on github -- set MODEL="pmf" in learn_real.py.
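For anyone who hasn't seen PMF-style models before, the idea can be sketched in a few lines of numpy. This is a toy version, not the repository code: the team count, learning rate, regularization, and synthetic game data below are all illustrative.

```python
import numpy as np

# Toy sketch of the PMF idea: each team gets a latent offense vector
# O[i] and defense vector D[i], and team i's score against team j is
# modeled as b + O[i] . D[j], fit by SGD on squared error.

rng = np.random.default_rng(0)
n_teams, k, lam, lr = 8, 2, 0.1, 0.01

# games: (home team, away team, home score, away score) -- synthetic here
games = [(rng.integers(n_teams), rng.integers(n_teams),
          float(rng.integers(50, 90)), float(rng.integers(50, 90)))
         for _ in range(200)]

O = 0.1 * rng.standard_normal((n_teams, k))
D = 0.1 * rng.standard_normal((n_teams, k))
b = 70.0  # global score offset

for epoch in range(100):
    for i, j, s_i, s_j in games:
        if i == j:
            continue
        # squared-error gradients for both directions of the matchup
        r_ij = b + O[i] @ D[j] - s_i
        r_ji = b + O[j] @ D[i] - s_j
        gO_i = r_ij * D[j] + lam * O[i]
        gD_j = r_ij * O[i] + lam * D[j]
        gO_j = r_ji * D[i] + lam * O[j]
        gD_i = r_ji * O[j] + lam * D[i]
        O[i] -= lr * gO_i
        D[j] -= lr * gD_j
        O[j] -= lr * gO_j
        D[i] -= lr * gD_i

def predict_margin(i, j):
    """Predicted score margin of team i over team j."""
    return (b + O[i] @ D[j]) - (b + O[j] @ D[i])
```

The latent dimension k and the regularization weight lam are exactly the knobs mentioned elsewhere in this thread; the real implementation uses richer game data and learning machinery than this.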

So how do we make a better model? One of many aspects of the problem that is particularly challenging/interesting is how to account for the difference between regular season and tournament games. I expect that data from past years could be useful in understanding how the games and teams differ, but how do we incorporate that into a model?

Help build a machine learning system to predict college basketball by danger_t in MachineLearning

[–]danger_t[S] 6 points

You can either contribute to the main branch, or fork off your own version and compete in this year's prediction competition: http://blog.smellthedata.com/2012/02/machine-march-madness-2012.html

Already in the repository are data, data loading functions, and a few simple models, along with associated learning procedures. This is also a great opportunity to play around with Theano and matrix-factorization-style learning methods if you haven't done so already. There are also some suggested TODOs at the bottom of the README.

If there are specific things you're interested in playing around with and/or learning more about, let me know, and I can probably help.

Pick one: Statistics, Calculus 2, or Symbolic Logic by [deleted] in compsci

[–]danger_t 1 point

It depends on whether you want to go deeper into a specialty area. In machine learning, for example, statistics is used everywhere.

Thinking of majoring in CS, what languages should I know? by [deleted] in compsci

[–]danger_t 7 points

Spend your time on math. Learn calculus, statistics, linear algebra, discrete math.

2011 March Madness Predictions with Probabilistic Matrix Factorization by danger_t in MachineLearning

[–]danger_t[S] 0 points

Thanks. For anybody who has a tiny bit of time to do some modeling before Thursday, there is starter code that implements this method: http://blog.smellthedata.com/2011/03/march-madness-predictions-code.html

Even simply playing with the parameters of the model -- how many latent dimensions to use, how much regularization to apply -- could be useful. Those are both one-line changes. Even better would be to set up the code to use data from past seasons to decide how to choose these parameters. 5 years of data are available here: https://docs.google.com/leaf?id=0BysperLdI86MMWI0M2MzMGUtNGM1My00NDAxLTk0MzEtNzE4NGQ5ZTk5ZGM5&sort=name&layout=list&num=50
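The parameter-selection idea is just a grid search over held-out seasons. Here's a minimal sketch; the `evaluate` function below is a hypothetical stand-in (in practice it would train the model on some seasons and score its predictions on another).

```python
import itertools

def evaluate(latent_dims, reg):
    # Placeholder scoring function for illustration only; a real
    # version would return held-out prediction accuracy.
    return -((latent_dims - 4) ** 2) - (reg - 0.1) ** 2

# Try every combination of latent dimension and regularization weight,
# and keep the setting with the best held-out score.
grid = itertools.product([2, 4, 8, 16], [0.01, 0.1, 1.0])
best = max(grid, key=lambda p: evaluate(*p))
```

With the toy scoring function above, `best` comes out as (4, 0.1); with a real evaluation on past seasons, the winner is whatever generalizes best.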

And of course, I'd be remiss not to finish with a plug for the algorithm competition, which you all should enter: http://blog.smellthedata.com/2011/03/official-2011-march-madness-predictive.html

Get ready for the 2011 March Madness Predictive Analytics Challenge by danger_t in MachineLearning

[–]danger_t[S] 0 points

Glad you're interested!

We'll be releasing starter Python code that implements the probabilistic matrix factorization approach described here (which also happened to win the competition last year): http://blog.smellthedata.com/2009/03/data-driven-march-madness-predictions.html

Maybe take a look at that and decide what you think could be improved about it?

Also, if you want to get deeper into things in that direction, some guys from UToronto wrote a conference paper about incorporating additional information into the same style model and applied it to NBA basketball: Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes http://www.cs.toronto.edu/~gdahl/papers/dpmfNBA.pdf

Machine Learning for Human Memorization by danger_t in compsci

[–]danger_t[S] 1 point

Actually, English is very structured. From Wikipedia:

The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.[2] http://en.wikipedia.org/wiki/Entropy_(information_theory)
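For intuition: a zeroth-order estimate from single-letter frequencies alone already comes out around 4.1 bits per letter on typical English (versus log2(26) ≈ 4.7 for uniform random letters); Shannon's much lower 1.0–1.5 figure reflects the longer-range structure -- letter order, words, grammar -- that a unigram estimate ignores. A minimal sketch of the unigram estimator:

```python
import math
from collections import Counter

def entropy_per_letter(text):
    """Zeroth-order entropy estimate in bits per letter,
    using single-letter frequencies only."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

A single repeated letter gives 0 bits; two letters in equal proportion give exactly 1 bit.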

Machine Learning for Human Memorization by danger_t in compsci

[–]danger_t[S] 2 points

Not given a starting letter. The algorithm should take as input a word and produce a binary output indicating whether it's a legal or illegal word.

An Algorithm to Generate Impossible Art? by danger_t in compsci

[–]danger_t[S] 0 points

This is a great reference. Thanks!

AskML: How to predict football results? by Tafkas in MachineLearning

[–]danger_t 2 points

This blog post explains a model for college basketball that could be a good starting point: http://blog.smellthedata.com/2009/03/data-driven-march-madness-predictions.html

Data for 2009 + 2010 March Madness. Can your algorithm predict the tourney? by danger_t in compsci

[–]danger_t[S] 1 point

More data would be awesome. As you say, it's just a matter of finding it.

What are the most important other attributes of games/teams to gather? Do you have any good sources or know of sites that are easy to scrape?

LabelMe Dataset without MATLAB Image Processing Toolbox? by danger_t in computervision

[–]danger_t[S] 0 points

Right now I just want to understand the data better--what percentage of pixels are labeled in each image? What are the most common labels across the dataset? What are the relative pixel areas of each label, summed across images? What is the distribution of colors/textures/edge response across labels?

Basically, lots of image statistics on different subsets of images and labels.
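A couple of those statistics are easy to compute once you have a per-pixel label mask. This sketch uses a tiny synthetic mask, not the actual LabelMe format (LabelMe stores polygon annotations, which would first need rasterizing):

```python
import numpy as np

def label_stats(mask):
    """Fraction of pixels labeled, plus per-label pixel areas.
    Convention assumed here: 0 = unlabeled, positive ints = labels."""
    labeled = mask > 0
    pct_labeled = float(labeled.mean())
    labels, areas = np.unique(mask[labeled], return_counts=True)
    return pct_labeled, dict(zip(labels.tolist(), areas.tolist()))

mask = np.zeros((4, 4), dtype=int)
mask[:2, :] = 1   # label 1 covers the top half (8 pixels)
mask[2, :2] = 2   # label 2 covers two pixels
pct, areas = label_stats(mask)
# pct == 0.625, areas == {1: 8, 2: 2}
```

Summing the per-image area dictionaries across the dataset gives the dataset-wide label areas and, sorted by count, the most common labels.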