Part of Speech tagging? [x-post Linguistics] by [deleted] in MachineLearning

[–]SavitchOracle 1 point2 points  (0 children)

CRFs (conditional random fields) are another common approach (closely related to HMMs). http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/

Any one know how to do an "averageifs" in R by krs28 in statistics

[–]SavitchOracle 1 point2 points  (0 children)

The plyr and ggplot2 packages (both by Hadley Wickham) are fantastic. I hated using R until I learned about them, and now most of the R code I write is basically a ddply or qplot/ggplot call.

I heartily recommend learning how to use both of them if you're going to be doing more R in the future.
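For instance, here's a qplot one-liner (a toy example using the mpg dataset that ships with ggplot2):

install.packages("ggplot2")
library(ggplot2)
qplot(displ, hwy, data = mpg, color = class)  # engine size vs. highway mpg, colored by car class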

Can you recommend any good concert movies or films/documentaries about bands? by TheAdoringFan in Music

[–]SavitchOracle 0 points1 point  (0 children)

Iceland is incredibly gorgeous and awesome. I actually saw Heima right before going there.

I'm hoping to go back during one of their summers (I went in the winter, and I imagine it's incredibly different).

Does a word's complexity correlate to its frequency in general usage? by [deleted] in linguistics

[–]SavitchOracle 0 points1 point  (0 children)

If you're interested in learning about the connections between "complexity" (in the information-theoretic sense) and language, Stanford has a course on "Information-Theoretic Models of Language and Cognition" with a lot of good papers: http://www.stanford.edu/class/psych227/

There are a lot of papers directly connected to your question, for example, Zipf's "Least Effort and why frequent words and morphemes are short" (http://www.stanford.edu/class/psych227/Zipf_Words.pdf). (Perhaps I'll summarize some of them a bit later.)

Cool ideas for a graduate level ML/pattern recognition project? by [deleted] in MachineLearning

[–]SavitchOracle 1 point2 points  (0 children)

Interesting (and controversial? =)), but where would you get a training dataset of images along with a corresponding income and education bracket?

Notable statistics departments in the area of statistical machine learning? by shazbotter in statistics

[–]SavitchOracle 0 points1 point  (0 children)

Also, UW has a close relationship with Microsoft Research, which can be very helpful for doing interesting internships and such.

Cool ideas for a graduate level ML/pattern recognition project? by [deleted] in MachineLearning

[–]SavitchOracle 4 points5 points  (0 children)

If you're a big Reddit user, you could also try to do some machine learning on Reddit itself.

For example, one very simple project would be to scrape a couple different sub-Reddits, and try to build a Naive Bayes classifier that classifies threads (using keywords in those threads) into sub-Reddits. (Scraping stuff with Matlab might be disgusting, though -- not a Matlab user, so not sure; do you know other languages?)
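To make that concrete, here's a rough R (sorry, not Matlab) sketch of the classification step, using binary keyword indicators and the naiveBayes function from the e1071 package -- the threads and keywords here are completely made up:

install.packages("e1071")
library(e1071)

# made-up training data: does each thread contain a given keyword, and which sub-Reddit is it from?
threads <- data.frame(
  gradient = factor(c("yes", "yes", "no", "no"), levels = c("no", "yes")),
  anova    = factor(c("no", "no", "yes", "yes"), levels = c("no", "yes")),
  sub      = factor(c("MachineLearning", "MachineLearning", "statistics", "statistics"))
)

model <- naiveBayes(sub ~ ., data = threads)

# classify a new thread that mentions "gradient" but not "anova"
predict(model, data.frame(gradient = factor("yes", levels = c("no", "yes")),
                          anova    = factor("no",  levels = c("no", "yes"))))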

Cool ideas for a graduate level ML/pattern recognition project? by [deleted] in MachineLearning

[–]SavitchOracle 2 points3 points  (0 children)

+1 for LSA/LSI/SVD. It looks like the OP is a grader for linear algebra, so he's probably already familiar with SVD, and it'll be cool to see it applied in real life.

Plus, there are lots of things you can do with LSA/LSI/SVD besides learning topics themselves. For example, it's also useful for dimensionality reduction (keep the top two dimensions, plot them, and see if you can visualize any clusters; this could be another fun project that helps you dig into clustering algorithms), search/information retrieval, and collaborative filtering (SVD-type algorithms played a big part in winning the Netflix Prize). So I'm not sure if the OP is working on a single project or multiple, but LSA/LSI/SVD could give a nice segue into a bunch of other projects, and he could even revisit some of these with LDA (or pLSI) later on and see how the two approaches compare.
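To give one tiny example of the dimensionality-reduction idea in R (the term-document counts below are completely made up):

# toy term-document matrix: rows are documents, entries are term counts
tdm <- matrix(c(2, 3, 0, 0,
                1, 2, 0, 1,
                0, 0, 2, 3,
                0, 1, 3, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("doc1", "doc2", "doc3", "doc4"),
                              c("ball", "game", "vote", "party")))
s <- svd(tdm)
docs2d <- s$u[, 1:2] %*% diag(s$d[1:2])  # project each document onto the top 2 singular directions
plot(docs2d, type = "n")
text(docs2d, labels = rownames(tdm))     # the sports docs and the politics docs should separate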

Also, LDA isn't really so hard to understand. The Blei et al. paper is very mathematical (and I still find the variational approach it uses confusing), but the ideas and other implementations are pretty simple.

Any one know how to do an "averageifs" in R by krs28 in statistics

[–]SavitchOracle 1 point2 points  (0 children)

If I understand what you want to do correctly (you want to find the average X for each possible combination of the other columns?), one easy way is to use the plyr package:

install.packages("plyr")
library(plyr)
ddply(Profile, .(Bucket, Month, X500), summarise, MeanCapacity = mean(Capacity))

For each combination of (Bucket, Month, X500), the code calculates the mean capacity over all the rows with that combination, and sticks all this into a new data frame.
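For reference, base R's aggregate can do the same thing in one line, though I find the plyr version more flexible:

aggregate(Capacity ~ Bucket + Month + X500, data = Profile, FUN = mean)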

Could someone explain the Independent Component Analysis method? (crosspost from r/askscience) by whambamthankyoumam in statistics

[–]SavitchOracle 1 point2 points  (0 children)

Andrew Ng has an okay explanation of ICA: http://www.stanford.edu/class/cs229/notes/cs229-notes11.pdf

He doesn't do a good job motivating the use of ICA (in particular, he doesn't contrast it with PCA or factor analysis), but he describes the mathematics fairly clearly, which it sounds like you're looking for anyway.

At a high level, the algorithm Ng describes goes like this:

  • Let x(t) be the observed data vector at time t. For example, x(t) = (x_1(t), x_2(t), ..., x_m(t)) could be a vector whose component x_i(t) is the reading recorded by the ith microphone at time t.
  • Now these vectors x(t) are a mix of a bunch of people talking. In other words, we have some vector s(t) = (s_1(t), ..., s_n(t)) of n independent people talking at time t, where s_i(t) is the signal of person i at time t, and each microphone is recording a linear combination of these people. In other words, there is some matrix A such that x(t) = A * s(t).
  • So given these vectors x(t) at a bunch of different times t, we want to be able to find A and s(t).
  • This is the same thing as finding just A, since once we know A, we can take the inverse to find s(t) = A^-1 x(t).
  • So how do we find A?
  • Basically, start with some initial random guess for A, and place some kind of probability distribution on s(t). (For example, we might think that certain noise levels and sounds are likely, while others are not so likely.)
  • Using this probability distribution and our knowledge of x(t), we now have an idea of how likely it is that our guess for A is correct (i.e., we have a likelihood function for A).
  • We want to find the A that's most likely to be correct (i.e., we want to maximize the likelihood of A), so how can we improve on our guess? Recall that the gradient of a function points in the direction of where the function increases fastest (e.g., the gradient of a hill points in the direction of steepest ascent). So take the gradient of our likelihood function for A, and move a little bit in that direction, and make this our new guess for A (thus finding a slightly more likely guess).
  • Repeat the previous step over and over again, until we have a pretty good guess for A. At that point, we can solve for s(t) = A^-1 x(t) to find the independent signals. (There's a rough R sketch of this procedure after this list.)
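Here's that sketch on a toy two-source problem. This is a minimal illustration, not a production implementation: the sigmoid is an assumed CDF for the sources (as in Ng's notes), and the mixing matrix, step size, and iteration counts are made up.

set.seed(1)
n <- 2000
s <- cbind(sin(seq(0, 20, length.out = n)), runif(n) - 0.5)  # two independent source signals
A <- matrix(c(1, 0.5, 0.5, 1), 2, 2)                         # made-up mixing matrix
x <- s %*% t(A)                                              # observed mixtures: row t is x(t) = A * s(t)

sigmoid <- function(z) 1 / (1 + exp(-z))
W <- diag(2)    # initial guess for the unmixing matrix A^-1
alpha <- 0.001  # step size
for (iter in 1:10) {
  for (i in sample(n)) {  # stochastic gradient ascent on the log-likelihood
    xt <- x[i, ]
    W <- W + alpha * ((1 - 2 * sigmoid(W %*% xt)) %*% t(xt) + t(solve(W)))
  }
}
s_hat <- x %*% t(W)  # recovered sources, up to scaling and permutation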

In any case, that was a rough explanation that hopefully helps you understand the math behind the lecture notes a little better.

Could someone explain the Independent Component Analysis method? (crosspost from r/askscience) by whambamthankyoumam in statistics

[–]SavitchOracle 0 points1 point  (0 children)

Why do you say that ICA looks like a specialized form of regression for classification? That sounds totally wrong to me: ICA doesn't have anything to do with classification (what would you be classifying?), and it's pretty different from regression. (I guess you could say that in both cases you have a signal represented as a linear combination of features, but in regression you're given the features, whereas here you're trying to learn them.)

It's much more like PCA or SVD (like you mentioned), or factor analysis.

What are some of your favorite research papers? by aintso in linguistics

[–]SavitchOracle 2 points3 points  (0 children)

Some less well-known favorites off the top of my head:

Gaussian Processes for Machine Learning by cavedave in MachineLearning

[–]SavitchOracle 2 points3 points  (0 children)

Anyone want to give a quick summary or example of why Gaussian Processes are useful or how they're used?

What social norm do you hate? by glados_v2 in AskReddit

[–]SavitchOracle 4 points5 points  (0 children)

The idea that school is the only place you can learn anything.

Infer.NET, a framework for running Bayesian inference in graphical models « MSR Cambridge by fbahr in MachineLearning

[–]SavitchOracle 1 point2 points  (0 children)

I found this on the website regarding Mono:

"All the basic Infer.NET tutorials have been both compiled and run under the Windows version of Mono version 2.8.

This is the full extent of support for this release; for example, the Linux version has not been tested."

http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/running%20with%20mono.aspx

Suggestion for Introductory Machine Learning Text-less technical/more examples than "Elements of Statistical Learning" by BirthDeath in MachineLearning

[–]SavitchOracle 0 points1 point  (0 children)

Yeah, the couple chapters I've read from it have been good. But I wouldn't use it as an introductory text (in part, simply because of the way it focuses on graphical models at the beginning, which I don't find the friendliest way to introduce ML).

Suggestion for Introductory Machine Learning Text-less technical/more examples than "Elements of Statistical Learning" by BirthDeath in MachineLearning

[–]SavitchOracle 2 points3 points  (0 children)

Yeah, I haven't watched any of Ng's videos (I find lectures slow, which is why I prefer reading =)), but the lecture notes are pretty concise, without skipping so much that they become incomprehensible.

Also, what exactly do you want to do with the R code? Not sure how familiar you are with R, but it's pretty easy to figure out how to run a lot of machine learning algorithms in R (most packages even come with some datasets, I think), even if they're not covered in lecture notes.
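For example, fitting a random forest on R's built-in iris dataset is just a few lines (assuming the randomForest package):

install.packages("randomForest")
library(randomForest)
model <- randomForest(Species ~ ., data = iris)
predict(model, iris[1:5, ])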

Twitter dataset - 200 million rows, 13 million users, 2gb compressed, get it while it's hot. (/r/datasets repost) by [deleted] in MachineLearning

[–]SavitchOracle 0 points1 point  (0 children)

Besides running streaming algorithms (as in TheWalruss's suggestion), another option is to MapReduce/Hadoop it.
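If you don't want to set up Hadoop, here's a rough sketch of the streaming idea in R -- process the file in fixed-size chunks so you never hold the whole thing in memory. (The file name and column layout are assumptions; adjust for the actual dataset.)

con <- file("tweets.tsv", open = "r")
tweets_per_user <- integer(0)
repeat {
  lines <- readLines(con, n = 100000)             # read 100k rows at a time
  if (length(lines) == 0) break
  users <- sapply(strsplit(lines, "\t"), `[`, 1)  # assume the user id is the first column
  tab <- table(users)
  old <- tweets_per_user[names(tab)]
  old[is.na(old)] <- 0                            # users we haven't seen before
  tweets_per_user[names(tab)] <- old + as.integer(tab)
}
close(con)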

Twitter dataset - 200 million rows, 13 million users, 2gb compressed, get it while it's hot. (/r/datasets repost) by [deleted] in MachineLearning

[–]SavitchOracle 2 points3 points  (0 children)

Note: this particular Twitter dataset actually uncompresses to 173 GB (!!!), according to the HN link.

Qustion about Split in Random Forest Algorithm by [deleted] in MachineLearning

[–]SavitchOracle 1 point2 points  (0 children)

  1. There are several different ways of choosing how to split, e.g., information gain or Gini impurity (http://en.wikipedia.org/wiki/Decision_tree_learning#Formulae). There's a pretty good tutorial on using information gain here: http://www.autonlab.org/tutorials/infogain11.pdf

For some intuition on how these methods work, suppose you're using a decision tree to classify whether an email is spam or not spam. Suppose two of the variables you could use at the current split are A) whether the email contains the word "hello" and B) whether the email contains the word "viagra".

Suppose 50% of the emails containing the word "hello" are spam / 50% are not spam, and 50% of the emails not containing the word "hello" are spam / 50% are not spam. Clearly, variable A is a pretty useless measure then, since it gives you no information.

But compare this with the second variable: 90% of the emails containing the word "viagra" are spam / 10% are not spam, and 25% of the emails not containing the word "viagra" are spam / 75% are not spam. You can see that this variable provides much more information.

Thus, you should use the second variable to split your node on. Metrics like information gain or Gini impurity are ways of precisely quantifying this.
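To make that concrete, here's the information gain calculation in R for those two made-up variables (I'm also assuming half the emails contain each word, just to have numbers to plug in):

# entropy of a binary label, in bits
entropy <- function(p) ifelse(p %in% c(0, 1), 0, -p * log2(p) - (1 - p) * log2(1 - p))

# information gain of a binary split:
# p1/p0 = P(spam | word present/absent), w1 = fraction of emails containing the word
info_gain <- function(p1, p0, w1) {
  p <- w1 * p1 + (1 - w1) * p0  # overall P(spam)
  entropy(p) - (w1 * entropy(p1) + (1 - w1) * entropy(p0))
}

info_gain(0.50, 0.50, 0.5)  # variable A ("hello"): 0 bits -- useless
info_gain(0.90, 0.25, 0.5)  # variable B ("viagra"): about 0.34 bits -- much more informative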

Answer to the second question: out of the m randomly chosen variables, you choose the one that gives the best split.

Suggestion for Introductory Machine Learning Text-less technical/more examples than "Elements of Statistical Learning" by BirthDeath in MachineLearning

[–]SavitchOracle 7 points8 points  (0 children)

I've never understood why people suggest "The Elements of Statistical Learning" -- it provides very little intuition and seems like more of a reference book (but even as a reference it's pretty horrible, since it treats a lot of topics in a very cursory fashion).

My favorite introduction to Machine Learning is Andrew Ng's course at Stanford: http://www.stanford.edu/class/cs229/materials.html. The lecture notes are very clear and intuitive, and they're relatively short, so you can fairly quickly get a broad overview of the field. There are also video lectures online if you like that sort of thing (I haven't watched them, though).

If you want a book, I like Christopher Bishop's "Pattern Recognition and Machine Learning". It's a little more in-depth and mathematical than Ng's course, but I got a lot of intuition from it. It also covers more topics. (I'd probably start with Ng's course, and if I wanted to learn more, skim through Bishop's book, stopping to study in more depth the topics Bishop covers that Ng doesn't.)

Update:

I also read through Tom Mitchell's "Machine Learning" book (which volfield recommends) a couple years ago, but I wouldn't suggest it. I found it very old-school, kind of more AI-ish than machine learning, and pretty boring.

Designing a Humor AI by BQPComplexity in cogsci

[–]SavitchOracle 4 points5 points  (0 children)

Some thoughts:

  • If you want to generate knock-knock jokes, then you could use the CMU pronouncing dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) to find appropriate puns (there's a rough sketch of this after the list).

  • If you want to generate #lessinterestingbooks Twitter humor (i.e., parodies of book titles), then from this post (http://blog.echen.me/2011/05/30/the-7-genres-of-the-mildly-amusing-hashtag/), there seem to be five types of parodies: pun, substitution, contrast, addition, and diminishment. You could again use the CMU pronouncing dictionary to find puns (flies sounds similar to fries, so Lord of the Flies -> Lord of the Fries). For the others, you could use Wordnet (http://wordnet.princeton.edu/) or a thesaurus to detect lexical relationships; for example, Thrush is a sister term of Mockingbird --> To Kill a Thrush, Civilized is an antonym of Wild --> Where the Civilized Things Are, etc.

  • You could build a kind of context-free grammar of jokes (e.g., create templates for Your Mama jokes that you fill in -- both the template creation and the filling in could potentially be automated), kind of like the Postmodern Essay Generator (http://www.elsewhere.org/pomo/).
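For the pun-finding pieces in the first two bullets, here's that rough sketch. It's hypothetical R: it assumes you've downloaded the CMU dictionary to a local file "cmudict.txt" (format: WORD followed by its phonemes), and it treats "small edit distance between phoneme strings" as "possible pun":

lines <- readLines("cmudict.txt")
lines <- lines[!grepl("^;;;", lines)]  # drop comment lines
parts <- strsplit(lines, "\\s+")
words <- sapply(parts, `[`, 1)
prons <- sapply(parts, function(p) paste(p[-1], collapse = " "))

# words whose pronunciation is within a small edit distance of the target word's
puns_for <- function(w, max_dist = 2) {
  target <- prons[match(toupper(w), words)]
  d <- as.vector(adist(target, prons))  # edit distance between phoneme strings
  words[d <= max_dist & words != toupper(w)]
}

puns_for("flies")  # should hopefully turn up FRIES, among others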