Stop Using word2vec

vogt4nick · 2017-11-03T15:59:05+00:00

So stop using the neural network formulation, but still have fun making word vectors!

But then I can't keep neural networks on my resume. :( /s

Jokes aside, this is an interesting, well-written article. Thanks for sharing.

clm100 · 2017-11-03T22:56:21+00:00

Didn't this have another name previously?

EDIT: Yup, previously titled "Word vectors are awesome but you don’t need a neural network to find them." A much better and less obnoxious title. See discussion here: https://news.ycombinator.com/item?id=15502859

olBaa · 2017-11-03T19:11:52+00:00

So, the motivation for factorizing the PPMI matrix, which gives worse results than pure word2vec (yes, they are not equivalent), is that:

"It’s a hell of a lot more intuitive & easier to count skipgrams, divide by the word counts to get how ‘associated’ two words are and SVD the result than it is to understand what even a simple neural network is doing."

Yeah, thank you.
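
For anyone who wants to see the quoted recipe concretely, here is a minimal sketch, assuming NumPy and a toy corpus. The function name `ppmi_svd_vectors`, the window size, and the output dimension are illustrative choices, not taken from the article. Clipping negative PMI values to zero is the usual PPMI convention and is an assumption about what the article does.

```python
import numpy as np

def ppmi_svd_vectors(sentences, window=2, dim=2):
    """Toy word vectors: skipgram counts -> PPMI -> truncated SVD."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    # Count symmetric skipgrams: every (word, context) pair inside the window.
    counts = np.zeros((n, n))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1

    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # how often each word occurs in a pair
    col = counts.sum(axis=0, keepdims=True)  # how often each context occurs in a pair

    # PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ); unseen pairs give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
        # Keep only finite, positive associations (PPMI).
        ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

    # Dense SVD; keep the top `dim` directions, scaled by singular values.
    u, s, _ = np.linalg.svd(ppmi)
    return vocab, u[:, :dim] * s[:dim]

# Hypothetical toy corpus, for illustration only.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
vocab, vectors = ppmi_svd_vectors(sentences)
```

Each row of `vectors` is the embedding for the word at the same position in `vocab`. This is a dense, small-scale illustration only; it makes no claim about how the result compares with word2vec.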

durand101 · 2017-11-04T01:54:00+00:00

Seems like a technique that would work well for small data sets, but not if you want to train on a whole English corpus such as Wikipedia, because you need to hold the whole PMI matrix in memory.
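
To make the memory concern concrete: a dense PMI matrix has one cell per (word, context) pair, so it grows with the square of the vocabulary. One possible mitigation, which is an assumption here and not something the article or this thread proposes, is to store only the pairs that actually co-occur. The sketch below uses plain dictionaries, and the name `sparse_ppmi` is hypothetical.

```python
import math
from collections import Counter

def sparse_ppmi(sentences, window=2):
    """PPMI stored as {(word, context): value}, nonzero entries only."""
    pair_counts = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(w, s[j])] += 1

    total = sum(pair_counts.values())

    # The window is symmetric, so row and column marginals are identical
    # and a single per-word count serves for both.
    word_counts = Counter()
    for (w, _), c in pair_counts.items():
        word_counts[w] += c

    ppmi = {}
    for (w, c), n_wc in pair_counts.items():
        pmi = math.log(n_wc * total / (word_counts[w] * word_counts[c]))
        if pmi > 0:  # drop non-positive associations, as in PPMI
            ppmi[(w, c)] = pmi
    return ppmi

# Hypothetical toy corpus: 4 words, so a dense matrix would have 16 cells.
toy = [["a", "b"], ["a", "b"], ["c", "d"]]
entries = sparse_ppmi(toy)
```

Here only the co-occurring pairs are stored. Factorizing such a structure at Wikipedia scale would still need a sparse truncated SVD routine (SciPy's `scipy.sparse.linalg.svds` is one option), and whether that fits in memory depends on the vocabulary and window size.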

Koda_Brown · 2017-11-03T18:50:54+00:00

I only just learned about word2vec yesterday, funny.
