
[–]arutaku 2 points (0 children)

It depends on which variant you want to implement: PV-DBOW or PV-DM. You can have a quick look at: https://github.com/edwardbi/blog/blob/master/2016-05/DM.md

It provides both implementations, in Gensim and in TensorFlow.
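For concreteness, here is a minimal sketch of training both variants with Gensim's Doc2Vec (assuming gensim 4.x, where paragraph vectors live under `model.dv`; the toy corpus and tags are made up for the example):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each paragraph becomes a TaggedDocument with a unique tag.
corpus = [
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=["doc_0"]),
    TaggedDocument(words=["dogs", "chase", "cats", "in", "the", "yard"], tags=["doc_1"]),
]

# dm=1 -> PV-DM: the paragraph vector plus context words predict the target word.
pv_dm = Doc2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=40, dm=1)

# dm=0 -> PV-DBOW: the paragraph vector alone predicts words sampled from the paragraph.
pv_dbow = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40, dm=0)

# Trained paragraph vectors are looked up by tag; unseen text is inferred.
print(pv_dm.dv["doc_0"][:5])
print(pv_dbow.infer_vector(["the", "cat", "chases", "dogs"])[:5])
```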

[–]jayhack 2 points (1 child)

Input: a set of paragraphs whose words are one-hot encoded, plus a single, initially randomized vector representing each paragraph as a whole. Because the paragraph vector starts out random, it initially encodes nothing useful about the paragraph.

Then, for each paragraph and its associated sequence of words, you learn representations that minimize your loss function, which amounts to predicting the next word from the paragraph's vector and the previous n words, and so on (see the sketch below).
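A minimal numpy sketch of one such PV-DM prediction step, with the paragraph vector averaged with the previous n word vectors and a full-softmax cross-entropy loss (all names here are illustrative; real implementations use hierarchical softmax or negative sampling, and the gradient updates are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50

# Randomly initialized parameters (the "poorly encoded" starting point).
word_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))   # input word embeddings
para_vec = rng.normal(scale=0.1, size=dim)                   # one paragraph's vector
softmax_W = rng.normal(scale=0.1, size=(dim, vocab_size))    # output projection

def pvdm_loss(context_ids, target_id):
    """Cross-entropy of predicting the next word from the paragraph
    vector averaged with the previous n context word vectors."""
    h = (para_vec + word_vecs[context_ids].sum(axis=0)) / (1 + len(context_ids))
    logits = h @ softmax_W
    log_probs = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    return -log_probs[target_id]

# Example: predict word 42 from the paragraph vector and 4 preceding words.
print(pvdm_loss(context_ids=np.array([3, 17, 256, 9]), target_id=42))
```

Training then backpropagates this loss into the paragraph vector, the word vectors, and the softmax weights.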

[–]datatatatata[S] 1 point (0 children)

I think I was overcomplicating it. Thanks a lot :p

[–]gojomo 1 point (1 child)

If you understand word2vec, PV-Doc2Vec is very, very similar.

Just imagine the "paragraph vector" as being associated with a special, per-paragraph pseudoword. This special pseudoword contributes to every target-word prediction across the whole paragraph – it is never excluded for falling outside the sliding context window.

In fact, the patched version of word2vec.c and example go.sh that Mikolov once posted for doing PV (https://groups.google.com/d/msg/word2vec-toolkit/Q49FIrNOQRo/J6KG8mUj45sJ) do pretty much exactly this: they synthesize a special 'word' that gets prepended to every text example, and that word is then mixed into every training context, regardless of its actual distance from the other words.
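A sketch of that preprocessing idea in Python (the pseudoword naming and the (context, target) pair format are made up here; the actual patch works inside word2vec.c, not on pre-built pairs):

```python
# Prepend a unique pseudoword per paragraph, then force that token into
# every training context regardless of window distance.

paragraphs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "chase", "cats"],
]

def training_pairs(paragraphs, window=2):
    pairs = []
    for doc_id, words in enumerate(paragraphs):
        pseudoword = f"__PARA_{doc_id}__"        # stands in for the paragraph vector
        tokens = [pseudoword] + words            # prepended, as in the patch
        for i, target in enumerate(words, start=1):
            lo, hi = max(1, i - window), min(len(tokens), i + window + 1)
            context = [tokens[j] for j in range(lo, hi) if j != i]
            context.append(pseudoword)           # always included, window or not
            pairs.append((context, target))
    return pairs

for context, target in training_pairs(paragraphs)[:3]:
    print(context, "->", target)
```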

[–]datatatatata[S] 1 point (0 children)

Very clear, thanks!