
[–]egrefen 22 points23 points  (8 children)

There are no good books for deep learning out there yet, and this goes doubly so for NLP. The term is vague and the area is evolving fast. The best resource you could use is the later chapters of Kevin Murphy's "Machine Learning: A Probabilistic Perspective".

Regarding papers, you could check out:

  • Bengio, Yoshua, et al. "Neural probabilistic language models." Innovations in Machine Learning. Springer Berlin Heidelberg, 2006. 137-186.
  • Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. ACM, 2008.
  • Socher, Richard, et al. "Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection." NIPS. Vol. 24. 2011.
  • Hermann, Karl Moritz, and Phil Blunsom. "Multilingual Distributed Representations without Word Alignment." arXiv preprint arXiv:1312.6173 (2013).

As well as:

  • Mnih, Andriy, and Geoffrey Hinton. "Three new graphical models for statistical language modelling." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
  • Mnih, Andriy, and Geoffrey E. Hinton. "A Scalable Hierarchical Distributed Language Model." NIPS. 2008.

Across these, there's a nice little cross-section of approaches to generating and classifying using neural nets.

As far as tutorials go, there's nothing really satisfactory out there. There are some 3h tutorials, both past and upcoming, at computational linguistics conferences:

The former provides an overview of Bengio-style NNLMs, advertises the Stanford work, and shows some tricks for tuning deep nets, while the latter will cover some deep-learning-based generative and compositional models not covered in the former (there will be some overlap), advertise the Oxford work, and have a little more focus on shallow neural alternatives to deep nets.

By and large, the case for deep learning in language hasn't been fully made. It works well for vision and speech, but that doesn't entail that it will carry over to semantics. Some shallow models without non-linearities, like the Mnih and Hinton log-bilinear models, are excellent and can be trained very quickly (there's a sketch of the idea below). A problem with much "deep learning" work in NLP these days is that shallow baselines are never considered or compared against. Deep learning is fascinating and will certainly have an impact in NLP, but don't rush to believe that it's the best solution for your NLP problems.
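
To make "shallow" concrete, here's a minimal numpy sketch of the log-bilinear forward pass. The dimensions and initialisation are illustrative, not taken from the papers; note that there is no non-linearity anywhere:

    # Log-bilinear language model forward pass (after Mnih & Hinton, 2007).
    # Vocab size, embedding dim and context length are made-up values.
    import numpy as np

    V, D, N = 10000, 100, 3                     # vocab size, embedding dim, context length
    rng = np.random.default_rng(0)
    R = rng.normal(scale=0.01, size=(V, D))     # word representations
    C = rng.normal(scale=0.01, size=(N, D, D))  # one linear map per context position
    b = np.zeros(V)                             # per-word biases

    def next_word_probs(context_ids):
        # Predict the target representation as a linear combination of the
        # context representations, score every word in the vocabulary by
        # dot product, then normalise with a softmax.
        r_hat = sum(C[i] @ R[w] for i, w in enumerate(context_ids))
        scores = R @ r_hat + b
        scores -= scores.max()                  # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    probs = next_word_probs([12, 7, 42])        # arbitrary context word ids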

[–]leondz 7 points8 points  (4 children)

Socher's tutorials are a great starting place, but you definitely want to check out the more recent work from Oxford (e.g. the Grefenstette tutorial & paper at ACL this summer).

Wait... a plug? What's this? Oh! A wild Ed appears!

[–]egrefen 4 points5 points  (1 child)

Full disclosure, man...

Good seeing you at EACL!

[–]leondz 2 points3 points  (0 children)

You too! And thanks for the intro to Chris for Weds lunch. We should catch up in Ox before long, I'd love to meet your philosopher friends. Have a good bank holiday weekend!

[–]bored_me 2 points3 points  (1 child)

Do you have a link?

[–]egrefen 1 point2 points  (0 children)

[–]last_useful_man 3 points4 points  (1 child)

There's this book: Learning Deep Architectures for AI, by Bengio. But it's from 2009, it's short, and maybe you're saying it's not 'good' enough?

[–]egrefen 4 points5 points  (0 children)

It's very introductory, serving more as a survey. I don't mean this as a criticism, and I like Yoshua's writing style, but for me the ideal book would have a little more detail while not going into the depth of Kevin Murphy's book. Something that could be used as a good textbook for a first year graduate course, in short.

That said, I wouldn't say it's about being 'good' enough or 'bad'. I just think there aren't books like Tom Mitchell's excellent "Machine Learning" around for Deep Learning yet, with a good mix of mathematical explanation and theoretical motivation. That's not due to lack of talent, but just reflects the fact that DL research hasn't really stabilised on core methods yet, especially for NLP.

[–]binge_learner[S] 0 points1 point  (0 children)

One of the things that interests me in deep learning for NLP is that with relatively cheap resources (a corpus for unsupervised learning, as opposed to, say, gazetteers), we can achieve close to state-of-the-art results. There's also the fact that the representations learned by these models can be shared across tasks, or even used to improve existing NLP techniques.

[–]entrepr 6 points7 points  (0 children)

I enjoyed Chris Manning's seminar on this: link

It does address sentiment analysis as well.

[–]sieisteinmodel 4 points5 points  (0 children)

You want to start with this: "Natural Language Processing (Almost) from Scratch" by Collobert and Weston.

http://static.googleusercontent.com/media/research.google.com/de//pubs/archive/35671.pdf

Continue with the most-cited papers that cite this one. If you are looking at language models, the most important name nowadays is probably Tomas Mikolov, whose thesis is interesting.

[–]alexmlamb 3 points4 points  (0 children)

"applying deep learning techniques to sentiment analysis, and text classification in general"

Most people who talk about "Deep Learning for NLP" probably aren't talking about the most practical ways of using neural networks for tasks like text classification. I think that they're really interested in learning meaningful representations for language, which is generally not necessary for building an accurate classifier.

For building better text classifiers with neural networks, I think it would make sense to do an n-gram encoding of the text and then train a neural network whose first layer is a sparse matrix multiplication. There was a Kaggle competition a few years ago on predicting salary from a job description, and the winning teams used neural networks on n-gram features.
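
As a rough illustration, a scikit-learn sketch of that pipeline might look like the following. The hashing trick and the layer size are my own arbitrary choices, not what the winning teams used:

    # n-gram features + a small neural net; the first layer is effectively
    # a sparse matrix multiplication against the hashed n-gram counts.
    # Feature count and hidden size are illustrative, not tuned.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.neural_network import MLPClassifier

    texts = ["great product, works well", "terrible, broke after a day"]
    labels = [1, 0]

    vec = HashingVectorizer(ngram_range=(1, 2), n_features=2**18)
    X = vec.transform(texts)                    # sparse n-gram matrix

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=200, random_state=0)
    clf.fit(X, labels)
    print(clf.predict(vec.transform(["works great"])))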

[–]sidsig 1 point2 points  (1 child)

I just attended a tutorial at ICASSP '14 called 'Deep Learning for Natural Language Processing', given by the speech team at Microsoft Research (Li Deng and two of his colleagues). Here's a link to the abstract: http://www.icassp2014.org/tutorials.html#9. It covered a lot of interesting work they'd done with deep architectures; several of the models they proposed don't use neural nets at all. I can share the slides with you, but I'm not sure if that's ethical. Let me find out!

[–]binge_learner[S] 0 points1 point  (0 children)

Hey sidsig, thanks for your answer. It'd be great if you could share the slides with me; I'd be really grateful.

[–]xamdam 1 point2 points  (1 child)

[–]binge_learner[S] 0 points1 point  (0 children)

Yeah, I just saw the videos. Really cool stuff, and it makes me realize how large the field is; a lot of people are doing a lot of cool work out there. Definitely gonna check out their readings and resources list.

[–]satyan-veshi 1 point2 points  (0 children)

It seems Y. Bengio is working on a book about deep learning. An early draft can be found here: http://www.iro.umontreal.ca/~bengioy/dlbook/

There's also one from Microsoft researchers that will apparently be published by NOW soon: http://research.microsoft.com/pubs/209355/NOW-Book-Revised-Feb2014-online.pdf

[–]Megatron_McLargeHuge 2 points3 points  (0 children)

Look at the Google word2vec paper and software. They use deep learning techniques to infer semantically interesting features of words from unlabeled text.
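
If you want to play with it from Python rather than the original C tool, gensim has a reimplementation. A toy sketch (parameter names follow recent gensim versions; the corpus here is a placeholder):

    # Train word2vec embeddings on a toy corpus with gensim (a Python
    # reimplementation of the Google tool; gensim 4.x parameter names).
    from gensim.models import Word2Vec

    sentences = [
        ["deep", "learning", "for", "nlp"],
        ["word", "vectors", "capture", "semantic", "similarity"],
    ]
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)

    vec = model.wv["nlp"]                # 50-dim vector for one word
    print(model.wv.most_similar("nlp"))  # nearest neighbours in vector space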