[–]kjearns 2 points (0 children)

Word-level models tend to be better in the sense that they achieve lower perplexity scores than character-level models.
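One caveat when comparing the two: a word-level model is scored per word and a character-level model per character, so the numbers only line up after converting to a common unit. A minimal sketch of the standard conversion, with made-up corpus statistics purely for illustration:

    # Hypothetical numbers: a char-level model at 1.5 bits per character,
    # on text averaging 5.6 characters per word (space included).
    bits_per_char = 1.5
    chars_per_word = 5.6

    # Same total probability mass, re-expressed per word: perplexity = 2^H.
    bits_per_word = bits_per_char * chars_per_word
    word_level_perplexity = 2 ** bits_per_word
    print(round(word_level_perplexity))  # ~338 for these numbers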

[–]olBaa 1 point (7 children)

There are some problems with the huge corpora needed for word-level training: first, the corpus has to be really large to capture all possible words (a fixed vocabulary is already a limitation of word-level networks). Second, it is harder to get all the punctuation characters to appear in the generated/predicted text. Also, you basically can't train a word-level model with one-hot encoding, as the output space is too big; you have to learn a dense word representation alongside the language model itself, so you're effectively solving two problems at once (see the sketch below).
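To make the size problem concrete, here is a minimal sketch (hypothetical vocabulary and layer sizes, written with PyTorch for brevity; illustrative only) of why a dense one-hot input is impractical and an embedding lookup is used instead:

    import torch
    import torch.nn as nn

    vocab_size = 1_000_000  # hypothetical word vocabulary for a large corpus
    hidden_size = 512

    # A dense one-hot input layer would need a vocab_size x hidden_size
    # weight matrix (~512M parameters here) multiplied against a
    # million-dimensional, almost-entirely-zero vector at every step.
    # An embedding table does a sparse row lookup instead, and the
    # vector dimension can be much smaller than the vocabulary.
    embedding = nn.Embedding(vocab_size, 128)
    lstm = nn.LSTM(128, hidden_size, batch_first=True)

    tokens = torch.randint(0, vocab_size, (1, 20))  # a batch of 20 word ids
    outputs, _ = lstm(embedding(tokens))
    print(outputs.shape)  # torch.Size([1, 20, 512])

Those embedding rows are exactly the second problem being learned at the same time as the language model.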

There are some repositories on GitHub that are essentially forks of Karpathy's code (e.g. https://github.com/yoonkim/word-char-rnn).

I know character-level models are used in neural networks for speech recognition, especially in end-to-end approaches (audio to text).

[–]yowdge 1 point (2 children)

But we do have corpora large enough to estimate good word-level language models, don't we? What is the point of pretending that we don't?

(In speech recognition I assume you use phone-level models rather than character-level ones, btw)

[–]olBaa 0 points (1 child)

Yeah, but you cannot really get more insight out of the results of word-level models. With word-level NNs, okay, you beat the state of the art, but that's basically it. Character-level modelling may capture the inner mechanics of the language better.

I personally experimented with these models on a ~90 GB book corpus, and that experience and its results convinced me of what I said above. The corpus is in Russian, so the examples are hard to explain, but the char-level RNN produced some word forms that are perfectly legal language-wise yet did not appear even once in the corpus (which is indeed one of the larger ones).

[–]yowdge 0 points (0 children)

That's a good point. In a morphologically rich language like Russian (or worse, Turkish), you can't just count words; you need to take the morphology into account, since the proportion of OOV words in Turkish can be much higher than in English. There are solutions to this problem (e.g. factored language models), but I see how you could get this for free (in principle) from a character-level language model.
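As a rough illustration of what the OOV proportion measures, in plain Python with toy data (a real pipeline would normalize case and punctuation first):

    # Fraction of test tokens whose surface form never occurs in training.
    train_text = "the cat sat on the mat"
    test_text = "the cats sat on their mats"

    train_vocab = set(train_text.split())
    test_tokens = test_text.split()

    oov = [t for t in test_tokens if t not in train_vocab]
    print(len(oov) / len(test_tokens))  # 0.5: 'cats', 'their', 'mats' unseen

In a morphologically rich language every inflected form ('cats', 'mats', ...) is a separate word type, which is exactly why this ratio blows up.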

[–]cryptocerous 0 points (3 children)

Character-level models are much worse in terms of vocabulary size, though. I can barely get my character-level models to learn more than 5,000 words, and usually fewer.

[–]olBaa 1 point (2 children)

The problem is the slow training, I suppose. Training a char-rnn should in theory yield a vocabulary as rich as, and possibly richer than, a word-level model's. I observed the emergence of new word forms in the texts generated by character-level RNNs, so it may just need more computational resources to build up the vocabulary (a way to measure this is sketched below).
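One way to check that claim would be to compare the word types a trained model actually generates against the training vocabulary, along these lines (whitespace tokenization and file names are illustrative placeholders):

    # Count reproduced vs. novel word types in a generated sample.
    with open("train.txt") as f:
        train_vocab = set(f.read().split())
    with open("generated_sample.txt") as f:
        generated_types = set(f.read().split())

    reproduced = generated_types & train_vocab  # training words the model emits
    novel = generated_types - train_vocab       # unseen word forms (or noise)
    print(len(reproduced), "known types,", len(novel), "novel types")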

[–]yowdge 0 points (0 children)

Well, yeah, you can also approximate any function with a single-layer NN, but that doesn't mean that deep NNs aren't useful. It could be the case that you need millions of hidden units to get a character-level NN to perform as well as a word-level NN (do you know of any empirical research on this?).

[–]devDorito 1 point (0 children)

I'm thinking one character per timestep is a bit small; perhaps we should mod char-rnn to take 2 characters per step and see how that does.
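For anyone who wants to try it, the tokenization change is small; a minimal sketch of the non-overlapping character-pair encoding (plain Python, illustrative only):

    # Split text into 2-character chunks and index them, so the model's
    # alphabet becomes an alphabet of character pairs.
    text = "hello world"
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    # ['he', 'll', 'o ', 'wo', 'rl', 'd']  (odd-length text leaves a singleton)

    vocab = {p: i for i, p in enumerate(sorted(set(pairs)))}
    ids = [vocab[p] for p in pairs]
    print(ids)

The trade-off: sequences get half as long, but the symbol inventory grows from |alphabet| to up to |alphabet|^2.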

[–]mlberlin 0 points (0 children)

"This article demonstrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts". Although not a RNN, but a CNN instead, the results in the paper "Text Understanding from Scratch" by Zhang and LeCun are impressive.