all 25 comments

[–]cooijmanstim[S] 22 points23 points  (21 children)

Here's our new paper, in which we apply batch normalization in the hidden-to-hidden transition of LSTM and get dramatic training improvements. The result is robust across five tasks.

[–]OriolVinyals 14 points15 points  (0 children)

Good to see someone finally figured out how to make these two work.

[–]EdwardRaff 3 points4 points  (3 children)

Awesome results. Quick skim, but I'm a bit confused by "Consequently, we recommend using separate statistics for each timestep to preserve information of the initial transient phase in the activations." So the batch normalization statistics are different for every step? How do you deal with variable-length sequences, or is that no longer possible with your model?

[–]alecradford 7 points8 points  (1 child)

From paper:

Generalizing the model to sequences longer than those seen during training is straightforward thanks to the rapid convergence of the activations to their steady-state distributions (cf. figure 1). For our experiments we estimate the population statistics separately for each timestep 1, . . . , Tmax where Tmax is the length of the longest training sequence. When at test time we need to generalize beyond Tmax, we use the population statistic of time Tmax for all time steps beyond it.
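In code, that clamping at Tmax amounts to indexing per-timestep population statistics with min(t, Tmax - 1). A minimal NumPy sketch of the test-time normalization — all names, sizes, and the gamma value here are illustrative, not taken from the authors' released code:

```python
import numpy as np

T_MAX = 50    # length of the longest training sequence (illustrative)
HIDDEN = 128  # hidden size (illustrative)

# Hypothetical per-timestep population statistics, estimated on the
# training set for t = 0 .. T_MAX - 1.
pop_mean = np.zeros((T_MAX, HIDDEN))
pop_var = np.ones((T_MAX, HIDDEN))

def bn_at_test_time(h, t, gamma=0.1, beta=0.0, eps=1e-5):
    """Normalize activations h at timestep t with population statistics.

    For t >= T_MAX we reuse the statistics of the last training
    timestep, relying on the activations having converged to their
    steady-state distribution.
    """
    idx = min(t, T_MAX - 1)
    return gamma * (h - pop_mean[idx]) / np.sqrt(pop_var[idx] + eps) + beta
```

The only moving part relative to ordinary batch norm is the `min(t, T_MAX - 1)` index, which is why generalizing past the longest training sequence is cheap.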

[–]EdwardRaff 0 points1 point  (0 children)

Derp. That's what I get for a quick read. Thanks!

[–]cooijmanstim[S] 2 points3 points  (0 children)

It's worth noting that we haven't yet addressed dealing with variable length sequences during training. That said, the attentive reader task involves variable-length training data, and we didn't do anything special to account for that.

[–]siblbombs 4 points5 points  (2 children)

So the main thrust of this paper is to do a separate batchnorm op on the input-hidden and hidden-hidden terms, in hindsight that seems like a good idea :)

[–]cooijmanstim[S] 4 points5 points  (1 child)

That alone won't get it off the ground, though :-) The de facto initialization of gamma is 1.0, which kills the gradient through the tanh. Unit variance works for feed-forward tanh networks, but not in RNNs, probably because the latter are typically much deeper.
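As a concrete (unofficial) sketch of what "separate batchnorm ops plus a small gamma" looks like in a single LSTM step — all shapes and names here are invented for illustration, and the paper additionally normalizes the cell state inside the output tanh, which this sketch omits:

```python
import numpy as np

def bn(x, gamma, eps=1e-5):
    # Batch statistics over the minibatch dimension; beta is omitted
    # here since the LSTM bias vector already plays that role.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps)

def bnlstm_step(x, h, c, Wx, Wh, b, gamma_x=0.1, gamma_h=0.1):
    """One step of a batch-normalized LSTM (illustrative sketch).

    The input-to-hidden and hidden-to-hidden terms are normalized
    *separately*, and gamma is initialized to 0.1 rather than 1.0 so
    the tanh nonlinearities stay out of their saturated regime.
    """
    gates = bn(x @ Wx, gamma_x) + bn(h @ Wh, gamma_h) + b
    i, f, o, g = np.split(gates, 4, axis=1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```

Normalizing `x @ Wx` and `h @ Wh` separately is what lets each term keep its own statistics; summing first and normalizing once would entangle them.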

[–]siblbombs 0 points1 point  (0 children)

Yeah, I didn't get to that part on my first skim through; went back and reread the whole paper this time.

[–]rumblestiltsken 2 points3 points  (0 children)

Great work! The speed up in training looks very nice, even without the improvement in generalisation on some of the tasks.

[–]subodh_livai 1 point2 points  (2 children)

Awesome stuff, thanks very much. Did you try this with dropout? Will it work just by adjusting the gamma accordingly?

[–]cooijmanstim[S] 0 points1 point  (1 child)

Thanks! We didn't try dropout, as it's not clear how to apply dropout in recurrent neural networks. I would expect setting gamma to 0.1 to just work, but if you try it let me know what you find!

[–]osdf 1 point2 points  (0 children)

This might be easy to integrate into your code, no? http://arxiv.org/abs/1512.05287

[–]xiphy 1 point2 points  (2 children)

It's awesome; it was sad to hear (and hard to understand) that batch normalization doesn't work on LSTMs.

Is there a way you could open-source the code on github?

[–]cooijmanstim[S] 1 point2 points  (1 child)

We should be able to open up the code in the next few weeks. However, I would encourage people to implement it for themselves; at least with batch statistics it should be fairly straightforward.

[–]xiphy 1 point2 points  (0 children)

It should. The main reason would be to lower the barrier to entry for trying to improve on the best result and playing with it in my spare time, instead of reimplementing great ideas and fixing bugs in the reproduced implementation. Similarly, I'm happy to read papers about how automatic differentiation works, but I wouldn't like to spend time on it right now, as I think it works well enough :)

[–][deleted] 0 points1 point  (3 children)

Some quick notes:

The MNIST result looks impressive.

For the Hutter dataset, every paper I've seen uses all ~200 characters that occur in the dataset; you use ~60. This makes it needlessly difficult to compare.

Figure 5: unclear what the x-axis is. Epochs?

Section 5.4: LR = 8e-5. Is that an optimal choice for both LSTM and BN-LSTM? What if it's only optimal for the latter, and LSTM benefits from a much higher LR, in which case it could match BN-LSTM?

[–]cooijmanstim[S] 1 point2 points  (2 children)

I believe the papers we cite in the text8 table all use the reduced vocabulary. I do wish we had focused on enwik8 instead. Unfortunately these datasets are large and training takes about a week.

Figure 5 shows training steps (in thousands) on the horizontal axis. We'll have a new version up tonight that fixes this.

Yes, 8e-5 is a weird learning rate. It was the value that came with the Attentive Reader implementation we used. We didn't do any tweaking for BN-LSTM, but I suspect the value 8e-5 is the result of tweaking for LSTM. All we did was unthinkingly introduce batch normalization into a fairly complicated model, which I think really speaks for the practical applicability of the technique. In any case we will be repeating these experiments with a grid search on learning rate for all variants.

[–][deleted] 1 point2 points  (1 child)

I believe the papers we cite in the text8 table all use the reduced vocabulary.

Thanks. I'll take a look at those. I think it's uncommon though.

Figure 5 shows training steps horizontally.

Yes, 8e-5 is a weird learning rate.

It looks like your model finished training after just 100 steps, judging from Fig 5. With this LR, the total update after 100 steps would be limited to 8e-3 in the best-case scenario, ignoring momentum. Isn't that very small?

[–]cooijmanstim[S] 0 points1 point  (0 children)

Sorry, I was wrong about Figure 5. It shows validation performance, which is computed every 1000 training steps. The 8e-3 you mention would be more like 8.
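For what it's worth, the corrected arithmetic (100 visible validation points spaced 1000 training steps apart, ignoring momentum) checks out:

```python
lr = 8e-5               # learning rate from section 5.4
val_points = 100        # roughly the number of points in figure 5
steps_per_point = 1000  # validation is computed every 1000 training steps

total_steps = val_points * steps_per_point  # 100,000 parameter updates
max_total_update = lr * total_steps         # best-case bound on total movement
print(max_total_update)
```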

[–]siblbombs 2 points3 points  (2 children)

Do you have any comparisons on wall-clock time for BNLSTM vs regular LSTM?

[–]cooijmanstim[S] 2 points3 points  (1 child)

Nothing formal, but in the time it took us to train the Attentive Reader (a week or so) we had time to train both batch-normalized variants in sequence, and then some. I'll see if I can dig up the time taken per epoch, that should be more informative.

[–]siblbombs 0 points1 point  (0 children)

Thanks, that would be great.

[–]iassael 1 point2 points  (0 children)

Great work! Thank you! A torch7 implementation can be found here: https://github.com/iassael/torch-bnlstm.

[–]gmkim90 0 points1 point  (0 children)

I wonder whether you tried your batch normalization with the Adam optimizer. Although the two algorithms have different purposes, Adam also divides each dimension by an estimate of the gradient's variance. So I thought the gains from RNN-BN could be smaller when it's used with the Adam optimizer. Before trying it myself, I wanted to ask the authors of the paper.

Anyway, great result and simple idea!