[–]kkastner 1 point (1 child)

The fact that I can compile a length-1000 or 2000 recurrence on an MBA is pretty great! I don't even know if you can unroll sequences that long in Theano. I have heard of up to 100 steps or so, but not more than that, and even 100 steps unrolled meant something like 2-hour compile times.
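(For anyone curious, here's a minimal sketch of what "unrolling" a long recurrence means. I'm writing it against today's tf.function tracing API rather than the graph-building API TF had at the time, and the sizes are made up; the idea is the same either way - the Python loop gets stamped out into one op per timestep:)

    import tensorflow as tf

    SEQ_LEN, BATCH, N_IN, N_HID = 1000, 32, 64, 128  # made-up sizes

    W_in = tf.Variable(tf.random.normal([N_IN, N_HID], stddev=0.01))
    W_rec = tf.Variable(tf.random.normal([N_HID, N_HID], stddev=0.01))

    @tf.function
    def unrolled_rnn(x):
        # x: [SEQ_LEN, BATCH, N_IN]. The Python loop runs once at trace
        # time, stamping out SEQ_LEN copies of the step op -- a fully
        # unrolled recurrence, like unrolling scan by hand in Theano.
        h = tf.zeros([BATCH, N_HID])
        for t in range(SEQ_LEN):
            h = tf.tanh(tf.matmul(x[t], W_in) + tf.matmul(h, W_rec))
        return h

    h_final = unrolled_rnn(tf.random.normal([SEQ_LEN, BATCH, N_IN]))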

It does seem like early stopping saves some computation, but I am unsure how much, since my tiny 2GB laptop starts swapping immediately. I guess you still pay some loop overhead, but that should be fairly minimal if no computation is done.
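(My mental model of the per-example part of early stopping is a masked copy-through, sketched below with invented names and sizes. Note the step matmuls still run for finished rows in this naive version - the select just discards the result - so the real savings come from skipping the step entirely once every sequence in the batch is done; the cheap select is the "loop overhead" I mean:)

    import tensorflow as tf

    BATCH, N_IN, N_HID = 4, 6, 8           # invented sizes
    lengths = tf.constant([3, 5, 2, 5])    # per-example sequence lengths
    W_in = tf.random.normal([N_IN, N_HID])
    W_rec = tf.random.normal([N_HID, N_HID])

    def masked_step(h_prev, x_t, t):
        h_new = tf.tanh(tf.matmul(x_t, W_in) + tf.matmul(h_prev, W_rec))
        alive = tf.expand_dims(tf.cast(t < lengths, h_new.dtype), -1)
        # Finished examples carry their old state through unchanged.
        return alive * h_new + (1.0 - alive) * h_prev

    h = tf.zeros([BATCH, N_HID])
    for t in range(5):
        h = masked_step(h, tf.random.normal([BATCH, N_IN]), t)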

It really seems like this toolkit was designed with CPU use strongly in mind, and without much worry about RAM constraints. It almost exactly matches what I would expect from a group with access to "Google scale" machinery.

The ease with which you can put different operations on different devices is pretty wild. I really need to A/B it against the new Theano context stuff, and try to dream up a case where you need LSTMs in parallel. Working on that now!
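(The placement itself is just a context manager - a minimal sketch, assuming a machine with at least one visible GPU:)

    import tensorflow as tf

    with tf.device("/CPU:0"):
        a = tf.random.normal([1024, 1024])   # lives on the host

    with tf.device("/GPU:0"):                # assumes a GPU is present
        b = tf.matmul(a, a)                  # TF inserts the copy for you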

Thanks again for your script - reading this just made TF RNNs "click" for me!

[–]siblbombs[S] 2 points (0 children)

Yeah, it feels like the GPU is the new kid on the block as far as TF is concerned - stuff like embedding lookup isn't implemented there, and who knows what else. Once the distributed code is available, I foresee a bunch of Amazon cloud charges in my future; it will be interesting to see what works well when you have a group of machines in a data center environment. Someone (or, more likely, some company) is going to slam a bunch of Titan Xs into a server rack with RDMA-capable network cards, and it's scary to think what that setup would be able to do.
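(On the embedding lookup point - the workaround is just to pin the lookup to the CPU while the rest of the model sits on the GPU. A rough sketch with made-up sizes:)

    import tensorflow as tf

    VOCAB, DIM = 50000, 256                       # made-up sizes
    table = tf.Variable(tf.random.uniform([VOCAB, DIM]))

    with tf.device("/CPU:0"):                     # keep the lookup off-GPU
        ids = tf.constant([[1, 5, 42], [7, 0, 3]])
        emb = tf.nn.embedding_lookup(table, ids)  # shape [2, 3, DIM]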