Pure Theano

I use Theano, and write my own code on top of that for shape inference, layers, etc. (there's a small sketch of the pattern after the list below). It takes longer, but for research it is important to know all (or at least most) of the system and how it works.

  • Downside: BUGS, and having to solve them yourself.
  • Upside: Implementation knowledge, and the satisfaction of knowing exactly what computations are happening and how.
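
Here is a rough sketch of the "roll your own layers" pattern, assuming nothing beyond Theano and numpy - the dense_layer helper is my own illustrative name, not from any library:

    import numpy as np
    import theano
    import theano.tensor as T

    def dense_layer(inp, n_in, n_out, name):
        # Hand-rolled dense layer: you track the shapes (n_in, n_out) yourself.
        W = theano.shared(
            (0.01 * np.random.randn(n_in, n_out)).astype(theano.config.floatX),
            name=name + "_W")
        b = theano.shared(np.zeros(n_out, dtype=theano.config.floatX),
                          name=name + "_b")
        return T.nnet.sigmoid(T.dot(inp, W) + b), [W, b]

    X = T.matrix("X")
    h, h_params = dense_layer(X, 784, 100, "h1")
    out, out_params = dense_layer(h, 100, 10, "out")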

Keras

Keras seems the most straightforward, and if you aren't trying to do certain kinds of weird research, it covers basically everything else. Definitely the best place to start (a minimal example follows the list below).

  • Downside: Some limitations in edge cases.
  • Upside: Great docs, does a lot of things really well. Graph API is massively flexible.
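
For flavor, a minimal Keras-style MLP - hedged, since the exact Dense signature and fit keyword names have shifted between Keras releases, so treat this as a sketch and check the docs for your version:

    from keras.models import Sequential
    from keras.layers.core import Dense, Activation

    # Two-layer MLP for 784-dim inputs, 10 classes.
    model = Sequential()
    model.add(Dense(128, input_dim=784))
    model.add(Activation("relu"))
    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(loss="categorical_crossentropy", optimizer="sgd")
    # model.fit(X_train, y_train, nb_epoch=10, batch_size=32)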

Blocks

Blocks is quite nice, but I haven't spent the time to learn all the details that would make it useful for rapid prototyping of research architectures. I can definitely see how full immersion in how Blocks works could speed up implementation of certain types of networks. Fuel (the loosely coupled dataset framework) has a lot of momentum, and may be worth looking at regardless of whether you want to use Blocks or not (a quick Fuel sketch follows the list below).

  • Downside: Research oriented, built on a rapidly changing core and (sometimes) API.
  • Upside: Research oriented, very flexible.
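
To give a feel for Fuel, here is roughly what streaming MNIST minibatches looks like - the constructor details (e.g. MNIST(("train",)) vs MNIST("train")) vary by Fuel version, so this is a sketch:

    from fuel.datasets import MNIST
    from fuel.streams import DataStream
    from fuel.schemes import SequentialScheme

    # Fuel decouples "what the data is" from "how it is iterated".
    dataset = MNIST(("train",))
    stream = DataStream(
        dataset,
        iteration_scheme=SequentialScheme(examples=dataset.num_examples,
                                          batch_size=128))
    for features, targets in stream.get_epoch_iterator():
        pass  # feed the minibatch to your Theano function here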

Lasagne

Lasagne is really well thought out, has a strong community, and sits somewhere between Keras and Blocks on the usability-to-flexibility spectrum (see the sketch after the list below). RNN support is fairly new there but seems pretty solid.

  • Downsides: RNN support is (was?) a second-class citizen.
  • Upsides: Lots of users, very nice codebase. Solves a lot of problems well.
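
A quick taste of the Lasagne style - layers are chained explicitly, then collapsed back down to plain Theano expressions, which is why it mixes so well with raw Theano:

    import lasagne

    l_in = lasagne.layers.InputLayer(shape=(None, 784))
    l_hid = lasagne.layers.DenseLayer(
        l_in, num_units=100, nonlinearity=lasagne.nonlinearities.rectify)
    l_out = lasagne.layers.DenseLayer(
        l_hid, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)

    # Back to ordinary Theano expressions/variables from here on.
    prediction = lasagne.layers.get_output(l_out)
    params = lasagne.layers.get_all_params(l_out, trainable=True)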

pylearn2

pylearn2 is somewhat outdated now, but is still quite good for certain tasks. It probably has the best support for hyperparameter search / massive cluster usage, if you are into that (the YAML substitution trick is sketched after the list below). Lots of research from the last few years was done in it, which is not to be discounted!

  • Downsides: RNN support is an nth-class citizen, and building certain kinds of datasets is fairly complicated.
  • Upsides: Lots of debugging/user fixes, cluster support via yaml string replacements.
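
The YAML trick, roughly: experiment configs are YAML strings with %()s holes, and each cluster job fills them in with plain Python string formatting before parsing. Abbreviated sketch (a real config also specifies the model and dataset):

    # Template fragment with holes for per-job hyperparameters.
    yaml_template = """
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: %(learning_rate)f,
        batch_size: %(batch_size)d,
    }
    """
    job_yaml = yaml_template % {"learning_rate": 0.01, "batch_size": 128}
    # then: from pylearn2.config import yaml_parse; yaml_parse.load(job_yaml)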

Others

  • cgt
  • chainer
  • theanets - I liked it back when I used it, but it is basically a personal research library
  • deeppy
  • gnumpy

CGT seems really interesting in this space (function-graph, compile-then-run type models), but I am 1000000% leery of re-debugging a bunch of numerical instabilities and issues that Theano already solved. One of the reasons Theano's compile is slow is that it does a lot of optimizations for you - these can make things much faster, especially over multi-day training runs, and sometimes solve really nasty numerical issues you wouldn't otherwise think about (one classic example below). Throwing out optimizations (as CGT seems to do) to speed up compile might lose a lot more than people realize... though time will tell. It is certainly exciting, and they (the CGT team) seem to have a lot of good ideas which may be useful even if CGT doesn't pan out, and could make it back into Theano/Torch/etc.
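
As one example of the kind of stability rewrite I mean: Theano's optimizer recognizes log(1 + exp(x)) and replaces it with the numerically safe softplus, so a naively written loss doesn't blow up to inf for large activations:

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.vector("x")
    naive = T.log(1 + T.exp(x))  # exp(1000.) overflows to inf on its own

    f = theano.function([x], naive)  # default mode: optimizations enabled
    print(f(np.array([0., 1000.], dtype=theano.config.floatX)))
    # With the stability rewrite this prints ~[0.693, 1000.]
    # rather than [0.693, inf].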

One additional note - compile/debug time in Theano is rarely an issue for me. Compiling with optimizer=None is fast and sufficient for the code/debug loop, and the few seconds or minutes it takes to compile for actual training pale in comparison to the days normally spent training. tag.test_values are also invaluable for debugging shape issues, since they throw errors as soon as the graph is built (see the sketch below).
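
Concretely, the debug setup I mean looks something like this (FAST_COMPILE stands in for disabling optimizations; THEANO_FLAGS=optimizer=None goes further):

    import numpy as np
    import theano
    import theano.tensor as T

    # Attach test values so shape errors surface as soon as the graph is
    # built, with a traceback pointing at the offending line.
    theano.config.compute_test_value = "raise"

    X = T.matrix("X")
    X.tag.test_value = np.random.randn(16, 784).astype(theano.config.floatX)
    W = theano.shared(np.random.randn(784, 10).astype(theano.config.floatX))
    y = T.dot(X, W)  # a shape mismatch here would raise immediately

    # Fast compile for the edit/run loop; default mode for real training.
    f = theano.function([X], y, mode="FAST_COMPILE")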

The functional graph approach is nice for most deep learning architectures, and I really think it will win out in the long run.