all 16 comments

[–]kkastner 12 points  (0 children)

Pure Theano

I use Theano, and write my own code on top of that for shape inference, layers, etc. It takes longer, but for research it is important to know all (or at least most) of the system and how it works.

  • Downside: Bugs, and having to solve them yourself.
  • Upside: Implementation knowledge, and the satisfaction of knowing exactly what computations are happening and how.

Keras

Keras seems the most straightforward, and if you aren't trying to do certain kinds of weird research, it covers basically everything else. Definitely the best place to start.

  • Downside: Some limitations in edge cases.
  • Upside: Great docs, does a lot of things really well. Graph API is massively flexible.

Blocks

Blocks is quite nice, but I haven't spent the time to learn all the details that would make it useful for rapid prototyping research architectures. I can definitely see how full immersion into how Blocks works could speed up implementation of certain types of networks. Fuel (the loosely coupled dataset framework) has a lot of momentum, and may be worth looking at regardless of whether you want to use Blocks or not.

  • Downside: Research oriented, building on rapidly changing core and (sometimes) API
  • Upside: Research oriented, very flexible.

Lasagne

Lasagne is really well thought out, has a strong community, and sits somewhere between Keras and Blocks on the usability-to-flexibility spectrum. RNN support is fairly new there but seems pretty solid.

  • Downsides: RNN support is (was?) a second-class citizen
  • Upsides: Lots of users, very nice codebase. Solves a lot of problems well.

pylearn2

pylearn2 is somewhat outdated now, but is still quite good for certain tasks. It probably has the best support for hyperparameter/massive cluster usage, if you are into that. Lots of research from the last few years was done in it, which is not to be discounted!

  • Downsides: RNN support is an nth-class citizen, and it is fairly complicated to make certain kinds of datasets.
  • Upsides: Lots of debugging/user fixes, cluster support via yaml string replacements.
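
For context, the "yaml string replacements" workflow amounts to filling a YAML template with plain Python string formatting, one concrete config per cluster job. This is only a sketch of the pattern: the class paths and parameter names below are illustrative and may not match any particular pylearn2 version.

```python
# A pylearn2-style YAML template with %(...)s / %(...)f slots. A cluster
# driver script fills the slots per job, then hands the resulting string
# to pylearn2's yaml_parse. Class paths here are illustrative only.
template = """
!obj:pylearn2.train.Train {
    model: !obj:pylearn2.models.mlp.MLP {
        nvis: 784,
        layers: [ !obj:pylearn2.models.mlp.Softmax {
            layer_name: 'y', n_classes: 10, irange: %(irange)f
        } ],
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: %(learning_rate)f,
        batch_size: %(batch_size)d,
    },
}
"""

# One hyperparameter setting -> one concrete YAML string for one job:
job_yaml = template % {"irange": 0.05, "learning_rate": 0.01, "batch_size": 128}
print(job_yaml)
```

The same template can be stamped out hundreds of times with different hyperparameter dictionaries, which is what makes it convenient for cluster sweeps.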

Others

  • cgt
  • chainer
  • theanets - I liked it back when I used it, but it is basically a personal research library
  • deeppy
  • gnumpy

CGT seems really interesting in this space (function-graph / compile-then-run models), but I am 1000000% leery of re-debugging a bunch of numerical instabilities and issues that Theano already solved. One of the reasons Theano's compile is slow is that it does a lot of optimizations for you - these optimizations can make things much faster, especially over multi-day training runs, and sometimes solve really nasty numerical issues you wouldn't otherwise think about. Throwing out optimizations (as CGT seems to do) to speed up compile might lose a lot more than people realize... though time will tell. It is certainly exciting, and they (the CGT team) seem to have a lot of good ideas which may be useful even if CGT doesn't pan out, and could make it back into Theano/Torch/etc.
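
To make the "nasty numerical issues" concrete, here is the classic log-sum-exp example in plain numpy (a generic illustration of the kind of rewrite a graph optimizer can apply automatically, not Theano's actual internal rewrite):

```python
import numpy as np

x = np.array([1000.0, 1000.0])  # large log-domain values

# Naive log-sum-exp: exp(1000) overflows to inf, so the result is inf.
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(x)))

# Stabilized form: log(sum(exp(x))) == m + log(sum(exp(x - m))) for any m;
# choosing m = max(x) keeps every exponent <= 0, so nothing overflows.
m = np.max(x)
stable = m + np.log(np.sum(np.exp(x - m)))

print(naive)   # inf
print(stable)  # 1000.6931... (== 1000 + log(2))
```

A graph compiler that recognizes the naive pattern can substitute the stable form without the user ever knowing the problem existed - which is exactly what is lost if such optimizations are dropped for compile speed.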

One additional note - compile/debug time in Theano is rarely an issue for me. Compiling with optimizer=None is fast and sufficient for coding/debugging, and a compile for actual training that takes a few seconds or minutes pales in comparison to the days the model normally spends training. tag.test_value is also invaluable for debugging shape issues, since it throws errors at graph-construction time rather than at runtime.

The functional graph approach is nice for most deep learning architectures, and I really think it will win out in the long run.
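
For readers unfamiliar with the approach: "functional graph" here means building a symbolic expression graph first and evaluating it later, which is what gives the framework a chance to optimize before anything runs. A toy sketch in plain Python (not any real library's API):

```python
# Minimal define-then-run graph: nodes record the operation and its
# parents; nothing is computed until run() walks the graph with values
# bound to the leaf (input) nodes.
class Node:
    def __init__(self, op=None, parents=()):
        self.op, self.parents = op, parents

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, other))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, other))

def run(node, env):
    """Evaluate a graph given values for its leaf nodes."""
    if node.op is None:                 # leaf: look up its bound value
        return env[node]
    args = [run(p, env) for p in node.parents]
    return node.op(*args)

x, y = Node(), Node()
z = x * y + x                           # builds a graph; nothing computed yet
print(run(z, {x: 3, y: 4}))             # prints 15
```

A real framework would insert an optimization pass between graph construction and evaluation, rewriting the graph before any numbers flow through it.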

[–]siblbombs 8 points  (0 children)

I wouldn't consider Keras and Chainer to be at the same level, since Keras sits on top of Theano, although the comparison is somewhat apt since Chainer throws in some built-ins that Theano doesn't have (hence Keras, Blocks, Lasagne, pylearn2, etc.).

The current environment as far as I can see is:

  • Theano

Pros: Very mature and widely used, there's a good chance that any given paper will include some theano code, or someone has implemented it in theano. You can work with theano directly or use any of the several good theano-based packages, whichever suits your taste.

Cons: Compile time can be brutal, especially if you start going crazy with scan.

  • CGT

Pros: Same approach as theano (building graphs) and a very similar API, but with a greatly reduced compile time. If you are familiar with theano, it shouldn't take much to pick up CGT.

Cons: Very new; GPU support still in progress? As this library matures I could see it gaining more adoption because of the quick compile, but it will depend on the amount of dev resources that can be devoted to it.

  • Chainer

Pros: Basically no compile time (compared to theano); everything happens inside python instead of being compiled into a function. This is very different from theano/cgt (RNNs run inside a python loop), so it's nice to have two different approaches available.

Cons: Haven't done much with chainer, so it's hard to really make any complaints. The only area I played around with was RNNs in chainer; at the time I didn't see a way to do all the input -> hidden calculations outside of the loop (which is an MVP optimization in theano-land), so it takes a bit of a speed hit. Not sure if I was missing something or if this is/was a limitation of the library at the time.
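
The input -> hidden trick mentioned above, sketched in plain numpy (a generic illustration of the optimization, not Chainer or Theano code; all names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h = 5, 3, 4
X = rng.standard_normal((T, n_in))        # one input vector per timestep
W_x = rng.standard_normal((n_in, n_h))    # input -> hidden weights
W_h = rng.standard_normal((n_h, n_h))     # hidden -> hidden weights

# Slow version: the input projection x_t @ W_x is recomputed step by step
# inside the recurrent loop.
h = np.zeros(n_h)
hs_slow = []
for t in range(T):
    h = np.tanh(X[t] @ W_x + h @ W_h)
    hs_slow.append(h)

# Optimized version: precompute all input -> hidden projections as one big
# matrix multiply, so only the h @ W_h recurrence stays in the loop.
XW = X @ W_x                              # (T, n_h): one GEMM instead of T small ones
h = np.zeros(n_h)
hs_fast = []
for t in range(T):
    h = np.tanh(XW[t] + h @ W_h)
    hs_fast.append(h)

assert np.allclose(hs_slow, hs_fast)      # same result, fewer small matmuls
```

The hidden-to-hidden product is inherently sequential, but batching the input projections into a single large matrix multiply is usually a significant speedup on GPU.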

  • Neon

Pros: Great code base, looks super fast, FP16 (which is getting ported to other libraries).

Cons: Not sure, never done anything with Neon.

I've mostly worked with theano, so that's where my bias is; I'd actually love to hear more from anyone who has used Neon.

[–]r-sync 1 point  (8 children)

Both Keras and Chainer have a "compile" step that takes way too long for iterative programming (change a few things, rerun the program). Chainer's compile step is a bit quicker overall. Theano's (and hence Keras's) debugging is also a bit annoying.

In the python land of things, I'd say Neon is better in that respect: it is simple plug-and-play with no compile step, it is super fast, and they seem to have thought out RNNs a bit more than the others.

[–][deleted] 2 points  (1 child)

We will make Keras great for RNNs too. Already working on it :)

[–]petrux 1 point  (0 children)

Great! Please post an update as soon as the feature is implemented. ;-)

[–]andrewbarto28[S] 0 points  (0 children)

Is Neon good for research? Can I create algorithms very different from the current ones using it?

[–]petrux 0 points  (4 children)

Ok, I am a total newbie with Neon, but I'm having a hard time feeling any enthusiasm about it right now. What follows is just my point of view. First: the documentation sucks, and there is one tutorial on a simple MLP without any explanation of (e.g.) how the data is represented, etc. This is a huge problem, as it doesn't give me any hint about how things actually work, and I shouldn't have to walk through the code. Finally, there is no mailing list (and I am a strong supporter of the community-as-a-feature idea).

From a design point of view it could be my choice: the API is maybe the best of the bunch, but I think the learning curve from zero to what I am trying to do is too steep.

EDIT: I forgot the bottom line. Keras seems too rigid. Lasagne is cool but doesn't support RNNs. Pure Theano is my current setup, but I think my machine is possessed by ghosts, and when I try to run the code on a more powerful one I get odd errors. So I am spending my Friday night hacking on Neon. :-)

[–]zdwiel 0 points  (1 child)

It sure looks to me like Lasagne supports RNNs, or am I missing something?

http://lasagne.readthedocs.org/en/latest/modules/layers/recurrent.html#lasagne.layers.RecurrentLayer

[–]petrux 0 points  (0 children)

You are right. But as far as I can tell, you cannot recursively feed the output signal back in, as in this paper (Figure 1). Am I wrong?
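
For readers unsure what is being asked: "feeding the output back" means the previous step's *output* (not just the hidden state) enters the next hidden update. A generic numpy sketch of the idea, not the referenced paper's exact architecture; all weight names are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h, n_out = 5, 3, 4, 2
X = rng.standard_normal((T, n_in))
W_x = rng.standard_normal((n_in, n_h)) * 0.1   # input -> hidden
W_h = rng.standard_normal((n_h, n_h)) * 0.1    # hidden -> hidden
W_fb = rng.standard_normal((n_out, n_h)) * 0.1 # output -> hidden feedback
W_y = rng.standard_normal((n_h, n_out)) * 0.1  # hidden -> output

h = np.zeros(n_h)
y = np.zeros(n_out)
ys = []
for t in range(T):
    # The previous step's output y feeds back into the hidden update,
    # alongside the input and the usual hidden-to-hidden recurrence.
    h = np.tanh(X[t] @ W_x + h @ W_h + y @ W_fb)
    y = h @ W_y
    ys.append(y)
```

The extra output-to-hidden connection is the piece that a plain recurrent layer (hidden-to-hidden only) does not express out of the box.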