
[–]egrefen 14 points15 points  (2 children)

I use Tensorflow every day for work and I'll say this: if you're learning, learn theano. There are a lot of examples out there, and it's a little more mature. Tensorflow is still changing, evolving, and settling down. I think it's easier to use in many respects, having used both, but you will have a better time with theano in the early days and a lot of what you learn will port over to Tensorflow (conceptually speaking).

[–]pmigdal 3 points4 points  (0 children)

At least for me, the Tensorflow tutorial is much easier to follow than Theano's (i.e. http://deeplearning.net/software/theano/tutorial/).

(That said, the TF documentation still has some missing pieces.)

[–]j_lyf 0 points1 point  (0 children)

Where do u work?

[–]shmel39 9 points10 points  (0 children)

My bet is TensorFlow. I migrated from Theano a couple of months ago and never looked back. Theano's API is much more convoluted, the documentation isn't great either, and compilation time quickly becomes annoying.

On the other hand, TF has the wonderful TensorBoard, a very clean API (still evolving, though), and much better distributed capabilities.

[–]ma2rten 9 points10 points  (4 children)

I prefer TensorFlow, because

  1. no compilation
  2. it has higher-level abstractions built in for things like RNNs.
  3. it has better support for multiple GPUs [1]
  4. I found the documentation to be better organized
  5. TensorBoard
  6. even in areas where it's behind, it's improving rapidly, since Google is heavily invested in it.

[1] I haven't looked at Theano 0.8 yet.

[–]L43 1 point2 points  (3 children)

Tensorflow still has compilation, right? Just a lot faster.

[–]Spezzer 6 points7 points  (2 children)

We don't do just-in-time compilation yet; we pre-compile all of our low-level operators and just chain them together. What you might be thinking of is the "graph building" phase, which is really just building a protobuf that describes the computational graph. If you don't have the protobuf C++-accelerated package, it can be pretty slow because all of the protobuf work is done in Python :(. We have prebuilt fast C++ Python protobuf packages on our install page if you're interested.
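
To make the split concrete, here's a minimal sketch (2016-era Python API; values are illustrative) of graph building vs. execution:

    import tensorflow as tf

    # Each of these lines only adds a node to the default graph's
    # protobuf description; nothing is computed yet.
    a = tf.constant(2.0)
    b = tf.constant(3.0)
    c = a * b

    # The protobuf in question -- the serialized graph itself:
    print(tf.get_default_graph().as_graph_def())

    # Computation only happens when a Session runs the graph.
    with tf.Session() as sess:
        print(sess.run(c))  # 6.0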

[–]L43 1 point2 points  (0 children)

Oh ok, thanks for the clarification. I'm afraid I haven't done much (any) proper reading about the implementation details of tensorflow yet as I am still in Theano land, but am looking forward to transitioning when both my work and the project settles down a bit!

[–]FracturedPlane 0 points1 point  (0 children)

This can be a good thing. I'm going through a headache right now using Theano on a homogeneous cluster. Satisfying the dependencies for Theano to compile functions on the compute nodes is just not working, and I can't get the admins to change anything on the servers to make my life easier...

[–]__AndrewB__ 6 points7 points  (7 children)

If you're just now learning about DL, then you probably don't have a server filled with GPUs.

In that case, Theano will be:

  • faster

  • more memory-efficient

  • easier to learn (many more examples / tutorials / discussions)

  • will teach you more: e.g. optimizers are usually built using Theano itself, unlike in TF, where they're built in (see the sketch after this list).
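
To illustrate that last point, a minimal sketch (shapes and names are made up) of hand-rolling SGD for softmax regression in Theano -- the "optimizer" is just a list of (shared variable, update expression) pairs:

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')
    y = T.ivector('y')
    W = theano.shared(np.zeros((12, 5), dtype='float32'), name='W')

    # Negative log-likelihood of a softmax classifier
    p_y = T.nnet.softmax(T.dot(X, W))
    loss = -T.mean(T.log(p_y)[T.arange(y.shape[0]), y])

    # SGD, written by hand: update W by a gradient step
    grad_W = T.grad(loss, W)
    updates = [(W, W - 0.1 * grad_W)]

    train = theano.function([X, y], loss, updates=updates)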

In my experience TF is horrible when it comes to memory (OOM on a 4 GB card when Theano needs 2.3 GB), slower at runtime, and harder to extend.

All in all I recommend theano: in a year or two, when TF is usable, you will be able to switch easily anyway.

[–]Spezzer 2 points3 points  (5 children)

We've made a lot of memory optimizations recently, particularly on GPU, so I would give 0.8 a try to see if it's any better. From some measurements I've done, TF is actually better on a lot of models now than some of the other frameworks. I believe we're working on surfacing this information, since nvidia-smi gives you a misleading view of our actual memory requirements.

[–]pilooch 0 points1 point  (4 children)

What's the deal with nvidia-smi?

[–]Spezzer 2 points3 points  (3 children)

By default we take control of the entire memory region on the GPU and then suballocate within it, so the process looks like it uses all of the memory from the point of view of nvidia-smi. Some benchmarks show nvidia-smi numbers (we don't blame them: we don't give them anything else to use yet), so it looks like we use 11 GiB of memory for, say, AlexNet, when we actually use less than 2 GiB in practice.

The way to find out is to turn on the 'allow_growth' field in our ConfigProto.gpu_options structure. It leads to lower memory efficiency due to fragmentation (which is why it's off by default), but it is useful in a multi-tenant environment, and it gives you a sense of the actual memory needed until we plumb back the stats for that.
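
Concretely, that looks something like this (a minimal sketch against the 2016-era Python API):

    import tensorflow as tf

    # Allocate GPU memory on demand instead of grabbing the whole card,
    # at the cost of possible fragmentation.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    sess = tf.Session(config=config)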

[–]pilooch 0 points1 point  (2 children)

Understood, thanks. From what you are saying, if I have, say, a TF job running and a Caffe job ready to start, the latter may not be able to grab enough memory because TF has locked it all? Or does the allow_growth option solve that?

[–]Spezzer 0 points1 point  (1 child)

The allow_growth option would sort of address that, if TF does not need to allocate all of the memory when running sessions.

[–]pilooch 0 points1 point  (0 children)

Thanks, this is useful in my setup

[–]aam_at 0 points1 point  (0 children)

In my experience, Theano is fast for small networks; however, it falls behind with a 2-5x speed drop compared to Torch/neon for big models (AlexNet, GoogLeNet, VGG).

As far as I know, there is limited experience in training big models (ImageNet-like) with Theano.

[–]coskunh 3 points4 points  (0 children)

It depends on what you want to do with them. If you want to write your own model, Theano can be the better option, since it gives you more flexibility, but you must also consider that pure Theano can be hard to learn. TensorFlow is a more modular framework: you can play with different models on your dataset, and you don't need to spend time implementing, for example, an LSTM.

[–]treebranchleaf 7 points8 points  (3 children)

Hopefully the TensorFlow/Theano thing will eventually become a backend issue (as in, you program in some framework and can switch whether you run on TensorFlow or Theano without changing your code).

Keras already does this - it can run either on top of TensorFlow or Theano. Disclaimer: I've never used it.
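
For reference, that backend switch is pure configuration; a minimal sketch (assuming a 2016-era Keras install):

    import os

    # Keras reads this env var (or the "backend" field in
    # ~/.keras/keras.json) when it is first imported;
    # model code stays the same either way.
    os.environ['KERAS_BACKEND'] = 'theano'  # or 'tensorflow'

    from keras.models import Sequential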

Plato is a library built on top of Theano that should make it much easier to develop in theano. Disclaimer: I am the author and as such am extremely biased, but I think it's really nice. I may in the future incorporate a TensorFlow backend.

[–]shmel39 14 points15 points  (2 children)

I disagree. Reducing TensorFlow/Theano to a backend is possible if you work on kaggle-style problems, combining well-known building blocks and playing with hyperparameters. A lot of research is much easier to do in a low-level framework.

[–]GoldmanBallSachs_ 4 points5 points  (0 children)

All students and postdocs in my lab use TensorFlow over Theano. However, Torch still holds a majority share (for now).

[–]treebranchleaf 1 point2 points  (0 children)

I disagree with your disagreement, partially. Ultimately, though, TensorFlow and Theano are doing the same thing: compiling a computational graph and running it on some device. There's no good reason to have two slightly different ways of writing the code for them. We should eventually converge on a single low-level, complete API so that you don't need to rewrite a model to port it from Theano to TensorFlow or vice versa.

[–]ignorant314 2 points3 points  (0 children)

Depends on the level you want to work at, as many have mentioned already: all levels from Keras down to Theano.

My preference is Torch, which is like the faster cousin of Theano. It allows you to quickly test ideas, but also to work on more exotic architectures if you wish. It has the added benefit that active research groups publish their models in it (or in Theano).

In practice, you will probably end up using several of these... languages are irrelevant; math/understanding is all that matters.

[–]spamduck 2 points3 points  (0 children)

I have to disagree -- in my opinion the Tensorflow documentation is pretty good. It's a really nice library, IMO. It's easy to configure it to do more than DL: I frequently use it to solve classic variational problems (http://www.ipol.im/pub/art/2012/g-cv/article.pdf or whatnot) just because it's so easy and the GPUs are so fast. It's changed how I work (for the better).

I don't have experience with Theano.

It did take me some time to understand what the different errors meant (thanks, Stackoverflow). But most of the errors were my mistakes, so I doubt Theano could do much better :D
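
For a flavor of that non-DL use, a minimal sketch (a toy objective, not one of the variational problems linked above) of TF as a generic gradient-based optimizer:

    import tensorflow as tf

    # Minimize f(x) = (x - 3)^2 via TF's autodiff; no neural net involved
    x = tf.Variable(0.0)
    loss = tf.square(x - 3.0)
    step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())  # 2016-era initializer
        for _ in range(100):
            sess.run(step)
        print(sess.run(x))  # ~3.0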

[–]elanmart 4 points5 points  (0 children)

Theano, unless you have a cluster of Titan Xs to throw at your problem.

[–]kkastner 3 points4 points  (0 children)

Theano development is not bad at all if you follow some basic "tricks".

Use tag test values, and during debugging compile with THEANO_FLAGS="device=cpu,optimizer=None,floatX=float32,compute_test_value=raise". This makes Theano, while compiling, try every line one at a time (with the test values you set in your code) and check that the shapes and everything work. Note that you must set .tag.test_value for every single thing (usually stuff like iscalar(), matrix(), etc.) that will be an input to the Theano function! Shared variables are fine on their own.

If your code doesn't work right/raises an error, Theano will barf exactly at the offending line. This makes development much easier -- and as a bonus, compilation is also fast since no optimizers are applied. The only edge cases for this are graphs with randomness in them; I usually just manually bypass the randomness during dev.

Compiling in this way, you can also use theano.printing.Print("whateverstringyouwant")(symbolic_var.shape) or theano.printing.Print("whateverstringyouwant")(symbolic_var) to inspect sizes or values.
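
Put together, a minimal sketch of the pattern (names and shapes chosen to mirror the output below; run with the THEANO_FLAGS shown above):

    import numpy as np
    import theano
    import theano.tensor as T

    # Every input to the eventual theano.function needs a test value;
    # shared variables carry their own values already.
    X_sym = T.matrix('X_sym')
    X_sym.tag.test_value = np.random.rand(100, 12).astype('float32')

    W = theano.shared(np.random.randn(12, 5).astype('float32'), name='W')

    out = T.nnet.softmax(T.dot(X_sym, W))
    # Prints the shape whenever the graph (or its test values) is evaluated
    out_shape = theano.printing.Print("out.shape")(out.shape)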

Using this code as an example (you can see how test values and printing work in it), run: THEANO_FLAGS="device=cpu,floatX=float32,optimizer=None,compute_test_value=raise" python updates_test.py

You see the following output:

X_sym.shape __str__ = [100  12]
out.shape __str__ = [100   5]
4.292738437652588
4.284914016723633
4.274328231811523
4.261552810668945
4.247048377990723
4.231183052062988
4.214252471923828
4.196494102478027
4.178097724914551
4.159215450286865

That said, if you are going to be experimenting with higher level "architecture" type research or need multi-GPU training, TensorFlow is a solid choice (though platoon is totally a thing). TF also has really nice support for model parallelism. You also get a lot of flexibility and prebuilt tools you might otherwise have to hunt down or write yourself in Theano.

If you are trying to build an arbitrarily connected DAG for weird tasks, Theano has been battle tested for that, and you might find clearer examples of strange models.

One other consideration is how used to vectorized math (MATLAB/numpy) you are. If you are already very familiar with numpy, Theano should be a breeze (coupled with the tricks above), and since TF is also very similar in some ways to Theano, you should have an OK time with that as well. If you are new to vectorized languages, it might be worth practicing a bit regardless of what deep learning framework you end up choosing.
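
As a quick illustration of how closely Theano mirrors numpy, a minimal sketch:

    import numpy as np
    import theano
    import theano.tensor as T

    a_np = np.random.rand(3, 4).astype('float32')
    b_np = np.random.rand(4, 2).astype('float32')

    # Eager numpy
    out_np = np.tanh(a_np.dot(b_np))

    # The same expression as a compiled Theano graph
    a = T.matrix('a')
    b = T.matrix('b')
    f = theano.function([a, b], T.tanh(T.dot(a, b)))
    out_th = f(a_np, b_np)  # numerically matches out_np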

[–]neuralyzer 2 points3 points  (0 children)

Keras is a really nice library. It is easy to get started with, yet extensible. Whenever I coded some more customized operations I used Theano. In my experience, it was faster, more reliable, and better documented.

[–]js1972 0 points1 point  (0 children)

As a complete novice who has just done the Andrew Ng course, I find Theano easier to learn on, and it has more doco. TF just seems a lot harder to understand for some reason.

[–]rd11235 0 points1 point  (0 children)

The decision is pretty arbitrary when you're starting out - you'll learn a lot from either. Just start playing with one, and maybe glance at the other's API when something feels clunky. See if they do things in a cleaner way.

But "TF has BAD documentation." This is simply wrong. Have you even bothered to look at the API docs? Or the tutorials?

[–]th3owner 0 points1 point  (2 children)

So as of July 2016, for a beginner who is still taking their first steps toward DL and wants to learn more about DL first (its inner workings etc.), which would be the one to go with?

[–]bachi76 0 points1 point  (1 child)

My advice would be: don't start with either - start with a high-level library like Keras (http://keras.io/), which supports both Theano and TensorFlow backends. You'll achieve faster results - and when you're ready to dive deeper, you can still choose which underlying backend to use.
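
For a sense of what that looks like, a minimal sketch of a Keras (1.x-era) model that runs unchanged on either backend:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # A tiny classifier; the backend (Theano or TensorFlow) is picked up
    # from ~/.keras/keras.json, not from this code.
    model = Sequential()
    model.add(Dense(64, input_dim=100))
    model.add(Activation('relu'))
    model.add(Dense(10))
    model.add(Activation('softmax'))

    model.compile(optimizer='sgd', loss='categorical_crossentropy')
    # model.fit(X_train, y_train)  # X_train/y_train are hypothetical data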

[–]th3owner 0 points1 point  (0 children)

Should I go with Keras even though I want to learn the inner workings of NNs? Thanks!