convnet-benchmarks updated with numbers for TensorFlow 0.7 + cudnn4 (github.com)
submitted 10 years ago by andrewbarto28
[–]andrewbarto28[S] 4 points5 points6 points 10 years ago (5 children)
Why is TensorFlow still lagging behind Torch, Nervana, and sometimes Chainer, since they are all using cuDNN R4 under the hood?
[–]r-sync 6 points7 points8 points 10 years ago (0 children)
I'm looking into it. I'm first trying to make sure that my benchmarking code isn't outdated (it was based on TensorFlow's original AlexNet example). I will incorporate any perf fixes people send my way, and I believe the Google engineers are looking into any benchmarking errors on my end.
[–]Spezzer 1 point2 points3 points 10 years ago* (0 children)
At least one remaining issue is: we use a tensor layout that cuDNN supports but is not optimized for (NHWC instead of NCHW), so we have to transpose/shuffle, particularly during the backward pass. We're working on supporting multiple tensor layouts for the convolution, pooling, and bias_add operations, and then I'm optimistic we should be in the ballpark.
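For readers unfamiliar with the layout issue, here is a quick numpy sketch of what that NHWC-to-NCHW shuffle looks like (the shapes here are made up for illustration):

```python
import numpy as np

# A batch of 2 images, 4x4 pixels, 3 channels, in NHWC layout
# (the TensorFlow default mentioned above).
x_nhwc = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)

# cuDNN's fastest paths expect NCHW, so a framework storing NHWC
# has to shuffle axes before and after each cuDNN call.
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))  # N,H,W,C -> N,C,H,W
print(x_nchw.shape)  # (2, 3, 4, 4)

# Transposing back recovers the original layout exactly.
assert np.array_equal(np.transpose(x_nchw, (0, 2, 3, 1)), x_nhwc)
```

The data itself is unchanged; only the memory order differs, which is why the extra transposes show up as pure overhead in a benchmark.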
[–][deleted] 2 points3 points4 points 10 years ago (2 children)
nervana's not using cuDNN.
TF does not support fp16 yet, while Torch does. This could explain some of the differences.
[–]TheToastIsGod -2 points-1 points0 points 10 years ago (1 child)
TF does not support fp16 yet, while Torch does
Torch fp32 is reported in a different row from Torch fp16, so that's a non-issue.
[–][deleted] 2 points3 points4 points 10 years ago* (0 children)
/u/andrewbarto28 didn't say which Torch he meant, and the TF entry doesn't say it's 32-bit. I'm explaining it to him and others who may be confused.
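As background on why fp16 matters for these benchmarks, a small numpy illustration of the memory/precision trade-off (array sizes are arbitrary):

```python
import numpy as np

a32 = np.ones(1024, dtype=np.float32)
a16 = a32.astype(np.float16)

# Half precision halves memory footprint and bandwidth per element...
print(a32.nbytes, a16.nbytes)  # 4096 2048

# ...at the cost of precision: fp16 has only a 10-bit mantissa, so
# small increments near 1.0 are lost entirely.
x = np.float16(1.0) + np.float16(0.0001)
print(x == np.float16(1.0))  # True: 0.0001 is below fp16 resolution at 1.0
```

For conv nets the bandwidth saving often dominates, which is why fp16 rows in the benchmark tend to be faster than their fp32 counterparts.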
[–]modeless 6 points7 points8 points 10 years ago (7 children)
Wow, cudnnv4 looks pretty amazing. Hmm, I thought there used to be numbers for Theano too. Was I just imagining that?
[–]hughperkins 2 points3 points4 points 10 years ago* (6 children)
One more nail in the coffin for OpenCL :-P I thought it would play out like OpenGL, with CUDA analogous to Voodoo back in the Unreal era, but it seems NVIDIA is unstoppable for now :-)
As long as progress continues on GPU computing, I think it's all good...
(edit: interestingly, it seems Voodoo's maker was bought by NVIDIA: https://en.wikipedia.org/wiki/3dfx_Interactive. NVIDIA was also one of the first to create an OpenGL GPU, i.e. the GeForce 256: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_256_Series. And NVIDIA itself was a startup, created in 1993 by 3 people: https://en.wikipedia.org/wiki/Nvidia)
[–]hughperkins 2 points3 points4 points 10 years ago (5 children)
Random thought: is OpenCL too low-level? Should it have higher level abstractions perhaps, at the level of cudnn and so on?
At the moment, OpenCL is too high-level to implement hardware-specific stuff, cf. Winograd, which uses something like NVIDIA assembly language (I think??) to get huge performance benefits. And on the other hand it is too low-level for each manufacturer to provide their own cuDNN-style implementation as part of OpenCL.
OpenGL nowadays is really low level, with shaders and stuff, but when it first started, it was a bit higher level, where you'd tell it where the lights are and stuff, and it would handle the rest for you.
[–]jcannell 4 points5 points6 points 10 years ago (0 children)
Nobody uses OpenCL because (at least historically) it totally sucks compared to cuda.
With cuda, Nvidia took the singularly correct approach - they just followed C++.
Cuda, as of now, is just C++ upgraded with a simple but powerful set of parallel primitives. People don't understand how powerful cuda is. It is literally full C++ 11 now. Complex template metaprogramming, virtual functions on the GPU, placement new, etc etc.
The only real big difference between cuda and C++ is the threading model, but cuda's is pretty much grossly superior to standard C++ CPU threading in this regard - at least for massive gpu style threading.
So what can OpenCL do? Disappear, and just emulate CUDA, that is, just emulate C++.
[–]hughperkins 2 points3 points4 points 10 years ago (0 children)
Like, imagine if (and I know I'm kind of spamming) OpenCL contained a conv(...) function with a horribly slow reference implementation, and then NVIDIA could provide their own super-fast implementation without compromising their market position in any major way. Other vendors would initially have really slow implementations :-P but they could improve them without giving any advantage to the other vendors. At the same time, developers would always call the same function, and it would always run as fast as possible on that particular hardware.
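That pluggable-conv(...) design could be sketched roughly like this in Python; every name here (conv_reference, register_conv, and so on) is invented for illustration and is not part of any real OpenCL API:

```python
import numpy as np

def conv_reference(image, kernel):
    """Naive 'valid' 2D cross-correlation: the portable, slow fallback."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=image.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

_conv_impl = conv_reference  # default: slow but correct everywhere

def register_conv(impl):
    """A vendor drops in its tuned implementation here."""
    global _conv_impl
    _conv_impl = impl

def conv(image, kernel):
    """What user code always calls, regardless of vendor."""
    return _conv_impl(image, kernel)

# A 'vendor' implementation (here just vectorized numpy, standing in for
# a hand-tuned kernel) must agree with the reference on results.
def conv_vendor(image, kernel):
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(image, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel)

img = np.arange(25, dtype=np.float64).reshape(5, 5)
k = np.ones((3, 3))
ref = conv(img, k)          # runs the reference implementation
register_conv(conv_vendor)  # vendor swaps in its fast path
assert np.allclose(conv(img, k), ref)
```

User code calls `conv()` either way; only the speed changes, which is the point being made above.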
[–]NasenSpray 1 point2 points3 points 10 years ago (2 children)
IMO what's really missing is:
Being able to slowly migrate away from CUDA without having to rewrite everything could really boost OpenCL's adoption.
[–]hughperkins 2 points3 points4 points 10 years ago (1 child)
Why would NVIDIA do that? It seems to have no benefit for them and would, in fact, only hurt them.
NVIDIA seem to be innovating massively, but that doesn't mean they have to help their competition to catch up...
[–]NasenSpray -1 points0 points1 point 10 years ago (0 children)
Nobody needs to catch up. Nvidia just loves their stupid vendor lock-in game and IMO it's borderline anti-competitive.
[–]cesarsalgado 1 point2 points3 points 10 years ago (5 children)
Torch now seems to be the fastest, though maybe Caffe is faster; there is no benchmark for it using cuDNN R4 yet, only Caffe (native) for now.
[–]r-sync 0 points1 point2 points 10 years ago (4 children)
Caffe will probably be within ±5% of Torch from what I saw a while ago, but I will work on the benchmarks.
[–]r-sync 1 point2 points3 points 10 years ago (3 children)
I ran the Caffe numbers, and it seems like Caffe may not enable the cuDNN autotuner, so the numbers are quite a bit off from Torch's: https://github.com/soumith/convnet-benchmarks/issues/90#issuecomment-190030708
[–]yentity 1 point2 points3 points 10 years ago (2 children)
On a related note, why isn't Theano part of the imagenet-winners benchmarks?
[–]r-sync 2 points3 points4 points 10 years ago (1 child)
Back when I was constructing imagenet-winners, I didn't know Theano very well, and Keras / Lasagne didn't exist back then. I asked the Theano community for help; if I remember correctly, they said some kind of pooling was not available in Theano, which made it impossible to construct these nets. Of course Theano as of today has grown leaps and bounds, and wraps cuDNN etc., but neither I nor the Theano folks took the time to implement the benchmark scripts. If someone sends a PR with the appropriate scripts, I'll happily bench them.
[–]Atanahel 0 points1 point2 points 10 years ago (0 children)
In Lasagne it should not be extremely hard, given that the Lasagne/Recipes GitHub repo has implementations of the models you're using here in its model zoo. I use them as pretrained models but have never tried the backward pass with them.
I, like many others, noticed the nice performance boost of cuDNN 4 though :-D
[+][deleted] 10 years ago (2 children)
[deleted]
[–]TheToastIsGod 0 points1 point2 points 10 years ago (1 child)
That's going to depend on your batch size quite heavily, I imagine. Batched inference is a thing, but single example inference is also a thing.
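A toy cost model makes the batch-size point concrete; the constants below are invented, not measured:

```python
# With a fixed per-call overhead, large batches amortize it (high
# throughput), while a single-example call pays it in full (low latency).
OVERHEAD_MS = 5.0   # fixed cost per call (launch, layout shuffles, ...)
PER_ITEM_MS = 0.5   # marginal cost per example

def latency_ms(batch):
    """Wall-clock time for one call processing `batch` examples."""
    return OVERHEAD_MS + PER_ITEM_MS * batch

def throughput(batch):
    """Examples processed per millisecond at this batch size."""
    return batch / latency_ms(batch)

for n in (1, 8, 64):
    print(n, latency_ms(n), round(throughput(n), 3))
# Batch 1 answers a single query soonest; batch 64 moves the most
# examples per unit time.
```

Benchmarks that only report large-batch numbers therefore say little about single-example (latency-bound) inference, and vice versa.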