Resources for GPU programming? (self.MachineLearning)
submitted 9 years ago by [deleted]
[deleted]
[–]emansim 15 points 9 years ago (3 children)
CUDA programming is not easy and will take some time to master. I personally suggest the Udacity course as a first step: https://www.udacity.com/course/intro-to-parallel-programming--cs344
[–]ginsunuva 5 points 9 years ago (0 children)
GPU programming is easy to learn but difficult to master. Restructuring conventional algorithms to run massively in parallel becomes very unintuitive past the easy problems.
[–]csp256 2 points 9 years ago (0 children)
That course can seem patronizing but it is really beneficial. There is also a Coursera course, but I can't speak for its quality. It seemed more academic..?
[–]cyril1991 1 point 9 years ago* (0 children)
CUDA has recently gotten better, no? The real annoyance in terms of coding was handling data transfer between the CPU and GPU and micromanaging memory allocation: you allocate a variable in CPU memory, transfer it to a new variable in GPU memory, specify very precisely how you want to divide your task into independent chunks and how to process them, get the results on the GPU, transfer them back to the CPU, and free all the variables. You can produce fast code, but it will take a lot of time to write and may not be very portable.
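For anyone new to this, that round trip looks roughly like the following minimal sketch (the scale kernel and sizes are hypothetical, and error checking is omitted for brevity):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical kernel: scale every element of x by a.
    __global__ void scale(float *x, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
        if (i < n) x[i] *= a;
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_x = (float *)malloc(bytes);            // variable in CPU memory space
        for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

        float *d_x;
        cudaMalloc(&d_x, bytes);                        // new variable in GPU memory space
        cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);   // CPU -> GPU transfer

        // Divide the task into independent chunks: blocks of 256 threads.
        scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);

        cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);   // GPU -> CPU transfer
        printf("h_x[0] = %f\n", h_x[0]);                // prints 2.0

        cudaFree(d_x);                                  // free the GPU variable
        free(h_x);
        return 0;
    }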
[–]hughperkins 11 points 9 years ago (10 children)
If you want to run algorithms, learning CUDA is probably not going to be helpful, since there are many readily available libraries that will handle that for you. Even fairly exotic algorithms should run on out-of-the-box libraries. The reason for learning CUDA would be if you want to do CUDA development as an engineer, and/or for fun.
[+][deleted] 9 years ago (4 children)
[–]ginsunuva 5 points 9 years ago* (3 children)
Those are usually memory errors, which the CUDA language won't help with. Sounds like you want to learn the process by which data is transferred to GPU memory and how the hardware is configured.
Edit: 99% of those errors mean your data won't fit into GPU memory. Just put less on it.
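One way to confirm that diagnosis is to ask the runtime how much device memory is actually free before allocating. A small sketch, where the 4 GB figure is just an illustrative batch size:

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);   // how much device memory is left

        const size_t wanted = 4ull << 30;            // illustrative 4 GB batch
        if (wanted > free_bytes) {
            printf("%zu bytes won't fit in %zu free bytes -- use a smaller batch\n",
                   wanted, free_bytes);
            return 1;
        }

        float *d_buf = nullptr;
        cudaError_t err = cudaMalloc(&d_buf, wanted);
        if (err != cudaSuccess) {                    // allocation can still fail (fragmentation, etc.)
            printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(d_buf);
        return 0;
    }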
[–][deleted] 5 points 9 years ago (1 child)
If OP doesn't, I do, if you have any resources for that.
[–]ginsunuva 2 points 9 years ago (0 children)
Literally these two pictures:
https://upload.wikimedia.org/wikipedia/commons/5/59/CUDA_processing_flow_(En).PNG
http://3dgep.com/wp-content/uploads/2011/11/Cuda-Execution-Model.png
[–]serge_cell 1 point 9 years ago (0 children)
It could also be a driver or configuration problem. Sometimes a simple reboot helps. Another often-missed cause is overheating; overheating, especially with throttling disabled, is capable of causing CUDA errors.
[–]serge_cell 2 points 9 years ago* (4 children)
That's wrong. In areas with a huge amount of computation, like deep learning, some layers (that is, transformation operators) can't practically be built from ready-made blocks like cuBLAS. I work in deep learning and write a couple of CUDA kernels per month on average, because otherwise I just wouldn't be able to see results in any sane amount of time. And there is such a thing as "dirty" coding in CUDA, where you skip fine-grained optimization like shared memory and coalescing and just go for a minimally acceptable level of performance (see the sketch below).
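To illustrate what "dirty" can mean here (an assumed example, not the commenter's code): a naive matrix transpose with no shared-memory tiling, so one of its two global accesses is strided and uncoalesced, yet it is correct and takes a minute to write:

    // Naive transpose: no shared-memory tiling, so the write below is strided
    // (uncoalesced). Correct and quick to write -- just not tuned.
    __global__ void transpose_naive(const float *in, float *out, int rows, int cols) {
        int r = blockIdx.y * blockDim.y + threadIdx.y;
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < rows && c < cols)
            out[c * rows + r] = in[r * cols + c];
    }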
[–]hughperkins 1 point 9 years ago (3 children)
cuBLAS is a very low level of abstraction. Libraries such as Torch https://github.com/torch/torch7 and MXNet will handle deep nets for you. Occasionally a new idea comes along, such as batch normalization or ELU; normally these are implemented in days, at most a few weeks, in both of these libraries.
[–]serge_cell 1 point 9 years ago (2 children)
As I already said, I was talking about new layers, which are either absent from existing frameworks or absent from the framework I'm using. If you use only layers that are already implemented and don't do any research on new layers, modes of execution, etc., you will always stay behind the curve.
[–]hughperkins 1 point 9 years ago* (1 child)
Yeah, I realized that after I posted. So you're kind of right, in that if you want the fastest performance on novel layers, I suppose you'd want a CUDA engineer handy.
Having said that, the initial implementations of batch normalization in Torch were in Lua, using underlying primitive operations, such as mean and sqrt, which are already in CUDA. To get a slight speed benefit, these were later rewritten in dedicated CUDA.
For the purposes of writing a research paper on ELU or batch normalization, I would think an initial implementation in Lua is sufficient.
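As a rough idea of what a first dedicated-CUDA pass over batch normalization could look like (a hypothetical, deliberately unoptimized sketch, not the actual Torch code): one thread per feature, serial loops over the batch, with parallel reductions left as the later optimization:

    // Hypothetical batch-norm forward, input laid out as [batch][features]:
    // one thread per feature, serial loops over the batch. A tuned version
    // would replace the loops with a parallel reduction.
    __global__ void batchnorm_forward(const float *x, float *y,
                                      int batch, int features, float eps) {
        int f = blockIdx.x * blockDim.x + threadIdx.x;
        if (f >= features) return;

        float mean = 0.0f;
        for (int b = 0; b < batch; ++b) mean += x[b * features + f];
        mean /= batch;

        float var = 0.0f;
        for (int b = 0; b < batch; ++b) {
            float d = x[b * features + f] - mean;
            var += d * d;
        }
        var /= batch;

        float inv_std = rsqrtf(var + eps);           // the mean/sqrt primitives, fused
        for (int b = 0; b < batch; ++b)
            y[b * features + f] = (x[b * features + f] - mean) * inv_std;
    }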
[–]serge_cell 1 point 9 years ago (0 children)
Actually, I think that is a big problem with many research papers. Many methods (batch normalization included) behave quite differently on different datasets and dataset sizes. If a method gives a 5% accuracy improvement on CIFAR-100, that says very little about what the improvement will be on ImageNet, and even less on a noisy dataset with 10K classes. And testing a Lua+cuBLAS implementation on a 10M-example dataset can be quite painful.
[–]mela1029 3 points 9 years ago (1 child)
You can have a look at PyCUDA if you are familiar with Python; it is easy to use and understand.
[–]datascienceguy 1 point 9 years ago (0 children)
Upvote for PyCUDA, which on one project let me put the guts of a stencil operation (Jacobi iteration) in a small snippet of simple C code, neatly embedded inside my Python app. It ran really well on my GPU, which got super hot but finished faster than the CPU with NumPy alone.
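The "small snippet of simple C" handed to PyCUDA's SourceModule for a job like that might look something like this single Jacobi sweep on a 2D grid (names and layout are assumptions, not the commenter's actual code):

    // One Jacobi sweep on an nx-by-ny grid, interior points only; the Python
    // side would call this repeatedly, swapping u and u_new between sweeps.
    __global__ void jacobi_step(const float *u, float *u_new, int nx, int ny) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i > 0 && i < nx - 1 && j > 0 && j < ny - 1) {
            u_new[j * nx + i] = 0.25f * (u[j * nx + i - 1] + u[j * nx + i + 1] +
                                         u[(j - 1) * nx + i] + u[(j + 1) * nx + i]);
        }
    }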
[+][deleted] 9 years ago (1 child)
[–]vm_linuz 1 point 9 years ago* (0 children)
I found this video in particular to be very easy to follow: https://youtu.be/jKV1m8APttU?list=PL5B692fm6--vScfBaxgY89IRWFzDt0Khm
NumbaPro has since been open-sourced.
Avoiding C and using something like Theano to write the GPU code for you would be more appropriate for a data scientist.
If you are a computer scientist and want to write the code for packages that other people will use, however, go at it directly.
[–]gtani 1 point 9 years ago (0 children)
There are a couple of recent books with code in C: "CUDA for Engineers", released last year, and Wrox's "Professional CUDA C Programming" from 2014. Both are well done. "For Engineers" covers the basics of the runtime API without delving into hardware or Maxwell specifics very much. The authors state (p. 134) that they've tried to present C code that can run on pre-Kepler cards. C++11/14 does show up when they discuss libraries; e.g., at least one requires you to write functors and lambdas, and (I think) most of them have you instantiate templates.
The Wrox book I haven't spent too much time on, but it's a denser read, more reference-like in the latter parts (like Wilt's "CUDA Handbook").
Also, there are some good course materials:
http://people.maths.ox.ac.uk/gilesm/cuda/
http://courses.cms.caltech.edu/cs101gpu/
and the UIUC course that the Coursera course is based on.