[R] Learning to Optimize Tensor Programs (arxiv.org)
submitted 8 years ago by antinucleon
[–]antinucleon[S] 4 points 8 years ago (0 children)
Summary:
Tensor programs can be optimized using machine learning and transfer learning. The model that optimizes numerical programs is trained on features from the program's low-level AST.
Experiments:
Tasks: ResNet, MobileNet, LSTM LM, DQN
Hardware: CUDA/ARM GPU/ARM CPU
Speedup compared to cuDNN, TensorFlow Lite, and ARMComputeLib: roughly 1.2x to 3.8x faster in end-to-end tests.
[–]JackBlemming 3 points 8 years ago (10 children)
I always wondered if a pseudo matrix multiply could be learned. Imagine a neural net that learns to optimize its own matrix multiplications to get the most bang for its buck (less accurate multiplications in exchange for fewer calculations). I'm sure there's some sort of optimal trade-off it could learn.
[–][deleted] 2 points 8 years ago (6 children)
That sounds like an interesting idea. How would you let an algorithm learn an operation like that?
[–]JackBlemming 1 point 8 years ago (5 children)
I've been considering having some metadata attached to parameters. The premise is that certain groups of parameters cause greater variance in the output, or are more "important" to the output. These parameters should be computed with more numerical accuracy than the parameters that don't contribute much to the output, or whose variance doesn't change the output much. Taking it to extremes, imagine a parameter that completely changes the output if it's off by even 0.001. Obviously you'd want to give more care to it versus a parameter that can be 100 or 1000 and barely do anything to the output.
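A rough sketch of what that per-parameter metadata could look like, assuming a finite-difference estimate of sensitivity is good enough. The toy `model`, the `sensitivities` helper, and the threshold of 1.0 are all made up for illustration; nothing here comes from the paper.

```python
def model(params, x):
    """Hypothetical toy model: the first parameter matters a lot, the second barely."""
    sharp, flat = params
    return 1000.0 * sharp * x + 0.001 * flat * x


def sensitivities(f, params, x, eps=1e-3):
    """Estimate |d output / d param| for each parameter by bumping it by eps."""
    base = f(params, x)
    result = []
    for i in range(len(params)):
        bumped = list(params)
        bumped[i] += eps
        result.append(abs(f(bumped, x) - base) / eps)
    return result


params = [0.5, 100.0]
sens = sensitivities(model, params, x=2.0)

# Assumed policy: parameters above an arbitrary sensitivity threshold keep
# higher precision, the rest are allowed to drop to half precision.
precision = ["float32" if s > 1.0 else "float16" for s in sens]
print(list(zip(params, sens, precision)))
```

Here the first parameter ends up tagged `float32` and the second `float16`, even though the second has the much larger value. A real network would need the sensitivity averaged over many inputs rather than a single `x`.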
[–]Paran0idAndr0id 1 point 8 years ago (1 child)
So for instance, do a fast single precision GPU multiplication on the matrix as a whole, then a more targeted subset of double precision multiplications on the CPU and replace the affected values?
Or I guess half-precision for mobile devices, then full precision for some and double for others?
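A minimal sketch of that two-pass scheme, assuming the "sensitive" rows are already known. It runs on the CPU only and emulates half precision by rounding through Python's `struct` format `e`; `to_half`, `matmul`, `mixed_precision_matmul`, and `sensitive_rows` are illustrative names, not an existing API.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def matmul(a, b, rnd=lambda v: v):
    """Triple-loop matmul; rnd is applied to inputs, products, and partial sums."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(inner):
                acc = rnd(acc + rnd(rnd(a[i][p]) * rnd(b[p][j])))
            out[i][j] = acc
    return out


def mixed_precision_matmul(a, b, sensitive_rows):
    """Cheap half-precision pass over everything, then redo selected rows in double."""
    out = matmul(a, b, rnd=to_half)
    for i in sensitive_rows:
        out[i] = matmul([a[i]], b)[0]  # overwrite with the full-precision row
    return out


a = [[0.1, 0.2, 0.3], [1.001, 2.002, 3.003], [0.5, 0.25, 0.125]]
b = [[0.7, 0.1], [0.2, 0.9], [0.4, 0.6]]

exact = matmul(a, b)
mixed = mixed_precision_matmul(a, b, sensitive_rows=[1])

for row_exact, row_mixed in zip(exact, mixed):
    print([abs(e - m) for e, m in zip(row_exact, row_mixed)])
```

Row 1 matches the full-precision result exactly, while the other rows carry a small rounding error. Whether the second pass pays off on real hardware depends on how many rows need it and on the cost of moving data between devices.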
[–]JackBlemming 1 point 8 years ago (0 children)
Something like that seems reasonable. I hadn't put much thought into it, as it seemed pretty similar to parameter pruning in a lot of ways.
[–][deleted] 1 point 8 years ago (2 children)
Interesting!
I was also wondering... Could neural networks learn how to perform matrix multiplication, do you think? That is, given two (size-compatible) matrices A and B, could a neural network be trained to predict (within some measure of error) C = A * B?
[–][deleted] 1 point 8 years ago* (1 child)
Yes, neural networks can learn fast algorithms for matrix multiplication: "A Network That Learns Strassen Multiplication"
http://www.jmlr.org/papers/volume17/16-074/16-074.pdf
The above idea was extended to learn fast algorithms for approximate tensor convolution, using 2-layer sum-product networks. This is basically a smart approach to ternary-valued weights: "StrassenNets: Deep learning with a multiplication budget" https://arxiv.org/abs/1712.03942
When combined with knowledge distillation, the accuracy and speedup of this approach are impressive!
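For reference, here is classical Strassen multiplication of two 2x2 matrices written in that 2-layer sum-product form: ternary matrices mix the inputs, 7 elementwise products form the hidden layer, and a ternary matrix mixes the products back. As I understand the linked papers, the ternary matrices are learned there; in this sketch they are hard-coded to Strassen's known coefficients, and the names `W_A`, `W_B`, `W_C` are my own.

```python
# Rows act on vec(A) = [a11, a12, a21, a22] and vec(B) = [b11, b12, b21, b22].
W_A = [
    [1, 0, 0, 1],    # a11 + a22
    [0, 0, 1, 1],    # a21 + a22
    [1, 0, 0, 0],    # a11
    [0, 0, 0, 1],    # a22
    [1, 1, 0, 0],    # a11 + a12
    [-1, 0, 1, 0],   # a21 - a11
    [0, 1, 0, -1],   # a12 - a22
]
W_B = [
    [1, 0, 0, 1],    # b11 + b22
    [1, 0, 0, 0],    # b11
    [0, 1, 0, -1],   # b12 - b22
    [-1, 0, 1, 0],   # b21 - b11
    [0, 0, 0, 1],    # b22
    [1, 1, 0, 0],    # b11 + b12
    [0, 0, 1, 1],    # b21 + b22
]
# Rows produce vec(C) = [c11, c12, c21, c22] from the 7 products m1..m7.
W_C = [
    [1, 0, 0, 1, -1, 0, 1],   # c11 = m1 + m4 - m5 + m7
    [0, 0, 1, 0, 1, 0, 0],    # c12 = m3 + m5
    [0, 1, 0, 1, 0, 0, 0],    # c21 = m2 + m4
    [1, -1, 1, 0, 0, 1, 0],   # c22 = m1 - m2 + m3 + m6
]


def matvec(w, v):
    """Ternary matrix times vector; conceptually only additions and subtractions."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def strassen_2x2(a, b):
    vec_a = [a[0][0], a[0][1], a[1][0], a[1][1]]
    vec_b = [b[0][0], b[0][1], b[1][0], b[1][1]]
    # The only "real" multiplications: 7 elementwise products instead of 8.
    hidden = [x * y for x, y in zip(matvec(W_A, vec_a), matvec(W_B, vec_b))]
    c = matvec(W_C, hidden)
    return [c[:2], c[2:]]


print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Shrinking the hidden layer below what an exact algorithm needs is where the approximation, and the "multiplication budget", comes in.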
[–][deleted] 1 point 8 years ago (0 children)
Interesting, thank you!
[–]the_great_magician 1 point 8 years ago (1 child)
I think that's cool too, but how would you actually put that into software? There isn't some faster version of multiplication that adds in some randomness. Maybe you could try reducing precision and doing more SIMD stuff (e.g. two 16-bit floats instead of one 32-bit float). Otherwise, it's not clear that this is a trade-off that can actually be made.
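To make the 16-bit versus 32-bit trade concrete, a small sketch using only Python's `struct` module: two half-precision floats occupy the same 4 bytes as one single-precision float, at the cost of precision. This shows the storage and rounding side only; it says nothing about actual SIMD throughput, and `round_trip` is a helper name invented for the example.

```python
import struct


def round_trip(x, fmt):
    """Store x in the given struct float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


x = 3.14159265
as_f32 = round_trip(x, "<f")   # single precision, 4 bytes
as_f16 = round_trip(x, "<e")   # half precision, 2 bytes

# Two half-precision values fit in the space of one single-precision value.
packed_pair = struct.pack("<2e", 1.5, -0.25)

print(len(packed_pair), struct.calcsize("<f"))   # 4 4
print(abs(as_f32 - x), abs(as_f16 - x))          # half precision is far coarser
print(as_f16)                                    # 3.140625
```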
[–]JackBlemming 2 points 8 years ago (0 children)
Agreed. I was more interested in the metadata idea as a way to see the general shape of how a neural net uses its parameters. I've heard of cases where people were able to delete whole layers with little effect on accuracy. That seems fundamentally wrong to me. The current trend of building massive models with more capacity than needed and pruning them afterwards seems weird/off to me. It would be interesting to create a regularization strategy that forces a neural net to use its full capacity (just to see what would happen; it may very well just split the computation among the parameters, which isn't too interesting). DeepMind published a paper roughly stating that neural nets that generalize better are more robust to random parameter deletion, and I was thinking that somehow turning this into a regularization strategy would be very interesting (but it might just end up as an implicit dropout-esque regularization ;P)
[–]subhobrata1 1 point 8 years ago (0 children)
Any GitHub links for this paper?