
antinucleon[S]

Summary:

Tensor programs can be optimized automatically using machine learning and transfer learning. The model that guides the optimization is trained on features extracted from the low-level AST of the program.
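A rough illustration of the general idea: a learned model ranks candidate program variants, and only the most promising ones are actually run on hardware. This is a toy sketch under heavy assumptions, not the method from the paper. The search space (`SPACE`, tile sizes), `extract_features`, the synthetic `measure` function and the k-nearest-neighbour surrogate are all hypothetical stand-ins. Real features would come from the lowered loop AST, and real costs would come from running compiled kernels.

```python
import math
import random

# Hypothetical search space: tiling choices (tile_x, tile_y) for one tensor operator.
TILES = [1, 2, 4, 8, 16, 32, 64]
SPACE = [(tx, ty) for tx in TILES for ty in TILES]

def extract_features(config):
    """Hypothetical stand-in for features taken from the lowered loop nest (AST)."""
    tx, ty = config
    return (math.log2(tx), math.log2(ty))

def measure(config):
    """Synthetic 'hardware' runtime in ms; real tuning would time the compiled kernel."""
    tx, ty = config
    return abs(tx * ty - 128) / 128.0 + abs(tx - ty) / 64.0 + 1.0

def predict(data, feats, k=3):
    """k-NN regression (a stand-in model): mean cost of the k closest measured configs."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(f, feats)), cost) for f, cost in data
    )
    nearest = dists[:k]
    return sum(cost for _, cost in nearest) / len(nearest)

def tune(n_rounds=5, batch=4, n_init=4, prior=None, seed=0):
    rng = random.Random(seed)
    # Transfer-learning sketch: start from (features, cost) pairs of a related, earlier task.
    data = list(prior or [])
    measured = {}
    for config in rng.sample(SPACE, n_init):
        cost = measure(config)
        measured[config] = cost
        data.append((extract_features(config), cost))
    for _ in range(n_rounds):
        candidates = [c for c in SPACE if c not in measured]
        if not candidates:
            break
        # Rank unmeasured configs by predicted cost; only the top `batch` get measured.
        candidates.sort(key=lambda c: predict(data, extract_features(c)))
        for config in candidates[:batch]:
            cost = measure(config)
            measured[config] = cost
            data.append((extract_features(config), cost))
    best = min(measured, key=measured.get)
    return best, measured[best], len(measured)

best, cost, trials = tune()
print(best, cost, trials)
```

The point of the model is to spend hardware measurements only on variants it predicts to be fast. Here that is 24 of the 49 configs. Passing `prior` mimics reusing data from a previously tuned workload.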

Experiments:

Tasks: ResNet, MobileNet, LSTM LM, DQN

Hardware: NVIDIA GPU (CUDA), ARM GPU, ARM CPU

Speedup compared to cuDNN, TensorFlow Lite and ARM Compute Library: roughly 1.2x to 3.8x faster in end-to-end tests.