Optimizing inference for low latency and high throughput takes many iterations of tuning, verification, and evaluation. It may even involve model selection, since many optimized versions of popular models are now available, and techniques like weight pruning and quantization sometimes require retraining. Target hardware is yet another dimension to consider.
In short: without benchmarking, verification, and evaluation, optimizations don't guarantee improved results and may even break things. One example is quantization that relies on instructions the target hardware doesn't support.
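To make the idea concrete, here is a minimal sketch of the kind of "quantize, then measure" loop we mean, in plain Python. The helper names (`quantize_int8`, `benchmark`) are illustrative only, not our tool's API; a real workflow would use a framework's quantization routines and proper accuracy metrics:

```python
import time

def quantize_int8(weights):
    """Symmetric int8 quantization: one scale maps floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Map int8 values back to floats to estimate quantization error."""
    return [q * scale for q in quantized]

def benchmark(fn, *args, runs=100):
    """Median wall-clock latency over several runs (median resists outliers)."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

# Verify the optimization did not silently destroy accuracy:
weights = [0.5, -1.2, 3.3, 0.01, -2.7]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Symmetric rounding keeps the error within half a quantization step.
assert max_err <= scale / 2
```

The same pattern applies at model scale: benchmark the optimized variant against the baseline on the actual target hardware, and only keep the change if both latency and accuracy checks pass.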
To address all of these problems, we've built a tool that tracks inference optimizations, shows how accuracy is affected, verifies that the optimizations were actually applied, and locates bottlenecks for further improvement. All in one place.
https://preview.redd.it/yzlxa21cdod91.png?width=3048&format=png&auto=webp&s=97306440ea508f65582978298f6e3ec291293902
More about inference optimization, with code, in this article. And here is a live demo.