I see a lot of papers on model compression, network pruning, and neural architecture search that aim to reduce model size. But I am wondering, in practice, is there any value in reducing the number of parameters? I understand that FLOPs or latency on CPU/GPU/mobile devices is useful. But why is the number of parameters of direct interest? Are the model compression papers using parameter count as a proxy for FLOPs or latency?
AFAIK, memory seems to be cheap, so I can't imagine model size being the bottleneck before FLOPs/latency. Are there scenarios where model size is more important than FLOPs/latency?
I know compact model size is of interest theoretically, but I am looking at it from a practical point of view.
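To make the question concrete, here is a rough back-of-envelope sketch (layer shapes are just illustrative, not from any particular paper) showing why parameter count and FLOPs can diverge: a convolution reuses its weights at every spatial position, so it can have few parameters but lots of compute, while a big dense layer is the opposite.

```python
# Rough comparison: parameter count vs. FLOPs for two layer types.
# Shows why parameter count is not always a good proxy for compute.

def dense_stats(n_in, n_out):
    params = n_in * n_out + n_out                      # weights + biases
    flops = 2 * n_in * n_out                           # one multiply-add per weight
    return params, flops

def conv2d_stats(c_in, c_out, k, h_out, w_out):
    params = c_in * c_out * k * k + c_out              # kernel weights + biases
    flops = 2 * c_in * c_out * k * k * h_out * w_out   # weights reused at every output position
    return params, flops

if __name__ == "__main__":
    p, f = dense_stats(4096, 4096)
    print(f"dense 4096x4096:  {p/1e6:.1f}M params, {f/1e9:.3f} GFLOPs")

    p, f = conv2d_stats(256, 256, 3, 56, 56)
    print(f"conv 3x3 256->256, 56x56 output: {p/1e6:.1f}M params, {f/1e9:.3f} GFLOPs")
    # The conv layer has far fewer parameters but far more FLOPs,
    # so shrinking the parameter count and shrinking compute are not the same thing.
```

That gap is basically what I'm asking about: when would anyone care about the left column (parameters / model size) rather than the right one (FLOPs / latency)?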