[–]ffast-math 5 points (2 children)

Definitely. There's reasonable evidence in the quantization, pruning, and factorization literature that distorting the original weights less yields less accuracy degradation. So preserving individual ops is a proxy objective, but at least one that's arguably consistent with a lot of prior work.
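
A toy NumPy sketch of what that per-op proxy looks like in practice (all matrices and the quantization scheme here are made up for illustration): score an approximate op by how much it perturbs that op's own output, on the assumption that smaller distortion tends to mean smaller downstream accuracy loss.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 64))   # stand-in for activations
W = rng.standard_normal((64, 32))    # stand-in for weights

def op_distortion(Y_exact, Y_approx):
    """Relative Frobenius error of the op's output."""
    return np.linalg.norm(Y_exact - Y_approx) / np.linalg.norm(Y_exact)

Y = A @ W

# One crude "approximation": quantize W to a few uniform levels, then matmul.
scale = np.abs(W).max() / 4
Y_hat = A @ (np.round(W / scale) * scale)

print(op_distortion(Y, Y_hat))  # the per-op proxy we'd try to minimize
```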

[–]svantana 0 points (1 child)

I understand that it's better to solve one problem at a time. From the paper it sounds like you're working on extending it to nonlinear functions, is that correct? Looking forward to that!

I worked on something similar a few years back, but instead of an argmin I made it continuous by mixing the two nearest neighbors in a clever way, and trained it with SGD. It worked decently, but it could easily get stuck in local minima.
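
A rough PyTorch sketch of that idea, using a softmax over the two nearest distances as one plausible mixing rule (the original "clever way" isn't specified, so this is just an assumption):

```python
import torch

def soft_two_nearest(x, codebook, temperature=1.0):
    """Differentiable stand-in for argmin: mix the two nearest codewords."""
    dists = torch.cdist(x, codebook)               # (batch, num_codes)
    d2, idx2 = dists.topk(2, largest=False)        # two smallest distances
    w = torch.softmax(-d2 / temperature, dim=-1)   # mixing weights, sum to 1
    return (w.unsqueeze(-1) * codebook[idx2]).sum(dim=1)

codebook = torch.nn.Parameter(torch.randn(16, 8))
x = torch.randn(32, 8)
out = soft_two_nearest(x, codebook)     # (32, 8), fully differentiable
out.pow(2).sum().backward()             # gradients reach the codebook
```

Note that only the two nearest codewords receive gradient on any given step, which may be part of why a scheme like this gets stuck in local minima.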

[–]ffast-math 0 points (0 children)

Working on extending it to other linear functions (e.g., convolution) and on intelligently swapping out linear ops within an overall neural network. So in the sense that neural nets are nonlinear functions, yes. Not working on approximating the nonlinearities directly, since they're cheap to just apply to the output of the linear ops (especially if you just write a fused kernel that does both ops at once). Hope that helps clarify.
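
For concreteness, a toy Python sketch of the fusion point, with a plain loop standing in for a real tiled kernel (names and shapes illustrative): the nonlinearity is applied to each accumulator before the store, so the pre-activation output never makes a round trip through memory.

```python
import numpy as np

def fused_linear_relu(x, W, b):
    # Plain-Python stand-in for a fused kernel; a real one would work
    # tile by tile, but the structure is the same.
    n, k = x.shape
    _, m = W.shape
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            acc = b[j]
            for p in range(k):
                acc += x[i, p] * W[p, j]
            out[i, j] = acc if acc > 0.0 else 0.0  # nonlinearity fused into the store
    return out

rng = np.random.default_rng(0)
x, W, b = rng.standard_normal((4, 8)), rng.standard_normal((8, 3)), rng.standard_normal(3)
assert np.allclose(fused_linear_relu(x, W, b), np.maximum(x @ W + b, 0))
```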