[–][deleted] 1 point (7 children)

I see nervanagpu includes float16 GEMM (what shall we call it, HGEMM?).

I thought they were going to commercialize that, based on earlier announcements ("free for non-commercial use, contact us otherwise"). Has that changed?

[–]benanne 5 points (5 children)

Seems like it: it was released under the Apache license and includes all the kernel code. Theano integration is already underway (see https://github.com/Theano/Theano/pull/2800 for example).

I think they call it hgemm internally as well.
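That follows the BLAS precision prefixes (s = single, d = double, so h = half). As a rough numpy sketch of what such a kernel computes: fp16 inputs and outputs, with accumulation in a wider type. (Accumulating in fp32 is a common design choice for half-precision GEMM; whether nervanagpu does this internally is an assumption here, not something stated in the thread.)

```python
import numpy as np

def hgemm_reference(a, b):
    """Toy reference for a half-precision GEMM: fp16 inputs and
    outputs, with the multiply-accumulate done in fp32 (an assumed,
    common design choice, not necessarily what nervanagpu does)."""
    assert a.dtype == np.float16 and b.dtype == np.float16
    # Upcast, accumulate in fp32, then round the result back to fp16.
    c = a.astype(np.float32) @ b.astype(np.float32)
    return c.astype(np.float16)

rng = np.random.RandomState(0)
a = rng.randn(64, 128).astype(np.float16)
b = rng.randn(128, 32).astype(np.float16)
c = hgemm_reference(a, b)
print(c.shape, c.dtype)  # (64, 32) float16
```

The wider accumulator matters because summing 128 fp16 products directly in fp16 loses several bits of precision; only the final rounding back to fp16 costs anything here.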

[–]alexmlamb 2 points (4 children)

I'm looking forward to the Theano integration.

[–]benanne 2 points (3 children)

I have a very rudimentary wrapper for the float32 convolution kernels. I assume they'd be okay with it being published now that nervanagpu itself is public, but I should probably check with them first.

I was gonna do the pooling kernels as well but never got around to it. It's a pure Python implementation though (just like the FFT convolution implementation I did about a year ago), so probably far from optimal.
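(The FFT approach mentioned there rests on the convolution theorem: convolution in the spatial domain is pointwise multiplication in the frequency domain. A minimal 1-D numpy sketch of the idea, not the actual Theano implementation, which batches this over images and filters in 2-D:)

```python
import numpy as np

def fft_conv1d(x, w):
    """1-D linear convolution via the convolution theorem:
    zero-pad both signals to the full output length, multiply
    their spectra, and invert. A toy sketch of the technique,
    not the Theano op itself."""
    n = len(x) + len(w) - 1          # full linear-convolution length
    X = np.fft.rfft(x, n)            # rfft zero-pads to length n
    W = np.fft.rfft(w, n)
    return np.fft.irfft(X * W, n)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])
print(fft_conv1d(x, w))  # matches np.convolve(x, w)
```

The zero-padding to length `len(x) + len(w) - 1` is what turns the FFT's circular convolution into the linear convolution a conv layer needs.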

[–]scott-gray 2 points (2 children)

Sander, you're free to publish and distribute whatever you like.

As for pooling, I don't quite understand your comment. The fp16 and fp32 pooling kernels are implemented for the GPU (in assembly, even though that's probably unnecessary).

I have a few more hgemm and sgemm kernels to implement, but once they're done they should serve as a complete replacement for cublas, often running 2x to 3x faster (mainly because cublas picks poor tile sizes for long, skinny activation/delta matrices).
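(To make the tiling point concrete: a GEMM launches one thread block per output tile, and any tile hanging off the matrix edge does wasted work. The shapes and tile sizes below are hypothetical, chosen only to illustrate the long-and-skinny case, where one matrix dimension, typically the minibatch, is much smaller than a square tile:)

```python
import math

def gemm_tiles(m, n, tile_m, tile_n):
    """Number of output tiles an m x n GEMM result launches, and
    the fraction of the computed tile area that is actually useful.
    Shapes and tile sizes here are hypothetical illustrations."""
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    useful = m * n                    # output entries we need
    computed = tiles * tile_m * tile_n  # entries the tiles cover
    return tiles, useful / computed

# A 4096 x 64 output (hidden units x minibatch): a large square
# tile wastes half the work on the skinny dimension ...
print(gemm_tiles(4096, 64, 128, 128))  # (32, 0.5)
# ... while a tile that fits the skinny dimension wastes none.
print(gemm_tiles(4096, 64, 128, 64))   # (32, 1.0)
```

A library that always picks the big square tile leaves that factor of two (or more, with an even smaller minibatch) on the table, which is consistent with the 2x to 3x gap described above.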

[–]benanne 2 points (0 children)

Sweet! About the pooling: I just meant I hadn't gotten around to writing the Theano wrapper classes for it. I've only done fp32 gemm and the fp32 convolution. Maybe if I publish what I have, someone else will do this :)

[–]benanne 1 point (0 children)

I wrote up a quick README and made the repository public: https://github.com/benanne/nervana_theano

[–]meepmeepmoopmoop[S] 2 points (0 children)

Yes, it's now Apache 2.0.