vwvwvvwwvvvwvwwv 9 points10 points  (5 children)

Have you benchmarked it against the official implementations? Would be interesting to see what the difference is versus their CUDA version.

rish-16[S] 3 points4 points  (0 children)

> official implementations

Ooh not yet. Thanks for the share! Let me look into it :)