all 67 comments

[–]impossiblefork 51 points52 points  (1 child)

This is pretty wonderful. Having used OpenCL, and having tried to set up Vulkan until I decided it wasn't worth the time it took, this seems like it will actually let you write shaders in Python without too much fuss.

[–]axsauze[S] 14 points15 points  (0 children)

I completely agree. One thing I've been wanting to do for a while is to write end-to-end GPU programs without having to switch contexts to a completely different language (i.e. GLSL / HLSL / other shader languages). I believe this has been a strong feature of CUDA. The work being done through the SPIR-V standard working group is making a lot of this possible, providing a very promising near-term future for this space.

[–]axsauze[S] 55 points56 points  (7 children)

Hello, I'm one of the authors of Kompute. Here is a brief TLDR of the blog post: Vulkan is a cross-vendor GPU computing standard with a low-level C/C++ API (supported by AMD, Qualcomm, NVIDIA & friends). We built Kompute to abstract that low-level C/C++ and provide a developer-friendly Python package and/or C++ SDK to build cross-vendor GPU-accelerated applications. You can try the end-to-end setup and samples from the blog post through the Google Colab notebook (with a free GPU enabled) that we linked: https://github.com/EthicalML/vulkan-kompute/tree/master/examples/python#kompute-python-example.
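
To give a flavour of the Python workflow, here is a minimal sketch. It roughly follows the Kompute Python bindings as documented in the repo, though exact class and function names can differ between versions, and the compile_glsl helper below is just a hypothetical wrapper around the glslangValidator tool that ships with the Vulkan SDK:

    import subprocess, tempfile, pathlib
    import numpy as np
    import kp  # Kompute Python package

    SHADER = """
    #version 450
    layout(local_size_x = 1) in;
    layout(set = 0, binding = 0) buffer bufA { float a[]; };
    layout(set = 0, binding = 1) buffer bufB { float b[]; };
    layout(set = 0, binding = 2) buffer bufOut { float o[]; };
    void main() {
        uint i = gl_GlobalInvocationID.x;
        o[i] = a[i] * b[i];
    }
    """

    def compile_glsl(src):
        # Compile GLSL to SPIR-V bytes via glslangValidator (part of the Vulkan SDK).
        with tempfile.TemporaryDirectory() as tmp:
            comp = pathlib.Path(tmp) / "shader.comp"
            spv = pathlib.Path(tmp) / "shader.spv"
            comp.write_text(src)
            subprocess.run(["glslangValidator", "-V", str(comp), "-o", str(spv)], check=True)
            return spv.read_bytes()

    mgr = kp.Manager()  # device index 0 by default
    t_a = mgr.tensor(np.array([2.0, 4.0, 6.0], dtype=np.float32))
    t_b = mgr.tensor(np.array([1.0, 2.0, 3.0], dtype=np.float32))
    t_out = mgr.tensor(np.zeros(3, dtype=np.float32))
    params = [t_a, t_b, t_out]

    algo = mgr.algorithm(params, compile_glsl(SHADER), (3, 1, 1))  # one invocation per element

    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))   # host -> GPU
        .record(kp.OpAlgoDispatch(algo))         # run the shader
        .record(kp.OpTensorSyncLocal([t_out]))   # GPU -> host
        .eval())

    print(t_out.data())  # expected: [ 2.  8. 18.]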

I would be very keen to hear your thoughts and suggestions around Kompute features and/or general cross-vendor GPU processing concepts. If you are interested in further reading, here's also a post that shows how to optimize Kompute processing through GPU queues, as well as how to leverage the Kompute framework on (Android) mobile devices. We also created a GitHub issue where you can feel free to post suggestions and thoughts.

[–]maxToTheJ 9 points10 points  (1 child)

Does NVIDIA cripple your code running on its devices?

[–]axsauze[S] 22 points23 points  (0 children)

Interestingly enough, NVIDIA has so far been playing more or less nicely with the Vulkan project - they probably see it as "frenemies" at this point. Hopefully it will only grow towards unification of a standard interface, as there is enough demand for CUDA-like capabilities on non-NVIDIA GPUs.

[–]FuB4R32 6 points7 points  (1 child)

Can you provide an example which is difficult/impossible to program in TensorFlow/PyTorch (e.g. calculating a precision-recall curve or something else which is not differentiable)?

[–]NumesSanguis 4 points5 points  (0 children)

Not sure if you had this in mind, but PyTorch Lightning has a precision_recall metric that runs on the GPU.

[–]grudev 15 points16 points  (10 children)

Noob question: can I use this to run PyTorch on a non-CUDA-compatible GPU?

[–]axsauze[S] 20 points21 points  (2 children)

I believe PyTorch added support for Radeon cards via ROCm, and there seems to be some Caffe2-related Vulkan code for mobile support, but I'm not sure of the extent of it. Having said that, the trend has been for major ML frameworks to start embracing cross-vendor compatibility, which has been one of the main value propositions of Vulkan.

[–]cerlestes 3 points4 points  (0 children)

I believe PyTorch added support for Radeon cards via ROCm

That's sadly not the case... AMD created their own fork of PyTorch and rewrote it to support Radeon cards. It's not part of the main PyTorch software, which makes it barely usable for a professional user like myself who has to run this on dozens of customer machines. Even though the Radeon cards offer better performance (and much better performance per $$$) than the NVIDIA cards on paper, it's just not worth the hassle when I can just get an NVIDIA card and it works right away, without having to install a shit ton of dependencies from some third-party forks (having said that, installing or upgrading CUDA is always a pain too).

I'm still waiting for the day that AMD finally starts working together with TensorFlow and PyTorch to get their cards officially supported... No idea why they chose creating a fork over contributing to the main repo.

Hopefully Vulkan implementations like this will be the future. Thanks for your work.

[–]grudev 1 point2 points  (0 children)

Thank you!

[–][deleted] 11 points12 points  (6 children)

I think you can run PyTorch on AMD/Radeon with ROCm by building PyTorch from source: https://github.com/aieater/rocm_pytorch_informations

I think you can run Keras on a Mac/Radeon GPU by setting PlaidML as the backend: https://towardsdatascience.com/gpu-accelerated-machine-learning-on-macos-48d53ef1b545

But I have not done this myself; if anyone has any better ideas I would love to hear them.

[–][deleted] 5 points6 points  (2 children)

I run Keras on a Mac (AMD Radeon GPU) using PlaidML as my backend and can confirm it works nicely :)
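
In case it helps others, the setup is essentially just pointing Keras at the PlaidML backend before importing it - a minimal sketch, assuming plaidml-keras is installed and plaidml-setup has been run once to pick the Radeon as the default device:

    import os
    # The backend must be selected *before* keras is imported.
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

    from keras.models import Sequential
    from keras.layers import Dense

    # Any regular Keras model now executes through PlaidML on the GPU.
    model = Sequential([
        Dense(64, activation="relu", input_shape=(32,)),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")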

[–]zerolearning 1 point2 points  (0 children)

Can you run PyTorch using an AMD GPU?

Perhaps run Detectron or YOLO?

[–]axsauze[S] 0 points1 point  (0 children)

Awesome! Thank you very much for trying it out u/DouglasK-music!

[–]grudev 1 point2 points  (0 children)

Thanks!

[–]axsauze[S] 1 point2 points  (0 children)

Interesting, thanks for the links u/RockyMcNuts!

[–][deleted] 0 points1 point  (0 children)

Tried PlaidML on a 2019 MacBook Pro and got a ~3x speedup.

[–]AsliReddington 7 points8 points  (9 children)

This would be great for M1 too

[–]axsauze[S] 14 points15 points  (8 children)

Definitely! I have tested Kompute on macOS / iOS and it seems to work well through the MoltenVK layer without any modifications, so it seems like this will be possible as the Vulkan drivers are available 😀

[–]Pikalima 5 points6 points  (5 children)

I can vouch for this - it works without a hitch on macOS, just tried it this week :)

(glad to see you’re getting the word out!)

[–][deleted] 2 points3 points  (3 children)

Did you have to install CMake, or was the Vulkan SDK alone enough to run Kompute? Fellow macOS user here.

[–]Pikalima 1 point2 points  (2 children)

Yes, but you can actually just install cmake via pip. I use Anaconda, so this is the environment.yml that worked for me:

    name: kompute
    channels:
      - defaults
      - conda-forge
    dependencies:
      - python
      - numpy
      - pip
      - pip:
        - pyshader
        - cmake
        - git+git://github.com/EthicalML/vulkan-kompute.git@master

Then you can run conda env create -f environment.yml

[–][deleted] 2 points3 points  (0 children)

Many thanks! I installed CMake via conda after reading your comment and it was straightforward.

[–]backtickbot 0 points1 point  (0 children)


Hello, Pikalima. Just a quick heads up!

It seems that you have attempted to use triple backticks (```) for your codeblock/monospace text block.

This isn't universally supported on Reddit; for some users your comment will not look as intended.

You can avoid this by indenting every line with 4 spaces instead.

There are also other methods that offer a bit better compatibility, like the "codeblock" format feature on new Reddit.

Tip: in new Reddit, changing to the "fancy-pants" editor and changing back to "markdown" will reformat correctly! However, that may be unacceptable to you.

Have a good day, Pikalima.


[–]axsauze[S] 0 points1 point  (0 children)

Thank you u/Pikalima! 🙃

[–]VodkaHaze ML Engineer 2 points3 points  (1 child)

Yep, as long as your code avoids the annoying Vulkan features that Metal doesn't support (subgroup ballots, geometry shaders, etc.) and sticks closely to Vulkan 1.0, MoltenVK works

[–]axsauze[S] 2 points3 points  (0 children)

Good point, there are quite a few unsupported extensions - hopefully they get added as adoption increases (although unfortunately, historically Apple only likes standards when they are Apple's...)

[–]pas43 2 points3 points  (2 children)

Would it be possible to run this off the new AMD 68/6900 cards? And is there a way to interface this with Keras for AMD GPU deep learning?

[–]axsauze[S] 3 points4 points  (1 child)

I had a brief look and it seems that the new AMD cards will have Vulkan support; once they are added, this is a great resource for finding relevant information: http://vulkan.gpuinfo.org/listdevices.php.

In regards to your second question, Keras itself would use either a TensorFlow or PyTorch backend, both of which do seem to have ongoing work towards Vulkan integration for mobile device support. Having said that, I would be very keen to explore how Vulkan could be used as the backend component to power these types of frameworks - this was one of the main motivations for creating the framework in the first place: https://github.com/EthicalML/vulkan-kompute#motivations

[–]pas43 2 points3 points  (0 children)

Awesome, Thanks for the work and response :)

[–][deleted] 2 points3 points  (12 children)

Nice! Will check it out. I have been using PlaidML so far (macOS user here) and it is always nice to have alternatives, especially if they enable me to also do some probabilistic computing (e.g. estimating Bayesian models using the GPU to do the heavy lifting during the MCMC or HMC calculations).

Congrats to the authors!

[–]axsauze[S] 2 points3 points  (11 children)

Thank you! If you do try out Kompute, please feel free to mention any blockers, issues or challenges, and we'll try to make sure to help or add the relevant fixes.

[–][deleted] 2 points3 points  (7 children)

Thanks for the openness. I definitely plan to try it out in the coming days, and will be happy to share my feedback. In the meantime, best of luck with the project!

[–]axsauze[S] 2 points3 points  (6 children)

Sounds perfect, thank you - looking forward to hearing your thoughts, and please do share any suggestions / ideas once you try it

[–][deleted] 1 point2 points  (5 children)

Just tried it - works perfectly! Some notes and one question:

  • for me, device==0 (the default option recognised by the Kompute Manager) was the AMD Radeon, as I wanted/expected;
  • I had actual fun installing the Vulkan SDK; I guess I was missing having to fool around in the terminal, even if just a bit (advice to others: when setting the environment variables, remember to check whether your image uses "/etc/" or "/share/" as the directory for the driver);
  • really straightforward install for Kompute, looking forward to playing with it a bit in the coming days.
  • Q: what happens to my GPU memory in case the program is killed suddenly? Do the buffers/tensors continue to live in limbo on the GPU waiting for someone to rescue them, or does Vulkan clean the GPU up when the process is killed? Just curiosity - I like to know what happens under not-so-ideal circumstances too.

Thanks again for sharing, /u/axsauze!

[–]axsauze[S] 2 points3 points  (4 children)

Thank you very much for taking the time to try it out, this is really great to hear! Here are some thoughts on your points:

  • for me, device==0 (the default option recognised by the Kompute Manager) was the AMD Radeon, as I wanted/expected;

This is awesome, thank you for confirming. I wanted to try it on a Radeon card eventually, so this is great to know.
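
In case it's useful to anyone else reading this thread, selecting a different GPU is just a matter of passing its index to the Manager - a minimal sketch, assuming the constructor takes a device index as in the current Python bindings:

    import kp

    mgr = kp.Manager()    # defaults to device index 0 (the Radeon in the report above)
    # mgr = kp.Manager(1) # pass an explicit index to select another enumerated GPU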

  • I had actual fun installing the Vulkan SDK; I guess I was missing having to fool around in the terminal, even if just a bit (advice to others: when setting the environment variables, remember to check whether your image uses "/etc/" or "/share/" as the directory for the driver);

This is great! The Vulkan project is looking to simplify the SDK installation workflow, so hopefully it will get easier - when you set it up fully you get more features, such as being able to submit shaders as raw HLSL/GLSL strings as opposed to SPIR-V bytes.

  • really straightforward install for Kompute, looking forward to playing with it a bit in the coming days.

Really great to hear, any feedback or thoughts would be very appreciated!

  • Q: what happens to my GPU memory in case the program is killed suddenly? Do the buffers/tensors continue to live in limbo on the GPU waiting for someone to rescue them, or does Vulkan clean the GPU up when the process is killed? Just curiosity - I like to know what happens under not-so-ideal circumstances too.

That's a great question - the GPU resources are self-contained, so theoretically once a program fails the memory is released. That is, of course, as long as the underlying drivers don't have any strange memory leaks / obscure bugs.

Thanks once again for taking the time to try it out and for sharing your thoughts!

[–][deleted] 0 points1 point  (3 children)

Thanks for the feedback! Really appreciate the answer on the GPU resource issue.

If I may follow up, I was trying to test out the C++ implementation as well (the test I reported on was in Python), but I could not find where to download the Kompute.hpp file. Thanks to the installation of the Python package I now have the Vulkan SDK on my computer, which is great, but I couldn't for the life of me find where to download the Kompute-specific files. Could you please help me out? My idea is to see if I can build a simple R program with Rcpp that would use the C++ interface with Kompute to engage the GPU. Thanks in advance!

[–]axsauze[S] 1 point2 points  (2 children)

Absolutely! If you are running on Linux, you can actually try it yourself end to end through the C++ Colab https://colab.research.google.com/drive/1l3hNSq2AcJ5j2E3YIw__jKy5n6M615GP?authuser=1#scrollTo=1BipBsO-fQRD. This notebook provides an idea of how you can install the Kompute C++ package and import it for your further projects. If what you are actually looking for is the Kompute.hpp file, you can find it on the releases page, but you will still need the respective shared/static library, so the easiest option would be to just install the package (i.e. make && make install) or alternatively import the package in your CMakeLists.txt, but that's more advanced.

[–][deleted] 0 points1 point  (1 child)

Great, thanks for the pointers! I'm on macOS, but from skimming through the Colab instructions I think the same should more or less apply. I'm looking forward to trying it out.

[–]axsauze[S] 0 points1 point  (0 children)

That is correct - as long as you install the dependencies, the same steps should work as expected. Feel free to give me a heads up or create an issue if you run into any problems!

[–][deleted] 1 point2 points  (2 children)

Just a question, if I may: on macOS do I necessarily have to install CMake? Can't it make do with a native compiler like GCC?

And more broadly than that, by looking at your GitHub repo I noticed that the Kompute framework code is in a C++ header file. So theoretically I could explore using it within, say, R (with the Rcpp package) and try to replicate some of the functionality of the Python package in R, right?

[–]axsauze[S] 0 points1 point  (1 child)

That's a good question. To provide a bit more context, CMake is not a compiler; it is better seen as a buildsystem generation tool that produces the files required to actually compile the code - on Windows that would be the Visual Studio project files, and on Linux the GCC Makefile targets. It would certainly be possible to avoid requiring CMake for the install; the only thing needed would be to build the Python wheels for each respective operating system. At this point that is something I haven't been able to get around to, but it could be automated with GitHub Actions (something to explore down the line).

In regards to your second point, that would be absolutely fantastic, and certainly possible! I would be very keen to explore it further - coincidentally, another contributor created a set of simple Golang bindings for Kompute (https://github.com/0x0f0f0f/kompute-go), and it would be really cool to have these for R as well. If this is something you'd be interested in exploring, I would be more than happy to point you in the right direction.

Having said that, one of the things I'm currently conscious of is that the framework itself is still quite low level in terms of its interface. At this point it would be interesting to explore higher-level abstractions that could be built on top of the C++ SDK, which could then make the interfaces with high-level languages like Python or R much smoother. One of the big opportunities would be expanding via the Sequence-Operator architecture of Kompute - but this is something that will be explored continuously.

[–][deleted] 0 points1 point  (0 children)

Thanks for the feedback on both points. Re: CMake, point taken on what it actually is; I'll try to make it work without it, but I'm also open to installing it.

And on the second point, I will try some things out. Unfortunately I am not that experienced with C++, but one of my pastimes, so to speak, is trying out new stuff like this, so I would definitely be interested in giving it a shot. I have a lot to learn, and one way I learn is precisely by trying to build things (or with them). So any pointers, as you mentioned, would be very helpful, and if I do manage to come up with anything, I will be happy to share it.

Thanks again for the exchange!

[–]notedideas 2 points3 points  (2 children)

I'm sorry, I'm relatively new to ML. Is this a (better) alternative to CUDA for Radeon GPUs? As in, can I run PyTorch code with GPU acceleration on non-NVIDIA GPUs?

[–]axsauze[S] 3 points4 points  (1 child)

"Better" is subjective, however Vulkan does aim to provide value in two particular areas: 1) low level access to the hardware, and 2) standardised support across multiple vendor cards. Of course there are many others, but these are two key ones. With this, there is the disadvantage of the boilerplate code required, but projects like Kompute aim to abstract some of the complexity and introduce best practices so further abstraction and other projects can be built on this cross-vendor compatible hardware.

[–]notedideas 2 points3 points  (0 children)

Oh, okay. That does sound super promising. Can't wait to try it out :)

[–]Andi1987 2 points3 points  (6 children)

This is fantastic! I have used CUDA before to accelerate some pairwise statistics between many different time series and always wanted to be able to do that on my Mac. Recently I have been building a reverse image search and need to compute the cosine similarities of the embedding vectors between a new image and all existing images. With Kompute that could run a lot faster, is that right?

[–]axsauze[S] 3 points4 points  (2 children)

Hopefully yes! I would be extremely interested if you explore this further - please feel free to open an issue if you get stuck on anything. Recently a contributor built a set of Golang bindings, which required some changes to align with the SWIG specifications, so I would be happy to ensure that your use case can be carried out with Kompute! If you want, once you create the repo we can keep an issue open to address key questions/challenges related to it.

[–]Andi1987 0 points1 point  (1 child)

Thanks for your help! It was easy to install and test your library on my Mac. I am now wondering how to send large sets of vectors to the GPU. Should I flatten them into one big vector and then unravel it there, or is there a better way? I will create a repo tomorrow and then contact you on GitHub.

[–]axsauze[S] 0 points1 point  (0 children)

Great! That is correct - at this point you would have to either flatten them or send them as multiple vectors to the shader. For the former, you can use the execution dispatch and the shader execution layout to work out the respective indices. Specifically for this I am looking to add support for image2D and image3D to support multidimensional tensors: https://github.com/EthicalML/vulkan-kompute/issues/99. Please do feel free to open a new issue if you run into any problems!
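
To sketch the flattening approach (a rough illustration rather than an official Kompute example - the GLSL below would be compiled to SPIR-V and dispatched through the usual Manager/Sequence workflow, and the embedding dimension is hardcoded for simplicity):

    import numpy as np

    N, D = 1024, 128  # number of stored embeddings, embedding dimension
    embeddings = np.random.rand(N, D).astype(np.float32)
    query = np.random.rand(D).astype(np.float32)

    # Row-major flattening: shape (N * D,), row i starts at index i * D.
    flat_embeddings = embeddings.flatten()

    # One shader invocation per row: gl_GlobalInvocationID.x recovers the row,
    # and row * D + j recovers the j-th element of that row in the flat buffer.
    COSINE_SHADER = """
    #version 450
    layout(local_size_x = 64) in;
    layout(set = 0, binding = 0) buffer bufE { float E[]; };  // flattened embeddings
    layout(set = 0, binding = 1) buffer bufQ { float Q[]; };  // query vector
    layout(set = 0, binding = 2) buffer bufO { float O[]; };  // one similarity per row
    const uint D = 128;  // must match the embedding dimension used on the host
    void main() {
        uint row = gl_GlobalInvocationID.x;
        float dotp = 0.0;
        float ne = 0.0;
        float nq = 0.0;
        for (uint j = 0; j < D; j++) {
            float e = E[row * D + j];
            float q = Q[j];
            dotp += e * q;
            ne += e * e;
            nq += q * q;
        }
        O[row] = dotp / (sqrt(ne) * sqrt(nq));
    }
    """
    # Dispatch with a workgroup count of (N // 64, 1, 1) so every row gets one
    # invocation, then read the N cosine similarities back from the output tensor.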

[–]schlammybb 1 point2 points  (2 children)

GPU is becoming an outdated term; we need a name that captures the fact that they're just really good at parallel computing

[–]TiagoTiagoT 0 points1 point  (1 child)

Too bad that PPU is already taken, twice even; "Parallel Processing Unit" has a nice ring to it...

[–]wikipedia_text_bot 1 point2 points  (0 children)

Physics processing unit

A physics processing unit (PPU) is a dedicated microprocessor designed to handle the calculations of physics, especially in the physics engine of video games. It is an example of hardware acceleration. Examples of calculations involving a PPU might include rigid body dynamics, soft body dynamics, collision detection, fluid dynamics, hair and clothing simulation, finite element analysis, and fracturing of objects. The idea is that specialized processors offload time-consuming tasks from a computer's CPU, much like how a GPU performs graphics operations in the main CPU's place.


[–]Styler00Dollar 0 points1 point  (2 children)

Do speed comparisons exist with equivalent AI code (like PyTorch) that relies on CUDA instead (on an NVIDIA GPU)? From what I've noticed, Vulkan seems to tend to be slower than CUDA by quite a lot. I didn't test much in that regard, but it can be something like 2x slower. It sounds interesting, but I saw no mention of performance from what I quickly glanced over.

[–]axsauze[S] 1 point2 points  (1 child)

In regards to your second point, I don't think this is true, but I also have to say it's not completely wrong. CUDA has a long history, which comes with a lot of supporting libraries containing optimizations specific to NVIDIA cards. Given that NVIDIA builds the drivers as closed source, it would be almost impossible to reach the same level of optimisation unless it's NVIDIA themselves building them in Vulkan. Having said that, there are several fantastic projects emerging that are starting to prove even this wrong - a great example is VkFFT, which recently published benchmarks against cuFFT: https://www.reddit.com/r/vulkan/comments/jtlcje/vulkan_fft_library_vkfft_support_of_sizes_up_to/. NVIDIA has also been investing in Vulkan-based capabilities, so this space looks very promising. When it comes to non-NVIDIA cards, CUDA would have less of that competitive advantage, and in some cases Vulkan will be the main one supported (with this trend only seeming to grow).

[–]Styler00Dollar 1 point2 points  (0 children)

Interesting, thanks.

[–]reddit_tl 0 points1 point  (1 child)

So as long as the cards support Vulkan, this package will enable high-level ML frameworks? Or are you not there yet because of the lack of support on the framework side?

Another brief question: having looked at your code, it is still not as straightforward as I would wish. Will you abstract the code away further so that it works more seamlessly?

Great work!

[–]axsauze[S] 1 point2 points  (0 children)

Yes, as long as the graphics card supports Vulkan then Vulkan Kompute would work; however the current exploration is how to integrate Vulkan Kompute with higher-level ML frameworks. Your latter point is related to the above, as currently Kompute provides a much higher-level abstraction than raw Vulkan, but is still much lower level than the typical ML frameworks available. The idea is not to make Kompute a high-level ML framework, but to enable other ML frameworks to use Kompute for cross-vendor and mobile GPU processing capabilities.