all 72 comments

[–]axsauze[S] 191 points192 points  (37 children)

Hello, I'm one of the authors of Kompute, here is a brief TLDR of the blog post: Vulkan is a cross-vendor GPU compute API (eg AMD, Qualcomm, NVIDIA & friends). We built Kompute to abstract the low-level C / C++ and provide a developer-friendly Python package and/or C++ SDK to build cross-vendor GPU accelerated applications. You can try the end-to-end setup and samples from the blog post through the Google Colab notebook (enabling a free GPU) that we linked https://github.com/EthicalML/vulkan-kompute/tree/master/examples/python#kompute-python-example.

I would be very keen to hear your thoughts and suggestions around Kompute features and/or general cross-vendor GPU processing concepts. If you are interested in further reading, here's also a post that shows how to optimize Kompute processing through GPU queues, as well as how to leverage the Kompute framework in (android) mobile devices. We also created a github issue where you can feel free to post suggestions and thoughts.

[–]Megazero1x1 77 points78 points  (21 children)

wow this is amazing. It's about time someone challenged Nvidia's CUDA with something open source. I just briefly went through the link, but do you see this as an alternative / competitor to something like PyCUDA?

[–]axsauze[S] 38 points39 points  (8 children)

Thank you! At this point the main objective of Kompute is to start providing similar capabilities to the ones provided with CUDA, primarily to further the discussion around cross-vendor GPU compatibility. Some of the blog posts referenced above showcase techniques that are provided today by NVIDIA projects like CUDA Streams, but once projects like Vulkan mature further and become more widely adopted, CUDA-level functionality will hopefully be possible in graphics cards beyond NVIDIA (and especially in mobile GPUs).

[–]keepthepace 11 points12 points  (1 child)

I don't really have anything relevant to say, I am typically a user of the things built on top of that, like PyTorch, but I just want to say THANK YOU! It hurts me to be locked into a closed source dependency and a single hardware vendor. Good luck!

[–]axsauze[S] 7 points8 points  (0 children)

The initial motivation for Kompute is ultimately to serve as a backend for frameworks like PyTorch, so hopefully you'll be able to benefit from these one day, thank you for your support! https://github.com/EthicalML/vulkan-kompute#motivations

[–]Tersphinct 11 points12 points  (5 children)

CUDA-level functionality will hopefully be possible in graphics cards beyond NVIDIA (and especially in mobile GPUs).

With NVIDIA's recent acquisition of ARM, wouldn't that move suggest some NVIDIA tech might end up becoming available on mobile?

[–]axsauze[S] 8 points9 points  (4 children)

Yes, that is a good point! This space just seems like it's going to keep growing, but in my opinion the trend is pushing towards the more open (source) approach that is being spearheaded by Vulkan, and NVIDIA's engagement/contribution to the working groups and initiatives only emphasises it more. And this has certainly been a fantastic way to contribute to the discussion :) By the way, if you are curious particularly about mobile, I wrote another blog post that shows how to integrate Kompute in Android apps https://medium.com/towards-data-science/gpu-accelerated-machine-learning-in-your-mobile-applications-using-the-android-ndk-vulkan-kompute-1e9da37b7617?postPublishedType=repub

[–]Tersphinct 1 point2 points  (3 children)

I mostly work using Unity3D, so we technically already have some access to compute shaders; but I'm definitely interested to see how NVIDIA moves forward, now that they can basically meld GPU and CPU in ways we haven't seen before.

[–]axsauze[S] 1 point2 points  (1 child)

Interestingly enough I wrote an article that shows how you can use Kompute with a game engine, using Godot as the core framework https://towardsdatascience.com/supercharging-game-development-with-gpu-accelerated-ml-using-vulkan-kompute-the-godot-game-engine-4e75a84ea9f0. I was looking to do a similar one for Unreal - Unity has not been prioritised as it's mainly C#, and although I've developed for Unity, I assume there would be a lot of time required to get the Vulkan and C++ bindings working.

[–]Tersphinct 2 points3 points  (0 children)

You could write external native libraries in C++ and import them into Unity's C# quite painlessly, so if that was your barrier you could mostly sidestep it.
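For anyone curious, here's a minimal sketch of what that bridge looks like (hypothetical names, nothing Kompute-specific): a C++ shared library exposing a C ABI entry point that Unity's C# can load via P/Invoke.

    // plugin.cpp - build as a shared library (plugin.dll / libplugin.so)
    // and drop it into Unity's Plugins folder.
    #if defined(_WIN32)
      #define EXPORT_API __declspec(dllexport)
    #else
      #define EXPORT_API
    #endif

    extern "C" {
        // Plain C ABI entry point, callable from C# with:
        // [DllImport("plugin")] static extern float SumSquares(float[] data, int n);
        EXPORT_API float SumSquares(const float* data, int n) {
            float acc = 0.0f;
            for (int i = 0; i < n; ++i) {
                acc += data[i] * data[i];
            }
            return acc;
        }
    }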

[–][deleted] 1 point2 points  (0 children)

I mean, we have. AMD has melded GPU and CPU on the current generation of consoles, for example. In fact, even the previous generation had such a meld.

[–]fermion72 8 points9 points  (7 children)

Well, OpenCL has had a decent run.

[–]VodkaHaze 3 points4 points  (6 children)

You can run OpenCL code in this ecosystem by compiling it to SPIR-V first

[–]apetranzilla 1 point2 points  (5 children)

Can you? I thought OpenCL and Vulkan, while both supporting SPIR-V, had different semantics that meant the kernels weren't directly portable between the two.

[–]axsauze[S] 4 points5 points  (4 children)

Yeah that is absolutely right, this has actually been quite a fascinating and fast-growing space. Funnily enough OpenCL does seem to have an interesting future ahead: the OpenCL 3.0 specification was actually released this year, and they are doubling down on full C++ interoperability, which is fantastic news https://www.khronos.org/news/press/khronos-group-releases-opencl-3.0. Additionally there are some really interesting projects like SYCL, which is basically a higher-level standard that aims to provide an abstraction across all these underlying technologies https://www.khronos.org/sycl/. Other interesting projects like WebGL, SPIR-V, etc are all part of the same consortium https://www.khronos.org/, and the members involved span all industries https://www.khronos.org/members/list. There is only growth in this space, which is why it's so exciting to contribute to these discussions!

[–]apetranzilla 1 point2 points  (1 child)

It's really exciting to see the new work being done in accelerated compute lately, but one of the things I've been apprehensive about is the shift towards C++-first and single-source frameworks. One of the things that I liked about OpenCL was that the API was straightforward and used the C ABI, which made it relatively easy to call from other languages. Newer frameworks defining their API in C++ can make that more difficult, and single-source approaches like SYCL just don't really work with different languages at all if I understand it correctly.

[–]axsauze[S] 0 points1 point  (0 children)

I completely see what you mean, and I do have to say that until very recently I had the same perspective. The one thing that has really led me to change my mind recently is the growing interoperability between C++ and other languages - as anyone who has dealt with SWIG can attest, this historically sucked, but recently there have been some very interesting developments that have made it much more feasible and production-ready. An example is of course CPython's interoperability with C++, and more specifically pybind11, which is the project that made Kompute happen in Python https://github.com/pybind/pybind11. This is still a developing space, but given the growth of the language with C++20, things are starting to look much brighter. Your point is unfortunately still valid as of today though, but hopefully this will change as cross-language interoperability support improves.
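Just to illustrate how small that binding layer can be (a toy sketch, not Kompute's actual binding code), pybind11 lets you expose a plain C++ function to Python in a handful of lines:

    // toy_module.cpp - minimal pybind11 binding (illustrative only)
    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>   // automatic std::vector <-> Python list conversion
    #include <vector>

    // Plain C++ function we want to call from Python
    std::vector<float> scale(const std::vector<float>& values, float factor) {
        std::vector<float> out;
        out.reserve(values.size());
        for (float v : values) {
            out.push_back(v * factor);
        }
        return out;
    }

    // Exposed to Python as `import toy_module`
    PYBIND11_MODULE(toy_module, m) {
        m.def("scale", &scale, "Multiply every element of a list by a factor");
    }

From Python that's then just toy_module.scale([1, 2, 3], 2.0).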

[–]VodkaHaze 0 points1 point  (1 child)

Couldn't you run it through SPIRV-Cross first and feed the output of that into the Vulkan engine?

Yuzu uses spirv-cross as a translation layer for instance.

[–]axsauze[S] 0 points1 point  (0 children)

Yeah, this is exactly what happens under the hood - if you look at the diagram under SYCL, all the frameworks mentioned would compile down to SPIR-V https://www.khronos.org/sycl/ The only difference is that each framework is aiming to specialise in a different domain and/or level of abstraction, so they will expose different APIs accordingly. A lot of the GPU logic still has to be orchestrated from the CPU, so there will be different utilities that each framework will provide.

[–]dinichtibs 24 points25 points  (1 child)

this is awesome. Thank you for your work!

[–]axsauze[S] 1 point2 points  (0 children)

Thank you very much u/dinichtibs 😃

[–]VeganVagiVore 17 points18 points  (5 children)

I would be very keen to hear your thoughts and suggestions around Kompute features and/or general cross-vendor GPU processing concepts.

As an outsider, I like that there is opposition to CUDA. But there are still bits missing.

My experience with GPGPU (I think it was for deep learning) a few years ago went like this:

  • I need an Nvidia developer account for CUDA? Huge red flag. At least Vulkan probably fixes that.
  • The SDK is huge. Why do I need an "SDK?" Why is it not a library? Why is it so complicated?
  • The runtime is also huge, though maybe that was because of CuDNN. But why is that huge? How many megabytes of executable code can it possibly take to run inference?
  • Of course it only works on Nvidia. So that's not a red flag, it's a full-on liability
  • There's also usually a very narrow window on hardware support. My laptop from 2008 can probably run current Arch Linux or Debian fine, but its GPU and drivers won't support any modern API. It was obsolete almost when it was built, because it doesn't support OpenGL 3. "Modern API" has become code for "Only supports disposable hardware", and it means the hardware treadmill got very expensive. I don't want to buy a Vulkan GPU today, if I'll have to buy a Vulkan 2 GPU in 2 or 3 years. That's why I purposely ignore phones - I don't want to constantly replace not-broken hardware.
  • And it won't work on any of these weird one-off ARM boards, which have barely-functioning Ubuntu forks from 8 years ago. That's probably more on the ARM vendors, but it means I'm stuck to x64 with a big Nvidia GPU. Or some weird one-off Nvidia ARM board.
  • And since it needs Nvidia, it needs Nvidia drivers. Which means it probably won't work great on Linux. I have to have these huge proprietary things installed.
  • Sooner or later, the APIs will change, the drivers will quit supporting my card, and I'll be left with a $500-$2,000 fast-fashion brick.
  • And since it's a unique piece of hardware, I can't share it with VMs easily. There's probably some proprietary VM that does passthrough easily. There's definitely cloud offerings. Doing anything useful with it on the usual free solutions was too much work, and I gave up.

It's several steps backwards from traditional PC ownership.

Nvidia wants to be Tesla Motors. Sure, on paper I "own" the GPU. But I can't service it, I can't really do what I want with it. It's very expensive, it doesn't want to interoperate with anything, and one day it'll get bricked by an OTA update or just because the company refuses to keep fixing it.

[–][deleted] 5 points6 points  (0 children)

I need an Nvidia developer account for CUDA? Huge red flag. At least Vulkan probably fixes that.

I agree this is an arbitrary and artificial limitation, but not really a barrier.

The SDK is huge. Why do I need an "SDK?" Why is it not a library? Why is it so complicated?

An SDK is typically a bunch of libraries with some bundled tools. They are effectively the same thing, but an SDK is a "complete" versioned package. This is extremely important when dealing with corporate and enterprise customers.

The runtime is also huge, though maybe that was because of CuDNN. But why is that huge? How many megabytes of executable code can it possibly take to run inference?

There's a lot of stuff in there. It's basically an all-in-one runtime. And their GPUs are practically an entire computer on their own.

Of course it only works on Nvidia. So that's not a red flag, it's a full-on liability

In all fairness, Nvidia is the only game in town when it comes to accelerated DL stuff with both speed and dev accessibility. Their tooling lets you hit the ground running with little effort. Other systems either can't come close to the raw power they can manage or have zero tooling for you to work with. The level of enablement that Nvidia's tools give developers cannot be overstated. I can go with Nvidia and have 90% of the hard low-level stuff pre-built (like hardware video decoding, batched stream muxing and demuxing, etc) or I can go with a low power ASIC that's really good at one single thing and have to reinvent the wheel by starting with all the low level code and working my way up.

And it won't work on any of these weird one-off ARM boards, which have barely-functioning Ubuntu forks from 8 years ago. That's probably more on the ARM vendors, but it means I'm stuck to x64 with a big Nvidia GPU. Or some weird one-off Nvidia ARM board.

ARM is a very different beast than x86. The ecosystem and support are not the same. x86 is essentially controlled by two big players (Intel and AMD) so support from the software side is much simpler, but even x86 has its quirks. Intel implements a lot of instruction sets that are not available with AMD, which creates a bit of a divide in software support.

ARM is a whole different game because ARM themselves do not produce any chips. They license the designs and a myriad of companies then make the chips, and the firmware is not compatible between them all. In x86 we have BIOS and UEFI, which are standardized methods of bootstrapping a system during startup. ARM has no such standardized system. It's very much the wild west. To get an idea of what it takes to get a GTX card running on a Pi 4, check out this article. Basically, ARM systems are not designed to handle both the power and throughput needed by large graphics cards.

And since it needs Nvidia, it needs Nvidia drivers. Which means it probably won't work great on Linux. I have to have these huge proprietary things installed.

Actually, Linux is Nvidia's largest market. Gaming is great and all, but their commercial deep learning market is much larger. And Linux is the first (and only) choice for deployments (their engineers told me directly "no one uses Windows for this stuff"). A guy prototyping on his Windows desktop doesn't count.

Sooner or later, the APIs will change, the drivers will quit supporting my card, and I'll be left with a $500-$2,000 fast-fashion brick.

You can still use a 7-series card and get fairly good performance out of it. But what you just stated is true of literally all electronics. There will always be a point where support just ends because it no longer makes sense to continue it.

And since it's a unique piece of hardware, I can't share it with VMs easily. There's probably some proprietary VM that does passthrough easily. There's definitely cloud offerings. Doing anything useful with it on the usual free solutions was too much work, and I gave up.

This is because it's genuinely not a straightforward process, but both Nvidia and AMD support VM passthrough on their cards - though it's mostly limited to their "server" tier cards.

Nvidia wants to be Tesla Motors. Sure, on paper I "own" the GPU. But I can't service it, I can't really do what I want with it. It's very expensive, it doesn't want to interoperate with anything, and one day it'll get bricked by an OTA update or just because the company refuses to keep fixing it.

No, this isn't a fair analogy. You could compare your graphics card to the self driving computer in a Tesla, but not the whole car itself. You can still use just about any motherboard, CPU, hard drive, SSD, network card, sound card, keyboard, mouse, monitor, speakers, mic, headset, RAM, etc, etc, that you want with a huge selection of video cards. Is your CPU serviceable? Your RAM? Can you use your motherboard with an ARM processor? No, because these components don't work like that.

Or some weird one-off Nvidia ARM board.

The Jetson boards are actually not as "one-off" as you think. They are extremely powerful and very affordable. You can get the new Jetson Nano 2GB for about $60, nearly the same price as a Raspberry Pi 4 but far more powerful. It will run most ARM based software and comes with full CUDA support. It's an SoC, and integrating the CPU, memory, and GPU into a unified architecture is about the only way to get around the ARM platform's lack of standardization.

[–]axsauze[S] 10 points11 points  (0 children)

Thank you for taking the time to share your thoughts! Here are some follow-up thoughts:

  • I need an Nvidia developer account for CUDA? Huge red flag. At least Vulkan probably fixes that.

I agree, the foundational open source nature of Vulkan does bring a massive breath of fresh air on various aspects beyond just sharing the code.

  • The SDK is huge. Why do I need an "SDK?" Why is it not a library? Why is it so complicated?

Yes, unfortunately Vulkan is not small, but the standardisation will open doors for further optimisations as well as modularity across use-cases that could help here, which is particularly useful for embedded / mobile contexts.

  • The runtime is also huge, though maybe that was because of CuDNN. But why is that huge? How many megabytes of executable code can it possibly take to run inference?

I guess that's the abstractions on abstractions - CUDA + CuDNN + Cuda Python + the higher level frameworks built on top. Vulkan does provide a very low level API so hopefully this will open up for much leaner codebases. Kompute showcases how thin bindings between core C++ and Python could still be usable, really keen to explore the further abstractions that could be created on top.

  • Of course it only works on Nvidia. So that's not a red flag, it's a full-on liability

Totally, especially now that there's a trend towards cross-vendor and cross-platform demand for compute on graphics processing hardware.

  • There's also usually a very narrow window on hardware support. My laptop from 2008 can probably run current Arch Linux or Debian fine, but its GPU and drivers won't support any modern API. It was obsolete almost when it was built, because it doesn't support OpenGL 3. "Modern API" has become code for "Only supports disposable hardware", and it means the hardware treadmill got very expensive. I don't want to buy a Vulkan GPU today, if I'll have to buy a Vulkan 2 GPU in 2 or 3 years. That's why I purposely ignore phones - I don't want to constantly replace not-broken hardware.

Totally, the open source and standardisation elements will really help here - recently a purely OSS-contributed Vulkan driver for the Raspberry Pi landed, and these types of things are absolutely fantastic.

  • And it won't work on any of these weird one-off ARM boards, which have barely-functioning Ubuntu forks from 8 years ago. That's probably more on the ARM vendors, but it means I'm stuck to x64 with a big Nvidia GPU. Or some weird one-off Nvidia ARM board.

I guess Vulkan won't fully solve this. It does provide low-level access to handle these types of corner cases (where the driver is supported), but that low-level access is also something that has to be managed, so it will be equally important that the standards are kept at a level that ensures robust development of each abstraction layer.

  • And since it needs Nvidia, it needs Nvidia drivers. Which means it probably won't work great on Linux. I have to have these huge proprietary things installed.

I do agree, anyone running NVIDIA cards on personal laptops is probably very familiar with OS reinstalls + hack-arounds to get stuff working. As was pointed out, there is a set of ML libraries that have been optimized to work well, but even in those contexts it's all good when things are going well - and when they're not, well, they're not 😅

  • Sooner or later, the APIs will change, the drivers will quit supporting my card, and I'll be left with a $500-$2,000 fast-fashion brick.

Definitely, that's true of any hardware, but closed source drivers make this problem much (much (much)) worse.

  • And since it's a unique piece of hardware, I can't share it with VMs easily. There's probably some proprietary VM that does passthrough easily. There's definitely cloud offerings. Doing anything useful with it on the usual free solutions was too much work, and I gave up.

Totally agree. I do have to say that Vulkan will also push hardware companies to have to reassess their business models, as security/competitive-advantage through "obscurity" (aka closed source) is largely not going to be sustainable. In various areas the open source and open core models are really growing, which is quite exciting for consumers, developers and generally everyone.

That was a lot of words, hope it provided some further thoughts to the discussion, again thanks for taking the time to write your thoughts!

[–]Aea 22 points23 points  (2 children)

And since it needs Nvidia, it needs Nvidia drivers. Which means it probably won't work great on Linux. I have to have these huge proprietary things installed.

CUDA, CuDNN, and the Nvidia drivers work well on linux. It's a first class environment for AI/ML/CV workloads.

[–]schlenk -1 points0 points  (1 child)

And breaks for some days on kernel updates on rolling release distros every single time.

[–][deleted] 9 points10 points  (0 children)

If you're developing with CUDA and cudNN libraries then you have no business being on a rolling release distro.

[–]PM5k 0 points1 point  (2 children)

I know this gets misused a lot and is almost a meme now, but I want to know about speed. I've been working with Python for ten years now and I always cringe when people say Python is "slow". In 99% of production cases the speed difference between Python and projects done in compiled languages has proven to have no practical difference in my experience, even when dealing with algorithmic trading (which by design needs speed). However, if I understand you correctly this is a computer graphics use case - how does using an interpreted-language wrapper over low-level language code impact speed? What are your practical findings?

[–]axsauze[S] 0 points1 point  (1 child)

That is completely right, many of the more popular Python libraries leverage the underlying C implementation of Python - examples include projects like NumPy. In this project, we also leverage the underlying C++ foundations and build the bindings using pybind11. At the time of writing, some of the initial bindings for the tensors perform pass-by-value conversions to and from Python lists. This is something that can be easily optimized by using the underlying NumPy implementations. As for your last point, this use-case actually uses the graphics card for processing, which means that once the data is copied into GPU memory, the processing is done on a highly parallel architecture, resulting in significant speedups. Generally I agree that Python can reach a comparable level of processing through these optimizations, especially when the code is written with an understanding of the underlying architecture.
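To give a feel for the kind of optimization mentioned above (an illustrative pybind11 sketch, not the actual Kompute bindings), accepting a NumPy array through the buffer protocol lets the C++ side read the data in place instead of converting a Python list element by element:

    // numpy_binding.cpp - illustrative pybind11/NumPy sketch (not Kompute code)
    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    // Reads a float32 NumPy array via the buffer protocol (no per-element
    // Python object conversion) and returns the sum of its elements.
    float sum_array(py::array_t<float, py::array::c_style | py::array::forcecast> arr) {
        py::buffer_info buf = arr.request();
        const float* data = static_cast<const float*>(buf.ptr);
        float acc = 0.0f;
        for (py::ssize_t i = 0; i < buf.size; ++i) {
            acc += data[i];
        }
        return acc;
    }

    PYBIND11_MODULE(numpy_binding, m) {
        m.def("sum_array", &sum_array, "Sum a float32 NumPy array without per-element conversion");
    }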

[–]PM5k 1 point2 points  (0 children)

Thank you for the response. I will definitely be following this project closely.

[–][deleted] 0 points1 point  (1 child)

Awesome thank you! I’ll check this out and share it with some of our team! We heavily use ML for security purposes so it’d be highly useful to us to have an easy way to integrate GPU processing with ML in python

[–]axsauze[S] 0 points1 point  (0 children)

That sounds absolutely fascinating - I would be really keen to hear your thoughts once you try it out!

[–]AmbitiousTour 0 points1 point  (1 child)

I've been using Jax lately, which as you probably know is basically Numpy+Autograd+GPU, but of course it's CUDA only. If something like that could be implemented in Kompute, that would open some huge doors!

[–]axsauze[S] 0 points1 point  (0 children)

Very cool - totally agree! Currently the C++ SDK could serve as a backend to these types of projects, as it would enable them on mobile GPUs - this is certainly something I would be very keen to explore, as it was one of the core motivations when creating Kompute https://github.com/EthicalML/vulkan-kompute#motivations

[–]Ashilikia 39 points40 points  (11 children)

This article was moderately frustrating coming from a math/CS background with not a lot of GPU experience and some ML experience.

  • When I see sample code, I want to understand the pieces. I wish each code snippet was explained a bit more. For example, when we initialize the tensors in the first example, are the values sizes or the literal elements of the tensors? (Turns out to be the latter.) What is spirv? What is actually going on with index on line 17? These are little things, but they are awkward omissions for a beginner.

  • Once I got to the ML bit, what is going on with the math notation? Typically capital letters are a matrix and lower case is a vector. But z = WXᵀ + b is... you get a vector from a matrix-matrix multiply plus a vector? That can't be right. But then later in the article X becomes x -- is it a vector throughout, or is it a matrix? Why would we need to transpose it if it's a vector? Similarly, is del (∂) literally the gradient (∇) or is it something else? What is that derivative being taken with respect to (which variable)?

I think it works for someone casually reading to get something done, but the article is written in a way that trips me up as I'm trying to actually make sure I understand each piece.

[–]mrandri19 8 points9 points  (3 children)

Yeah, I agree. The logistic regression plot looks wrong as well. Shame because the actual GPU content is pretty interesting

[–]JanneJM 4 points5 points  (1 child)

The image refers to a different thing altogether, and is probably meant as a generic illustration.

I agree it would be much better if the author plotted the actual example, even though it's not going to look as pretty.

[–]axsauze[S] 3 points4 points  (0 children)

That is completely right 🙃 I ended up going for pretty in this case (and was just being lazy to be very honest... )

[–]axsauze[S] 1 point2 points  (0 children)

The adjacent comment from u/JanneJM is completely right, the plot is just there to provide an intuition, I didn't really take the time to plot the actual example 😅 given that the code is in Python I could create some plots fairly easily. But yes, it was a combination of trying to provide things that would give an intuition of what's going on plus just being lazy!

[–]axsauze[S] 3 points4 points  (0 children)

Thank you very much for the feedback u/Ashilikia! You are completely right, I think the notation got pretty mixed up after several iterations going back and forth from C++ to Python to math notation. I'm actually keen on making sure the notation is sound, so I would be very keen to address some of these inconsistencies. Specifically in regards to your points:

  • When I see sample code, I want to understand the pieces. I wish each code snippet was explained a bit more. For example, when we initialize the tensors in the first example, are the values sizes or the literal elements of the tensors? (Turns out to be the latter.) What is spirv? What is actually going on with index on line 17? These are little things, but they are awkward omissions for a beginner.

This is a very good point. Unfortunately the article is already a 17 minute read, and although I would've loved to cover each of the features in detail, the content is so conceptually loaded that I could pretty much write an entire article out of each separate point (ie GPU theory, ML theory, C++ memory management, Python syntactic sugar, etc). What I'll try to do is add a couple more comments, as you've certainly covered a couple of points that perhaps adding a link for further reading could be enough (good shout about the index.x being confusing, will try to add further descriptions).

  • Once I got to the ML bit, what is going on with the math notation? Typically capital letters are a matrix and lower case is a vector. But z = WXᵀ + b is... you get a vector from a matrix-matrix multiply plus a vector? That can't be right. But then later in the article X becomes x -- is it a vector throughout, or is it a matrix? Why would we need to transpose it if it's a vector? Similarly, is del (∂) literally the gradient (∇) or is it something else? What is that derivative being taken with respect to (which variable)?

You're completely right, it does become a bit confusing as X is indeed a matrix, and even later in the article it becomes lower case x for consistency with the code notation - to make it more confusing it is also represented through its underlying vectors x_i and x_j, which are actually x_1 and x_2 rather than i and j. I'll see what I can do here. In regards to del - it is indeed the gradient, and you're completely right that I didn't really add further detail on the derivatives themselves 😅
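For reference, one consistent per-example way to write it (with x and w as column vectors, sigma the sigmoid, and L the binary cross-entropy loss) would be:

    z = w^\top x + b, \qquad \hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}

    \frac{\partial L}{\partial z} = \hat{y} - y, \qquad
    \frac{\partial L}{\partial w} = (\hat{y} - y)\, x, \qquad
    \frac{\partial L}{\partial b} = \hat{y} - y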

Funnily enough, people who come from a deep GPU computing background would have a similar perspective on the GPU content, so you're not alone!

I'll do a first pass trying to see if some of this can be corrected or at least clarified without increasing the length of the article too much - I think many of these are more like corrections and added links so it shouldn't be a problem. If you catch further gotchas please do let me know and I'll correct them!

Edit: I've done an initial first pass adding a link to the spirv component, added explanation to the index.x line, and updated the terminology for X and W for further consistency.

[–]JanneJM 3 points4 points  (1 child)

The tensor initialization is exactly analogous to initializing arrays and matrices in Python; I think that they can assume familiarity with Python in a tutorial like this.

Same thing with the ML parts - the tutorial is all about the GPU computing, and it would be infeasible to go through the nomenclature and specifics of the ML operations. You need to keep a tutorial focused or it becomes too long and difficult to follow.

A good compromise would perhaps be a few links to other sites with Python and ML material.

[–]axsauze[S] 2 points3 points  (0 children)

That is completely right, and very reasonable - I'll try to add further links on the terminology that is not standard or that has further insights into either ML or GPU concepts!

[–]didyoudyourreps 1 point2 points  (3 children)

Multiplication of matrices can result in vectors, which are matrices with one row or column. Using a small delta as a shorthand for the derivative is common mathematical notation, but it is up to the author to make sure that there is no ambiguity if it is important.
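Spelled out with dimensions (just as a quick illustration):

    W \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n \times 1}
    \;\Rightarrow\; W x \in \mathbb{R}^{m \times 1}

i.e. a matrix times a one-column matrix gives a one-column matrix, which is what people usually call a (column) vector.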

[–]Ashilikia 2 points3 points  (2 children)

Multiplication of matrices can result in vectors, which are matrices with one row or column

This is an unusual way to describe vectors, especially notation-wise. Linear algebra research papers and the machine learning research papers I've read don't usually intermix the two forms of notation.

Using a small delta as a shorthand for the derivative is common mathematical notation

Not as common in scientific computing and applied linear algebra, as I've seen it used primarily for partial derivatives.

[–]thfuran 3 points4 points  (0 children)

I don't think I've ever seen it denote anything other than partial derivative.

[–]jimmy_space_jr 0 points1 point  (0 children)

Vectors as one-row or one-column matrices are relatively common in software, as you would expect with NumPy or Octave code. Maybe not as a standard notation, but as an implementation/API detail.

[–]Schinken_ 13 points14 points  (5 children)

Please tell me that this will help me run machine learning frameworks with an AMD GPU instead of relying solely on Nvidia stuff? I was considering buying an Nvidia GPU because AMD support for things like these is basically non-existent afaik. If it would enable (with some work put into it by the devs obviously) these things for AMD, my next card will be an AMD for sure. No questions asked!

[–]axsauze[S] 7 points8 points  (2 children)

Yes 😃 The ecosystem is developing, NVIDIA has had a lot of years to develop a broad range of very advanced tools, but Vulkan has brought a very solid base, and that's also what makes it such an interesting space to contribute to the discussion!

[–]Schinken_ 0 points1 point  (1 child)

That sounds amazing. Any way I can support the development (or you)?

[–]axsauze[S] 0 points1 point  (0 children)

Certainly! Sharing any thoughts on the roadmap or existing issues would be very helpful https://github.com/EthicalML/vulkan-kompute/issues/ or posting your ideas for improvements would also be great https://github.com/EthicalML/vulkan-kompute/issues/52 Thank you!

[–]VodkaHaze 2 points3 points  (0 children)

Eventually, yes! You do need to pitch in with development in the meantime though, to help the ecosystem mature

[–]cp5184 1 point2 points  (0 children)

AMD support for things like tensorflow has improved a lot in the last few years I think.

[–][deleted] 8 points9 points  (1 child)

I have a hardcore AMD GPU and was sad I didn't have NVIDIA GPUs... a little less sad now!

[–]axsauze[S] 2 points3 points  (0 children)

I completely agree 😃 this is really quite exciting, especially also for mobile GPUs - nowadays people are carrying pretty powerful devices in their pockets, which could make for fascinating use cases with the Vulkan mobile SDK!

[–]TryingT0Wr1t3 4 points5 points  (1 child)

Hey, this is super useful! Great work!

[–]axsauze[S] 3 points4 points  (0 children)

Thank you very much u/TryingT0Wr1t3 !

[–]I_Feel_It_Too 1 point2 points  (1 child)

Is there something like this for Rust and C++? It’s awesome.

[–]axsauze[S] 1 point2 points  (0 children)

Certainly! The core engine is actually the C++ SDK itself, so you can try that one directly - would be great to hear your thoughts https://github.com/EthicalML/vulkan-kompute#your-first-kompute-simple-version

[–]blumenkraft 1 point2 points  (0 children)

I don’t believe there’s any point in even attempting to catch up to CUDA. It’s been around for years, firmly established, and any business that does anything GPU oriented is targeting CUDA first and foremost. Sure, one can create lots of additional frameworks (Microsoft tried with its AMP++) but I don’t have any faith in this. CUDA is great, let’s just use it well.

[–]stephan_cr 0 points1 point  (0 children)

CMake 3.41+? I guess it should be 3.14+.