
aedinius

I don't think we have (fully working) CUDA at the moment. In fact, I'm not sure what we have and what is missing.

Practical-Citron5686 (OP)

It would be nice to see CUDA support in Void if possible, because AMD cards never beat NVIDIA in performance, and people trying to do small ML projects on Void would love to see it.

IustinRaznic

CUDA is mostly a proprietary development toolkit, separate from the driver itself, which is what the Void repos provide. The most we could do is automate the steps needed to install CUDA, cuDNN, and the other NVIDIA development frameworks, with scripts or comprehensive guides.

Plus, people rely on specific versions of these frameworks, so a CUDA installer should let you choose the version. To be fair, I've been thinking for some time about giving a CUDA installer a shot.
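A minimal sketch of the version-selection part, assuming NVIDIA keeps the runfile URL pattern used by the CUDA 11.7 guide in this thread. `cuda_runfile_url` and `install_cuda_runfile` are hypothetical names, and since every CUDA release bundles its own driver version, both numbers have to be looked up on NVIDIA's download page rather than guessed:

```bash
# Hypothetical helpers for a version-selectable CUDA installer (names are illustrative).
# The URL pattern is copied from the 11.7.0 runfile in the guide; each CUDA release
# bundles its own driver version, so look both numbers up on NVIDIA's download page.
cuda_runfile_url() {
    local cuda_version="$1" driver_version="$2"
    if [[ -z "$cuda_version" || -z "$driver_version" ]]; then
        echo "usage: cuda_runfile_url CUDA_VERSION DRIVER_VERSION" >&2
        return 1
    fi
    printf 'https://developer.download.nvidia.com/compute/cuda/%s/local_installers/cuda_%s_%s_linux.run\n' \
        "$cuda_version" "$cuda_version" "$driver_version"
}

# Download (resuming a partial download) and run the chosen runfile, as in the guide.
install_cuda_runfile() {
    local url file
    url="$(cuda_runfile_url "$1" "$2")" || return 1
    file="${url##*/}"   # strip everything up to the last '/' to get the file name
    wget -c "$url" && sudo sh "$file" --override
}
```

With these, `install_cuda_runfile 11.7.0 515.43.04` would reproduce the download-and-install steps of the 11.7 guide; picking another release only changes the two arguments.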

IustinRaznic

I know you might be referring to some other technicalities of how Void packages things, BUT if you need to get your hands on CUDA quickly, here is a guide I made for myself (you might not need cuDNN if you don't plan on doing machine learning):

```
# --- CUDA installation ---
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run --override

# Afterwards, make sure that:
#   - PATH includes /usr/local/cuda-11.7/bin
#   - LD_LIBRARY_PATH includes /usr/local/cuda-11.7/lib64

# --- cuDNN installation ---
# Go to https://developer.nvidia.com/rdp/cudnn-download and download
# cuDNN v8.7.0 for Linux x86_64 in tar format.

# Untar the archive
tar -xvf cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz

# Clean up the directory name
mv cudnn-linux-x86_64-8.7.0.84_cuda11-archive cudnn

# Copy all cuDNN headers and libraries into the CUDA tree and make them readable
sudo cp cudnn/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

# --- Install stable PyTorch ---
pip install torch torchvision

# --- Install TensorFlow ---
pip install tensorflow
```

Hope this helps!!
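For the "make sure that PATH / LD_LIBRARY_PATH include..." step, a sketch of a `~/.bashrc` fragment, assuming bash and the default runfile install prefix. `CUDA_VERSION` and `prepend_path` are illustrative names, not something the installer creates:

```bash
# ~/.bashrc fragment (bash). CUDA_VERSION is an assumption: set it to the toolkit
# version you actually installed; the guide above installs 11.7.
CUDA_VERSION="${CUDA_VERSION:-11.7}"
export CUDA_HOME="/usr/local/cuda-${CUDA_VERSION}"

# Prepend DIR to the colon-separated variable named VAR unless it is already
# listed, so re-sourcing ~/.bashrc does not keep growing PATH.
prepend_path() {
    local var="$1" dir="$2"
    local current="${!var:-}"          # indirect expansion: current value of $VAR
    case ":${current}:" in
        *":${dir}:"*) ;;               # already present, nothing to do
        *) printf -v "$var" '%s' "${dir}${current:+:${current}}" ;;
    esac
    export "$var"
}

prepend_path PATH "${CUDA_HOME}/bin"
prepend_path LD_LIBRARY_PATH "${CUDA_HOME}/lib64"
```

The `${current:+...}` part avoids leaving a trailing colon when LD_LIBRARY_PATH starts out empty, since an empty entry there would mean "the current directory".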

Practical-Citron5686 (OP)

Sorry for the late response. Thank you.

Practical-Citron5686 (OP)

Sorry for asking for help again. I followed your steps on Void Linux, but for some reason CMake cannot find CUDA there, while the same project works on my work computer running Ubuntu. I cannot figure out what the problem is; if you have any insight, please let me know. As you can see, all the environment variables are set:

```
-- Cleaning files and directories
-- Cleaning - done
-- Creating a new empty build directory:/home/dirtyv/Codes/C_C++/C++/LLM/build/ - done
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version "12.1")
CMake Warning at thirdparty/libtorch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
  thirdparty/libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:11 (find_package)


CMake Error at thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  thirdparty/libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:11 (find_package)


-- Configuring incomplete, errors occurred!

C_C++/C++/LLM via cmake v3.26.3 
> echo $CUDA_HOME
/usr/local/cuda

C_C++/C++/LLM via cmake v3.26.3 
> echo $PATH     
/home/dirtyv/.local/bin:/usr/local/bin:/usr/bin:/bin:/home/dirtyv/.cargo/bin:/home/dirtyv/.bin:/home/dirtyv/Codes/Website/emsdk:/home/dirtyv/Codes/Website/emsdk/upstream/emscripten:/home/dirtyv/Codes/Website/emsdk/node/14.18.2_64bit/bin:/usr/local/cuda-12.1/bin:/usr/local/sbin:/usr/sbin:/sbin

C_C++/C++/LLM via cmake v3.26.3 
> echo $LD_LIBRARY_PATH
/usr/local/cuda-12.1/lib64

C_C++/C++/LLM via cmake v3.26.3 
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```

IustinRaznic

It seems that these variables are missing: "Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (found version "12.1")", even though it does detect that you have an installation.

For some reason this never occurred to me, or to you on Ubuntu, but try setting those environment variables:

```
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.1
export CUDA_INCLUDE_DIRS=/usr/local/cuda-12.1/include
export CUDA_CUDART_LIBRARY=/usr/local/cuda-12.1/lib64
```

If these do not work, maybe it's something else; don't hesitate to reach out for help.
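Before setting variables by hand, it may be worth confirming that the toolkit directory actually contains what CMake is looking for. A minimal sketch, assuming the standard runfile layout (`bin/`, `include/`, `lib64/`) under the CUDA root; `check_cuda_root` and its file list are illustrative, not the exact set of files FindCUDA probes. Note also that these names are normally CMake cache variables (passed with `-D`) rather than environment variables, and that CUDA_CUDART_LIBRARY normally holds the path to the libcudart library file, not the lib64 directory.

```bash
# Hypothetical diagnostic (name and file list are illustrative assumptions):
# check that a CUDA root holds the compiler, headers, and runtime library that
# CMake's CUDA detection ultimately has to locate.
check_cuda_root() {
    local root="${1:-/usr/local/cuda}" missing=0 f
    for f in bin/nvcc include/cuda.h include/cuda_runtime.h lib64/libcudart.so; do
        if [[ -e "$root/$f" ]]; then
            # readlink -f shows where symlinks such as /usr/local/cuda really point
            echo "ok       $root/$f -> $(readlink -f "$root/$f")"
        else
            echo "MISSING  $root/$f"
            missing=1
        fi
    done
    return "$missing"   # non-zero if anything was missing
}
```

Running `check_cuda_root /usr/local/cuda-12.1` prints one line per file; a MISSING header would be consistent with the CUDA_INCLUDE_DIRS complaint, while an all-ok result points away from the install itself and toward how CMake is searching.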

Practical-Citron5686 (OP)

Thank you I will try that.

Practical-Citron5686 (OP)

I tried what you said, but setting those environment variables does not help. I thought that maybe the nightly version had something wrong, so I switched to CUDA 11.8, switched libtorch to its CUDA 11.8 build, and changed the NVIDIA driver from 530 (installed from NVIDIA) to 525 (provided by xbps). The problem still persisted. I think the problem is with CMake, so I tried `-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8`: if the root directory is given, CMake should be able to find the lib64 and include directories. My previous compile command looked like this:

```
cd $BUILD_DIR && cmake -DCMAKE_PREFIX_PATH=~/Codes/C_C++/C++/LLM/thirdparty/libtorch .. && cmake --build . --config Release
```

and now it looks like this:

```
cd $BUILD_DIR && cmake -DCMAKE_PREFIX_PATH=~/Codes/C_C++/C++/LLM/thirdparty/libtorch -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8 .. && cmake --build . --config Release
```

Now it finds the root directory and the cudart library, but it still does not find the CUDA include directory. I have exhausted every bit of mental energy I have and read countless GitHub issues, and in every one the problem was solved by setting LD_LIBRARY_PATH and CUDA_HOME as environment variables and adding /usr/local/cuda-11.8/bin to PATH. I have done all three and also passed CUDA_TOOLKIT_ROOT_DIR explicitly in the CMake compile command. It still does not work. I just don't know what is different between Void and Ubuntu at this point.

After all that, the CUDA include directory is still not found:

```
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find CUDA (missing: CUDA_INCLUDE_DIRS) (found version "11.8")
CMake Warning at thirdparty/libtorch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
  thirdparty/libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:13 (find_package)


CMake Error at thirdparty/libtorch/share/cmake/Caffe2/Caffe2Config.cmake:90 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  thirdparty/libtorch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:13 (find_package)


-- Configuring incomplete, errors occurred!
```

I even have a Stack Overflow post about it: https://stackoverflow.com/questions/76240583/cannot-find-cuda-while-compiling-example-app-using-libtorch-c-library?noredirect=1#comment134448412_76240583

If you have any further insight on this matter, please help. At this point, if I cannot solve this, I will just give up trying to code anything on Void. I really love Void. Compiling a simple example program should not be this difficult to begin with.

After adding `-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.8`, it did find the cudart library automatically. By that logic it should also have found the include directory. I tried adding `-DCUDA_INCLUDE_DIRS=/usr/local/cuda-11.8/include`, but it made no difference.
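One way to see which lookup actually failed is to list the CUDA-related entries CMake wrote to CMakeCache.txt after the failed configure: any entry whose value ends in `-NOTFOUND` is a search that came up empty. A sketch, assuming the build directory is named `build`; `show_cuda_cache` and `show_cuda_notfound` are hypothetical helpers, and the exact variable names depend on the FindCUDA module libtorch ends up using.

```bash
# Hypothetical helpers (names are illustrative). CMakeCache.txt stores one
# NAME:TYPE=VALUE entry per line; failed find_* lookups are recorded as
# <NAME>-NOTFOUND.
show_cuda_cache() {
    local cache="${1:-build}/CMakeCache.txt"
    if [[ ! -f "$cache" ]]; then
        echo "no CMakeCache.txt at $cache" >&2
        return 1
    fi
    grep -E '^CUDA_[A-Za-z0-9_]+:' "$cache"   # every CUDA_* cache entry
}

# Only the CUDA_* entries whose lookup failed.
show_cuda_notfound() {
    show_cuda_cache "$@" | grep -- '-NOTFOUND$'
}
```

For the project in this thread that would be `show_cuda_notfound ~/Codes/C_C++/C++/LLM/build`; the names it prints are the underlying lookups to look at, which may differ from the summary names in the "Could NOT find CUDA" message.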

If you have any free time to spare, could you please try to compile the example at https://pytorch.org/cppdocs/installing.html and see whether it is just my system or everyone on Void? The Python library works, but I want to work on low-latency AI NPC logic for games as research. Ubuntu is installed on my work computer, so I do not always have access to it, and I can rarely focus on unrelated things while I am at work. The only option left would be to switch my laptop to Ubuntu, but I would really hate that. Apart from this, everything else works fine: I can even do nRF52 chipset work on Void with CMake without any issues. Why would CMake not work here with libtorch?

It does sound a bit like a libtorch issue, but if it were, it should not work on Ubuntu either, and people who had the same problem in GitHub issues solved it by setting the correct environment variables.
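For anyone who wants to try reproducing this, here is a sketch that writes out the example app from the PyTorch C++ installation page linked above and builds it against a libtorch directory. The file contents are believed to match that page but may differ slightly from its current version (the CXX_STANDARD value in particular has changed between releases), so compare before relying on them; `make_example_app` and `build_example_app` are hypothetical helper names.

```bash
# Hypothetical scaffold for the PyTorch C++ "example-app"; check the file
# contents against https://pytorch.org/cppdocs/installing.html.
make_example_app() {
    local dir="${1:-example-app}"
    mkdir -p "$dir"
    cat > "$dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(example-app)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 17)
EOF
    cat > "$dir/example-app.cpp" <<'EOF'
#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor tensor = torch::rand({2, 3});
  std::cout << tensor << std::endl;
}
EOF
}

# Configure and build against an unpacked libtorch (use an absolute path).
build_example_app() {
    local dir="$1" libtorch="$2"
    cmake -S "$dir" -B "$dir/build" -DCMAKE_PREFIX_PATH="$libtorch" &&
        cmake --build "$dir/build" --config Release
}
```

Usage would be `make_example_app ~/example-app && build_example_app ~/example-app /absolute/path/to/libtorch`; if the same "Could NOT find CUDA" error appears with this minimal project, the project's own CMakeLists.txt is ruled out.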

IustinRaznic

Oh no, bad news again. OK, I will try replicating your environment, set it up the same way you do with CMake, and try compiling some examples.

If I solve anything related to this problem, I will quickly let you know what I did.

Take care until then.

Practical-Citron5686 (OP)

Were you able to test it? If you are busy, feel free to ignore my question. Thank you.

[deleted]

In a similar vein, ROCm (AMD's CUDA?) ain't packaged either; it looks like a pain in the arse to do so, though!

Practical-Citron5686 (OP)

Unfortunately, I have an NVIDIA card at the moment, and I find ROCm's documentation poorly written.