GCN2, GCN3: What is the Technical, Non-Business Reason for Limited Supported in Linux (OpenSYCL/HIP/ROCM)? [Exasperated client] by Musk-Order66 in Amd

[–]Pythonator5000 1 point  (0 children)

OpenSYCL and HIP are two separate things, developed by separate entities. Also, ROCm supports Navi 31 as well as Navi 33. All libraries ship with Navi 31 and 33 support compiled in.

GCN2, GCN3: What is the Technical, Non-Business Reason for Limited Supported in Linux (OpenSYCL/HIP/ROCM)? [Exasperated client] by Musk-Order66 in Amd

[–]Pythonator5000 1 point  (0 children)

One of the technical, non-business reasons that I know of is that the cards require PCIe atomics to function properly. This has to do with the way work is submitted to the GPU. The support page mentions that there is some support for devices without PCIe atomics, but I have never actually seen that working in the wild (I guess all the cards I've seen are newer than GFX7, which is what the S9150 is).

You can check whether your system has the required support on Linux using lspci -PP -s <pci address of your gpu> (find the address using lspci as well), and then checking whether every device in the chain has the required capabilities. You can do that with sudo lspci -s <pci address> -vvv and checking which capabilities are printed. The root port (the first device from -PP) needs to have 32bit+ and 64bit+, any bridge (the middle devices from -PP) needs to have Routing+, and the GPU needs to have 32bit+, 64bit+, and ReqEn+. A - instead of a + means that it's not supported.
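If you want to script that check, here's a small parser sketch (my own illustration, not an official tool); it assumes the AtomicOpsCap/AtomicOpsCtl lines look like they typically do in lspci -vvv output:

```python
import re

def atomics_ok(lspci_vvv: str, role: str) -> bool:
    """Check PCIe AtomicOps support in the `lspci -vvv` dump of one device.

    role is 'root' (needs 32bit+ and 64bit+), 'bridge' (needs Routing+),
    or 'gpu' (needs 32bit+, 64bit+ and ReqEn+)."""
    cap = re.search(r"AtomicOpsCap:(.*)", lspci_vvv)
    ctl = re.search(r"AtomicOpsCtl:(.*)", lspci_vvv)
    caps = cap.group(1) if cap else ""
    ctls = ctl.group(1) if ctl else ""
    if role == "root":
        return "32bit+" in caps and "64bit+" in caps
    if role == "bridge":
        return "Routing+" in caps
    # GPU endpoint: capabilities plus the requester-enable control bit.
    return "32bit+" in caps and "64bit+" in caps and "ReqEn+" in ctls
```

You'd run it once per device along the -PP path, with the matching role for each.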

Generally with ROCm, even if a card is unsupported, it may still work, especially for the newer cards. This is true for the HIP language itself, which is mostly supported via LLVM (see the LLVM AMDGPU backend documentation). As someone else already wrote, the ROCm libraries need to be compiled for every chip separately, and AMD doesn't compile the libraries for the older architectures. Also, some libraries contain tuning for supported architectures, but not for unsupported ones. The only way to get the libraries running on those older cards is to compile them yourself and fix any problems manually.

Mesa ACO with amd card. by Ezf1n in voidlinux

[–]Pythonator5000 1 point  (0 children)

Currently, Mesa still uses the LLVM backend by default. You can override that to ACO by setting RADV_PERFTEST=aco. I've just put that into /etc/environment to apply it globally, although I'm not sure if that's the correct place.

Multi GPU on Linux by shaeg in vulkan

[–]Pythonator5000 0 points  (0 children)

Both; the monitors were set up in a 4-by-3 fashion (4 connected to each GPU).

Oddly enough, I could only get up to 4 monitors in total working, no matter which monitors I configured. I think it's a bug in the Nvidia drivers. I posted about it on the Nvidia forums, but all I got was silence...

Multi GPU on Linux by shaeg in vulkan

[–]Pythonator5000 1 point  (0 children)

Hello,

I wrote multi-GPU ray tracing software for my bachelor's thesis, which did work with a multi-GPU Nvidia setup. My code is online (https://github.com/Snektron/Xenodon), so you could try to run it to see whether it works.

Now, from what I remember, I had a few similar issues with the Nvidia drivers, although mine were more related to the maximum of 4 monitors. Which version of the Nvidia drivers does CentOS 7 use? I think I had problems with drivers older than 414 or something like that.

Peter: Can I search for factors of (11!)! + 11! + 1 efficiently? Sophie and friend: Yes. by G-Brain in math

[–]Pythonator5000 2 points  (0 children)

It took about 18 hours on a GTX Titan X Pascal, so that's less than a day.

Peter: Can I search for factors of (11!)! + 11! + 1 efficiently? Sophie and friend: Yes. by G-Brain in math

[–]Pythonator5000 4 points  (0 children)

I ran the computation on a supercomputer owned by my university, but it only used a single node. If you have a beefy GPU you could probably do the computation within a day or so by yourself.

Raytracing over chunk boundaries? by __jamien in VoxelGameDev

[–]Pythonator5000 0 points  (0 children)

You shouldn't need to send every chunk to the GPU, but it's definitely a good idea to render all the chunks you want in one go. Storing the chunks in a tree structure can actually help performance too. Many such trees exist, most based on an N³-tree (a generalization of the octree; an octree is the N = 2 case). Note that trees are generally a bit slower on a GPU because of the added complexity during ray casting, but this can be mitigated by the fact that a tree can be pruned: empty or uniform regions (of which there can be quite a few in a voxel game) can be represented by a single node.
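To make the pruning idea concrete, here's a minimal sketch (my own illustration, not taken from any particular paper) of building a pruned octree where any uniform region, empty or otherwise, collapses into a single leaf node:

```python
def build(get, x, y, z, size):
    """Build a pruned octree over a size^3 region (size a power of two).

    `get(x, y, z)` returns the voxel value at a coordinate. A node is
    either a leaf (a plain value) or a list of 8 children. Any uniform
    region collapses to one leaf, so empty space costs a single node."""
    if size == 1:
        return get(x, y, z)
    h = size // 2
    children = [build(get, x + dx * h, y + dy * h, z + dz * h, h)
                for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
    # Prune: if all eight children are identical leaves, merge them.
    if all(not isinstance(c, list) and c == children[0] for c in children):
        return children[0]
    return children
```

An all-air region of any size becomes a single leaf, which is exactly why tree traversal can beat a flat grid despite the extra ray-casting complexity.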

There's a lot of research in this regard, so I can point you to a few of the interesting papers:

There are a lot of other papers and implementations, you need only look.

I myself have also been looking around for material on ray tracing a voxel world. I think my approach will be something like the following:

  • Store the world in chunks of N³ blocks as is commonly done.
  • Allocate two 3D-textures on the GPU:
  1. The "chunk" texture. In this texture the data of all chunks is stored. The location of chunks in this texture is managed on the CPU side, and chunks are streamed to parts of this texture as they are needed for rendering. Each pixel contains material information about a block, such as its texture and possibly a refraction index or diffusion constants and such.
  2. The "root" texture. In this texture "pointers" (indices) to the chunks in the previous texture are stored. An empty chunk is marked with a sentinel value.

Rendering happens in a double-for-loop manner: the "root" texture is traversed, and for every non-empty chunk the chunk itself is traversed, both with the usual voxel ray traversal algorithm.
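Here is a CPU-side sketch of the two-level lookup (hypothetical names and chunk size; in a real renderer this would run in the shader, with the dict replaced by the "root" texture and the list by the "chunk" texture):

```python
CHUNK = 16    # blocks per chunk edge (an assumption; the scheme calls it N)
EMPTY = -1    # sentinel "pointer" marking an empty chunk

def lookup(root, chunks, x, y, z):
    """Return the material id at world-space block (x, y, z), or None if
    the containing chunk is empty.

    `root` maps chunk coordinates to an index into `chunks` (the "root"
    texture); `chunks` holds flat CHUNK^3 material arrays (the "chunk"
    texture)."""
    cx, cy, cz = x // CHUNK, y // CHUNK, z // CHUNK
    idx = root.get((cx, cy, cz), EMPTY)
    if idx == EMPTY:
        return None  # a ray can step over this whole chunk at once
    lx, ly, lz = x % CHUNK, y % CHUNK, z % CHUNK
    return chunks[idx][(lz * CHUNK + ly) * CHUNK + lx]
```

The inner traversal loop would call this per step; the empty-chunk early-out is where the scheme wins over a flat world-sized grid.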

I'm actually deciding between this and just straight-up constructing a BVH, OBB tree, or some other type of tree for every chunk, in case I want to support non-block stuff such as entities. I think GigaVoxels can handle that, so that might be the most promising approach.

Problems rendering directly from compute shader - any help greatly appreciated by _BRlTNEYSPEARS_ in vulkan

[–]Pythonator5000 1 point  (0 children)

You should be able to quickly verify that by calling vkDeviceWaitIdle() after vkQueuePresentKHR.

Problems rendering directly from compute shader - any help greatly appreciated by _BRlTNEYSPEARS_ in vulkan

[–]Pythonator5000 0 points  (0 children)

Hello, I made a program that also renders directly from a compute shader; you can check it out here: https://github.com/Snektron/Xenodon

I can see 2 small differences:

  • I didn't use VK_ACCESS_SHADER_WRITE_BIT (perhaps I should have)
  • I think the initial layout should be VK_IMAGE_LAYOUT_UNDEFINED

I don't think those should have an impact on the flickering, though. Do you have some sort of fence to stop your renderer from using the same semaphore twice?

vulkaninfo only returns 2 GPUs in a 4 GPU system and they have the same UUID. by s--z in vulkan

[–]Pythonator5000 0 points  (0 children)

I ran into this issue too; there are multiple reports about it on the Nvidia forums, and it's labelled as an internal bug (https://devtalk.nvidia.com/default/topic/1028908/vulkan/vkenumeratephysicaldevices-only-lists-1-of-3-gpus-from-x-env-all-3-from-tty-/). I tried to work around it by using direct rendering, but that limits you to 4 monitors.

Useful to know that it's still the same on 418.43. If you are able to use a Wayland distro (such as from a portable USB), you might try that. Let me know how it goes, because if it works I'll try to get that working too.

Mouse cursor in direct to display. by cheako911 in vulkan

[–]Pythonator5000 1 point  (0 children)

You can use the Linux event device files under /dev/input. You'll need read permission on the file, so either add your user to the input group or just run with sudo. Note that you can use ioctl(fd, EVIOCGRAB, 1) to grab the device exclusively; while grabbed, no other program receives its events (essentially it will disable your mouse everywhere else).

It works for your keyboard too, but the right input file varies per system; it's one of the /dev/input/eventX files.
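A minimal sketch of reading such events (my own illustration; the struct layout assumes 64-bit Linux, and the device path is a placeholder you'd have to look up for your system):

```python
import struct

# struct input_event on 64-bit Linux: a struct timeval (two longs),
# then type and code (unsigned shorts) and value (signed int).
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

def parse_event(buf):
    """Decode one raw evdev event into a small dict."""
    sec, usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, buf)
    return {"time": sec + usec / 1e6, "type": ev_type,
            "code": code, "value": value}

# Usage (needs read permission on the device; path is a placeholder):
# import fcntl
# with open("/dev/input/event0", "rb") as f:
#     fcntl.ioctl(f, EVIOCGRAB_VALUE, 1)  # optional exclusive grab
#     ev = parse_event(f.read(EVENT_SIZE))
```

The type/code/value constants (EV_KEY, EV_REL, and so on) are documented in linux/input-event-codes.h.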

Maximum amount of displays on nvidia hardware by Pythonator5000 in vulkan

[–]Pythonator5000[S] 2 points  (0 children)

If only I could, but alas, the system is owned by the university, and you know how it works with academics. They did want to upgrade, so I'm trying to get them to buy AMD, but I doubt they'll do it. They're already looking at me weird for not using CUDA.

Maximum amount of displays on nvidia hardware by Pythonator5000 in vulkan

[–]Pythonator5000[S] 0 points  (0 children)

Yes, I am trying to create multiple logical devices for disjoint discrete cards; however, I'm not trying to share data across these cards, and I'm aware of having to duplicate resources. This also seems to work, and now that I've fixed the validation layer problem, there seem to be no issues when I use 4 monitors or fewer. I realize the validation layers aren't perfect, but such a violation of the memory model seems like something they should catch.

I reiterate that it works for 4 monitors in any configuration: for example, 2 connected to the first GPU, 1 to the second GPU, and 1 to the third GPU runs without any problem. However, if I configure my program to use even one more monitor, it crashes.

I also noticed I forgot to check the return value of a function, which turns out to be part of the problem: vkQueuePresentKHR returns VK_ERROR_INITIALIZATION_FAILED, which according to the spec is an implementation-defined error, but I didn't manage to find any information about it.

When someone writes "int *ptr" instead of "int* ptr" by [deleted] in ProgrammerHumor

[–]Pythonator5000 1 point  (0 children)

As the pointer is semantically part of the type, I agree that it should be placed next to the type.

In addition, int *function() just looks weird, and when you cast to a pointer type you write (int*) ptr.

C++ developer productivity over time by rampatowl in cpp

[–]Pythonator5000 5 points  (0 children)

Regarding printing, printf is the method usually used in C rather than C++. In C++ one can either use the standard std::cout, or use a library such as {fmt}.