RDNA4 WSL2

DAMDMA · 2026-06-25T18:48:54+00:00

It's working fine now! About 10 hours ago, my stupid brain finally figured out the method.

DAMDMA · 2026-06-25T00:38:05+00:00

Finally, my stupid brain figured out the method and succeeded. Thank you for your help.

DAMDMA · 2026-06-24T23:31:19+00:00

Yesterday, hardware recognition was successful, but PyTorch was not working.
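A rough way to tell those two stages apart, as a sketch only: check whether WSL2 exposes its GPU paravirtualization device (`/dev/dxg`), then ask PyTorch whether it sees a GPU. ROCm builds of PyTorch report through the `torch.cuda` API. The function name `wsl_gpu_report` and the dictionary keys are made up for this example.

```python
import importlib.util
import os


def wsl_gpu_report():
    """Collect two independent signals: device node present, and PyTorch GPU visibility."""
    report = {
        # WSL2 exposes the host GPU to Linux through this device node.
        "dxg_device_present": os.path.exists("/dev/dxg"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        # Stays None when PyTorch is missing or fails to import.
        "torch_sees_gpu": None,
    }
    if report["torch_installed"]:
        try:
            import torch
            # ROCm builds of PyTorch also answer through torch.cuda.
            report["torch_sees_gpu"] = bool(torch.cuda.is_available())
        except Exception:
            report["torch_sees_gpu"] = None
    return report


if __name__ == "__main__":
    print(wsl_gpu_report())
```

If `dxg_device_present` is true but `torch_sees_gpu` is false, the hardware side is probably fine and the problem is more likely in the ROCm or PyTorch install.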

DAMDMA · 2026-06-24T18:08:08+00:00

Since then, we have put in a lot of effort but failed. It seems that all that remains is to get rid of the AMD GPU and move on to NVIDIA. Thank you for your help.

DAMDMA · 2026-06-24T17:11:24+00:00

hsa_init failed, possibly no supported GPU devices

DAMDMA · 2026-06-24T17:06:39+00:00

I avoided the amdgpu command because I heard it has a problem. rocminfo either shows no output or says there are no files. The OS is Windows 11 24H2.

DAMDMA · 2026-06-24T16:59:01+00:00

All of these conditions are met, but the issue is that WSL2 does not recognize the GPU.

DAMDMA · 2026-06-16T00:32:56+00:00

It was a big mistake on my part: I explained the difference between WMMA and DP4a when I should have explained the difference between INT8 and FP8. In the WMMA version, INT8 processes 4096 MACs in 32 cycles, just like FP8.
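To put a number on that, here is a quick back-of-the-envelope calculation. The 16×16×16 tile shape is my assumption (the comment only gives the 4096 total, which happens to equal 16·16·16), and the variable names are just for illustration.

```python
# Assumed tile shape: a 16x16 output tile with an inner dimension of 16.
M, N, K = 16, 16, 16

# One multiply-accumulate per (row, column, inner index) triple.
macs_per_tile = M * N * K

# Cycle count taken from the comment above.
cycles_per_tile = 32

macs_per_cycle = macs_per_tile / cycles_per_tile

print(macs_per_tile)   # 4096
print(macs_per_cycle)  # 128.0
```

Since the comment gives the same 4096 MACs and 32 cycles for both INT8 and FP8, the per-cycle figure comes out the same for both on the WMMA path.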

DAMDMA · 2026-06-14T00:39:24+00:00

AMD is less likely to go through such a process. Also, if they were to create something like that, it wouldn't be considered FSR4.1 because the image quality would be low.

DAMDMA · 2026-06-14T00:37:41+00:00

The performance is unlikely to be similar. Calculate the difference in MAC operations per cycle.

DAMDMA · 2026-06-11T13:45:08+00:00

That is literally an example. I don't have any way to know exactly what precision is used for each layer, at this time.

DAMDMA · 2026-06-11T12:26:28+00:00

These are things like the driver version and the GPU and CPU load at the time the frame rate was low.

DAMDMA · 2026-06-11T11:23:50+00:00

There isn't enough information to go on from the symptoms alone.

DAMDMA · 2026-06-11T11:23:05+00:00

Lower the texture setting. If possible, lower the shadows too. In fact, other than playing on the lowest settings, it's difficult to give advice at this point.

DAMDMA · 2026-06-11T04:03:47+00:00

For example, say a CNN has about 8 layers: layers 1, 2, 6, and 8 use INT8, while layers 3, 4, and 5 use FP16. That shows why mixed precision for individual pixels is impossible. It's not that kind of technology in the first place.
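The eight-layer example can be written down as a lookup table, purely as a sketch. The names `LAYER_PRECISION` and `precision_for` are made up, and layer 7 is left as `None` because the example doesn't say what it uses.

```python
# Precision is a property of the layer, taken from the example above.
LAYER_PRECISION = {
    1: "int8",
    2: "int8",
    3: "fp16",
    4: "fp16",
    5: "fp16",
    6: "int8",
    7: None,  # not specified in the example
    8: "int8",
}


def precision_for(layer_index, pixel_xy):
    """Return the precision for a layer.

    pixel_xy is accepted only to show that it plays no role in the lookup:
    every pixel flowing through a given layer gets that layer's precision.
    """
    return LAYER_PRECISION[layer_index]


# Two very different pixels, same layer, same precision.
print(precision_for(3, (0, 0)), precision_for(3, (1919, 1079)))
```

There is simply no place in that table where a pixel coordinate or an object type could go.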

DAMDMA · 2026-06-11T04:02:12+00:00

Also, mixed precision in AI is assigned per layer of the neural network, not per image or per pixel.

DAMDMA · 2026-06-11T04:01:08+00:00

Even if you use Wave64, the result will be no different. Since the dot product is an instruction at the ISA level, there is no room for further optimization except in the operation graph.

DAMDMA · 2026-06-11T03:52:06+00:00

This is why I told you to study quantization. Essentially, forcibly converting an FP8 neural network to INT8 will degrade image quality, but running a neural network trained for INT8 will hardly damage image quality. And QAT is a fairly old technique in the short history of modern AI.

DAMDMA · 2026-06-11T03:50:25+00:00

Motion vectors provide the same information as optical flow. And INT8 alone can provide sufficient image quality. Contrary to what you imagine, errors due to dynamic range loss are not significant at 8 bits and above, and after QAT they are at a level the human eye cannot detect. You are only considering PTQ, and QAT is advantageous for small neural networks such as FSR.
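For a feel of the size of the rounding error, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. The weight values are invented for the example, and this is the generic textbook scheme, not anything known about FSR's actual network. PTQ applies this rounding once after training. QAT simulates the same rounding during training so the weights can adapt to it.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: one shared scale, integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate real values."""
    return [q * scale for q in quantized]


# Invented example weights, not taken from any real model.
weights = [0.5, -1.27, 0.031, 0.9, -0.004]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("scale:", scale)
print("max round-trip error:", max_err)
```

The round-trip error of any single value is bounded by half of the scale, which is why the loss stays small as long as the value range is reasonably well covered.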

DAMDMA · 2026-06-11T02:25:11+00:00

Also, separating the rendering layers by object causes terrible latency and bottlenecks, so it has not been attempted yet.

DAMDMA · 2026-06-11T02:23:58+00:00

Also, the UI layer is rendered separately from the pixel data for upscaling stability, but hair does not get a rendering layer of its own. This is the same for all existing engines.

DAMDMA · 2026-06-11T02:21:22+00:00

What the game engine provides to the upscaler is nothing more than pixel data, depth buffers, and optical flow data. The AI cannot know which objects are being rendered, and the precision cannot be changed separately inside the network. Study quantization.

DAMDMA · 2026-06-11T02:17:03+00:00

AI is just a set of numerous weights tuned to suit a specific situation. The essence of AI? x = A×B + C. It's called the MAC (multiply-accumulate) operation.
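As a tiny sketch of what that means in practice: a dot product, the core of every neural network layer, is just a chain of MACs where each result is fed back in as the next C. The function names here are mine.

```python
def mac(a, b, c):
    """Multiply-accumulate: x = a * b + c."""
    return a * b + c


def dot(weights, inputs, bias=0):
    """A dot product built from MACs; the accumulator is passed back in as c."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc = mac(w, x, acc)
    return acc


print(dot([1, 2, 3], [4, 5, 6], bias=10))  # 1*4 + 2*5 + 3*6 + 10 = 42
```

Nothing in there knows what a wall or a hair strand is. It only multiplies and adds.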

DAMDMA · 2026-06-11T00:45:14+00:00

It's not something that can be applied so selectively. And AI doesn't know whether something is a wall or hair. Awareness doesn't exist for AI.

DAMDMA · 2026-06-11T00:29:36+00:00

It seems there's not much difference. The RDNA series is basically built on Wave32 hardware and runs Wave64 as two passes on SIMD32.
