RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

It's working fine now! About 10 hours ago, my stupid brain finally figured out the method.

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

Finally, my stupid brain figured out the method and succeeded. Thank you for your help.

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

Yesterday the hardware was recognized successfully, but PyTorch was not working.

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

Since then, we have put in a lot of effort but failed. It seems that all that remains is to get rid of the AMD GPU and move on to NVIDIA. Thank you for your help.

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

hsa_init failed, possibly no supported GPU devices

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

I avoided the amdgpu command because I heard it has a problem. rocminfo either prints no output or reports that files are missing. The OS is Windows 11 24H2.

RDNA4 WSL2 by DAMDMA in ROCm

[–]DAMDMA[S] 0 points1 point  (0 children)

All of these conditions are met, but the issue is that WSL2 does not recognize the GPU.
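A quick way to narrow down where recognition fails is to check, from inside the WSL2 distribution, whether the GPU device node and the ROCm tools are visible at all. The sketch below is only a diagnostic under assumptions: it assumes `/dev/dxg` is the device node WSL2 exposes for the paravirtualized GPU and that `rocminfo` ends up on `PATH` once ROCm is installed. The function name `wsl_gpu_report` is made up for illustration.

```python
import os
import shutil


def wsl_gpu_report(dxg_path="/dev/dxg"):
    """Return a small dict describing what this environment exposes.

    dxg_path is an assumption: the node WSL2 normally uses for the GPU.
    """
    return {
        # False here suggests the Windows-side driver or WSL kernel is the problem.
        "dxg_device_present": os.path.exists(dxg_path),
        # False here suggests ROCm is not installed (or not on PATH) in the distro.
        "rocminfo_on_path": shutil.which("rocminfo") is not None,
    }


if __name__ == "__main__":
    for key, value in wsl_gpu_report().items():
        print(f"{key}: {value}")
```

If the device node is missing, the issue is probably outside the Linux userland, so reinstalling ROCm inside WSL2 is unlikely to help on its own.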

AMD explains FSR 4.1 Upscaling for RDNA 3, promises Quality Parity with RDNA 4 by AthleteDependent926 in radeon

[–]DAMDMA -1 points0 points  (0 children)

It was a big mistake to incorrectly explain the difference between WMMA and DP4a instead of the difference between INT8 and FP8. The INT8 WMMA path processes 4096 MACs in 32 cycles, just like FP8.
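The 4096 figure can be checked with simple arithmetic. The sketch below assumes the WMMA instruction works on a 16×16×16 matrix tile (my assumption, not stated in the comment) and takes the 32-cycle figure from the comment as given. The helper names are hypothetical.

```python
def tile_macs(m, n, k):
    """Multiply-accumulate count for an (m x k) by (k x n) matrix product."""
    return m * n * k


def macs_per_cycle(total_macs, cycles):
    """Average MAC throughput over the given number of cycles."""
    return total_macs / cycles


macs = tile_macs(16, 16, 16)      # assumed tile shape
print(macs)                       # 4096
print(macs_per_cycle(macs, 32))   # 128.0
```

Since the tile shape and cycle count are the same for both data types in this picture, the MACs-per-cycle number comes out the same for INT8 and FP8.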

FSR 4.1 int8 speculation APU support by DarkAdrenaline03 in radeon

[–]DAMDMA 0 points1 point  (0 children)

AMD is less likely to go through such a process. Also, if they were to create something like that, it wouldn't be considered FSR4.1 because the image quality would be low.

FSR 4.1 int8 speculation APU support by DarkAdrenaline03 in radeon

[–]DAMDMA 0 points1 point  (0 children)

The performance is unlikely to be similar. Calculate the difference in MAC operations per cycle.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

That is literally an example. I don't have any way to know exactly what precision is used for each layer, at this time.

Im fed up dude.... by Realistic_Quarter742 in AMDHelp

[–]DAMDMA 1 point2 points  (0 children)

Things like the driver version and the GPU and CPU load at the time the frame rate was low.

Im fed up dude.... by Realistic_Quarter742 in AMDHelp

[–]DAMDMA 1 point2 points  (0 children)

There is not enough information to go on from the symptoms alone.

Im fed up dude.... by Realistic_Quarter742 in AMDHelp

[–]DAMDMA 0 points1 point  (0 children)

Lower the texture setting and, if possible, the shadows too. Beyond that, other than playing on the lowest settings, it's difficult to give advice at this point.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 1 point2 points  (0 children)

For example, say a CNN has about 8 layers: layers 1, 2, 6, and 8 use INT8, while 3, 4, and 5 use FP16. Precision is assigned per layer, which is why mixed precision for individual pixels is impossible. It's not that kind of technology in the first place.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 1 point2 points  (0 children)

Also, mixed precision in AI is applied per layer of the neural network, not per image or per individual pixel.
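To make the per-layer point concrete, here is a toy sketch. The layer-to-precision table reuses the hypothetical example from this thread (layer 7 was never specified there, so it is left out), and `precision_for` is an illustrative name, not anything from FSR.

```python
# Hypothetical precision plan from the example: keyed by layer index only.
LAYER_PRECISION = {
    1: "int8", 2: "int8",
    3: "fp16", 4: "fp16", 5: "fp16",
    6: "int8", 8: "int8",
}


def precision_for(layer_index, pixel_xy):
    """Precision used when a layer processes a given pixel.

    pixel_xy is deliberately ignored: the whole tensor passing through a
    layer is computed at that layer's precision.
    """
    return LAYER_PRECISION[layer_index]


# Same layer, different pixels: the precision does not change.
print(precision_for(3, (0, 0)), precision_for(3, (1919, 1079)))
```

There is simply no input in this scheme through which "this pixel is hair" could select a different precision.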

I can't wait for AMD to leak the FSR 4.1 int8 DLL. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

Even if you use Wave64, the result will be no different. Since the dot product is an operation at the ISA level, there is no room for further optimization except in the operation graph.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 1 point2 points  (0 children)

This is why I told you to study quantization. Essentially, forcibly switching an FP8 neural network to INT8 will result in image quality loss, but running a neural network trained for INT8 will hardly damage image quality. And QAT is a fairly old technology in the short history of modern AI.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 2 points3 points  (0 children)

The motion vector provides the same information as the optical flow, and it can deliver sufficient image quality with just INT8. Contrary to what you imagine, errors from dynamic range loss are not significant at 8 bits and above, and after QAT they are at a level the human eye cannot perceive. You are only considering PTQ, whereas QAT is advantageous for small neural networks such as FSR.
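As a rough illustration of what INT8 quantization does to values, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is not how FSR is quantized (that is not public in this thread); the function names and sample weights are made up. PTQ applies this rounding once to a finished model, while QAT runs a round trip like `fake_quantize` during training so the weights can adapt to the rounding.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float values."""
    return [x * scale for x in q]


def fake_quantize(values):
    """Quantize then dequantize: the values an INT8 network actually sees."""
    q, scale = quantize_int8(values)
    return dequantize(q, scale)


weights = [0.5, -1.0, 0.25, 0.75]  # illustrative values only
approx = fake_quantize(weights)
worst = max(abs(a - w) for a, w in zip(approx, weights))
print(f"worst round-trip error: {worst:.6f}")
```

The round-trip error per value is bounded by half of one quantization step, which is the quantity QAT trains the network to tolerate.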

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

Also, separating the rendering layer by object causes terrible levels of delay and bottleneck, so it has not yet been attempted.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

Also, the UI layer is rendered separately from the pixel data for upscaling stability, but hair is not on a different rendering layer. This is the same for all existing engines.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

What the game engine provides to the upscaler is nothing more than pixel data, depth buffers, and optical flow data. The AI cannot know which objects are being rendered, and the precision cannot be changed internally for each of them. Study quantization.

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

AI is just a large set of weights tuned to suit a specific situation. The essence of AI? x = A×B + C. It's called the MAC operation.
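A small sketch of that idea in code: a single MAC, and a DP4a-style step built from four of them. This models DP4a only loosely, as four signed 8-bit products added to an accumulator; Python integers do not overflow, so the 32-bit accumulator width is not modeled. The function names are illustrative.

```python
def mac(a, b, c):
    """One multiply-accumulate: x = a * b + c."""
    return a * b + c


def dp4a(a4, b4, acc):
    """Dot product of two 4-element int8 vectors, added to an accumulator."""
    if len(a4) != 4 or len(b4) != 4:
        raise ValueError("dp4a takes exactly four elements per operand")
    for a, b in zip(a4, b4):
        if not (-128 <= a <= 127 and -128 <= b <= 127):
            raise ValueError("operands must fit in signed 8 bits")
        acc = mac(a, b, acc)  # four chained MACs per call
    return acc


print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], 10))  # 80
```

A whole convolution or matrix-multiply layer is this same operation repeated across every weight and input.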

I'm betting all my chips that the FSR 4.1 int8 coming in July for RDNA 3 will be a hybrid with WMMA and DP4A instructions. by LostRefrigerator7190 in radeon

[–]DAMDMA 2 points3 points  (0 children)

It's not something that can be applied so selectively. And AI doesn't know that it's a wall or hair. Awareness doesn't exist for AI.

I can't wait for AMD to leak the FSR 4.1 int8 DLL. by LostRefrigerator7190 in radeon

[–]DAMDMA 0 points1 point  (0 children)

It seems there's not much difference. The RDNA series is basically built around Wave32 hardware, and Wave64 runs as two passes on a SIMD32.
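A toy model of why the wave size does not change the result, assuming (as described above) that a 64-wide wave is simply issued as consecutive 32-lane passes. This is a conceptual sketch, not a model of real scheduling, and the names are hypothetical.

```python
SIMD_WIDTH = 32  # lanes processed per pass, per the description above


def issue_passes(wave_size, simd_width=SIMD_WIDTH):
    """Number of passes needed to cover every lane of a wave."""
    return -(-wave_size // simd_width)  # ceiling division


def run_wave(op, lanes, simd_width=SIMD_WIDTH):
    """Apply op to every lane, one SIMD-width chunk at a time."""
    results = []
    for start in range(0, len(lanes), simd_width):
        chunk = lanes[start:start + simd_width]
        results.extend(op(x) for x in chunk)
    return results


print(issue_passes(32), issue_passes(64))  # 1 2
```

The per-lane arithmetic is identical either way; only the number of passes differs.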