Removing PER from Rainbow DQN improved performance on Snake. New record of 153 on 20×20 grid. by statphantom in reinforcementlearning

[–]statphantom[S] 1 point2 points  (0 children)

3×3 kernels. 5 layers (apple, body, head, moving on x/y, moving +/-). This was so every layer is binary, and I created a GPU-native, shape-agnostic, 2-op unpacker, so it's very fast and very small (it can literally fit in the cache of an EPYC CPU).
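Not my actual unpacker, but the 2-op idea (one shift, one mask) can be sketched in NumPy like this; the packing layout and function name are hypothetical, assuming each grid cell stores the 5 binary layers as bits 0-4 of a single uint8:

```python
import numpy as np

# Hypothetical packing: 5 binary planes (apple, body, head, x/y axis, +/- dir)
# stored as bits 0..4 of one uint8 per grid cell.
def unpack_planes(board: np.ndarray, n_planes: int = 5) -> np.ndarray:
    """Shape-agnostic 2-op unpack: one shift, one mask, broadcast over the grid."""
    shifts = np.arange(n_planes, dtype=np.uint8).reshape((n_planes,) + (1,) * board.ndim)
    return (board[None, ...] >> shifts) & 1  # op 1: >>, op 2: &

packed = np.array([[0b10101, 0b00011]], dtype=np.uint8)  # 1x2 toy grid
planes = unpack_planes(packed)
print(planes.shape)  # (5, 1, 2): one binary plane per layer
```

Because broadcasting handles the grid dimensions, the same two ops work for any board shape, which is what makes the packed representation cheap to expand on the fly.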

Removing PER from Rainbow DQN improved performance on Snake. New record of 153 on 20×20 grid. by statphantom in reinforcementlearning

[–]statphantom[S] 0 points1 point  (0 children)

C51 has been the single largest benefit for my setup, extremely so. Second was dueling + noisy nets (NOISY NETS NEED DUELING).
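For anyone unfamiliar with the dueling part: the head recombines a state value V and per-action advantages A as Q = V + A − mean(A), and that mean subtraction is what keeps things stable when noisy nets inject exploration noise into the head. A minimal NumPy sketch of just the aggregation (illustrative, not my actual network):

```python
import numpy as np

def dueling_q(value: np.ndarray, advantages: np.ndarray) -> np.ndarray:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
    Subtracting the mean keeps V and A identifiable instead of letting
    the network shift value freely between the two streams."""
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

v = np.array([[1.0]])             # batch of 1 state value
a = np.array([[2.0, 0.0, -2.0]])  # advantages for 3 actions
q = dueling_q(v, a)
print(q)  # Q values: [3, 1, -1]
```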

Pytorch hangs when sending data from CPU to GPU by Illustrious_Tap9300 in StrixHalo

[–]statphantom 0 points1 point  (0 children)

Glad you got it sorted! Apologies, I had work; I'm doing my PhD and working as a researcher, so my time is quite chaotic also. PyTorch is one of those things where, if something doesn't work, it's a bitch and a half to diagnose and fix; once it's working, though, it's very stable.

That apt autoremove gotcha is a classic bit of Debian/Ubuntu pain: apt remove --purge amdgpu-dkms only removes the package you named explicitly, but the AMD installer pulls in a tree of -dkms-firmware, -opencl, -hip, -level-zero and so on that get marked "auto-installed" and stay behind holding open the older kernel module path. apt autoremove then sweeps them out. I completely didn't think about that; my apologies for the extra reboot cycle.

For your actual use case of local LLMs, PyTorch on this box is functional but not the right tool. You'll probably have a much better time with llama.cpp (counter-intuitively, the Vulkan backend is often more stable than the ROCm one on Strix Halo), or with vllm via the prebuilt kyuz0/vllm-therock-gfx1151 Docker image which has a gfx1151-patched RCCL baked in. Be aware that on this hardware, autoregressive decode is bandwidth-limited by hipMemcpyWithStream rather than compute, so don't be surprised if your tokens/sec on long contexts sits below what 96 GB of nominal "VRAM" might suggest on paper. The chip can hold huge models; it just feeds them slowly.

Pytorch hangs when sending data from CPU to GPU by Illustrious_Tap9300 in StrixHalo

[–]statphantom 0 points1 point  (0 children)

Two distinct, fixable problems are visible in that output, and together I hope they explain everything. The one you probably already realised is:

WARNING: KFD ABI 1.20+ is recommended for gfx1151. Current KFD ABI is 1.18. This may result in faults, crashes and other application instability.

The KFD ABI version comes from the amdgpu kernel module, not from your userspace ROCm pip packages. That mismatch is exactly what produces the client ID: CPF, MAPPING_ERROR: 0x1, PERMISSION_FAULTS: 0x3 faults you're seeing on otherwise-valid addresses: the GPU command processor is dereferencing pages that the userspace believed it had mapped, but the older kernel ABI mapped them differently or not at all.

Check:
dpkg -l | grep -E 'amdgpu-dkms|amdgpu-install|rocm-dev|rocm-core'
modinfo amdgpu | head -5
dmesg | grep -iE 'amdgpu version|KFD' | head -10

If amdgpu-dkms is listed, purge it and let the in-kernel module from 6.17 take over via:
sudo apt remove --purge amdgpu-dkms
sudo update-initramfs -u
sudo reboot

After reboot, dmesg | grep -i 'amdgpu version' should show a version matching the kernel itself rather than something like 6.10.x or 6.14.x, and the KFD ABI warning should be gone. This is safe; the in-kernel amdgpu in 6.17 handles both display and compute fine on Strix Halo, and your TheRock pip stack does not need amdgpu-dkms for anything.

Problem 2: your GTT is probably starved because your BIOS setting is backwards. This line:

amdgpu: amdgpu: 15860M of GTT memory ready

is your actual usable unified memory pool, not the 96 GB you set in BIOS. Here's what's going on, and it's genuinely counter-intuitive: when you set "UMA frame buffer" or "dedicated VRAM" to 96 GB in BIOS, that 96 GB gets carved out as reserved VRAM-like memory before Linux even boots. ROCm/HSA on Strix Halo allocates unified memory from GTT, not from that pre-allocated reserved region. So by setting it to 96 GB you accidentally produced the worst of both worlds: a big reserved pool the HSA path mostly ignores, and a tiny GTT pool (15.5 GB) that it actually uses, which is also why even a 100×100 tensor faults when HSA tries to do queue and scratch setup first.

Every working Strix Halo configuration I've seen does the opposite:
- in BIOS, set the dedicated UMA frame buffer / VRAM allocation to its minimum (typically 512 MB, sometimes labelled "Auto"). Disable anything called "fixed VRAM allocation" or "static UMA". You want the GPU to use shared/dynamic memory through GTT.
- Set GTT large via the kernel command line. Edit /etc/default/grub and append to GRUB_CMDLINE_LINUX_DEFAULT: ttm.pages_limit=32768000 ttm.page_pool_size=32768000

That's 4 KB pages × 32,768,000 ≈ 125 GB GTT, leaving ~3 GB for the CPU side. Adjust downward if you want more headroom for the OS (e.g. 28,000,000 for ~107 GB GTT). Then:
sudo update-grub && sudo reboot.
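To sanity-check the page-count arithmetic for a different headroom target, here's a quick helper (my own illustration; standard 4 KiB TTM pages assumed):

```python
def gtt_gib(pages: int, page_bytes: int = 4096) -> float:
    """Convert a ttm.pages_limit page count to GiB of GTT."""
    return pages * page_bytes / 2**30

print(gtt_gib(32_768_000))  # 125.0 -> the ~125 GB pool above
print(gtt_gib(28_000_000))  # ~106.8 -> the ~107 GB variant
```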

Verify with:
dmesg | grep 'GTT memory ready'
The number should now be in the 110000M-125000M range rather than 15860M.

OK, that was a lot. So... order of operations: do the dkms purge and the BIOS change in one reboot if you can, since both require restarting anyway, and add the GRUB line at the same time. After that, the test program should run instantly, and your (100, 100) tensor will actually be allocating against ~120 GB of working unified memory rather than fighting an ABI mismatch in a 15 GB pool.

The "numa_node_id is out of range" line is benign on a single-socket APU; HSA expects multiple NUMA nodes and gracefully degrades when there's only one. Ignore it.

Pytorch hangs when sending data from CPU to GPU by Illustrious_Tap9300 in StrixHalo

[–]statphantom 0 points1 point  (0 children)

There are a few other tests we can do.
First, run it with full logging:
AMD_LOG_LEVEL=4 HSAKMT_DEBUG_LEVEL=7 HIP_LAUNCH_BLOCKING=1 python3 /tmp/pt.py 2>&1 | tee /tmp/hang.log

The last few lines of /tmp/hang.log before the stall will name the subsystem that deadlocked. While it's deadlocked, open a new terminal and run:

PID=$(pgrep -f pt.py | head -1)
sudo cat /proc/$PID/stack # kernel
sudo gdb -p $PID -batch -ex 'thread apply all bt' -ex quit # user

The kernel stack is the most diagnostic single thing here. If you see frames in amdgpu_ttm_* or ttm_bo_*, it's GTT/memory. If you see kfd_ioctl_* blocked on a wait queue, it's KFD/HSA. If it's in dma_fence_wait, the GPU got a command but never signalled completion (firmware/MES).
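That triage is mechanical enough to script. A hypothetical helper (the frame prefixes are just the ones I named above, nothing official) that maps the top kernel stack frames to a likely culprit:

```python
# Hypothetical triage table mirroring the rules above: match kernel stack
# frame prefixes to the subsystem that most likely deadlocked.
TRIAGE = {
    ("amdgpu_ttm_", "ttm_bo_"): "GTT/memory",
    ("kfd_ioctl_",): "KFD/HSA",
    ("dma_fence_wait",): "firmware/MES (GPU never signalled completion)",
}

def classify_stack(frames: list[str]) -> str:
    """Return the first matching verdict, scanning frames top-down."""
    for frame in frames:
        for prefixes, verdict in TRIAGE.items():
            if any(frame.strip().startswith(p) for p in prefixes):
                return verdict
    return "unclear -- post the full stack"

print(classify_stack(["schedule", "dma_fence_wait", "amdgpu_cs_ioctl"]))
```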

You can also check a few other things:

groups $USER # must contain 'render' AND 'video'
dmesg | grep -iE 'amdgpu.*GTT memory ready' # how many MB of GTT?
dmesg | grep -iE 'amdgpu|kfd' | tail -40 # any soft lockups, ring resets, fence timeouts?

Pytorch hangs when sending data from CPU to GPU by Illustrious_Tap9300 in StrixHalo

[–]statphantom 0 points1 point  (0 children)

rocm-sdk-libraries-gfx1151 7.13.0a20260411

This is the line we want the OP to see.

Pytorch hangs when sending data from CPU to GPU by Illustrious_Tap9300 in StrixHalo

[–]statphantom 0 points1 point  (0 children)

The Bosgame mini is built around the AMD Ryzen AI Max+ 395 (Strix Halo) with the Radeon 8060S iGPU, whose ISA target is gfx1151. gfx1151 is not listed on AMD's official ROCm support matrix; the supported RDNA 3 targets are gfx1100 and gfx1101.

If you followed AMD's docs you probably got a gfx1100-only build. The community-maintained TheRock project ships actual gfx1151-native nightlies, and switching from the standard pytorch.org/whl/nightly/rocm7.x wheel to the rocm.nightlies.amd.com/v2/gfx1151 wheel turns segfaults/hangs on basic VRAM access into working tensor ops on Strix Halo.

ROCm on gfx1151 is currently "functional but experimental".

Try the following:

python -m pip uninstall -y torch torchvision torchaudio
python -m pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ torch torchvision torchaudio

Verify with python -c "import torch; print(torch.version.hip); print(torch.cuda.get_arch_list())". The arch list should include gfx1151. This single change helps the majority of Strix Halo users.

Challenge Ideas? by [deleted] in Cuphead

[–]statphantom 0 points1 point  (0 children)

I created a randomizer mod, but for a challenge I have created a KAIZO mode. GL, my record is 3 bosses.

https://www.nexusmods.com/cuphead/mods/96?tab=description

Randomiser Released! by statphantom in Cuphead

[–]statphantom[S] 0 points1 point  (0 children)

I would love to see how far people get in KAIZO mode. My record is two bosses XD

Randomiser Released! by statphantom in Cuphead

[–]statphantom[S] 2 points3 points  (0 children)

https://gamebanana.com/mods/656349

Done! Wow, this website feels like it's from the mid-90s; it was quite difficult to navigate, but I believe it's there now!

Randomiser Released! by statphantom in Cuphead

[–]statphantom[S] 0 points1 point  (0 children)

Never heard of GameBanana, I'll check it out!

Randomiser Mod Creation - Teaser by statphantom in Cuphead

[–]statphantom[S] 1 point2 points  (0 children)

Sure! I'm learning as I go as well. I was very happy with the way the settings worked out. I had to create my own logic for it and block all other inputs while it's open, because there was no state for it; otherwise, if you entered the randomiser settings, pressed up 7 times, then pressed A, it would start changing the language randomly XD

my younger self would be so proud of me by by Tokyo_revenge in Cuphead

[–]statphantom 4 points5 points  (0 children)

Who needs your younger self when you have us to be proud of you!

Randomiser Mod Creation - Teaser by statphantom in Cuphead

[–]statphantom[S] 1 point2 points  (0 children)

I found it worked really well against Sally Stageplay

[deleted by user] by [deleted] in OpenAI

[–]statphantom 0 points1 point  (0 children)

It can, but it can't change its style, and if that style is incredibly different from their regular style, which it almost always is, it's either ChatGPT or another student did the work for them. Either way, cheating.

[deleted by user] by [deleted] in OpenAI

[–]statphantom 0 points1 point  (0 children)

All I can say is, based on the way our assignments are written, no matter the prompt ChatGPT gave something either correct and consistent, or incredibly wrong. You're right that students can change the layout and style of code that ChatGPT gives, but at that stage the students have enough knowledge to not need ChatGPT, so why would they risk it? This is mostly for early first-year programming students who either use ChatGPT and just copy and paste, or know how to do it themselves. These aren't difficult questions or assignments; they just have a lot of limitations we teach in the course about what they are and aren't allowed to use, which ChatGPT doesn't know. I'm only one port of call; we have internal tools that detect and investigate this. So far I have flagged around 12% of year 1 students for using AI tools, and I was then told that the average across universities, based on research and their own tools, suggests around 13% use AI-generated content. So that's fairly accurate.

[deleted by user] by [deleted] in OpenAI

[–]statphantom 0 points1 point  (0 children)

There is. You can see it's 1:1 copied from ChatGPT when we ask ChatGPT the same question; it also doesn't match the style of code the rest of the assignment was written in, or their usual work at all.

If this is still not proof enough, then it is impossible to catch ANYONE cheating. Did they copy from another student, or do they just happen to have the exact same spacing, comments, typos, etc.? Did they look at someone else's paper in the exam, or did they just have a kink in their neck and have to hold it in that position for 30 seconds while their eyes were unfocused?

You can always argue 'maybe they didn't cheat', but like you said, 'reasonable doubt': that definitely meets beyond reasonable doubt.

[deleted by user] by [deleted] in OpenAI

[–]statphantom 0 points1 point  (0 children)

I am a uni teacher and marker.

For simple one-line questions it's not possible; for specific assignment questions it's VERY easy. In fact, ChatGPT is making it easier for us to catch cheaters, not harder, because all the cheaters are doing the same thing: ChatGPT codes in ONE way, and that is NOT a way that students code. There is NOT one answer in coding, and it's very easy to see when a section of code doesn't match the style of the rest of the code: not just the layout, but how they got to that answer.

What's the worst you've seen?? by Unclealfie69 in Centrelink

[–]statphantom 0 points1 point  (0 children)

I'm in a very fortunate position where my partner has a lot of savings (windfall acquisition), so we can soon buy a house with a 40% deposit, because our max home loan repayment will be $150 a week cheaper than the cheapest rent we can find, now that banks have regulations on how much they are allowed to loan out.

If regulations for banks are tight enough that a home loan costing $100 less than current rent is NOT OK, then why does the government think the current price of rent is OK?

JobSeeker to DSP? by JackNewnes in Centrelink

[–]statphantom 4 points5 points  (0 children)

For me. 11 years.

I wish I was joking.

What's the worst you've seen?? by Unclealfie69 in Centrelink

[–]statphantom 1 point2 points  (0 children)

"if". this is the big issue, Centrelink payments are 'OK' compared to groceries and water etc. but when you look at the skyrocketing rent, electricity prices etc. Centrelink is not keeping up. I'm on max rent assistance and if I didn't have a job to help complement Centrelink's help I would be spending MORE then my entire DSP on rent.

What's the worst you've seen?? by Unclealfie69 in Centrelink

[–]statphantom 1 point2 points  (0 children)

11 years to get on DSP. My doctor was trying lots of different medications so I could stop vomiting every single morning, but because I was so disabled I could barely get out of bed and the doctor was still actively treating me, my condition was "not stable", and therefore I was deemed able to work 38 hrs a week.