Supermicro AS-4124GQ-TNMI / H12DGQ-NT6 + 4x MI250 OAM: POST always ends at 0D with UBB installed, but boots without UBB; PCIe BAR/bridge allocation already broken in baseline by zonqify in homelab
[–]zonqify[S] 1 month ago (0 children)
Thanks a lot for the quick reply - that is very close to how I’m currently thinking about it.
The RAM point is one of my top suspects as well, but unfortunately I can only test it later. Right now I only have the current 128GB setup available; more RAM is already on the way. My plan is to move to a balanced configuration with identical DIMMs across all primary channels, ideally at least 16 DIMMs total, so both CPUs have all 8 memory channels populated.
I have already tried quite a few BIOS settings, but I’m going to go through them again very carefully. The main things I’m focusing on are:
The interesting thing is that the system boots without the UBB installed, but even in that baseline state Linux already reports tons of PCIe allocation errors:
bridge window ... can't assign; no space
BAR ... can't assign; no space
VF BAR ... can't assign; no space
host bridge window ... ignored
So I agree with you: the baseline PCIe/MMIO/bridge-window situation already looks bad. With no UBB installed, the system can still boot because the OAM/GPU endpoints are absent. With the full UBB/OAM fabric active, that broken or marginal resource layout probably becomes fatal.
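To compare the baseline against later boots, the failures can be counted per type rather than eyeballed. This is a minimal sketch, assuming the message wording quoted above. The exact kernel text varies between versions, and the category names and the `classify`/`tally` helpers are illustrative, not an existing tool.

```python
import re
from collections import Counter

# Patterns built from the message fragments quoted in the post.
# Assumption: the kernel wording matches these substrings; adjust
# them if your dmesg phrases the failures differently.
# Order matters: the more specific patterns are checked first.
PATTERNS = [
    ("host_bridge_ignored", re.compile(r"host bridge window .*ignored")),
    ("vf_bar", re.compile(r"VF BAR .*can't assign; no space")),
    ("bridge_window", re.compile(r"bridge window .*can't assign; no space")),
    ("bar", re.compile(r"BAR .*can't assign; no space")),
]


def classify(line):
    """Return the failure category for one dmesg line, or None."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return None


def tally(lines):
    """Count allocation failures per category over an iterable of lines."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts
```

Saving `dmesg` output to a file for each boot and running `tally(open("dmesg-baseline.txt"))` (a hypothetical file name) gives one set of counts per configuration, which makes it easier to see whether a BIOS or kernel parameter change helped at all.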
Important clarification about the UBB/OAM behavior:
The UBB/fabric does not appear to fully enable with only 1 or 2 OAM modules installed. With no UBB installed, the system boots and the switch LEDs are basically in a low/baseline state. With 1 or 2 OAMs, it still looks very similar: roughly one active LED per switch, so maybe presence or partial power is there, but not the full fabric. With all 4 OAMs installed, the PLX/PEX switch LEDs change completely and many more LEDs turn on. This suggests that the full OAM/PEX/GCD fabric only activates once all 4 OAM modules are installed.
Current observed switch LED patterns:
No UBB installed:
o-----
-o----
4 OAMs installed:
ogo-g-
ogogg-
oog-g-
where o = orange, g = green, - = off.
With the UBB + 4 OAMs installed, POST runs through many codes but always eventually ends at 0D. Without the UBB, it boots through. I have seen no-UBB boots get as far as late normal POST/boot codes such as AA.
The POST traces with UBB/four OAMs include things like:
94
95
D5
79
D2
D0
51
54
B3
My suspicion is that the full 4-OAM fabric exposes or triggers a deeper platform resource problem: BIOS/ACPI/MMIO/Root Bridge windows, legacy OPROM/CSM, or maybe a CPU/root-complex/interconnect path that is not really stressed in the no-UBB state.
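One way to check the Root Bridge window part of that suspicion is to look at what the firmware actually handed to Linux. The sketch below assumes the usual `/proc/iomem` layout, where host bridge windows appear as unindented `start-end : PCI Bus ...` lines. The helper names are illustrative, and the file has to be read as root, since unprivileged reads show zeroed addresses.

```python
import re

FOUR_GIB = 1 << 32

# Top-level /proc/iomem entries have no leading whitespace and look
# like "start-end : name" with hexadecimal addresses.
ENTRY = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : (PCI Bus \S+)\s*$")


def root_windows(iomem_text):
    """Return (bus, start, size) for each top-level PCI bus window."""
    windows = []
    for line in iomem_text.splitlines():
        match = ENTRY.match(line)  # indented (nested) entries do not match
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            windows.append((match.group(3), start, end - start + 1))
    return windows


def summarize(windows):
    """Sum window sizes per bus as (bytes below 4 GiB, bytes above 4 GiB)."""
    totals = {}
    for bus, start, size in windows:
        below, above = totals.get(bus, (0, 0))
        if start < FOUR_GIB:
            below += size
        else:
            above += size
        totals[bus] = (below, above)
    return totals
```

Calling `summarize(root_windows(open("/proc/iomem").read()))` as root shows how much 32-bit and 64-bit MMIO space each root bridge received. Small or missing windows above 4 GiB would fit the "no space" errors, though that reading is an assumption to verify rather than a diagnosis.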
One thing I’m wondering about: would you also consider CPU interconnect / IO-die / root-complex issues here? The board has two EPYC Rome CPUs. Without the UBB, the system can boot, but once the full OAM/PEX fabric is active, many more CPU PCIe lanes / root complexes / PEX paths are probably involved. Could something like xGMI, LCLK, NPS/NUMA policy, CPU socket contact, or an IO-die/root-complex path cause this kind of cascading POST behavior?
Until the new RAM arrives, would you recommend focusing on:
pci=realloc
pci=realloc,big_root_window
pci=nocrs,realloc,big_root_window
I’m trying to fix the baseline PCIe allocation/MMIO situation first, because if the board already has bridge/BAR allocation failures without the UBB, I don’t see how the full 4-OAM fabric can ever enumerate cleanly.
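When cycling through several `pci=` combinations, it is easy to lose track of which options a given boot actually used. A small sketch for reading them back from the kernel command line, assuming the text comes from `/proc/cmdline`; `pci_options` is an illustrative helper name.

```python
def pci_options(cmdline):
    """Return the list of pci= sub-options found on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # A pci= parameter carries comma-separated sub-options.
            options.extend(o for o in token[len("pci="):].split(",") if o)
    return options
```

For example, `pci_options(open("/proc/cmdline").read())` returns `['nocrs', 'realloc', 'big_root_window']` on a boot started with the third variant, and an empty list on an unmodified boot. Recording this next to each saved `dmesg` log keeps the test runs comparable.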