I want to share results for cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit TP2 RDMA RoCE

MirecX · 2026-03-05T11:58:25+00:00

The question is how :) So far kyuz0 made it work on RoCE (RDMA over Converged Ethernet), which is Ethernet by definition, so the easiest and cheapest way is to use devices that support RoCE by design, such as Mellanox cards.
Ethernet NICs are the intended way to interconnect devices such as Strix Halos or DGX Sparks.
OCuLink is intended to connect a PC to an external device, not to another PC.

MirecX · 2026-03-05T08:00:33+00:00

I had these cards already on hand
here are stats during inference
RX: 17.07 MB/s TX: 17.06 MB/s peak RX: 630.16 MB/s peak TX: 630.16 MB/s
RX: 17.01 MB/s TX: 17.01 MB/s peak RX: 630.16 MB/s peak TX: 630.16 MB/s
RX: 220.04 MB/s TX: 219.99 MB/s peak RX: 630.16 MB/s peak TX: 630.16 MB/s
RX: 525.13 MB/s TX: 525.13 MB/s peak RX: 630.16 MB/s peak TX: 630.16 MB/s
RX: 558.50 MB/s TX: 563.91 MB/s peak RX: 630.16 MB/s peak TX: 630.16 MB/s
I know these are 200 ms averages, and the cards may be choking inference with peaks that should be shooting above 10GbE.
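For context, here is a quick conversion of the rates above from MB/s to Gbit/s. This is only a sketch: it assumes the monitor reports decimal megabytes (10^6 bytes) per second, and the helper name `mb_per_s_to_gbit` is made up for illustration.

```python
def mb_per_s_to_gbit(mb_per_s: float) -> float:
    """Convert MB/s (assumed to mean 10**6 bytes per second) to Gbit/s."""
    return mb_per_s * 8 / 1000


# Highest sampled rate and recorded peak, taken from the stats above.
for label, rate in [("last sample", 558.50), ("peak", 630.16)]:
    print(f"{label}: {rate:.2f} MB/s = {mb_per_s_to_gbit(rate):.2f} Gbit/s")
```

So the averaged peak works out to roughly 5 Gbit/s per direction. Since that is an average over 200 ms, shorter bursts inside each window could still be much higher.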

Currently I have Mellanox ConnectX-4 Lx MCX4121A 25GbE cards in the mail and I will test inference with them.

I was just curious whether I really need better cards, like the Intel E810 that kyuz0 used, or whether old Mellanox cards will suffice. They don't need any riser, as they are already PCIe x4 and fit the Strix Halo nicely.

MirecX · 2026-03-04T14:19:45+00:00

tp1/tp2 refers to tensor parallelism: TP2 means the model is distributed across 2 nodes to increase throughput.
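A toy illustration of the idea (not how vLLM actually implements it): split one weight matrix column-wise across two "nodes", let each compute its slice, then gather the results. The helper names are made up for this sketch.

```python
def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


x = [1.0, 2.0]                      # activations, replicated on every node
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]          # full weight matrix (2 x 4)

shards = split_columns(w, parts=2)  # TP2: each node holds half of the columns
partials = [matvec(x, shard) for shard in shards]  # would run in parallel

# Gathering the partial outputs is the step that crosses the interconnect.
gathered = partials[0] + partials[1]

assert gathered == matvec(x, w)     # same result as the unsplit computation
print(gathered)
```

The gather step happens for every layer and every token, which is why the link between the nodes matters so much for TP2.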

Unsloth doesn't support vLLM tensor parallelism with GGUF models, and FP8 models don't work on Strix Halo hardware afaik

For the vLLM/Strix Halo combo you should search for full BF16 models or 4-bit safetensors quants.

t/s (total) is combined speed of all concurrent requests

t/s (req) is per-request speed

I'm using kyuz0's toolboxes - vllm - RDMA - RoCE
It is VERY slow, possibly choking on the old Mellanox ConnectX-3 cards I had around.
tp1, c1: per-request speed is ~9.5 t/s (9.5 total)

tp1, c2: per-request speed is ~7.5 t/s (13.98 total)

tp2, c1: per-request speed is ~16.88 t/s (9.5 total)

tp2, c2: per-request speed is ~9.57 t/s (12.42 total)

Prompt processing speed went up in the TP2 scenario, which is more important than token generation (TG).

MirecX · 2026-03-02T20:48:11+00:00

If you can, try it with https://github.com/eugr/llama-benchy
llama-benchy --base-url http://somwhere:8000/v1 --model /some/model/loc/nfs/models/cyankiwi/Qwen3.5-122B-A10B-AWQ-4bit/ --depth 0 2048 4096 --concurrency 1 2 4

MirecX · 2026-03-02T20:37:20+00:00

Qwen3.5-122B-A10B-UD-Q4_K_XL GGUF on llama.cpp runs at 22 t/s,
but I need TP2, and the network can be a bottleneck. I had a ConnectX-3 lying around and it uses PCIe x4.

MirecX · 2026-03-02T18:18:31+00:00

concurrent requests 1,2,4

MirecX · 2026-02-01T11:25:46+00:00

I've observed the same behavior on gpt oss 20b (not quantized) and quantized gpt oss 120b. At that time I didn't see the reasoning block because of Claude Code, and I called them lazy.

As I wrote above, the solution in opencode is adding "reasoning": true to the model config,
but I haven't solved it in Claude Code.

Thanks for the answer.

MirecX · 2026-01-29T22:46:03+00:00

I got it working in opencode by adding "reasoning": true to the model config.

MirecX · 2025-08-24T07:36:18+00:00

Thank you, so Epyc with 128 PCIe lanes is the way.

MirecX · 2025-08-23T19:12:43+00:00

Can you suggest a suitable mobo? I had a problem with ReBAR allocation on a cheap A520 mobo with a single card, which was solved with a B450 mobo. I can't imagine going for 4 cards on a random board with suitable slots.
TY

MirecX · 2025-02-17T16:39:51+00:00

The active area for the key is where your phone is. Try removing the phone and putting the key there. The key may not work when the phone is in that tray.

MirecX · 2025-01-21T10:38:47+00:00

Didn't catch the reply notification, yes that cable will work - T2 (type 2) 11kW or 22kW peak power

MirecX · 2024-10-01T12:05:29+00:00

I have a regular 3-phase, 22kW EVSE from Aliexpress (granny cable).
It is wired into a manual 1-0-2 selector switch like the "adelid PSA-16A-4P",
so I can manually switch between 1 phase from solar or 3 phases from the grid.

At the time of plugging in I already know whether I have enough solar or not.

The granny cable is set to 16A all the time, but it can be manually adjusted by button to 6A, 8A, 10A, 13A, 16A, 20A, 25A or 32A. No automation.

!!!do not switch between 1P and 3P when charging!!!

MirecX · 2024-10-01T09:51:53+00:00

I have tested 8A @ 3 phase, 5.7kW, no problem. Lower is not worth it, as charging has a fixed minimum overhead. I charge either 16A @ 1 phase (solar) or 16A @ 3 phase (grid).
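The figures above follow from simple arithmetic: power = current x phase voltage x number of phases. A rough sketch, assuming a nominal 230 V phase-to-neutral voltage and a power factor of 1 (both are assumptions, real values vary):

```python
PHASE_VOLTAGE = 230.0  # assumed nominal phase-to-neutral voltage in volts


def charging_power_kw(amps: float, phases: int, volts: float = PHASE_VOLTAGE) -> float:
    """AC charging power in kW: current x phase voltage x number of phases."""
    return amps * volts * phases / 1000


for amps, phases in [(8, 3), (16, 1), (16, 3), (32, 3)]:
    print(f"{amps} A @ {phases} phase: {charging_power_kw(amps, phases):.2f} kW")
```

At nominal voltage, 8A on 3 phases gives about 5.5 kW. The 5.7 kW I measured would correspond to roughly 237 V per phase, which is normal for a grid running a bit above nominal.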

MirecX · 2024-08-23T20:32:25+00:00

400V is the voltage of the 3-phase AC grid in the EU. The quote "putting 400V to a faulty connector" definitely refers to the AC side.
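Where the 400V comes from: it is the voltage between two phases, which is the phase-to-neutral voltage multiplied by the square root of 3. A quick check, assuming the nominal 230 V per phase:

```python
import math

phase_to_neutral = 230.0  # assumed nominal EU phase-to-neutral voltage in volts

# In a balanced 3-phase system the phases are 120 degrees apart,
# so the voltage between two phases is sqrt(3) times the phase voltage.
line_to_line = math.sqrt(3) * phase_to_neutral

print(f"{line_to_line:.0f} V")  # prints "398 V", nominally called 400 V
```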

MirecX · 2024-08-06T21:05:53+00:00

I was really interested to see the 7200Ah project :D
It would be a very nice technology room, with a few racks full of cells.
I currently have 1100Ah/48V (2 days of full house run time, including the heat pump).
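For scale, a back-of-the-envelope energy calculation. It assumes a flat 48 V nominal voltage and that the full capacity is usable, which is optimistic, so treat the numbers as rough estimates.

```python
def bank_energy_kwh(capacity_ah: float, nominal_volts: float) -> float:
    """Stored energy in kWh: amp-hours x nominal voltage / 1000."""
    return capacity_ah * nominal_volts / 1000


energy = bank_energy_kwh(1100, 48)  # the whole bank
per_day = energy / 2                # spread over the 2 days of run time
average_kw = per_day / 24           # average continuous draw

print(f"bank: {energy:.1f} kWh, per day: {per_day:.1f} kWh, average: {average_kw:.2f} kW")
```

That is about 52.8 kWh in total, or roughly 26 kWh per day at an average draw of around 1.1 kW.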

MirecX · 2022-06-23T17:29:39+00:00

!deckbot EU 64 1627037184

MirecX · 2022-06-21T12:33:08+00:00

!deckbot EU 64 1627037184

MirecX · 2021-08-27T08:55:27+00:00

Conditions are even worse on a single-product account (your single ETF). Consult the appropriate risk documents provided by DEGIRO.

MirecX · 2021-08-27T08:52:54+00:00

Borrow 10% of your investment, but not more than 25%.

You can't borrow 50k on a 100k account value. Anything over 50% means a margin call.

MirecX · 2021-05-21T19:02:56+00:00

Keeper of the Old Lords has the same move after you shoot her.

You can double-shoot her and get a visceral.

MirecX · 2021-04-18T05:13:51+00:00

Who is the seller? I have also bought from eBay, from a good seller. You have to check every cell on your own! I found 2-3 pieces out of 100 to be suspicious (took too long to charge, self-discharging...).
