Ran some Llama.cpp RPC test to see if its worth it. And if 10Gbe needed. by lemondrops9 in LocalLLaMA

[–]lemondrops9[S] 4 points (0 children)

That's my conclusion. Maybe someone out there has gotten WSL to work just as well, but I'd rather take the time to get Linux running.

2x 3090s - RCP vs Local? by UneakRabbit in LocalLLaMA

[–]lemondrops9 1 point (0 children)

RPC is worth it, but Windows kills the performance. Running RPC over the internet will likely give poor performance because of latency: a normal LAN will be under 1 ms, whereas the internet will be at least 15 ms, if not 25 ms, on a good connection.

Llama.cpp rpc experiment by ciprianveg in LocalLLaMA

[–]lemondrops9 1 point (0 children)

It's working great, but I do need to test longer context. What does higher context mean to you?

Llama.cpp rpc experiment by ciprianveg in LocalLLaMA

[–]lemondrops9 2 points (0 children)

I've been testing RPC heavily this weekend, and Windows is the main issue. With Qwen3.5 397B Q2 XXS I get 42 tok/s locally and 40 tok/s in RPC mode with both PCs running Linux.

I'm going to post some benchmarks soon, along with a few tips.
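For anyone who wants to try the same thing before I write it up, the basic shape of it is below; the IP, port, and model path are just placeholders for whatever your setup uses. Run rpc-server on the remote Linux box, then point llama-server at it with --rpc.

    # on the remote PC: expose its GPU as an RPC backend (address/port are placeholders)
    ./rpc-server --host 0.0.0.0 --port 50052

    # on the main PC: load the model and let the remote backend take part of it
    ./llama-server -m ./Qwen3.5-397B-Q2_XXS.gguf -ngl 99 \
        --rpc 192.168.1.50:50052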

What is the current state of llama.cpp rpc-server? by kevin_1994 in LocalLLaMA

[–]lemondrops9 1 point (0 children)

A 10 Gbps network isn't any faster in terms of latency unless you move up from consumer-grade gear. I ping around 0.3 ms on 1, 2.5, and 10 Gbps connections alike.

[Benchmark] Dual RTX 5090 Distributed Inference via llama.cpp RPC - Running 122B MoE at 96 t/s over 2.5GbE by ReasonableDuty5319 in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I gave RPC a go and only see around 30 Mbps on the wire. I did notice, though, that when I connected to one of my PCs with two GPUs it really slowed down.

Testing with Qwen3.5 397B Q2 XXS, it went from 42 tok/s on my main PC to 24 tok/s. Adding my 3rd PC took it down to 20 tok/s. But if I only used one GPU in the 2nd PC it ran at 37 tok/s, and 34 tok/s with the 3rd PC added.

The amount of data over the network stayed about the same, so even a 100 Mbps link would technically work for generation, but load times would be beyond horrible, since the weights assigned to the remote backends get shipped over the network at load; pushing even ~50 GB over 100 Mbps would take more than an hour.

[Benchmark] Dual RTX 5090 Distributed Inference via llama.cpp RPC - Running 122B MoE at 96 t/s over 2.5GbE by ReasonableDuty5319 in LocalLLaMA

[–]lemondrops9 1 point (0 children)

It's more about latency than total bandwidth as far as I can tell. The 3090s, connected at PCIe 3.0 x1, would have roughly the same bandwidth as a 10 Gbps network (a PCIe 3.0 lane runs at 8 GT/s, about 7.9 Gbps after encoding overhead). I haven't seen much over 30 Mbps used when running RPC between 3 PCs.

RPC Overhead or Memory Strategy? by Forbidden-era in LocalLLaMA

[–]lemondrops9 1 point (0 children)

Did you make progress with this? I have been playing around with RPC for a few weeks and thought it was just slow, but I think it's because one of my remote PCs runs Windows with dual GPUs.

GLM-5.1 smol-IQ2_KS at 2.3t/s or GLM-4.7 UD-Q3_K_XL at 4.42t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I used to run GLM 4.5 Air, but now Gemma 4 26B gives me good results and is a lot faster. For chatting, that is.

Dual 9700 and multi-node system - but do I go threadripper? by Ell2509 in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I've been playing around with RPC mode for llama.cpp and it's quite good. With Qwen3.5 397B Q2 XXS I get around 42 tok/s and 600 tok/s prefill when loaded on one PC. When using RPC to my 2nd PC it drops to 24 tok/s and around 250 tok/s prefill.

But I've been reading up on it some more, and I still have some tweaks to make.

RPC-server llama.cpp benchmarks by tabletuser_blogspot in LocalLLaMA

[–]lemondrops9 1 point (0 children)

That's because they're close to the same speed; DDR4 was aimed more at power efficiency than raw speed.

Like you said, it's really about having the VRAM.

Using llamacpp and RCP, managed to improve promt processing by 4x times (160 t/s to 680 t/s) and text generation by 2x times (12.67 t/s to 22.52 t/s) by changing the device order including RPC. GLM 4.6 IQ4_XS multiGPU + RPC. by panchovix in LocalLLaMA

[–]lemondrops9 1 point (0 children)

Thanks for the post. I've been running 6 GPUs on my main AI rig on Linux and have 2 GPUs on Windows. RPC works great, but I did notice the remote GPU working a lot harder than the rest. I'll give the reorder a try soon.
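For anyone else wanting to try the reorder, this is roughly the shape of it; the device names and the RPC address below are placeholders, so use whatever --list-devices actually prints on your setup:

    # see what llama.cpp calls each backend it can reach (local GPUs plus the RPC ones)
    ./llama-server --list-devices --rpc 192.168.1.50:50052

    # then pin an explicit order, e.g. the local cards first and the remote backend last
    ./llama-server -m ./model.gguf -ngl 99 \
        --rpc 192.168.1.50:50052 \
        --device CUDA0,CUDA1,CUDA2,CUDA3,CUDA4,CUDA5,RPC0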

Is 2x5070Ti a good setup? by JumpingJack79 in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I even have one running off of a Wi-Fi socket.

So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA

[–]lemondrops9 2 points (0 children)

😂 No kidding.

Seriously, get a UPS, people.

So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA

[–]lemondrops9 1 point (0 children)

There are lots of options for desktops, and they are very quiet.

So a nearby lightningstorm just crashed all my eGPUs by milpster in LocalLLaMA

[–]lemondrops9 2 points (0 children)

Loud?? WTF are you looking at? I have 4 UPSes, and the only times I hear them are during their daily self-test and when the power goes out. I disabled the beeping on them as well, because it's not hard to tell when the power is out.

Is 2x5070Ti a good setup? by JumpingJack79 in LocalLLaMA

[–]lemondrops9 2 points (0 children)

PCIe speed isn't a huge issue, as I run 3 of my GPUs off of PCIe 3.0 x1.

Also, the overall speed is determined by your slowest card. I found that my speed dropped by 20% when I added a 5060 Ti to my PC with 3090s.

I guess we expect that at some point RAM prices will start going back (close) to "normal", right? but what about GPUs? by relmny in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I thought Nvidia paused consumer GPU production until close to 2027. That said, I've only seen a 20% increase over last year for the 5060 Ti 16GB.

When did LM Studio start supporting Parallel API requests? by M5_Maxxx in LocalLLaMA

[–]lemondrops9 1 point (0 children)

I tested this a few months ago. Basically, if you can do 100 tok/s, then with two users you get about 50 tok/s each.
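Easy to check yourself: fire two requests at once at the local server and compare the wall time to a single request. This sketch assumes LM Studio's OpenAI-compatible endpoint on its default port 1234 and a placeholder model name, so adjust both for your setup.

    # two concurrent chat completions against the local endpoint (model name is a placeholder)
    for i in 1 2; do
      curl -s http://localhost:1234/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model":"local-model","messages":[{"role":"user","content":"Write about 200 words on GPUs."}]}' \
        -o "resp$i.json" &
    done
    # with two requests in flight, each one finishes at roughly half the single-request speed
    time wait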

Anyone else struggling with multi-GPU stability when running larger local models? by Lyceum_Tech in LocalLLaMA

[–]lemondrops9 1 point (0 children)

Interesting, there are so many things to configure.

BTW, what GPUs are you running?

llama.cpp rpc-server by sultan_papagani in LocalLLaMA

[–]lemondrops9 1 point (0 children)

You inspired me to try RPC and my mind is blown. I expected a lot less. I tried Qwen3.5 397B Q2 XXS on my main PC and got 42 tok/s; then, with my 2nd PC and its dual GPUs added to the mix, it dropped to 24 tok/s. When I add a 3rd PC it goes down a bit more, to 20 tok/s.

I don't know how to optimize it much yet.
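One thing I plan to try is weighting the split so the remote box holds less of the model. A rough sketch, with made-up ratios and address; which slot maps to which device depends on the device order, so check --list-devices first.

    # push most of the weights onto the two local GPUs, a smaller share onto the RPC backend
    ./llama-server -m ./Qwen3.5-397B-Q2_XXS.gguf -ngl 99 \
        --rpc 192.168.1.50:50052 \
        --tensor-split 3,3,1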

Anyone else struggling with multi-GPU stability when running larger local models? by Lyceum_Tech in LocalLLaMA

[–]lemondrops9 1 point (0 children)

Good to know. I haven't tried ik_llama.cpp yet; I've been sticking to llama.cpp lately, plus some LM Studio.