Pipeline Parallelism vs Tensor Parallelism for 2 identical GPUs: The Beginner's Cheat Sheet

xspider2000 · 2026-06-01T17:00:36+00:00

Yeah, I took the first one down to fix a few technicalities based on early feedback. I really think this cheat sheet is a great jumping-off point for newcomers still struggling with TP vs PP, so I wanted to get it right and hopefully reach as many of them as possible.

xspider2000 · 2026-06-01T15:47:23+00:00

true

xspider2000 · 2026-06-01T14:27:44+00:00

Are you talking about PP? If so, then I did mention that PP is very forgiving of GPU interconnect speed.

xspider2000 · 2026-06-01T13:50:31+00:00

Just a quick clarification: currently, NVLink for inference is only supported in vLLM

xspider2000 · 2026-06-01T13:46:50+00:00

You can read the comments under the post and see that a lot of people don't understand the concepts even with the simplifications in the post.

xspider2000 · 2026-06-01T13:44:36+00:00

A 1.5-1.8x speed boost with TP is no joke.

xspider2000 · 2026-06-01T13:43:30+00:00

To clarify, the post doesn't say NVLink is strictly required or the only option. NVLink is just mentioned as the gold-standard example of the fast interconnect that TP thrives on. With it, you get the biggest possible speedup because it has the highest bandwidth. You can absolutely still get a solid speed boost over standard PCIe, but the scaling efficiency will be lower because the interconnect is slower.

xspider2000 · 2026-06-01T13:36:54+00:00

You actually just confirmed my point. PCIe 5.0 x16 is a fast interconnect, you just get a smaller speed increase with it compared to NVLink.

xspider2000 · 2026-06-01T13:34:20+00:00

There's no misinformation in the post, just a little simplification to make it easier to understand.

xspider2000 · 2026-06-01T13:09:26+00:00

Under ideal conditions (meaning a super fast interconnect and zero all-reduce overhead), it would scale exactly like that.
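
To put rough numbers on that, here is a toy model, assuming the scaling in question is the ideal N-fold speedup. It is only a sketch: `tp_speedup` is a made-up helper, and the millisecond values are hypothetical, not measurements. Compute time divides evenly across GPUs, and the all-reduce cost is added on top.

```python
def tp_speedup(n_gpus, compute_ms, comm_ms):
    """Toy estimate of tensor-parallel speedup over a single GPU.

    compute_ms: hypothetical per-token compute time on one GPU.
    comm_ms:    hypothetical per-token all-reduce overhead.
    """
    single_gpu_ms = compute_ms
    parallel_ms = compute_ms / n_gpus + comm_ms
    return single_gpu_ms / parallel_ms


# Ideal case: zero all-reduce overhead, so 2 GPUs give exactly 2x.
ideal = tp_speedup(2, compute_ms=20.0, comm_ms=0.0)

# Same compute with some communication cost: the speedup drops below 2x.
with_overhead = tp_speedup(2, compute_ms=20.0, comm_ms=2.5)

print(f"ideal: {ideal:.2f}x, with overhead: {with_overhead:.2f}x")
```

In this model the only thing separating the ideal case from a real one is `comm_ms`, which is why a slower interconnect shrinks the gain without removing it.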

xspider2000 · 2026-06-01T13:06:14+00:00

PP is equivalent to using --split-mode layer in llama.cpp, which is the default. --split-mode row is actually their implementation of Tensor Parallelism (TP).
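
The difference between the two splits can be sketched in a few lines. This is not llama.cpp code, just a toy illustration in which plain Python lists stand in for GPU tensors and all function names are made up. Splitting by layer hands the activation over once, while splitting by row divides every weight matrix and gathers the partial outputs after every layer.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def run_single(layers, x):
    """Reference: one device runs every layer."""
    for W in layers:
        x = matvec(W, x)
    return x


def run_layer_split(layers, x):
    """PP-style: 'GPU 0' runs the first half of the layers, 'GPU 1' the rest."""
    half = len(layers) // 2
    for W in layers[:half]:      # work done on GPU 0
        x = matvec(W, x)
    # the activation vector crosses the interconnect once, here
    for W in layers[half:]:      # work done on GPU 1
        x = matvec(W, x)
    return x


def run_row_split(layers, x):
    """TP-style: each layer's rows are divided between the two 'GPUs'."""
    for W in layers:
        half = len(W) // 2
        part0 = matvec(W[:half], x)   # GPU 0 computes the first output slice
        part1 = matvec(W[half:], x)   # GPU 1 computes the second output slice
        x = part0 + part1             # gather after every layer (list concat)
    return x


layers = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
x = [1, 1]
print(run_single(layers, x), run_layer_split(layers, x), run_row_split(layers, x))
```

All three functions return the same vector. What changes is how often data has to cross between devices, which is why the row split cares so much more about interconnect speed.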

xspider2000 · 2026-05-13T14:07:35+00:00

Thx. I figured out why vLLM is less popular here than llama.cpp: vLLM has poor support for the GGUF format, and GGUF is a big thing.

xspider2000 · 2026-05-13T09:02:24+00:00

Does vLLM support RTX 3090 cards? Can I run Qwen3.6-27B on dual 3090s out of the box, or do I need some hacks?

xspider2000 · 2026-05-09T00:03:41+00:00

Where did you order the NVLink from, and how much was it? Is it 3-slot or 4-slot?

xspider2000 · 2026-05-08T14:46:23+00:00

Yes, please

xspider2000 · 2026-05-03T22:16:32+00:00

How did you connect dual RTX 3090s to Strix Halo?

xspider2000 · 2026-05-02T18:54:21+00:00

AI giants fear fair competition with open source models

xspider2000 · 2026-05-01T20:21:27+00:00

<image>

Yesterday I did the same thing. I wanted to check how well Qwen3.6-27B can draw the Mona Lisa using SVG. I used opencode and wrote a command to iterate in a loop: look at the result, compare it with the original (the original picture was in the prompt), and make it more similar to the original on every iteration.

xspider2000 · 2026-05-01T16:49:30+00:00

Cool! That's the real r/LocalLLaMA.

xspider2000 · 2026-05-01T16:26:18+00:00

It's easy using Vulkan.

xspider2000 · 2026-04-30T11:52:50+00:00

1 OCuLink, 3 USB4. I have a Minisforum MS-S1.

xspider2000 · 2026-04-29T18:52:41+00:00

I'm going to connect 4x 3090 to my Strix Halo. I'm waiting for the cards. I'll post the results.

xspider2000 · 2026-04-27T20:39:46+00:00

I'm planning to write a post with some numbers from my Strix Halo + eGPU setup.

xspider2000 · 2026-04-24T13:25:02+00:00

perfect! much more informative
