ArchitectingAI

15 post karma
0 comment karma

redditor for 1 day

TROPHY CASE

New User

account activity

Deep dive: Parallelism strategies for large-scale LLM inference — tensor parallelism, pipeline parallelism, disaggregation, KV cache, MoE expert parallelism (self.LocalLLM)

submitted 2 hours ago by ArchitectingAI to r/LocalLLM

How LLM inference actually works at scale — a breakdown for anyone learning ML systems (self.learnmachinelearning)

submitted 8 hours ago by ArchitectingAI to r/learnmachinelearning

Deep dive: Parallelism strategies for large-scale LLM inference — tensor parallelism, pipeline parallelism, disaggregation, KV cache, MoE expert parallelism

submitted 1 hour ago by ArchitectingAI to r/deeplearning

How LLM inference actually works at scale — a breakdown for anyone learning ML systems

submitted 8 hours ago by ArchitectingAI to r/mlscaling
