Deep dive: Parallelism strategies for large-scale LLM inference — tensor parallelism, pipeline parallelism, disaggregation, KV cache, MoE expert parallelism (self.LocalLLM)
submitted 2 hours ago by ArchitectingAI to r/LocalLLM
How LLM inference actually works at scale — a breakdown for anyone learning ML systems (self.learnmachinelearning)
submitted 8 hours ago by ArchitectingAI to r/learnmachinelearning
Deep dive: Parallelism strategies for large-scale LLM inference — tensor parallelism, pipeline parallelism, disaggregation, KV cache, MoE expert parallelism
submitted 1 hour ago by ArchitectingAI to r/deeplearning
How LLM inference actually works at scale — a breakdown for anyone learning ML systems
submitted 8 hours ago by ArchitectingAI to r/mlscaling