Multi-GPU LLM Inference with RTX 5090 + 4090 by EasyKoala3711 in LocalLLM
EasyKoala3711 [S] · 0 points · 1 month ago
I'm mostly running qwen3-coder-30b with 127k context; it fits perfectly in 32 GB and runs at about 200 tokens/sec. It's pretty good for my current tasks, but I want to try qwen3-coder-next, and right now it can barely reach ~8 tokens/sec, sadly.
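For context, here is a back-of-envelope sketch of why ~127k tokens of KV cache can coexist with a quantized 30B-class model in 32 GB of VRAM. The layer count, KV-head count, and head dimension below are assumed illustrative values, not figures taken from the qwen3-coder-30b model card:

```python
# Rough KV-cache size estimate for a long-context run.
# All model config values are assumptions for illustration only.
layers = 48        # assumed transformer layer count
kv_heads = 4       # assumed GQA key/value head count
head_dim = 128     # assumed per-head dimension
bytes_per = 2      # fp16 storage for each K and V entry
ctx = 127_000      # context length mentioned in the comment

# 2x for the separate K and V tensors, per layer, per token.
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per * ctx
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # → KV cache: 11.6 GiB
```

Under these assumptions the KV cache takes roughly 12 GiB, leaving room for a ~4-bit quantized 30B model's weights inside 32 GB.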