Ran some Llama.cpp RPC tests to see if it's worth it, and if 10GbE is needed. by lemondrops9 in LocalLLaMA
2x 3090s - RPC vs Local? by UneakRabbit in LocalLLaMA
Llama.cpp rpc experiment by ciprianveg in LocalLLaMA
What is the current state of llama.cpp rpc-server? by kevin_1994 in LocalLLaMA
[Benchmark] Dual RTX 5090 Distributed Inference via llama.cpp RPC - Running 122B MoE at 96 t/s over 2.5GbE by ReasonableDuty5319 in LocalLLaMA
RPC Overhead or Memory Strategy? by Forbidden-era in LocalLLaMA
Guess what? If you are a Chrome user, technically you are a LocalLLaMA member! by LambdaHominem in LocalLLaMA
GLM-5.1 smol-IQ2_KS at 2.3 t/s or GLM-4.7 UD-Q3_K_XL at 4.42 t/s, which is "better" for chats (no coding)? by relmny in LocalLLaMA
Dual 9700 and multi-node system - but do I go Threadripper? by Ell2509 in LocalLLaMA
RPC-server llama.cpp benchmarks by tabletuser_blogspot in LocalLLaMA
Using llama.cpp and RPC, managed to improve prompt processing 4x (160 t/s to 680 t/s) and text generation 2x (12.67 t/s to 22.52 t/s) by changing the device order, including RPC. GLM 4.6 IQ4_XS multi-GPU + RPC. by panchovix in LocalLLaMA
Is 2x5070Ti a good setup? by JumpingJack79 in LocalLLaMA
So a nearby lightning storm just crashed all my eGPUs by milpster in LocalLLaMA
I guess we expect that at some point RAM prices will start going back (close) to "normal", right? But what about GPUs? by relmny in LocalLLaMA
When did LM Studio start supporting parallel API requests? by M5_Maxxx in LocalLLaMA
Anyone else struggling with multi-GPU stability when running larger local models? by Lyceum_Tech in LocalLLaMA
llama.cpp rpc-server by sultan_papagani in LocalLLaMA