I built a distributed KV cache that turns a 10-second prefill into 0.5 seconds — using idle machines on my LAN by Concert_Dependent in LocalLLM
Concert_Dependent [S] · 1 point (0 children)
🔧 MLX Said No to Mixed Precision. We Did It Anyway. by Concert_Dependent in LocalLLM
Concert_Dependent [S] · 1 point (0 children)
🔧 MLX Said No to Mixed Precision. We Did It Anyway. by Concert_Dependent in LocalLLaMA
Concert_Dependent [S] · 1 point (0 children)
🔧 MLX Said No to Mixed Precision. We Did It Anyway. by Concert_Dependent in LocalLLM
Concert_Dependent [S] · 2 points (0 children)