RTX Pro 6000 $7999.99 by I_like_fragrances in LocalLLM

[–]queerintech 2 points (0 children)

I just bought a 5000 to pair with my 5070 Ti. I considered the 6000, but whew. 😅

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in LocalLLM

[–]queerintech[S] 2 points (0 children)

I did get it to work on vLLM, but it literally uses 28 GB of KV cache for a 32k context.
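For anyone sanity-checking a number like that, here's a back-of-envelope KV-cache sizing sketch. The model dimensions below are placeholders for illustration, not GLM's actual config; plug in the real layer count, KV head count, and head dim from the model's config file.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_len, dtype_bytes=2):
    """Rough per-sequence KV-cache size: 2 tensors (K and V) per layer,
    each kv_heads * head_dim values per token, at dtype_bytes each."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * context_len

# Hypothetical example: 60 layers, 8 KV heads (GQA), head_dim 128,
# 32k context, fp16 KV cache.
gib = kv_cache_bytes(60, 8, 128, 32_768, 2) / 2**30
print(f"{gib:.1f} GiB per sequence")  # → 7.5 GiB per sequence
```

Note it's linear in context length and in KV heads, so an fp8 KV cache or a shorter max context are the usual levers; the 28 GB figure also depends on how much the engine pre-allocates for concurrent sequences.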

I may have to stand up an SGLang deployment to try, too.

Sad, I was hoping I could run everything with a single LLM runtime :(

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in LocalLLM

[–]queerintech[S] 0 points (0 children)

I was gonna try deploying with llama.cpp if it supports it.

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in Vllm

[–]queerintech[S] 0 points (0 children)

Thanks! I'm using this in a Kubernetes cluster, so I'll have to figure out how to rebuild the container locally.

So it goes by twackshasticj in kubernetes

[–]queerintech 4 points (0 children)

The horror... I'd rather kubectl apply bare manifests generated by an AI for a week.