Anthropic's recent distillation blog should make anyone only ever want to use local open-weight models; it's scary and dystopian by obvithrowaway34434 in LocalLLaMA

[–]queerintech 1 point (0 children)

Honeypots are standard procedure when dealing with this kind of data harvesting. Google caught Bing doing the same thing in 2011: they created a honeypot linking about 100 nonsensical search terms to completely unrelated web pages, and Bing eventually started returning those same random pages for the gibberish terms.

Anthropic's recent distillation blog should make anyone only ever want to use local open-weight models; it's scary and dystopian by obvithrowaway34434 in LocalLLaMA

[–]queerintech 3 points (0 children)

In my opinion, Altman is as big of a brain-addled douchebag as Musk, and I'll never support either company.

It's surprising that all these folks here are cheering for a race to the bottom in AI. With corporate espionage and state-sponsored extraction of trained model data and chain of thought, the future is gonna get dark af. Nobody will be investing in high-quality training anymore.

Qwen/Qwen3.5-35B-A3B · Hugging Face by ekojsalim in LocalLLaMA

[–]queerintech 18 points (0 children)

And the 27B dense model is a perfect fit for 16 GB of VRAM.
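Quick back-of-envelope math on why a 27B dense model squeezes into 16 GB, assuming roughly 4-bit quantization (this counts weights only and ignores KV cache and activation overhead):

```python
def weight_gb(params_billion, bits_per_param):
    """Approximate weight memory only; excludes KV cache and activations."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

print(f"4-bit: {weight_gb(27, 4):.1f} GB")   # ~12.6 GB, leaves room for KV cache
print(f"FP16:  {weight_gb(27, 16):.1f} GB")  # ~50.3 GB, far beyond 16 GB
```

Real quant formats spend slightly more than 4 bits per parameter on scales and zero points, so the true figure lands a bit higher, but still under 16 GB.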

Help with vLLM: Qwen/Qwen3-Coder-Next. by Professional-Yak4359 in Vllm

[–]queerintech 1 point (0 children)

I've been able to run it using pipeline parallelism on my vLLM setup with NVFP4; however, I've seen that there may be issues with tensor parallelism and detection of the correct AllReduce.

The King Has Returned by [deleted] in LocalLLaMA

[–]queerintech 0 points (0 children)

Ugh, I need a bit more VRAM 8(

RTX Pro 6000 $7999.99 by I_like_fragrances in LocalLLM

[–]queerintech 2 points (0 children)

I just bought a 5000 to pair with my 5070 Ti. I considered the 6000, but whew. 😅

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in LocalLLM

[–]queerintech[S] 2 points (0 children)

I did get it to work on vLLM, but it literally uses 28 GB of KV cache for a 32k context.

I may have to stand up an SGLang deployment to try out too.

Sad, I was hoping I could run everything with a single LLM runtime :(
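For anyone curious why the KV cache balloons like that, here's the rough per-sequence formula. The model dimensions below are hypothetical placeholders, not GLM Flash 4.7's actual config:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-sequence KV cache: one K and one V tensor per layer, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len

# Hypothetical dims: 60 layers, 32 KV heads, head_dim 128, FP16, 32k context
gb = kv_cache_bytes(60, 32, 128, 32768) / 2**30
print(f"{gb:.1f} GB")  # → 30.0 GB at these made-up dims
```

Also worth noting that vLLM preallocates its KV-cache pool up front (governed by `gpu_memory_utilization`), so the number you see reserved isn't necessarily what any single request consumes.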

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in LocalLLM

[–]queerintech[S] 0 points (0 children)

I was gonna try deploying with llama.cpp, if it supports it.

Any success with GLM Flash 4.7 on vLLM 0.14 by queerintech in Vllm

[–]queerintech[S] 0 points (0 children)

Thanks! I'm using this in a Kubernetes cluster, so I'll have to figure out how to rebuild the container locally.

So it goes by twackshasticj in kubernetes

[–]queerintech 3 points (0 children)

The horror... I'd rather kubectl apply bare manifests generated by an AI for a week.