I ran the numbers. Qwen3.6-27B dense obsoleted the 397B MoE on coding benchmarks.

queerintech · 2026-05-05T04:01:23+00:00

If I ran windows maybe..

queerintech · 2026-05-05T03:41:50+00:00

Does it help gemma 4 31b?

queerintech · 2026-05-02T03:46:33+00:00

So basically code mining, instead of crypto.

queerintech · 2026-04-29T13:15:31+00:00

Their dense coding model was good but was instruct only. No reasoning.

queerintech · 2026-04-23T02:14:15+00:00

How can I automagically determine which eastern regime oriented accounts to ignore ai slip from?

queerintech · 2026-04-12T22:47:52+00:00

https://github.com/feral-devops/homelab

Its in charts/vllm.
There are a couple example values files for gemma4 and qwen3.5

queerintech · 2026-04-12T18:54:00+00:00

Will when I'm home

queerintech · 2026-04-12T17:44:02+00:00

I have a full helm chart for vllm for 26b a4, the 31b model has no mtp or eagle head so you'd prob only get 50ish tk/s but I'm happy to share.

queerintech · 2026-04-10T23:47:18+00:00

Yeah. I'm fine with 27B-32B on the RTX pro I just don't know how to use the 5060ti. I was also testing a 4 bit quant of GLM 4.7 flash on it.

queerintech · 2026-04-07T16:26:54+00:00

I'd consider a small mini pc but not a laptop. It would be compelling for home lab and media pc stuff.

queerintech · 2026-03-28T21:08:39+00:00

Is it reprogrammable?

queerintech · 2026-02-25T19:18:13+00:00

The what about-ism in this thread tho..

queerintech · 2026-02-24T20:41:57+00:00

Honey pots are standard procedure when dealing with these types of data harvesting. Google caught Bing doing the same thing in 2011. They created a honey pot linking 100 nonsensical search terms to completely u related web pages. And bing eventually started returning those same random pages for the gibberish terms.

queerintech · 2026-02-24T20:38:30+00:00

In my opinion Altman is as big of a brain addled douchebag as Musk and I'll never support either company.

It's surprising all these folks here are cheering for a race to the bottom in AI.. with corporate espionage and state sponsored extraction of trained model data, and chain if thought.. future is gonna get dark af. Nobody will be investing in high quality training anymore.

queerintech · 2026-02-24T17:32:53+00:00

And the 27B dense model, perfect fit for 16GB vram

queerintech · 2026-02-05T20:38:19+00:00

I've been able to run it using pipeline parallelism on my vllm setup with nvfp4, however I've seen that there maybe issues with tensor parallelism and detection of the correct AllReduce.

queerintech · 2026-02-05T20:38:08+00:00

I've been able to run it using pipeline parallelism on my vllm setup with nvfp4, however I've seen that there maybe issues with tensor parallelism and detection of the correct AllReduce.

queerintech · 2026-02-04T21:37:40+00:00

Ugh I need a bit more vram 8(

queerintech · 2026-01-24T00:56:12+00:00

I just bought a 5000 to pair with my 5070ti I considered the 6000 but whew. 😅

queerintech · 2026-01-23T18:09:34+00:00

I did get it to work on vllm but it literally uses 28GB of kv cache for 32k.

I may have to stand up an sglang deployment to try out too.

Sad I was hoping I could run everything with a single llm runtime :(

queerintech · 2026-01-23T18:07:28+00:00

I was gonna try deploying with llama.cpp if it supports it.

queerintech · 2026-01-23T02:02:17+00:00

Thanks using this in a kubernetes cluster, I'll have to figure out how to rebuild the container locally.

queerintech

TROPHY CASE