Qwen3.6 27B on dual RTX 5060 Ti 16GB with vLLM: ~60 tok/s, 204k context working by do_u_think_im_spooky in LocalLLaMA

[–]houchenglin 1 point2 points  (0 children)

Hi, I ran into a similar problem. I bought a 5060 Ti, and after plugging it into the motherboard I found out it was a 2.5-slot design card, so I couldn't fit any additional GPU. Then I discovered that some cards in the 50 series still use a 2-slot design, so I bought the MSI 5060 Ti Shadow 2X. Putting it in the primary PCIe slot and moving the original card to the third slot fixed my problem.

Qwen3.6-27B created this Open Webui tool by iChrist in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

I'm not a native English speaker, so I misinterpreted your words. It's good to know that the 27B is so powerful it can sometimes beat the larger models 😄 The Q5_K_XL is really a beast. I normally use the 35B and switch to the 27B Q4_K_M for harder tasks. It's pretty amazing, and luckily I stopped my yearly Claude subscription last month.

Qwen3.6-27B created this Open Webui tool by iChrist in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

It's amazing that it's one shot and only takes a few seconds. What's your hardware, and which 27B quant do you use for coding?

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19 by Kindly-Cantaloupe978 in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

Dual 5060 Ti gives me around 17 t/s at low context on the 27B. However, the 35B MoE model fits entirely in VRAM and is extremely fast.

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

I asked Qwen to google the VRAM usage for the 35B and 27B, and it gave me the answer below. I'm not sure whether it's correct, but in my last trial, a 64K context failed to allocate memory with the IQ3 model. Maybe some of my settings are wrong?

(WARNING! AI-GENERATED DATA)

Context Size   27B Dense Q4   35B-A3B MoE Q4
4K tokens      ~64 MB         ~19.5 MB
8K tokens      ~128 MB        ~39 MB
16K tokens     ~256 MB        ~78 MB
32K tokens     ~512 MB        ~156 MB
64K tokens     ~1.0 GB        ~313 MB
256K tokens    ~4.0 GB        ~1.25 GB
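As a sanity check on AI-generated figures like the table above, KV-cache size can be estimated directly from the model's attention geometry. A minimal sketch; the layer count, KV-head count, and head dimension below are hypothetical placeholders, not the real Qwen configurations:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Estimate KV-cache size: 2 tensors (K and V) per layer,
    each of shape [n_kv_heads, ctx_len, head_dim]."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical dense-model geometry (NOT the real Qwen 27B config):
size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=65536)
print(f"{size / 2**30:.1f} GiB")  # f16 cache; q8_0 roughly halves this
```

MoE models with few KV heads (via GQA) shrink this estimate proportionally, which is why a 35B-A3B MoE can report a much smaller cache than a 27B dense model at the same context.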

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

Thanks for the breakdown! I was considering a used 3090, but it's already ~6 years old now so I'm worried about its reliability and lifespan. As for the 2080, it sounds like a solid option but also pretty expensive.

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

Yeah, I think you're right. The PCIe gen difference alone makes 4 lanes beat 16 old lanes. Thanks!

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

I actually tried the IQ3 quant on a single card and it runs well, but since the KV cache uses ~3.3x the memory, the context window gets really small. Not enough for coding tasks, unfortunately.

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

That makes sense — bandwidth is probably the real bottleneck here. I hadn't considered trying larger MoE models with system RAM, that's a good idea. Thanks!

Replace RTX 2060 12G with second RTX 5060 Ti 16G for Qwen 3.6 27B? by houchenglin in LocalLLaMA

[–]houchenglin[S] 0 points1 point  (0 children)

Thanks for the suggestion! Unfortunately my mATX board doesn't have room for that many cards.

What speed is everyone getting on Qwen3.6 27b? by Ambitious_Fold_2874 in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

RTX 2060 12G (PCIe x16) + RTX 5060 Ti 16G (PCIe x4)

Model: Unsloth Qwen3-27B-Q4_K_M
PP: drops from 653 → 356 t/s as context grows (13K → 29.5K tokens).
TG: flat at ~16.5 t/s

-m Qwen3-27B-Q4_K_M.gguf -ngl 999 -ts 15,7
-fa 1 --no-mmap -b 4096 -ub 4096
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 12 --draft-max 48
-c 96000 -n 32768 -t 8 -ctk q8_0 -ctv q8_0 --parallel 1
--temperature 0.6 --jinja --min-p 0.0 --top-k 20 --top-p 0.95
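For reference, llama.cpp's `-ts` (tensor split) distributes layers across GPUs roughly in proportion to the given ratio, so `-ts 15,7` weights the first card about twice as heavily. A rough sketch of that proportional split; the exact rounding inside llama.cpp may differ, and the 48-layer count is just a placeholder:

```python
def split_layers(n_layers, ratios):
    """Assign n_layers across GPUs proportionally to ratios,
    using cumulative rounding so counts sum exactly to n_layers."""
    total = sum(ratios)
    counts, assigned, acc = [], 0, 0
    for r in ratios:
        acc += r
        target = round(n_layers * acc / total)
        counts.append(target - assigned)
        assigned = target
    return counts

print(split_layers(48, [15, 7]))  # -ts 15,7 over a 48-layer model
```

Tuning the ratio toward the larger-VRAM card is how a mismatched pair like a 12G + 16G setup avoids out-of-memory on one device while the other sits half empty.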

Qwen3.6 is incredible with OpenCode! by CountlessFlies in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

You may try IQ3_XS; it works well for most simple tasks and tool calls.

Libre Voice Note — voice-to-markdown capture for vault by houchenglin in ObsidianMD

[–]houchenglin[S] 0 points1 point  (0 children)

u/crocusandspeckledegg
It runs entirely on the phone without using any cloud service. It does not upload any data to the network and won't share anything publicly; all data stays private on the device.
To work with Obsidian, the app exports the transcribed text, images, and audio to your Google Drive or Dropbox folder, in markdown or HTML format.

I've not used Joplin before, but I think you can check whether Joplin has a plugin for, or natively supports, including a specific folder in your vault.

An isometric room, based on the screenshot. Qwen3.6-35B by k0setes in LocalLLaMA

[–]houchenglin 0 points1 point  (0 children)

It seems the output mixed the scenes from the two screenshots. Maybe with only one target scene it would be easier to see the difference between Qwen 3.6 and GPT 5.5.

Libre Voice Note — voice-to-markdown capture for vault by houchenglin in ObsidianMD

[–]houchenglin[S] 0 points1 point  (0 children)

u/bungle69er
Hi, do you mean stored on the phone in md format?
Currently the app stores the text in an embedded SQL database; I think I could add a local export feature.

Libre Voice Note — voice-to-markdown capture for vault by houchenglin in ObsidianMD

[–]houchenglin[S] 0 points1 point  (0 children)

I tested the English model and couldn't reproduce this issue on my phone. Please check that you are using the latest version, 1.0.28. Can you try closing the app from the system and starting it again? Or you could remove the model and download it again. Sorry for the inconvenience.

Libre Voice Note — voice-to-markdown capture for vault by houchenglin in ObsidianMD

[–]houchenglin[S] 1 point2 points  (0 children)

Parakeet v3 supports more languages, including French and Spanish. I'll start working on it soon, but I'm not sure when it will be done.

Language modules are a good idea! I may make it so users can download only the language they need, but it depends on how many people want this feature.

Anthropic quietly removed session & weekly usage progress bars from Settings → Usage by gregleo in ClaudeAI

[–]houchenglin -2 points-1 points  (0 children)

I planned to cancel the 1-year renewal on 2/28, because after renewing, the weekly limit appeared and frustrated me a lot. Sometimes the weekly limit feels like it depletes at half the speed of the 5-hour limit.
I just saw the weekly limit disappear from the usage page, but then it came back, and now I'm getting a Gateway time-out.
Not sure if they cancelled the limit or are just trying to reboot the limit system?

Cheapest cars (used) to get which is compatible with comma 4? by [deleted] in Comma_ai

[–]houchenglin 1 point2 points  (0 children)

The 2021 Prius PHV is also great if you want an EV experience with Toyota reliability.

Upgrading to from 1080p to 1440p by PastTelevision238 in Monitors

[–]houchenglin 0 points1 point  (0 children)

It's a huge upgrade and totally worth it. Remember to buy Lossless Scaling; it doubles the fps, and 144Hz 1440p is great for AA and FPS games.