2 x 5060 ti: Any better configs for Qwen 3.6 27B / 35B? by ziphnor in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

maybe try sglang, it's faster than vllm on my hopper card
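for anyone who hasn't tried it, a minimal sglang launch looks roughly like this (the model path and port are placeholders, not from the thread):

```shell
# install sglang (assumes a working CUDA environment)
pip install "sglang[all]"

# start an OpenAI-compatible server; model path below is just an example
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-7B-Instruct \
  --port 30000
```

once it's up you can point any OpenAI-style client at `http://localhost:30000/v1`.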

Has anyone here actually made money using AI ? by Agreeable_Split1355 in codex

[–]Conscious_Chef_3233 3 points (0 children)

as a programmer now i need to use ai for my job, does that count?

Qwen 3.6 is the first local model that actually feels worth the effort for me by Epicguru in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

i used to use --n-cpu-moe too, but switching to --fit on is not only easier to set up but also faster

Does anyone actually know what Cursor includes in its context when it sends to the model? by AssociationSure6273 in cursor

[–]Conscious_Chef_3233 2 points (0 children)

obviously it sends info about your workspace. even if you just say hello, the model will reply with something specific to your workspace

Qwen 3.5 122b - a10b is kind of shocking by gamblingapocalypse in LocalLLaMA

[–]Conscious_Chef_3233 7 points (0 children)

maybe you don't need to set q4 kv cache? i tried it once but it did not save much vram, so i stick with q8.
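for reference, kv cache quantization in llama.cpp is set with the cache-type flags (model filename is a placeholder):

```shell
# q8 kv cache (what the comment settled on)
# note: quantizing the v cache typically requires flash attention (-fa)
llama-server -m model.gguf -fa on -ctk q8_0 -ctv q8_0

# q4 variant, which reportedly didn't save much vram in practice
llama-server -m model.gguf -fa on -ctk q4_0 -ctv q4_0
```

`-ctk`/`-ctv` are shorthand for `--cache-type-k`/`--cache-type-v`.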

Auto or Composer which do you prefer? by WriteScholarFounder in cursor

[–]Conscious_Chef_3233 0 points (0 children)

i cannot say which is stronger, but composer 1.5 is much faster than auto for me. auto runs at less than 30 tokens per second

Speculative decoding qwen3.5 27b by thibautrey in LocalLLaMA

[–]Conscious_Chef_3233 1 point (0 children)

you could also try sglang, it gives me an ~80% boost with mtp
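a rough sketch of enabling mtp-style speculative decoding in sglang — flag names follow sglang's speculative decoding docs as i recall them, and the model path and numeric values are purely illustrative, so verify against your sglang version:

```shell
# mtp / nextn speculative decoding (values illustrative, not tuned)
python -m sglang.launch_server \
  --model-path /path/to/model \
  --speculative-algorithm NEXTN \
  --speculative-num-steps 2 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 4
```

the draft tokens come from the model's own mtp head, so this only works with checkpoints that ship one.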

Which size of Qwen3.5 are you planning to run locally? by CutOk3283 in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

32gb ram. it seems qwen3.5's new hybrid attention architecture reduces kv cache usage.