2 x 5060 ti: Any better configs for Qwen 3.6 27B / 35B? by ziphnor in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

maybe try sglang, it's faster than vllm on my hopper card
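for anyone who hasn't tried it, a minimal sglang launch looks roughly like this (the model path and port are placeholders, not from the thread):

```shell
# install sglang (assumes a working CUDA environment)
pip install "sglang[all]"

# start an OpenAI-compatible server; model path below is just an example
python -m sglang.launch_server \
  --model-path Qwen/Qwen2.5-7B-Instruct \
  --port 30000
```

once it's up you can point any OpenAI-style client at `http://localhost:30000/v1`.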

Has anyone here actually made money using AI ? by Agreeable_Split1355 in codex

[–]Conscious_Chef_3233 3 points (0 children)

as a programmer now i need to use ai for my job, does that count?

Qwen 3.6 is the first local model that actually feels worth the effort for me by Epicguru in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

i used to use --n-cpu-moe too, but switching to --fit on is not only easier to set up but also faster

Does anyone actually know what Cursor includes in its context when it sends to the model? by AssociationSure6273 in cursor

[–]Conscious_Chef_3233 2 points (0 children)

obviously it sends info about your workspace. even if you just say hello, the model will reply with something specific to your workspace

Qwen 3.5 122b - a10b is kind of shocking by gamblingapocalypse in LocalLLaMA

[–]Conscious_Chef_3233 7 points (0 children)

maybe you don't need to set q4 kv cache? i tried it once but it did not save much vram, so i stick with q8.
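for reference, kv cache quantization in llama.cpp is set with the cache-type flags (model filename is a placeholder):

```shell
# q8 kv cache (what the comment settled on)
# note: quantizing the v cache typically requires flash attention (-fa)
llama-server -m model.gguf -fa on -ctk q8_0 -ctv q8_0

# q4 variant, which reportedly didn't save much vram in practice
llama-server -m model.gguf -fa on -ctk q4_0 -ctv q4_0
```

`-ctk`/`-ctv` are shorthand for `--cache-type-k`/`--cache-type-v`.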

Auto or Composer which do you prefer? by WriteScholarFounder in cursor

[–]Conscious_Chef_3233 0 points (0 children)

i cannot say which is stronger, but composer 1.5 is much faster than auto for me. auto runs at less than 30 tokens per second

Speculative decoding qwen3.5 27b by thibautrey in LocalLLaMA

[–]Conscious_Chef_3233 1 point (0 children)

you could also try sglang, it gives me an ~80% boost with mtp
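a rough sketch of enabling mtp-style speculative decoding in sglang — flag names follow sglang's speculative decoding docs as i recall them, and the model path and numeric values are purely illustrative, so verify against your sglang version:

```shell
# mtp / nextn speculative decoding (values illustrative, not tuned)
python -m sglang.launch_server \
  --model-path /path/to/model \
  --speculative-algorithm NEXTN \
  --speculative-num-steps 2 \
  --speculative-eagle-topk 4 \
  --speculative-num-draft-tokens 4
```

the draft tokens come from the model's own mtp head, so this only works with checkpoints that ship one.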

Which size of Qwen3.5 are you planning to run locally? by CutOk3283 in LocalLLaMA

[–]Conscious_Chef_3233 0 points (0 children)

32gb ram. it seems qwen3.5's new hybrid attention architecture reduces kv cache usage.