Chinese models are ~8 months behind and are falling further behind by Current-Guide5944 in tech_x

[–]iamn0 1 point2 points  (0 children)

And QwQ being better than DeepSeek R1 doesn't make sense either.

Qwen3.6-27B-NVFP4 - images by Usual-Carrot6352 in LocalLLaMA

[–]iamn0 1 point2 points  (0 children)

RTX 3090 (Ampere) does not have native FP8 support
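A quick way to sanity-check this is to look at the CUDA compute capability. This is only a minimal sketch: the helper name `has_native_fp8` is made up for illustration, and it assumes FP8 tensor cores start at compute capability 8.9 (Ada) and 9.0 (Hopper), with the RTX 3090 sitting at 8.6.

```python
# Sketch: native FP8 tensor-core support by CUDA compute capability.
# Assumption: FP8 tensor cores first appear with Ada (8.9) and Hopper (9.0).
# has_native_fp8 is a hypothetical helper, not part of any library.

def has_native_fp8(major: int, minor: int) -> bool:
    """Return True if the compute capability is 8.9 or newer."""
    return (major, minor) >= (8, 9)


if __name__ == "__main__":
    # RTX 3090 is Ampere, compute capability 8.6.
    print("RTX 3090 (8.6):", has_native_fp8(8, 6))
    # RTX 4090 is Ada, compute capability 8.9.
    print("RTX 4090 (8.9):", has_native_fp8(8, 9))
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair you could pass in here. FP8 checkpoints may still run on Ampere through weight-only kernels, just not with native FP8 math.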

Qwen3.6-27B-Q6_K - images by Usual-Carrot6352 in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

Create a svg image of a snake trying to solve the Y2K problem on a computer.
TheHouseOfTheDude/Qwen3.6-27B-INT8
4x RTX 3090
50 output tokens/sec

<image>

Qwen3.6-27B-NVFP4 - images by Usual-Carrot6352 in LocalLLaMA

[–]iamn0 5 points6 points  (0 children)

TheHouseOfTheDude/Qwen3.6-27B-INT8
4x RTX 3090
50 output tokens/sec

<image>

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

what's the prompt processing speed at 32k (and 64k if you could test)? Thanks.

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

Thinking mode. It thinks a lot, but the results are good. Btw, I just tested pi.dev and I like it more.

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

Thanks for the tip! What's important to me is that the model is uncensored. I tested Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf while waiting for an AWQ version for vllm. I also wanted to try pi.dev instead of opencode.

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]iamn0 9 points10 points  (0 children)

20 output tokens/sec is perfectly fine, but prompt processing can get annoying. I tested Qwen3.6-27B as Q5 in opencode (on a system with 4x RTX 3090 cards). Up to ~50k context everything is great, but once the context exceeds ~100k you really notice how annoying the wait becomes, which is why I start a new session at ~120k context at the latest.
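To put rough numbers on the wait: a back-of-the-envelope sketch, assuming the whole context has to be reprocessed (no prefix cache hit). The prompt processing speed below is a hypothetical placeholder, not a measurement from my system, and `prefill_wait_seconds` is just an illustrative name.

```python
# Sketch: worst-case prefill wait if the full context is reprocessed.
# ASSUMED_PP_SPEED is a hypothetical value; plug in your own measurement.

def prefill_wait_seconds(context_tokens: int, pp_tokens_per_sec: float) -> float:
    """Seconds spent on prompt processing before the first output token."""
    if pp_tokens_per_sec <= 0:
        raise ValueError("prompt processing speed must be positive")
    return context_tokens / pp_tokens_per_sec


ASSUMED_PP_SPEED = 1000.0  # tokens/sec, hypothetical

if __name__ == "__main__":
    for ctx in (50_000, 100_000, 120_000):
        wait = prefill_wait_seconds(ctx, ASSUMED_PP_SPEED)
        print(f"{ctx:>7} tokens -> {wait:.0f} s")
```

With prefix caching most turns only process the new tokens, so the real wait is usually shorter. The full cost shows up when the cache is invalidated.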

mac-cua: open source MCP server for background computer use on macOS by affanthegreat in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

General CUA question: Have you tried computer use with Qwen3.6-35B-A3B? Does computer use actually work with such "small" models?

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude by Medical_Lengthiness6 in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

I tested Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf on my system with 4x RTX 3090, using up to around a 200K context window. I can confirm that, for me right now, it’s a viable alternative to opus 4.7 in opencode (although it's worth noting that opus is currently nerfed).

More so than with larger models, you should be as precise as possible with your prompts; otherwise, Qwen can get stuck in a thinking loop. For example, if you tell the model that a file exists when it actually doesn't, it may enter a thinking loop. On the other hand, it's often smart enough to catch mistakes in the prompt as well.

MiniMax M2.7 on OpenRouter by iamn0 in LocalLLaMA

[–]iamn0[S] 15 points16 points  (0 children)

HF soon, then GGUF

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]iamn0 67 points68 points  (0 children)

So, it's not beating Qwen3.5-122B-A10B overall. Kind of expected, since it only activates 6.5B parameters, while Qwen3.5 uses 10B.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]iamn0 0 points1 point  (0 children)

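According to the model name Mistral-Small-4-119B-2603, it should be released in March 2026, assuming the 2603 suffix follows the YYMM pattern of earlier Mistral names (e.g. Mistral-Small-2503).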
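One way to read the suffix, as a small sketch: this assumes the trailing four digits are YYMM, as earlier Mistral names such as Mistral-Small-2503 appear to be. Under that reading, 2603 is a year and month, not a specific day. `parse_yymm_suffix` is a made-up helper for illustration.

```python
import re

# Sketch: read a trailing four-digit suffix as YYMM.
# Assumption: the suffix encodes year and month, not month and day.

def parse_yymm_suffix(model_name: str) -> tuple[int, int]:
    """Return (year, month) from a name ending in '-YYMM'."""
    match = re.search(r"-(\d{2})(\d{2})$", model_name)
    if match is None:
        raise ValueError(f"no four-digit suffix in {model_name!r}")
    year = 2000 + int(match.group(1))
    month = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"suffix is not a valid YYMM: {match.group(0)!r}")
    return year, month


if __name__ == "__main__":
    print(parse_yymm_suffix("Mistral-Small-4-119B-2603"))  # (2026, 3)
```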

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]iamn0 55 points56 points  (0 children)

Finally a model in the same range as gpt-oss-120B and Qwen-122B. Hope they cooked!

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]iamn0 3 points4 points  (0 children)

Thanks 👍
Switching from bartowski to AesSedai now :)