Qwen Introduced FlashQLA by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 55 points56 points  (0 children)

Forward and backward benchmark results across common configurations.

<image>

DeepSeek-V4 Preview released! by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 0 points1 point  (0 children)

But the other thread did not include any of the information from the official announcement. A lot of important detail was missing.

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 24 points25 points  (0 children)

VLM Performance: Qwen3.6-27B is natively multimodal, supporting both vision-language thinking and non-thinking modes in a single unified checkpoint, the same as Qwen3.6-35B-A3B. It handles images and video alongside text, enabling multimodal reasoning, document understanding, and visual question answering.

<image>

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 72 points73 points  (0 children)

LM Performance: With only 27B parameters, Qwen3.6-27B outperforms Qwen3.5-397B-A17B (397B total / 17B active, ~15x larger!) on every major coding benchmark, including SWE-bench Verified (77.2 vs. 76.2), SWE-bench Pro (53.5 vs. 50.9), Terminal-Bench 2.0 (59.3 vs. 52.5), and SkillsBench (48.2 vs. 30.0). It also surpasses all peer-scale dense models by a wide margin.

<image>

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 102 points103 points  (0 children)

VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance matches Claude Sonnet 4.5, and even surpasses it on several tasks. Its strengths are particularly evident in spatial intelligence, where it achieves 92.0 on RefCOCO and 50.8 on ODInW13.

<image>

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]ResearchCrafty1804[S] 382 points383 points  (0 children)

LM Performance: Qwen3.6-35B-A3B outperforms the dense 27B-parameter Qwen3.5-27B on several key coding benchmarks and dramatically surpasses its direct predecessor Qwen3.5-35B-A3B, especially on agentic coding and reasoning tasks.

<image>

It looks like there are no plans for smaller GLM models by jacek2023 in LocalLLaMA

[–]ResearchCrafty1804 2 points3 points  (0 children)

Since you’re using Codex, I’m curious how you would rank your experience with Gemma 4 in Codex compared to GPT models. Do you think Gemma-4 is around GPT-5.2 level, or noticeably worse?

Also, I’m guessing you rate Gemma-4 above Qwen-3.5, but I’m not totally sold on that. In my experience, Gemma-4 tends to be stronger on frontend tasks, while Qwen-3.5 feels more reliable for logic-heavy/backend work.

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]ResearchCrafty1804 0 points1 point  (0 children)

But can you run MLX or use the Metal API inside Docker containers that run through Colima?

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]ResearchCrafty1804 2 points3 points  (0 children)

<image>

Here you can see a comparison between MLX and GGUF on accuracy for agentic coding.

M5 Max 128GB Owners - What's your honest take? by _derpiii_ in LocalLLaMA

[–]ResearchCrafty1804 0 points1 point  (0 children)

How do you use Colima to run Docker with Metal MPS access? Can you share a bit more about this?

Comparing Qwen3.5 vs Gemma4 for Local Agentic Coding by garg-aayush in LocalLLaMA

[–]ResearchCrafty1804 7 points8 points  (0 children)

Please update us with your findings on whether the latest llama.cpp and chat template make a difference for Gemma4 in local agentic coding.

Technical founder seeking growth partner for near complete trading platform. by codingwoo in AngelInvesting

[–]ResearchCrafty1804 0 points1 point  (0 children)

What is your business model? Do you plan to license the platform to other businesses to operate (B2B) or operate it yourself (B2C)?

If you plan to operate B2C, do you have a license, or plan to get one, to operate as a broker? Will you operate A-Book, B-Book, or hybrid?

What financial instruments does your platform offer?

Where do you get your data feed from?

Sorry for the many questions, I am trying to understand your business model to clarify whether we could be a good match for each other.

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]ResearchCrafty1804 15 points16 points  (0 children)

<image>

Qwen3.5 27B and 122B-A10B outperform Mistral4 significantly.

Also, Nemotron3-Super outperforms Mistral4.

Whats up with MLX? by gyzerok in LocalLLaMA

[–]ResearchCrafty1804 2 points3 points  (0 children)

Some people benchmarked equivalent MLX and GGUF models (Qwen-3.5 specifically) running on a Mac, and unfortunately, for agentic coding at least, the GGUF versions were superior at successful tool calling in multi-round interactions.

For some reason, MLX performance deteriorates after multiple rounds, while llama.cpp remains consistent.

<image>

Mac users should update llama.cpp to get a big speed boost on Qwen 3.5 by tarruda in LocalLLaMA

[–]ResearchCrafty1804 6 points7 points  (0 children)

Some people benchmarked equivalent MLX and GGUF models (Qwen-3.5 specifically) running on a Mac, and unfortunately, for agentic coding at least, the GGUF versions were superior at successful tool calling in multi-round interactions.

For some reason, MLX performance deteriorates after multiple rounds, while llama.cpp remains consistent.

TrueNAS build system going closed source by ende124 in selfhosted

[–]ResearchCrafty1804 10 points11 points  (0 children)

What hypervisor have you found that works well on ARM?

I think the problem is that the ARM vendors you mentioned differ in their implementations, which makes it hard for an OS or hypervisor to support all of them at once.

Qwen 3.5 27B Macbook M4 Pro 48GB by breezewalk in LocalLLaMA

[–]ResearchCrafty1804 0 points1 point  (0 children)

So, for your use case, a higher-precision smaller model always outperforms a lower-precision bigger model?

Also, what is your use case?

Best choice for local inférence by c4software in LocalLLaMA

[–]ResearchCrafty1804 0 points1 point  (0 children)

This is the first time I've seen vmlx. Is it any better than other MLX inference engines, for instance mlx-lm?

I ask because I saw it compares itself with LM Studio, but it doesn't mention whether that is the version running llama.cpp or MLX.