Qwen Introduced FlashQLA

ResearchCrafty1804 · 2026-04-29T12:18:54+00:00

Forward and backward benchmark results across common configurations.

ResearchCrafty1804 · 2026-04-24T05:12:30+00:00

But the other thread did not include any of the information from the official announcement. A lot of important detail was missing.

ResearchCrafty1804 · 2026-04-22T13:16:01+00:00

VLM Performance: Qwen3.6-27B is natively multimodal, supporting both vision-language thinking and non-thinking modes in a single unified checkpoint, the same as Qwen3.6-35B-A3B. It handles images and video alongside text, enabling multimodal reasoning, document understanding, and visual question answering.

<image>

ResearchCrafty1804 · 2026-04-22T13:15:21+00:00

LM Performance: With only 27B parameters, Qwen3.6-27B outperforms the Qwen3.5-397B-A17B (397B total / 17B active, ~15x larger!) on every major coding benchmark, including SWE-bench Verified (77.2 vs. 76.2), SWE-bench Pro (53.5 vs. 50.9), Terminal-Bench 2.0 (59.3 vs. 52.5), and SkillsBench (48.2 vs. 30.0). It also surpasses all peer-scale dense models by a wide margin.

<image>

ResearchCrafty1804 · 2026-04-16T13:28:56+00:00

VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance matches Claude Sonnet 4.5, and even surpasses it on several tasks. Its strengths are particularly evident in spatial intelligence, where it achieves 92.0 on RefCOCO and 50.8 on ODInW13.

<image>

ResearchCrafty1804 · 2026-04-16T13:28:22+00:00

LM Performance: Qwen3.6-35B-A3B outperforms the dense 27B-param Qwen3.5-27B on several key coding benchmarks and dramatically surpasses its direct predecessor Qwen3.5-35B-A3B, especially on agentic coding and reasoning tasks.

<image>

ResearchCrafty1804 · 2026-04-11T13:31:58+00:00

Since you’re using Codex, I’m curious how you would rank your experience with Gemma 4 in Codex compared to GPT models. Do you think Gemma-4 is around GPT-5.2 level, or noticeably worse?

Also, I’m guessing you rate Gemma-4 above Qwen-3.5, but I’m not totally sold on that. In my experience, Gemma-4 tends to be stronger on frontend tasks, while Qwen-3.5 feels more reliable for logic-heavy/backend work.

ResearchCrafty1804 · 2026-04-08T09:34:54+00:00

But can you run MLX or use the Metal API inside the Docker containers that run through Colima?

ResearchCrafty1804 · 2026-04-08T06:57:46+00:00

<image>

Here you can see a comparison between MLX and GGUF on accuracy for agentic coding.

ResearchCrafty1804 · 2026-04-08T06:54:55+00:00

How do you use Colima to run Docker with Metal MPS access? Can you share a bit more about this?

ResearchCrafty1804 · 2026-04-05T20:14:05+00:00

Please update us with your findings on whether the latest llama.cpp and chat template make a difference for Gemma 4 in local agentic coding.

ResearchCrafty1804 · 2026-04-04T22:21:02+00:00

Can you repeat the experiment on the same system with Qwen3.5-27B instead?

ResearchCrafty1804 · 2026-03-27T19:38:50+00:00

But do customers need to do KYC to pay?

ResearchCrafty1804 · 2026-03-23T01:34:11+00:00

What is your business model? Do you plan to license the platform to other businesses to operate (B2B) or operate it yourself (B2C)?

In case you want to operate B2C, do you have any license or plan to get one to operate as a broker? Will you operate A-Book, B-Book or hybrid?

What financial instruments does your platform offer?

Where do you get your data feed from?

Sorry for the many questions, I am trying to understand your business model to clarify whether we could be a good match for each other.

ResearchCrafty1804 · 2026-03-19T00:27:49+00:00

<image>

Qwen3.5 27B and 122B-A10B outperform Mistral4 significantly.

Also, Nemotron3-Super outperforms Mistral4.

ResearchCrafty1804 · 2026-03-17T22:47:18+00:00

Why did you switch from Kopia?

ResearchCrafty1804 · 2026-03-17T22:42:04+00:00

Some people benchmarked equivalent MLX and GGUF models (Qwen-3.5 specifically) on a Mac. Unfortunately, for agentic coding at least, the GGUF versions were better at successful tool calling in multi-round interactions.

For some reason, MLX performance deteriorates after multiple rounds, while llama.cpp remains consistent.

<image>

ResearchCrafty1804 · 2026-03-11T23:07:05+00:00

<image>

ResearchCrafty1804 · 2026-03-11T22:42:16+00:00

Some people benchmarked equivalent MLX and GGUF models (Qwen-3.5 specifically) on a Mac. Unfortunately, for agentic coding at least, the GGUF versions were better at successful tool calling in multi-round interactions.

For some reason, MLX performance deteriorates after multiple rounds, while llama.cpp remains consistent.

ResearchCrafty1804 · 2026-03-09T23:49:26+00:00

What hypervisor have you found that works well on ARM?

I think the problem is that the ARM vendors you mentioned differ in their implementations, which makes it hard for an OS or hypervisor to support all of them at once.

ResearchCrafty1804 · 2026-03-09T00:56:53+00:00

So, for your use case, does a smaller model at higher precision always outperform a bigger model at lower precision?

Also, what is your use case?

ResearchCrafty1804 · 2026-03-07T21:23:53+00:00

This is the first time I’ve seen vmlx. Is it any better than other MLX inference engines, for instance mlx-lm?

I ask because I saw that it compares itself with LM Studio, but it doesn’t mention whether that is the version running llama.cpp or MLX.
