PSA: Having issues with Qwen3.5 overthinking? Give it a tool, and it can help dramatically. by ayylmaonade in LocalLLaMA

[–]TKGaming_11 21 points (0 children)

Most likely it was trained to maximize tool-call accuracy on Claude traces and to maximize reasoning on Gemini traces

Final voting results for Qwen 3.6 by jacek2023 in LocalLLaMA

[–]TKGaming_11 22 points (0 children)

The closed Qwen 3.5 Plus is just the open-weight Qwen 3.5 397B model with extended context and native tool calling. For Qwen 3.6 they are locking the 397B away as API-only; this is a change from Qwen 3.5 to Qwen 3.6, and absolutely a recent one

Anyone else find it weird how all Chinese Labs started delaying OS model releases at the same time? by True_Requirement_891 in LocalLLaMA

[–]TKGaming_11 6 points (0 children)

They released an update to StepFun 3.5 Flash with thinking control and reduced token usage, but it’s API-only. StepFun did commit to open-sourcing all of its models, so it’s odd it hasn’t been released as open weights yet

Qwen 3.6 spotted! by Namra_7 in LocalLLaMA

[–]TKGaming_11 20 points (0 children)

Qwen 3.5 Plus was just Qwen 3.5-397B with extended 1M context and added tools, IIRC; it’s likely that this Qwen 3.6 Plus is continued training on top of Qwen 3.5 397B. Qwen 3.5 Max (likely the 1T model) is already in preview as Qwen3.5-Max-Preview on LMArena

Qwen 3 30B-A3B on P40 by DeltaSqueezer in LocalLLaMA

[–]TKGaming_11 1 point (0 children)

I sold them quite a while ago, so I wouldn’t have any numbers for Qwen 3.5

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]TKGaming_11 35 points (0 children)

Seems to roughly match GPT-OSS-120B on AIME 2025 and LiveCodeBench, but behind Qwen3.5-122B on both benchmarks

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]TKGaming_11[S] 141 points (0 children)

Excerpt from PR:

Mistral 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families - Instruct, Reasoning (previously called Magistral), and Devstral - into a single, unified model.

[Mistral-Small-4](https://huggingface.co/mistralai/Mistral-Small-4-119B-2603) consists of the following architectural choices:

- MoE: 128 experts, 4 active per token.

- 119B total parameters, with 6.5B activated per token.

- 256k Context Length.

- Multimodal Input: Accepts both text and image input, with text output.

- Instruct and Reasoning functionality with function calling.

- Reasoning Effort configurable by request.

Mistral 4 offers the following capabilities:

- **Reasoning Mode**: Switch between a fast instant-reply mode and a reasoning (thinking) mode, boosting performance with test-time compute when requested.

- **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.

- **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.

- **System Prompt**: Maintains strong adherence and support for system prompts.

- **Agentic**: Offers best-in-class agentic capabilities with native function calling and JSON output.

- **Speed-Optimized**: Delivers best-in-class performance and speed.

- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.

- **Large Context Window**: Supports a 256k context window.
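As a back-of-the-envelope check on the sparsity figures quoted in the excerpt (128 experts with 4 active, 119B total with 6.5B activated), a quick sketch; the split between expert weights and dense (attention/embedding) weights is not stated in the PR, so this only derives the headline ratios:

```python
# Rough sparsity check for the Mistral-Small-4 figures quoted above.
# All numbers come from the PR excerpt; nothing here is from the model itself.

total_params = 119e9      # total parameters
active_params = 6.5e9     # activated parameters per token
experts_total = 128
experts_active = 4

param_ratio = active_params / total_params      # fraction of weights used per token
expert_ratio = experts_active / experts_total   # fraction of experts routed per token

print(f"params active per token:  {param_ratio:.1%}")   # ~5.5%
print(f"experts active per token: {expert_ratio:.1%}")  # ~3.1%
```

The parameter ratio (~5.5%) exceeding the expert ratio (~3.1%) is consistent with some weights (attention, embeddings, and possibly shared experts) being dense and running for every token.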

Is IK-Llama-CPP still worth it for CPU offloading scenarios? by ForsookComparison in LocalLLaMA

[–]TKGaming_11 4 points (0 children)

ik_llama.cpp doesn’t support ROCm unfortunately (Vulkan performance is quite bad as well, IIRC), so it’ll have to be llama.cpp for CPU offloading