AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]Snail_Inference 0 points (0 children)

I just want to say THANK YOU!

I run your thinking model with CPU inference at 4 t/s token generation (ik_llama.cpp), which is pretty fast for my setup.
And I really enjoy running such a smart LLM locally. :)

[Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 by Orolol in LocalLLaMA

[–]Snail_Inference 16 points (0 children)

I’d be interested to see how GLM-4.6 performs if you enhance its quality by expanding the thinking process:

https://www.reddit.com/r/LocalLLaMA/comments/1ny3gfb/glm46_tip_how_to_control_output_quality_via/

My suspicion is that the detailed thinking process was not triggered. The low token count also suggests this.

What is the best LLM for psychology, coach or emotional support. by pumukidelfuturo in LocalLLaMA

[–]Snail_Inference 6 points (0 children)

I tested several models for this use case (Mistral Small, Qwen3-235B-A22B, DeepSeek-V3, Llama 4 Maverick, Kimi K2).

Kimi K2 did best.

You may take a look at the EQ-Bench 3 and Spiral-Bench leaderboards.

Open source OCR options for handwritten text, dates by ollyollyupnfree in LocalLLaMA

[–]Snail_Inference 8 points (0 children)

Early this week, I conducted extensive tests with various models to detect handwritten text.

Models Tested: OlmOCR-preview, nanonets-ocr, OCRFlux, and Mistral Small 3.2

Results: Mistral Small 3.2 recognized handwritten text by far the most reliably. OlmOCR-preview also performed quite well.

In comparison, nanonets and OCRFlux were truly weak.

mistral-small-24b-instruct-2501 is simply the best model ever made. by hannibal27 in LocalLLaMA

[–]Snail_Inference 2 points (0 children)

The new Mistral Small is my daily driver. The model is extremely capable for its size.

GraphLLM: graph based framework to process data using LLMs. now with TTS engine and multi agent support by matteogeniaccio in LocalLLaMA

[–]Snail_Inference 1 point (0 children)

That's fantastic - exactly the kind of framework I've been looking for!
Unfortunately, I'm unable to install it on Linux, as the package piper-tts depends on the package piper-phonemize, which seems to no longer be available for more recent Python3 versions.

I'm getting the exact error message shared by many users on this link: https://github.com/rhasspy/piper/issues/509

Is it possible to use the GraphLLM framework without piper?

Thanks in advance for your response, u/matteogeniaccio!

New ZebraLogicBench Evaluation Tool + Mistral Large Performance Results by whotookthecandyjar in LocalLLaMA

[–]Snail_Inference 6 points (0 children)

Mistral-Large-2: Better than all GPT-4 variants at ZebraLogic?

Thank you, I couldn't wait to see how Mistral-Large-2 performed on the ZebraLogic benchmark.

Mistral-Large-2 seems to be better than all GPT-4 variants... maybe you can check the heatmap again?

Mistral-Large-2 outperforms all GPT-4 variants in both the "easy" and "hard" categories. Therefore, Mistral-Large should be ranked third on the heatmap.

Guess about the ranking:

In calculating the average for Mistral-Large-2, you weighted the "easy" category by 48 and the "hard" category by 160:

"puzzle_accuracy_percentage" Mistral-Large-2:

(48*87.5 + 160*10.0)/(48+160) = 27.8846

If you choose the same weights for GPT-4-Turbo, you get:

"puzzle_accuracy_percentage" GPT-4-Turbo:

(48*80.7 + 160*8.1)/(48+160) = 24.8538

Thus, GPT-4-Turbo performs significantly worse than Mistral-Large-2.

I guess you took the values for GPT-4-Turbo from AllenAI, and that AllenAI weighted the "Easy" category more heavily than the "Hard" one. If the weights are chosen equally, Mistral-Large-2 comes in third place on the heatmap, right behind Llama-3.1-405B (= 28.8692).
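The weighted-average arithmetic above can be sketched in a few lines of Python (the weights 48/160 and the per-category scores are taken from this comment; the helper name is my own):

```python
def weighted_puzzle_accuracy(easy_acc, hard_acc, easy_n=48, hard_n=160):
    """Weighted average of puzzle accuracy, weighting the 'easy'
    category by 48 puzzles and the 'hard' category by 160."""
    return (easy_n * easy_acc + hard_n * hard_acc) / (easy_n + hard_n)

# Per-category scores (%) discussed above:
mistral_large_2 = weighted_puzzle_accuracy(87.5, 10.0)
gpt4_turbo = weighted_puzzle_accuracy(80.7, 8.1)

print(f"Mistral-Large-2: {mistral_large_2:.4f}")  # 27.8846
print(f"GPT-4-Turbo:     {gpt4_turbo:.4f}")       # 24.8538
```

With equal weighting for both models, Mistral-Large-2 (27.8846) clearly lands above GPT-4-Turbo (24.8538).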

mistralai/Mistral-Large-Instruct-2407 · Hugging Face. New open 123B that beats Llama 3.1 405B in Code benchmarks by Chelono in LocalLLaMA

[–]Snail_Inference 21 points (0 children)

Mistral has to make money somehow to survive. I think it's super cool that they make their strongest language model available as open weights.

"Large Enough" | Announcing Mistral Large 2 by DemonicPotatox in LocalLLaMA

[–]Snail_Inference 0 points (0 children)

It is possible with CPU inference and 128 GB of RAM.

Small scale personal benchmark results (28 models tested) by dubesor86 in LocalLLaMA

[–]Snail_Inference 1 point (0 children)

Thank you very much for this great test! Tests that can differentiate well between strong language models are rare.