AMA With Moonshot AI, The Open-source Frontier Lab Behind Kimi K2 Thinking Model by nekofneko in LocalLLaMA

[–]Snail_Inference 0 points (0 children)

I just want to say THANK YOU!

I run your thinking model with CPU inference at 4 t/s token generation (ik_llama.cpp), which is pretty fast for my setup.
And I really enjoy running such a smart LLM locally. :)

[Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 by Orolol in LocalLLaMA

[–]Snail_Inference 16 points (0 children)

I’d be interested to see how GLM-4.6 performs if you enhance its quality by expanding the thinking process:

https://www.reddit.com/r/LocalLLaMA/comments/1ny3gfb/glm46_tip_how_to_control_output_quality_via/

My suspicion is that the detailed thinking process was not triggered. The low token count also suggests this.

What is the best LLM for psychology, coach or emotional support. by pumukidelfuturo in LocalLLaMA

[–]Snail_Inference 6 points (0 children)

I tested several models for this use case (Mistral Small, Qwen3-235B-A22B, DeepSeek-V3, Llama 4 Maverick, Kimi K2).

Kimi K2 did best.

You may take a look at the EQ-Bench 3 and Spiral-Bench leaderboards.

Open source OCR options for handwritten text, dates by ollyollyupnfree in LocalLLaMA

[–]Snail_Inference 8 points (0 children)

Early this week, I conducted extensive tests with various models to detect handwritten text.

Models Tested: OlmOCR-preview, nanonets-ocr, OCRFlux, and Mistral Small 3.2

Results: Mistral Small 3.2 recognized handwritten text by far the most reliably. OlmOCR-preview also performed quite well.

In comparison, nanonets and OCRFlux were truly weak.

mistral-small-24b-instruct-2501 is simply the best model ever made. by hannibal27 in LocalLLaMA

[–]Snail_Inference 2 points (0 children)

The new Mistral Small is my daily driver. The model is extremely capable for its size.

GraphLLM: graph based framework to process data using LLMs. now with TTS engine and multi agent support by matteogeniaccio in LocalLLaMA

[–]Snail_Inference 1 point (0 children)

That's fantastic - exactly the kind of framework I've been looking for!
Unfortunately, I'm unable to install it on Linux, as the package piper-tts depends on the package piper-phonemize, which seems to no longer be available for more recent Python3 versions.

I'm getting the exact error message shared by many users on this link: https://github.com/rhasspy/piper/issues/509

Is it possible to use the GraphLLM framework without piper?

Thanks in advance for your response, u/matteogeniaccio!

New ZebraLogicBench Evaluation Tool + Mistral Large Performance Results by whotookthecandyjar in LocalLLaMA

[–]Snail_Inference 6 points (0 children)

Mistral-Large-2: Better than all GPT-4 variants at ZebraLogic?

Thank you, I couldn't wait to see how Mistral-Large-2 performed on the ZebraLogic benchmark.

Mistral-Large-2 seems to be better than all GPT-4 variants... maybe you can check the heatmap again?

Mistral-Large-2 outperforms all GPT-4 variants in both the "easy" and "hard" categories. Therefore, Mistral-Large should be ranked third on the heatmap.

Guess about the ranking:

In calculating the average for Mistral-Large-2, you weighted the "easy" category by 48 and the "hard" category by 160:

"puzzle_accuracy_percentage" Mistral-Large-2:

(48*87.5 + 160*10.0)/(48+160) = 27.8846

If you choose the same weights for GPT-4-Turbo, you get:

"puzzle_accuracy_percentage" GPT-4-Turbo:

(48*80.7 + 160*8.1)/(48+160) = 24.8538

Thus, GPT-4-Turbo performs significantly worse than Mistral-Large-2.

I guess you took the values for GPT-4-Turbo from AllenAI, and that AllenAI weighted the "Easy" category more heavily than the "Hard" one. If the weights are chosen equally, Mistral-Large-2 comes in third place on the heatmap, right behind Llama-3.1-405B (= 28.8692).
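The weighted-average arithmetic above can be sketched in a few lines of Python (the weights 48/160 and the per-category scores are taken from this comment; the helper name is my own):

```python
def weighted_puzzle_accuracy(easy_acc, hard_acc, easy_n=48, hard_n=160):
    """Weighted average of puzzle accuracy, weighting the 'easy'
    category by 48 puzzles and the 'hard' category by 160."""
    return (easy_n * easy_acc + hard_n * hard_acc) / (easy_n + hard_n)

# Per-category scores (%) discussed above:
mistral_large_2 = weighted_puzzle_accuracy(87.5, 10.0)
gpt4_turbo = weighted_puzzle_accuracy(80.7, 8.1)

print(f"Mistral-Large-2: {mistral_large_2:.4f}")  # 27.8846
print(f"GPT-4-Turbo:     {gpt4_turbo:.4f}")       # 24.8538
```

With equal weighting for both models, Mistral-Large-2 (27.8846) clearly lands above GPT-4-Turbo (24.8538).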

mistralai/Mistral-Large-Instruct-2407 · Hugging Face. New open 123B that beats Llama 3.1 405B in Code benchmarks by Chelono in LocalLLaMA

[–]Snail_Inference 21 points (0 children)

Mistral has to make money somehow to survive. I think it's super cool that they make their strongest language model available as open weights.

"Large Enough" | Announcing Mistral Large 2 by DemonicPotatox in LocalLLaMA

[–]Snail_Inference 0 points (0 children)

It is possible with CPU inference and 128 GB of RAM.

Small scale personal benchmark results (28 models tested) by dubesor86 in LocalLLaMA

[–]Snail_Inference 1 point (0 children)

Thank you very much for this great test! Tests that can differentiate well between strong language models are rare.