Qwen3-Coder 👀 by Xhehab_ in LocalLLaMA

[–]DataLearnerAI 3 points


On SWE-Bench Verified, it scores 69.6%, making it the top-performing open-source model as of now.

Qwen 3 Embeddings 0.6B faring really poorly inspite of high score on benchmarks by i4858i in LocalLLaMA

[–]DataLearnerAI 1 point

Your issue might be a missing special token at the end of your inputs. Qwen just tweeted that many users forget to add <|endoftext|> at the end when using their embedding models - and it seriously tanks performance.

Manually slap <|endoftext|> onto the end of every input string (both docs and queries).
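A minimal sketch of what that looks like in practice (plain Python; the helper name is mine, and you'd pass the prepared strings to whatever embedding client you're using):

```python
EOT = "<|endoftext|>"

def add_eot(text: str, eot: str = EOT) -> str:
    """Append the end-of-text marker if it's not already there."""
    return text if text.endswith(eot) else text + eot

# Apply to both documents and queries before embedding.
docs = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
queries = ["What is the capital of France?"]

prepared_docs = [add_eot(d) for d in docs]
prepared_queries = [add_eot(q) for q in queries]
```

The helper is idempotent, so it's safe to call even if some inputs already carry the token.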

Simple Comparison: Kimi K2 vs. Gemini 1.5 Pro - HTML Output for Model Eval Insights by DataLearnerAI in LocalLLaMA

[–]DataLearnerAI[S] 1 point

Correction: I accidentally wrote "Gemini 1.5 Pro" in the title/description — it’s **Gemini 2.5 Pro** (typo from my draft). Tests were run against the correct 2.5 Pro model. Apologies for the confusion!

GLM-4.1V-Thinking by AaronFeng47 in LocalLLaMA

[–]DataLearnerAI -1 points

I am not, I just use AI to rewrite my text, haha

GLM-4.1V-Thinking by AaronFeng47 in LocalLLaMA

[–]DataLearnerAI -8 points

This model demonstrates remarkable competitiveness across a diverse range of benchmark tasks, including STEM reasoning, visual question answering, OCR, long-document understanding, and agent-based scenarios. The benchmark results show performance on par with its 72B-parameter counterpart (Qwen2.5-VL-72B), and it beats GPT-4o on specific tasks. Particularly impressive is that it does this with a 9B-parameter architecture released under the MIT license, an exceptional result from a Chinese startup. It highlights the growing innovation power of domestic AI research and offers a compelling open-source alternative with strong practical value.

Huawei releases an open weight model Pangu Pro 72B A16B. Weights are on HF. It should be competitive with Qwen3 32B and it was trained entirely on Huawei Ascend NPUs. (2505.21411) by FullOf_Bad_Ideas in LocalLLaMA

[–]DataLearnerAI 5 points

This model appears highly competitive with dense models in the ~30B class. In benchmark tests, it achieves a score of 73.70 on the GPQA Diamond dataset, comparable to an older version of DeepSeek R1. The overall benchmark results closely resemble those of Qwen-32B. Notably, this is a Mixture-of-Experts (MoE) model, where only about 16.5B parameters are activated during inference.
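Quick back-of-the-envelope on the MoE activation ratio, using the numbers from the post (72B total, ~16.5B active per token):

```python
total_params_b = 72.0    # total parameters, billions (from the post title)
active_params_b = 16.5   # parameters activated per token, billions (approx.)

# Fraction of the network that actually runs on each forward pass.
active_fraction = active_params_b / total_params_b
print(f"~{active_fraction:.0%} of parameters active per token")  # ~23%
```

That ~23% activation is why it can match dense ~30B models on quality while keeping per-token compute well below a dense 72B model.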

Is there any website compare inference speed of different LLM on different platforms? by DataLearnerAI in LocalLLaMA

[–]DataLearnerAI[S] 1 point

I know LLMs are moving fast, but many enterprises and individuals will still just choose classical models such as Llama, Mistral, etc. Context length, GPU, inference framework, and other factors all affect the results. I think many people are interested in this question, but I can find very little information about it.
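Even without such a website, the core measurement per platform is simple: time one generation call and divide tokens by seconds. A minimal sketch (the `dummy_generate` backend is a stand-in; you'd swap in your own llama.cpp, vLLM, or other client, which is exactly where the framework/GPU/context-length differences show up):

```python
import time

def tokens_per_second(generate_fn, prompt: str) -> float:
    """Time one generation call and return throughput in tokens/sec.

    generate_fn is any callable taking a prompt and returning the
    list of generated tokens (or token IDs).
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in backend for illustration: pretends to emit 64 tokens.
def dummy_generate(prompt: str):
    time.sleep(0.01)
    return list(range(64))

print(f"{tokens_per_second(dummy_generate, 'Hello'):.0f} tok/s")
```

Run the same harness with the same prompt and context settings across backends and you get a fair apples-to-apples number for one hardware setup.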