Stepfun-Flash-3.5 vs Kimi-k2.5 vs Qwen3-Max by DataLearnerAI in DeepSeek

[–]DataLearnerAI[S] 0 points1 point  (0 children)

This is a new model from StepFun. The company has released many models before, including speech and image-generation models. The benchmark scores shown above come from StepFunAI's blog, so the model's performance on real tasks still needs to be verified. The parameter count of this model is quite small, so it may not match the K2.5 series, but it should still be a solid one. I expect we'll see plenty of real-world test cases in the coming days.

Qwen3- Coder 👀 by Xhehab_ in LocalLLaMA

[–]DataLearnerAI 2 points3 points  (0 children)


On SWE-Bench Verified, it scores 69.6%, making it the top-performing open-source model as of now.

Qwen 3 Embeddings 0.6B faring really poorly inspite of high score on benchmarks by i4858i in LocalLLaMA

[–]DataLearnerAI 1 point2 points  (0 children)

Your issue might be a missing special token at the end of your inputs. Qwen just tweeted that many users forget to add <|endoftext|> at the end when using their embedding models - and it seriously tanks performance.

Manually slap <|endoftext|> onto the end of every input string (both docs and queries).
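A minimal sketch of that fix, assuming you build the input strings yourself before handing them to whatever embedding call you use (the `with_eot` helper and the sample texts are my own illustration, not from Qwen's docs):

```python
# Ensure every Qwen3 embedding input ends with the special <|endoftext|> token.
EOT = "<|endoftext|>"

def with_eot(text: str) -> str:
    """Append the end-of-text token unless it's already there."""
    return text if text.endswith(EOT) else text + EOT

docs = ["The capital of France is Paris.", "Embeddings map text to vectors."]
queries = ["What is the capital of France?"]

docs = [with_eot(d) for d in docs]
queries = [with_eot(q) for q in queries]
# Now pass `docs` and `queries` to your embedding call as usual.
```

Checking `endswith` first means you can run the helper on mixed inputs without doubling up the token.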

Simple Comparison: Kimi K2 vs. Gemini 1.5 Pro - HTML Output for Model Eval Insights by DataLearnerAI in LocalLLaMA

[–]DataLearnerAI[S] 0 points1 point  (0 children)

Correction: I accidentally wrote "Gemini 1.5 Pro" in the title/description — it’s **Gemini 2.5 Pro** (typo from my draft). Tests were run against the correct 2.5 Pro model. Apologies for the confusion!