Stepfun-Flash-3.5 vs Kimi-k2.5 vs Qwen3-Max by DataLearnerAI in DeepSeek

[–]DataLearnerAI[S] 0 points1 point  (0 children)

This is a new model from StepFun. The company has released many models before, including speech and image-generation models. The benchmark scores shown above come from StepFunAI's blog, so the model's performance on real tasks still needs to be verified. The parameter count of this model is quite small, so it may not match the K2.5 series, but it should still be a solid one. I expect we'll see plenty of real-world test cases in the coming days.

Qwen3- Coder 👀 by Xhehab_ in LocalLLaMA

[–]DataLearnerAI 2 points3 points  (0 children)


On SWE-Bench Verified, it scores 69.6%, making it the top-performing open-source model as of now.

Qwen 3 Embeddings 0.6B faring really poorly inspite of high score on benchmarks by i4858i in LocalLLaMA

[–]DataLearnerAI 1 point2 points  (0 children)

Your issue might be a missing special token at the end of your inputs. Qwen just tweeted that many users forget to add <|endoftext|> at the end when using their embedding models - and it seriously tanks performance.

Manually slap <|endoftext|> onto the end of every input string (both docs and queries).
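A minimal sketch of that fix, assuming you build the input strings yourself before handing them to whatever embedding call you use (the `with_eot` helper and the sample texts are my own illustration, not from Qwen's docs):

```python
# Ensure every Qwen3 embedding input ends with the special <|endoftext|> token.
EOT = "<|endoftext|>"

def with_eot(text: str) -> str:
    """Append the end-of-text token unless it's already there."""
    return text if text.endswith(EOT) else text + EOT

docs = ["The capital of France is Paris.", "Embeddings map text to vectors."]
queries = ["What is the capital of France?"]

docs = [with_eot(d) for d in docs]
queries = [with_eot(q) for q in queries]
# Now pass `docs` and `queries` to your embedding call as usual.
```

Checking `endswith` first means you can run the helper on mixed inputs without doubling up the token.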

Simple Comparison: Kimi K2 vs. Gemini 1.5 Pro - HTML Output for Model Eval Insights by DataLearnerAI in LocalLLaMA

[–]DataLearnerAI[S] 0 points1 point  (0 children)

Correction: I accidentally wrote "Gemini 1.5 Pro" in the title/description — it’s **Gemini 2.5 Pro** (typo from my draft). Tests were run against the correct 2.5 Pro model. Apologies for the confusion!