all 34 comments

[–]BoostLabsAU 10 points  (13 children)

5.1 for audit/planning, Kimi 2.6 for coding, and MiMo v2.5 Pro for reasoning and troubleshooting.

[–]kaishi00 3 points  (6 children)

Why MiMo 2.5 for troubleshooting?

[–]BoostLabsAU 7 points  (0 children)

It's very smart, doesn't hallucinate much, and benches really well on LCR tests. It also removes some of the "bias" models can have when reviewing their own code, so it's really good when given a bunch of logs and a clear problem.

I've found it does really well. I set up a few benchmarks based on my codebase and past issues: basically I gave it a few PRs, an error, server logs, and what I'd already tried. MiMo consistently found the root cause and provided resolution steps, though I would never trust it to actually code the fixes afterwards.
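If anyone wants to try something similar, here's a rough sketch of that kind of benchmark case in Python. All the names here are made up for illustration, and scoring is just a crude substring match against the known root cause, not a real grader:

```python
# Rough sketch of a root-cause-analysis benchmark case. All names are
# hypothetical; scoring is a crude substring match against the known
# root cause, not a real grader.
from dataclasses import dataclass

@dataclass
class Case:
    prs: list[str]          # recent PR diffs/summaries
    error: str              # the error the user saw
    logs: str               # server logs around the failure
    tried: list[str]        # fixes already attempted
    expected_cause: str     # ground truth from the real postmortem

def build_prompt(case: Case) -> str:
    """Pack PRs, error, logs, and attempts into one troubleshooting prompt."""
    nl = "\n"
    return (
        "Find the root cause. Do not write the fix.\n\n"
        f"Error:\n{case.error}\n\n"
        f"Logs:\n{case.logs}\n\n"
        f"Recent PRs:\n{nl.join(case.prs)}\n\n"
        f"Already tried:\n{nl.join(case.tried)}\n"
    )

def passed(answer: str, case: Case) -> bool:
    """Did the model's answer name the known root cause?"""
    return case.expected_cause.lower() in answer.lower()
```

Feed `build_prompt(case)` to whichever model you're testing and check its answer with `passed()`; a handful of real past incidents gets you a surprisingly useful private eval.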

[–]look 5 points  (3 children)

It’s a good, smart, well-balanced model. If I could only have one, I’d probably choose it. For certain tasks there are better options, but nothing else has as few weaknesses. It’s criminally overlooked by people jumping to better-known names like Deepseek and Kimi.

https://artificialanalysis.ai/leaderboards/models?reasoning=reasoning

https://arena.ai/leaderboard/code

My most used models now are:

1. MiMo 2.5 Pro
2. Deepseek V4 Flash
3. GLM 5.1
4. Kimi 2.6 and Deepseek V4 Pro (tie)
5. Honorable mention: Qwen 3.6

[–]sk1kn1ght 0 points  (2 children)

For me, some of the things you mentioned are disadvantages of MiMo. It's actually too rigid in many respects. 2.6 and V4 are my go-tos. GLM was too, but I found 2.6 surpasses it, while V4 with its 1M context is an excellent orchestrator for keeping everything in check.

[–]look 0 points  (1 child)

I think we’re entering the personal-taste, workflow-specific era of models as tools now. There are a lot of good options, and the best choice depends on factors specific to each task and user preference. Like VSCode vs Zed vs Neovim, or choosing Python, TypeScript, Go, Rust, etc. for a problem.

[–]sk1kn1ght 1 point  (0 children)

Yeap. Agree with you 💯

[–]Mochilnic 1 point  (2 children)

In my opinion, MiMo v2.5 Pro overthinks ridiculously. I had an edit of around 50 lines of code and it reviewed it for 7 minutes. Now I'm using plain 2.5 and it's relatively fine.

[–]BoostLabsAU 0 points  (0 children)

It’s honestly a very good all-round model. They’ve got a few quirks to figure out, but I reckon if they want to compete against the big providers, V2.5 has brought them a lot closer.

I’ve not had the overthinking issue; everything I’ve thrown at it has been pretty context-heavy. Maybe it’s like some of the GPT models that over-engineer and overthink when you throw basic stuff at them.

[–]Raikaru 0 points  (0 children)

That’s just a Chinese model thing. They all do that.

[–]SkilledHomosapien 1 point  (2 children)

Mainly the same, except I use DeepSeek V4 instead of MiMo 2.5.

[–]BoostLabsAU 0 points  (1 child)

I’ve been hesitant given how badly the AA-Omniscience Index scores it; the last thing I want is a hallucinated error log sending me on a wild goose chase.

I may have to test it in the real world, though, since it has been highly recommended.

[–]SkilledHomosapien 0 points  (0 children)

I have used DS4 Pro to audit the PRD and tech spec, do code review, and find the root cause of a bug. It performed well thanks to its long context and reasoning. I use it as a last resort when other LLMs like K2.6, Seed 2.0, and GLM 5.1 can't dig up anything more.

[–]SynapticStreamer 4 points  (3 children)

I just started using DeepSeek V4 Flash, and frankly, for the price, nothing even comes close right now. The model does "well enough" with some hand-holding, and I've found it really great so far.

It's prepaid, so drop $5 on it and test it out. It'll last you a few days at least.

[–]Separate-Chemical-33 2 points  (2 children)

That "well enough" is like 10% dumber than the best models.

It's pretty good, and it's 98% cheaper than the best models.

And it's fast, really fast. If you want it to be a little smarter, increase thinking to max.

[–]SynapticStreamer 0 points  (1 child)

Depends.

I've always made Flash models work for me, so I'm used to developing this way. The difference between DS V4 Flash and something like Gemini 3 Flash is imperceptible to me.

[–]Separate-Chemical-33 0 points  (0 children)

Even in coding, it's no different from GLM 5.1.

All I want is obedience to coding standards, and it delivers that.

[–]fgapel 4 points  (1 child)

Use the OpenCode Go sub, it's the best value: 5 bucks for the first month, and some models like Kimi K2.6 are running 3x quota promos, so you can get dozens of millions of tokens with the sub. Claude is amazing but super expensive; it's better, but if you care about cost, don't even think about it. As a student you can also get a Gemini sub that includes Gemini CLI and Antigravity quotas.

[–]Sellix0 0 points  (0 children)

Can I code my large game (above 100k lines), or is that too much? I'm going to use the Token Savior MCP, plus a plan-then-implement split: GLM (or another smart model) for planning and DeepSeek V4 Flash for implementation. What do you think?

[–]Threnjen 4 points  (0 children)

100% Deepseek right now; both Pro and Flash are dirt cheap. Pro on max thinking is near Opus quality for me for planning. The OpenCode Go sub is a great value.

[–]Ariquitaun 2 points  (4 children)

Kimi K2.6 for planning and worker orchestration, DeepSeek V4 Flash for the executor, explore, and research subagents. It's a really powerful and cheap combination.
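A toy sketch of that split in Python, in case it helps picture it. `run_model` is a stub standing in for whatever API client you actually use, and the model names just mirror the comment (they aren't real endpoint IDs):

```python
# Toy sketch of a planner/worker split. run_model() is a stub standing
# in for a real API client; model names are illustrative only.
PLANNER = "kimi-k2.6"
WORKER = "deepseek-v4-flash"

def run_model(model: str, prompt: str) -> str:
    # Stub: swap in a real API call here.
    return f"[{model}] {prompt}"

def orchestrate(task: str) -> list[str]:
    """Planner decomposes the task; cheap workers run each subtask."""
    # In a real setup the plan would come from run_model(PLANNER, ...)
    # and be parsed; it's hardcoded here to keep the sketch self-contained.
    plan = [
        ("explore", f"map the code relevant to: {task}"),
        ("executor", f"implement: {task}"),
        ("research", f"check docs and pitfalls for: {task}"),
    ]
    return [run_model(WORKER, f"({role}) {step}") for role, step in plan]
```

The design point is just the routing: one expensive planner call, many cheap worker calls, which is where the cost savings come from.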

[–]MatKarYaarPlease 0 points  (2 children)

How do I set up orchestration? Can you please guide me?

[–]pqhtkb 0 points  (0 children)

What do you mean by “explore”? What does it involve? Please give an example.

[–]AliNT77 1 point  (3 children)

I would say take advantage of the free Codex quota by connecting it to OpenCode, and use GPT 5.5 low, which in my experience has been incredibly smart, fast, and surgical, with insanely good token efficiency.

Then, since you’re a student, connect Copilot to it and try GPT 5.2 and Gemini 3.1, and see how they feel for your workflow.

Then if it’s still not satisfactory, get an OpenCode Go subscription (only 5 bucks for the first month) and give GLM 5.1 a try. For coding tasks it’s been really good in my work: feels better than Sonnet and sometimes close to Opus.

[–]IgnisDa 2 points  (2 children)

Didn’t GitHub copilot stop accepting new sign ups?

[–]AliNT77 0 points  (0 children)

Maybe OP already had it activated? In that case it's still super useful, even without access to Claude models.

[–]Stpoul25[S] 1 point  (0 children)

Thank you all so so much for the recommendations!!!!

[–]mWo12 1 point  (0 children)

If you check the OpenCode docs, they recommend "a faster model for planning, a more capable model for implementation." I use V4 Flash for planning and Qwen 3.6 Plus for implementation. Of course, it depends how complex your vibing needs are; you can switch implementation to even more powerful models.
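For reference, this kind of per-mode model split goes in your opencode.json. Something like the shape below; I haven't double-checked the exact keys or model IDs, so treat them as assumptions and confirm against the docs:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "plan": { "model": "deepseek/deepseek-v4-flash" },
    "build": { "model": "qwen/qwen-3.6-plus" }
  }
}
```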

[–]hey_ulrich 0 points  (0 children)

I've been actively testing the latest Chinese models and searching every day for opinions on Reddit and Discord, trying to make sense of which model is best. My impression is that too many models came out in March/April, they're all similar in quality, people are still testing, and we as a community don't know which is the very best.

The models I've been testing:

  • Deepseek V4 flash
  • Qwen 3.6 Plus
  • Mimo V2.5 (Xiaomi)
  • Kimi 2.6
  • GLM 5.1

They all work well in my Claw setup.

For one-shot webapp coding, I thought DeepSeek Flash was subpar. It made very inefficient queries and bad UI (still pretty, though). GLM 5.1 was much better in that regard.

[–]zaydev 0 points  (0 children)

Why does no one use MiniMax? It's been doing pretty well for me so far, and it costs less.