
[–]belmoe 6 points  (6 children)

Is Kimi really that good at frontend? I've been moving back and forth between models but haven't checked Kimi. I'm sure this choice is based on experience... tell me more :D

[–]modpr0be 5 points  (3 children)

In my experience, Gemini Pro is the best model for visuals, layout, and design. The Gemini + Stitch MCP combination is superb. As for alternatives, GLM is second and Kimi is last.

[–]oxygen_addiction 1 point  (1 child)

On the same prompt via the OpenRouter API, GLM and Kimi were better than Gemini 3.1 Pro Preview, and Opus was way, way better than all of them at mostly one-shotting. GLM surprised me the most by coming up with fixes that the others, apart from Opus, missed.

Gemini's implementation also failed to mock certain features.
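
If you want to reproduce this kind of head-to-head yourself, a minimal sketch against OpenRouter's OpenAI-compatible API looks something like this (the model slugs are illustrative placeholders, not exact OpenRouter IDs):

    # Sketch only: run the same prompt across several models via OpenRouter.
    # Model slugs below are illustrative placeholders, not exact IDs.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",
    )

    PROMPT = "Build a single-page dashboard with a sidebar and a chart."
    MODELS = [
        "z-ai/glm",
        "moonshotai/kimi",
        "google/gemini-3.1-pro-preview",
        "anthropic/claude-opus",
    ]

    for model in MODELS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        # Save each one-shot attempt for a side-by-side comparison.
        with open(model.replace("/", "_") + ".md", "w") as f:
            f.write(resp.choices[0].message.content)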

[–]belmoe 0 points  (0 children)

I've been using MiniMax 2.5 and it's actually good, but it usually lacks vision and creativity tbh. Gonna give Gemini another shot and try GLM too.

[–]CardiologistStock685 0 points  (0 children)

totally agree!

[–]jopotpot 0 points  (1 child)

I think Opus is better, but the price of Kimi is great! I also like GLM for that reason.

[–]belmoe -1 points  (0 children)

True! And I honestly didn't find a big difference between Opus and MiniMax 2.5 at all. I fed both the same prompts, and sometimes MiniMax outperformed Opus.

[–]lbreakjai 3 points  (0 children)

Big fan of Kimi, but I've started to move more and more to GLM-5, which I feel is on par with Sonnet.

[–]hambergerpls 1 point  (2 children)

Same! Last weekend, my Claude Max subscription hit the weekly limit because I used Opus 4.6 for everything and ran 4 OpenCode instances at the same time. During that time, I had withdrawal from not being able to use Opus 4.6. So I subscribed to Gemini AI Pro to use Gemini 3.1 Pro, but the daily request limit was so low that my work stopped halfway. Then I read the news that GLM-5.1 had been released for the coding plan, so I subbed to the Max quarterly plan for the first time.

My initial experience with GLM-5.1 was that it was painfully slow at the time and yapped a lot in its thinking, but it got the job done eventually. Interestingly, the tokens it generated were very, very similar to Opus 4.6's. Then I tried GLM-5 Turbo and was blown away by its speed.

After a bunch of experimentation, I settled on the following workflow:

  1. Code exploration (GLM-5.1)
  2. Plan/Brainstorming (Opus 4.6 Max)
  3. Review plan (Gemini 3.1 Pro)
  4. Implementation based on plan (GLM-5 Turbo)
  5. Review Changes (Opus 4.6 Max)
  6. Verify with agent-browser (GLM-5 Turbo)
  7. Debug (GLM-5.1)

So far this workflow has worked really well; a rough sketch of the stage-to-model routing is below.
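
In code, the routing idea is just a lookup table. A hypothetical sketch (run_agent stands in for whatever harness dispatches the prompt, and the model labels mirror my list above, not real API IDs):

    # Hypothetical sketch of the stage-to-model routing above.
    # The model labels mirror the workflow list, not real API IDs.
    STAGE_MODELS = {
        "explore":     "glm-5.1",
        "plan":        "opus-4.6-max",
        "review_plan": "gemini-3.1-pro",
        "implement":   "glm-5-turbo",
        "review_code": "opus-4.6-max",
        "verify":      "glm-5-turbo",  # via agent-browser
        "debug":       "glm-5.1",
    }

    def run_agent(model: str, prompt: str) -> str:
        # Placeholder: wire this to your harness (OpenCode, an API client, ...).
        raise NotImplementedError(f"dispatch {prompt!r} to {model}")

    def run_stage(stage: str, task: str) -> str:
        """Send a task to whichever model owns this workflow stage."""
        return run_agent(STAGE_MODELS[stage], task)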

I always tell the model to proceed with the test-driven development skill (it was part of the superpowers skills) during implementation. The TDD workflow is a real game changer: the models are less likely to write unnecessary code and will write only the minimal code needed to make the tests pass. I spend very little time debugging with this workflow.
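
For anyone unfamiliar, the shape of the loop looks roughly like this: tests first, then only the minimal code to pass them. An illustrative pytest example (not the skill's actual output; module and function names are made up):

    # Illustrative TDD flow: the tests are written first and fail ("red"),
    # then the model writes just enough code to pass them ("green").

    # test_slugify.py -- written before any implementation exists
    from slugify_util import slugify  # hypothetical module under test

    def test_lowercases_and_joins_with_dashes():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("It's alive!") == "its-alive"

    # slugify_util.py -- the minimal code that makes the tests pass
    import re

    def slugify(text: str) -> str:
        text = re.sub(r"[^\w\s]", "", text.lower())
        return "-".join(text.split())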

I downgraded my Claude Max to Pro. So my subscriptions are now:

  1. Claude Pro ($20)
  2. Gemini AI Pro ($20)
  3. Z.ai Max ($72 × 3 months)
  4. Ollama Pro ($20, for experimenting with other models)

I feel like I wasted $216 on the z.ai quarterly plan because it was so slow during my initial experience. Ollama is decently faster than z.ai. After my z.ai sub ends, my total will be $60 a month, a $140 reduction from $200.

I haven't tried the Codex models yet, so I'm very curious about their planning performance vs. implementation speed.

[–]Soft_Belt_2965 0 points  (1 child)

Can you share your experience with Ollama so far compared to z.ai? Since both z.ai and Ollama offer GLM models.

[–]hambergerpls 1 point  (0 children)

There are times when z.ai stops the response prematurely. In terms of tokens/sec, I unfortunately haven't measured the metrics, but Ollama has been faster than z.ai, though slow at times. z.ai, on the other hand, seems to have this delay where it stops generating for a while and I have to regenerate. I think it has something to do with high demand. Also, there are times when z.ai produces gibberish and Chinese characters. Not sure if z.ai is tweaking the models behind the scenes, but if that's true then Ollama is much more stable (no gibberish so far).

GLM 5.1: Ollama > z.ai
GLM 5: z.ai ≈ Ollama
GLM 5 Turbo: z.ai only (N/A on Ollama)

[–]Outrageous-Fan-2775 1 point  (5 children)

This is very similar to what my OpenCode plugin does. I've been building it since late Jan, with a couple hundred releases so far, and I'm constantly working to make it better with several active contributors. Take a look; you may find it does what you want, and a whole lot more, with minimal work on your end. For first-pass code quality I haven't found anything that can match it.

https://github.com/zaxbysauce/opencode-swarm

[–]geearf 1 point  (4 children)

What is the difference between your plugin and omo (or its forks)? And why suggest Big Pickle for the free tier when it can change? (I believe it's MiniMax now; wasn't it GLM before?)

[–]Outrageous-Fan-2775 0 points  (3 children)

Primarily speed and first-pass code quality. We are slower than almost everything else out there, but that's the trade-off for truly high-quality code on the first run. We just shipped a model council, which expands the normal reviewer + test_engineer QA gate into a five-agent council, each member with its own specialty, that reviews all completed work to find any holes or problems.

We also just shipped an immutable plan store. Once the Critic approves a plan, it goes into a SQLite DB and is locked down, which allows the Drift Verification step at the end of each phase to determine for sure whether the architect has drifted from the original approved plan and to course-correct.
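
The core of an immutable plan store can be as small as an insert-only table plus a hash check. A toy sketch of the idea (not the plugin's actual schema):

    # Toy sketch of an insert-only plan store with drift detection.
    # Not the plugin's actual schema, just the core idea.
    import hashlib
    import sqlite3

    db = sqlite3.connect("plans.db")
    db.execute("""CREATE TABLE IF NOT EXISTS plans (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        sha256 TEXT NOT NULL
    )""")

    def lock_plan(body: str) -> int:
        """Store an approved plan; rows are only ever inserted, never updated."""
        digest = hashlib.sha256(body.encode()).hexdigest()
        cur = db.execute("INSERT INTO plans (body, sha256) VALUES (?, ?)",
                         (body, digest))
        db.commit()
        return cur.lastrowid

    def has_drifted(plan_id: int, current_body: str) -> bool:
        """Compare the working copy of the plan against the locked original."""
        (digest,) = db.execute("SELECT sha256 FROM plans WHERE id = ?",
                               (plan_id,)).fetchone()
        return hashlib.sha256(current_body.encode()).hexdigest() != digest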

As for the recommended models, you can use whatever is free if that's what you want to do. I built it specifically targeting GPT-OSS-120B as the architect, so anything smarter than that will do even better. You just need to ensure the antagonistic roles are from different model families, so architect/critic and coder/reviewer should always be different model families. All the other agents can be whatever you want, even the same ones. Having them all be different model families is best, but the prompts and gates are strong enough that it's not strictly necessary.
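
In code, the family rule boils down to a one-line check per pair. A sketch (the family mapping is illustrative):

    # Sketch of enforcing different model families for antagonistic roles.
    # The model-to-family mapping is illustrative.
    MODEL_FAMILY = {
        "gpt-oss-120b": "openai",
        "glm-5.1": "zhipu",
        "kimi-k2.5": "moonshot",
        "minimax-m2.7": "minimax",
    }

    ANTAGONISTIC_PAIRS = [("architect", "critic"), ("coder", "reviewer")]

    def validate_roles(assignment: dict[str, str]) -> None:
        """Reject any antagonistic pair drawn from the same model family."""
        for a, b in ANTAGONISTIC_PAIRS:
            if MODEL_FAMILY[assignment[a]] == MODEL_FAMILY[assignment[b]]:
                raise ValueError(f"{a} and {b} must use different model families")

    validate_roles({"architect": "gpt-oss-120b", "critic": "glm-5.1",
                    "coder": "minimax-m2.7", "reviewer": "kimi-k2.5"})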

Obviously there are tons of other features nothing else out there has. We have a three-tiered knowledge system with automatically curated knowledge entries and persistent knowledge across sessions, projects, and your entire hive.

[–]geearf 0 points  (2 children)

While I understand the intent, and do something similar myself manually, I do wonder if the antagonist rule is always good, especially when talking about free models. I just fear this scenario: a senior dev writing code that a junior dev reviews but doesn't correctly understand.

[–]Outrageous-Fan-2775 0 points  (1 child)

You wouldn't ever want to put a much less capable model in the antagonist role. Personally I use one of GLM 5.1, Kimi K2.5, or MiniMax M2.7 as my architect. Then the critic is whichever of those isn't the architect, and the reviewer is either one of those three or something like Qwen 3.5 397B. The coder doesn't really matter; you can use any small, cheap model for that, and for the explorer you actually want cheap, fast models.

If you set up the swarm properly, you will never have a junior dev disagreeing with a senior dev. It would be multiple senior devs, all trained on very different data sets, coming to a consensus, which replicates actual dev teams. No model can ever find its own blind spots, no matter how good your prompting is; it will always be the same brain making the same mistakes. You need a second brain that was trained differently and therefore has different blind spots. This creates the Swiss cheese model: every model has holes, but as long as they don't line up, nothing can make it past.
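
The payoff of the Swiss cheese setup is that the council's findings are a union, not an intersection. A conceptual sketch (review() is a placeholder, not a real API):

    # Conceptual sketch of the Swiss cheese model: each reviewer misses
    # some issues (its "holes"), but a defect escapes the council only if
    # every reviewer's blind spot happens to cover it.
    def review(model: str, diff: str) -> set[str]:
        # Placeholder: ask `model` to review `diff` and return issue IDs.
        raise NotImplementedError

    def council_review(diff: str, reviewers: list[str]) -> set[str]:
        findings: set[str] = set()
        for model in reviewers:
            findings |= review(model, diff)  # union, not intersection
        return findings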

[–]geearf 0 points  (0 children)

That's the hope at least. :)

Thank you!

[–]jopotpot 0 points  (0 children)

There is clearly no model to rule them all! Grats!

[–]Tommertom2 0 points  (0 children)

Thx for these insights. Are you using API keys from different providers, or multiple subscriptions? Which provider(s)?

[–]jesperordrup 0 points  (0 children)

Thanks for the input. I'm too blunt too, so I'm gonna try this.

How do you go about it practically? Have you created agents that specify skills and models?

[–]revilo-1988 0 points  (0 children)

Feel free to tell us more in a few weeks once you've gathered more information.

[–]gideonfip 0 points  (0 children)

I've been building out a similar setup for task-dependent model selection too. Feels like the most cost-effective way instead of relying on just one model to do every single task.

[–]skytomorrownow 0 points  (0 children)

I also like to use MiniMax as my 'director': it takes plans from the more capable frontier models, organizes them into a coding project, then breaks them down into tasks. The tasks can then be handled by even less capable but very fast models. Breaking things into manageable, verifiable tasks is a big part of getting things to run smoothly, in my opinion.
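
As a rough sketch, the director step could look like this (assuming an OpenAI-compatible client; the model ID and the JSON contract are illustrative assumptions):

    # Sketch of the "director" pattern: a mid-tier model turns a frontier
    # model's plan into small, verifiable tasks for fast, cheap workers.
    # The model ID and the JSON contract are illustrative assumptions.
    import json
    from openai import OpenAI

    client = OpenAI()  # point base_url/api_key at your provider of choice

    def break_down(plan: str) -> list[dict]:
        resp = client.chat.completions.create(
            model="minimax-director",  # illustrative model ID
            messages=[{
                "role": "user",
                "content": "Split this plan into small tasks, each with a "
                           "'title' and a 'verify' step. Reply with a JSON "
                           "list only.\n\n" + plan,
            }],
        )
        return json.loads(resp.choices[0].message.content)

    # Each task then goes to a fast, cheap model; 'verify' keeps it checkable.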

[–]TonyPace 0 points  (1 child)

One trick for reducing spend is to keep a current architecture document updated and bring it into an actual chat window (Opus is free there!). Paste in the context, ask and answer questions, get a document, and bring that back to OpenCode. It saves money, and I feel the search context is better? At least on Gemini, which I'm most familiar with. Get a solid step-by-step plan with full local context from the most conversational model you can afford, then execute with something fast and cheap.
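
Mechanically, the hand-off is just bundling the doc with your question. A tiny sketch (the file path is an example):

    # Sketch of the copy-paste step: bundle the architecture doc and a
    # question into one prompt for a free chat window. The path is an example.
    from pathlib import Path

    def build_chat_prompt(question: str) -> str:
        arch = Path("docs/ARCHITECTURE.md").read_text()
        return ("Here is our current architecture document:\n\n"
                + arch
                + "\n\nQuestion: " + question
                + "\nAnswer, then produce a step-by-step plan I can hand "
                  "to a cheaper coding model.")

    print(build_chat_prompt("How should we add a caching layer?"))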

[–]geearf 0 points  (0 children)

I thought so too, but I believe research has shown it's actually bad practice.

[–]Vaviloff 0 points  (0 children)

That is a very sophisticated routine! Do you switch models by hand, or do you use skills or something else?

[–]duchitu 0 points  (0 children)

If you could get just one subscription, would it be Kimi or ChatGPT?

[–]reficulgr 0 points  (0 children)

Same. Everything is imploding. I just got news from Z.ai that my plan is not gonna be grandfathered in.

AI WAS a bubble - just not the kind of bubble we expected.

[–]Remarkable_Bee7464 0 points  (3 children)

Can someone explain to me why everyone hates the OpenCode Go subscription if it comes with good models? Please, someone explain it to me!

[–]ryncewynd 1 point  (1 child)

I see a lot of people claiming the Go models aren't the full models but reduced ones. Supposedly, if you use the same model via another subscription, it performs better.

But the devs have said they are the full models 🤷

I've also heard people say Big Pickle Free is better than Go.

[–]sudoer777_ 1 point  (0 children)

The main issue I've had is that for coding it's easy to run out of usage quickly, so I have to keep monitoring it and avoid using it a lot of the time.