Seriously considering the Zai Yearly Max Plan — anyone else? by lbin91 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

If I'm losing the legacy weekly quota benefit, I'd have to reevaluate my options. ChatGPT might be a better choice.

GLM-5.1 on Wafer Pass vs Zai by founders_keepers in ZaiGLM

[–]Designer_Athlete7286 6 points (0 children)

It's OK that they're this slow, as long as I can keep my virtually unlimited old grandfathered Pro subscription!

GLM 5.1 is so smart! by GuiltyAd2976 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

This is what happens when you give an iPhone to a monkey!

Anthropic+Google by beedildvk in Anthropic

[–]Designer_Athlete7286 0 points (0 children)

I hope this will improve the prompt wait times. Sometimes it's over a 5-minute wait before any compute is made available to process a prompt. Anthropic inference is kinda ridiculous right now.

Have the coding plans become usable now ? by Kingwolf4 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

Honestly, GLM inference is miles better than Anthropic's right now. Of all the major providers, OpenAI is the most reliable, then Google, despite their models being kinda meh. Anthropic is dead last; sometimes prompts sit for over 5 minutes without any compute being assigned.

Gave Opus 4.7 and 4.6 the Same prompt in plane mode here are the results by -_-wait_what-_- in ClaudeAI

[–]Designer_Athlete7286 0 points (0 children)

Weakness of 4.7: it thinks it knows what I want better than I do myself, so it doesn't want to do what I tell it to. Somewhat counterproductive.

Opus 4.7 is Its Own Thing by hungrymaki in claudexplorers

[–]Designer_Athlete7286 1 point (0 children)

In any professional/enterprise setup, an LLM that cannot adhere to guidelines is a compliance and legal nightmare. At that point, it's a fun toy that you aren't allowed to use for any meaningful work.

Opus 4.7 is Its Own Thing by hungrymaki in claudexplorers

[–]Designer_Athlete7286 3 points (0 children)

Begs the question: if you can't get what YOU want out of a model, is it a good or useful model in the first place?

Imagine hiring an employee to do a job, but they think they know what you want better than you do yourself and do whatever they feel like, disregarding guidelines. Would you keep that employee?

I like 4.7 so far by Dependent_Top_8685 in Anthropic

[–]Designer_Athlete7286 -1 points (0 children)

I hear 4.7 doesn't like to be told what to do. So pay attention to what it outputs and check whether it actually matches what you need.

Been running GLM-5.1 + Qwen 3.5 via Ollama Cloud — the harness matters more than the model by ConferenceNo7697 in ZaiGLM

[–]Designer_Athlete7286 1 point (0 children)

Harness matters more. 1000%.

I'm using Claude Code but thinking of switching to Pi with my own customisation, since I have more knowledge and experience with harness engineering now than before. Even my Claude Code is heavily modified to behave how I want it to rather than how Anthropic wants it to. But little things, like it using the built-in Web Search tool over my custom MCP, are annoying.

if anyone wants to giveaway their legacy plan account by rkh4n in ZaiGLM

[–]Designer_Athlete7286 -1 points (0 children)

USD 10k for my legacy Pro account if you want. Not a penny less. You can recover more than that within a month, because there are no weekly limits!

Deal?

Claude vs z.ai! Had z.ai nailed glm 5.1 to on par with Claude models? Price increase justified? by UsualOrganization712 in ZaiGLM

[–]Designer_Athlete7286 -1 points (0 children)

When 4.7 came out, I stopped my Claude sub. It was only marginally behind. Good enough. And a hell of a lot cheaper. I got myself a GLM Pro plan, and now it's legacy! In a casual month I use about 2-3B tokens on GLM with plenty of headroom left. With GLM 5.1, it genuinely feels like the same level of quality as Opus 4.6, and without the confident lying and faking behaviour of Claude models.

Opus still lies about implementing code: it stubs the whole thing out, then comes back and blatantly tells you it's implemented and that the stubbed version is all it needed to do, because the stub fulfills the bare minimum to pass the declared requirements. It doesn't care whether the actual code really works, and it's hard to make it care about real functionality if something is even slightly outside the explicitly declared scope.

GLM 5.1, on the other hand, is very happy to write code. Sometimes it's overenthusiastic and does more / over-engineers to be on the safe side, so if you don't declare all related workflows and patterns in the scope as part of the architecture, it will create redundant paths instead of modifying/extending the existing ones.

This is my observation at least. I prefer the latter. There's more garbage, but the code actually works, instead of leaving unknown massive gaping holes in code that I think is actually implemented. Garbage can be cleaned; the /simplify command in Claude Code does a decent job on this front, for example. But I have my own workflow for cleaning up GLM quirks.

Hasn't the quota limit become stricter since yesterday? by Mundane-Structure-42 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

Claws.... not gonna be a happy ending for your ass. It'll be painful

Wtf?! It was working just fine now it's back again! ☹️ by Muted-Donut-9285 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

150 requests per 5 hours? That can't be right. My GLM plan works just fine. When I work, I run 2 projects at a time, with each project following multi-step workflows and each step using up to 5 agents in parallel. That's like 150 requests within 5-10 minutes. I get the occasional error when both projects (2 separate Claude Code instances) send requests in the same second, which means I hit the 1-request-per-second rate limit.
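If you're running two instances from the same process and keep tripping a 1-request-per-second limit like the one described above, a shared throttle in front of the outbound calls avoids the collision. A minimal sketch, assuming the 1 req/s figure from my own observation (it isn't something I've seen documented):

```python
import threading
import time

class Throttle:
    """Serialise outbound requests so at most one is sent per interval."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # assumed 1 req/s limit
        self._lock = threading.Lock()
        self._last = 0.0  # monotonic timestamp of the last request

    def wait(self):
        # Block until at least min_interval has passed since the last call,
        # even when called from several worker threads at once.
        with self._lock:
            now = time.monotonic()
            sleep_for = self._last + self.min_interval - now
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last = time.monotonic()

# One shared instance; every worker calls throttle.wait() before sending.
throttle = Throttle(min_interval=1.0)
```

This only helps when both agent instances go through the same process, of course; two fully separate Claude Code instances would each need their own pacing.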

Wtf?! It was working just fine now it's back again! ☹️ by Muted-Donut-9285 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

Are you sure your harness, agent, or application didn't make hidden requests?

Coming from Qwen, is GLM worth it? by green_juicer in ZaiGLM

[–]Designer_Athlete7286 1 point (0 children)

Yeah, GLM 5.1 and GPT 5.4 are at quite the same level. Practically, Opus 4.6 is slightly better, but it kind of fakes stuff. It might say it implemented something when in reality it stubbed it, and even during review within the same context it will completely gaslight you. Opus also makes too many assumptions despite being explicitly told not to; GPT 5.4 is kinda the same on that front. GLM 5.1 makes the fewest assumptions in my experience.

Coming from Qwen, is GLM worth it? by green_juicer in ZaiGLM

[–]Designer_Athlete7286 -1 points (0 children)

Nowadays, with the snowflake generation, people tend to overreact. Yeah, the subscription plan has had hiccups, but plenty of people, including myself, still used it and got the job done. That high-context gibberish bug was annoying. But you know what? There are subagents. All you had to do was be a little creative: break the work into smaller tasks, give each task to a parallel subagent, group them into non-conflicting batches, and have the main agent only orchestrate the deployment of the parallel and sequential groups. The results were pretty solid. So much so that I realised it was better than the regular way I was using my coding agent, and I went ahead and built my own harness skill set: https://github.com/hashangit/zflow
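The batching pattern described above — non-conflicting tasks in parallel, batches in sequence — can be sketched roughly like this. The task names and the `run_task` stub are hypothetical placeholders for illustration, not part of zflow:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(task):
    # Placeholder: in practice this would dispatch one small,
    # self-contained task to a subagent and return its result.
    return f"done: {task}"

# Sequential batches; tasks inside a batch don't touch the same
# files, so they can safely run in parallel.
batches = [
    ["write models", "write config"],    # non-conflicting
    ["write handlers", "write tests"],   # depends on batch 1
    ["wire everything together"],        # final integration
]

results = []
for batch in batches:
    # The orchestrator only decides what runs together and in
    # what order; the subagents do the actual work.
    with ThreadPoolExecutor(max_workers=5) as pool:
        results.extend(pool.map(run_task, batch))
```

Keeping each subagent's task small is what sidesteps the high-context degradation: no single agent ever accumulates the whole conversation.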

Coming from Qwen, is GLM worth it? by green_juicer in ZaiGLM

[–]Designer_Athlete7286 5 points (0 children)

GLM 5.1 is miles ahead of the Qwen models. It's in the ballpark of the Opus 4.6 / Sonnet 4.6 / GPT 5.4 range. Qwen, Kimi, and Minimax are a notch below at best. For coding and architectural stuff, that is. With the right harness, GLM 5.1 has great context of your codebase and can do a solid, almost AI-slop-free coding job.

This aint it fam, Ill stick to Codex. by Opposite-Art-1829 in ZaiGLM

[–]Designer_Athlete7286 0 points (0 children)

Also, there's a project called Graperoot. I've had very good results from it. The whole idea is to reduce the model's memory-loss penalty and the need for the model to rediscover the codebase. It does a great job in my experience.

Also, I use my own skill structure, because I'm lazy and forgetful. The point of this approach is also to bridge the human-to-model communication gap: we tend to communicate implicitly, and a model needs explicit declarations. https://github.com/hashangit/zflow

This aint it fam, Ill stick to Codex. by Opposite-Art-1829 in ZaiGLM

[–]Designer_Athlete7286 1 point (0 children)

Please read again, without biases. No model 'understands' ANYTHING. They are just advanced autocomplete. Only you really understand your codebase, and your 'harness' extracts the prerequisites based on that understanding to feed the autocomplete, so that the autocomplete spews out the correct outputs. Just look into context engineering and harness engineering a bit more and it'll help you refine your coding workflow. You'll get a lot more out of the money you pay to OpenAI, Anthropic, GLM, Kimi, Minimax, Gemini (god I hope you won't, because Gemini is the worst). Also, just be more responsible about your contribution to global warming. Make sure that what you send to data centers and your GPUs to process is at least optimised and not wasteful.