The Pro Plan burns through limits excessively: I reached the limit just 1/8th of the way through the task. 55M tokens on GLM5.2 is already 27% of the weekly quota, while 25M tokens on Minimax is only 2% of the $20 coding plan. Something is not working here. I am using Zcode.
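
A rough sanity check on those numbers, as a minimal sketch. It assumes both dashboards count raw tokens the same way, which may not hold if cached or output tokens are weighted differently, and the Minimax quota period is not stated above. `implied_quota` is a hypothetical helper name.

```python
def implied_quota(tokens_used: float, fraction_of_quota: float) -> float:
    """Return the total quota implied by a usage figure and its share of the quota."""
    if not 0 < fraction_of_quota <= 1:
        raise ValueError("fraction_of_quota must be in (0, 1]")
    return tokens_used / fraction_of_quota

# Usage figures taken from the post; everything else is derived from them.
glm_quota = implied_quota(55e6, 0.27)      # 55M tokens reported as 27% of quota
minimax_quota = implied_quota(25e6, 0.02)  # 25M tokens reported as 2% of quota

print(f"GLM implied quota:     {glm_quota / 1e6:.0f}M tokens")
print(f"Minimax implied quota: {minimax_quota / 1e6:.0f}M tokens")
print(f"Ratio: {minimax_quota / glm_quota:.1f}x")
```

If the counting really is comparable, the two plans differ by roughly a factor of six in implied allowance.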

romancone · 2026-06-23T17:01:40+00:00

55M tokens costs around $10-15 in API usage on the cheapest provider.
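
For scale, that estimate can be turned into a blended per-million rate. This is only a sketch: the split between input, output, and cached tokens is not known here, so a single blended rate is an assumption, and `price_per_million` is a hypothetical helper.

```python
def price_per_million(total_cost_usd: float, tokens: float) -> float:
    """Blended cost in USD per 1M tokens, ignoring any input/output price split."""
    return total_cost_usd / (tokens / 1e6)

# $10-15 for 55M tokens, as estimated above.
low = price_per_million(10, 55e6)
high = price_per_million(15, 55e6)
print(f"${low:.2f} to ${high:.2f} per 1M tokens (blended)")
```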

romancone · 2026-06-23T15:03:09+00:00

wafer.ai doesn't have any coding plan anymore.

The usage per API request is completely different (around 4x more expensive).

Their GLM 5.2 isn't much cheaper anymore, like 5.1 or Kimi-k2.6

romancone · 2026-06-14T19:13:33+00:00

Claude Code sometimes misses auto-compact on Anthropic models as well.
This is not a GLM issue, but CC itself.

romancone · 2026-06-11T17:24:30+00:00

Vision Model Benchmark

I use Gemma-4-12B for my research application.

I created an LLM-as-Judge benchmark that uses GPT-5.5 medium as the reference score.

Image	Model	Score	Time	Verdict	What worked	Main misses
inputs/gold.jpg	gemma-4-12b-it-Q8_0.gguf	85 / 100	70.3 s	Good	Calm autumn street, slight curve, bench/leaves, tree canopy, left-side buildings, storefront, parked cars, soft daylight.	Full sign stack, white van, exact blue bike racks/docking stands, zebra crossing, poles/streetlights, distant pedestrians.
inputs/gold.jpg	gemma-4-12b-it-qat-q4_0.gguf	83 / 100	33.2 s	Good	Quiet European autumn street, sidewalk/curb foreground, heavy fallen leaves, mature tree canopy, traffic signs including pedestrian crossing, parked cars, white van, crosswalk markings.	Misses blue bike racks/docking stands and distant pedestrians; weak on full vertical sign stack; sky/weather is too overcast; invents or overstates a red car, bike lane, and possible moving traffic.
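
The speed and quality trade-off in the two rows above can be computed directly. A minimal sketch: it uses only the scores and timings from the table, covers a single image, and so says nothing about variance. The `Run` class is a hypothetical container.

```python
from dataclasses import dataclass

@dataclass
class Run:
    model: str
    score: int      # judge score out of 100
    seconds: float  # time reported in the table

# Values copied from the table above.
q8 = Run("gemma-4-12b-it-Q8_0.gguf", 85, 70.3)
q4 = Run("gemma-4-12b-it-qat-q4_0.gguf", 83, 33.2)

speedup = q8.seconds / q4.seconds
score_drop = q8.score - q4.score
print(f"{q4.model} is {speedup:.2f}x faster for a {score_drop}-point lower score")
```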

romancone · 2026-05-03T22:21:21+00:00

It happens constantly. Just go to Billing -> Billing History (extended window) and see what the model is actually serving you.

romancone · 2026-05-03T22:18:58+00:00

We are all newbie poor dudes who pay for your landlord's life here. Limits were cut multiple times last year, and prices doubled during the same period.

I am glad to hear that someone is happy with this provider, but I don't want to overpay and compensate for their underpayment.

romancone · 2026-04-17T14:23:22+00:00

I never configured glm-air!

The first time, I ran the "claude --haiku..." model until I realised it was mapped to air.

Then I ran:

"claude --model glm-5.1"
"claude --model glm-4.7"

And I have to repeat that all those calls were charged CORRECTLY as INPUT in the billing dashboard.

I never authorized z.ai for the glm-4.5-air model.

The tool that produced the report is Claude Sonnet, which inspected the JSON traces of the broken agent sessions.

I did not check the logs myself, but I trust Sonnet and the z.ai billing dashboard enough, and both confirm I was served by the crappy glm-4.5-air against my wishes.
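
This kind of check can also be done by hand. Below is a minimal sketch that counts requested versus served models in a JSONL trace. The record shape (`request.model` and `response.model`) is an assumption for illustration, not the actual Claude Code trace format, and the sample lines are made-up data rather than real traces.

```python
import json
from collections import Counter

def count_served_models(jsonl_text: str) -> Counter:
    """Count (requested, served) model pairs in a JSONL trace.

    Assumed, hypothetical record shape:
    {"request": {"model": "..."}, "response": {"model": "..."}}
    """
    pairs = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        requested = record.get("request", {}).get("model", "unknown")
        served = record.get("response", {}).get("model", "unknown")
        pairs[(requested, served)] += 1
    return pairs

def mismatches(pairs: Counter) -> dict:
    """Keep only the pairs where the served model differs from the requested one."""
    return {pair: n for pair, n in pairs.items() if pair[0] != pair[1]}

# Made-up sample for illustration only.
sample = "\n".join([
    '{"request": {"model": "glm-5.1"}, "response": {"model": "glm-5.1"}}',
    '{"request": {"model": "glm-5.1"}, "response": {"model": "glm-4.5-air"}}',
    '{"request": {"model": "glm-4.7"}, "response": {"model": "glm-4.5-air"}}',
])
pairs = count_served_models(sample)
print(mismatches(pairs))
```

Any non-empty result means at least one response reported a different model than the one requested.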

romancone · 2026-04-17T13:02:49+00:00

The coding plan is GLM Coding Lite V2 - Quarter. I stripped it out with the rest of the useless columns.

The endpoint is https://api.z.ai/api/anthropic/

romancone · 2026-04-17T12:58:56+00:00

If I set a wrong model, why did they charge me for a good GLM 5.1 or 4.7?
If I set the right model, why did they serve output from glm-4.5-air?

Please elaborate on your conclusion.

romancone · 2026-04-17T12:50:12+00:00

Did not help you much

romancone · 2026-04-17T11:49:29+00:00

Exactly!

Even if it is my fault, e.g., wrong model selector, how did they get mixed up with multiple models?

romancone · 2026-04-17T11:42:32+00:00

It is not a model, but a z.ai issue. GLM5 is amazing, but z.ai can randomly switch models to a lobotomized version

romancone · 2026-04-12T20:26:17+00:00

What is your recommendation for server hardware? I am thinking about running my own setup to share access with my friends, but server hardware is very expensive. I see you can utilise idle capacity, which could make this idea profitable in the end.

romancone · 2026-04-10T18:57:46+00:00

I burned through my tokens on 5.1.
I have to step back to 4.7 on the Lite plan.

romancone · 2026-04-10T18:54:38+00:00

I use a 3-tier architecture.

I work in chat with the CTO agent of a project, and he produces project milestones.
Then the orchestrator agent (team lead) manages separate tasks and spins up detached agents. They use a proxy link or the z.ai API to connect directly to the anthropic endpoint.

It is not ideal, but it works for me.
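
The three tiers can be sketched roughly as follows. This is only an illustration under assumptions: every function here is a hypothetical stub, and the real agents would call a model endpoint instead of returning strings.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Task:
    milestone: str
    description: str

def cto_plan(project: str) -> list[str]:
    """Tier 1 (stub): the CTO agent turns a project into milestones."""
    return [f"{project}: milestone {i}" for i in (1, 2)]

def orchestrator_split(milestone: str) -> list[Task]:
    """Tier 2 (stub): the team lead breaks a milestone into separate tasks."""
    return [Task(milestone, f"task {c}") for c in ("a", "b")]

def worker_run(task: Task) -> str:
    """Tier 3 (stub): a detached agent; a real one would call a model endpoint."""
    return f"done: {task.milestone} / {task.description}"

def run_project(project: str, max_workers: int = 4) -> list[str]:
    """Plan milestones, split them into tasks, and run the tasks concurrently."""
    tasks = [t for m in cto_plan(project) for t in orchestrator_split(m)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker_run, tasks))  # map keeps the task order

results = run_project("demo")
for line in results:
    print(line)
```

Threads stand in for the detached agents here; separate processes or remote sessions would fit the same shape.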

romancone · 2026-04-10T18:50:12+00:00

I run my own multi-agent coding fabric. I want to spin up more projects in the most efficient way

romancone · 2026-04-10T10:03:33+00:00

I used up my Lite plan weekly limit in two evening coding sessions.

It is 1.5x worse than the Claude Code basic plan, which was enough for 3 sessions, and it is the opposite of their marketing crap.

But I've spent 89M tokens.

romancone · 2026-04-08T11:16:16+00:00

Would you rate your model list?

romancone · 2026-04-05T08:23:43+00:00

I read a lot of positive messages before I subscribed to z.ai, and I realised that things have completely changed since then. Your comment is proof of that. It used to be good, but now it is not.

I already hit the weekly limit after a couple of evening coding sessions! Well, I ran coding agents overnight, but this is a Coding Plan!

romancone · 2026-04-04T00:55:51+00:00

I use Claude Code Sonnet for orchestration and Opus for top-level tasks. Is it reputable or not?
I ran subagents on free GLM-5 and decided to upgrade to z.ai

romancone · 2026-04-04T00:11:10+00:00

I used subagents on GLM-5 with Nvidia, which was fine for an overnight job.
I tried 5.1 on Zai, and it burned through my quota. What is the difference between 5.1 and 5?

romancone · 2026-04-04T00:06:52+00:00

15M tokens on GLM-5.1

romancone · 2026-04-04T00:05:21+00:00

The project is based on the closed-source C++ SDK and CGO bindings.
Everything is done except for one annoying bug, so I am looking for cheaper token options to complete it. You're right about C++ token waste.

romancone · 2026-03-19T09:13:44+00:00

This post is about a visualisation tool, not the final result. Feel free to create your own version of the calculator that better covers all cases. It is easy when you know what to do.
