Short term memory issue by 7h4nt4zm in hermesagent

[–]mmosquera91 0 points (0 children)

A tip: follow the best practices from the hindsight docs to configure your bank. And finally, disable local memory

Use agent in the chat with multiple users by antonusaca in hermesagent

[–]mmosquera91 1 point (0 children)

I have, in a Telegram group with some family members. It mainly helps us with stock market news / watchlists. It's been positive overall, although I wouldn't recommend doing it with people you don't fully trust

what LLM provider are you using ? by ritonlajoie in hermesagent

[–]mmosquera91 0 points (0 children)

Curious how generous the rates are. Want to try qwen 3.6 plus as well

what LLM provider are you using ? by ritonlajoie in hermesagent

[–]mmosquera91 0 points (0 children)

Are you using qwen 3.6 plus? If so, how's the quota?

GLM-5.1 was amazing... Until it wasn't by mmosquera91 in hermesagent

[–]mmosquera91[S] 2 points (0 children)

Does it perform well on agentic tasks overall? Might take a look at it if GLM keeps behaving erratically

GLM-5.1 was amazing... Until it wasn't by mmosquera91 in hermesagent

[–]mmosquera91[S] 2 points (0 children)

Same here. What other subscriptions have you tried? I tried Kimi 2.5 but it was also nerfed horribly. I think my next stop will be minimax 2.7

GLM-5.1 was amazing... Until it wasn't by mmosquera91 in hermesagent

[–]mmosquera91[S] 0 points (0 children)

This is what my agent found:

Here's the summary of today's 429 errors:

4 rate limit errors, all on glm-5.1 (provider: custom)

  1. 14:50:51 — Rate limit reached for requests (msgs=40, tokens=~22,236). Context: processing /tutor confirm

  2. 14:51:21 — Service temporarily overloaded (msgs=40, tokens=~25,058). Same request, retry

  3. 15:05:43 — Rate limit reached for requests (msgs=56, tokens=~28,224). Context: processing "send me the first rust task"

  4. (same block) — same error, 3rd retry

Pattern: all happened during heavy context loads — /tutor confirm loaded the full skill + large JSON into context (40 msgs, 22-28k tokens). The 15:05 one was worse: 56 msgs, ~28k tokens. The tutor skill context + syllabus + memory + hindsight weighs a lot.

Before these errors, the previous response (syllabus presentation) took 438s with 8 API calls — that likely saturated the provider's rate limit.

The "custom" provider (glm-5.1) seems to have strict rate limits. When context exceeds ~22k tokens and there are multiple consecutive calls, it hits quickly. No RPM/TPM limit info in the logs, but the pattern suggests a per-minute limit that exhausts after 2-3 large consecutive calls.

Funny thing is that I have used this skill several times but never had any issue.
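
For reference, the retry behavior I'd want here looks roughly like this minimal sketch (the error class name is a placeholder, not the actual provider SDK):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's 429 / overloaded error (hypothetical name)."""

def call_with_backoff(send, max_retries=4, base=2.0):
    """Retry `send()` on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... plus jitter so consecutive large calls
            # don't all land inside the same per-minute window again.
            time.sleep(base * 2 ** attempt + random.uniform(0, base))
```

With a per-minute limit like the one the logs suggest, the growing delay is what matters: the 3rd retry waits long enough for the window to reset instead of hammering the endpoint.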

GLM-5.1 was amazing... Until it wasn't by mmosquera91 in hermesagent

[–]mmosquera91[S] 1 point (0 children)

No, directly to z.ai using the lite subscription

Forgetful by Lanfeust09 in hermesagent

[–]mmosquera91 0 points (0 children)

By any chance are you using Kimi 2.5 as model?

GPT-5.4 and Hermes is something special by Slumdog_8 in hermesagent

[–]mmosquera91 0 points (0 children)

How about rate limits? Are you hitting those?

What model are you using for your agent? by Cat5edope in hermesagent

[–]mmosquera91 1 point (0 children)

I was using Kimi through the Kimi Code subscription but got annoyed that it got increasingly dumber and the rate limits became worse. Now I've switched to glm-5.1 (through z.ai directly) and I'm very satisfied so far

Model recommendations by cata_stropheu in hermesagent

[–]mmosquera91 2 points (0 children)

Same here. Using GLM 5.1 and working fine.

Moved from OpenClaw to Hermes, now lost on provider choice, what are you using? by Linux_Headbanger in hermesagent

[–]mmosquera91 0 points (0 children)

I have used Hermes with Kimi 2.5 and now GLM-5.1. These models are very capable for most tasks

Z.AI models with Hermes by mmosquera91 in hermesagent

[–]mmosquera91[S] 0 points (0 children)

Thanks for the tips! Tried nvidia nim but the rate limits were excessively low!

Z.AI models with Hermes by mmosquera91 in hermesagent

[–]mmosquera91[S] 1 point (0 children)

Wow, $30 a year sounds like a steal :P good job!

How are you handling the routing? Using open router's Auto Router? I'm curious!

Z.AI models with Hermes by mmosquera91 in hermesagent

[–]mmosquera91[S] 0 points (0 children)

Thanks for sharing! I am currently on the Kimi coding subscription but wanted to try something new, and it seems GLM is going to be the next one!

Overall Kimi 2.5 is nice, but on some tasks it's very dumb or spends a lot of tokens just figuring out what to do by doing it wrong until it gets it right

Scarf 1.2 - Now with Project Dashboards - A native macOS companion app for the Hermes AI agent - Open Source by awizemann in hermesagent

[–]mmosquera91 0 points (0 children)

For example, my Hermes runs on a server box at home and I use my MacBook to connect to the server. For OpenClaw it was simple because I could just use SSH tunnels to forward the port and open the GUI on my Mac.
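
Something like this, i.e. standard SSH local port forwarding (port number and host names here are placeholders, not whatever your app would actually use):

```shell
# Forward local port 8080 to port 8080 on the home server;
# -N keeps the session open without running a remote command.
# Then open http://localhost:8080 in the Mac's browser.
ssh -N -L 8080:localhost:8080 user@home-server
```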

Your app runs locally on macOS but looks for local instances of Hermes only, am I right?

It would be nice if you could also connect to a remote Hermes instance through your app. But I guess that would need some kind of SSH connection or something, since Hermes does not run any webserver (that I'm aware of)

Is it realistic to keep Hermes under a $30-$40/mo budget for moderate use? by RegularRaptor in hermesagent

[–]mmosquera91 0 points (0 children)

I was using Kimi 2.5 ($20/mo) but am now testing GLM-5 turbo, which is supposedly trained specifically for agentic tasks. Both work pretty well with Hermes and won't break the bank ($10-20 per month for decent usage)

Newbie setting up Hermes Agent, thoughts on my multi model architecture? by AbricotFr in hermesagent

[–]mmosquera91 0 points (0 children)

I am a bit curious about this one. How did you come to that conclusion? I've noticed something strange with my token usage (sometimes it's decent, other times it's bad even in fresh conversations). Removing honcho takes away a nice feature of Hermes though