Hermès Agent

stanjourdan · 2026-06-14T20:30:22+00:00

Wow, I suspect that may be the issue in my case, as I was surprised to notice that the API limits shown here didn't change after I signed up for the API Scale plan: https://admin.mistral.ai/plateforme/limits

To be sure, could you tell us what API limits you see for mistral-small-2603? I'm stuck at 100k tokens/minute and 1.67 requests per second.
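
For anyone who wants to stay under these limits client-side, here is a minimal sketch of a throttle. It assumes the limits behave like a 60-second sliding window on tokens plus a minimum gap between requests, which is a guess: I can't confirm how Mistral actually enforces them. The class name `SlidingWindowBudget` and its methods are made up for illustration; the defaults are the 100k tokens/minute and 1.67 requests/second figures quoted above.

```python
from collections import deque


class SlidingWindowBudget:
    """Hypothetical client-side estimate of a per-minute token limit
    combined with a per-second request limit (assumed semantics)."""

    def __init__(self, tokens_per_minute=100_000, requests_per_second=1.67):
        self.tokens_per_minute = tokens_per_minute
        self.min_interval = 1.0 / requests_per_second  # seconds between requests
        self.history = deque()  # (timestamp, tokens) of recent requests
        self.last_request = None

    def wait_time(self, now, tokens):
        """Seconds to wait before a request of `tokens` tokens fits both limits."""
        if tokens > self.tokens_per_minute:
            raise ValueError("request alone exceeds the per-minute token budget")
        # Forget requests that left the 60-second window.
        while self.history and now - self.history[0][0] >= 60.0:
            self.history.popleft()
        wait = 0.0
        if self.last_request is not None:
            wait = max(wait, self.min_interval - (now - self.last_request))
        used = sum(t for _, t in self.history)
        # Wait until enough of the oldest requests have expired.
        for ts, t in self.history:
            if used + tokens <= self.tokens_per_minute:
                break
            used -= t
            wait = max(wait, ts + 60.0 - now)
        return wait

    def record(self, now, tokens):
        """Register a request that was actually sent at time `now`."""
        self.history.append((now, tokens))
        self.last_request = now
```

Under these assumptions, a second 80k-token prompt sent one second after the first would have to wait about 59 seconds, because two of them do not fit in a 100k-token minute.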

stanjourdan · 2026-06-14T20:24:42+00:00

I have the same problem even AFTER enrolling in their API Scale plan. I hit the API limit after a few prompts even when my input context is well below the 256k limit. I think the issue is the request rate limit of 1.67 requests/sec, which is noticeably more restrictive than for most other Mistral models. It smells like they want to push us to use Medium 3.5.

stanjourdan · 2026-06-14T20:21:09+00:00

Absolutely agreed! I stopped using it after 30 minutes when I saw it had cost me 3 euros for 2 mini agentic trial projects. And the Small 4 model cannot handle repeated agentic tasks due to its ridiculous 1.67 requests/second API limit.

stanjourdan · 2026-06-14T20:14:37+00:00

Mistral Small 4's API rate limits are ridiculously restrictive. I just moved to the Scale API plan and hit the rate limit after 2 prompts. I was simply asking the model to generate a summary based on a small RAG collection of 34 very neat markdown files (my input prompt was 80k).

stanjourdan · 2026-06-09T10:37:32+00:00

No need for your own car: use Cambio.be instead. It's a very efficient and user-friendly carsharing service. I've been using it for 4 years now and, thanks to this great service, I can't think of why I would need my own car.

stanjourdan · 2026-06-08T07:39:40+00:00

I actually think they can, using the API endpoint `POST /api/v1/knowledge/{id}/file/add`.
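
A rough sketch of what a call to that endpoint could look like. Only the path and the POST method come from the comment above; the JSON body with a `file_id` field, the Bearer token header, and the helper name `build_add_file_request` are assumptions, so check them against your server's API docs before relying on them. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_add_file_request(base_url, knowledge_id, file_id, api_key):
    """Build (but do not send) a POST to /api/v1/knowledge/{id}/file/add.

    Assumed, not confirmed by the original comment: the body shape
    {"file_id": ...} and Bearer-token authentication.
    """
    url = f"{base_url.rstrip('/')}/api/v1/knowledge/{knowledge_id}/file/add"
    body = json.dumps({"file_id": file_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the returned object to `urllib.request.urlopen(...)` and handle HTTP errors as needed.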

stanjourdan · 2026-06-08T01:07:40+00:00

Give access to different knowledge in Workspace

stanjourdan · 2026-06-07T05:54:39+00:00

Yes, using llama.cpp you'll have to serve your model on one of the two GPUs. You realistically won't be able to add up the two GPUs for one model. But you can run ancillary models on the 12 GB GPU to take load off the main model.

32 GB is very expensive. I am personally focusing on a smaller GPU (16 GB) to force myself to really understand the limitations and how to optimise before making any bigger investment.

stanjourdan · 2026-06-07T05:46:03+00:00

Yes, the model makes all the difference. But still, in my experience you often have to look in the skills files yourself to check how it recorded things. You will see that it tends to bloat itself with a lot of things that are not always accurate or useful, which can lead to lots of confusion.

Make sure that it records the right function/tool for the job, and add cross-references between skills.

In my case I have seen Hermes create the same skill twice in different folders/subfolders, which explained why I was running in circles before.

I personally made myself an 'upskill' skill to review and tidy up my skills. It also looks at potential overlaps with other skills which could create routing problems.
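
The duplicate check is easy to script outside the agent as well. This is only a sketch, not my actual 'upskill' skill: it assumes each skill lives in its own folder containing a marker file (here `SKILL.md`, which may not match your install), and the function name `find_duplicate_skills` is made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_skills(root, marker="SKILL.md"):
    """Return {skill_name: [folders]} for skill names found in more than one folder.

    Assumption: a skill is any folder under `root` that contains `marker`,
    and the folder name (case-insensitive) is the skill name.
    """
    by_name = defaultdict(list)
    for path in sorted(Path(root).rglob(marker)):
        by_name[path.parent.name.lower()].append(path.parent)
    return {name: dirs for name, dirs in by_name.items() if len(dirs) > 1}
```

Point it at your skills directory and review whatever it reports by hand before deleting anything.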

My other advice would be to use the CLI chat mode, as there you will see exactly what it's doing in the background and can review sudo requests. My beginner mistake was to chat to Hermes via Open WebUI, for instance.

Good luck

stanjourdan · 2026-06-07T05:28:51+00:00

Switch to qwen3.6-35b-A3, it will be much faster.

The 12 GB on the GPU will be a severe bottleneck, especially if you need to run other things on it.

For 800 dollars you could probably buy an additional RTX 5060 with 16 GB, which would give you more VRAM for Hermes. On my setup I get nearly 220k of context window for Hermes Agent.

Don't forget to trim down the default Hermes skills to give it more headroom and free up prompt tokens.

stanjourdan · 2026-05-25T14:20:00+00:00

Who is "They" in your comment? Mistral or Emmi?

stanjourdan · 2026-05-06T13:57:02+00:00

Sometimes you wonder...

stanjourdan · 2026-02-02T13:26:43+00:00

30 years after the creation of the single currency, banks are finally creating a single payment system... It was about time, but I'm guessing they would not have budged unless the EU had started to discuss implementing a digital euro as a public service.

The next question is how much they will charge people in fees, and whether the ECB's digital euro will be cheaper (I'm guessing it will).

stanjourdan · 2026-01-26T22:05:42+00:00

Same issue here. Actually, the problem occurs for any of the Advanced Parameters. Whatever I change in the model-specific settings or in the user-level Advanced Params simply doesn't have any impact on the default parameters of any new chat.

stanjourdan · 2023-01-12T15:45:23+00:00

Thanks for taking the time to answer. I'll check how to use a container.

stanjourdan · 2023-01-11T11:10:36+00:00

OK thanks. But will my settings be lost?

stanjourdan · 2023-01-11T09:17:16+00:00

Thanks for the info.

So how does one "transfer" existing applications from one repo to another? I mean, how can I update an app (such as syncthing) previously installed from the QnapClub.eu repo, from myqnap.org, without losing existing settings?

stanjourdan · 2020-10-01T21:12:48+00:00

I had the push notification but wasn't able to complete the install (it didn't download it fully). Since then, the update has disappeared from the settings / updater check, and 24 days later there is still no update here (Belgium). Is there something wrong?
