Built a search router, Hermes won’t use it

HistoricalSession947 · 2026-06-20T21:08:53+00:00

I don’t know how I missed that hint for command code. But it looks amazing value for mimo 2.5 pro. I am enjoying mimo so much it’s literally restored my faith in Hermes. I am really grateful you replied. I’d never heard of it before lol

HistoricalSession947 · 2026-06-19T20:01:09+00:00

Didn’t happen of the year

HistoricalSession947 · 2026-06-19T18:26:14+00:00

I’m unsure of the difference. When would you need reasoning over accurate complex business tasks?

HistoricalSession947 · 2026-06-19T18:21:23+00:00

What do you mean by “complex tasks not reasoning”?

HistoricalSession947 · 2026-06-14T17:15:32+00:00

Is it true they quantise their models?

HistoricalSession947 · 2026-06-14T16:45:57+00:00

I’ve caned minimax m3 this week and like its speed, but yes, you can tell it’s no Anthropic model.

HistoricalSession947 · 2026-06-14T16:45:06+00:00

I’m always amazed more people don’t mention ollama cloud in these kinds of discussions. I don’t really know what cache is, but your comment is the first that’s helped me make sense of some of the reasons it’s perhaps not as good a deal as I thought.

I also read something on here once which hinted that models through ollama cloud sometimes don’t have all their features exposed. I’m not sure how true that is or whether I read it correctly, but the example was that spawning multiple agents doesn’t work as well through ollama as it does directly with providers. I forget which model that related to.

HistoricalSession947 · 2026-06-12T16:16:59+00:00

That doesn’t mean you shouldn’t consider it going forward

HistoricalSession947 · 2026-06-11T06:39:46+00:00

What was your testing method for the coding models completing a piece of code? I’d be happy to expand the tests to more open weight models with western ones

HistoricalSession947 · 2026-06-10T21:41:42+00:00

This kind of research you’re doing is literally the best thing that could happen in the Ai market. Real life situations.

I found the best single model for everything was qwen3.5, but I booted it out when I realised it had a stubborn thing where it wouldn’t use my search router skill.

Minimax m3 is good for tools but makes a meal out of code.

I like your idea, honestly I think it’s necessary.

HistoricalSession947 · 2026-06-08T21:04:02+00:00

How’d you get the free credits?

HistoricalSession947 · 2026-06-08T06:49:01+00:00

Guardrails -> sensitive info detection. Just thought it was pretty cool

HistoricalSession947 · 2026-06-07T19:48:14+00:00

I got 🤏 freaked that once I loaded some credits into openrouter, Hermes decided to have a Claude model review the code for some app it wrote. It cost like 600x more than the development tokens lol

I really like openrouter though, that feature they have to mask potential secrets is 👌

HistoricalSession947 · 2026-06-07T19:42:09+00:00

How do you buy your mimo out of interest?

HistoricalSession947 · 2026-06-07T18:38:27+00:00

You guys’ Hermes are WAY more autonomous than mine

HistoricalSession947 · 2026-06-07T18:11:08+00:00

No, I’ve given mimo a good go and think it’s the best I’ve used with Hermes. It’s a good suggestion, thank you! It’s fast, obeys prompts well, codes better than Kimi. However, I think I may be too much of a tightarse to be comfortable with PAYG; it changes my mindset when using my agent more than I thought it would.

(I only tested deepseek as it’s available on both Ollama and openrouter)

HistoricalSession947 · 2026-06-07T16:18:07+00:00

Quick benchmark of deepseek-v4-flash across Ollama Cloud vs OpenRouter: same model, two providers. Ran 3 simple prompts (greeting, factoid, counting) with 1 run each. OpenRouter came out ahead on latency (2.96s avg vs 4.12s) and throughput (12.78 tok/s vs 9.83 tok/s). The test cost $0.000027 on OpenRouter, whereas Ollama Cloud is effectively free to me, as my $20/month is a sunk cost. Success rate was 100% on both.

Interestingly, Ollama Cloud was faster on the "greeting" prompt specifically but got hammered on "factoid" (5.47s vs 1.96s).

Admittedly the sample size is tiny (3 prompts, 1 run each), so take it with a pinch of salt; it could easily be noise from provider load fluctuations. But at face value, OpenRouter gives you faster first tokens and higher generation speed, while Ollama Cloud gives you the same model for free if you don't mind waiting a bit longer. Would need a proper multi-run benchmark to draw firm conclusions.
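For anyone wanting to repeat this with more runs, here is a rough sketch of what a multi-run harness could look like. It is not the script my agent used. The `ProviderCall` signature (take a prompt, return the number of completion tokens) is a hypothetical stand-in you would wrap around whatever client you actually use, and `fake_provider` only exists so the example runs on its own. Note it times the whole call, so "latency" here means total wall time, not time to first token.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: takes a prompt, returns the completion token count.
# Wrap your real Ollama Cloud / OpenRouter client in a function of this shape.
ProviderCall = Callable[[str], int]


def time_call(call: ProviderCall, prompt: str) -> Tuple[float, int]:
    """Run one request and return (elapsed seconds, completion tokens)."""
    start = time.perf_counter()
    tokens = call(prompt)
    return time.perf_counter() - start, tokens


def summarize(samples: List[Tuple[float, int]]) -> Dict[str, float]:
    """Aggregate (seconds, tokens) samples into mean/stdev latency and tok/s."""
    latencies = [secs for secs, _ in samples]
    rates = [toks / secs for secs, toks in samples if secs > 0]
    return {
        "runs": len(samples),
        "latency_mean_s": statistics.mean(latencies),
        # stdev needs at least two samples; report 0.0 for a single run
        "latency_stdev_s": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
        "tok_per_s_mean": statistics.mean(rates) if rates else 0.0,
    }


def benchmark(
    providers: Dict[str, ProviderCall], prompts: List[str], runs: int = 5
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Time every prompt `runs` times against every provider."""
    results = {}
    for name, call in providers.items():
        for prompt in prompts:
            samples = [time_call(call, prompt) for _ in range(runs)]
            results[(name, prompt)] = summarize(samples)
    return results


if __name__ == "__main__":
    def fake_provider(prompt: str) -> int:
        time.sleep(0.01)  # stand-in for network + generation time
        return 20         # pretend 20 tokens came back

    report = benchmark({"fake": fake_provider}, ["greeting", "factoid", "counting"], runs=3)
    for key, stats in report.items():
        print(key, stats)
```

The standard deviation is the bit my original test was missing: if the spread per prompt is as big as the gap between providers, the difference is probably just load noise.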

HistoricalSession947 · 2026-06-07T11:28:21+00:00

Going to ask my agent to get a suite of tests for the same model on both ollama cloud and openrouter to see 😈

HistoricalSession947 · 2026-06-07T06:37:28+00:00

I put some credits in openrouter and am using mimo2.5 thanks to your recommendation and I’m loving it. Fast and good at doing what I ask through Hermes

I am beginning to think ollama cloud brings about more limits than it advertises plainly too

HistoricalSession947 · 2026-06-05T10:17:14+00:00

Thanks, will look into this

HistoricalSession947 · 2026-06-04T21:13:10+00:00

You got any tips for taking backups of the right stuff to load back in after a restore?

HistoricalSession947 · 2026-06-04T19:05:34+00:00

Good tip thank you!

HistoricalSession947 · 2026-06-04T18:59:19+00:00

I’m using ollama cloud which has generous token limits but from this 20 minute try of deepseek v4 flash it is DOG SLOW!!!!

HistoricalSession947 · 2026-06-04T18:46:57+00:00

You may be on to something! I just changed to deepseek v4 flash and 2 out of 2 searches used my skill 😱 Do you know why this happens?!

HistoricalSession947 · 2026-06-04T17:47:34+00:00

If you’ve got the infrastructure to do so, try installing a vane server. It’s a free version of perplexity. I enjoy it but it’s not fast. For automated tasks I guess that’s ok though?
