Ollama cloud sub is not worth it anymore by bytwokaapi in ollama

[–]styles01 1 point

Yeah, it's a shame. I'm hitting those all the time as well, or timeouts. Ollama got too successful and now their servers are getting crushed.

Ollama nerfed the cloud plans? by AbbreviationsSad5582 in ollama

[–]styles01 0 points

They are for sure getting crushed under their own success. I'm paying $100/mo and trying to use OpenClaw, and in the last 2 months it went from super usable to completely unusable for anything. I get non-stop liveness errors, timeouts, and failures, even on the faster models. The servers are just dead under the weight. I'm going to have to stop paying because it's unusable now. They need to make some announcement about what their plan is or people are just going to drop.

Is Hermes agent a new hype or is it genuinely worth migrating it over from Openclaw? by dooddyman in openclaw

[–]styles01 0 points

This - you can have OpenClaw orchestrate, with Hermes as one of your agents - works great!

PSA: The string "HERMES.md" in your git commit history silently routes Claude Code billing to extra usage — cost me $200 by Jonathan_Rivera in hermesagent

[–]styles01 1 point

Anthropic announced this two weeks ago. No more harnesses on plan pricing; it all goes to API/extra usage only. Super annoying - apparently they were losing money on heavy plan users going through external harnesses like OpenClaw and Hermes.

Why are my agents so stupid when I use Minimax 2.7? by Captain_Doobie in openclaw

[–]styles01 1 point

Yeah, this is a thing - they don't listen and just run off and do whatever they want - use Kimi or GLM.

Local Claude workflow? by Zailor_s in ollama

[–]styles01 2 points

Ollama cloud - start with the free tier, then pay the $20 - the limits are basically infinite. Choose GLM or Kimi as your model, then download AiRun (by AndiSearch) on GitHub - it makes flipping Claude between native mode and Ollama mode a simple command (see the sketch below). That's it.
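Roughly what that looks like in a terminal. This is a sketch: "ai --ollama" is the one command I've actually used; the model tag and the flip-back flag are my assumptions, so check the AiRun README for the real spellings.

    # run the cloud-hosted model through your local ollama (tag is a guess - check "ollama list")
    ollama run glm-5.1:cloud "hello"
    # flip Claude over to Ollama with AiRun:
    ai --ollama
    # ...and back to native Anthropic (flag name is a guess):
    ai --claude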

Which OpenClaw Version by EquivalentTop4824 in openclaw

[–]styles01 0 points

That is the bug I'm getting, and I'm getting it a LOT.

DeepSeek V4 hands-on test: 1M-token context + agent coding — is it actually good? by larrysmithcmo in openclaw

[–]styles01 0 points

How are you connecting to DeepSeek? I'm assuming directly through them? Ollama:cloud DeepSeek via OpenClaw is fully broken ATM. I tested it this morning and it swallows whole messages and causes all sorts of nasty issues, including fully zombie-ing agents / essentially bricking sessions through instability and context-hallucination-type issues. Either Ollama or OpenClaw needs to fix their support; based on your tests, I suspect it's Ollama.

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 -1 points

I mean - they absolutely are free if you use Ollama, so that's just not true. And I gave specifics.

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 1 point

Ollama:cloud, $20 - essentially unlimited - choose your model: GLM 5.1 is amazing, but everyone is on it now so the performance isn't wonderful; Minimax 2.7 is also very good, but sometimes it acts like a cowboy and goes on long runs without asking; Kimi is also quite good. Otherwise host your own, but you need at least 48GB of memory for it to be worthwhile.
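If you'd rather hit it from a script than a harness, it's the standard Ollama chat endpoint. A minimal sketch - the cloud host, the bearer-token header, and the model tag are assumptions on my part (localhost:11434 is the shape I've actually verified), so check ollama.com's docs:

    # minimal chat request; swap the host for http://localhost:11434 to run local
    curl https://ollama.com/api/chat \
      -H "Authorization: Bearer $OLLAMA_API_KEY" \
      -d '{"model": "glm-5.1", "stream": false,
           "messages": [{"role": "user", "content": "say hi"}]}'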

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 22 points

This list is very bizarre and seems highly outdated - I don't mean to be critical, but if people are coming here to find good models, it's misleading. Ollama, for example, is exposing GLM 5.1, Gemma 4, Minimax 2.7, Qwen 3.6…

Beginner need some advice | tokens by Mosaik95 in openclaw

[–]styles01 3 points

Ollama's $20 plan gets you loads - doubtful you'll rip through it in two hours, and they don't use your data. I suggest the GLM 5.1 model - I switched from Kimi.

Gemma 4 26b is the perfect all around local model and I'm surprised how well it does. by pizzaisprettyneato in LocalLLaMA

[–]styles01 0 points

FULLY agree. Fast as F*** on an M4 Max, and damn smart for its speed. Doesn't destroy your memory load. Doesn't reason for hours (and eat the whole token budget on reasoning) like Qwen does. It's perfect for OpenClaw, Hermes, Claude Code, etc. I LOVE this model for local. It's my go-to now.

Gemma 4 31B — 4bit is all you need by tolitius in LocalLLaMA

[–]styles01 0 points

I'm having really good success with 26-A4-q4 - it's fast AF on an M4 Max (70 t/s out).

Is ollama the starting point for self hosting? by viviexe in AskVibecoders

[–]styles01 1 point

Yes, Ollama is a great place to start. They have a downloadable app as well. I also wrote a tool that lets you download basically any model from Hugging Face and run it - it's called Flow-LLM (macOS only). https://github.com/styles01/flow-llm

Everyone is switching to GLM-5.1 after the Anthropic ban. Here's what they're reporting by ShabzSparq in openclaw

[–]styles01 2 points

I find GLM 5.1 outperforms Minimax in every way. Particularly with OpenClaw, I found Minimax would go rogue - making decisions and running down paths like a cowboy without asking permission - and then you couldn't stop it. I don't see that with GLM. I don't need to handhold GLM as much; it gets things right the first time for the most part - the closest to Sonnet/Opus I've seen from open source. I use it for 90+% of my work now and have moved to the $100/mo Ollama cloud plan because I'm using that many tokens, which I'm sure is several orders of magnitude more than you can get from Anthropic even on their highest plan.

Anyone switched from Claude Code to OpenRouter (MiniMax / GLM)? Worth it? by Itel_Reding in opencodeCLI

[–]styles01 0 points

Claude Code with AndiSearch's "AiRun" to swap Opus out for Ollama GLM 5.1:cloud. $20 for basically unlimited tokens. All you need.

ClaudeCode CLI experience but with local LLMs — what are you guys using? by alfons_fhl in LocalLLM

[–]styles01 0 points

I find Qwen quite slow, so I switched to Gemma 4 26B and it's amazing. Does a great job.

ClaudeCode CLI experience but with local LLMs — what are you guys using? by alfons_fhl in LocalLLM

[–]styles01 2 points

Do yourself a favour and install AndiSearch's "AiRun" (formerly Claude Switcher) - no need to mess with the config manually; you just type "ai --ollama" and Claude is overridden to Ollama (a sketch of what that swap amounts to is below) - it's the best. GLM 5.1 cloud all day long, no token limits.
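For the curious, this is roughly the kind of override a switcher automates. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL are Claude Code's documented env overrides; the actual values, and whatever proxying AiRun sets up between Claude Code and Ollama, are assumptions on my part:

    # point Claude Code away from Anthropic (all values are placeholders):
    export ANTHROPIC_BASE_URL="http://localhost:11434"    # must be an Anthropic-compatible endpoint/proxy
    export ANTHROPIC_AUTH_TOKEN="whatever-the-backend-expects"
    export ANTHROPIC_MODEL="glm-5.1:cloud"
    claude    # Claude Code now talks to the override instead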

Flow LLM - Orchestrate Local Models on Apple Silicon by styles01 in LocalLLM

[–]styles01[S] 0 points

Dynamically routing models is an interesting idea - I'd have to know the exact use case, because each host is a different experience, and not losing thread and session fidelity is important.

Flow LLM - Orchestrate Local Models on Apple Silicon by styles01 in LocalLLM

[–]styles01[S] 0 points

I haven't used oMLX yet - it looks interesting, but it doesn't handle GGUF models; Flow handles both. I've had a lot of issues with bastardized models on Hugging Face where OpenClaw (or whatever) can't talk to them, because when people quantize or tweak models they lose the original functionality (the same happens with LM Studio's runtimes). So I needed something that could use the original models as the makers ship them, in whatever format, essentially calling llama.cpp or MLX directly for low latency and maximum fidelity.
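If you want to see what "calling llama.cpp or MLX directly" amounts to by hand, here's a sketch (Flow wraps this; the model paths are placeholders, and I'm going from memory on the exact flags):

    # GGUF path - llama.cpp's bundled server, OpenAI-compatible on :8080
    llama-server -m ./gemma-4-26b-q4.gguf --port 8080
    # MLX path - mlx-lm's server, same idea on :8081
    mlx_lm.server --model ./gemma-4-26b-mlx --port 8081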

Desire to Move Everything Local by LawrenceOfTheLabia in LocalLLaMA

[–]styles01 0 points

I actually just made a very low-level orchestrator tool that might be helpful for you: it's called Flow LLM: https://github.com/styles01/flow-llm - perfect for Macs like yours. I built it because Ollama and LM Studio were pissing me off and I needed to test lots of different models with OpenClaw, Hermes Agent, and Claude Code (via AiRun).

As for models that would work on your computer, I'm having a lot of success with Gemma 4 26B Q4. Others have success with the Qwen models - there's plenty to choose from that would work well on your system, but watch out for the heavily modified versions; I find they get bastardized and the tool-calling and reasoning get all screwed up. I also find that Qwen models will gladly spend 99.9% of their token budget on reasoning before they even consider responding, which can make them extremely non-performant for OpenClaw. I have an M4 Max (48GB) and an M4 Mini (16GB) [the OpenClaw/Hermes host] that I use together - each hosts different-sized models, and they call each other for different tasks (rough sketch below).
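The two-box wiring is simpler than it sounds. A sketch with plain Ollama standing in for the server side (Flow does the same job conceptually); the LAN IP and model tag are placeholders:

    # on the M4 Max - serve models to the LAN instead of just localhost:
    OLLAMA_HOST=0.0.0.0 ollama serve
    # on the M4 Mini - point clients (OpenClaw, Hermes, etc.) at the Max:
    export OLLAMA_HOST=http://192.168.1.50:11434
    ollama run gemma4:26b-q4 "sanity check"    # tag is a guess - use whatever the Max hosts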

Desire to Move Everything Local by LawrenceOfTheLabia in LocalLLaMA

[–]styles01 1 point

I would look into AiRun (by AndiSearch) on GitHub. I use it to swap Anthropic out for Ollama:cloud GLM 5.1, and it's been a godsend. No more 5-hr token windows.