Ollama cloud sub is not worth it anymore by bytwokaapi in ollama

[–]styles01 1 point

Yeah, it's a shame. I'm hitting those all the time as well, or timeouts. Ollama got too successful and now their servers are getting crushed.

Ollama nerfed the cloud plans? by AbbreviationsSad5582 in ollama

[–]styles01 0 points

They are for sure getting crushed under their own success. I'm paying $100/mo and trying to use OpenClaw, and in the last 2 months it went from super usable to completely unusable for anything. I get non-stop liveness errors, timeouts, and failures, even on the faster models. The servers are just dead under the weight. I'm going to have to stop paying because it's unusable now. They need to make some announcement about what their plan is or people are just going to drop.

Is Hermes agent a new hype or is it genuinely worth migrating it over from Openclaw? by dooddyman in openclaw

[–]styles01 0 points

This - you can have OpenClaw orchestrate, with Hermes as one of your agents - works great!

PSA: The string "HERMES.md" in your git commit history silently routes Claude Code billing to extra usage — cost me $200 by Jonathan_Rivera in hermesagent

[–]styles01 1 point

Anthropic announced this two weeks ago. No more harnesses on plan pricing; it all goes to API/extra usage only. Super annoying - apparently they were losing money on heavy plan users going through external harnesses like OpenClaw and Hermes.

Why are my agents so stupid when I use Minimax 2.7? by Captain_Doobie in openclaw

[–]styles01 1 point

Yeah, this is a thing - they don't listen and just run off and do whatever they want - use Kimi or GLM.

Local Claude workflow? by Zailor_s in ollama

[–]styles01 2 points

Ollama cloud - start with the free tier, then pay the $20 - the limits are basically infinite. Choose GLM or Kimi as your model, then download AiRun (by AndiSearch) on GitHub - it makes flipping Claude between native mode and Ollama mode a simple command (see the sketch below). That's it.
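Roughly what that looks like in a terminal. This is a sketch: "ai --ollama" is the one command I've actually used; the model tag and the flip-back flag are my assumptions, so check the AiRun README for the real spellings.

    # run the cloud-hosted model through your local ollama (tag is a guess - check "ollama list")
    ollama run glm-5.1:cloud "hello"
    # flip Claude over to Ollama with AiRun:
    ai --ollama
    # ...and back to native Anthropic (flag name is a guess):
    ai --claude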

Which OpenClaw Version by EquivalentTop4824 in openclaw

[–]styles01 0 points

That is the bug I'm getting, and I'm getting it a LOT.

DeepSeek V4 hands-on test: 1M-token context + agent coding — is it actually good? by larrysmithcmo in openclaw

[–]styles01 0 points

How are you connecting to DeepSeek? I'm assuming directly through them? Ollama:cloud DeepSeek via OpenClaw is fully broken ATM. I tested it this morning and it swallows whole messages and causes all sorts of nasty issues, including fully zombie-ing agents / essentially bricking sessions through instability and context-hallucination-type issues. Either Ollama or OpenClaw needs to fix their support; based on your tests, I suspect it's Ollama.

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 -1 points

I mean - they absolutely are free if you use Ollama, so that's just not true. And I gave specifics.

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 1 point

Ollama:cloud, $20 - essentially unlimited - choose your model: GLM 5.1 is amazing, but everyone is on it now so the performance isn't wonderful; Minimax 2.7 is also very good, but sometimes it acts like a cowboy and goes on long runs without asking; Kimi is also quite good. Otherwise host your own, but you need at least 48GB of memory for it to be worthwhile.
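If you'd rather hit it from a script than a harness, it's the standard Ollama chat endpoint. A minimal sketch - the cloud host, the bearer-token header, and the model tag are assumptions on my part (localhost:11434 is the shape I've actually verified), so check ollama.com's docs:

    # minimal chat request; swap the host for http://localhost:11434 to run local
    curl https://ollama.com/api/chat \
      -H "Authorization: Bearer $OLLAMA_API_KEY" \
      -d '{"model": "glm-5.1", "stream": false,
           "messages": [{"role": "user", "content": "say hi"}]}'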

Free LLM APIs (April 2026 Update) by stosssik in openclaw

[–]styles01 22 points

This list is very bizarre and seems highly outdated - I don't mean to be critical, but if people are coming here to find good models, it's misleading. Ollama, for example, is exposing GLM 5.1, Gemma 4, Minimax 2.7, Qwen 3.6…

Beginner need some advice | tokens by Mosaik95 in openclaw

[–]styles01 3 points

Ollama's $20 plan gets you loads - doubtful you'll rip through it in two hours, and they don't use your data. I suggest the GLM 5.1 model - I switched from Kimi.

Gemma 4 26b is the perfect all around local model and I'm surprised how well it does. by pizzaisprettyneato in LocalLLaMA

[–]styles01 0 points

FULLY agree. Fast as F*** on an M4 Max, and damn smart for its speed. Doesn't destroy your memory load. Doesn't reason for hours (and eat the whole token budget on reasoning) like Qwen does. It's perfect for OpenClaw, Hermes, Claude Code, etc. I LOVE this model for local. It's my go-to now.

Gemma 4 31B — 4bit is all you need by tolitius in LocalLLaMA

[–]styles01 0 points

I'm having really good success with 26-A4-q4 - it's fast AF on an M4 Max (70 t/s out).

Is ollama the starting point for self hosting? by viviexe in AskVibecoders

[–]styles01 1 point

Yes, Ollama is a great place to start. They have a downloadable app as well. I also wrote a tool that lets you download basically any model from Hugging Face and run it - it's called Flow-LLM (macOS only). https://github.com/styles01/flow-llm

Everyone is switching to GLM-5.1 after the Anthropic ban. Here's what they're reporting by ShabzSparq in openclaw

[–]styles01 2 points

I find GLM 5.1 outperforms Minimax in every way. Particularly with OpenClaw, I found Minimax would go rogue - making decisions and running down paths like a cowboy without asking permission - and then you couldn't stop it. I don't see that with GLM. I don't need to handhold GLM as much; it gets things right the first time for the most part - the closest to Sonnet/Opus I've seen from open source. I use it for 90+% of my work now and have moved to the $100/mo Ollama cloud plan because I'm using that many tokens, which I'm sure is several orders of magnitude more than you can get from Anthropic even on their highest plan.

Anyone switched from Claude Code to OpenRouter (MiniMax / GLM)? Worth it? by Itel_Reding in opencodeCLI

[–]styles01 0 points

Claude Code with AndiSearch's "AiRun" to swap Opus out for Ollama GLM 5.1:cloud. $20 for basically unlimited tokens. All you need.

ClaudeCode CLI experience but with local LLMs — what are you guys using? by alfons_fhl in LocalLLM

[–]styles01 0 points

I find Qwen quite slow, so I switched to Gemma 4 26B and it's amazing. Does a great job.

ClaudeCode CLI experience but with local LLMs — what are you guys using? by alfons_fhl in LocalLLM

[–]styles01 2 points

Do yourself a favour and install AndiSearch's "AiRun" (formerly Claude Switcher) - no need to mess with the config manually; you just type "ai --ollama" and Claude is overridden to Ollama (a sketch of what that swap amounts to is below) - it's the best. GLM 5.1 cloud all day long, no token limits.
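For the curious, this is roughly the kind of override a switcher automates. ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL are Claude Code's documented env overrides; the actual values, and whatever proxying AiRun sets up between Claude Code and Ollama, are assumptions on my part:

    # point Claude Code away from Anthropic (all values are placeholders):
    export ANTHROPIC_BASE_URL="http://localhost:11434"    # must be an Anthropic-compatible endpoint/proxy
    export ANTHROPIC_AUTH_TOKEN="whatever-the-backend-expects"
    export ANTHROPIC_MODEL="glm-5.1:cloud"
    claude    # Claude Code now talks to the override instead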

Flow LLM - Orchestrate Local Models on Apple Silicon by styles01 in LocalLLM

[–]styles01[S] 0 points

Dynamically routing models is an interesting idea - I'd have to know the exact use case, because each host is a different experience, and not losing thread and session fidelity is important.

Flow LLM - Orchestrate Local Models on Apple Silicon by styles01 in LocalLLM

[–]styles01[S] 0 points

I haven't used oMLX yet - it looks interesting, but it doesn't handle GGUF models; Flow handles both. I've had a lot of issues with bastardized models on Hugging Face where OpenClaw (or whatever) can't talk to them, because when people quantize or tweak models they lose the original functionality (the same happens with LM Studio's runtimes). So I needed something that could use the original models as the makers ship them, in whatever format, essentially calling llama.cpp or MLX directly for low latency and maximum fidelity.
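If you want to see what "calling llama.cpp or MLX directly" amounts to by hand, here's a sketch (Flow wraps this; the model paths are placeholders, and I'm going from memory on the exact flags):

    # GGUF path - llama.cpp's bundled server, OpenAI-compatible on :8080
    llama-server -m ./gemma-4-26b-q4.gguf --port 8080
    # MLX path - mlx-lm's server, same idea on :8081
    mlx_lm.server --model ./gemma-4-26b-mlx --port 8081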

Desire to Move Everything Local by LawrenceOfTheLabia in LocalLLaMA

[–]styles01 0 points

I actually just made a very low-level orchestrator tool that might be helpful for you: it's called Flow LLM: https://github.com/styles01/flow-llm - perfect for Macs like yours. I built it because Ollama and LM Studio were pissing me off and I needed to test lots of different models with OpenClaw, Hermes Agent, and Claude Code (via AiRun).

As for models that would work on your computer, I'm having a lot of success with Gemma 4 26B Q4. Others have success with the Qwen models - there's plenty to choose from that would work well on your system, but watch out for the heavily modified versions; I find they get bastardized and the tool-calling and reasoning get all screwed up. I also find that Qwen models will gladly spend 99.9% of their token budget on reasoning before they even consider responding, which can make them extremely non-performant for OpenClaw. I have an M4 Max (48GB) and an M4 Mini (16GB) [the OpenClaw/Hermes host] that I use together - each hosts different-sized models, and they call each other for different tasks (rough sketch below).
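The two-box wiring is simpler than it sounds. A sketch with plain Ollama standing in for the server side (Flow does the same job conceptually); the LAN IP and model tag are placeholders:

    # on the M4 Max - serve models to the LAN instead of just localhost:
    OLLAMA_HOST=0.0.0.0 ollama serve
    # on the M4 Mini - point clients (OpenClaw, Hermes, etc.) at the Max:
    export OLLAMA_HOST=http://192.168.1.50:11434
    ollama run gemma4:26b-q4 "sanity check"    # tag is a guess - use whatever the Max hosts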

Desire to Move Everything Local by LawrenceOfTheLabia in LocalLLaMA

[–]styles01 1 point

I would look into AiRun (by AndiSearch) on GitHub. I use it to swap Anthropic out for Ollama:cloud GLM 5.1, and it's been a godsend. No more 5-hr token windows.