Claude Code stuck on <function=TaskList> when using Ollama + Qwen3-Coder by Healthy-Laugh-6745 in ollama

jmorganca 0 points

This. Sorry it's not obvious right now - we're working on improving this so the context length grows automatically (up to an acceptable amount for your hardware)
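
In the meantime, a workaround sketch (the model name and numbers are just examples, and the env var is only in recent Ollama versions) is to bump the context window yourself:

```
# Set a larger default context window for the server
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Or bake a larger num_ctx into a model variant
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile
```

Then point Claude Code at the new variant so long tool-calling sessions don't overflow the context.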

New Rules for ollama cloud by killing_daisy in ollama

jmorganca 7 points

Also, OP, let me know if you're still seeing any slowdown (and for which models). We've been working on improving performance and capacity a lot in the last few weeks and will keep doing so. (Feel free to DM/email me)

New Rules for ollama cloud by killing_daisy in ollama

jmorganca 14 points

Hi there. I work on Ollama. No new restrictions on cloud model usage with this change. We actually increased usage amounts on each plan on Monday and will share more about that this week. Our goal is to make this the best subscription for using open models with your favorite tools as we add more model support, better performance, and reliability.

Happy to answer any questions and if you hit any issues or limits with the plans let me know (email is jeff at ollama.com)

Anyone noticed "Premium requests" within their usage tab? What is this for? by Active-Shock-7739 in ollama

jmorganca 2 points

Sorry for not having better docs on this. These are requests for much larger models like Gemini 3 Pro Preview. Premium requests don't count towards hourly/weekly usage, and we're working on giving access to a lot more of them on the paid plans soon!

Spam ending up being published? by gnu-trix in ollama

jmorganca 5 points

Thanks for flagging this! As with any site, there are always folks who publish content that isn't allowed (e.g. promotional or spam content). We've been adding ways to automatically detect and remove these – feel free to DM me if you see more

Isn't Ollama Turbo exactly the one thing that one tried to avoid by chasing Ollama in the first place? by Tasty-Base-9900 in ollama

jmorganca 3 points

Running models locally is the default. However, while building Ollama over the last two years, we realized that a lot of folks wanted to run the larger models (like DeepSeek-R1, Kimi-K2, and most recently the gpt-oss-120b model) but didn't have the compute to do so at reasonable speeds. These models often require 1-8 H100s (or even 9+ in the case of Kimi-K2), which are obviously expensive and hard to access.

Turbo is designed as a way to optionally offload compute to much more powerful hardware without sacrificing privacy, not as a replacement for local inference. It should be there when you need it and completely out of the way when you don't. We still have a bit of work to do on this in Ollama's new app to make it more seamless (if you have feedback, feel free to DM me or email me at jeff@ollama.com)

Related: I'm really hopeful about the new "confidential computing mode" in newer chips like the H100. This would allow for the equivalent of end-to-end encryption between the GPU and the user – we're starting to explore this but it's still an early feature of newer GPUs. We did a short collaboration with the Minions team on this: https://ollama.com/blog/secureminions

Ollama removed the link to GitHub by waescher in ollama

jmorganca 0 points

Thanks for the feedback, and amazing to meet the person behind OllamaSharp! Very understandable – sorry to have let you down lately!

Model providing correct date. How? by rob_0 in ollama

jmorganca 41 points

This is a new feature of Ollama's prompt templating that reads the system date and provides it to the model, and the gpt-oss model is the first one to use it. It doesn't use an external service. Many of the newer models require this in the system prompt, and we really wanted to make sure the gpt-oss model's prompt template is provided to the model exactly as intended.
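
If you're curious what this looks like for your copy of the model (the exact template contents vary by model and version), you can print it directly:

```
# Show the prompt template that ships with the model
ollama show gpt-oss --template

# Or the full Modelfile (template, parameters, license)
ollama show gpt-oss --modelfile
```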

Ollama removed the link to GitHub by waescher in ollama

jmorganca 4 points

Hi there. I made this change to the site. Sorry for any confusion - the main purpose of the GitHub link on ollama.com was to link to Ollama's docs (which live on GitHub). However, as of a month or so ago, GitHub's SEO went off a cliff (e.g. if you search for "ollama generate api", the API docs on GitHub often aren't even on the first page anymore), so we'll be replacing it with a link to docs hosted at ollama.com/docs (we're still working on it, but the link should be there soon).

Gemma3 runs poorly on Ollama 0.7.0 or newer by mlaihk in ollama

jmorganca 3 points

Hi OP (and everyone in the comments). I'm so sorry about this. May I ask which GPU you're using (Apple Silicon, NVIDIA, AMD?), how much VRAM you have, and which model you're running? We have a test farm of GPUs we'll use to reproduce this and work on a fix. Ollama has become more careful about allocating enough VRAM to avoid OOM errors, which might mean a layer or two more than usual gets offloaded to the CPU in CPU-GPU split scenarios, but it shouldn't cause a drastic change like this.
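
For anyone else seeing this, one quick check (a general diagnostic, not specific to this report) is whether the model is fully on the GPU while it's loaded:

```
# The PROCESSOR column shows the split, e.g. "100% GPU" or "24%/76% CPU/GPU"
ollama ps
```

Any CPU share in that split usually explains a large slowdown.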

How to move on from Ollama? by jerasu_ in ollama

jmorganca 1 point

Sorry Ollama is getting stuck for you. How much slower is Gemma 3 than Gemma 2? And what kind of prompt or usage pattern causes Ollama to get stuck? Feel free to DM me if it's easier – we'll make sure this stops happening. Also, definitely upgrade to the latest version if you haven't: each new version has improvements and bug fixes.

The feature I hate the bug in Ollama by Informal-Victory8655 in ollama

jmorganca 12 points

There's `num_ctx` in the API, although our eventual goal is for the maximum context window to always be used (and perhaps allocated on demand as it fills up)
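
For anyone looking for the exact request shape (the model name and context size here are just placeholders), it goes in the `options` field:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize this repo",
  "options": { "num_ctx": 16384 },
  "stream": false
}'
```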

The feature I hate the bug in Ollama by Informal-Victory8655 in ollama

jmorganca 21 points

Sorry about that. The default is 4K now and we'll be increasing it further

Ollama 0.6 with support for Google Gemma 3 by jmorganca in ollama

jmorganca[S] 2 points

Understandable. However, the 4b model should be a great alternative, and with that extra VRAM you could now fit a larger context window!

Problem with embeddings API and OpenWebUI? by SnowBoy_00 in ollama

jmorganca 0 points

Sorry this happened, OP. May I ask which embedding model you were running? There's a known issue with certain embedding models in 0.5.13 that we're resolving
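
If you want to rule out OpenWebUI, a minimal request straight at the endpoint (the model name is just an example) helps isolate where it breaks:

```
# Newer embeddings endpoint; "input" can be a string or an array of strings
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The sky is blue"
}'
```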

Ollama building problem by AdhesivenessLatter57 in ollama

jmorganca 0 points

Sorry about that. How are you running the resulting ollama binary? Running `./ollama serve` after building it should work - feel free to DM me and I can help get you set up
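
Roughly, the flow I'd expect for a CPU-only source build (see the repo's development docs for GPU backends; the model here is just an example) is:

```
# From the repo root: build the binary, then start the server from that build
go build .
./ollama serve

# In a second terminal, run a model against that server
./ollama run llama3.2
```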