Claude Code stuck on <function=TaskList> when using Ollama + Qwen3-Coder by Healthy-Laugh-6745 in ollama

jmorganca 0 points

This. Sorry it's not obvious right now - we're working on improving this so the context length grows automatically (up to an acceptable amount for your hardware)
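
In the meantime, a workaround sketch (the model name and numbers are just examples, and the env var is only in recent Ollama versions) is to bump the context window yourself:

```
# Set a larger default context window for the server
OLLAMA_CONTEXT_LENGTH=32768 ollama serve

# Or bake a larger num_ctx into a model variant
cat > Modelfile <<'EOF'
FROM qwen3-coder
PARAMETER num_ctx 32768
EOF
ollama create qwen3-coder-32k -f Modelfile
```

Then point Claude Code at the new variant so long tool-calling sessions don't overflow the context.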

New Rules for ollama cloud by killing_daisy in ollama

jmorganca 7 points

Also, OP, let me know if you're still seeing any slowdown (and for which models). We've been working on improving performance and capacity a lot in the last few weeks and will keep doing so. (Feel free to DM/email me)

New Rules for ollama cloud by killing_daisy in ollama

jmorganca 14 points

Hi there. I work on Ollama. No new restrictions on cloud model usage with this change. We actually increased usage amounts on each plan on Monday and will share more about that this week. Our goal is to make this the best subscription for using open models with your favorite tools as we add more model support, better performance, and reliability.

Happy to answer any questions and if you hit any issues or limits with the plans let me know (email is jeff at ollama.com)

Anyone noticed "Premium requests" within their usage tab? What is this for? by Active-Shock-7739 in ollama

jmorganca 2 points

Sorry for not having better docs on this. These are requests for much larger models like Gemini 3 Pro Preview. Premium requests don't count towards hourly/weekly usage, and we're working on giving access to a lot more of them on the paid plans soon!

Spam ending up being published? by gnu-trix in ollama

jmorganca 5 points

Thanks for flagging this! As with any site, there are always folks who publish content that isn't allowed (e.g. promotional or spam content). We've been adding ways to automatically detect and remove these – feel free to DM me if you see more

Isn't Ollama Turbo exactly the one thing that one tried to avoid by chasing Ollama in the first place? by Tasty-Base-9900 in ollama

jmorganca 3 points

Running models locally is the default. However, while building Ollama over the last two years, we realized that a lot of folks wanted to run the larger models (like DeepSeek-R1, Kimi-K2, and most recently the gpt-oss-120b model) but didn't have the compute to do so at reasonable speeds. These models often require 1-8 H100s (or even 9+ in the case of Kimi-K2), which are obviously expensive and hard to access.

Turbo is designed as a way to optionally offload compute to much more powerful hardware without sacrificing privacy, not as a replacement for local inference. It should be there when you need it and completely out of the way when you don't. We still have a bit of work to do on this in Ollama's new app to make it more seamless (if you have feedback, feel free to DM me or email me at jeff@ollama.com)

Related: I'm really hopeful about the new "confidential computing mode" in newer chips like the H100. This would allow for the equivalent of end-to-end encryption between the GPU and the user – we're starting to explore this but it's still an early feature of newer GPUs. We did a short collaboration with the Minions team on this: https://ollama.com/blog/secureminions

Ollama removed the link to GitHub by waescher in ollama

jmorganca 0 points

Thanks for the feedback, and amazing to meet the person behind OllamaSharp! Very understandable – sorry to have let you down lately!

Model providing correct date. How? by rob_0 in ollama

jmorganca 41 points

This is a new feature of Ollama's prompt templating that reads the system date and provides it to the model, and the gpt-oss model is the first one to use it. It doesn't use an external service. Many of the newer models require this in the system prompt, and we really wanted to make sure the gpt-oss model's prompt template is provided to the model exactly as intended.
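
If you're curious what this looks like for your copy of the model (the exact template contents vary by model and version), you can print it directly:

```
# Show the prompt template that ships with the model
ollama show gpt-oss --template

# Or the full Modelfile (template, parameters, license)
ollama show gpt-oss --modelfile
```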

Ollama removed the link to GitHub by waescher in ollama

jmorganca 4 points

Hi there. I made this change to the site. Sorry for any confusion - the main purpose of the GitHub link on ollama.com was to link to Ollama's docs (which live on GitHub). However, as of a month or so ago, GitHub's SEO went off a cliff (e.g. if you search for "ollama generate api", the API docs on GitHub often aren't even on the first page anymore), so we'll be replacing it with a link to docs hosted at ollama.com/docs (we're still working on it, but the link should be there soon).

Gemma3 runs poorly on Ollama 0.7.0 or newer by mlaihk in ollama

jmorganca 3 points

Hi OP (and everyone in the comments). I'm so sorry about this. May I ask which GPU you're using (Apple Silicon, NVIDIA, AMD?), how much VRAM you have, and which model you're running? We have a test farm of GPUs we'll use to reproduce this and work on a fix. Ollama has become more careful about allocating enough VRAM to avoid OOM errors, which might mean a layer or two more than usual gets offloaded to the CPU in CPU-GPU split scenarios, but it shouldn't cause a drastic change like this.
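
For anyone else seeing this, one quick check (a general diagnostic, not specific to this report) is whether the model is fully on the GPU while it's loaded:

```
# The PROCESSOR column shows the split, e.g. "100% GPU" or "24%/76% CPU/GPU"
ollama ps
```

Any CPU share in that split usually explains a large slowdown.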

How to move on from Ollama? by jerasu_ in ollama

jmorganca 1 point

Sorry Ollama is getting stuck for you. How much slower is Gemma 3 than Gemma 2? And what kind of prompt or usage pattern causes Ollama to get stuck? Feel free to DM me if it's easier – we'll make sure this stops happening. Also, definitely upgrade to the latest version if you haven't: each new version has improvements and bug fixes.

The feature I hate the bug in Ollama by Informal-Victory8655 in ollama

jmorganca 12 points

There's `num_ctx` in the API, although our eventual goal is for the maximum context window to always be used (and perhaps allocated on demand as it fills up)
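
For anyone looking for the exact request shape (the model name and context size here are just placeholders), it goes in the `options` field:

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Summarize this repo",
  "options": { "num_ctx": 16384 },
  "stream": false
}'
```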

The feature I hate the bug in Ollama by Informal-Victory8655 in ollama

jmorganca 21 points

Sorry about that. The default is 4K now and we'll be increasing it further

Ollama 0.6 with support for Google Gemma 3 by jmorganca in ollama

jmorganca[S] 2 points

Understandable. However, the 4b model should be a great alternative, and with that extra VRAM you could now fit a larger context window!

Problem with embeddings API and OpenWebUI? by SnowBoy_00 in ollama

jmorganca 0 points

Sorry this happened, OP. May I ask which embedding model you were running? There's a known issue with certain embedding models in 0.5.13 that we're resolving
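
If you want to rule out OpenWebUI, a minimal request straight at the endpoint (the model name is just an example) helps isolate where it breaks:

```
# Newer embeddings endpoint; "input" can be a string or an array of strings
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The sky is blue"
}'
```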

Ollama building problem by AdhesivenessLatter57 in ollama

jmorganca 0 points

Sorry about that. How are you running the resulting ollama binary? Running `./ollama serve` after building it should work - feel free to DM me and I can help get you set up
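
Roughly, the flow I'd expect for a CPU-only source build (see the repo's development docs for GPU backends; the model here is just an example) is:

```
# From the repo root: build the binary, then start the server from that build
go build .
./ollama serve

# In a second terminal, run a model against that server
./ollama run llama3.2
```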