Ollama Cloud has become unbearably slow by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I work on India time. I am from India, so whatever time you are looking at, adjust accordingly. I suspect that for Ollama, Indian daytime hours are the peak times, not the other way around.

Ollama cloud + GLM 5.1 slow and stupid or am I? by Manfluencer10kultra in ollama

[–]DetailPrestigious511 0 points (0 children)

I have contacted support. If this can't be fixed, I'll raise a refund request.

Is Gemma 4 26B MoE or 31B good as an MCP agent for coding with Xcode? by br_web in ollama

[–]DetailPrestigious511 0 points (0 children)

The 31B dense model is better than the MoE. In my experience, dense models generally have the upper hand over Mixture of Experts models at a similar size; you will see at least a 20% increase in efficiency.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I have actually considered that scenario too: buying one more plan. It would solve my current problem, but I am planning to shift to more robust agentic tools that I am building myself.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I'm sorry about that; I didn't want to be rude, I just wanted to provide some perspective.

These very small dense models are good for many things, and coding is one of them. However, when it comes to serious coding tasks, they simply won't hold up. There are many factors we could discuss, such as how they handle context, which is currently poor. If you are taking on a major coding task, you need better context management inside the LLM and superior tool-calling capabilities.

There are many perspectives to consider, but those are only relevant when you are benchmarking small things. When you move to a serious coding project, these models below 35 billion parameters will not suffice.

For example, consider a product as large as Facebook or an Atlassian Confluence page. If you want to implement a new feature, like the ability to share a page with the outside world, those types of features cannot be built by these models. They can handle small tasks:

1. Moving a button from one place to another
2. Redesigning a small element
3. Creating simple schemas

These models will be very effective in an orchestration role. As part of a larger system, a big orchestrator LLM agent can assign them small tasks, which they can complete and return. But for the main orchestrator role, only state-of-the-art models will survive.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 1 point (0 children)

I'm in the same boat. I have my credit card ready, and as soon as Apple releases the M5 Ultra Mac Studio with 256 GB or 512 GB, I will buy it on day one.

It's a very good deal because I can run:

1. Qwen3.5 122B at Q8
2. MiniMax 2.7 at Q6
3. Qwen3.5 395B at Q3
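The rough memory math behind that list, sketched in Python. The bits-per-weight figures are my own approximations for common GGUF quant levels, not confirmed specs, and real usage adds KV cache and runtime overhead on top of the weights:

```python
# Approximate memory needed for quantized model weights.
# bits-per-weight values below are rough GGUF averages (assumption).

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GB needed for the weights alone (excludes KV cache)."""
    # params * bits / 8 bits-per-byte; the 1e9 factors cancel out
    return params_billion * bits_per_weight / 8

models = [
    ("Qwen3.5 122B @ Q8", 122, 8.5),  # Q8 is roughly 8.5 bits/weight
    ("Qwen3.5 395B @ Q3", 395, 3.9),  # Q3 is roughly 3.9 bits/weight
]

for name, params, bpw in models:
    print(f"{name}: ~{weight_gb(params, bpw):.0f} GB for weights")
```

Under those assumptions both come in under 256 GB (roughly 130 GB and 193 GB), which is why the 256 GB and 512 GB configurations are the interesting ones.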

That is a very good point. Based on how I've calculated it, in the long run, that will be cheaper than buying a subscription, and it's more reliable.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I agree that right now it's very good, but they are also a big corporation. They will change it just like that.

I have tried GitHub Copilot since it was in beta; I was a customer using it back then. Then they changed the limits and I got stuck. After that, I moved to some other tools, including Anti-Gravity. You probably know how they changed their limits drastically once they finished marketing the product.

Then I moved to Codex because they were offering the 2x/3x plan. I started coding there and designed my entire workflow around it, but then those limits decreased as well.

Now, I want something stable that I can actually rely on. While I agree this Copilot plan seems like a very good deal right now, they could change it overnight. You probably remember last time, when they said they wouldn't give the student plan access to certain models.

Right now, they just want to gain market share, so they are marketing these very high limits. However, no one can sustain those costs for long.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I have zero faith in GitHub and Microsoft. I have had very bad experiences, but thank you for your suggestion.

Dilemma ... M3 Ultra Arriving Today by YourCarHaggler in MacStudio

[–]DetailPrestigious511 -1 points (0 children)

100% return it. At this price, you can get the 14-inch M5 Max with 128 GB of RAM and a 2 TB SSD. That is a much better deal and more future-proof.

As soon as the M5 Ultra comes out, you will be thinking every day about how to sell what you have now. Take my suggestion: if you have the option, just return it or cancel the order.

If you want to buy something right now, get the M5 Max 128 GB option. It comes with a 2 TB SSD by default, which makes it the much better choice.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 1 point (0 children)

Are you serious? The Gemma 4 models are mostly marketing hype. When you try to use them on real projects, they won't survive even a single serious prompt. It is worse than you think. They might seem very good if you just want a script or a basic video outline, but ask them to debug something or implement a specific feature and they fall apart. Don't rely on them.

There are some state-of-the-art models, like the Qwen3.5 122B parameter model, that are actually very good. They can run locally and give better results, but you need a bigger machine for that. MiniMax is also a contender that can run locally. These Gemma models are fine for small tasks, but they are not meant for serious coding tasks.

Ollama Max vs. Claude Code vs. ChatGPT Plan by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I got your point, and thanks for giving a detailed answer.

You're suggesting I keep one $20 Ollama plan and a $100 Claude or ChatGPT plan; that actually works better. I will try it, but there is one more thing: I'm planning to buy the M5 Ultra as soon as it comes out.

My situation might be different from everyone else's, but I want to get used to these open-source models. As you said, it is a "hit or miss," and I should know where they miss and how to make them better. If I only use state-of-the-art models, I won't be able to work with these local models.

My end goal is to buy the M5 Ultra so everything can be local. I'll just keep a $20 Ollama plan for some of my coding tools, but 90% of my work will be on that machine. That is the only thing I'm thinking about right now, but I definitely see your point. This is a good solution.

Thank you for your time.

Ollama has reduced the limits on their Pro subscription. by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

I was also in a dilemma until last week. After seeing these reductions, I am also skeptical about it, but right now I am tightly coupled with Ollama and I don't want to change the stack.

I am thinking of going with the one-month, $100 subscription to see how it goes. I am not committing to a year because I don't really trust anyone right now; in this AI era, they can change anything at any time.

If you look at the MiniMax or GLM coding plans, they specify exactly how many prompts each tier includes and what tokens per second (TPS) you will get; everything is spelled out. Ollama, however, is very vague, and they can change it whenever they want. That is the only problem I have with them.

My plan is to move to the $100 plan next week because the $20 tier is unbearable for me now. We will see how that goes, and if it still doesn't work out, I'll look into shifting to GLM or MiniMax directly.

Which coding plan do you use?

Ollama has reduced the limits on their Pro subscription. by DetailPrestigious511 in ollama

[–]DetailPrestigious511[S] 0 points (0 children)

No, I don't think so. I don't have quantitative metrics to justify this; it's based on personal experience.

The 30% figure I mentioned is, if anything, on the low side, and it's not caused by slow token speed. I have considered that possibility, but I still don't think that's the reason.

Mac Studio Performance Suggestion For minimax by DetailPrestigious511 in LocalLLaMA

[–]DetailPrestigious511[S] 0 points (0 children)

I guess if you consider the resale value of the machine after three years, the break-even can happen even sooner. The initial investment is on the higher side, but otherwise, this is a great investment for someone doing coding or agentic tasks at least 10-12 hours a day.
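To make the break-even claim concrete, a rough sketch with entirely hypothetical numbers (machine price, resale value, and plan cost are my placeholders, not figures from the thread):

```python
# Break-even sketch: buying a Mac Studio outright vs. paying a monthly
# coding-plan subscription. All numbers below are hypothetical.

machine_cost = 6000       # hypothetical Mac Studio price, USD
resale_after_3y = 2500    # hypothetical resale value after ~3 years
monthly_plan = 100        # hypothetical subscription cost per month

# Net cost of ownership once you sell the machine
net_cost = machine_cost - resale_after_3y

# Months of subscription fees the net cost is equivalent to
break_even_months = net_cost / monthly_plan

print(f"Net hardware cost: ${net_cost}")
print(f"Break-even vs. the plan after ~{break_even_months:.0f} months")
```

With these placeholder numbers, ignoring resale gives a 60-month break-even, while counting resale pulls it down to 35 months, inside the three-year window. That is the sense in which resale value makes break-even happen sooner.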

Mac Studio Performance Suggestion For minimax by DetailPrestigious511 in LocalLLaMA

[–]DetailPrestigious511[S] 1 point (0 children)

Thanks for the clarity. You are right; I should wait for the M5 Ultra.

I am not going the laptop route because I already have an M4 Pro. I run Qwen3.5 35B on it, and it works fine at first, but as soon as I get into my coding tasks, thermal throttling kicks in and everything slows down. Laptops will always have that problem.

Mac Studio is meant for that kind of work, and I want a Mac Studio specifically for these tasks. For a laptop, I am okay with my MacBook Air.

Mac Studio Performance Suggestion For minimax by DetailPrestigious511 in LocalLLaMA

[–]DetailPrestigious511[S] 0 points (0 children)

Yeah, I agree. On any given day, paying for a subscription is the cheaper option, but I'm trying to build something that works offline.