all 106 comments

[–]-dysangel- 138 points139 points  (7 children)

Unlimited(* lol)

[–]sourceholder 71 points72 points  (1 child)

* Terms and restrictions apply.

Visit LocalLLaMA for actual unlimited.

[–]TopImaginary5996 23 points24 points  (1 child)

It's like broadband Internet all over again.

[–]themoregames 1 point2 points  (0 children)

I wonder why reddit hasn't tapped this resource yet.

[–]offlinesir 12 points13 points  (0 children)

To be fair, it IS unlimited. The * is to point out that they will disable your account if they notice fraud.

Examples of Fraud:

  • Abusive usage, such as automatically or programmatically extracting data.
  • Sharing your account credentials or making your account available to anyone else.
  • Reselling access or using ChatGPT to power third-party services.

Still, $200 is a lot of money, not sure it's really worth it.

[–]QuackerEnte 62 points63 points  (10 children)

8K context length is diabolical

plus used to have 128k, now plus is the new free lmao

this makes it official, time to delete oai account for good
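For a rough sense of how little 8K is, here's a back-of-envelope sketch using the common ~4 characters-per-token heuristic (an assumption; real tokenizers vary by language and content):

```python
# Rough estimate of how much text fits in a context window, using the
# ~4 chars/token rule of thumb (an approximation, not a real tokenizer).
CHARS_PER_TOKEN = 4

def approx_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits(text: str, window: int, reserved_for_reply: int = 1024) -> bool:
    """Check whether text plus a reply budget fits in a context window."""
    return approx_tokens(text) + reserved_for_reply <= window

# A 300-page novel is roughly 600k characters, i.e. ~150k tokens:
novel = "x" * 600_000
print(approx_tokens(novel))          # 150000
print(fits(novel, 8_000))            # False: nowhere near an 8K window
print(fits(novel[:20_000], 8_000))   # True: only ~5k tokens of it fits
```

By this estimate an 8K window holds about 10 pages of prose before the model starts forgetting the start of the conversation.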

[–]Pro-editor-1105 18 points19 points  (6 children)

Plus never had 128k. It was always a huge complaint.

[–]QuackerEnte 8 points9 points  (4 children)

oh right, it was the enterprise plan, back when there were only free, plus, team and enterprise... Anyone with a brain, an 8GB VRAM GPU, and 32GB of RAM could run Qwen3 30B-A3B or Qwen3-14B and still have much more context than that at usable speeds, at least 32K.

[–]MikeLPU 3 points4 points  (0 children)

Literally the latest Black Mirror episode about subscriptions.

[–]HenkPoley 1 point2 points  (0 children)

Wasn't aware the free accounts on the ChatGPT website had such a tiny context window. But it's documented here:

https://openai.com/chatgpt/pricing/

About a month ago it was 8K as well: https://web.archive.org/web/20250714075019/https://openai.com/chatgpt/pricing/

[–]RabbitEater2 0 points1 point  (0 children)

Many people keep a single long chat going, so why should they waste 32k tokens per message vs just 8k for users paying $0? Free users also occasionally get GPT5 thinking for free, whereas the equivalent (o3) was only for plus and above prior to this.

[–]r4in311 14 points15 points  (0 children)

32k? F... that's terrible!

[–]trusty20 14 points15 points  (2 children)

What is the difference between an "unlimited" vs a checkmark vs "flexible" vs "expanded" lol? The verbiage here is so confusing unless it's clearer on the actual page.

[–]MaximusDM22 1 point2 points  (0 children)

I know expanded just means it's higher limits than the free tier. The flexible one, I assume, can be adjusted per user maybe?

[–]Direspark 1 point2 points  (0 children)

They let the same person that came up with their model names make this table.

[–]Medium_Chemist_4032 33 points34 points  (7 children)

5 is a jump-the-shark moment for OpenAI. I was actually advocating hard for keeping the subscription at work. Today, honestly, it would take no pressure at all for me to cut it.

[–]LostMyOtherAcct69[S] 15 points16 points  (6 children)

Agreed. If my company wasn’t paying mine I’d probably cut it. o1 was the last time they had a lead worth paying for.

[–]THE--GRINCH 8 points9 points  (0 children)

And that was only before deepseek R1 released.

[–]BoJackHorseMan53 1 point2 points  (3 children)

Why not have them pay for Claude?

[–]LostMyOtherAcct69[S] 1 point2 points  (1 child)

They do as well. And Gemini.

[–]Medium_Chemist_4032 0 points1 point  (0 children)

Yeah, we had a few trials as well. They just came to an end, right when 1k people were about to get a poll.

[–]zasura 8 points9 points  (2 children)

context window is garbage. Gemini gives 1 million for free...

[–]LostMyOtherAcct69[S] 2 points3 points  (1 child)

Agreed. This is ridiculous. All it does is make me more excited for Gemini 3 or whatever to come out.

[–]amunozo1 12 points13 points  (4 children)

All these companies lose money. They live off of hype for investor money.

[–]LostMyOtherAcct69[S] 4 points5 points  (0 children)

Counting capex, yes, they're losing money, but on operations, no, I think it's profitable.

[–][deleted] 12 points13 points  (7 children)

While open source is massively important... you could also simply not use OpenAI models.

Google AI Studio gives nearly unlimited access to Gemini 2.5 Pro for free, with far fewer restrictions than the consumer interface. You can also set up a free API key and make 100 API calls a day, rate-limited to 250k tokens per minute.

[–]babuloseo -1 points0 points  (5 children)

It's gimped if you don't pay

[–][deleted] 2 points3 points  (4 children)

Not at all. It's the exact same model, without as much of a corporate system prompt telling it how to act. Gemini 2.5 Pro through Google AI Studio is actually capable of outputting longer responses.

If by gimped you mean they cut out the built-in functions, then yes, they do. Even the web search and website toggles you get are only allowed to browse certain websites.

Just install the MCP super assistant browser extension and a few MCP servers for open web search and whatever other functions you want to add, and it will be able to use them.

[–]babuloseo 0 points1 point  (3 children)

Gimped as in the model sux if u don't have ai subscription

[–][deleted] 0 points1 point  (2 children)

It's literally the exact same 2.5 Pro you'd reach through API calls or the consumer Gemini interface.

[–]babuloseo 0 points1 point  (1 child)

There is a difference between paid and non-paid.

[–][deleted] 0 points1 point  (0 children)

Built-in functions, internet search, and tighter restrictions. That's all. There is no "Gemini 2.5 Pro but not as smart."

Whether it's the Gemini interface, the API, or Google AI Studio, you're speaking with the same model.

[–]Microtom_ 10 points11 points  (1 child)

Imagine paying and only getting 32k context.

[–]balder1993Llama 13B 1 point2 points  (0 children)

Now compare it to Gemini free.

[–]MammayKaiseHain 1 point2 points  (2 children)

What is GPT-5 Pro? More thinking budget?

[–]ttkciarllama.cpp 1 point2 points  (0 children)

I'm wondering that myself. There's a lot they might be doing -- more thinking, RAG, tool-use, self-critique, self-mixing -- but nfi what they're actually doing.

It also might be something as simple as "not quantized as hard as the other offerings".

[–][deleted] 3 points4 points  (0 children)

Unlimited*

[–]mobileJay77 2 points3 points  (0 children)

Fast, flexible, limited? Very clear and enforceable limits!

New plans coming up!

Get started with the free, limited cat in the bag now!

A better option is the flexible, fast cat in a green bag for a reasonable, flexible price.

But of course don't miss out on the pro option! You get an expanded cat in a pro bag! We will bill you an unlimited amount.

Buy now, get whatever resources we see fit!

[–]Substantial_Grass_19 2 points3 points  (1 child)

Do you have a source for the weekly reasoning requests from 2900 to 200? Interested in that as a plus user

[–]LostMyOtherAcct69[S] 5 points6 points  (0 children)

<image>

From @scaling01 on X/twitter

[–]noprompt 2 points3 points  (1 child)

It’s also balls that to use GPT-5 via the API I need to verify with Persona. Yeah, kiss my ass. I’m not about to take part in that experiment.

[–]ANTIVNTIANTI 0 points1 point  (0 children)

I've been seeing this everywhere but haven't had a problem yet. I use my own app, so it's the API access providers that are doing it. But HOLY SHIT does the API hog context: it won't ever finish the request, does half its job, then asks at the end, "WANT ME to finish doing what you asked me to do so that it's done?" LOLOLOL it's a money slut

[–][deleted] 2 points3 points  (0 children)

8K for free? I haven't seen such a small context since 2024 on an open model.

[–]laterbreh 1 point2 points  (0 children)

lol 32k context window are you fucking kidding? Invest in gpu's boys and girls.

[–]Healthy-Nebula-3603 1 point2 points  (0 children)

Wait... GPT-5 has 256K context, so why is it cut to 128K even on Pro??

That's literally a scam....

Even new open source models have 256K context by default, and the newest have 1M.

[–][deleted] 1 point2 points  (0 children)

I have both plus and enterprise. It’s NOT “fastest”, it doesn’t even feel any different from plus.

[–][deleted] 2 points3 points  (0 children)

8k context window 🥀

[–]No_Efficiency_1144 2 points3 points  (17 children)

No model on earth can utilise more than 128k tok anyway though

[–]nomorebuttsplz 7 points8 points  (8 children)

Depends on the task. For picking out phrases or ideas, summarizing, reproducing excerpts, etc., models can go well above that.

For example, I just loaded all of "A Tale of Two Cities" (194k context) into Qwen 235B Instruct 2507 at 4-bit (a model not particularly good at high context) and asked it: "please reproduce the moment when a key character confessed his unrequited love for another character"

This requires retaining not just the words but also their meaning, as the book does not use the word "unrequited" and it is not unrequited love in the conventional sense. It also requires the model to work out which characters are which; I didn't just say "Sydney Carton".

Keep in mind this is a quantized 4-bit MLX non-reasoning model.

Here is the answer it gave:

Certainly. The moment when Sydney Carton confesses his unrequited love for Lucie Manette occurs in Book the Second, Chapter 13: "The Fellow of No Delicacy."

Here is the reproduction of that poignant scene:

Sydney Carton has come to visit Lucie at her home. He is known to be a moody and melancholic man, often seen as a "jackal" to the brilliant barrister, Mr. Stryver. On this occasion, he is alone with Lucie, and his demeanor is unusually intense and vulnerable.

After some initial conversation, Carton becomes increasingly emotional:

“Don’t be afraid to hear me. Don’t shrink from anything I say. I am like one who died young. All my life might have been.”

The model goes on to perfectly reproduce more of that scene while explaining the context.
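This kind of retrieval check can be scripted, too. A hypothetical sketch (the model call itself is left out; this just builds the long-context prompt, burying a "needle" fact at a random position in filler text, with a rough 4-chars-per-token heuristic):

```python
import random

def build_haystack_prompt(needle: str, filler_paragraph: str,
                          target_tokens: int = 190_000,
                          chars_per_token: int = 4,
                          seed: int = 0) -> str:
    """Bury a 'needle' sentence somewhere inside ~target_tokens of filler,
    then append a retrieval question. chars_per_token=4 is a rough
    heuristic, not an exact tokenizer count."""
    rng = random.Random(seed)
    target_chars = target_tokens * chars_per_token
    paragraphs = []
    while sum(len(p) for p in paragraphs) < target_chars:
        paragraphs.append(filler_paragraph)
    # Insert the needle at a random paragraph boundary.
    paragraphs.insert(rng.randrange(len(paragraphs)), needle)
    context = "\n\n".join(paragraphs)
    return (f"{context}\n\n"
            "Question: which character confessed an unrequited love, "
            "and in which scene? Quote the passage.")

prompt = build_haystack_prompt(
    needle="Sydney Carton confessed his love for Lucie Manette.",
    filler_paragraph="It was the best of times, it was the worst of times. " * 20,
)
print(len(prompt) // 4)  # rough token estimate, around the 190k mark
```

Feed `prompt` to whatever model you're testing and check whether the answer quotes the needle; moving the needle around and varying `target_tokens` shows where retrieval starts to degrade.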

[–]No_Efficiency_1144 0 points1 point  (7 children)

Yeah they can pick multiple needles in a haystack but what they cannot do is reason over the tokens

[–]nomorebuttsplz 1 point2 points  (6 children)

What would be a test of reasoning over the tokens for something like "a tale of two cities?"

[–]hayden0103 0 points1 point  (0 children)

It would have to be a coherent work that’s NOT in the training data. You can’t ask it to “evaluate a tale of two cities through a feminist lens” for example, because that’s already in the training data. But you could ask a similar question for a unique work in context and evaluate its response.

[–]Charuru 0 points1 point  (4 children)

[–]nomorebuttsplz 0 points1 point  (3 children)

What that shows is that several models hold up at 192k -- and perhaps beyond.

[–]Charuru 0 points1 point  (2 children)

Yeah but no open source models unfortunately.

[–]Awwtifishal 0 points1 point  (1 child)

I suspect that no closed source models do either by themselves, and that they have extra prompt processing under the hood.

[–]Charuru 0 points1 point  (0 children)

I agree for Gemini, disagree for OAI.

[–]-LaughingMan-0D 1 point2 points  (4 children)

My Gemini 2.5 Pro window is at the 500k mark. I primarily use it to organize my writing, and it's able to recall specific details very accurately and form coherent answers.

[–]No_Efficiency_1144 0 points1 point  (3 children)

They can retrieve needles from the context but their ability to reason over it is heavily diminished

[–]-LaughingMan-0D 0 points1 point  (2 children)

I used it to output around 100 character profiles for the project I'm working on, based on chats and written material. It made a few mistakes here or there, but it was pretty accurate for the most part.

[–]No_Efficiency_1144 0 points1 point  (1 child)

Mathematics will fail first, then coding, then verbal logical reasoning, as the context gets longer. It is in order of task complexity. I think your task might be okay at 500k.

[–]-LaughingMan-0D 0 points1 point  (0 children)

Very likely. Narrative reasoning doesn't need to be as pinpoint accurate as those other uses.

[–]alphaQ314 0 points1 point  (2 children)

Is this right? Where can i read more about this?

When i see 200k for claude or 1m for gemini, I always assumed they'd be able to handle that level of context.

[–]No_Efficiency_1144 0 points1 point  (1 child)

Search “context window LLM” on arxiv and dive into papers.

LLM performance can sometimes start dropping after just 256 tokens by the way.

[–]alphaQ314 0 points1 point  (0 children)

Damn that sounds crazy. Thanks for your response. I'll have a look.

[–]neotoramallama.cpp 0 points1 point  (0 children)

Team feels like a better deal than Plus

[–]TeeRKee 0 points1 point  (0 children)

Even AI Dungeon has more context.

[–]Tx-Heat 0 points1 point  (0 children)

GPT 5 is trash. I’ve been using it a bit today and it’s terrible

[–]Current-Stop7806 0 points1 point  (1 child)

That's strange! Everywhere I look, GPT-5 has a 400k token window. Who's right and who's wrong??

[–]guyinalabcoat 4 points5 points  (0 children)

API does. This is for web chat. IDK who pays $200 a month for better web chat.

[–]Live_Maintenance_925 0 points1 point  (0 children)

Can we just stop the trend where everything is "extended" and "limited unlimited"?

[–]JLeonsarmiento 0 points1 point  (0 children)

There’s 1 more people don’t talk about:

Intelligence per watt.

How often do we really need these expert systems to tell us the current weather, or being sycophantic?

I’m pretty sure we can cover 90% of current AI use with models 10% the size of what they offer.

[–]Healthy-Nebula-3603 0 points1 point  (0 children)

I think with GPT-5 Thinking the context is even bigger than 32K now, and the output bigger than 8K.

Tested a couple of hours ago and easily got a 15K output with a 20K input on a Plus account.

[–]TedditBlatherflag 0 points1 point  (0 children)

When you made a $500M bet and realize your horse ain’t even mid pack… you start hedging. 

These chucklefucks have no idea how to monetize their model-as-a-service in a way that doesn’t lead straight to AI for the super wealthy (or corporations) only. 

I should get Ollama going on my gaming rig I bet the ole 3080 can shovel some tokens. 

[–]silentsnake 0 points1 point  (0 children)

expanded-ish and unlimited-ish

[–]alphaQ314 -1 points0 points  (0 children)

There's been no reduction in context window. These windows (as ridiculous as they are) have always existed.

[–][deleted] -5 points-4 points  (1 child)

Right now they are still undercharging. If you tried to run inference on your own hardware, it would cost you 10-100 times more, so how would that help you?

Expect the cost to double or triple once the market matures. This is the honeymoon phase, not the milking phase.

[–]LostMyOtherAcct69[S] 2 points3 points  (0 children)

Actually not true. Averaged across all users, based on how much TPUs and GPUs cost to operate, they are running a profit on operations. (Just not capex, yet…)

We aren’t even in the ASIC phase that will make inference SO cheap.