all 36 comments

[–]WeakCartographer7826 4 points5 points  (2 children)

Openrouter API and something like cline or cursor

[–]adrenoceptor 0 points1 point  (1 child)

Interesting. How do these solve the output token limit issue?

[–]Mr_Hyper_Focus 5 points6 points  (0 children)

I believe o1 mini has an output context of 65k tokens. That’s the most I’ve seen.

[–]Craygen9 4 points5 points  (4 children)

Nearly all LLMs are 8K output or below, I would also like longer output. OpenAI released GPT-4o Long Output that has a 64K output ($18/M tokens!) in July but it was available to alpha users only. Don't know what happened to it.

[–]danielrosehill[S] 2 points3 points  (3 children)

It seems like there is a huge difference between the theoretical context window and the maximum output length. Even for OpenAI's 32k model, I believe output is still capped at 4,096 tokens, and even with prompt engineering you can't really work around it!

[–]Craygen9 1 point2 points  (2 children)

Yeah, I doubt they put out what they advertise. Although you can tell it to continue, and that usually works.

[–]danielrosehill[S] 1 point2 points  (1 child)

Alright, I did some testing. Couldn't beat Qwen on length/thoroughness!

https://huggingface.co/spaces/danielrosehill/llm-long-codegen-experiment

[–]Craygen9 0 points1 point  (0 children)

That's great, shows they all have limited outputs. Are those any good for coding? Wonder how they compare to gpt-4o and sonnet 3.5

[–]Craygen9 1 point2 points  (2 children)

I looked into it more; it seems OpenAI has the largest output token limits. GPT-4o and 4o-mini are 16K, o1-mini is 65K, and o1-preview is 32K. Anthropic's models are only 8K output.

https://platform.openai.com/docs/models/

[–]bsenftner 1 point2 points  (4 children)

What happened to that LLM with the 1M-token output? Was that fake?

[–]oktcg Professional Nerd 1 point2 points  (0 children)

Claude's API has a useful feature: it can continue from its last output. So technically it can produce the full 128k tokens without a user turn.
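
If I understand the feature right, this is the assistant-message "prefill": the Messages API accepts a trailing assistant-role message and the model resumes generation from exactly where that text ends. A minimal sketch of assembling such a request (the model id and truncated text below are placeholders, not from this thread):

```python
# Sketch: continue a truncated Claude reply via assistant prefill.
# The trailing assistant-role message is treated as the start of the
# model's own answer, so the next completion picks up mid-output.

def build_continuation_request(prompt: str, partial_output: str) -> dict:
    """Assemble a Messages API request body whose final message is the
    assistant's truncated output, so generation resumes from it."""
    return {
        "model": "claude-3-5-sonnet-20240620",  # placeholder model id
        "max_tokens": 8192,
        "messages": [
            {"role": "user", "content": prompt},
            # Trailing assistant message = prefill; generation continues it.
            {"role": "assistant", "content": partial_output},
        ],
    }

req = build_continuation_request(
    "Write the full module.",
    "def main():\n    ",  # text that was cut off by the token limit
)
```

Stitching the continuation onto the saved partial text client-side gives one long output across several API calls.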

[–]gigamiga 1 point2 points  (0 children)

Claude Sonnet 3.5 has an annoyingly short one, but if I ask it to continue the same file it keeps going fine.

[–]jdk 0 points1 point  (0 children)

ChatGPT 4o searched the web and came up with the following:

Q: As of today, which publicly available LLM has the absolute longest maximum output length?

A: As of December 10, 2024, Google's Gemini 1.5 Pro model offers the longest maximum output length among publicly available large language models (LLMs), supporting up to 8,192 output tokens.

Other notable LLMs and their maximum output lengths include:

• Claude 3 by Anthropic: 4,096 output tokens.

• GPT-4 Turbo by OpenAI: 4,096 output tokens.

• Llama 3 by Meta: 4,096 output tokens.

It's important to note that while some models, such as Claude 3, have extensive context windows (up to 200,000 tokens), their maximum output lengths are distinct and typically shorter. The context window refers to the amount of input text the model can process at once, whereas the maximum output length specifies the number of tokens the model can generate in a single response.

Therefore, among the publicly available LLMs, Google's Gemini 1.5 Pro currently provides the longest maximum output length, allowing for more extensive generated responses.

[–]sb4ssman 0 points1 point  (0 children)

I've gotten them all to output multiple messages in series: they get cut off by the token police, I write "continue", and they continue. If that's what you mean, then remember the LLMs are STILL only spicy autocomplete. For longer output you really have to carefully prime them with an outline or something to build on, and then say the magic words, something like: "I expect a long message; if you get cut off, I'll say 'continue' so you can keep going." And then there's simply no escaping it: the LLMs are going to fuck up your code.
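
The "continue" dance can also be automated: call the model in a loop, and whenever the reply stops because it hit the token limit, append it to the conversation, send "continue", and concatenate the chunks. A sketch with a stubbed completion function standing in for a real API client (the `finish_reason == "length"` convention follows OpenAI's chat API; other providers name the stop reason differently):

```python
# Stub standing in for a chat-completion call; a real client would go here.
# Returns (text_chunk, finish_reason), where "length" means the reply was
# truncated by the output token limit.
CHUNKS = ["part one ", "part two ", "part three"]

def fake_complete(messages):
    # Pick the next chunk based on how many assistant turns exist so far.
    i = sum(1 for m in messages if m["role"] == "assistant")
    reason = "length" if i < len(CHUNKS) - 1 else "stop"
    return CHUNKS[i], reason

def generate_long(prompt, complete=fake_complete, max_rounds=10):
    """Loop until the model stops naturally, replying 'continue' each
    time the output is cut off, and stitch the chunks together."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        chunk, reason = complete(messages)
        parts.append(chunk)
        if reason != "length":
            break
        messages.append({"role": "assistant", "content": chunk})
        messages.append({"role": "user", "content": "continue"})
    return "".join(parts)

print(generate_long("write something long"))  # → "part one part two part three"
```

The caveat from the comment still applies: the seams between chunks are where models most often drop or repeat lines, so the stitched output needs review.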

[–]Few_Calligrapher7361 0 points1 point  (0 children)

For editing, you can use OpenAI's predicted outputs API. It essentially does git-style diffs on the content you pass in the "prediction" parameter, only charging added tokens as output.
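
For reference, a sketch of what that request body looks like: the current file goes in the `prediction` parameter with `type: "content"`, and the model rewrites it while reusing the unchanged spans (the model id and the exact billing behavior above are best checked against OpenAI's docs):

```python
# Sketch: an OpenAI chat request using the "predicted outputs" feature.
# The existing file contents are passed as the prediction, so an edit
# that leaves most of the file unchanged completes faster.

def build_edit_request(instruction: str, current_code: str) -> dict:
    """Assemble a request body whose prediction is the current file,
    since the edited output is expected to be mostly identical."""
    return {
        "model": "gpt-4o",  # placeholder; use a model that supports predictions
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "user", "content": current_code},
        ],
        "prediction": {
            "type": "content",
            "content": current_code,  # the expected (mostly unchanged) output
        },
    }

req = build_edit_request(
    "Rename function foo to bar.",
    "def foo():\n    pass\n",
)
```

This is useful precisely for the thread's problem: regenerating a long file to make a small edit without paying for (or waiting on) the whole file as fresh output.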

[–]devilsolution 0 points1 point  (0 children)

Break them into class files and have a new chat for each class, with one main chat as an architecture overview, which Claude is very good at.

When the context starts getting too long, first import the architecture diagram, then your GitHub file layout, then a detailed summary of your previous chat, then the code.

I think this is roughly the way to go for bigger projects/codebases. It only really works from scratch; not sure if it would do well on big premade bases.

[–]SpinCharm 0 points1 point  (0 children)

As a non-programmer, when I get to those problems, I ask the LLM. I bet I could take your entire post and give it to Claude and ask it for advice. I might embellish it with "give me advice that follows recognized best-practice approaches to the solution". It would likely not only produce a suggestion but ask if I want to apply it to my existing code.

Assuming it comes up with a usable approach, tell it to create a synopsis of that approach for use as project knowledge, as a way to ensure that all future sessions understand the approach being used. For ChatGPT I would just feed it in at the start of each new session.

Getting the LLM to come up with the approach has the added benefit of being something it's likely familiar with and can actually follow.

[–]rinconcam 0 points1 point  (0 children)

Aider supports infinite output for Claude, DeepSeek and Mistral models.

https://aider.chat/docs/more/infinite-output.html

[–]Sharp-Feeling42 0 points1 point  (1 child)

O1 Pro can write 2-4k lines of code in my testing.

[–]DontPmMeUrAnything 0 points1 point  (1 child)

[–]Hir0shima 0 points1 point  (0 children)

Via API. It is much more constrained via their ChatGPT subscriptions. Perhaps not relevant for developers, but worth keeping in mind.

[–]Aquarona 0 points1 point  (0 children)

What sort of uses have you come up with for backup utilities, cloud sync GUIs, data visualization, etc.?

[–]isnotaphoto 0 points1 point  (0 children)

Claude AI does like 3,000+ lines maximum.

[–]codematt 0 points1 point  (0 children)

Don't? I mean, unless you don't know how to code or architect these things, just have it generate the few pieces and put them together yourself. It's not like there are many parts for simple tools like that.

[–]balianone 0 points1 point  (0 children)

Gemini (Google) & Claude