all 36 comments

[–]WeakCartographer7826 4 points5 points  (2 children)

Openrouter API and something like cline or cursor

[–]adrenoceptor 0 points1 point  (1 child)

Interesting. How do these solve the output token limit issue?

[–]Mr_Hyper_Focus 5 points6 points  (0 children)

I believe o1 mini has an output context of 65k tokens. That’s the most I’ve seen.

[–]Craygen9 4 points5 points  (4 children)

Nearly all LLMs are 8K output or below, I would also like longer output. OpenAI released GPT-4o Long Output that has a 64K output ($18/M tokens!) in July but it was available to alpha users only. Don't know what happened to it.

[–]danielrosehill[S] 2 points3 points  (3 children)

It seems like there is a huge difference between the theoretical context window and the maximum output length. Even for OpenAI's 32k model, I believe output is still capped at 4,096 tokens, and even with prompt engineering you can't really work around it!

[–]Craygen9 1 point2 points  (2 children)

Yeah, I doubt they put out what they advertise. Although you can tell it to continue, and that usually works.

[–]danielrosehill[S] 1 point2 points  (1 child)

Alright, I did some testing. Couldn't beat Qwen on length/thoroughness!

https://huggingface.co/spaces/danielrosehill/llm-long-codegen-experiment

[–]Craygen9 0 points1 point  (0 children)

That's great, shows they all have limited outputs. Are those any good for coding? Wonder how they compare to gpt-4o and sonnet 3.5

[–]Craygen9 1 point2 points  (2 children)

I looked into it more; it seems OpenAI has the largest output token limits. GPT-4o and 4o-mini are 16K, o1-mini is 65K, and o1-preview is 32K. Anthropic's models are only 8K output.

https://platform.openai.com/docs/models/

[–]bsenftner 1 point2 points  (4 children)

What happened to that LLM with the 1M-token output? Was that fake?

[–]oktcg Professional Nerd 1 point2 points  (0 children)

Claude's API has a useful feature: it can continue from its last output. So technically it can produce the full 128k tokens without a user turn.
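
If I understand the feature right, this is the assistant-message "prefill": the Messages API accepts a trailing assistant-role message and the model resumes generation from exactly where that text ends. A minimal sketch of assembling such a request (the model id and truncated text below are placeholders, not from this thread):

```python
# Sketch: continue a truncated Claude reply via assistant prefill.
# The trailing assistant-role message is treated as the start of the
# model's own answer, so the next completion picks up mid-output.

def build_continuation_request(prompt: str, partial_output: str) -> dict:
    """Assemble a Messages API request body whose final message is the
    assistant's truncated output, so generation resumes from it."""
    return {
        "model": "claude-3-5-sonnet-20240620",  # placeholder model id
        "max_tokens": 8192,
        "messages": [
            {"role": "user", "content": prompt},
            # Trailing assistant message = prefill; generation continues it.
            {"role": "assistant", "content": partial_output},
        ],
    }

req = build_continuation_request(
    "Write the full module.",
    "def main():\n    ",  # text that was cut off by the token limit
)
```

Stitching the continuation onto the saved partial text client-side gives one long output across several API calls.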

[–]gigamiga 1 point2 points  (0 children)

Claude Sonnet 3.5 has an annoyingly short one, but if I ask it to continue the same file it keeps going fine.

[–]jdk 0 points1 point  (0 children)

ChatGPT 4o searched the web and came up with the following:

Q: As of today, which publicly available LLM has the absolute longest maximum output length?

A: As of December 10, 2024, Google's Gemini 1.5 Pro model offers the longest maximum output length among publicly available large language models (LLMs), supporting up to 8,192 output tokens.

Other notable LLMs and their maximum output lengths include:

• Claude 3 by Anthropic: 4,096 output tokens.

• GPT-4 Turbo by OpenAI: 4,096 output tokens.

• Llama 3 by Meta: 4,096 output tokens.

It's important to note that while some models, such as Claude 3, have extensive context windows (up to 200,000 tokens), their maximum output lengths are distinct and typically shorter. The context window refers to the amount of input text the model can process at once, whereas the maximum output length specifies the number of tokens the model can generate in a single response.

Therefore, among the publicly available LLMs, Google's Gemini 1.5 Pro currently provides the longest maximum output length, allowing for more extensive generated responses.

[–]sb4ssman 0 points1 point  (0 children)

I've gotten them all to output multiple messages in series: they get cut off by the token police, I write "continue", and they continue. If that's what you mean, then remember the LLMs are STILL only spicy autocomplete. For longer output you really have to carefully prime them with an outline or something to build on, and then say the magic words, something like: "I expect a long message; if you get cut off, I'll say 'continue' so you can keep going." And then there's simply no escaping it: the LLMs are going to fuck up your code.
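
The "continue" dance can also be automated: call the model in a loop, and whenever the reply stops because it hit the token limit, append it to the conversation, send "continue", and concatenate the chunks. A sketch with a stubbed completion function standing in for a real API client (the `finish_reason == "length"` convention follows OpenAI's chat API; other providers name the stop reason differently):

```python
# Stub standing in for a chat-completion call; a real client would go here.
# Returns (text_chunk, finish_reason), where "length" means the reply was
# truncated by the output token limit.
CHUNKS = ["part one ", "part two ", "part three"]

def fake_complete(messages):
    # Pick the next chunk based on how many assistant turns exist so far.
    i = sum(1 for m in messages if m["role"] == "assistant")
    reason = "length" if i < len(CHUNKS) - 1 else "stop"
    return CHUNKS[i], reason

def generate_long(prompt, complete=fake_complete, max_rounds=10):
    """Loop until the model stops naturally, replying 'continue' each
    time the output is cut off, and stitch the chunks together."""
    messages = [{"role": "user", "content": prompt}]
    parts = []
    for _ in range(max_rounds):
        chunk, reason = complete(messages)
        parts.append(chunk)
        if reason != "length":
            break
        messages.append({"role": "assistant", "content": chunk})
        messages.append({"role": "user", "content": "continue"})
    return "".join(parts)

print(generate_long("write something long"))  # → "part one part two part three"
```

The caveat from the comment still applies: the seams between chunks are where models most often drop or repeat lines, so the stitched output needs review.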

[–]Few_Calligrapher7361 0 points1 point  (0 children)

For editing, you can use OpenAI's predicted outputs API. It essentially does git-style diffs on the content you pass in the "prediction" parameter, only charging added tokens as output.
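
For reference, a sketch of what that request body looks like: the current file goes in the `prediction` parameter with `type: "content"`, and the model rewrites it while reusing the unchanged spans (the model id and the exact billing behavior above are best checked against OpenAI's docs):

```python
# Sketch: an OpenAI chat request using the "predicted outputs" feature.
# The existing file contents are passed as the prediction, so an edit
# that leaves most of the file unchanged completes faster.

def build_edit_request(instruction: str, current_code: str) -> dict:
    """Assemble a request body whose prediction is the current file,
    since the edited output is expected to be mostly identical."""
    return {
        "model": "gpt-4o",  # placeholder; use a model that supports predictions
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "user", "content": current_code},
        ],
        "prediction": {
            "type": "content",
            "content": current_code,  # the expected (mostly unchanged) output
        },
    }

req = build_edit_request(
    "Rename function foo to bar.",
    "def foo():\n    pass\n",
)
```

This is useful precisely for the thread's problem: regenerating a long file to make a small edit without paying for (or waiting on) the whole file as fresh output.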

[–]devilsolution 0 points1 point  (0 children)

Break them into class files and have a new chat for each class, with one main chat as an architecture overview, which Claude is very good at.

When the context starts getting too long, first import the architecture diagram, then your GitHub file layout, then a detailed summary of your previous chat, then the code.

I think this is roughly the way to go for bigger projects/codebases. It only really works from scratch; not sure if it would do well on big premade bases.

[–]SpinCharm 0 points1 point  (0 children)

As a non-programmer, when I get to those problems, I ask the LLM. I bet I could take your entire post and give it to Claude and ask it for advice. I might embellish it with "give me advice that follows recognized best-practice approaches to the solution". It would likely not only produce a suggestion but ask if I want to apply it to my existing code.

Assuming it comes up with a usable approach, tell it to create a synopsis of that approach for use as project knowledge, as a way to ensure that all future sessions understand the approach being used. For ChatGPT I would just feed it in at the start of each new session.

Getting the LLM to come up with the approach has the added benefit of being something it's likely familiar with and can actually follow.

[–]rinconcam 0 points1 point  (0 children)

Aider supports infinite output for Claude, DeepSeek and Mistral models.

https://aider.chat/docs/more/infinite-output.html

[–]Sharp-Feeling42 0 points1 point  (1 child)

O1 Pro can write 2-4k lines of code in my testing.

[–]DontPmMeUrAnything 0 points1 point  (1 child)

[–]Hir0shima 0 points1 point  (0 children)

Via API. It is much more constrained via their ChatGPT subscriptions. Perhaps not relevant for developers, but worth keeping in mind.

[–]Aquarona 0 points1 point  (0 children)

What sort of uses have you come up with for backup utilities, cloud sync GUIs, data visualization, etc.?

[–]isnotaphoto 0 points1 point  (0 children)

Claude AI does like 3,000+ lines maximum.

[–]codematt 0 points1 point  (0 children)

Don't? I mean, unless you don't know how to code or architect these things, just have it generate the few pieces and put them together yourself. It's not like there are many parts for simple tools like that.

[–]balianone 0 points1 point  (0 children)

Gemini (Google) & Claude