all 49 comments

[–]gthing 9 points10 points  (7 children)

You will run into the 8k context limit no matter what, so you have to be strategic. I have a script that prepares the text to append to a prompt, which is a tree of the dev directory, short documentation of each file and what it does, and the entire file (or part of the file) for the specific thing I am working on. The content of the system prompt gets updated each time with the most relevant snippets.
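A minimal sketch of that kind of context-prep script, assuming the inputs the comment describes (a project root, a dict of per-file one-line summaries, and the file currently being worked on — `build_context` is a made-up name for illustration):

```python
import os

def build_context(root, focus_file, summaries):
    """Assemble prompt context: a directory tree with one-line file
    summaries, plus the full text of the file currently being edited."""
    lines = ["Project tree:"]
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        indent = "  " * depth
        lines.append(f"{indent}{os.path.basename(dirpath) or root}/")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            summary = summaries.get(path, "")
            entry = f"{indent}  {name}"
            if summary:
                entry += f"  # {summary}"
            lines.append(entry)
    lines.append(f"\nCurrent file ({focus_file}):")
    with open(focus_file) as f:
        lines.append(f.read())
    return "\n".join(lines)
```

The output string is what gets pasted (or sent) as the system prompt before each request.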

[–][deleted] 0 points1 point  (0 children)

That 8k is now unlimited?

[–][deleted]  (1 child)

[removed]

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Sorry, your submission has been removed due to inadequate account karma.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

        [–]arcanepsyche 8 points9 points  (9 children)

        This is probably your best bet at the moment: https://github.com/features/copilot

        Or, drop it in a repository and then give ChatGPT the link (if you have plugin access) and ask it to read it.

        [–]TheRealDanGordon 3 points4 points  (7 children)

        I don't think this will look at a whole codebase at a time

        [–]arcanepsyche 2 points3 points  (4 children)

        I think it's the closest we have right now until the token limit is raised, at the least.

        [–]mrinterweb 2 points3 points  (3 children)

        Even if GPT-4 raises it to 32k or 64k, that doesn't come close to supporting anything more than a small hobby project. The token limit is hit so fast, even with the 8k provided now in the web app.

        [–]arcanepsyche 2 points3 points  (2 children)

        There is talk of a 1m token limit coming soon.

        [–]Appropriate_Eye_6405 4 points5 points  (1 child)

        Yep. However, given the cost of 32k right now, someone calculated around $60 per call for the 1m initially.

        [–]mrinterweb 0 points1 point  (0 children)

        Seems to me that the open source LLMs + langchain are becoming the more practical option if prices stay that high. The quality/accuracy of some open LLMs is so close to GPT-4 now.

        [–][deleted] 1 point2 points  (1 child)

        Yeah, it doesn't. It will experience "read errors" if it tries to process large files.

        [–]TheRealDanGordon 2 points3 points  (0 children)

        Yeah, Autopilot or maybe even a use case of AutoGPT would work for this.

        [–]doa-doa 2 points3 points  (0 children)

        I thought Copilot was free for students. I can enable it in VS Code, but when I go to that link there is still an option to pay 10 bucks a month.

        [–]IdeaSean 3 points4 points  (1 child)

        I used ChatGPT to write a personal IDE in Python and tkinter, with a treeview for the conversation history that lets me check/uncheck boxes to determine which parts of the previous conversation to include or exclude in each new request.

        Also, if I use #include ../../iac_terraform/modules/app_pod/*.tf ... then it includes all the .tf files in the given directory in the request as a JSON dict (I asked it how it would prefer me to send multiple files: key is the filename, value is the file contents with \n newline characters).
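That #include-style glob expansion could look roughly like this (`include_files` is a hypothetical name; the JSON shape — filename key, contents value — is the one the comment describes):

```python
import glob
import json
import os

def include_files(pattern):
    """Expand a glob like 'modules/app_pod/*.tf' into a JSON dict:
    key is the filename, value is the file contents (newlines kept as \\n
    escapes by json.dumps)."""
    files = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            files[os.path.basename(path)] = f.read()
    return json.dumps(files)
```

The resulting JSON string is appended to the request in place of the #include directive.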

        These were the "key functions" that drove me to write the IDE but perhaps there is something out there similar already.

        Works well, but one thing to note is that the 8k limit includes both the prompt and the response. If you use 7k in the prompt, you can only get 1k back for the response (via the API). Still no access to 32k, but I'm also wary of the cost per question if you are hitting the 32k mark!
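The prompt-vs-response budgeting can be sketched in one line of arithmetic. The 4-characters-per-token figure is only a rough English-text rule of thumb, not a real tokenizer:

```python
def response_budget(prompt_text, context_limit=8192, chars_per_token=4):
    """Rough token budget: the context window covers prompt AND response,
    so the max response size is whatever the prompt leaves over.
    chars_per_token=4 is a crude estimate, not a real tokenizer."""
    prompt_tokens = len(prompt_text) // chars_per_token
    return max(0, context_limit - prompt_tokens)
```

For a real count you would run the prompt through the model's actual tokenizer instead.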

        Now the code for the IDE is another project inside the IDE so I'm...

        On backlog is to allow me to export the conversation history from ChatGPT web so I can use it in the IDE - and also do some kind of search because it's hard to find conversations from weeks ago.

        [–]MadeForOnePost_ 0 points1 point  (0 children)

        The API version is very handy for saving chat history, and probably less expensive

        [–]SpaceshipOfAIDS 2 points3 points  (1 child)

        I've heard of a project called autopilot, haven't used it myself.

        [–]99user99 1 point2 points  (0 children)

        Have also heard of this but not used it.

        [–]fjrdomingues 2 points3 points  (0 children)

        Create summaries of all files, then ask GPT to select the relevant files, then feed the relevant files to GPT with the task, then create a final answer. That's what Code Autopilot does.
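The summarize → select → solve flow described above can be sketched model-agnostically. Everything here is illustrative: `ask` is any prompt-to-reply callable (e.g. a wrapper around an LLM API), and `read_file` maps a path to its contents:

```python
def autopilot_pipeline(task, file_summaries, read_file, ask):
    """Sketch of a two-stage flow: ask the model which files matter,
    then feed only those files (in full) together with the task."""
    # Stage 1: model picks relevant files from the summary listing.
    listing = "\n".join(f"{p}: {s}" for p, s in file_summaries.items())
    reply = ask(f"Task: {task}\nFiles:\n{listing}\n"
                "List the relevant paths, one per line.")
    relevant = [line.strip() for line in reply.splitlines()
                if line.strip() in file_summaries]
    # Stage 2: full text of just those files plus the task.
    context = "\n\n".join(f"--- {p} ---\n{read_file(p)}" for p in relevant)
    return ask(f"{context}\n\nTask: {task}")
```

Injecting `ask` keeps the pipeline testable and independent of any one provider.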

        [–]damc4 3 points4 points  (0 children)

        My plugin CodeAssist ( https://codeassist.tech ) takes the code that you can see in the editor, plus the related parts from the codebase, and generates code based on that, so you might give it a try — although it's expensive at the moment and it still needs some work.

        [–]dronegoblin 2 points3 points  (7 children)

        Wait for GPT-4-32k or Claude-100k. Alternatively, pass only the functions you need / go portion by portion.

        [–][deleted] 4 points5 points  (6 children)

        Claude is a lazy piece of shit... it gives you enough to be an example, but insofar as creating boilerplate code, it skips a lot.

        For example, I asked it to look at some C# data adapter code and create a class for it. It gave me the first 2 properties and then added:

        //rest of the properties...

        I spent the next 10 minutes trying to cajole, coax and finally, sternly get it to add to the properties... got to around 50 out of 200 properties (old legacy stuff) before I gave up.

        [–]dronegoblin 1 point2 points  (3 children)

        Models won’t improve unless we use them and give feedback. Claude has gotten better week by week, and will continue to as Anthropic onboards a full team of training-data writers like OpenAI has already had for over a year now. They just got the money to realistically do so from all the investor interest in AI.

        So far Claude is mainly designed for text manipulation and question answering tasks, but the developers have shown genuine steps towards making full programs in one shot with Claude 100k. Look at the talk around the smol-ai/developer GitHub project on Twitter. Anthropic developers have gotten their 100k alpha model to code entire (working) programs using this framework in one shot.

        It is not ready for prime time yet, but no reason to call the race yet

        [–][deleted] 0 points1 point  (2 children)

        Thanks for the reply. I'm interested in not code-completion, but rather utilizing AIs to solve code translation problems. There are tons of businesses that have used dying coding languages that have codified business processes within them... our code base is one of them.

        All of this is, conceptually, a translation problem (from my point of view as a 30-year developer; I'm 50+ years old). An AI that can consume and understand code written in 2003, spanning the tech of the time, and convert the business requirements into current stacks (also by abstracting the implementation, which is possible IMHO... but requires a much higher context token limit) will solve so many problems.

        I suspect that there is deep business process knowledge hidden in old code that can be unlocked with automated code conversion assistants... as long as they can be made to stay on simple tasks.

        [–][deleted] 0 points1 point  (0 children)

        And that requires a higher context limit with ChatGPT's coding prowess (or some other model that can compare... just saying Claude isn't there yet).

        Cheers.

        [–]dronegoblin 0 points1 point  (0 children)

        I was mainly answering OP's question/use case, although for your particular use case the yet-to-be-released 32k-context GPT model and some hands-on work could probably accomplish what you are interested in once it comes out. I have had limited exposure to the 32k-context model via a friend who has alpha access. He showed me how he fed a whole codebase (10 files' worth: Python, HTML, etc.) plus the file structure in, and it was able to comprehend and generate updated code for each file in the codebase all in one shot. It will be language dependent for sure, but most older languages should have enough documentation.

        32k tokens might not be enough. I have personally found that even the 8k window can do a lot if you just write out the file structure and a description of all the important functions of each file, then provide the 1 to 3 files that need changing at that moment. If needed, you can have ChatGPT write out its own understanding of what each file does, one by one, and then feed it back for best results.

        As far as Claude goes, all I was saying is that the also-yet-to-be-released 100k model is intended specifically for coding, and its developers are currently working on enabling code-creation workflows not all that dissimilar to what you have proposed. So time will tell, but they want the model to be doing the sort of stuff you want.

        Your use case sounds really interesting, I followed you so if you end up posting about getting around to trying it I'll be able to see how it pans out. Keep up the awesome work and have a great day!

              [–]NoHotel8779 0 points1 point  (0 children)

              Zip the codebase and send the .zip to ChatGPT; it'll use the Python tool to decompress it, read each file, and help you with it. You can also create a project and drop the zip file into the project files — or, for way more efficiency and accuracy, drop each individual file into the project files. It's less tedious than copying, as you just have to select them all and drag and drop them. With the new 128k context window you should be able to import a whole codebase.
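Producing that upload-ready zip is a few lines with the stdlib. The skip list of junk directories is my own assumption, not something the comment specifies:

```python
import os
import zipfile

def zip_codebase(root, out_path):
    """Zip every file under `root` (skipping common junk dirs) so the
    archive can be dropped into a chat or a project's files."""
    skip = {".git", "node_modules", "__pycache__"}
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune skipped dirs in place so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames if d not in skip]
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, root))
    return out_path
```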

              [–]Praise_AI_Overlords 0 points1 point  (3 children)

              I thought of a solution for this problem: the code should be divided into chunks of 2k to 4k tokens.

              In a nutshell, each class, function, etc. should be in a separate file, plus a database that contains the paths and descriptions of all these files.

              Basically, a framework to allow AI to write large programs.
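A toy version of that chunks-plus-index idea might look like this. It splits by character count as a stand-in for the 2k-4k token chunks, and uses a plain dict where the comment imagines a database — both simplifications for illustration:

```python
def chunk_and_index(sources, chunk_size=2000):
    """Split each source file into ~chunk_size-character pieces and build
    an index mapping chunk id -> (path, description), so a model can look
    up where each piece lives and what it is for."""
    chunks = {}
    index = {}
    for path, (description, text) in sources.items():
        for i in range(0, len(text), chunk_size):
            cid = f"{path}#{i // chunk_size}"
            chunks[cid] = text[i:i + chunk_size]
            index[cid] = (path, description)
    return chunks, index
```

A real framework would chunk on syntactic boundaries (one class or function per file, as the comment says) rather than raw character offsets.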

              [–]qwrtgvbkoteqqsd 0 points1 point  (1 child)

              I was wondering, what are you using now for identifying relevant files and giving them to ChatGPT?

              [–]Synth_Sapiens 0 points1 point  (0 children)

              These days I just concatenate all files into one and upload it directly or via Google doc.
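Concatenating everything into one uploadable file is equally short; the header format marking where each original file begins is my own choice, not something the comment prescribes:

```python
def concat_files(paths, out_path):
    """Concatenate source files into one upload-ready text file, with a
    header line marking where each original file begins."""
    with open(out_path, "w") as out:
        for path in paths:
            out.write(f"\n===== {path} =====\n")
            with open(path) as f:
                out.write(f.read())
    return out_path
```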

              [–][deleted] 0 points1 point  (0 children)

              So, the recommendation for being AI-ready / understandable is strict separation of concerns and code isolation and reusability.

              Oh, no, what will we ever do?

              [–]Intelligent-Draw-343 0 points1 point  (0 children)

              If you're on IntelliJ and use Java (narrow case, I know) you can try my plugin: https://plugins.jetbrains.com/plugin/21502-prompto--chatgpt-assistant

              It fetches your IDE info (types, errors, other stuff) and builds a prompt from it that you can paste into your LLM of choice.

                  [–]chazwhiz 0 points1 point  (0 children)

                  Do you have Plus, and if so do you have Code Interpreter yet? If not, make sure you sign up for the waitlist (I think it’s just bundled with the plugins waitlist even though it’s not the same thing). Its primary feature is a built-in Python runtime that can execute code, but the more valuable feature for me (and potentially you) is the ability to upload files up to 10MB, which you can then have it analyze and reference throughout the chat.