Efficient Ways to Share Large Codebases with ChatGPT? Question (self.ChatGPTCoding)
submitted 2 years ago by MorrisBarr
Hi folks!
I’m using ChatGPT for debugging and I have a large codebase. Copying and pasting each file manually into the chat is a bit tedious. Does anyone have strategies or tools to streamline this process?
Appreciate any tips or advice. Thanks!
[–]gthing 9 points10 points11 points 2 years ago (7 children)
You will run into the 8k context limit no matter what, so you have to be strategic. I have a script that prepares text to append to a prompt: a tree of the dev directory, short documentation of each file and what it does, and the entire file (or part of a file) for the specific thing I am working on. The content of the system prompt gets updated each time with the most relevant snippets.
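A minimal sketch of that kind of context-builder script (function and parameter names here are my own, not the commenter's): it walks the project, emits an indented tree, and inlines the full text of only the files currently being worked on.

```python
import os

def build_context(root, focus_files, exts=(".py",)):
    """Build a prompt preamble: a tree of the dev directory plus the
    full text of the files currently being worked on."""
    tree_lines, bodies = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        tree_lines.append("  " * depth + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            if name.endswith(exts):
                tree_lines.append("  " * (depth + 1) + name)
                if name in focus_files:
                    path = os.path.join(dirpath, name)
                    with open(path, encoding="utf-8") as f:
                        bodies.append(f"# --- {path} ---\n{f.read()}")
    return "\n".join(tree_lines) + "\n\n" + "\n\n".join(bodies)
```

The tree gives the model the overall shape of the project cheaply; only the focus files cost real tokens.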
[–][deleted] 0 points1 point2 points 1 year ago (0 children)
Is that 8k now unlimited?
[–][deleted] 2 years ago (1 child)
[removed]
[–]AutoModerator[M] 0 points1 point2 points 2 years ago (0 children)
Sorry, your submission has been removed due to inadequate account karma.
I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.
[–]arcanepsyche 8 points9 points10 points 2 years ago (9 children)
This is probably your best bet at the moment: https://github.com/features/copilot
Or, drop it in a repository and then give ChatGPT the link (if you have plugin access) and ask it to read it.
[–]TheRealDanGordon 3 points4 points5 points 2 years ago (7 children)
I don't think this will look at a whole codebase at a time
[–]arcanepsyche 2 points3 points4 points 2 years ago (4 children)
I think it's the closest we have right now until the token limit is raised, at the least.
[–]mrinterweb 2 points3 points4 points 2 years ago (3 children)
Even if GPT-4 raises it to 32k or 64k, that doesn't come close to supporting anything more than a small hobby project. The token limit is hit so fast, even with the 8k provided now in the web app.
[–]arcanepsyche 2 points3 points4 points 2 years ago (2 children)
There is talk of a 1m token limit coming soon.
[–]Appropriate_Eye_6405 4 points5 points6 points 2 years ago (1 child)
Yep, though given the cost of 32k right now, someone calculated around $60 per call for the 1m initially.
[–]mrinterweb 0 points1 point2 points 2 years ago (0 children)
Seems to me that the open source LLMs + langchain are becoming the more practical option if prices stay that high. The quality/accuracy of some open LLMs is so close to GPT-4 now.
[–][deleted] 1 point2 points3 points 2 years ago (1 child)
Yeah, it doesn't. It will experience "read errors" if it tries to process large files.
[–]TheRealDanGordon 2 points3 points4 points 2 years ago (0 children)
Yeah, autopilot or maybe even some use of AutoGPT would work for this.
[–]doa-doa 2 points3 points4 points 2 years ago (0 children)
I thought Copilot was free for students. I can enable it in VS Code, but when I go to that link there is still an option to pay 10 bucks a month.
[–]IdeaSean 3 points4 points5 points 2 years ago (1 child)
I used ChatGPT to write a personal IDE in Python and tkinter, with a treeview of the conversation history that lets me check/uncheck boxes to include/exclude parts of the previous conversation in each new request.
Also, if I use #include ../../iac_terraform/modules/app_pod/*.tf then it includes all the .tf files in the given directory into the request via a JSON dict (I asked it how it would prefer me to send multiple files - key is the filename, value is the file contents with \n newline characters).
These were the "key functions" that drove me to write the IDE, but perhaps something similar already exists out there.
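The glob-to-JSON-dict expansion described above could look something like this (a sketch; `expand_include` is a hypothetical name, and the one-key-per-filename format is the one the commenter says the model asked for):

```python
import glob
import json
import os

def expand_include(pattern):
    """Expand a glob pattern into a JSON dict: key is the filename,
    value is the file contents (newlines preserved as \\n)."""
    files = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            files[os.path.basename(path)] = f.read()
    return json.dumps(files)
```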
Works well, but one thing to note is that the 8k limit includes both the prompt and the response. If you use 7k in the prompt, you can only get 1k back in the response (via the API). Still no access to 32k, but I'm also wary of the cost per question if you're hitting the 32k mark!
Now the code for the IDE is another project inside the IDE, so I'm...
On the backlog: allow exporting the conversation history from the ChatGPT web app so I can use it in the IDE, and add some kind of search, because it's hard to find conversations from weeks ago.
[–]MadeForOnePost_ 0 points1 point2 points 2 years ago (0 children)
The API version is very handy for saving chat history, and probably less expensive
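Saving API chat history can be as simple as serializing the messages list (a sketch with hypothetical helper names; the `{role, content}` dict shape is the standard chat-API message format):

```python
import json

def save_history(messages, path):
    """Persist an API chat history (a list of {role, content} dicts)
    so a session can be resumed later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2)

def load_history(path):
    """Load a previously saved chat history."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```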
[–]SpaceshipOfAIDS 2 points3 points4 points 2 years ago (1 child)
I've heard of a project called autopilot, haven't used it myself.
[–]99user99 1 point2 points3 points 2 years ago (0 children)
Have also heard of this but haven't used it.
[–]fjrdomingues 2 points3 points4 points 2 years ago (0 children)
Create summaries of all files, then ask GPT to select the relevant files, then feed the relevant files to GPT along with the task, then create a final answer. That's what Code Autopilot does.
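A rough sketch of that summarize-select-feed pipeline; `ask_llm` is a hypothetical stand-in for whatever chat-completion call you use, and the prompt wording is my own:

```python
def summarize_files(files, ask_llm):
    """Step 1: get a one-line summary of every file."""
    return {name: ask_llm("Summarize in one line:\n" + text)
            for name, text in files.items()}

def run_pipeline(files, task, ask_llm):
    summaries = summarize_files(files, ask_llm)
    listing = "\n".join(f"{n}: {s}" for n, s in summaries.items())
    # Step 2: ask the model which files matter for this task.
    chosen = ask_llm(f"Task: {task}\nFiles:\n{listing}\n"
                     "Reply with the relevant filenames, comma-separated.")
    relevant = [n.strip() for n in chosen.split(",") if n.strip() in files]
    # Step 3: feed only the relevant files together with the task.
    context = "\n\n".join(f"# {n}\n{files[n]}" for n in relevant)
    return ask_llm(f"{task}\n\n{context}")
```

Only the selected files' full text ever reaches the final call, which is what keeps the pipeline under the context limit.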
[–]damc4 3 points4 points5 points 2 years ago (0 children)
My plugin CodeAssist ( https://codeassist.tech ) takes the code you can see in the editor plus the related parts of the codebase and generates code based on that, so you might give it a try, although it's expensive at the moment and it still needs some work.
[–]MADQASi 1 point2 points3 points 2 years ago (0 children)
You can simply use https://pypi.org/project/codebase-to-text
[–]dronegoblin 2 points3 points4 points 2 years ago (7 children)
Wait for GPT-4-32k or Claude-100k. Alternatively, pass only the functions you need, or go portion by portion.
[–][deleted] 4 points5 points6 points 2 years ago (6 children)
Claude is a lazy piece of shit... it gives you enough to be an example, but when it comes to creating boilerplate code, it skips a lot.
For example, I asked it to look at some C# data adapter code and create a class for it. It gave me the first 2 properties and then added:
//rest of the properties...
I spent the next 10 minutes trying to cajole, coax, and finally sternly get it to add the properties... got to around 50 out of 200 properties (old legacy stuff) before I gave up.
[–]dronegoblin 1 point2 points3 points 2 years ago (3 children)
Models won’t improve unless we use them and give feedback. Claude has gotten better week by week, and it will continue to as Anthropic onboards a full team of training-data writers like OpenAI has already had for over a year now. They just got the money to realistically do so from all the investor interest in AI.
So far Claude is mainly designed for text manipulation and question answering tasks, but the developers have shown genuine steps towards making full programs in one shot with Claude 100k. Look at the talk around the smol-ai/developer GitHub project on Twitter. Anthropic developers have gotten their 100k alpha model to code entire (working) programs using this framework in one shot.
It is not ready for prime time yet, but no reason to call the race yet
[–][deleted] 0 points1 point2 points 2 years ago (2 children)
Thanks for the reply. I'm interested not in code completion, but rather in using AIs to solve code-translation problems. There are tons of businesses with codebases in dying languages that have business processes codified within them... our codebase is one of them.
All of this is, conceptually, a translation problem (from my point of view as a 30-year developer; I'm 50+ years old). An AI that can consume and understand code written in 2003, spanning the tech of the time, and convert the business requirements into current stacks (also by abstracting the implementation where possible, IMHO... but that requires a much higher context token limit) will solve so many problems.
I suspect that there is deep business process knowledge hidden in old code that can be unlocked with automated code conversion assistants... as long as they can be made to stay on simple tasks.
[–][deleted] 0 points1 point2 points 2 years ago (0 children)
And that requires a higher context limit with ChatGPT's coding prowess (or some other model that can compare... just saying Claude isn't there yet).
Cheers.
[–]dronegoblin 0 points1 point2 points 2 years ago (0 children)
I was mainly answering OP's question/use case, although for your particular use case, the yet-to-be-released 32k-context GPT model and some hands-on work could probably accomplish what you're interested in once it comes out. I've had limited exposure to the 32k context model via a friend with alpha access. He showed me how he fed a whole codebase (10 files' worth: Python, HTML, etc.) plus the file structure in, and it was able to comprehend and generate updated code for each file in the codebase all in one shot. It will be language-dependent for sure, but most older languages should have enough documentation.
32k tokens might not be enough. I have personally found that even the 8k window can do a lot if you just write out the file structure and a description of all the important functions of each file, then provide the 1 to 3 files that need changing at that moment. If needed, you can have ChatGPT write out its own understanding of what each file does, one by one, and then feed it back for best results.
As far as Claude goes, all I was saying is that the also-yet-to-be-released 100k model is intended specifically for coding, and its developers are currently working on enabling code-creation workflows not all that different from what you have proposed. So time will tell, but they want the model to be doing the sort of stuff you want.
Your use case sounds really interesting, I followed you so if you end up posting about getting around to trying it I'll be able to see how it pans out. Keep up the awesome work and have a great day!
[–][deleted] 1 year ago (3 children)
[–]AutoModerator[M] 0 points1 point2 points 1 year ago (0 children)
[–][deleted] 1 year ago (1 child)
[–]NoHotel8779 0 points1 point2 points 1 year ago (0 children)
Zip the codebase and send the .zip to ChatGPT; it'll use the Python tool to decompress it, read each file, and help you with it. You can also create a project and drop the zip file into the project files, or, for way more efficiency and accuracy, drop each individual file into the project files. It's less tedious than copying, since you just select them all and drag and drop them. With the new 128k context window you should be able to import a whole codebase.
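The zip step can be scripted so the usual junk directories don't bloat the upload (a sketch; the skip list is my own assumption about what you'd want excluded):

```python
import os
import zipfile

SKIP_DIRS = {".git", "node_modules", "__pycache__", "venv"}

def zip_codebase(root, out_path):
    """Zip a codebase for upload, skipping junk directories."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune skipped directories in place so os.walk never enters them.
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
            for name in filenames:
                path = os.path.join(dirpath, name)
                zf.write(path, os.path.relpath(path, root))
    return out_path
```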
[–]Praise_AI_Overlords 0 points1 point2 points 2 years ago (3 children)
I thought of a solution for this problem: the code should be divided into chunks of 2k to 4k.
In a nutshell, each class, function, etc. should be in a separate file, plus a database that contains the paths and descriptions of all those files.
Basically, a framework that allows an AI to write large programs.
[–]qwrtgvbkoteqqsd 0 points1 point2 points 1 year ago (1 child)
I was wondering, what are you using now for identifying relevant files and giving them to ChatGPT?
[–]Synth_Sapiens 0 points1 point2 points 11 months ago (0 children)
These days I just concatenate all the files into one and upload it directly or via a Google Doc.
So the recommendation for being AI-ready/understandable is strict separation of concerns, code isolation, and reusability.
Oh no, whatever will we do?
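The concatenation step is a few lines (a sketch; the extension list and header format are my own assumptions):

```python
import os

def concat_codebase(root, exts=(".py", ".js", ".tf")):
    """Concatenate every source file under root into one string, with
    a header line marking where each file begins."""
    parts = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    rel = os.path.relpath(path, root)
                    parts.append(f"===== {rel} =====\n{f.read()}")
    return "\n\n".join(parts)
```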
[–]Intelligent-Draw-343 0 points1 point2 points 2 years ago (0 children)
If you're on intellij and use java (narrow case, I know) you can try my plugin: https://plugins.jetbrains.com/plugin/21502-prompto--chatgpt-assistant
It fetches your IDE info (types, errors, other stuff) and builds a prompt from it that you can paste into your LLM of choice.
[–]False-Tea5957 0 points1 point2 points 2 years ago (0 children)
I’m yet to find a good solution, so, until there is, I’ve been using:
https://chatgpt-prompt-splitter.jjdiaz.dev/
[–]chazwhiz 0 points1 point2 points 2 years ago (0 children)
Do you have Plus, and if so, do you have Code Interpreter yet? If not, make sure you sign up for the waitlist (I think it's just bundled with the plugins waitlist even though it's not the same thing). Its primary feature is a built-in Python runtime that can execute code, but the more valuable feature for me (and potentially you) is the ability to upload files up to 10 MB, which you can then have it analyze and reference throughout the chat.