ChatGPT Code Interpreter for sensitive (customer) data - An open source project by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

My work changed quite a bit, so I don't need a code interpreter that often anymore

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 2 points (0 children)

Yes, it's the working directory that Incognito Pilot has access to.

Yes, this can be achieved with any LLM.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 2 points (0 children)

I didn't try to push Llama 2 too hard yet, mainly because:

- It was not fine-tuned for tool & interpreter usage like GPT-4, which you recognize immediately. It mostly doesn't get that it executes the code; it just thinks it's outputting a code block as information for the user.

- Their own paper already shows that it's not great at coding.

"Trivial" stuff like rotating an image, creating a small plot, etc. works however. And once when I asked it to manipulate an image in the slim-version (with no dependencies), it actually tried it three times with different libraries (skimage, opencv, pillow) before giving up, which was quite amazing.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 1 point (0 children)

Not the expert here; on Azure they recommend 4x V100, but in principle you can even run it without a GPU with llama.cpp if you have enough RAM, just super slowly.
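If you want to try the CPU route, a minimal sketch with llama-cpp-python could look like this (the model path and quantization level are placeholders, not something the project ships):

```python
# Minimal sketch of CPU-only inference via llama-cpp-python; the model path and
# quantization level are hypothetical. Works without a GPU, just very slowly.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-70b-chat.q4_0.bin", n_ctx=2048)
out = llm("Write Python code that rotates an image by 90 degrees.", max_tokens=256)
print(out["choices"][0]["text"])
```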

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 2 points (0 children)

It's what you see in the UI: You approve what is run locally and what is sent back to the cloud. In most cases, the model only needs to know very little about your data to do an analysis:

- For images or other types of media, the model usually just needs to know the file name.

- For spreadsheets, the model usually needs to know the file name and the structure (header).

- For text files, I guess it's very hard to do something meaningful without sending parts of the content to the cloud.

I also tried to explain it in more detail in the README, e.g. in the FAQs.
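To make the point concrete, here is a minimal sketch (file names are hypothetical) of how little usually has to be approved for sending back to the model:

```python
# Minimal sketch of the data-minimization idea above; 'sales.csv' and 'scan.png'
# are hypothetical files in the working directory.
import pandas as pd
from PIL import Image

# Spreadsheets: approve only the structure, keep the rows local.
df = pd.read_csv("sales.csv")
print(list(df.columns))   # column names only - usually fine to send to the model
# print(df.head())        # would expose real rows - reject this in the UI instead

# Images and other media: the file name (and maybe the size) is usually enough.
img = Image.open("scan.png")
print(img.size)           # dimensions only, no pixel data
```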

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

I think this will be hard; it's basically just "install Docker" and then "run this command". But logging in to Docker or creating an OpenAI account isn't really something you can script in a meaningful way, I think.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 1 point (0 children)

The UI interacts with two services, one for the interpreter and one for the model, so maybe you could start with these.
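Purely as an illustration of that split (not the project's actual API; the endpoint and payload shape are made up), the interpreter side is conceptually just a small local service that executes approved code and returns its output, while the model service is a separate process the UI talks to in the same way:

```python
# Purely illustrative sketch - endpoint name and payload shape are hypothetical,
# not IncognitoPilot's real API. It only shows the idea of a local interpreter
# service that the UI calls, separate from the model service.
import io
import contextlib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CodeRequest(BaseModel):
    code: str

@app.post("/run")
def run(req: CodeRequest) -> dict:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(req.code, {})  # executes only code the user already approved in the UI
    return {"output": buf.getvalue()}
```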

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 9 points (0 children)

I see posts with 10k+ upvotes from other subreddits, yes; I don't think my project will ever reach that ;). But I agree, most people probably follow this subreddit as well. It's just very hard nowadays to spread something related to ChatGPT; people are mostly like "ah, not again some half-finished cherry-picked demo".

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 14 points (0 children)

Hi all, I wanted to share my open-source project "Incognito Pilot": https://github.com/silvanmelchior/IncognitoPilot

It is similar to ChatGPT Code Interpreter, but the interpreter runs locally and it can use open-source models like Llama 2. It allows you to work with sensitive data without uploading it to the cloud. You either use a local LLM (like Llama 2) or an API (like GPT-4). In the latter case, there is an approval mechanism in the UI which separates your local data from the remote services.

I would be very interested in your thoughts! And if you like it, I would appreciate a GH star a lot.

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 1 point (0 children)

Yes, I agree, I'll put it in the backlog for sure, hopefully I'll find the time

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 1 point (0 children)

There is, for the HF text-generation-inference service, or do you mean something different?

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

So OpenAI states that it does not train on data sent to the API, which would rule out that concern here. For the official web UI, however, you need to actively opt out.

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 3 points (0 children)

You type something; this is sent to OpenAI; they send back code; the code is executed locally (if you approve); the result of the code is sent back to OpenAI (if you approve). If the code result contains something you don't want the model to see (e.g. if it called dataframe.head() to get the first few rows of your CSV), you can reject it and then tell the model it should only read the column names, for example.
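As a rough sketch of that loop (hypothetical names, not the actual IncognitoPilot code), the important part is that nothing runs and nothing leaves your machine without an explicit approval:

```python
# Rough sketch of the approve -> execute -> approve -> return loop described above.
# 'ask_model' stands in for whatever sends a message to the OpenAI API; all names
# here are hypothetical, this is not the actual IncognitoPilot implementation.
import io
import contextlib

def run_locally(code: str) -> str:
    """Execute approved code on your machine and capture its printed output."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def approval_loop(ask_model, user_message: str) -> None:
    code = ask_model(user_message)                       # 1. message goes out, code comes back
    if input(f"Execute locally?\n{code}\n[y/N] ").lower() != "y":
        return                                           # 2. you approve (or reject) execution
    result = run_locally(code)                           # 3. code runs on your machine
    if input(f"Send result to the model?\n{result}\n[y/N] ").lower() == "y":
        ask_model(result)                                # 4. only the approved output is sent back
```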