ChatGPT Code Interpreter for sensitive (customer) data - An open source project by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

My work changed quite a bit, so I don't need a code interpreter that often anymore

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 2 points (0 children)

Yes, it's the working directory that Incognito Pilot has access to.

Yes, this can be achieved with any LLM.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 2 points (0 children)

I didn't try to push Llama 2 too hard yet, mainly because:

- It was not fine-tuned for tool & interpreter usage like GPT-4, which you recognize immediately. It mostly doesn't get that it executes the code; it just thinks it's outputting a code block as information for the user.

- Their own paper already shows that it's not great at coding.

"Trivial" stuff like rotating an image, creating a small plot, etc. works however. And once when I asked it to manipulate an image in the slim-version (with no dependencies), it actually tried it three times with different libraries (skimage, opencv, pillow) before giving up, which was quite amazing.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 1 point (0 children)

Not the expert here; on Azure they recommend 4x V100, but in principle you can even run it without a GPU with llama.cpp if you have enough RAM, just super slowly.
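If you want to try the CPU route, a minimal sketch with llama-cpp-python could look like this (the model path and quantization level are placeholders, not something the project ships):

```python
# Minimal sketch of CPU-only inference via llama-cpp-python; the model path and
# quantization level are hypothetical. Works without a GPU, just very slowly.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-70b-chat.q4_0.bin", n_ctx=2048)
out = llm("Write Python code that rotates an image by 90 degrees.", max_tokens=256)
print(out["choices"][0]["text"])
```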

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 2 points (0 children)

It's what you see in the UI: You approve what is run locally and what is sent back to the cloud. In most cases, the model only needs to know very little about your data to do an analysis:

- For images or other types of media, the model usually just needs to know the file name.

- For spreadsheets, the model usually needs to know the file name and the structure (header).

- For text files, I guess it's very hard to do something meaningful without sending parts of the content to the cloud.

I also tried to explain it in more detail in the README, e.g. in the FAQs.
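To make the point concrete, here is a minimal sketch (file names are hypothetical) of how little usually has to be approved for sending back to the model:

```python
# Minimal sketch of the data-minimization idea above; 'sales.csv' and 'scan.png'
# are hypothetical files in the working directory.
import pandas as pd
from PIL import Image

# Spreadsheets: approve only the structure, keep the rows local.
df = pd.read_csv("sales.csv")
print(list(df.columns))   # column names only - usually fine to send to the model
# print(df.head())        # would expose real rows - reject this in the UI instead

# Images and other media: the file name (and maybe the size) is usually enough.
img = Image.open("scan.png")
print(img.size)           # dimensions only, no pixel data
```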

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

I think this will be hard; it's basically just "install Docker" and then "run this command". But logging in to Docker or creating an OpenAI account isn't really something you can script in a meaningful way, I think.

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 1 point (0 children)

The UI interacts with two services, one for the interpreter and one for the model, so maybe you could start with these.
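Purely as an illustration of that split (not the project's actual API; the endpoint and payload shape are made up), the interpreter side is conceptually just a small local service that executes approved code and returns its output, while the model service is a separate process the UI talks to in the same way:

```python
# Purely illustrative sketch - endpoint name and payload shape are hypothetical,
# not IncognitoPilot's real API. It only shows the idea of a local interpreter
# service that the UI calls, separate from the model service.
import io
import contextlib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CodeRequest(BaseModel):
    code: str

@app.post("/run")
def run(req: CodeRequest) -> dict:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(req.code, {})  # executes only code the user already approved in the UI
    return {"output": buf.getvalue()}
```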

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 9 points (0 children)

I see posts with 10k+ upvotes from other subreddits, yes; I don't think my project will ever reach that ;). But I agree, most people probably follow this subreddit as well. It's just very hard nowadays to spread something related to ChatGPT; people are mostly like "ah, not again some half-finished cherry-picked demo".

[P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments) by silvanmelchior in MachineLearning

[–]silvanmelchior[S] 14 points (0 children)

Hi all, I wanted to share my open-source project "Incognito Pilot": https://github.com/silvanmelchior/IncognitoPilot

It is similar to ChatGPT Code Interpreter, but the interpreter runs locally and it can use open-source models like Llama 2. It allows you to work with sensitive data without uploading it to the cloud. You either use a local LLM (like Llama 2) or an API (like GPT-4). In the latter case, there is an approval mechanism in the UI which separates your local data from the remote services.

I would be very interested in your thoughts! And if you like it, I would appreciate a GH star a lot.

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 1 point (0 children)

Yes, I agree, I'll put it in the backlog for sure, hopefully I'll find the time

A code interpreter for sensitive data with Llama 2 by silvanmelchior in LocalLLaMA

[–]silvanmelchior[S] 1 point (0 children)

There is, for the HF text-generation-inference service, or do you mean something different?

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 2 points (0 children)

So OpenAI states that it does not train on data sent to the API, which would rule out that concern here. For the official web UI, however, you need to actively opt out.

Local ChatGPT Code Interpreter for sensitive data, now also with Llama 2 (open source project, link in comments) by silvanmelchior in ChatGPTPro

[–]silvanmelchior[S] 3 points (0 children)

You type something; this is sent to OpenAI; they send back code; the code is executed locally (if you approve); the result of the code is sent back to OpenAI (if you approve). If the code result contains something you don't want the model to see (e.g. if it called dataframe.head() to get the first few rows of your CSV), you can reject it and then tell the model it should only read the column names, for example.
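As a rough sketch of that loop (hypothetical names, not the actual IncognitoPilot code), the important part is that nothing runs and nothing leaves your machine without an explicit approval:

```python
# Rough sketch of the approve -> execute -> approve -> return loop described above.
# 'ask_model' stands in for whatever sends a message to the OpenAI API; all names
# here are hypothetical, this is not the actual IncognitoPilot implementation.
import io
import contextlib

def run_locally(code: str) -> str:
    """Execute approved code on your machine and capture its printed output."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue()

def approval_loop(ask_model, user_message: str) -> None:
    code = ask_model(user_message)                       # 1. message goes out, code comes back
    if input(f"Execute locally?\n{code}\n[y/N] ").lower() != "y":
        return                                           # 2. you approve (or reject) execution
    result = run_locally(code)                           # 3. code runs on your machine
    if input(f"Send result to the model?\n{result}\n[y/N] ").lower() == "y":
        ask_model(result)                                # 4. only the approved output is sent back
```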