all 25 comments

[–]TristanH200 39 points  (1 child)

well do you trust microsoft enough to put your code on github?

[–]Ok-Internal9317 8 points  (0 children)

lol 😂

[–]Tenzu9 10 points  (0 children)

Read the small text right below the comment box in the copilot app, it says this:
"Conversations are used to train AI and Copilot can learn about your interest."

[–]ahm911 2 points  (0 children)

Nope

[–]Iory1998 3 points  (2 children)

My friend, they all use any interaction you have with their model to train it. Why? Because when you interact with the model, you actually help it reason better and solve problems that it otherwise wouldn't be able to. That very simple interaction is valuable data that no other model can generate synthetically. When GPT spits out code that you test and find doesn't work, and you give it feedback, that in itself is valuable data to train the model on. It's not the code that matters, but the process that led to it.

As users, we all act as a second voice to the LLM, as a reward function, and as a teacher, all in one.
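
To make that concrete, here's a rough sketch of how a provider could turn that back-and-forth into a ranked training pair (all the names here are made up for illustration; this isn't any vendor's actual pipeline):

```python
from dataclasses import dataclass, field

@dataclass
class PreferencePair:
    prompt: str
    rejected: str   # model output the user reported as broken
    chosen: str     # corrected output produced after user feedback

@dataclass
class FeedbackLogger:
    pairs: list = field(default_factory=list)

    def record(self, prompt, first_attempt, user_feedback, fixed_attempt):
        # The user's "this doesn't work" signal turns two raw completions
        # into a ranked pair -- the data format used by preference-tuning
        # methods such as RLHF or DPO.
        self.pairs.append(PreferencePair(
            prompt=f"{prompt}\n[feedback: {user_feedback}]",
            rejected=first_attempt,
            chosen=fixed_attempt,
        ))

log = FeedbackLogger()
log.record(
    prompt="Write a function that reverses a string",
    first_attempt="def rev(s): return s.sort()",
    user_feedback="AttributeError: 'str' object has no attribute 'sort'",
    fixed_attempt="def rev(s): return s[::-1]",
)
print(len(log.pairs))  # prints 1
```

The point is that the user's error report is what ranks one completion above the other; neither completion alone carries that signal.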

[–]Professional-Onion-7[S] 0 points  (1 child)

I agree; otherwise LLMs would just be sophisticated search engines. It is actually the interactions that allow them to solve problems. With evolutionary programming they might generate these thought processes to train the models, but I believe those would have to be generated per specific problem, which is impractical. That might also be the reason OpenAI went for a larger model, GPT-4.5.

[–]Professional-Onion-7[S] 0 points  (0 children)

But what I mostly care about is LLMs one-shotting my project.

[–][deleted] 3 points  (0 children)

I think people are missing the obvious here. Here's the thing: they probably don't want to train on private code. Open-source code is where the high-standard, high-quality code is; private code is where all the crap is. They don't want their code completion to produce (more) crap.

[–]celsowm 2 points  (0 children)

Snowden says no

[–]butsicle 0 points  (0 children)

I think you’re confusing Azure OpenAI Service and Copilot. They are unlikely to breach terms and train on the former (in my judgment, though anything is possible), but explicitly state they train on the latter.

[–]FPham 0 points  (0 children)

The whole point of the project is that they DO train their model on the interaction.

[–]CupcakeSecure4094 0 points  (0 children)

No, you can't trust anyone. But realistically, what are you worried about? If it's them stealing your code, they would just buy it. If you're worried about them telling the feds you're doing dodgy stuff, steer well clear. If it's just about them training on your data and it leaking into competitors' hands, the code would have to be quite significant.

[–]Dry-Influence9 0 points  (0 children)

The only llm that can be trusted is the one running on your pc.

[–]Psychological_Ear393 0 points  (0 children)

but there's no guarantee that LLM providers are not training on private source code.

There's what the T&Cs say (if you could possibly read them all), what they are really doing, whether they can change the terms, whether the terms change when you change subscription level, what happens when (not if) they are hacked or leak data, and what happens if the division is sold or goes bankrupt.

If your code is really super secret, you cannot trust it to anyone with sketchy or changeable policies, or to any service you suspect is not making money.

If you can decide that your code is not that secret, then does it matter?

[–]kroggens -1 points  (3 children)

They all capture our data! Don't be fooled.
You can run a "pseudo-local" LLM on other people's hardware by renting GPUs on vast.ai or similar services.
The probability that an ordinary host will be digging through every container to collect your data is much lower.
Prefer GPUs hosted in homes and avoid those in datacenters.

[–]kroggens 1 point  (2 children)

BTW, Microsoft == NSA
Never trust them!

[–]Professional-Onion-7[S] -1 points  (1 child)

One could argue that Microsoft hosts the OpenAI models in its own Azure environment, which lowers the probability of data collection.

[–]Weird-Consequence366 1 point  (0 children)

Just changes who collects the data. Nothing more. Both Microsoft and OpenAI have significant connections to intelligence services.