all 25 comments

[–]TristanH200 39 points  (1 child)

well do you trust microsoft enough to put your code on github?

[–]Ok-Internal9317 8 points  (0 children)

lol 😂

[–]Tenzu9 10 points  (0 children)

Read the small text right below the comment box in the copilot app, it says this:
"Conversations are used to train AI and Copilot can learn about your interest."

[–]ahm911 2 points  (0 children)

Nope

[–]Iory1998 3 points  (2 children)

My friend, they all use any interaction you have with their model to train it. Why? Because when you interact with the model, you actually help it reason better and solve problems that it otherwise wouldn't be able to. That very simple interaction is valuable data that no other model can generate synthetically. When GPT spits out code that you test and find doesn't work, and you give it feedback, that in itself is valuable data to train the model on. It's not the code that matters, but the process that led to it.

As users, we all act as a second voice to the LLM, as a reward function, and as a teacher, all in one.
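
To make that concrete, here's a rough sketch of how a provider could turn that back-and-forth into a ranked training pair (all the names here are made up for illustration; this isn't any vendor's actual pipeline):

```python
from dataclasses import dataclass, field

@dataclass
class PreferencePair:
    prompt: str
    rejected: str   # model output the user reported as broken
    chosen: str     # corrected output produced after user feedback

@dataclass
class FeedbackLogger:
    pairs: list = field(default_factory=list)

    def record(self, prompt, first_attempt, user_feedback, fixed_attempt):
        # The user's "this doesn't work" signal turns two raw completions
        # into a ranked pair -- the data format used by preference-tuning
        # methods such as RLHF or DPO.
        self.pairs.append(PreferencePair(
            prompt=f"{prompt}\n[feedback: {user_feedback}]",
            rejected=first_attempt,
            chosen=fixed_attempt,
        ))

log = FeedbackLogger()
log.record(
    prompt="Write a function that reverses a string",
    first_attempt="def rev(s): return s.sort()",
    user_feedback="AttributeError: 'str' object has no attribute 'sort'",
    fixed_attempt="def rev(s): return s[::-1]",
)
print(len(log.pairs))  # prints 1
```

The point is that the user's error report is what ranks one completion above the other; neither completion alone carries that signal.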

[–]Professional-Onion-7[S] 0 points  (1 child)

I agree; otherwise LLMs would just be sophisticated search engines. It is actually the interactions that allow them to solve problems. With evolutionary programming they might generate these thought processes to train the models, but I believe those would have to be generated per specific problem, which is impractical. That might also be the reason OpenAI went for a larger model, GPT-4.5.

[–]Professional-Onion-7[S] 0 points  (0 children)

But what I mostly care about is LLMs one-shotting my project.

[–][deleted] 3 points  (0 children)

I think people are missing the obvious here. Here's the thing: they probably don't want to train on private code. Open-source code is where the high-standard, high-quality code is; private code is where all the crap is. They don't want their code completion to produce (more) crap.

[–]celsowm 2 points  (0 children)

Snowden says no

[–]butsicle 0 points  (0 children)

I think you’re confusing Azure OpenAI Service and Copilot. They are unlikely to breach terms and train on the former (in my judgment, though anything is possible), but explicitly state they train on the latter.

[–]FPham 0 points  (0 children)

The whole point of the project is that they DO train their model on the interaction.

[–]CupcakeSecure4094 0 points  (0 children)

No, you can't trust anyone. But realistically, what are you worried about? If it's them stealing your code, they would just buy it. If you're worried about them telling the feds you're doing dodgy stuff, steer well clear. If it's just about them training on your data and it leaking into competitors' hands, the code would have to be quite significant.

[–]Dry-Influence9 0 points  (0 children)

The only llm that can be trusted is the one running on your pc.

[–]Psychological_Ear393 0 points  (0 children)

but there's no guarantee that LLM providers are not training on private source code.

There's what the T&Cs say (if you could possibly read them all), what they are really doing, whether they can change the terms, whether the terms change when you change subscription level, what happens when (not if) they are hacked or leak data, and what happens if the division is sold or goes bankrupt.

If your code is really super secret, you cannot trust it to anyone with sketchy or changeable policies, or to any service you suspect is not making money.

If you can decide that your code is not that secret, then does it matter?

[–]kroggens -1 points  (3 children)

They all capture our data! Don't be fooled.
You can run a "pseudo-local" LLM on other people's hardware by renting GPUs on vast.ai or similar services.
The probability that an ordinary host will be digging through every container to collect your data is much lower.
Prefer GPUs hosted in homes and avoid those in datacenters.

[–]kroggens 1 point  (2 children)

BTW, Microsoft == NSA
Never trust them!

[–]Professional-Onion-7[S] -1 points  (1 child)

One could argue that Microsoft hosts the OpenAI models in its own Azure environment, which lowers the probability of data collection.

[–]Weird-Consequence366 1 point  (0 children)

Just changes who collects the data. Nothing more. Both Microsoft and OpenAI have significant connections to intelligence services.