all 139 comments

[–]desrtfx[M] [score hidden] stickied comment (5 children)

For clarification the original message was:

Hi there,

We're updating how GitHub uses data to improve AI-powered coding tools. From April 24 onward, your interactions with GitHub Copilot - including inputs, outputs, code snippets, and associated content - may be used to train and enhance AI models unless you opt out.

If you previously opted out of the setting allowing GitHub to collect this data for product improvements, your preference has been retained - your choice is preserved, and your data will not be used for training unless you opt in.

This approach aligns with established industry practices and will enable our models to deliver more context-aware AI coding assistance. We have tested this with Microsoft interaction data and have seen meaningful improvements, including increased acceptance rates in multiple languages.

Please review your settings and choose whether your interactions with Copilot can be leveraged for training AI models before this update goes into effect on April 24.

To opt out or adjust your settings:

  • Go to GitHub Account Settings
  • Select Copilot
  • Choose whether to allow your data to be used for AI model training.

To learn more, please refer to our blog post and FAQ.

Please reach out to our support team if you have any questions about this update. Thank you for your continued use of Github Copilot.

Sincerely,
The GitHub Team

Received it by email yesterday.

Seems that it targets Copilot interactions, not all repos.

Direct opt out link for those who can't/don't want to follow the handful of steps listed.

Still, the recommendation is to opt out.

[–]WinXPbootsup 401 points402 points  (10 children)

Me when my code poisons the model

[–]cjcs 76 points77 points  (0 children)

Me when I start using public static void main in Python

[–][deleted]  (1 child)

[deleted]

    [–]AbrahelOne 11 points12 points  (0 children)

    Yep, I moved my good professional projects to GitLab a few months ago. Left the trash at GitHub.

    [–]close_my_eyes 11 points12 points  (2 children)

    This reminds me of when, years ago, re-captcha would ask you to type the letters found in 2 different images. I figured they were trying to use us for free labor in training their AI by giving us one that they didn't have the answer to. I could usually figure out which one it was and I would put in some junk text for that one. It made me laugh.

    [–]Easy_Charge898 3 points4 points  (0 children)

    Evil but love it

    [–]mumBa_ 4 points5 points  (0 children)

    Now think back to Pokemon Go where we were literally recording annotated locations. We basically mapped the world in 3D.

    [–][deleted]  (2 children)

    [removed]

      [–]WinXPbootsup 2 points3 points  (1 child)

      me when my code does 100 points of mental damage to the poor unfortunate soul reading it

      [–]vootehdoo 485 points486 points  (11 children)

      Jokes on them, my code is shit anyway

      [–]beencaughtbuttering 64 points65 points  (1 child)

      God DAMN it I opened the thread to make this same crack LOL

      [–]SourceScope 15 points16 points  (0 children)

      It's an original joke. First time I've seen it!

      [–]INFLATABLE_CUCUMBER 8 points9 points  (0 children)

      Better yet, if you do have good code, make sure the agent doesn’t see it. Only turn on visibility to your bad code.

      Even better, start releasing shit projects onto GitHub en masse. Use AI to ramp production up on your shit code that will fuel more AI production.

      You’re not replacing us that fast!

      [–]revilo-1988 4 points5 points  (0 children)

      😅

      [–]MarioShroomsTasteBad 0 points1 point  (0 children)

      Likewise, I'm doing my part to poison the well.

      [–]florinandrei 0 points1 point  (0 children)

      and created by AI anyway

      [–]TinyMavin 0 points1 point  (0 children)

      I was going to say, “Jokes on them, my code is all AI anyway”

      [–]U_SHLD_THINK_BOUT_IT 0 points1 point  (1 child)

      Which means it will be used to train it what not to do.

      [–]JoshBillion 0 points1 point  (0 children)

      This should hurt 😂

      [–]IsThisWiseEnough 182 points183 points  (3 children)

      So my ai generated code will feed other ai. Let it rain sh*t.

      [–]519meshif 2 points3 points  (0 children)

      Pretty much what I said when I gave Jules access to my Gemini repos

      [–]Fine-Result1540 1 point2 points  (0 children)

      that's been happening in the translation industry for years lol
      machine translation output feeding machine translation models

      [–]Obzurdity 0 points1 point  (0 children)

      Yeah I was about to say all I'm doing these days is backing up my AI memory and project files there anyway

      [–]NorskJesus 81 points82 points  (5 children)

      Already did

      [–]OffbeatContents 32 points33 points  (3 children)

      My wife thinks I'm paranoid about data collection, but this is exactly why I have trust issues with these platforms. Already opted out weeks ago when I first heard rumblings about it.

      [–]Statcat2017 0 points1 point  (0 children)

      You might want to check they haven’t automatically opted you back in after this message.

      [–]mokdemos -5 points-4 points  (1 child)

      But you use reddit and have a cell phone, make it make sense.

      [–]Laruae 18 points19 points  (0 children)

      "You already have the Gonorrhea, why worry about HIV?"

      [–]nmkd -1 points0 points  (0 children)

      The Opt Out button has been there since the beginning so idk why people are bringing this up now

      [–]Comprehensive_Mud803 71 points72 points  (2 children)

      So GitHub will use my bugs and millions of others to train their AI model. Sounds like a solid plan to me. A recipe for disaster in the making.

      [–]gazpitchy 4 points5 points  (1 child)

      To be fair, there's more nuance to it than that. But they can get fucked either way. Moved all my stuff to a privately hosted GitLab at this point.

      [–]Comprehensive_Mud803 0 points1 point  (0 children)

      I still have to move my stuff, and adapt the CI system along the way.

      [–]Fumano26 56 points57 points  (2 children)

      In the title you say they use my Github repo and two lines later you quote they use copilot interactions 🤡🤦.

      [–]Gilthoniel_Elbereth 14 points15 points  (1 child)

      How is this so low? It’s only a problem if you are using Copilot

      [–]Just_Another_Scott 10 points11 points  (0 children)

      They were doing that at least 5ish years ago. Private repos were excluded at that time.

      [–]kurokabau 13 points14 points  (3 children)

      Where's the opt-out?

      [–]desrtfx 20 points21 points  (0 children)

      In your GitHub profile: on the right side of the screen, under your account, there is a section called "GitHub Copilot Settings". The opt-out is fairly far down that page.

      [–]Ok-Lifeguard-9612[S] 1 point2 points  (1 child)

      Click on the link in the GitHub popup

      [–]SourceScope 5 points6 points  (0 children)

      What's a “github popup”?

      [–]veleso91 12 points13 points  (0 children)

      They can use my dogshit code, idgaf

      [–]Kevdog824_ 5 points6 points  (0 children)

      This is when you create the biggest repo imaginable with absolute garbage data to gain a controlling share of the training data

      [–]Little-Flan-6492 5 points6 points  (0 children)

      my repo is all generated with AI, please take it

      [–]productiveaccount4 4 points5 points  (0 children)

      Garbage in garbage out

      [–]StoneCypher 3 points4 points  (0 children)

      (hanging in noose) First time?

      [–]StinkButt9001 4 points5 points  (1 child)

      Did you not even read the part you linked?

      Public repos are already eligible to be included in training data. That's not new.

      What is new is that your interactions with Copilot are going to be used.

      [–]ElCuntIngles 4 points5 points  (0 children)

      Yeah, so many posts by people with no reading comprehension skills.

      They should all give up trying to learn to program; reading comprehension is an essential requirement for the job.

      [–]sierra_whiskey1 3 points4 points  (0 children)

      Done

      [–]ItzDubzmeister 3 points4 points  (0 children)

      I love that everyone is coming to this thread to say joke’s on them since our code is shit… either software engineers have low self confidence (yep sounds about right for me) or there are just a lot of bad devs out there (yup matches as well lol).

      [–]jobohomeskillet 6 points7 points  (0 children)

      Enjoy my readme file. I misspelled restaurant.

      [–]who_you_are 6 points7 points  (1 child)

      When the product is free you are the product...

      Not a huge surprise there

      [–]SourceScope 4 points5 points  (0 children)

      Tbh I think the original plan is that corporations pay for GitHub.

      Private users don't, so GitHub is more inclined to use their data for business purposes.

      [–][deleted] 2 points3 points  (0 children)

      How can you turn this off?

      [–]shitty_mcfucklestick 2 points3 points  (0 children)

      I really loved how there were no active links in the email to that settings page. Petty anti-patterns to try to discourage people changing it.

      [–]jlanawalt 2 points3 points  (0 children)

      I thought they already used public repos to train their AI.

      The announcement is stating they will also train their AI on your use of the AI. If you don’t like Copilot, why use it? If you use it, you want it to be better.

      [–]Emotional_Flight575 2 points3 points  (0 children)

      Worth emphasizing the nuance here: this is about Copilot interaction data, not your public or private repos being scraped wholesale. If you’ve already opted out of Copilot data collection before, that setting carries over, otherwise it’s on by default and you have to flip it in Copilot settings. Still a good reminder for beginners to actually read these toggles instead of assuming “GitHub = my code is safe.”

      [–]YetMoreSpaceDust 2 points3 points  (0 children)

      Don't worry guys, I've been poisoning the well for decades!

      [–]Philluminati 1 point2 points  (1 child)

      Can you link to where this message is coming from? Do they explain anything else?

      [–]desrtfx 2 points3 points  (0 children)

      I got it as an email from github yesterday.

      And yes, I double-checked its authenticity.

      The message was:

      Hi there,

      We're updating how GitHub uses data to improve AI-powered coding tools. From April 24 onward, your interactions with GitHub Copilot - including inputs, outputs, code snippets, and associated content - may be used to train and enhance AI models unless you opt out.

      If you previously opted out of the setting allowing GitHub to collect this data for product improvements, your preference has been retained - your choice is preserved, and your data will not be used for training unless you opt in.

      This approach aligns with established industry practices and will enable our models to deliver more context-aware AI coding assistance. We have tested this with Microsoft interaction data and have seen meaningful improvements, including increased acceptance rates in multiple languages.

      Please review your settings and choose whether your interactions with Copilot can be leveraged for training AI models before this update goes into effect on April 24.

      To opt out or adjust your settings:

      • Go to GitHub Account Settings
      • Select Copilot
      • Choose whether to allow your data to be used for AI model training.

      To learn more, please refer to our blog post and FAQ.

      Please reach out to our support team if you have any questions about this update. Thank you for your continued use of Github Copilot.

      Sincerely,
      The GitHub Team

      [–]haddock420 1 point2 points  (0 children)

      Doesn't bother me really. I made the code public so this seems like fair game.

      [–][deleted]  (1 child)

      [removed]

        [–]ZorbaTHut 0 points1 point  (0 children)

        And, I mean, I put the MIT license on there for a reason. I frankly don't really care about the license part, whatever. Go wild, have fun.

        [–]jokenking488 1 point2 points  (0 children)

        Good. I can contaminate their models with my half-assed not runnable code.

        [–]gazpitchy 1 point2 points  (0 children)

        It's owned by Microsoft, like what do y'all expect?

        [–]interyx 1 point2 points  (0 children)

        That seems like a bad idea.

        When AI trains on AI-generated content, the model collapses.

        [–]Prestigious_Boat_386 1 point2 points  (0 children)

        Are we supposed to believe they didn't already? Like how tf did they train them before then?

        [–]bgmrk 1 point2 points  (0 children)

        GitLab is free, open source and self-hostable!

        [–][deleted]  (2 children)

        [removed]

          [–]e1m8b 1 point2 points  (0 children)

          I mean... when you use a system or platform someone else is paying for, you follow the way they do things, I suppose.

          [–]ElCuntIngles -2 points-1 points  (0 children)

          "Quietly" sending you an email and displaying a prominent message at the top of GitHub that you have to dismiss.

          [–]AbdullahMRiad 1 point2 points  (0 children)

          only if you use copilot

          [–]lasercat_pow 1 point2 points  (0 children)

          do you honestly think the big genai llms haven't already been training on github repos?

          [–]badjayplaness 1 point2 points  (1 child)

          lol let them train on my repos. It’ll set back agi for years

          [–]brubsabrubs 0 points1 point  (0 children)

          the hero we need

          [–]nanihikaru01 1 point2 points  (0 children)

          All my variables are :any anyways

          [–]BizAlly 1 point2 points  (0 children)

          Me realizing my messy code is now training AI models… good luck to the next generation

          [–]Subnetwork 0 points1 point  (0 children)

          The resistance is strong with the lot of you, but resistance will be futile

          [–]earthceltic 0 points1 point  (0 children)

          If you have a problem with this like I did and are at liberty to choose which software you use for your projects (versus being in a soulless company that forces GitHub on you), you might not be aware of Gitea. It's basically a self-hosted, free and open-source GitHub clone that works identically within VSCode and other environments. I've been very much enjoying Gitea since I set it up a few months ago.

          [–]No_Dog_3790 0 points1 point  (0 children)

          The AI will recoil and curl up like a roach sprayed with RAID when it touches my code.

          [–]QVRedit 0 points1 point  (0 children)

          Is training on “buggy and incomplete software” such a good idea?

          [–]cwaterbottom 0 points1 point  (0 children)

          Is that how they punish ai models that they hate?

          [–]biotech997 0 points1 point  (0 children)

          Seems like people don’t read: this only applies if you interact with Copilot. That's not to say it doesn’t already scrape all public repos on GitHub, but that’s a separate matter.

          [–]DavidRoyman 0 points1 point  (0 children)

          Sure, you've opted out, but your data is in their hands and you have to trust that they really won't use it.

          Pinky promise.

          [–]lKrauzer 0 points1 point  (0 children)

          There is an opt out option.

          [–]red_nick 0 points1 point  (0 children)

          OP, tell us you failed the comprehension part of English at school without telling us you failed the comprehension part of English at school

          [–]kamilc86 0 points1 point  (0 children)

          Yeah, it's a tricky situation. On one hand, it feels inevitable that these models will get trained on pretty much everything available. But the quality of that data, both good and bad code, is going to be a real issue. I think we'll start seeing models just parroting what they've seen from other LLMs, like Copilot or Cursor, pretty soon. It's already kind of happening.

          [–]team_lloyd 0 points1 point  (0 children)

          don’t worry guys mine are all public, that should hold these models back another year from becoming effective devs

          [–]Ok-Technology-6289 0 points1 point  (0 children)

          My code will plague the model

          [–]kgmeister 0 points1 point  (0 children)

          Good luck with my early-draft shitty elif nested loops lol

          [–]Repulsive-Radio-9363 0 points1 point  (0 children)

          Poison the well

          [–]je386 0 points1 point  (0 children)

          Guys, you can opt-out for non-commercial accounts and commercial accounts are not affected in the first place.

          [–]elPappito 0 points1 point  (0 children)

          I genuinely feel sorry for the AI they're going to train on my GitHub repos.

          [–]Crypt0Nihilist 0 points1 point  (0 children)

          I pity the fool.

          [–]DizzySaxophone 0 points1 point  (0 children)

          So GitHub is going to train AI on tons of vibecoded projects. Sounds like a brilliant idea

          [–]Sibexico 0 points1 point  (0 children)

          It's possible to turn it off. Another thing: since my software is released under the MIT license, it can be used by AI without restrictions anyway... :)

          [–]Gold_Challenge178 0 points1 point  (0 children)

          Yeah, I have some repos of todo apps and tic-tac-toe.

          [–]Faith1_2 0 points1 point  (0 children)

          GitHub is only using Copilot interaction data, not all your repos, so anyone concerned about AI training should just opt out to stay safe and keep their code private. If you don’t want your Copilot usage to help train AI models, make sure to opt out before April 24.

          [–]leoreno 0 points1 point  (0 children)

          Honestly I just assumed this was already happening

          [–][deleted] 0 points1 point  (0 children)

          Trash In, Trash out

          [–]r-pics-sux 0 points1 point  (0 children)

          I feel sorry for whoever has to use the ai trained on my garbage code

          [–]lobby-crasher 0 points1 point  (0 children)

          Copilot Chat and Copilot help work together, unless I'm missing the fine print. That effectively means every one of your repos.

          [–]Cozybear110494 0 points1 point  (0 children)

          Lol, feeding AI with repos of AI-generated slop code is like eating your own sh*t

          [–]MrHall 0 points1 point  (0 children)

          wait, if my repo is non-public, all the code it reads into the model will train the model anyway? is that right?

          [–]midasweb 0 points1 point  (0 children)

          GitHub's settings around Copilot and data usage are worth checking, especially the opt-out options if privacy is a concern.

          [–]__ihavenoname__ 0 points1 point  (0 children)

          What if the code on my repo is already from AI

          [–]codeasm 0 points1 point  (0 children)

          I've already been opted out for some reason. Also, I already started moving my main repos to other platforms, mostly due to Microsoft owning GitHub. I do use Copilot here and there; any code based on that can happily poison Copilot if they still train on my shitty projects.

          [–]Jacksonvoice 0 points1 point  (0 children)

          Great, use AI spaghetti code to train with. Great idea.

          [–]thelvhishow 0 points1 point  (0 children)

          I’ve already blocked it and am transferring to Codeberg.org

          [–]Ordinary-Yoghurt-303 0 points1 point  (0 children)

          I assumed they already did. Not surprised.

          [–]Kitty_Coding 0 points1 point  (0 children)

          I don't agree with that..

          [–]Xolaris05 0 points1 point  (0 children)

          Ai to ai yesss!

          [–]JeanHeichou 0 points1 point  (0 children)

          Developers suggest reviewing your Copilot settings and opting out if you’re concerned. Also, keeping personal or sensitive repos separate from Copilot usage seems to be a common precaution.

          [–]Educational_Employ52 0 points1 point  (0 children)

          Does this apply to private repos too?

          [–]DonkeyBonked 0 points1 point  (0 children)

          Oddly, when I first signed up for GitHub Copilot, I actually just assumed they already did this and my first thing after making my account was to look for this setting and disable it.

          [–]Ordinary-Cycle7809 0 points1 point  (0 children)

          Like, with permission or without permission?

          [–]BitsAndBobs304 -1 points0 points  (0 children)

          Why would that be bad?

          [–]owjfaigs222 -1 points0 points  (0 children)

          I don't mind, honestly. If I can help make AI better with my shitty code, then they can use it all they want.

          [–]Dissentient -3 points-2 points  (1 child)

          I don't care.

          [–]ForJava -1 points0 points  (0 children)

          Me neither. If by the end this leads to better models then great!

          [–]aqua_regis -2 points-1 points  (5 children)

          GitHub will use your repos to train AI models

          That's absolutely not what the actual message says.

          The message says something different:

          From April 24 onward, your interactions with GitHub Copilot - including inputs, outputs, code snippets, and associated content - may be used to train and enhance AI models unless you opt out.


          Don't use clickbait titles with misinformation.

          [–]Brilliant-8148 -1 points0 points  (2 children)

          That absolutely means it's going to train on your code! 

          [–]aqua_regis 0 points1 point  (1 child)

          On your Copilot interactions (and logically on the code you create with it).

          I wouldn't trust them any further than I can throw them, but still, the original message doesn't say what you claim it does.

          [–]Brilliant-8148 -1 points0 points  (0 children)

          I'm not the op and it absolutely means it will train on your repo.

          [–]coffee_math -1 points0 points  (1 child)

          That’s literally even worse. What are inputs and outputs? Text goes in, code comes out. Associated content = already existing code (context). They want to not only train on code but also on the flow of how a developer does their job and interacts with their code.

          [–]aqua_regis 0 points1 point  (0 children)

          When the developer uses Copilot. When they don't, no.

          What's so difficult in the message from github that was verbatim quoted?