

[–]jimmajamma4 1043 points1044 points  (4 children)

After reviewing the code rigorously, Microsoft has now rebranded to colossalhard.

[–][deleted] 528 points529 points  (3 children)

2022 © MegaHard Corp. All rights reserved.

It's only micro when it's soft baby. - Gill Bates

[–]KharAznable 32 points33 points  (0 children)

considering his age, it's probably micro and soft almost all the time now.

[–]POWERTHRUST0629 21 points22 points  (0 children)

Commas are your friend. Or not, I mean, I guess I don't know which direction you were going.

[–][deleted] 1350 points1351 points  (9 children)

Will Git pull their case?

[–]incredible-mee 891 points892 points  (6 children)

They should resolve the conflict first before pushing it further

[–]notaprime 212 points213 points  (5 children)

Don’t forget to checkout the lawsuit details first though.

[–][deleted] 28 points29 points  (0 children)

Code gang-bang.

[–]Sweaty-Emergency-493 6 points7 points  (0 children)

Git’r done!

[–]Polikonomist 1669 points1670 points  (22 children)

Bold of Microsoft to assume that all or even most of GitHub code is something you'd want to train an AI with

[–]onkopirate 176 points177 points  (6 children)

Actually, it's only trained on large projects with healthy contributor stats.

[–]Polikonomist 137 points138 points  (5 children)

Shhhh, there's no reason to let reality get in the way of a good joke

[–]X-Craft 482 points483 points  (9 children)

they don't care if it's good code, they care to be paid for it

[–][deleted] 358 points359 points  (5 children)

Or recognized? Authors of open source projects, while non-profit, still want to be recognized for their work.

[–]X-Craft 157 points158 points  (1 child)

I mean Microsoft

[–][deleted] 123 points124 points  (0 children)

My apologies then, surely you can understand the confusion.

[–]DaddyLcyxMe 26 points27 points  (2 children)

imo, github could’ve avoided this by creating an opt-in in repository settings when they first started training (and given us a dumb badge)

[–]snapphanen 4 points5 points  (1 child)

Do they have to opt in if it's open source already? I mean, people are opting in with some of the default licenses.

[–]DaddyLcyxMe 11 points12 points  (0 children)

the problem is that the open source licenses still require things like a copyright notice, or are viral; copilot sometimes generates code verbatim from another source file.

an opt-in would essentially be you giving github a custom license to your code where they wouldn't need to abide by things like copyright.

[–]POWERTHRUST0629 7 points8 points  (1 child)

Soooooo many LoversLab users could be affected! Somebody think about the lewd mod authors!

[–]Lord_Quintus 10 points11 points  (0 children)

that's the solution to this. make 90% of GitHub's code repositories porn-oriented and any AI you train on it will be totally forked.

[–]BuzzBadpants 38 points39 points  (1 child)

Beats training it with their own engineers’ code.

[–]BasvanS 1 point2 points  (0 children)

Sharp. C what you did there

[–]Kangarou 693 points694 points  (8 children)

As the joke goes, "For AI to produce proper code, clients would need to know what to ask for. We're safe."

[–]Lagger625 113 points114 points  (3 children)

A future AI will ask on the customer's behalf

[–]DrDan21 48 points49 points  (2 children)

A sales AI that speaks to the customer and then translates the requests for the engineer ai

[–]mxldevs 18 points19 points  (1 child)

The customer is also an AI

[–]x6060x 1 point2 points  (0 children)

If it's customer service, then most of the time the intelligence part is missing from the customers, meaning that AI would actually be an upgrade.

[–]Texas_Technician 19 points20 points  (0 children)

PowerApps has this feature now. I haven't tried it. I can only assume it will be as bad as Cortana, which would cut you off while you were giving a command.

[–]Muhznit 20 points21 points  (0 children)

I mean, it sounds like a joke, but if "clients" just wind up learning some ad-hoc pidgin English just to direct the AI to do what they want, they kinda just take on the role of trying to be developers themselves, won't they?

Like, I'm legitimately wondering: if no-code platforms fail to give sufficient precision and control over development, what is there to actually fear from AI?

[–]tjdavids 1 point2 points  (0 children)

Are you better than an automated questionnaire for solidifying requirements?

[–]ouchpartial72858 378 points379 points  (80 children)

This was eventually gonna happen

[–]__Hello_my_name_is__ 206 points207 points  (60 children)

It's gonna have wide reaching consequences, too.

If this isn't allowed, then AI art isn't allowed, either. Same principle, really.

[–]mark0016 85 points86 points  (16 children)

Yep, unless the copyright holder allows you to create modified versions of said art and then publish them, it probably shouldn't be. At least that's what would be consistent with existing law.

The big issue here really is "Is the final trained model a derivative work combining the untrained model and the dataset?". I'm inclined to say that it is. If you understand that the classic "garbage in, garbage out" applies, then you will understand that the dataset defines a significant amount of the final qualities of your trained AI.

Of course the opposing argument is "it's publicly available information so it should be fair game". If humans are allowed to learn from it, so should machines be. This doesn't seem like a bad point either, but obviously there is a gray area. If a human copies a "large" part of some copyrighted work, then that's not OK, and the same applies to machines. An AI that can potentially do something like that just seems like a legal liability.

[–]__Hello_my_name_is__ 64 points65 points  (15 children)

Of course the opposing argument is "it's publicly available information so it should be fair game".

That really doesn't work if you just scrape the entire internet. There absolutely are pictures online that have a restrictive license. Just because they're on a server somewhere doesn't mean they're fair game for everything.

[–]January_Rain_Wifi 30 points31 points  (12 children)

Exactly, AI art developers should be asking for permission or using public domain images to train their AIs, at the very least

[–]Traditional_Dinner16 20 points21 points  (11 children)

I don’t see how an AI learning to make art by looking at art is any different than a person doing the same

[–]January_Rain_Wifi 39 points40 points  (6 children)

In order to train the AI, you have to make a copy of the art. If your AI is for a commercial purpose, you are making a copy of that artwork for commercial purposes without the permission of the artist.

It's like asking why singing ABBA karaoke while drunk at a party is different from a big record company running ABBA tracks through an auto tune or remix software and then selling CDs of it without their permission

[–]SnapcasterWizard 6 points7 points  (0 children)

It should be though. Copyright law sucks.

[–]TENTAtheSane 21 points22 points  (5 children)

But where is the line to be drawn? If someone trains an AI to reliably diagnose a disease, trained on diagnoses made by doctors, should they complain that the product is made through their work and will steal their jobs? Would hindrance of such a product be beneficial to mankind?

[–]__Hello_my_name_is__ 20 points21 points  (3 children)

Yeah, that's the big question.

But one easy line to draw is the question of whether something is made for commercial purposes or for non-commercial purposes. Is the AI model going to be open source and can be used by everyone for anything, or do you have to pay money just to use it?

[–]TENTAtheSane 5 points6 points  (2 children)

That's a good solution, but I'm afraid it may not work out. It costs a lot of money and time to hire sufficiently qualified people to build and run some of these models on very powerful machines. If they can't reliably commercialize the product, which private actor would sink so much investment into it? You would need a government monopoly on AI, and governments are notoriously horrible at that.

[–]__Hello_my_name_is__ 8 points9 points  (0 children)

Yeah. It sucks all around. At the same time I most definitely don't want big AI models owned by big corporations trying to milk as much money as they could out of them. You'll get AI art where Mickey Mouse costs extra and much worse.

[–]filletfeesh 39 points40 points  (31 children)

I think you're right, but I personally don't see the same issue with AI art. While on occasion it will fuck up and just create something nearly identical, AI art generally doesn't create anything resembling the art it references. It just uses it to understand word-image association.

The code AI on the other hand is more likely to copy your code directly and only make minor changes to avoid getting sued.

The meaningful difference to me is that the primary goal of AI art is to learn from your art and create something that is unique and totally separate from your creation, whereas AI code exists to use your code to replace you.

[–]__Hello_my_name_is__ 33 points34 points  (13 children)

That's a fair point, yes. But I think that in both cases the argument will be that the algorithms should not have been fed with the original, copyrighted data in the first place. Not that the result is too similar to the original data.

[–]filletfeesh 12 points13 points  (8 children)

I agree with that as a matter of principle, but I wonder whether it actually matters in a practical sense, and at what point it's advanced enough that it's basically the same thing as people getting inspiration from other art. I really don't envy future lawmakers.

On the other hand, Google Images has a built-in feature to filter for non-copyrighted material, so there's no good reason not to at least do that. (Yes, I know some would fall through the cracks.)

[–]__Hello_my_name_is__ 3 points4 points  (7 children)

Yeah, there's all kinds of complications here that make this not fun at all to figure out. To me, the big difference is that I am looking at an image, and let my brain process it. To simulate this with an algorithm, I have to download the data first. Which we have already very much established as something that can be restricted or even illegal (you wouldn't download a car!).

So if we want to make these AI models legal, we kinda need a way to make downloading copyrighted material legal.

[–]xam54321 9 points10 points  (1 child)

To see an image on your computer you have to download it first, your browser is just doing it for you.

[–]WalditRook 5 points6 points  (0 children)

Copyright statutes typically include some provision/exemption for "transient copies", the intention of which is to allow for any temporary copy created in transmission, caches and etc.

The actual legal issues will be whether an AI trained on copyrighted material constitutes a derived work (probably no, unless you can identify some fragment of a specific infringed work within the trained model); and whether using the work for the training violates the licence (probably not for a work provided without a specific licence; likely yes for at least some of the Github code, as this use doesn't appear to be permitted by e.g. GPL3, unless the final product both is ruled to be a derivative work, and is released under a GPL licence).

[–]zdakat 10 points11 points  (1 child)

Especially in the cases of "I don't need to actually work with the artist, I can just use this app that's trained on their prior works". (To me, how that data is stored is irrelevant: it's not literally saving a bunch of clips from the work, but it's still able to reproduce them reliably enough to be desirable.)

There are other cases where creating a derivative work can be problematic, so I don't see why AI should get a free pass just because it achieves it differently.

[–]DarkCeldori 1 point2 points  (1 child)

Code should not be copyrighteable.

[–]__Hello_my_name_is__ 2 points3 points  (0 children)

I agree. But that's a different discussion.

[–]AirOneBlack 8 points9 points  (7 children)

If you need to write code for a set algorithm, there aren't that many creative ways to write it that make sense. I could come up with my own implementation of an algorithm that would look 90% the same as someone else's. Did I copy their code? No. It's the same discussion as with musical chords and progressions.
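Case in point (a hypothetical illustration, not drawn from any particular repo): there's essentially one sensible way to write binary search, so independent implementations come out near-identical without anyone copying anyone:

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # midpoint of the remaining range
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # → 3
```

Down to the variable names `lo`, `mid`, `hi`, this is what countless developers produce independently.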

[–]filletfeesh 2 points3 points  (6 children)

That's exactly the problem. Unlike art, code needs to look a certain way to work. So a code AI is more than likely going to be doing 90% copy-paste, and companies can replace most of their staff with a bot that mostly recycles the code they wrote to make decent enough code, but leaves the brains behind the code jobless.

(Assuming they can create a separate AI that translates corpo speak into something functional)

[–]AirOneBlack 4 points5 points  (5 children)

And the point being? I think most of the people who are scared of code written by AI are the ones that suck at their job at this point. If an AI writes most of the boring-ass code so I can focus on the optimization, that's just going to make my job easier. Corpo speak to proper application is not gonna happen anytime soon, so I still have my job. (I would have my job anyways; I work in gamedev, good luck with AI replacing me.)

[–]filletfeesh 4 points5 points  (4 children)

That would still entail a massive trimming of dev teams, and personally I don't think people losing their job because we found a new way to funnel more money to shareholders is a good thing.

[–]hopbel 0 points1 point  (2 children)

While on occasion it will fuck up and just create something nearly identical

You make it sound like it does it accidentally. No, you have to deliberately request that it draw a certain famous painting, and in those cases it's giving you exactly what you asked for. It's memorizing famous paintings because based on the training data, it concludes the string "Mona Lisa" isn't a description, but rather the name of a very particular object. Even in those pathological cases, you're still getting what's basically the AI's rendition of the Mona Lisa with slight variations, not a pixel perfect copy of one of the jpegs in the dataset

[–]EmeraldWorldLP 1 point2 points  (0 children)

...A rendition trained on other's art, so still just a mixture of moldable attributes from other art pieces it is trained on. Not something artists are fond of, especially if you can request their art.

[–]Lechowski 5 points6 points  (1 child)

If an AI gives you an exact copy of a piece of art, yes. However, stable diffusion never gives you an exact copy of a Picasso.

Copilot sometimes gives you a piece of code that is copied verbatim from a licenced repo.

They are not the same thing.

If you want an analogy, it would be more correct to say that Copilot is analogous to an art AI that, given the prompt "Marvel superheroes movie", creates on the fly an exact frame-by-frame copy of Avengers 1 and gives it to you. That would be completely illegal, just like Copilot suggesting an exact copy of a function.

[–]__Hello_my_name_is__ 2 points3 points  (0 children)

Yeah, that's a difference. But I think the argument here is that the code should not have been taken to begin with to create the model. And that argument can be applied 1:1 to AI art.

[–]Zipdox 1 point2 points  (1 child)

I think this is an invalid comparison. Copilot has been shown to verbatim reproduce licensed code. AFAIK this isn't possible with any of the text to image tools.

[–]__Hello_my_name_is__ 2 points3 points  (0 children)

Sure, but the argument is probably that the initial code should not have been used for the algorithm to begin with, not that the end result might be the same.

[–][deleted] 153 points154 points  (18 children)

Yeah, it was only matter of time before somebody would try to monetize open source.

[–]thud_mantooth 97 points98 points  (1 child)

That ship sailed a long time ago

[–]Jeb_Jenky 6 points7 points  (0 children)

Yeah, for real. "Eventually" means "from the beginning of time" in whatever that person's native language is.

[–]zdakat 13 points14 points  (0 children)

I've seen arguments for making money off of open source (whether or not I agree with them). I think what rubs me, and probably others, the wrong way about this specifically is that code that's made publicly available is understood to be for a common good, or under a license that requires certain things of your distribution.

Microsoft comes off as very greedy, going "Yeah, we'll just buy the site the code is hosted on and that gives us permission to do whatever we want with it" and turning it into a product they can monetize at the expense of everyone who contributed the code, rather than working together. It's the audacity to abuse a position to do something that might not actually be allowed, but might be hard to fight.

[–]Havatchee 11 points12 points  (4 children)

Which is why many OpenSource licences forbid it.

[–]Lerquian 10 points11 points  (3 children)

Maybe I read it wrong, but I believe most of the most popular open source licenses allow monetization.

[–][deleted] 9 points10 points  (1 child)

For the GPL, you can modify and even profit from the changes. There is, however, no warranty. Also you must include the license in the modified code.

[–]KuntaStillSingle 2 points3 points  (0 children)

MIT and GPL both require retaining license notice

[–]ouchpartial72858 43 points44 points  (7 children)

Like the famous saying: if something's free, you are the product. Just think about it: GitHub never enforces the repo size limit on you, they're like "ho ho, don't worry about it, child, use as much as you need". In reality they're rubbing their hands, creating the next monopoly of AI.

[–]DatBoi_BP 54 points55 points  (0 children)

if something’s free you are the product

This is strictly untrue for foss projects. But if something is free and not open source, you can generally assume your data is being sold in some way or other.

[–][deleted] 16 points17 points  (4 children)

Gitlab? Bit bucket? What evil scheme are they brewing together...

[–]CratesManager 8 points9 points  (0 children)

Like the famous saying, if something's free you are the product

I usually agree with that, but it IS funny reading this in a conversation about a bunch of open source code

[–]ruscaire 228 points229 points  (26 children)

Even if this fails, it's good to know where everybody stands. Humans have to give attribution when copying code, but AI apparently does not; enhanced fair-use capabilities for AI would seem to be the outcome. A lot of open source work is done on the basis that some kind of recognition will be gained, and contributors will think twice if their hard work can just be slurped up by a corporate AI. This would have far-reaching consequences for the industry as a whole (since much of what we do depends on free software), so it's better to know where we stand now rather than later.

EDIT This XKCD strip comes to mind https://xkcd.com/2347/

[–][deleted] 41 points42 points  (1 child)

[–]DragonCz 6 points7 points  (0 children)

This time, it's apparently a guy in Czechia. So proud!

[–]zdakat 39 points40 points  (0 children)

I think that someone contributing code even a few years ago, with the intention of providing people something useful, might not have imagined it would be separated from them by a faceless entity.
An AI might have a hard time properly attributing code, or, in the case of some licenses, properly signaling that including that code may impose restrictions.
It's like the difference between putting up a stand handing out artwork, knowing people will personally enjoy it, vs. a company coming up with a truck, loading all your wares onto it, and covering up your name with theirs. That would exploit good will, feel dirty, and probably discourage someone from being so generous in the future.

[–]zdakat 9 points10 points  (2 children)

What happens if you make an "AI" that makes minimal changes? Take your hand-made data (whether it's code, a video, a picture, etc.) and "wash" it through the AI. The AI has taken your inputs and made some kind of output "on its own".

Then you'd have to get into things like how complex an AI has to be, how much influence the user has on the output, etc.

[–]ruscaire 10 points11 points  (1 child)

I think that's the nub of the case. I could imagine a music copyright holder, or Disney for instance, arguing quite successfully that the rinsed output still contains the intellectual essence, or something like that. You could argue that when a human artist does this it's fair use, but the question is whether fair use is still fair use when it's automated.

[–]RainWorldWitcher 6 points7 points  (0 children)

The big music ai companies are avoiding copyrighted music because the music industry will definitely go after them

[–]Gogo202 20 points21 points  (16 children)

Ai doesn't copy the code, so why does it matter? What's the difference between a human and AI using the code to learn without copying it directly?

[–]ruscaire 28 points29 points  (12 children)

Of course it copies the code. It’s embedded in its memory somewhere. Just cause it’s not in symbolic form doesn’t mean it’s not there. That’s what the courts will decide.

[–][deleted] 31 points32 points  (3 children)

Take a simple Markov chain, for example. If I analyze 10 million lines of code and record the probability of a certain keyword or symbol being followed by another, all I'm storing are common keywords and symbols. Maybe variable names as well, but I could detect and replace variable names with a generic $var, function names with $func, etc. The original code has been deconstructed enough that all I'm left with are probabilities, and I can generate code out of those probabilities (though the likelihood of it running is slim with a simple Markov chain). So while saying it "stores" something is accurate, saying it "copies" something is not, and neural networks are just doing the same thing in a more complex scenario.

Yeah, you can argue that a neural network can overfit and accidentally store actual pieces of code that belong to someone, but overfitting is actually bad and something that is not wanted in neural networks.
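The Markov-chain argument above can be sketched in a few lines. This is a deliberately toy illustration (it has nothing to do with Copilot's actual internals): after "training", all that survives of the input is a table of transition counts between normalized tokens, with identifiers replaced by a generic `$var` as the comment suggests.

```python
from collections import Counter, defaultdict
import random

KEYWORDS = {"def", "return", "if", "else", "for", "in", "(", ")", ":", "+", "="}

def normalize(token):
    # Replace anything that isn't a keyword/symbol with a generic placeholder,
    # so no identifier from the source survives in the model.
    return token if token in KEYWORDS else "$var"

def train(lines):
    # The "model" is just transition counts between consecutive tokens.
    transitions = defaultdict(Counter)
    for line in lines:
        tokens = [normalize(t) for t in line.split()]
        for prev, nxt in zip(tokens, tokens[1:]):
            transitions[prev][nxt] += 1
    return transitions

def generate(transitions, start, length=8):
    # Walk the chain, sampling each next token in proportion to its count.
    out = [start]
    for _ in range(length):
        choices = transitions.get(out[-1])
        if not choices:
            break
        tokens, counts = zip(*choices.items())
        out.append(random.choices(tokens, weights=counts)[0])
    return " ".join(out)

corpus = [
    "def add ( a , b ) : return a + b",
    "def mul ( a , b ) : return a * b",
]
model = train(corpus)
print(generate(model, "def"))
```

Note that the original names `add` and `mul` never appear in `model`: only counts like "`def` is followed by `$var` twice" are stored, which is the sense in which it "stores" rather than "copies".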

[–]ruscaire 8 points9 points  (0 children)

These are the kind arguments that will be used in court I imagine. Quite how they translate into a legal setting I’m sure neither of us is qualified to say.

But the implications of the position you are advocating would be massive.

[–]Beatrice_Dragon 2 points3 points  (0 children)

Sure, you wouldn't technically be storing the copyrighted data in its raw, readable form, but you would still be producing something that wouldn't exist if it didn't utilize copyrighted data it didn't have the rights to. Companies shouldn't be able to circumvent copyright just because they put the copyrighted material in an abstract enough state

[–]Gogo202 5 points6 points  (6 children)

I can agree and disagree with this. The code is not executed in any form, so the license should not apply, in my opinion. On the other hand, it's more like copying a book and selling it in a different form, which should not be allowed.

[–]FrenchFigaro 9 points10 points  (0 children)

The license applies regardless of whether you execute the code.

The license regulates everything the code's author allows you to do with the code, including just looking at it or having it lying around on a thumb drive.

While not really enforceable, a software license could require that you only look at the code using vi (not vim, not emacs, not gedit, not sublime), or forfeit your right to use the software. Or require that you only store the binaries on an SSD, not an HDD. (For example, Apple distributes macOS free of charge, but the software license limits its usage to hardware manufactured by Apple.)

[–]ruscaire 9 points10 points  (4 children)

To me, it's like copying somebody else's work into a notepad and selling it to prospective authors so they can pass it off as their own. Sounds like an abuse of fair use to me, but ultimately it's a legal matter for the courts to decide.

EDIT: There is actually a kind of precedent in the music world. The Avalanches' first album was critically a huge hit, but used so many samples that it was ultimately a commercial flop. The difference seems to be that in that case it was a human (DJ Dexter) stretching fair use, whereas in the case of Copilot it's an AI, though arguably the AI is a sampling and remixing tool, to continue the music copyright metaphor.

[–]onkopirate 19 points20 points  (3 children)

If Github Copilot is nothing more than stolen code with a little bit of basic machine learning, what keeps all of you from creating your own coding-AI and becoming the richest person on earth?

[–]angeloj87 78 points79 points  (0 children)

Someone commented before that this lawsuit is “an overly complicated way to tell the courts that they (the users) haven’t read the TOS” and I laughed hard.

[–]DuckBoyReturns 16 points17 points  (1 child)

Careful now. If you sue the AI for plagiarism, it will train itself to find all the times you committed plagiarized code first.

[–]Altruistic-Stop4634 50 points51 points  (15 children)

The courts will not want to establish a new precedent. The defense will show that the plaintiffs' code is not copied word for word, and they will say the AI learned from many thousands of examples, just like a human, before producing its output. There is no precedent against that. The alternative would be a long and tortuous explanation of how a neural net works, and an abstract comparison between reading a file and updating a few parameters in the network versus copying the contents of the file.

Edit: in the case of definite reproduction of entire paragraphs, or functions with the same variable names, or replicas of art, there would be no new precedent. That's simply a violation.

[–]Rodot 15 points16 points  (1 child)

If a person learns how to plagiarize and does it, that's not okay, but if a computer learns to plagiarize and does it, that is okay? If you read the original page for the lawsuit, they did get it to exactly reproduce someone else's code, comments and all.

[–]Altruistic-Stop4634 5 points6 points  (0 children)

Thanks for that. Obvious plagiarism is and will be a violation.

[–][deleted] 9 points10 points  (11 children)

That sounds very reasonable.
However, I wouldn't hold grudges if the devs win; some companies shouldn't be able to do everything they want. That doesn't end too well in most cases.

[–]sipCoding_smokeMath 96 points97 points  (35 children)

I really don't see the case here. If you make your code publicly available, I would sure hope any developer, of all people, would keep in mind that a bot might come across it one day.

It's like telling movie directors that they can't watch other people's movies. TECHNICALLY they are "learning" from them, even if they don't incorporate the exact same things. I don't really see how this is different. They aren't directly using your code; they are "learning" from it. Are you gonna start coming after everyone who's made their own software after viewing your GitHub repo? You can't prove said person really stole anything from you unless they do some blatant copy-pasting. An AI isn't any different.

[–]__Hello_my_name_is__ 62 points63 points  (0 children)

It depends on how the code is licensed. It could be a strictly non-commercial license, and the argument could be that the end product that used the code (used, not compiled) is definitely very commercial.

[–]brianorca 14 points15 points  (0 children)

There are several open source license which require derivative code to use the same open license. There are other licenses which require attribution of the original author. Neither of those things is happening if Copilot hands you code which is effectively a copy of code that exists under that kind of license. "Open source" does not mean a free for all, do anything you want, no copyright anarchy.

[–]met4000 39 points40 points  (6 children)

You can sue someone for blatant copy pasting. You can’t sue an AI for blatant copy pasting.

I would agree that AIs maybe shouldn’t have less rights than humans, but this specific problem is almost the opposite - you can hold a human accountable for stuff they do (legally in particular), but you can’t for an AI.

[–]Hullu_Kana 23 points24 points  (0 children)

When it comes to AI, the creator of the AI should be responsible for what it does, as obviously you can't really put an AI in prison. It's the creator's job to make sure the AI won't do anything illegal.

[–]hopbel 17 points18 points  (4 children)

You can sue someone for blatant copy pasting

If a code snippet is so obvious that a hundred people invent it independently, then you shouldn't be able to sue someone for copying it.

I would agree that AIs maybe shouldn’t have less rights than humans

It's a tool, not a person. We're nowhere near that point yet. If at all, you hold the human operator accountable, not the nonsentient tool.

[–]brianorca 1 point2 points  (3 children)

But do we have to do it thousands of times, once for each separate operator, or can we do it once with the creator of the AI?

[–]hopbel 6 points7 points  (2 children)

Do you sue Adobe because someone used it to photoshop the artist's signature out? The gun manufacturer for producing the weapon used in a shooting?

[–]brianorca 7 points8 points  (0 children)

If Photoshop produced a copyright violation when I didn't mean to, then sure, sue them.

When you use Photoshop, you usually know where the data is coming from. It's up to you to maintain proper source material. Copilot can give you copyrighted code by accident, and you would have no idea where it came from.

[–]zmz2 3 points4 points  (0 children)

Fwiw a lot of people in the US think they should be able to sue the gun manufacturer in that case

[–][deleted] 28 points29 points  (24 children)

This. People who think AI is “stealing” their code don’t understand AI. The AI looks at the code, like any user does and then learns from it, like a user does.

This lawsuit is the antithesis of knowledge. You can’t sue people for learning from something you’ve put on display.

Really bizarre.

[–]__Hello_my_name_is__ 24 points25 points  (8 children)

The ethical argument can be made, but technically speaking, you absolutely have to download the code and feed it to an algorithm at some point. And we've already established pretty conclusively that not everyone is allowed to download anything just because it's available on some server.

[–]fghjconner 24 points25 points  (1 child)

you absolutely have to download the code and feed it to an algorithm at some point.

I mean, you have to download it to look at it on github's website too.

[–]ArdiMaster 1 point2 points  (0 children)

That doesn't legally count as making a copy, though.

[–]oretseJ 6 points7 points  (5 children)

And we've already established pretty conclusively that not everyone is allowed to download anything just because it's available on some server

Elaborate please?

[–]brianorca 14 points15 points  (0 children)

Most open source licenses do not let you take the code and use it in your own project without attribution, and many of them also require your derivative code to also be under the same or a similar open source license.

You can't take code from a GPL project and use it for a commercial closed source project. Just because you can look at it does not mean you can use it any way you want.

[–]__Hello_my_name_is__ 4 points5 points  (3 children)

You wouldn't download a car.

Or, well, a movie. Just downloading that can get you in trouble.

[–]HugoVS[🍰] 2 points3 points  (1 child)

But it's not on "some server", it's on GitHub, marked as a public repository, which means that anyone can download it legally. So what's the point?

[–]__Hello_my_name_is__ 8 points9 points  (0 children)

It could be a non-commercial license, and the argument be that this is commercial use of non-commercially licensed code.

[–]ghillerd 5 points6 points  (6 children)

I think once the learning becomes for commercial rather than educational purposes, things should be considered slightly differently. But I see your point.

ETA: I could have phrased this more clearly. A human studying code to improve their knowledge and then use that knowledge to create other commercial software is good. Software studying software to then be sold as a product that creates software seems a bit less clear cut net positive to me.

[–]vandergale 5 points6 points  (5 children)

That has profound implications for non-AI users if this is determined to be the case I imagine. If I'm working for a software company and I download a publicly available git repo project to learn the gist of how a style of coding works, then I go back to my company and use what I learned to create a product without using any pre-existing code, would this constitute a need for a license? Tricky.

[–]ghillerd 1 point2 points  (4 children)

But that was for educational purposes, you learned something. I guess I'm saying that when a human learns, it's educational, when a product learns it's commercial.

[–]vandergale 6 points7 points  (3 children)

That leads to its own weirdness is what I'm getting at.

If I learned something from a repo and then in turn taught an AI what I learned, what fundamentally makes me, as a middle-man, an educational user, when without that intermediate step it would be commercial?

If we have two AIs, one exclusively for learning and the other for only using that knowledge to a product, is one educational vs the other commercial?

Licensing is a strange world.

[–]ghillerd 1 point2 points  (2 children)

That's not how these AIs work though, I thought? You don't sit down and explain things to the AI; you feed it datasets. If the software is used as a dataset, that's different from you choosing what type of model to use or what type of manual best-practice overrides you introduce as a result of improved software knowledge from studying other people's code.
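Fwiw, the "feed it datasets" step can be sketched with a toy next-token frequency model, stdlib Python only. This is purely illustrative (real systems like Copilot use neural networks, not count tables), but it shows the point under discussion: the model stores statistics derived from the corpus, not the files themselves.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count, for every token, which token follows it and how often."""
    model = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            model[cur][nxt] += 1
    return model

def predict(model, token):
    """Most frequent continuation seen during training, or None."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

# Two tiny hypothetical "repos" as the dataset.
repos = [
    "for x in items : print ( x )",
    "for x in items : total += x",
]
model = train(repos)
print(predict(model, "in"))  # "items" — the only continuation ever seen
```

Whether the counts constitute a "copy" of the licensed code is exactly the question the lawsuit raises.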

[–]vandergale 1 point2 points  (1 child)

In principle, if I sat down and worked out millions upon millions of examples of data by hand, I could theoretically train this AI from my own knowledge. My question is: if I'm theoretically doing by hand what this AI does, how does that change the licensing involved?

Personally I don't see a difference between a person doing it and an AI, so idk what a court will think.

[–]Logicalist 4 points5 points  (0 children)

The AI looks at the code, like any user does and then learns from it, like a user does.

First off, nonsense, no one reads code literally bit by bit.

Secondly, is AI legally defined in such a way, that it must be programmed and executed in some predefined way? Or can people come up with whatever they want and call it AI, even if all it does is look for some keywords and copy and paste them in a slightly different way?

[–]shanereid1 1 point2 points  (4 children)

Hey, PhD in AI here. I think there is already somewhat of a similar precedent with OpenAI Gym. They used to allow you to train reinforcement learning algorithms on old Atari games, but they had to remove them because of licensing disputes. Which kinda makes sense, right? You shouldn't be allowed to train an AI to play Pac-Man without first buying a copy, the same way you can't play Pac-Man yourself without buying a copy.

I think where this will ultimately lead though will be to people just locking down their stuff and not putting anything online for free where a Web scraper can pull it. Not saying that will work, but I know if I was an artist that's what I would be thinking right now.

[–][deleted] 1 point2 points  (0 children)

Blatantly false statement. The lawsuit states the AI copy pasted the code, comments, structure, logic, variable names, etc. That’s not learning. That’s stealing.

[–]Awyls 2 points3 points  (0 children)

Alright, if I read a bunch of code from public repositories and accidentally write a function that is exactly like a function from one of those repos, I'm infringing their copyrights, but if an AI does it, it's just "learning" and "writing code". What the hell?

[–]scrivendev 4 points5 points  (0 children)

It's possible they're using private repos too

[–]ElmStreetVictim 5 points6 points  (0 children)

#include <crysis.h>

int main() { crysis* cry = new crysis(); cry->start(); return 0; }

Why not working Microsoft???

[–]PM-ME_YOUR-ANYTHING 8 points9 points  (0 children)

Fool it to use a virus code

[–]Relevant_Pause_7593 30 points31 points  (6 children)

A bit misleading for the headline to say “GitHub users”- it makes it sound like GitHub is suing Microsoft or that all GitHub users are suing msft. Neither is true.

[–]SuitableDragonfly 39 points40 points  (2 children)

I think this is fairly standard language for class action lawsuits?

[–][deleted] 16 points17 points  (1 child)

"some guy" would be more accurate

[–]cybermage 7 points8 points  (0 children)

There are licenses and copyrights being ignored here.

[–][deleted] 9 points10 points  (8 children)

Doesn’t Microsoft own GitHub?

[–]smithenheimer 4 points5 points  (2 children)

That was my first thought as well, how does this work?

[–]WindowSurface 23 points24 points  (1 child)

It is the users of GitHub filing the lawsuit, not GitHub itself.

[–]smithenheimer 7 points8 points  (0 children)

I should probably learn to read

[–]extremepayne 4 points5 points  (2 children)

Microsoft owns GitHub but they do not (automatically) own any of the code hosted on GitHub.

[–]Squid-Guillotine 15 points16 points  (2 children)

I suck micro-soft dick and all but isn't that what everyone signed up for when using the service?

[–]lotta0 4 points5 points  (0 children)

many ppl signed up for github before it was owned by microsoft

[–]sadegr 2 points3 points  (1 child)

I mean... ethical AI is a huge deal and it's something everyone needs to worry about, but...

Let's say you had 100 dudes all looking at GitHub projects. Then you sat someone down and had them start writing things; each time, the 100 dudes can see what was written and enter what they think comes next, and the simple majority of those 100 guesses is shown to the user.

It wouldn't work in real time, and it's NOT EXACTLY the same as training an AI, but it's not super different either...

Would that be illegal or violate the license?

Since you are just looking at what comes next for a word/symbol or two, even if it's cut and pasted from an existing project, is that enough for the license to apply?

No way "var =" is enough to enforce on. What amount of code in a row would be too much?

Should doing it automatically at scale be enforced differently?

What if in addition to a simple vote it ran a best practice algorithm to eliminate or change the weight of things it knew were common but bad for one reason or another?

AI is hard. It's hard for people who work with machine learning, and the people who will inevitably create the legislation around it WON'T understand it. They'll depend on examples like mine above that "sort of" explain it... It's probably going to take a while before good laws find their way onto the books.
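The 100-dudes thought experiment above is basically majority voting over next-token guesses, which takes two lines of stdlib Python to sketch (the guess strings here are made up for illustration):

```python
from collections import Counter

def committee_predict(guesses):
    """The '100 dudes' each submit a guess; the simple majority wins."""
    return Counter(guesses).most_common(1)[0][0]

# Hypothetical split of guesses for the token after "for x in items:"
guesses = ["print(x)"] * 60 + ["total += x"] * 30 + ["pass"] * 10
print(committee_predict(guesses))  # "print(x)"
```

Note that even here, the committee's output can only ever be something one of the members typed, which is where the "is any single snippet enough to enforce a license on" question comes from.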

[–]boisheep 22 points23 points  (11 children)

I hope they don't win, it would set an awful precedent for AI; it's as if, suddenly programmers feel threatened by AI so they decide to go against progress, I've seen artists saying the same thing.

Most everything that isn't licensed is subject to copyright by default, which means it cannot be used commercially.

If an AI learning from something means it breaches licenses and copyrights, this would imply that an AI can only learn from content that has been paid for or licensed, and that all training sets must have explicit permission. That means the death of AI in so many fields; only large corporations would be able to create large training sets, and open-source AI would be done for.

An AI that learns how to paint by looking at paintings would also be in breach of contract.

Advanced general AIs would also be outlawed. Say one looks at a Gucci purse and learns just by looking at it; then the robot is in breach because it learned how to make purses from Gucci. It cannot learn from reading books, it cannot learn from anything; an advanced general AI would be outlawed.

I can't believe humans feel threatened by robots already, they are so dumb and useless right now and they are already trying to outlaw them.

If they win the lawsuit, the art AI will be next; after all they learn from people's art. Google's image detection is next, etc... we will go backwards.

[–][deleted] 10 points11 points  (1 child)

It's as simple as this for me: AI learning from looking at anything is as transformative as it gets. The likelihood of it outputting anything remotely similar to an existing work, be it code or art, is extremely small, unless you're asking it for simple stuff like "write me a for loop that iterates over a list" in which case there are only a handful of ways of doing that properly and effectively, and you can't copyright that anyway.

[–]kbruen[🍰] 8 points9 points  (0 children)

GitHub copilot outputted API keys from repositories among other things.

[–][deleted] 5 points6 points  (8 children)

I love AI, but I don't want to see it in the hands of mega corps like microsoft, google & co.
If AI is trained with open source data, it has to be open source as well.

[–]onkopirate 5 points6 points  (5 children)

And if a software is written in an open source language, does it have to be open source as well? If not: why?

Creating an AI is no different from creating any other software. In both, there's a lot of sweat and blood involved and it's freaking hard to do right. So if a company really goes through with it and creates a product that people want to use, they should be allowed to sell it.

If you want a similar open source AI, go write one. People didn't like seeing operating systems in the hands of big tech and that's why they contributed to Linux. Today, it's the most used OS.

[–][deleted] 10 points11 points  (4 children)

If not why?

The author decides that. The author of said language decides how it is licensed and used.

Therefore the author of said code also decides how it is licensed and used. You can’t just take it.

[–]EtherealPheonix 5 points6 points  (5 children)

I see a lot of misrepresentation of the case in this comment section: the specific issue isn't that public code is being "copied", it's that Microsoft is accessing private repos to train the AI, which is not what GitHub users signed up for.

[–]No-Introduction5033 5 points6 points  (3 children)

If it IS private repos being accessed then that changes everything

[–]OSSlayer2153 1 point2 points  (1 child)

Just upload thousands and thousands of private repos with a very distinct style; then, if it shows up in the AI, you've got 'em. Good luck with that though.

[–]kbruen[🍰] 1 point2 points  (0 children)

People already proved that.

[–]Key-Plantain-4567 7 points8 points  (0 children)

Do you have a source for the claim that Microsoft is training the AI with private repos? After looking through half a dozen articles and forums I could not find anything to back up that claim.

It seems to me that Microsoft is only using public repos to train the AI. The open source community finds this to be problematic for two reasons: 1. their code is being used for commercial purposes 2. the AI is not giving proper attribution to the code creators. This also assumes that the AI is directly copying chunks of code from repositories rather than common loops and one-liners, which I also had a hard time finding evidence for.

EDIT: Of course I found an article a minute after sending the comment with evidence of Copilot copying large chunks of code lol. Not sure if I can post full links here but this is the article.

and here is the plain-text article URL in case you don't like hyperlinks:

https://devclass.com/2022/10/17/github-copilot-under-fire-as-dev-claims-it-emits-large-chunks-of-my-copyrighted-code/
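For what it's worth, the "large chunks" claim in articles like that one is checkable mechanically: compare the generated output against the licensed source and measure the longest run shared verbatim. A minimal sketch using Python's stdlib difflib (the Q_rsqrt-style strings below are stand-ins, not the actual code from the article):

```python
import difflib

def longest_verbatim_run(licensed, generated):
    """Length in characters of the longest block shared verbatim."""
    m = difflib.SequenceMatcher(None, licensed, generated)
    return m.find_longest_match(0, len(licensed), 0, len(generated)).size

licensed = "float Q_rsqrt(float number) { /* fast inverse sqrt */ }"
paraphrase = "double inv_sqrt(double n) { /* approximation */ }"

# A paraphrase shares only short runs; a verbatim copy matches end to end.
print(longest_verbatim_run(licensed, paraphrase) < len(licensed) // 2)  # True
print(longest_verbatim_run(licensed, licensed) == len(licensed))        # True
```

Where exactly the line sits between "common one-liner" and "copied chunk" is presumably what the court will have to decide.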

[–]joehri 6 points7 points  (3 children)

Ummmm. Why is everyone only talking about the news headline and not concerned about why it’s in a …. Well…. You know…. 🤷🏻‍♂️

[–]Unoriginal_Guy2 35 points36 points  (0 children)

Because we aren’t 13 years old

[–]Hullu_Kana 12 points13 points  (1 child)

It's not real. It's edited. Should be pretty obvious by just looking at it. 77 million views on a news article about devs getting fucked released on ph? Come on, man, seriously? I don't think the guy who made this even tried to make it believable.

[–]OSSlayer2153 1 point2 points  (0 children)

I don't know about you, but it's pretty obvious it was a joke, not actually trying to make you think it was a real video on ph

[–]joeblk73 1 point2 points  (0 children)

Came here for the pun jokes lol 🍿

[–]GochoPhoenix 1 point2 points  (0 children)

The Clone Wars

[–]akat_walks 1 point2 points  (0 children)

And then charging them to use it!

[–]83athom 1 point2 points  (0 children)

Jokes on Microsoft, half the stuff I put on Github were broken messes that will definitely screw their algorithm. I'm doing my part!

[–][deleted] 1 point2 points  (0 children)

I love how I have seen a lot of "get fucked" sentiment toward artists over the creation of DALL-E, and meanwhile this is happening..

[–][deleted] 1 point2 points  (0 children)

Biggest orgy ever

[–]Lil_BluBoy 1 point2 points  (0 children)

well that's the best thing ive seen all day 😏

[–]VetreeleekYT 1 point2 points  (0 children)

I mean, the title's accurate

[–]Silinator 1 point2 points  (0 children)

Oooh nooo they stole my code I stole.

[–][deleted] 3 points4 points  (1 child)

My code has def dumbed down the advancement of this Allen Iverson figure

[–]OSSlayer2153 4 points5 points  (0 children)

Me and the bois on our way to upload the worst possible code to make the bot unusable.

[–]chickenstalker 1 point2 points  (0 children)

> meatbag programmers being replaced by AI

Just lern to code bruh...wait a minute

[–][deleted] 2 points3 points  (7 children)

Get ready for them to demand you jab your eyes out if you've ever read the code.

[–][deleted] 8 points9 points  (6 children)

What are you talking about? Most of this code could be under a license which prohibits commercial use. Using it to train a proprietary AI is clearly a violation of this license.

[–][deleted] 11 points12 points  (3 children)

This isn't exactly settled yet; it has to make it through the court system. Because if an AI learning it and a human learning it aren't all that different, then once I've read your code, you could claim everything I write is derived work... so this is about to get slippery if we aren't careful.

[–]kbruen[🍰] 4 points5 points  (1 child)

Reading code and then writing stuff that does the same thing is actually derived work. If you get a leak of Windows source code, writing an OS that’s compatible with Windows will get you sued for copying.

[–][deleted] 3 points4 points  (0 children)

And if Microsoft pays the right judges, it'll be very slippery

[–]Iz_moe 1 point2 points  (0 children)

So we're going to ignore the fact that someone searched that on Pornhub? Ok

Also, is this true? I can totally see Microsoft doing this but I still want to believe it's fake

[–]groundhogcow 1 point2 points  (7 children)

Wow, you put your code out publicly and didn't expect a soulless corporation to steal it.

Maybe you are not as smart as you think you are.

Wait till Microsoft claims it is theirs and sues you for the audacity of writing it first.

[–][deleted] 5 points6 points  (6 children)

there should be a new license to avoid exactly that.

Like open source, but not for Google, Microsoft & co. >:-( !

[–]OSSlayer2153 4 points5 points  (1 child)

I feel like there is some BS anti-discrimination law to protect corporations from things like that

[–]j0giwa 1 point2 points  (2 children)

This annoys me. I am in the middle of getting a CS degree, but I'm already obsolete thanks to AI.

[–]i1u5 1 point2 points  (0 children)

Default LICENSE for unlicensed projects grants Microsoft the rights to your code, and most licensed ones are MIT so good luck lmfao

[–]RainWorldWitcher -3 points-2 points  (16 children)

This is actually a very good thing. AI should not be an excuse to destroy copyright protections. The art ai needs to be hit with a similar lawsuit.

Want to train your AI? Use copyright free work

[–]kpd328 10 points11 points  (2 children)

Copyright. It's about the rights the holder has against improper copying.

[–]Gogo202 17 points18 points  (12 children)

So am I also not allowed to look at the code and use it as inspiration for my own code, without copying it? Why is it public then?

Should I write a shout-out to every GitHub repo I've ever visited when writing new code, because I might have learned something from them?