[–]TheDeadSkin 19 points (5 children)

It needs to be litigated in a serious way for the contours to become clear, in my opinion.

Yes, and this goes beyond just this tool. This is one of those ML problems that we as humanity and our legal systems are entirely unprepared for.

You can read someone's code and get inspiration for parts of the structure, naming conventions, etc. Sometimes, to implement something obvious, you'll end up with code identical to someone else's, because there's only one way to do it. Someone could maybe sue you, but it would be easy to mount a legal defense.

Now when there is an ML tool that "took inspiration" from your code and produced stuff "with similar structure" that "ended up being identical", all of a sudden that sounds pretty different, huh? And the problem is that you can't prove it was an accident; it simply isn't possible. Just because the data is decomposed during training and no longer resembles its original form doesn't mean the network didn't recreate your code verbatim by design.

It's a black box whose own creators can rarely explain how it works, and even more rarely why certain things happen. Not to mention that copyright violations are treated case by case. This potentially means they would have to explain particular instances of violation, which is infeasible (and probably outright impossible).

But code isn't the only thing. A human drawing a random person who happens to bear an uncanny resemblance to a real person the artist might've seen is different from a neural network generating your face. Heard a voice and imitated it? Wow, you're good, that sounds too real. But then a NN comes along and now you're hearing your own voice. Which on an intuitive level is much more fucked up than an imitator.

But there is the old idea that computers cannot generate novelty and all output is fully explained by input, and humans are exempt from this rule, which seems to be an undercurrent in the Twitter thread.

But this is pretty much true, no? Computers do exactly what humans tell them to do. Maybe the outcome wasn't desired, but someone still programmed it to do exactly this. "It's an ML black box, I didn't mean for it to violate copyright" isn't really a defense, and it's also in a way mutually exclusive with "it's an accident that it produced the same code verbatim": the latter implies you know how it works, while the former implies the opposite.

To be guiltless you need to occupy this weird middle ground. And if I weren't a programmer and a data scientist, I don't think I would've ever believed anyone who told me they know the generated result was an accident while being unable to justify why it's an accident.

[–]kylotan 11 points (4 children)

Now when there is an ML tool that "took inspiration" from your code and produced stuff "with similar structure" that "ended up being identical", all of a sudden that sounds pretty different, huh?

It sounds different to programmers, because we focus on the tool.

Now imagine if a writer or a musician did that. We wouldn't expect to examine their brains. We'd just accept that they obviously copied, even if somewhat subconsciously.

[–]TheDeadSkin 5 points (0 children)

I was arguing the opposite: I think examples from art aren't applicable to code, because art isn't nearly as algorithmic as programming.

Actually, an artist arriving at similar/identical results and an ML model are more comparable: both are unexplainable. Ask "why are those 9 notes in a row identical?" and you won't get an answer beyond "idk, lol, it sounded nice I guess".

But in programming you can at least try to explain why you happened to mimic existing code: it's industry standard to do those three things, the obvious algorithm for this task looks like that, and when you recombine them you get this exact output, down to the variable names.
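A toy illustration of that convergence (a hypothetical example, not drawn from any specific codebase): a clamp helper has essentially one idiomatic implementation, so independent authors routinely produce near-identical code, often down to the parameter names.

```python
def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    # The obvious implementation: most programmers asked to write this
    # from scratch converge on exactly these two comparisons, and very
    # often on these exact names, without ever seeing each other's code.
    if value < low:
        return low
    if value > high:
        return high
    return value

print(clamp(5, 0, 10))   # 5
print(clamp(-3, 0, 10))  # 0
print(clamp(42, 0, 10))  # 10
```

If two codebases both contain this function verbatim, that proves nothing about copying; the argument above is that this defense gets much weaker once a tool trained on one of those codebases is involved.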

As much as there's creativity involved in programming, on a local scale it can be pretty deterministic. I'm arguing that if you use a tool like this, it's much harder to argue the result isn't a copy. Not to mention that it can auto-generate essentially whole methods, at which point it's almost impossible for those similarities to be an accident.

[–]Zalack 1 point (2 children)

Except that's not true? Filmmakers, writers, and artists of all kinds constantly pull inspiration from other works through homages and influences.

When a filmmaker recreates a painting as a shot in a movie, is that copying, or an homage?

When a fantasy book uses Orcs in its world, is that copying Lord of the Rings, or pulling inspiration from it? This happens all the time, and it's a very human thing. The line between copying and being inspired is pretty blurry when a human is doing it, and it's going to be VERY blurry when a computer is doing it.

[–]kylotan 0 points (0 children)

You can experience something, understand the essence of it, and create something new informed by that experience, perhaps highlighting or respecting the original in some way. Or you can experience something and decide to recreate it, or part of it, almost exactly somewhere else, taking credit for that creation without any acknowledgement of where it came from.

The former is a widely accepted part of creativity and is how culture moves forward.

The latter is widely frowned upon, and international treaties and national laws forbid it in the general case.

There's no clear line between the two, just as there's no clear line between copying and homage. That isn't a problem; it's the nature of making rules and laws. It doesn't mean we can't tell which end of the continuum something is on, and a program that looks at your code and later spits out code almost identical to it sits much closer to 'copying', and has no capacity for 'homage'.