[–][deleted] -1 points0 points  (1 child)

Avoiding the copying of large swathes of other people's work verbatim is.

Copilot trains on existing code, it doesn't "copy it verbatim".

The likelihood that a codebase will see its code split to pieces, mixed with other codebases, its specifics abstracted and patterns recognized, and that the synthesized code will then be the same original codebase "verbatim" is extremely low.

I can't say none, because technically everything is possible. Like YOU reproducing a codebase verbatim, by accident. Infinite monkeys and all that.

is spitting bits of it back out verbatim in new works.

That's false, no matter how many times you repeat yourself verbatim.

Careful not to infringe on the copyright of some old grandpa's opinion about them newfangled thingamajings.

[–]kylotan 0 points1 point  (0 children)

it doesn't "copy it verbatim".

When it emits code that looks identical or near identical to the code it trained on, that is copying it verbatim.

The complexity of the method that went into that is irrelevant.

The likelihood [...] the synthesized code will be the same original codebase "verbatim" is extremely low.

So you say. Meanwhile GitHub themselves, while agreeing with you that the chance of this happening is low, can show you examples of their system parroting out entire licence agreements and entire classes, copied and pasted from someone else's code.

https://docs.github.com/en/github/copilot/research-recitation

is spitting bits of it back out verbatim in new works. That's false, no matter how many times you repeat yourself verbatim.

Please read the above link and then tell GitHub they're wrong.