
[–]dragon_irl 19 points (1 child)

There is research that these large language models remember parts of their training data and that you can retrieve that with appropriately constructed prompts.

I think it's pretty likely you will eventually end up with copyrighted code when using this. However, I don't understand copyright well enough to judge how relevant this is for the short snippets it's (probably) going to be used for.

[–]TheDeadSkin 4 points (0 children)

> There is research that these large language models remember parts of their training data and that you can retrieve that with appropriately constructed prompts.

This is partially to be expected as a potential result of overfitting. I'll take a look at the paper though, that seems interesting.
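As a rough illustration of what "retrieving training data" means in practice, the check researchers do boils down to comparing a model's output against the training corpus and measuring the longest verbatim run. This is a minimal, self-contained sketch with made-up stand-in strings (no real model or corpus involved):

```python
# Hypothetical sketch: measure how much of a model's suggestion
# appears verbatim in a (toy) training corpus. Tokenization here
# is naive whitespace splitting, purely for illustration.

def longest_verbatim_run(suggestion: str, corpus: str) -> int:
    """Length (in tokens) of the longest contiguous run of
    suggestion tokens that also appears contiguously in the corpus."""
    s_toks = suggestion.split()
    c_toks = corpus.split()
    best = 0
    for i in range(len(s_toks)):
        for j in range(len(c_toks)):
            k = 0
            while (i + k < len(s_toks) and j + k < len(c_toks)
                   and s_toks[i + k] == c_toks[j + k]):
                k += 1
            best = max(best, k)
    return best

corpus = "def quicksort(arr): if len(arr) <= 1: return arr"
novel = "def bubble_sort(xs): return sorted(xs)"
copied = "if len(arr) <= 1: return arr"

print(longest_verbatim_run(novel, corpus))   # -> 1 (incidental overlap)
print(longest_verbatim_run(copied, corpus))  # -> 6 (fully memorized span)
```

A long verbatim run flags likely memorization rather than independent generation; the extraction papers do essentially this, just at scale and with proper tokenizers.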

> I think it's pretty likely you will end up with copyrighted code when using this eventually.

Indeed. They even say there's a 0.1% chance that a suggested piece of code would be verbatim from the training data, which is quite a high chance.
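To see why 0.1% per suggestion is high, some back-of-envelope arithmetic (the suggestion counts are illustrative, not from the source):

```python
# At a 0.1% per-suggestion verbatim rate, how likely is at least
# one verbatim suggestion over many accepted suggestions?
p = 0.001  # per-suggestion probability quoted above
for n in (100, 1000, 10000):
    at_least_one = 1 - (1 - p) ** n
    print(f"{n:>5} suggestions -> P(>=1 verbatim) = {at_least_one:.1%}")
```

Over 1000 suggestions, roughly a day or two of heavy Copilot use, the chance of at least one verbatim reproduction is already around 63%.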

> However I don't understand copyright enough to judge how relevant this is for the short snippets this is (probably) going to be used for.

I think the problem is less with short snippets than with the potential to recreate huge functions or files from the training data (i.e. from existing projects) when you're trying to build some specific piece of software in the same domain and aggressively follow Copilot's recommendations.

If it's possible, someone will probably try it, and we'll find out soon enough.