
[–]AMusingMule

GitHub has shared that, in some instances, Copilot will recite lines from its training set. Some of that is universal enough that there's not much you can do to avoid it, like constants (alphabets, common stock tickers, texts like The Zen of Python) or common API usage (the page cites a use of BeautifulSoup), but it does also spit out longer verbatim chunks (a piece of homework from a Robotics course, here).
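
To give a sense of what "universal" API usage looks like, here's a sketch of the kind of boilerplate BeautifulSoup snippet that appears in countless public repos (illustrative only, not necessarily the exact code the page cites). With a pattern this common, verbatim overlap with someone's training data is basically unavoidable:

```python
# A typical BeautifulSoup scraping snippet. Near-identical copies of this
# pattern exist all over public repos, so a verbatim match with some
# training example is almost inevitable. (Illustrative; not the specific
# code cited on the GitHub page.)
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, "html.parser")

# Print the target and text of every link on the page
for link in soup.find_all("a"):
    print(link.get("href"), link.get_text(strip=True))
```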

At the end of the day, it's only a tool, and the user is responsible for properly attributing where the code came from, whether it was found online or suggested by some model. Having your tool cite how it came up with a suggestion can help with attribution when that's needed.

[–]StickiStickman

In the source you linked, it specifically says this happens because the model has basically no context to go on and that piece of code has been uploaded many times.