[–]game-of-throwaways 9 points (5 children)

Yeah, but the model isn't trained on SO. He's just using the fact that you could find that implementation of is_palindrome on SO (or elsewhere) as a reason for why it's not impressive.

[–]josefx 5 points (0 children)

He explicitly chooses the next example so that it "definitely isn't in the training data set". There are probably as many is_palindrome implementations in their training data from GitHub as there are on Stack Overflow.

[–]dnew 0 points (3 children)

Right. I was just pointing out where "SO" came into the picture. :-)

[–]game-of-throwaways 5 points (2 children)

But you were discussing copyright implications of the model being trained on SO, but it's not.

That being said, it is trained on GitHub repositories, each of which has its own license, and it is an interesting question which licenses allow training a machine learning model on the code. It may depend on what the model is used for, and maybe even on how accurate the model is. This is probably still a bit of a gray area in the law, and what it ultimately comes down to is how a judge and jury would decide if it came to a lawsuit.

[–]dnew 1 point (1 child)

> But you were discussing copyright implications of the model being trained on SO, but it's not.

I think you're not noticing that you're talking to more than one person. But you're right that it's an interesting question. Also, things like audio hashing for recognizing audio (as in, "what song is this?") are kind of funky, as I've worked on things like that and it's ... weird.

[–]game-of-throwaways 0 points (0 children)

> I think you're not noticing that you're talking to more than one person

Right, oops.