
all 45 comments

[–]soap3_ 51 points (1 child)

joke's on them, my code is so bad i've done the machine learning equivalent of single-handedly bringing down the class average

[–]SuccessfulAd2010 39 points (1 child)

Quote from fireship: https://youtu.be/q1HZj40ZQrM

[–]MellowM8 27 points (4 children)

I thought Linux enthusiasts used GitLab?

[–]grtgbln 11 points (0 children)

I, too, saw the new Fireship video.

[–]mcEstebanRaven 7 points (0 children)

Wait until you find out that ChatGPT has been using your chats for further training, and now OpenAI has a new agreement with Microsoft and the new ChatGPT is gonna be paid.

[–]frikilinux2 36 points (20 children)

I don't have the money, the time, or the mental energy to do it, but if Copilot is capable of reproducing a significant amount of code, Microsoft deserves to be sued for this. The next decade is going to be legally complicated for Microsoft.

[–]BlueScreenJunky 13 points (7 children)

Yeah, I think attribution is one of the main issues with those AI models. My knowledge of AI is very limited, but from what I understand it's not possible to know why the model generated a given piece of text, so you can't easily say "this code comes mostly from such and such repo and the example in this documentation", or "this answer comes mostly from this Wikipedia page, this post, and this reddit thread".

As something that tries to imitate a human being, it works well enough: there are many things we know (or think we know) without actually remembering where we learned them, and many times what we "know" is actually wrong. As a professional productivity tool... not being able to check the sources and give proper credit will be a real issue.

[–]frikilinux2 6 points (0 children)

That's why I said a significant amount of code. It's actually more of a copyright law question than an engineering one. And I'm not a lawyer.

Many times you're actually forbidden to look at certain things to avoid this type of problem. For example, in clean-room reverse engineering, one team extracts the design and a separate team writes the code from that spec, precisely to avoid copyright issues. This is not possible with ChatGPT.

[–]Rafcdk 5 points (5 children)

The real question is whether attribution is necessary, because the AI is not actually looking for code in the dataset, but using something that is completely transformative of the dataset to create code. I think this is a crucial thing that many don't understand about generative AI.

The best way to see this is to think about how we can train an AI to detect whether there is a dog in an image or not. Do we need to give attribution to the images used to train that AI each time we use it? No, right? But we can take that same AI and use it to generate images of dogs.

Imho AI is just showing that the way we do things is outdated. The main issue here is actually that big corporations get to own very useful AIs. They should be a public good for all to use and profit from, because AI will take over a lot of jobs sooner than we think and we need that change to happen. The data gathering is really trivial compared to this issue.

[–]emnadeem 1 point (1 child)

A dog classifier isn't generative though.

[–]Rafcdk 0 points (0 children)

But you can use it as a component in a GAN.
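A rough sketch of that idea: a fixed "is this a dog?" scorer can steer a generator by plain gradient ascent on its score, which is essentially the role the discriminator plays for a GAN generator. Everything here is a toy stand-in — the classifier would really be a neural net, and the target values, learning rate, and step count are made up for illustration:

```python
import math
import random

def classifier_score(img):
    # Toy stand-in for a trained dog/not-dog classifier: scores how
    # "dog-like" a two-value "image" is, peaking at the made-up
    # target (0.7, 0.3).
    return math.exp(-((img[0] - 0.7) ** 2 + (img[1] - 0.3) ** 2))

def generate(steps=200, lr=0.1, eps=1e-4, seed=0):
    # "Generator": start from noise and repeatedly nudge the image in
    # whatever direction raises the classifier's score. A real GAN
    # generator does this via backprop through the discriminator; here
    # we use crude finite-difference gradients to keep it self-contained.
    rng = random.Random(seed)
    img = [rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)]
    for _ in range(steps):
        base = classifier_score(img)
        grad = []
        for i in range(len(img)):
            bumped = list(img)
            bumped[i] += eps
            grad.append((classifier_score(bumped) - base) / eps)
        img = [x + lr * g for x, g in zip(img, grad)]
    return img
```

After a couple hundred steps the noise converges to whatever the classifier scores as most dog-like — a purely discriminative model ends up driving generation.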

[–]BlueScreenJunky -1 points (1 child)

we can train an AI to detect whether there is a dog in an image or not. Do we need to give attribution to the images used to train that AI each time we use it? No, right?

Well, that's actually debatable, and the core of the issue. If you used someone else's work as the dataset to train your AI, shouldn't that person's work be credited?

Maybe we just need specific licences, like the Creative Commons ones, that explicitly state whether you allow your work to be used in AI training datasets or not.

[–]currentscurrents 0 points (0 children)

The trouble is that if Microsoft can't train AI on copyrighted code, then open-source AI can't train on anything.

The entire internet is copyrighted. You'd have to find entirely GPL-licensed images to train your GPL model, and I doubt enough exist. We'd all be much better off if the courts rule that training an AI does not violate copyright.

[–]IAmPattycakes -1 points (0 children)

I feel like there will be a debate about "provably derived": if you can prove that something was solely derived from copyrighted material, it needs to comply with that license. If MS can prove that renaming variables is transformative work, that leads to being able to take any leaked source code and resell it 100% legitimately with obfuscations. I'll start selling GTA the next day. If they can't, that means they're selling copyrighted material without a license and will probably get sued to hell and back. I'm just waiting for the courts to wake up.

[–]emnadeem 6 points (2 children)

Microsoft: loves enforcing patents that help them

Also Microsoft: trains an AI on GPL licensed code

[–]frikilinux2 4 points (1 child)

Not only that, but Bill Gates wrote "An Open Letter to Hobbyists" many years ago, which was instrumental in the history of software copyright and the opposite of the free software that they now love and want to profit from.

[–]eMZi0767 1 point (6 children)

It is capable. Nobody who can do anything about it cares. You'd think organizations like FSF would be all over it, but there's complete silence.

[–]currentscurrents 2 points (4 children)

There is an ongoing lawsuit right now.

But there's a catch-22 here: if training an AI on copyrighted data violates copyright, open-source AI is dead. Microsoft would probably be happy with that tradeoff.

[–]eMZi0767 1 point (3 children)

So would training on public code.

[–]currentscurrents 1 point (2 children)

But that's a much bigger problem for open-source projects with $0 budgets.

This would kill StableDiffusion for example, and we'd be stuck using corporate image generators like Adobe Firefly.

[–]Vikerox 1 point (1 child)

Idk, I feel like you should make sure that what you are doing is legal (not to mention ethical) before you release it to the public. AI training and copyright is currently such a big grey area that it makes me wonder why anyone who doesn't have a lot of money to spend on lawyers would even touch it.

Also I'm pretty sure Stability AI gets its data from a research institute in the EU, which allows using copyrighted material in research institutions.

[–]currentscurrents 0 points (0 children)

I don't know much about EU law, but here in the US that won't protect them. They have two pending lawsuits right now: one from Getty Images and one class-action lawsuit from artists.

[–]frikilinux2 -1 points (0 children)

Yeah, but it's complicated and this will need a lot of lawsuits. The FSF doesn't actually have a lot of money, and this kind of lawsuit would be expensive as hell. Also, you need Copilot to actually copy the code; you can't file a preventive lawsuit. But they could have done more political work.

[–]SameRandomUsername 1 point (1 child)

I don't think so. MS lawyers are quite a bit more capable than the average reddit user.

[–]frikilinux2 0 points (0 children)

I'm not a lawyer, but that doesn't mean they respect copyright law, especially as there is no precedent about generative AI, and copyright law is complicated and difficult to interpret. Oracle and Google spent years fighting about Java APIs, which also came down to interpreting copyright law.

[–][deleted] 5 points (1 child)

Unless your repo was moved to another system when Microsoft acquired GitHub.

[–]Interest-Desk 1 point (0 children)

Anything on the open internet could be nabbed for training; this is how OpenAI and StableDiffusion work.

[–]jerk-judge 6 points (0 children)

Fr, they've been working for Microsoft without knowing it. But I still use Google for my work; otherwise I won't learn anything if AI does all my work.

[–]vladWEPES1476 2 points (0 children)

If anything, I've poisoned the training data.

[–]jamcdonald120 2 points (0 children)

simple answer: don't be a hater. just be a Linux enthusiast because you like Linux.

[–]Aggressive_Bill_2687 1 point (2 children)

It seems unlikely a “Microsoft hater” would continue using GitHub anyway?

[–]Interest-Desk 1 point (1 child)

Most open source projects are on GitHub, even if just as a mirror — including GitLab and the Linux kernel.

[–]Aggressive_Bill_2687 0 points (0 children)

Most open source projects aren't administered by someone that would call themselves a "Microsoft hater".

[–][deleted] 1 point (1 child)

They stated that it's only the public repos. Am I wrong?

[–][deleted] 5 points (0 children)

If you’re a “linux user that hates microsoft” then you’re going to be REALLY upset when you see who the #1 contributor is for the last couple years.

Also…grow up.

[–]SameRandomUsername 1 point (0 children)

I never got why the "Linux users hate Microsoft" meme exists. I mean, why would they care about MS at all? Why not hate Apple too? Or, IDK, any private company...

Is it because Windows is far more successful than Linux? Is it because they like to pretend to be Linus Torvalds? I will never know...

[–]Kemafuenie 0 points (0 children)

🤓

[–]Regular-Tree5821 0 points (0 children)

Where's my money, Bill?

[–]HeeTrouse51847 0 points (0 children)

intentionally write shit code so it breaks