Open Source

33091 readers

All about open source! Feel free to ask questions, and share news, and interesting stuff!

Rules

Posts must be relevant to the open source ideology
No NSFW content
No hate speech, bigotry, etc

Community icon from opensource.org, but we are not affiliated with them.

founded 5 years ago

MODERATORS

[email protected]

Should GitHub Be Sued For Training Copilot On GPL Code? (fosspost.org)

submitted 3 years ago by [email protected] to c/[email protected]

top 13 comments

[–] [email protected] 27 points 3 years ago (1 children)

GitHub’s current CEO said that from their point of view, they see this as a part of “fair use”

Of course they will argue that, otherwise they would have to scrap the whole project. But it doesn't matter what they say; I think only the authors of the code can answer it. So far it looks mainly like a way to get around open source licenses and use the code in proprietary projects. At the very least, they should make it opt-in for devs to have their code analyzed.

However, there is nothing that prevents anyone from doing the same for free.

Absolutely not true, it takes some very specific knowledge and a team of developers to implement this. Something which can only be done by large companies.

Is anyone actually planning to sue them? I would definitely support that, because my own code is likely affected.

[–] [email protected] 11 points 3 years ago (1 children)

I completely agree. Opt-in would be the way to go if you want to do that, but they know that participation would be insanely lower than just using the whole code.

[–] [email protected] 3 points 3 years ago (1 children)

Well, they could've included only permissively-licensed source code. They generally have that information.

[–] [email protected] 2 points 3 years ago (1 children)

You really mean seeing the project as a derivative of the code used to train the bot, right? In that case, even permissive licenses usually require citing the author. In fact, in Europe, even if not stated in the license, the author can never lose their right to attribution.

I guess to refine your solution, every work built with the help of Copilot should credit "Copilot contributors" à la OSM.

[–] [email protected] 1 points 3 years ago

Yeah, good point. To really get it right, they would have to paste each snippet with a full copyright+license header attached. If the dev then removes that information, they're not at fault.

But yeah, it really feels more and more stupid, the more I think about it, to build a commercial tool that algorithmically reproduces copyrighted works with the copyright information removed.

[–] [email protected] 18 points 3 years ago* (last edited 3 years ago)

Lots of examples of Copilot regurgitating code verbatim: Quake's fast inverse sqrt (GPL), copyright headers or the entire GPL license, someone's "about me" page... This should be enough to convince anyone that, even when they get it to stop proposing "obviously stolen" code (e.g. rename variables a bit, propose code without names in it), it is still all stolen code.

[–] [email protected] 18 points 3 years ago

It's pretty unfortunate that the author of the article took a "who cares, if it's open source just let anyone use it as they wish" attitude. They seem ignorant of the history of why these strong copyleft licenses became necessary in the first place: to protect open source from corporate subversion and co-optation.

[–] [email protected] 14 points 3 years ago

GitHub should be sued for a lot of things IMO

[–] [email protected] 10 points 3 years ago

Yes.

[–] [email protected] 9 points 3 years ago* (last edited 3 years ago)

I definitely feel like their fair use argument won't hold in court.

When you tell someone about a song and show them a 5-second-snippet, that's fair use.

But if you instead send someone that 5-second-snippet with the copyright information removed, and then even with the suggestion to include that snippet in their own song, that's you infringing the copyright, not the person that innocently included your sample.

[–] [email protected] 6 points 3 years ago

Yes.

[–] [email protected] 5 points 3 years ago

This thing is not much more than a database of all the data it has seen. That is far from "just like a human who reads various books". If they want to make it useful for "humanity" they should release it under the GPL, though that would probably be complicated given the mix of licenses involved.

This is in essence similar to Clearview AI. Data/code is published with a specific intention, which in the case of code is reflected by the license. It should not be used outside of that intention, which for the GPL clearly states that the author does not want their code used as part of commercial software.