this post was submitted on 02 Oct 2023
[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

Let me replicate the point of contention on this topic:

IANAL so correct me on this - copyright currently protects the expression of works, with the exception of fair use. So let's ignore fair use for now and focus instead on "expression is copyrighted, idea is fair game".

Let's look at how AIs work. AIs that generate text are usually called "LLMs" (large language models). In training, they are shown some text and then predict either the response or the next word. They then get to look at the right answer and learn how to improve in that specific scenario. The way they learn, in its simplest form, is by taking the previous text, doing some math on it with specific weights, and then adjusting those weights. We're talking arbitrary math and arbitrary decimals for the most part. So imagine that on your hard drive the AI model looks like some metadata (a blueprint of where the weights are and how they interact) plus the numbers attached to the weights (this is the trained bit).
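To make the "predict, compare, adjust the weights" loop concrete, here is a toy sketch in Python. It is an assumption-heavy illustration, not how real LLMs are built: real models use neural networks with billions of weights and work on tokens rather than whole words, and the names `train_next_word_model` and `predict_next` are made up for this example. What it does show is that the thing you end up saving is just a table of decimals.

```python
import math

def train_next_word_model(text, epochs=300, lr=0.1):
    """Toy next-word predictor (hypothetical helper, for illustration only)."""
    words = text.split()
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # The "trained bit": one number per (previous word, next word) pair.
    weights = [[0.0] * n for _ in range(n)]
    for _ in range(epochs):
        for prev, nxt in zip(words, words[1:]):
            row = weights[index[prev]]
            # Predict: turn the weights for this word into probabilities (softmax).
            exps = [math.exp(w) for w in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Compare with the right answer and nudge each weight accordingly.
            for j in range(n):
                target = 1.0 if j == index[nxt] else 0.0
                row[j] -= lr * (probs[j] - target)
    return vocab, weights

def predict_next(vocab, weights, word):
    """Return the word with the highest weight after `word`."""
    row = weights[vocab.index(word)]
    return vocab[row.index(max(row))]

vocab, weights = train_next_word_model("the cat sat on the mat and the cat slept")
print(predict_next(vocab, weights, "the"))
print(weights[vocab.index("the")])  # just a list of decimals, no text in it
```

Note that the training sentence itself is never stored: only `vocab` (the blueprint side) and `weights` (the numbers) survive, which is the distinction the legal argument below turns on.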

Under current copyright law you would need to prove that these numbers either specifically represent the expression of a book itself, or that, in tandem with the rest of the AI, they give it the ability to replicate the book's expression closely enough to be a substitute for the book.

The former is probably impossible to argue, as these numbers by their very nature and on their own don't represent the book. For one, the numbers represent what to do with a given set of inputs; for another, any particular section of weights in the AI model was shaped by a wide range of books and texts, not by one book alone.

Now the latter argument is interesting. I am not a lawyer, so no clue how one would argue this in court, but there is a point to be made that some of the expression of a book is reproduced in the output of the AI. This doesn't look to me like something traditional copyright can measure, but there's certainly an argument here that it deserves protection.

This is the point of contention. And every time people say "this should be easy": no, it shouldn't. Law is hard, and the technical details of AI are even harder. I dumbed down a lot of topics here to make them easier for a layperson to understand, but ask experts and they can talk about the wrinkles for days.

Hopefully this helps some people understand the vast majority of the issues. Please correct me in the comments, or give me your best arguments; I would love to see all the facets of this.
