this post was submitted on 30 Jan 2025

Technology


So taking data without permission is bad, now?

I'm not here to say whether the R1 model is the product of distillation. What I can say is that it's a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

The company is currently involved in several high-profile copyright infringement lawsuits, including one filed by The New York Times alleging that OpenAI and its partner Microsoft infringed its copyrights and that the companies provide the Times' content to ChatGPT users "without The Times’s permission or authorization." Other authors and artists have suits working their way through the legal system as well.

Collectively, the contributions from copyrighted sources are significant enough that OpenAI has said it would be "impossible" to build its large-language models without them. The implication is that copyrighted material had already been used to build these models long before OpenAI ever struck its licensing deals with publishers.

The filing argues, among other things, that AI model training isn't copyright infringement because it "is in service of a non-exploitive purpose: to extract information from the works and put that information to use, thereby 'expand[ing] [the works’] utility.'"

This kind of hypocrisy makes it difficult for me to muster much sympathy for an AI industry that has treated the swiping of other humans' work as a completely legal and necessary sacrifice, a victimless crime whose benefits are so significant and self-evident that it wasn't even worth having a conversation about it beforehand.

A last bit of irony in the Andreessen Horowitz comment: There's some handwringing about the impact of a copyright infringement ruling on competition. Having to license copyrighted works at scale "would inure to the benefit of the largest tech companies—those with the deepest pockets and the greatest incentive to keep AI models closed off to competition."

"A multi-billion-dollar company might be able to afford to license copyrighted training data, but smaller, more agile startups will be shut out of the development race entirely," the comment continues. "The result will be far less competition, far less innovation, and very likely the loss of the United States’ position as the leader in global AI development."

Some of the industry's agita about DeepSeek is probably wrapped up in the last bit of that statement—that a Chinese company has apparently beaten an American company to the punch on something. Andreessen himself referred to DeepSeek's model as a "Sputnik moment" for the AI business, implying that US companies need to catch up or risk being left behind. But regardless of geography, it feels an awful lot like OpenAI wants to benefit from unlimited access to others' work while also restricting similar access to its own work.


I'm a software engineer, and I've been playing guitar nearly every day since I was 8 years old. Everything I own and am able to release, I release under GPL/AGPL or CC-BY-SA. Heck, I'm racking my brain every day trying to come up with ideas that can hopefully earn me a living while still giving away everything I have. I don't want to own my shit, man. I just want to share what I have and hope it's useful, and since I don't want people being assholes with it, I opt for copyleft instead of permissive licenses.