This post was submitted on 28 May 2025
125 points (95.6% liked)

Technology

top 19 comments
[–] [email protected] 59 points 6 days ago* (last edited 6 days ago) (1 children)

In AI model collapse, AI systems trained on their own outputs gradually lose accuracy, diversity, and reliability. This occurs because errors compound across successive model generations, leading to distorted data distributions and "irreversible defects" in performance. The final result? A 2024 paper in Nature stated, "The model becomes poisoned with its own projection of reality."

A remarkably similar thing happened to my aunt who can't get off Facebook. We try feeding her accurate data, but she's become poisoned with her own projection of reality.
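The compounding-error dynamic described above can be sketched as a toy simulation: repeatedly fit a simple "model" (here just a Gaussian's mean and standard deviation) to samples drawn from the previous generation's model. This is only an illustrative sketch, not the setup from the Nature paper; with small sample sizes, the fitted variance drifts away from the real data's variance, which is the loss of diversity the comment describes.

```python
# Toy sketch of model collapse: each "generation" is a Gaussian fitted
# to samples drawn from the previous generation's model. Estimation
# error compounds across generations, so the learned distribution
# drifts away from the original data. Illustrative only.
import random
import statistics

def train(data):
    """'Train' a model by fitting mean and standard deviation."""
    return statistics.mean(data), statistics.stdev(data)

def sample(model, n, rng):
    """Generate n synthetic outputs from the fitted model."""
    mu, sigma = model
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(42)
real_data = [rng.gauss(0.0, 1.0) for _ in range(200)]
model = train(real_data)

variances = []
for gen in range(30):
    synthetic = sample(model, 50, rng)  # small sample -> compounding error
    model = train(synthetic)            # train only on own outputs
    variances.append(model[1] ** 2)

print(f"variance at gen 1: {variances[0]:.3f}, at gen 30: {variances[-1]:.3f}")
```

Each generation's variance is a noisy multiplicative step on the previous one, so over many generations the learned distribution tends to narrow rather than stay pinned at the true variance of 1.0.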

[–] [email protected] 12 points 6 days ago (1 children)

It's such an easy thing to predict happening, too. If you did it perfectly, it would, at best, maintain an unstable equilibrium and just keep the same output quality.

[–] [email protected] 5 points 6 days ago* (last edited 6 days ago)

Unstable, yes. Equilibrium... no.

She sometimes maintains coherence for several responses, but at a certain point, the output devolves into rants about how environmentalists caused the California wildfires.

These conversations consume a lot of energy and provide very limited benefit. We're beginning to wonder if the trade-offs are worth it.

[–] [email protected] 73 points 6 days ago (1 children)

What all this does is accelerate the day when AI becomes worthless.

It was always worthless. Or at least, it was always worthless to think that LLMs were a substitute for reasoning AI, which is what many people appear to have been suckered into believing.

[–] thatonecoder 5 points 6 days ago

Yeah… I have tried LLMs, and they hallucinate horribly. For instance, when I tried to “teach” one about Hit Selecting in Minecraft, I used the name of a player who uses it (EREEN) as an example, and it kept corrupting the name to EREEEN. Even when I clarified, it kept doing it, forever.

[–] [email protected] 68 points 6 days ago (1 children)

Google Search has been going downhill for way longer than a few months. It's been close to a decade now.

[–] [email protected] 29 points 6 days ago (3 children)

TBF, SEO and other methodologies that game the rankings muddy the waters and make it harder to get to what you are looking for.

[–] [email protected] 12 points 6 days ago (2 children)

That is not the problem, though. Google used to just give you the results containing what you searched for; the problem started when they tried to be "smarter" than that.

[–] [email protected] 7 points 6 days ago

That has never been true for Google. That's what other search engines did in the late 90s, and Google's success comes precisely from implementing smart ranking rather than just being a directory.

They were also early adopters of semantic search using NLP and embeddings, way before LLMs became popular.
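The difference between literal keyword matching and the embedding-based semantic search mentioned above can be sketched with toy vectors. The "embeddings" here are hand-made three-dimensional vectors, not output from any real model; the point is only that cosine similarity can rank a document highly even when it shares no literal keywords with the query.

```python
# Keyword match vs. embedding similarity, with hand-made toy vectors
# standing in for real embeddings. Illustrative sketch only.
import math

docs = {
    "laptop battery drains fast":       [0.90, 0.10, 0.00],
    "notebook power consumption high":  [0.85, 0.15, 0.00],
    "chocolate cake recipe":            [0.00, 0.05, 0.95],
}

def keyword_match(query, doc):
    """Count shared literal terms between query and document."""
    return len(set(query.split()) & set(doc.split()))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query = "laptop battery drains fast"
query_vec = [0.90, 0.10, 0.00]

for doc, vec in docs.items():
    print(f"{doc!r}: keywords={keyword_match(query, doc)}, "
          f"cosine={cosine(query_vec, vec):.3f}")
```

"notebook power consumption high" shares zero keywords with the query, yet its vector sits almost on top of the query's, which is exactly the kind of match pure term lookup can never make.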

[–] [email protected] 4 points 6 days ago (1 children)

That's never really been true. It's a cat and mouse game.

If Google actually used its 2015 or 2005 algorithms as written, but on a 2025 index of webpages, that ranking system would be dogshit because the spammers have already figured out how to crowd out the actual quality pages with their own manipulated results.

Tricking the 2015 engine using 2025 SEO techniques is easy. The problem is that Google hasn't actually been on the winning side of properly ranking quality for maybe 5-10 years, and quietly outsourced its search ranking to the ranking systems of the big user sites: Pinterest, Quora, Stack Overflow, Reddit, even Twitter to some degree. If there's a responsive result and it ranks highly on those user-voted sites, then it's probably a good result. And they got away with switching to that methodology just long enough for each of those services to drown in their own SEO spam, so that those services are all much worse than they were in 2015. And now ranking search based on those sites no longer yields good results.

There's no turning back. We need to adopt new rankings for the new reality, not try to return to when we could still get good results.

[–] [email protected] 2 points 6 days ago (1 children)

I am not talking about SEO and ranking crap. I am talking about the fact that Google e.g. decides that it is smarter than me when I say I want this exact string while searching for an error message and tries to replace all kinds of parts of it. I am talking about Google replacing terms in my search with other terms because it thinks those are more popular even though the term I used was e.g. the name of a software that is deliberately not the common word Google replaced it with. I am talking about Google replacing actual search results with AI summaries.

[–] [email protected] 3 points 6 days ago

I share your frustration. I went nuts about this the other day. It was in the context of searching on a Discord server rather than Google, but it was so aggravating because of how the "I know better than you" attitude is everywhere in tech nowadays. The Discord server was a reading group, and I was searching for discussion of a recent book they'd studied, by someone named "Copi". At first I didn't use quotation marks, and my results were swamped with messages containing the word "copy". At that point I was fairly chill and just added quotation marks to my query to emphasise that it definitely was "Copi" I wanted. I was still swamped with "copy" messages, and it drove me mad because there is literally no way to say "fucking use the terms I give you and not the ones you think I want". The software name you mention is a great example of when that ability would be really useful.

TL;DR: Solidarity in rage
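The "use exactly the terms I give you" behaviour both comments are asking for can be sketched as a quote-aware filter: double-quoted phrases must match verbatim and case-sensitively, while unquoted terms match loosely. This is a hypothetical helper for illustration, not how Discord's or Google's search actually works.

```python
# Quote-aware filtering sketch: quoted phrases match verbatim
# (case-sensitive), unquoted terms match case-insensitively.
# Hypothetical helper, not any real engine's implementation.
import re

def matches(query, text):
    exact = re.findall(r'"([^"]+)"', query)        # quoted phrases
    loose = re.sub(r'"[^"]+"', " ", query).split()  # everything else
    return (all(phrase in text for phrase in exact)
            and all(term.lower() in text.lower() for term in loose))

messages = [
    "I'll send a copy tomorrow",
    "Copi argues in chapter 2 that...",
]
hits = [m for m in messages if matches('"Copi" chapter', m)]
print(hits)
```

Under this rule, the quoted "Copi" never matches "copy", which is precisely the distinction the stemming-happy search widgets in both stories refuse to make.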

[–] [email protected] 5 points 6 days ago (1 children)

Look at how YouTube gives search results: maybe three relevant ones, and then it's back to suggestions.

[–] [email protected] 2 points 6 days ago (1 children)

YT search has always been comically bad if you're looking for "long tail results".

[–] [email protected] 3 points 6 days ago

It absolutely got worse around 5 or so years ago; results are literally filled with suggestions instead of the fucking search term.

[–] [email protected] 3 points 6 days ago

Because Google allows them to. They could easily ignore these kinds of tricks but choose not to.

[–] [email protected] 24 points 6 days ago

But could I pay for model collapse? I'd be down for that.

[–] [email protected] 2 points 6 days ago

Could also be the AI crawler flood & the responses from website administrators 🤔