this post was submitted on 28 Feb 2025
81 points (95.5% liked)

Technology

top 11 comments
[–] [email protected] 9 points 6 hours ago (1 children)

I think most ML experts (who weren't being paid out the wazoo to say otherwise) have been saying we're on the tail end of the LLM technology sigmoid curve. (Basically, treating an LLM as a stochastic index, the real measure of training-algorithm quality is query accuracy per training datum.)

Even with DeepSeek's methodology, you see smaller and smaller returns on training input.
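
A minimal sketch of the sigmoid intuition above: if benchmark accuracy follows a logistic curve in training data, the marginal gain per additional datum peaks near the midpoint and then shrinks toward the ceiling. The ceiling, steepness, and token counts below are illustrative assumptions, not measurements of any real model.

```python
import math

def accuracy(n_tokens: float, ceiling: float = 0.90,
             steepness: float = 1e-12, midpoint: float = 2e12) -> float:
    """Hypothetical logistic curve of benchmark accuracy vs. training tokens.

    All parameters are illustrative assumptions, not fitted to real data.
    """
    return ceiling / (1.0 + math.exp(-steepness * (n_tokens - midpoint)))

# Marginal accuracy gained per extra 1e11 tokens shrinks past the midpoint;
# that is the "tail end of the sigmoid" where more data buys very little.
for n in (1e12, 2e12, 4e12, 8e12):
    gain = accuracy(n + 1e11) - accuracy(n)
    print(f"{n:.0e} tokens: accuracy {accuracy(n):.3f}, gain from +1e11 tokens {gain:.4f}")
```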

[–] MDCCCLV 1 points 39 minutes ago

At this point it's useful for doing some specific things, so the way to make it great is to make it cheap and accessible. Being able to run it locally would be way more useful.

[–] [email protected] 17 points 8 hours ago* (last edited 8 hours ago)

Is it because they used data from after ChatGPT was released?

Edit:

marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output.

Ahh, the good old-fashioned law of diminishing returns.
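
To make that concrete, a back-of-the-envelope cost sketch using only the 30x/15x multipliers quoted above; the baseline per-million-token prices and request sizes are placeholder assumptions, not published pricing.

```python
# Placeholder baseline prices (assumptions, not official figures), USD per million tokens.
BASE_INPUT_PER_M = 2.50
BASE_OUTPUT_PER_M = 10.00

# Multipliers quoted above: 30x for input tokens, 15x for output tokens.
INPUT_MULT, OUTPUT_MULT = 30, 15

def request_cost(input_tokens: int, output_tokens: int,
                 in_mult: float = 1.0, out_mult: float = 1.0) -> float:
    """USD cost of one request at the given per-token price multipliers."""
    return (input_tokens / 1e6) * BASE_INPUT_PER_M * in_mult + \
           (output_tokens / 1e6) * BASE_OUTPUT_PER_M * out_mult

baseline = request_cost(10_000, 1_000)                          # cheaper model
premium = request_cost(10_000, 1_000, INPUT_MULT, OUTPUT_MULT)  # 30x/15x model
print(f"baseline ${baseline:.3f} vs premium ${premium:.2f} per request "
      f"({premium / baseline:.0f}x overall)")
```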

[–] [email protected] 43 points 9 hours ago (2 children)

With this, OpenAI is officially starting to crack. They've been promising a lot and not delivering; the only reason they would push out GPT-4.5, even though it's worse and more expensive than the competition, is that the investors are starting to get mad.

[–] [email protected] 2 points 1 hour ago* (last edited 1 hour ago)

Who wouldn’t be mad, considering the amount of money OpenAI is burning? They’re already taking a huge risk, and I believe it’s mostly out of ideology: believing that this time it’ll be the singularity, simply because ChatGPT has this ability to fool humans into thinking there’s some humanity there.

[–] [email protected] 3 points 6 hours ago

They also had poor video generation.

[–] [email protected] 26 points 9 hours ago

I’m sure turning on a few more nuclear plants to power shoveling an ever larger body of AI slop-contaminated text into the world’s most expensive plagiarism machine will fix it!

[–] [email protected] 10 points 9 hours ago

That’s bad. Mmmmmkay.

[–] humanspiral 1 points 6 hours ago

Not an expert in the field, but OP seems to be using relevant metrics to criticize model cost/performance.

One reason to dislike OpenAI is its "national security ties". It can probably get the "wrong customers" to pay whatever it costs.

[–] [email protected] 4 points 8 hours ago (1 children)

That was kind of expected, but Claude isn't that good either.

[–] thatsnothowyoudoit 6 points 6 hours ago* (last edited 5 hours ago)

I think that depends on what you’re doing. I find Claude miles ahead of the pack on practical but fairly nuanced coding issues - particularly as a pair programmer for strongly typed FP patterns.

It’s almost as if it’s better in real-world situations than artificial benchmarks.

And their new CLI client is pretty decent - it seems to really take advantage of the hybrid CoT/standard auto-switching model Claude gained with this week’s update.

I don’t use it often anymore, but when I do reach for a model for coding, it’s Claude first. It’s the most likely to grasp the core architectural patterns in a codebase (like a consistent monadic structure for error handling, or consistently well-defined architectural layers).
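
For readers unfamiliar with the term, here is a minimal sketch of the kind of "consistent monadic structure for error handling" being described: every layer returns a Result instead of raising, so call sites compose the same way everywhere. The names and the toy pipeline are illustrative assumptions, not taken from any real codebase.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")

@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

@dataclass(frozen=True)
class Err:
    error: str

Result = Union[Ok[T], Err]

def bind(result: Result[T], fn: Callable[[T], "Result[U]"]) -> "Result[U]":
    """Chain a fallible step onto a previous result; errors short-circuit."""
    return fn(result.value) if isinstance(result, Ok) else result

# Example: every layer returns Result, so call sites compose uniformly.
def parse_port(raw: str) -> Result[int]:
    return Ok(int(raw)) if raw.isdigit() else Err(f"not a number: {raw!r}")

def check_range(port: int) -> Result[int]:
    return Ok(port) if 0 < port < 65536 else Err(f"out of range: {port}")

print(bind(parse_port("8080"), check_range))  # Ok(value=8080)
print(bind(parse_port("http"), check_range))  # Err(error="not a number: 'http'")
```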

I just recently cancelled my one-month trial of Gemini - it was pretty useless; it was easy to get stuck in a dumb loop even with project files as context.

And GPT-4/o1/o3 seem to really suck at being prescriptive - often providing walls of multiple solutions that all somehow narrowly miss the mark, even with tons of context.

That said, Claude sucks - SUCKS - at statistics, being completely unreliable where GPT-4 is often pretty good and even provides Python code for verification.