
I just started using this myself, seems pretty great so far!

Clearly it doesn't stop all AI crawlers, but it does stop a significant chunk of them.

[–] [email protected] 4 points 1 day ago (1 children)

Proof of work is just that, proof that it did work. What work it's doing isn't defined by that definition. Git doesn't ask for proof, but it does do work. Presumably the proof part isn't the thing you have an issue with. I agree it sucks that this isn't being used to do something constructive, but as long as it's kept to a minimum in user time scales, it shouldn't be a big deal.

Cryptocurrencies are an issue because they do the work continuously, 24/7. This is a one-time operation per view (I assume per view and not once ever), which on human input timescales isn't going to be much. AI garbage does consume massive amounts of power though, so damaging those is beneficial.
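
For a rough picture of what "one-time work per view" means, here's a minimal proof-of-work sketch (not any particular project's actual code; the SHA-256 choice and the difficulty value are assumptions):

```python
import hashlib
import secrets

DIFFICULTY = 4  # required leading zero hex digits; assumed value

def make_challenge() -> str:
    """Server side: issue a fresh random challenge per page view."""
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    """Client side: grind nonces until the hash meets the target. This is the
    'work' part - a burst of hashing once per view, then it's done."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: checking the proof costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

challenge = make_challenge()
nonce = solve(challenge)         # expensive-ish, once per view
assert verify(challenge, nonce)  # cheap
```

Unlike mining, it runs once and stops; that's the whole point of keeping it within human time scales.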

[–] [email protected] 0 points 15 hours ago (1 children)

I'm not sure where you're going with the git analogy. Git isn't performing any proof of work, at all. By definition, Proof of Work means that "one party (the prover) proves to others (the verifiers) that a certain amount of a specific computational effort has been expended." The amount of computational power used to generate hashes for git is utterly irrelevant to its function. Git doesn't care how many cycles are used to generate a hash; therefore it's in no way proof of work.
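
To make the contrast concrete, here's roughly how git names a blob (a sketch, not git's actual implementation): a single fixed hash over the content, with no difficulty target and nobody checking how much effort it took.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Git's object ID: one SHA-1 over a length-prefixed blob.
    The cost is fixed and irrelevant; the hash just names the content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b"hello world\n"))
```

A PoW scheme would instead keep hashing until some arbitrary condition is met, which is exactly the part that exists only to burn cycles.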

This solution is designed to cost scrapers money; it does this by causing them to burn extra electricity. Unless it's at scale, unless it costs them, unless it has an impact, it's not going to deter them. And if it does impact them, then it's also impacting the environment. It's like having a door-to-door salesman come to your door and intentionally making them wait while their car is running, and then cackling because you made them burn some extra gas, which cost them some pennies and also dumped extra carbon monoxide into the atmosphere.

Compare this to endlessh. It also wastes hackers' time, but only because it just responds very slowly with an endless stream of header characters. It's making them wait, only they're not running their car while they're waiting. It doesn't require the caller to perform an expensive computation which, in the end, is harmful to more than just the scraper.
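
The whole trick is a few lines; here's a rough asyncio sketch of the idea (not endlessh's actual implementation; the port and the 10-second delay are placeholders):

```python
import asyncio
import random

async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Drip-feed junk banner lines forever; the client just sits and waits."""
    try:
        while True:
            writer.write(("%x\r\n" % random.getrandbits(32)).encode())
            await writer.drain()
            await asyncio.sleep(10)  # placeholder delay between lines
    except ConnectionError:
        pass
    finally:
        writer.close()

async def main() -> None:
    server = await asyncio.start_server(tarpit, "0.0.0.0", 2222)  # placeholder port
    async with server:
        await server.serve_forever()

asyncio.run(main())
```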

Let me make sure I understand you: AI is bad because it uses energy, so the solution is to make them use even more energy? And this benefits the environment how?

[–] [email protected] 2 points 14 hours ago (1 children)

I'm not the person who brought git up. I was just stating that work is work. Sure, git is doing something useful with it. This is arguably useful without the work itself being important. Work is the thing you're complaining about, not the proof.

This solution is designed to cost scrapers money; it does this by causing them to burn extra electricity. Unless it's at scale, unless it costs them, unless it has an impact, it's not going to deter them.

Yeah, but the effect it has on legitimate usage is trivial. It's a cost to illegitimate scrapers, and not imposing that cost has an impact on the environment too. In fact, this theoretically adds nothing: they'll spend the same time scraping either way. This way they get delayed and spend more of that time gathering nothing useful.

To use your salesman analogy, it's similar to that, except their car is going to be running regardless. It just prevents them from reaching as many houses. They're going to go to as many as possible. If you can stall them then they use the same amount of gas, they just reach fewer houses.

Compare this to endlessh. It also wastes hackers' time, but only because it just responds very slowly with an endless stream of header characters. It's making them wait, only they're not running their car while they're waiting.

This is probably wrong, because you're using the salesman idea. Computers have threads. If they're waiting for something, they can switch to another task. It protects a site, but it doesn't slow them down, and it doesn't really waste their time, because they're performing other tasks while they wait.
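
For illustration, a hypothetical scraper built on asyncio can park a pile of stalled connections at almost no CPU cost (the hosts here are placeholders):

```python
import asyncio

async def fetch(host: str) -> None:
    """Open a connection and wait for the response. While this coroutine is
    stuck on a slow (tarpitted) host, the event loop just runs the others."""
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    await reader.read()   # a tarpit stalls here, costing almost nothing
    writer.close()

async def crawl(hosts: list[str]) -> None:
    # Thousands of these can be in flight; one stalled host doesn't slow the rest.
    await asyncio.gather(*(fetch(h) for h in hosts), return_exceptions=True)

asyncio.run(crawl(["example.com", "example.org"]))  # placeholder host list
```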

Let me make sure I understand you: AI is bad because it uses energy, so the solution is to make them use even more energy? And this benefits the environment how?

If they're going to use the energy anyway, we might as well make them get less value. Eventually the cost may be more than the benefit. If it isn't, they spend all the energy they have access to anyway. That part isn't going to change.

[–] [email protected] 1 points 4 hours ago

I'm not the person who brought git up.

Then I apologize. All I can offer is that it's a weakness of my client that it's difficult and outside the inbox workflow to see any history other than the comment to which you're replying. Not an excuse; just an explanation.

Work is the thing you're complaining about, not the proof.

If given the option, I'd prefer all computing to have zero cost; sure. But no, I'm not complaining about the work. I'll complain about inefficient work, but the real issue is work for work's sake; in particular, systems designed specifically so that the only important fact is proving that someone burned X pounds of coal to get a result. Because, while exaggerated and hyperbolically stated, that's exactly what Proof-of-Work systems are. All PoW systems care about is that the client provably consumed a certain amount of CPU power. The result of the work is irrelevant for anything but proving that someone did work.

With exceptions like BOINC, the work itself from PoW systems provides no other value.

Compare this to endlessh.

This is probably wrong, because you're using the salesman idea.

It's not. Computer networks can open only so many sockets at a time; threading on a single computer is finite, and programmers normally limit the amount of concurrency because high concurrency itself can cause performance issues.
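
Even an async crawler has to cap what it keeps in flight; every stalled connection still pins a file descriptor and a slot in whatever limit the operator set. A rough sketch of that constraint (the cap here is an assumed value, and the rlimit check is Unix-only):

```python
import asyncio
import resource

# Each open connection pins an OS file descriptor, and those are finite.
soft_limit, _ = resource.getrlimit(resource.RLIMIT_NOFILE)

sem = asyncio.Semaphore(min(soft_limit // 2, 512))  # assumed self-imposed concurrency cap

async def limited_fetch(host: str) -> None:
    async with sem:  # a tarpitted host holds this slot...
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(b"GET / HTTP/1.0\r\n\r\n")
        await writer.drain()
        await reader.read()  # ...for as long as the server drags the response out
        writer.close()
```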

If they're going to use the energy anyway, we might as well make them get less value.

They're going to get their value anyway, right? This doesn't stop them; it just makes each call to this more expensive. In the end, they do the work and get the data; it just cost them - and the environment - more.

Do you think this will stop scrapers? Or is it more of a "fuck you", but with a cost to the planet?

Honeypots are a better solution; they're far more energy efficient, and have the opportunity to poison the data. Poisoned data is more like what you suggest: they're burning the energy anyway, but are instead getting results that harm their models. Projects like Nepenthes go in the right direction. PoW systems are harmful - straight up harmful. They're harmful by preventing access to people who don't use JavaScript, and they're harmful in exactly the same way crypto mining is.
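
A minimal sketch of that kind of honeypot (in the spirit of Nepenthes, but not its actual code; the /trap/ path, word list, and port are made up): a cheap endpoint that feeds crawlers generated nonsense and endless fake links, so the energy they were going to spend anyway buys them poisoned data.

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

WORDS = "the of a model data token weight loss gradient entropy".split()

def babble(n: int = 200) -> str:
    """Cheap-to-generate nonsense for the crawler to ingest."""
    return " ".join(random.choice(WORDS) for _ in range(n))

class Honeypot(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical setup: pages under /trap/ are linked only from places
        # real users never see, so anything requesting them is a crawler.
        body = "<html><body><p>{}</p><a href='/trap/{:x}'>more</a></body></html>".format(
            babble(), random.getrandbits(32))
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

HTTPServer(("0.0.0.0", 8080), Honeypot).serve_forever()  # placeholder bind/port
```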