this post was submitted on 07 Jul 2025

Technology

[–] [email protected] 7 points 1 day ago* (last edited 1 day ago) (1 children)

Unfortunately, archive.is seems to have moved behind a big corporate CAPTCHA service, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

I suggest this archive link instead:

https://web.archive.org/web/20250707135819/https://www.404media.co/the-open-source-software-saving-the-internet-from-ai-bot-scrapers/

[–] [email protected] 1 points 1 day ago (1 children)

> Unfortunately, archive.is has moved behind Cloudflare, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

How do you know this?

What about https://ghostarchive.org/?

[–] [email protected] 6 points 1 day ago* (last edited 1 day ago) (1 children)

Sorry; I shouldn't have written Cloudflare specifically. The archive.is CAPTCHA page now contains scripts from Google, not Cloudflare. I have corrected my comment.

> How do you know this?

Because a couple of months ago, archive.is/archive.today started showing me CAPTCHA pages instead of the archived articles whenever I use Firefox with scripts disabled. The current CAPTCHA page contains scripts hosted by Google, which I won't enable, so I can't read the archived articles.
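
For anyone who wants to check a page the same way, here is a minimal sketch (not the exact method used above) that lists the external hosts a saved page loads scripts from. It uses only Python's standard library. The names `ScriptHostCollector` and `third_party_script_hosts`, the host `archive.example`, and the sample HTML are all made up for illustration; you would paste in the page source you saved yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptHostCollector(HTMLParser):
    """Collects the hostnames referenced by <script src=...> tags."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script: no external host involved
        host = urlparse(src).hostname
        if host:  # relative URLs have no hostname and are skipped
            self.hosts.add(host)


def third_party_script_hosts(html, page_host):
    """Return the script hosts that differ from the page's own host."""
    collector = ScriptHostCollector()
    collector.feed(html)
    return sorted(h for h in collector.hosts if h != page_host)


# Hypothetical page source, invented for this example only.
sample = """
<html><head>
<script src="https://www.google.com/some/script.js"></script>
<script src="/static/local.js"></script>
<script>console.log("inline")</script>
</head></html>
"""

print(third_party_script_hosts(sample, "archive.example"))
```

This only inspects static `<script src>` tags in the HTML you give it. It will not see scripts injected later by other scripts, so treat an empty result as a hint rather than proof.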

> What about https://ghostarchive.org/?

I haven't used that site enough to have a consistent picture of what it's doing. When I tried it a few minutes ago, it sent me to a CAPTCHA wall when I tried to submit an article, but not when I searched for an already archived one. I'll try to remember to check it again periodically, so I can answer this question better in the future.

[–] [email protected] 3 points 1 day ago

Thanks. I appreciate the info and effort.