@[email protected] Block them. Make them pay!
@[email protected] Not cool running so many connections, but 65,000 pages isn't really that much for a contemporary website. If you have a CDN then even more so.
@[email protected] I had my own run-in with GPTBot spamming requests, falling into a recursive hole with desktop/mobile view links and sending malformed URLs: https://mastodon.org.uk/@DrinkyBird/113743065815541997
This has happened to us on several of the over 300 domains we host.
The COSTS of supporting OpenAI's harvesting (bandwidth and everything else), plus the rest of the AI bot farms stealing copyrighted content, are crushing us.
@[email protected] GPTBot is the most aggressive content scraper I've come across in decades of server management. It totally ignores any crawl limits that you set in your robots.txt, and they operate on enough IPs to make even nginx-configured rate limiting a bit futile.
You can, though, block them (and others) by their user-agent string. Add this to your .htaccess to block both GPTBot and ClaudeBot, for example:
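# Flag any request whose User-Agent contains ClaudeBot or GPTBot (case-insensitive)
SetEnvIfNoCase ^User-Agent$ .*(ClaudeBot|GPTBot) BADBOTHAMMER
# Refuse flagged requests (Apache 2.2 syntax; on 2.4 this needs mod_access_compat)
Deny from env=BADBOTHAMMER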
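If you're not on Apache, the same user-agent check can be sketched at the application layer. This is only a rough sketch, assuming a Python WSGI app; `BlockBots` and `is_blocked` are made-up names for illustration, not part of any library. Like the .htaccess rule, it only helps against bots that send an honest User-Agent header.

```python
import re

# Same pattern as the .htaccess rule: case-insensitive match anywhere in the header.
BLOCKED_AGENTS = re.compile(r"(ClaudeBot|GPTBot)", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent header names a blocked crawler."""
    return bool(BLOCKED_AGENTS.search(user_agent or ""))


class BlockBots:
    """Hypothetical WSGI middleware that answers 403 to blocked crawlers."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

You would wrap your existing app with something like `app = BlockBots(app)`. Rejecting at the web server or CDN is cheaper than doing it in Python, so treat this as a fallback rather than a replacement for the rule above.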
@[email protected] I'm neither a lawyer nor a cybersecurity expert, just a fresh computer engineer. I'm curious what would happen if they pursued legal action against OpenAI for the downtime. OpenAI attacked their service and took them offline, causing financial loss. Seriously, why not treat it like a hack? What would a judge say when comparing OpenAI's actions to those of some kids running a DDoS campaign?
@[email protected] This isn't "like" a DDoS attack, it IS a DDoS attack.
Virtually every early example of a modern computer attack was originally someone just messing around or making a mistake (the first virus, worm, and DoS all come to mind), and to my knowledge all of those were tried on (and many found guilty of) serious hacking charges, so why shouldn't OpenAI be? They shouldn't get to claim "well, your service should have been able to handle a DDoS" or "we're doing it for gain, though."
@[email protected] I have to deal with the AI scraping problem at work too.
They are the worst scraping bots ever made, not only OpenAI's but those of a dozen AI startups.