digdilem

joined 2 years ago
[–] [email protected] 3 points 9 hours ago

Sometimes questions like this are tests to see how you'll react when asked to deliver the impossible.

(I mean, it's not in this case, but that's totally how I'd answer if I'd posted it and was challenged.)

[–] [email protected] 7 points 2 days ago

Love what these guys are doing.

There's also a mainboard case, so you don't need the whole laptoppy thing at all if you don't want to:

https://frame.work/gb/en/products/cooler-master-mainboard-case

[–] [email protected] 8 points 3 days ago

Medusa rules apply - just grab a shiny shield.

[–] [email protected] 16 points 4 days ago

Spez was a total Musk fanboy during the early Twitter changes and said he used him as an inspiration when changing Reddit's API, breaking all the decent phone clients. I don't doubt he'll do anything to join the big boys' club.

[–] [email protected] 16 points 5 days ago

Surely it's up to the advertisers to choose who they pay money to?

[–] [email protected] 2 points 6 days ago (1 children)

Centralised social media did, and is, doing extremely well by most metrics.

Such as censorship (everywhere), shadowbanning (X), ownership by an egomaniac shyster (X), blue badges (X), and being ganged up on (Reddit). I'm ignoring platforms for addicts such as TikTok, YouTube and Instagram.

You seem to be under the misapprehension that these large social media companies are operating for your benefit.

A strange notion to hold.

[–] [email protected] 14 points 1 week ago (1 children)

Guessing this wasn't in Europe. Consumer protection laws actually do stuff over here: goods must be fit for purpose, and there's no fixed limit on how long that lasts. A refund claim after two years wouldn't be unreasonable for a hard drive, and would likely succeed.

[–] [email protected] 1 points 1 week ago (3 children)

We tried centralization and it doesn't go well.

Well, no. Centralised social media did, and is, doing extremely well by most metrics.

We can build our ivory towers and feel happy and safe within them, but it doesn't change the fact that we're not missed, and the platforms above probably like that we're not there pointing out their flaws.

[–] [email protected] 6 points 1 week ago (1 children)

s/reminder/warning/

[–] [email protected] 61 points 1 week ago

It's not that we "hate them" - it's that they can entirely overwhelm a low-volume site and effectively DDoS it.

I ran a few very low-traffic websites for local interests on a rural, residential line. It wasn't fast, but it was cheap, and as these sites made no money it was good enough. Before AI they'd get the odd badly behaved scraper that ignored robots.txt and, specifically, the rate limits.

But since? I've had to spend a lot of time trying to filter them out upstream. Like, hours and hours. Claudebot was the first - coming from hundreds of AWS IPs and dozens of countries, thousands of times an hour, repeatedly trying to download the same URLs, some of which didn't exist. Since then it's happened a lot. Some of these tools are just so ridiculously stupid, far more so than a dumb script that cycles through a list. But because it's AI and they're desperate to satisfy the "need" for it, they're quite happy to spend millions on AWS costs for negligible gain and screw things up for other people.

Eventually I gave up and redesigned the sites to be static and they're now on cloudflare pages. Arguably better, but a chunk of my life I'd rather not have lost.

[–] [email protected] 3 points 1 week ago* (last edited 1 week ago) (1 children)

Atlassian is shit for forcing us into the expensive cloud for a shit product.

I feel your pain. Or rather, I felt it once and am now freed!

We were big into Atlassian when they announced they were going cloud-only. We had on-prem versions of Jira, Confluence and Bitbucket.

We pretty quickly said "Fuck that", mostly because we have an on-prem policy for IP protection.

I was pretty happy to spend some time searching for replacements, mostly because it was my job to apply upgrades to these steaming, tottering piles of badly written Java horseshit. They looked pretty, but the upgrade process was convoluted and quite often failed terminally. I still think the difficulty of upgrading the self-hosted versions was a driver towards going cloud-only - it exposed how shite the things were underneath, and how many complaints they must have got for offering an on-prem product that was so hard to maintain despite looking so polished.

I take some pleasure that the Atlassian share price is now half what it was before they did this.

(If anyone is interested: Confluence and Jira were replaced by YouTrack, and Bitbucket by TeamCity. Both are by JetBrains, both are much easier to upgrade (TeamCity upgrades are web-based and one-click), and our licensing costs are about half what we paid Atlassian.)

[–] [email protected] 5 points 1 week ago (1 children)

And that's despite charging for it. They fill many versions of it with adverts, install bloatware and crap paid for by other companies without asking, shove it all down your throat, and also sell your personal information to (checks) at least 801 third parties.

 

Under this methodology of all 193 UN Member States – an expansive model of 17 categories, or “goals,” many of them focused on the environment and equity – the U.S. ranks below Thailand, Cuba, Romania and more that are widely regarded as developing countries.

In 2022, America was 41st. It'll be interesting to see where it ends up after this term of office, which looks set to work against many of these aims.

 

On display at the Stromness Museum. Carved from whalebone and believed to be a child's doll.

It was discovered at the famous Skara Brae site, then spent years forgotten in a box at the museum before being rediscovered.

https://www.bbc.co.uk/news/uk-scotland-north-east-orkney-shetland-36526874

185
submitted 9 months ago* (last edited 9 months ago) by [email protected] to c/[email protected]
 

I host a few small, low-traffic websites for local interests. I do this for free - and some of them are for a friend who died last year but didn't want all his work to vanish. They don't get many views, so I was surprised when I happened to glance at Munin and saw my bandwidth usage had gone up a lot.

I spent a couple of hours working to solve this and did everything wrong. But it was a useful learning experience, and I thought it might be worth sharing in case anyone else runs into something similar.

My setup is:

Cloudflare DNS -> Cloudflare Tunnel (because my residential ISP uses CGNAT) -> HAProxy (I like HAProxy; amongst other things, it alerts me when a site is down) -> separate Docker containers for each website. All on a Debian server living in my garage.
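For the curious, the HAProxy layer is conceptually just this - a minimal sketch, where the hostnames, ports and backend names are illustrative rather than my real config:

```
# haproxy.cfg sketch - the Cloudflare tunnel points at this frontend
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend sites
    bind 127.0.0.1:8080
    # pick a Docker container based on the Host header
    use_backend forum if { hdr(host) -i forum.example.org }
    default_backend static_sites

backend forum
    server forum1 127.0.0.1:8081 check

backend static_sites
    server web1 127.0.0.1:8082 check
```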

From HAProxy's stats page, I was able to see which website was getting the attention. It's one running phpBB for a little forum. Tailing Apache's logs in that container quickly identified the pattern and made it easy to see what was happening.
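If you want to do the same, something like the line below works. The container name and log path are just examples - some images send Apache's access log straight to stdout, in which case docker logs -f is all you need.

```
# watch the forum container's access log for the offending user agent
# (container name and log path are examples, not universal)
docker exec phpbb-forum tail -f /var/log/apache2/access.log | grep -i claudebot
```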

It was seeing a lot of 404 errors for URLs, all coming from the same user agent, "claudebot". I know what you're thinking - it's an exploit-scanning bot - but a closer look showed it was trying to fetch normal forum posts, some of which had been deleted months previously, and also robots.txt. That site didn't have a robots.txt, so that was failing. What was weird is that it was requesting at a rate of up to 20 URLs a second, from multiple AWS IPs - and every other request was for robots.txt. You'd think it would take the hint after a million times of asking.

Googling that UA turned up that other phpBB users have encountered this quite recently - it seems to be fascinated by web forums and absolutely hammers them with the same behaviour I found.

So - clearly a broken and stupid bot rather than something specifically malicious, right? I think so, but I host these sites on a rural consumer line, and it was affecting both system load and bandwidth.

What I did wrong:

  1. In Docker, I tried quite a few things to block the user agent, the country (US-based AWS IPs, and this is a UK regional site) and various individual IPs. It took me far too long to realise why my changes to .htaccess were failing - the phpBB Docker image I use mounts the website's root directory internally, ignoring my mounted volume. (My own fault; it was too long since I set it up to remember that only certain sub-directories were mounted in.)

  2. Having figured that out, I shelled into the container and edited that .htaccess directly, but that wouldn't have survived restarting or rebuilding the container, so it wasn't a real solution.

Whilst I was in there, I created a robots.txt file. Not surprisingly, claudebot doesn't actually honour what's in it, and still continues to request it ten times a second.
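For anyone trying the same, the sort of thing I mean is below. The .htaccess rule assumes the container's Apache has mod_rewrite enabled, and the exact UA string may need adjusting; the robots.txt is just the polite version that claudebot ignores anyway.

```
# .htaccess - refuse anything whose User-Agent contains "claudebot"
# (assumes mod_rewrite is available in the container's Apache)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} claudebot [NC]
RewriteRule .* - [F,L]
```

```
# robots.txt - asking nicely, for all the good it does
User-agent: ClaudeBot
Disallow: /
```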

  3. Thinking there must be another way, I switched to HAProxy. This was much easier - the documentation is very good. And it actually worked - blocking by user agent (and yep, I'm lucky this wasn't changing) worked perfectly.

I then had to leave for a while, and the graphs show it's working. (Yellow above the line is requests coming into HAProxy; below the line are responses.)

Great - except I'm still seeing half of the traffic, and that's affecting my latency. (Some of you might doubt this, and I can tell you that you're spoiled by an excess of bandwidth...)
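The HAProxy side is only a couple of lines in the frontend - a minimal sketch, assuming the UA still contains "claudebot" (the ACL name is just illustrative):

```
# drop anything whose User-Agent contains "claudebot"
acl bad_bot hdr_sub(User-Agent) -i claudebot
http-request deny deny_status 403 if bad_bot
```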

  4. That's when the penny dropped and the obvious occurred to me. I use Cloudflare, so use their firewall, right? No excuses - I should have gone there first. In fact, I did, but I got distracted by the many options and focused on their bot-fighting tools, which didn't work for me. (This bot somehow gets through the captcha challenge even when Bot Fight Mode is enabled.)

But their firewall has an option for user agent. The actual fix was simply to add a block rule for it in the WAF for that domain.
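It's a custom firewall/WAF rule with the action set to Block, and an expression along these lines (the match is case-sensitive, so cover whatever capitalisation the bot actually sends):

```
(http.user_agent contains "claudebot") or (http.user_agent contains "ClaudeBot")
```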

And voila - no more traffic through the tunnel for this very rude and stupid bot.

After 24 hours, Cloudflare has blocked almost a quarter of a million requests by claudebot to my little phpBB forum, which barely gets a single post every three months.

Moral for myself: stand back and think for a minute before rushing in and trying to fix something the wrong way. I've also taken this as an opportunity to improve HAProxy's rate limiting internally. Like most website hosts, most of my traffic is outbound, and slowing things down when it gets busy really does help.
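For the record, that rate limiting is only a few lines in the frontend - a sketch, with numbers picked to suit my tiny sites rather than being any kind of recommendation:

```
# track per-IP request rate and reject anything hammering the site
stick-table type ip size 100k expire 60s store http_req_rate(10s)
http-request track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
```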

This obviously isn't a perfect solution - all claudebot has to do is change its UA, and since it comes from AWS it's pretty hard to block any other way. One hopes it isn't truly malicious. It would be quite a lot more work to integrate Fail2ban for more bots, but it might yet come to that.
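(If it does come to that, the Fail2ban side would be roughly a filter matching the UA in Apache's access log plus a jail pointing at it - an untested sketch, with paths and thresholds as placeholders:)

```
# /etc/fail2ban/filter.d/badbots.local (untested sketch)
[Definition]
failregex = ^<HOST> .*"[^"]*[Cc]laude[Bb]ot[^"]*"$
ignoreregex =

# /etc/fail2ban/jail.d/badbots.local
[badbots]
enabled  = true
filter   = badbots
logpath  = /var/log/apache2/access.log
maxretry = 10
findtime = 60
bantime  = 86400
```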

Also, if you write any kind of web bot, please consider that not everyone who hosts a website has a lot of bandwidth, and at least have enough pride to write software good enough not to keep repeating the same request every second. And, y'know, keep an eye on what your stuff is doing out on the internet - not least for your own benefit. Hopefully AWS really shafts claudebot's owners with some big bandwidth charges...

EDIT: It came back the next day with a new UA and an email address linking it to anthropic.com - the Claude 3 AI bot - so it looks like a particularly badly written scraper gathering AI training data.
