Shadow

joined 2 years ago
[–] Shadow 1 points 4 hours ago

Is this gone now we're on the new server?

[–] Shadow 2 points 4 hours ago

Yep I'll add this, probably tomorrow.

[–] Shadow 1 points 11 hours ago* (last edited 5 hours ago)

One potential issue with us doing Friendica is that I don't think any of us admins are Facebook users. I'm happy to run the software for it... but if none of us use it, then moderation / admin may not happen and that could become a problem. Pixelfed is more interesting as a next step.

[–] Shadow 7 points 11 hours ago

When you're flying in close proximity like that at a low altitude, I'd assume they're all VFR and not relying on radar.

TCAS is good because it's an instantaneous "YOU WILL HIT THIS. GO UP NOW". There's no having to look at it and think about what to do, that's why it's so successful. Can't really compare radar to that.

[–] Shadow 5 points 11 hours ago (1 children)

I like the silicone ones. They're soft, super flexible and seem to hold up well.

[–] Shadow 14 points 11 hours ago* (last edited 11 hours ago) (5 children)

Pilots were flying under NVG (limited field of view) and appear to have been watching the wrong plane when they said they had the traffic in sight. They were probably flying too low for TCAS, if a Black Hawk even has it.

[–] Shadow 8 points 17 hours ago (1 children)

Second move complete, all done for now!

[–] Shadow 4 points 20 hours ago

Not necessarily. ATC isn't fully automated, those people aren't just maintaining systems. They're active parts of the system.

If you've suddenly lost a bunch of your staff, everyone is now stressed out and probably having to work longer / busier shifts. That could quickly lead to an accident.

I agree though, it's premature to blame Trump's actions for causing this. Thankfully the aviation industry is really good about detailed post mortems.

[–] Shadow 2 points 1 day ago

No, you can make rules to do anything you want. I've never had to do anything too complex, I just set one transaction the way I want and then hit yes when it offers to make it a rule.

[–] Shadow 2 points 1 day ago (2 children)

It didn't force me into any weird bucket systems and felt like it clicked naturally with how I budget. The app is nice and works well, creating rules is easy. I don't really have anything to compare against, but it just all works for me.

[–] Shadow 17 points 1 day ago (7 children)

So at what point does everyone start protesting?

125
submitted 3 days ago* (last edited 3 days ago) by Shadow to c/main
 

Hello everyone!

I'll be taking the site down for two maintenance windows this week to complete our server migration.

  • Weds Jan 29th - 09:00 - 11:00 PT (12:00 - 14:00 ET)
  • Thurs Jan 30th - 09:00 - 11:00 PT (12:00 - 14:00 ET)

During the first window I'll be migrating us from OVH to our new dedicated hardware. After this migration there will likely be some temporarily broken images, as it takes approximately 8 hours to resync our object storage from OVH.

This is a major change and despite my testing, may have some unintended side effects. If you run into any problems that aren't just a broken image, please let us know.

The second maintenance window is to migrate our pict-rs database from its local sled-db into our primary postgres DB. This is a much smaller change, but since pict-rs checks every image as it goes through them, it takes about 1.5 hours.

As usual, you can check https://status.lemmy.ca/ for updates.

 

Hello everyone, we're long overdue for an update on how things have been going!

Finances

Since we started accepting donations back in July we've received a total of $1350, as well as $1707 in older donations from smorks. We haven't had any expenses other than OVH (approx $155/mo) since then, leaving us $2152 in the bank.

We still owe TruckBC $1980 for the period he was covering hosting, and I've contributed $525 as well (mostly non-profit registration related stuff, plus domain renewals). We haven't yet discussed reimbursing either of us, we're both happy to build up a contingency fund for a while.
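For anyone who wants to sanity check the numbers above, here's a rough sketch of the arithmetic. The figures are the ones quoted in this post; the variable names are just made up for illustration, and the "implied months" line assumes OVH really was the only expense, at roughly $155/mo.

```python
# Figures quoted in the post. Variable names are illustrative only.
donations_since_july = 1350
older_donations = 1707
bank_balance = 2152
monthly_hosting = 155  # approximate OVH cost per month

owed_truckbc = 1980
contributed_shadow = 525

# Everything that has come in so far.
total_received = donations_since_july + older_donations

# What must have been spent to end up at the current balance.
implied_spend = total_received - bank_balance

# Assumption: hosting was the only expense, so this is roughly months of OVH paid.
implied_months = implied_spend / monthly_hosting

# What the balance would look like if both reimbursements were paid out today.
outstanding = owed_truckbc + contributed_shadow
net_if_reimbursed = bank_balance - outstanding

print(f"received: ${total_received}, spent: ${implied_spend} (~{implied_months:.1f} months)")
print(f"outstanding: ${outstanding}, balance after reimbursing: ${net_if_reimbursed}")
```

The point of the last line is just that reimbursing everything right now would put the balance below zero, which is why building up a contingency fund first makes sense.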

New Server

A few weeks ago, we experienced a ~26-hour outage due to a failed power supply and extremely slow response times from OVH support. This was followed by an unexplained outage the next morning at the same time. To ensure Lemmy’s growth remains sustainable for the long term and to support other federated applications, I’ve donated a new physical server. This will give us a significant boost in resources while keeping the monthly cost increase minimal.

Our system specs today:

  • Undoubtedly the cheapest hardware OVH could buy
  • Intel Xeon E-2386G (6 cores @ 3.5 GHz)
  • 32 GB of RAM
  • 2x 512 GB Samsung NVMe in RAID 1
  • 1 Gbit network
  • $155/month

The new system:

  • Dell R7525
  • AMD EPYC 7763 (64 cores @ 2.45 GHz)
  • 1 TB of RAM
  • 3x 120 GB SATA SSD (hw RAID 1 with a hot spare, for Proxmox)
  • 4x 6.4 TB NVMe (ZFS mirrored + striped, for data)
  • 1 Gbit network with a 50 Mbit commit (See 95th percentile billing)
  • Redundant power supplies
  • Next day hardware support until Aug 2027
  • $166/month + tax
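Since 95th percentile billing comes up in the network line above, here's a minimal sketch of how it's commonly calculated: take periodic bandwidth samples over the month (often every 5 minutes), throw away the top 5%, and bill on the highest sample left. This is the generic method, not necessarily exactly what our provider does, and the function names and sample values are hypothetical.

```python
def percentile_95(samples_mbps):
    """Highest sample remaining after discarding the top 5% (common billing method)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) - 1, done with integer math to avoid float rounding surprises.
    index = (len(ordered) * 95 + 99) // 100 - 1
    return ordered[index]


def overage_mbps(samples_mbps, commit_mbps=50):
    """How far the billable rate exceeds the commit (0 if it stays under)."""
    return max(0, percentile_95(samples_mbps) - commit_mbps)


# Hypothetical month: mostly quiet, with a few short bursts near line rate.
quiet_with_bursts = [10] * 95 + [900] * 5
# Hypothetical month: sustained traffic above the commit 10% of the time.
sustained = [10] * 90 + [80] * 10

print(percentile_95(quiet_with_bursts), overage_mbps(quiet_with_bursts))
print(percentile_95(sustained), overage_mbps(sustained))
```

The upshot is that short bursts above 50 Mbit don't cost anything as long as they add up to less than 5% of the month; only sustained usage above the commit gets billed.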

This means instead of renting an entire server and having them be responsible for the hardware, we'll be renting co-location space at a Vancouver datacenter via a 3rd party service provider I know.

These servers are extremely reliable but if there is a failure, either Otter or myself will be able to get access reasonably quickly. We also have full OOB access via idrac, so it's pretty unlikely we'll ever need to go on site.

Server Migration

Phase 1 is currently planned for Jan 29th or 30th and will completely move us out of OVH and onto our own hardware. I'm expecting probably a 2-3 hour outage, followed by a 6-8 hour window where some images may be missing as the object store resyncs. I'll make another follow up post in a week with specifics.

Phases 2+ I'm not 100% decided on yet and have not planned a timeline around. It would get us into a fully redundant (excluding hardware) setup that's easier to scale and manage down the road, but it does add a little bit of complexity.

Let me know if you have any questions or comments, or feedback on the architecture!

25
submitted 3 weeks ago* (last edited 3 weeks ago) by Shadow to c/main
 

Morning all!

I'm going to be taking the site down for about 5 minutes, so that I can get a consistent copy of our databases (postgres + pict-rs sled).

Will do it at about 10am PST.

280
submitted 3 weeks ago* (last edited 3 weeks ago) by Shadow to c/main
 

Hey everyone, and happy new year!

Sorry about that super long downtime there. Yesterday (Sunday) morning at 10:03AM PST our server suffered a physical hardware failure, apparently a power supply failure. Unfortunately despite opening a ticket with our hosting vendor (OVH) a few minutes later and them claiming to have 24/7 support, nobody looked at our ticket until this morning when their phone support lines opened and I called them.

They've now replaced a defective power supply and we're back online, after ~26 hours of being offline. Some pretty disappointing response times, to put it nicely.

We're planning to move away from OVH at the end of this month, onto proper enterprise grade hardware that we own and control. This will give us a HUGE boost in server resources and allow us to scale for the foreseeable future, while also giving us the control to resolve problems like this much quicker. Expect another follow up post about this in the next couple weeks once I've put together the migration plan.

Timeline:

  • Jan 5th 10:03am PST - We get alerts to the server being non-responsive.
  • Jan 5th 10:05am PST - I pull up the console via IPMI and it's completely non-responsive. Attempting to power the server off / on, or do anything else, does not work.
  • Jan 5th 10:15am PST - Initial support ticket created with OVH. I followed up a couple times over the next few hours, and got no response.
  • Jan 6th 6:32am PST - Called OVH, gave them the case number and asked them to investigate
  • Jan 6th 7:34am PST - I get notified they'll start their "intervention" in 15 minutes.
  • Jan 6th 11:04am PST - Call them again, the tech is still working on it and they'll get back to me with an update
  • Jan 6th 11:34am PST - "I was informed by our data centre technician that there is an issue with the power supply unit for the rack on which your server resides. Your server will come back online once they have replaced the power supply."
  • Jan 6th 12:17pm PST - We're back up finally!
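If you want to check the "~26 hours" figure against the timeline above, here's a quick sketch. The timestamps come straight from the list; the year (2025) is my assumption since the post doesn't state it, and the helper name is made up. Everything is in PST so no timezone handling is needed.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

# Timestamps from the timeline above. The year 2025 is an assumption.
alert = datetime.strptime("2025-01-05 10:03", FMT)
ticket_opened = datetime.strptime("2025-01-05 10:15", FMT)
phone_call = datetime.strptime("2025-01-06 06:32", FMT)
back_online = datetime.strptime("2025-01-06 12:17", FMT)


def hours_minutes(start, end):
    """Return the gap between two timestamps as (whole hours, leftover minutes)."""
    total_minutes = int((end - start).total_seconds()) // 60
    return divmod(total_minutes, 60)


total_outage = hours_minutes(alert, back_online)
ticket_ignored = hours_minutes(ticket_opened, phone_call)
call_to_fix = hours_minutes(phone_call, back_online)

print("total outage: %dh %dm" % total_outage)
print("ticket open with no response: %dh %dm" % ticket_ignored)
print("first phone call to back online: %dh %dm" % call_to_fix)
```

Most of the downtime was the ticket sitting unanswered; once someone actually picked it up, the fix took under six hours.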

Edit on Jan 7th @ 8:40am PST: We just had another outage of about an hour. Investigating with OVH.

17
Castle Infinity (en.wikipedia.org)
submitted 1 month ago by Shadow to c/[email protected]
1
submitted 2 months ago* (last edited 2 months ago) by Shadow to c/[email protected]
75
submitted 2 months ago* (last edited 2 months ago) by Shadow to c/main
 

One of the drives in our server has failed. =( Even though it should be a 10 minute job, OVH needs a 2 hour window to replace it.

I've requested they schedule it for Tuesday from 8am - 10am PST. Hopefully it'll be reasonably quick, but expect cloudflare tunnel errors while they perform the work.

104
submitted 2 months ago* (last edited 2 months ago) by Shadow to c/main
 

Hey All!

I'm going to upgrade us to 0.19.7 tomorrow (Sunday Nov 24th) around 10am PST. I don't expect significant downtime, but expect a few minutes at least.

Update: All done!

128
submitted 2 months ago* (last edited 2 months ago) by Shadow to c/[email protected]
 