Migrating a Live Mastodon Instance With No Downtime

This is going to be a fairly technical blog post that will hopefully help others who find themselves in a similar situation: having to migrate a live Mastodon instance with practically no downtime.

Together with some friends I run a relatively small Mastodon instance at https://famichiki.jp. We have around 500 accounts, of which about 280 are active, with about 100 members posting multiple times a day. We’re averaging around 4 million requests a week, depending on … well, I don’t know what it depends on. All I know is we average around 4 million requests a week.

We’re a community-run instance, funded purely by donations and the hard work of our moderation and administration team. Yes, even though we’re small, there’s always something going on. We’re using OpenCollective to manage our donations, and OpenCollective allows you to submit expenses. This is perfect, because it’s very transparent. Downside: I was running the Famichiki instance together with some other side projects under a personal Vultr and Cloudflare account. This sucks for two reasons:

  1. As much as I love the other sysadmin, I’m not going to share either my Vultr or my Cloudflare account details.
  2. I absolutely hate having to blank out invoices when submitting them as expenses. Plus, with bulk discounts it becomes quite hard to calculate exactly how much the project owes me.

So, solution to this? Migrate everything over! Live! Without downtime.

Getting Started

First, let’s take inventory of our existing infrastructure. We’re running on a single machine that runs Redis, Postgres, and Mastodon. We’re using Cloudflare for DDoS protection and caching which results in the following diagram:

Simple enough, isn’t it!

Replicating the Database

Now that we know what we’re dealing with, we can begin the migration process. First we provision a new machine on Vultr under a new account, dedicated purely to Famichiki. Our old machine runs Debian, so that’s what we go with, and we add the right repositories for PostgreSQL. One apt install later and the machine is running Postgres. Great. Now we need to get the data over.

First we set up a replica user on the old machine and make sure it has replication access via pg_hba.conf. Then we simply kick off the base backup process, which configures our new server as a streaming replica.
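Setting up that replica user might look roughly like this (a sketch: the role name, password, and Postgres 15 paths are assumptions, adjust for your setup; the 127.0.0.1 entry works because we connect over a local tunnel rather than the open internet):

```shell
# On the OLD (primary) server: create a role with replication privileges
sudo -u postgres psql -c "CREATE ROLE replica_user WITH REPLICATION LOGIN PASSWORD 'changeme';"

# Allow it to connect for replication from localhost (the tunnel endpoint)
echo "host replication replica_user 127.0.0.1/32 scram-sha-256" \
  | sudo tee -a /etc/postgresql/15/main/pg_hba.conf

# Reload Postgres so the new pg_hba.conf entry takes effect
sudo systemctl reload postgresql
```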

pg_basebackup -h localhost \
	-p 5433 \
	-U replica_user \
	-D /var/lib/postgresql/15/main/ \
	-Fp -Xs -R --verbose

If you’re eagle-eyed, you’ll notice we’re replicating from localhost:5433. This is because I didn’t want to expose our database to the internet, so instead we set up a tunnel from the new server to the old server, allowing us to connect to the old server as if it were running locally.
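The tunnel itself can be a simple SSH port forward, run from the new server; a sketch (the user and hostname are placeholders):

```shell
# Forward port 5433 on the NEW server to Postgres (5432) on the OLD server.
# -N: don't run a remote command, -f: go to the background once connected.
ssh -N -f -L 5433:localhost:5432 user@old-server
```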

A quick SELECT * FROM pg_stat_replication; on the primary server tells us whether the replication is running or not.
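In practice that check might look like this (run on the old server; the columns shown are a useful subset, and your output will obviously differ):

```shell
# On the OLD (primary) server: confirm the replica is attached and streaming
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
```

You’re looking for a row with state “streaming”; no rows means the replica never connected.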

Installing Nginx

This doesn’t really need an explanation. Copy over your Nginx configuration and, if you can, the Let’s Encrypt certificates as well, assuming you’re using Let’s Encrypt.

Installing Mastodon

With Postgres and Nginx up and running, we can now begin to set up Mastodon. I followed the excellent installation guide, specifically the part that covers setting up Mastodon. You can follow it all the way up to running the setup wizard. Instead of the wizard, we run only this command:

RAILS_ENV=production bundle exec rails assets:precompile

We copy over the .env.production configuration from the previous server. Make sure it points PostgreSQL at the old server: the new server is in replication mode and can’t be written to, and whenever somebody posts, Mastodon will attempt to run writes.
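The relevant part of the copied .env.production could look like this (all values are placeholders; here I’m assuming you route the database connection through the same tunnel used for replication):

```
# .env.production on the NEW server: database settings still point
# at the OLD primary for now (values below are placeholders)
DB_HOST=localhost
DB_PORT=5433              # the local tunnel port to the old server
DB_NAME=mastodon_production
DB_USER=mastodon
DB_PASS=changeme
```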

Verifying your Mastodon Installation

OK, this will require some editing on your local machine. What I did was update my hosts file to point directly at the new Mastodon instance. When I first opened Mastodon, everything seemed OK, except that my feed was empty. This is because Mastodon stores the home feed in Redis, and Redis is empty. You can re-populate the feeds easily:

RAILS_ENV=production bin/tootctl feeds build

Or, if you want to rebuild the feed for a specific user, e.g. a user named admins:

RAILS_ENV=production bin/tootctl feeds build admins

After running this command my home screen looked like it used to, and I confirmed that I could properly toot from the new instance.

Moving Traffic over

At this point, our setup looks like this: we basically have two identical servers, with the new server reading from the old server’s primary database.

Using Nginx, we can now simply update our old server to proxy traffic to the new server. This gives us a nice fallback on the old server in case something goes wrong on the new one. The configuration looks like this:

server {
  listen 443 ssl;
  listen [::]:443 ssl;
  server_name famichiki.jp;

  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_ciphers HIGH:!MEDIUM:!LOW:!aNULL:!NULL:!SHA;
  ssl_prefer_server_ciphers on;
  ssl_session_cache shared:SSL:10m;
  ssl_session_tickets off;

  ssl_certificate     /etc/letsencrypt/live/famichiki.jp/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/famichiki.jp/privkey.pem;

  location / {
    proxy_pass https://#.#.#.#;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

⚠️ Before enabling this configuration, make sure you run the tootctl feeds build command or your users will be in for one heck of a surprise.

Once you enable this configuration, your setup will resemble this:

Once you confirm everything is working, you can begin shutting down the old Mastodon processes. Make sure to keep Postgres and Nginx running. Also make sure your Sidekiq queues are empty!
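Sidekiq keeps its queues as plain Redis lists, so one quick way to check they have drained is to ask Redis for their lengths. A sketch (the queue names below are the usual Mastodon ones, but yours may differ):

```shell
# On the OLD server: every count should be 0 before you shut Sidekiq down
for q in default push pull mailers ingress scheduler; do
  echo -n "$q: "
  redis-cli llen "queue:$q"
done
```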

Failing Over Postgres: The Grande Finale

Now there was just one last thing to do: failing over Postgres. I did this at a time when the server was most active, which isn’t the smartest thing ever, but I didn’t feel like doing it in the middle of the night. When you fail over, two things need to happen:

  1. Your secondary Postgres server has to be promoted using pg_promote()
  2. Your Mastodon instance has to know it now has to read from the new server

This is the only part where your users are at risk of losing data, or rather, of sending data to the wrong server. First, promote Postgres by running SELECT pg_promote();. This command waits until the failover is complete. Once it has completed, you can restart Mastodon. It’s imperative you run these commands as quickly as possible, or your users can lose data.

systemctl reload mastodon-web
systemctl restart mastodon-sidekiq
systemctl restart mastodon-streaming

After running these commands, the setup looks like this:

Updating Cloudflare

With the old server simply acting as a proxy, we can update Cloudflare to point to our new server. If you’re not using Cloudflare as your DNS provider, you might want to keep the old server running for at least 24 hours, so your users’ cached DNS records have time to expire and point to the new server.

Once that’s done, you’re done, and your environment now looks like this!

Draw The Rest of the Owl

I know this feels a bit “Draw-the-rest-of-the-fucking-owl”-y, and if it does then this guide probably isn’t for you. Regardless, I hope you enjoyed it.

If you have any questions, you can always ping me on Famichiki!

https://famichiki.jp/@brian