    Backup/Recovery
    ===============
    
    This document covers the details of backups of the data required for PrivateStorageio to operate.
    It describes the situations in which these backups are intended to be useful.
    It also explains how to use these backups to recover in these situations.
    
    Tahoe-LAFS Storage Nodes
    ------------------------
    
    The state associated with a Tahoe-LAFS storage node consists of at least:
    
    1. the "node directory" containing
       configuration,
       logs,
       public and private keys,
       and service fURLs.
    2. the "storage" directory containing
       user ciphertext,
       garbage collector state,
       and corruption advisories.
    
    Node Directories
    ~~~~~~~~~~~~~~~~
    
    The "node directory" changes gradually over time.
    New logs are written (including incident reports).
    The announcement sequence number is incremented.
    The introducer cache is updated.
    
    The critical state necessary to reproduce an identical storage node does not change.
    This state consists of
    
    * the node id (my_nodeid)
    * the node private key (private/node.privkey)
    * the node x509v3 certificate (private/node.pem)
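
As an illustration, the identity files alone can be captured with ``tar``; this is a sketch using a scratch directory with dummy contents standing in for a real node directory (which lives under ``/var/db/tahoe-lafs/storage`` on the storage hosts):

```shell
# Sketch: archive only the identity-critical files from a node directory.
# A scratch directory with dummy contents stands in for a real node dir.
NODEDIR=$(mktemp -d)
mkdir -p "$NODEDIR/private"
echo dummy-id   > "$NODEDIR/my_nodeid"
echo dummy-key  > "$NODEDIR/private/node.privkey"
echo dummy-cert > "$NODEDIR/private/node.pem"

# The backup itself: only the three identity files are captured.
tar cjf /tmp/identity-backup.tar.bz2 -C "$NODEDIR" \
    my_nodeid private/node.privkey private/node.pem
tar tjf /tmp/identity-backup.tar.bz2
```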
    
    A backup of the node directory can be used to create a Tahoe-LAFS storage node with the same identity as the original storage node.
    It *cannot* be used to recover the user ciphertext held by the original storage node.
    Nor will it recover the state which gradually changes over time.
    
    Backup
    ``````
    
    A one-time backup has been made of these directories in the PrivateStorageio 1Password account.
    The "Tahoe-LAFS Storage Node Backups" vault contains backups of staging and production node directories.
    The process for creating these backups is as follows:
    
    ::
    
       DOMAIN=private.storage
       FILES="node.pubkey private/ tahoe.cfg my_nodeid tahoe-client.tac node.url permutation-seed"
       DIR=/var/db/tahoe-lafs/storage
       for n in $(seq 1 5); do
           NODE=storage00${n}.${DOMAIN}
           ssh $NODE tar vvjcf - -C $DIR $FILES > ${NODE}.tar.bz2
       done
    
       tar vvjcf ${DOMAIN}.tar.bz2 *.tar.bz2
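
Before trusting an archive produced this way, it may be worth confirming that the private key material actually made it in. The following sketch shows such a check, using a locally built stand-in archive in place of a real per-node tarball:

```shell
# Sketch: fail loudly if an archive is missing the node private key.
# Build a stand-in archive first so the check can be demonstrated.
ARCHIVE=demo-node.tar.bz2
WORK=$(mktemp -d)
mkdir -p "$WORK/private"
echo dummy-key > "$WORK/private/node.privkey"
tar cjf "$ARCHIVE" -C "$WORK" private/node.privkey

# The check itself: grep the archive listing for the key path.
if tar tjf "$ARCHIVE" | grep -q 'private/node.privkey'; then
    echo "OK: $ARCHIVE contains the node private key"
else
    echo "MISSING key in $ARCHIVE" >&2
    exit 1
fi
```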
    
    Recovery
    ````````
    
    #. Prepare a system onto which to recover the node directory.
       The rest of these steps assume that PrivateStorageio is deployed on the node.
    
    #. Download the backup tarball from 1Password.
    
    #. Extract the particular node directory backup to recover from ::
    
         [LOCAL]$ tar xvf ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
    
    #. Upload the node directory backup to the system onto which recovery is taking place ::
    
         [LOCAL]$ scp ${NODE}.${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}:recovery.tar.bz2
    
    #. Clean up the local copies of the backup files ::
    
         [LOCAL]$ rm -iv ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
    
    #. The rest of the steps are executed on the system on which recovery is taking place.
       Log in ::
    
         [LOCAL]$ ssh ${NODE}.${DOMAIN}
    
    #. On the node make sure there is no storage service running ::
    
         [REMOTE]$ systemctl status tahoe.storage.service
    
       If there is, figure out why and stop it if it is safe to do so ::
    
         [REMOTE]$ systemctl stop tahoe.storage.service
    
    #. On the node make sure there is no existing node directory ::
    
         [REMOTE]$ stat /var/db/tahoe-lafs/storage
    
       If there is, figure out why and remove it if it is safe to do so.
    
    #. Unpack the node directory backup into the correct location ::
    
         [REMOTE]$ mkdir -p /var/db/tahoe-lafs/storage
         [REMOTE]$ tar xvf recovery.tar.bz2 -C /var/db/tahoe-lafs/storage
    
    #. Mark the node directory as created and consistent ::
    
         [REMOTE]$ touch /var/db/tahoe-lafs/storage.created
    
    #. Start the storage service ::
    
         [REMOTE]$ systemctl start tahoe.storage.service
    
    #. Clean up the remote copies of the backup file ::
    
         [REMOTE]$ rm -iv recovery.tar.bz2
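
As an optional extra check after unpacking, the restored node directory can be sanity-checked for completeness before the service is started. This is a sketch, using a scratch directory in place of ``/var/db/tahoe-lafs/storage``; the file list is an assumption based on the backup contents above:

```shell
# Sketch: check that a restored node directory has the expected files
# before starting tahoe.storage.service.  A scratch directory stands in
# for /var/db/tahoe-lafs/storage; the file list mirrors the backup step.
NODEDIR=$(mktemp -d)
mkdir -p "$NODEDIR/private"
touch "$NODEDIR/my_nodeid" "$NODEDIR/tahoe.cfg" \
      "$NODEDIR/private/node.privkey" "$NODEDIR/private/node.pem"

for f in my_nodeid tahoe.cfg private/node.privkey private/node.pem; do
    test -e "$NODEDIR/$f" || { echo "missing: $f" >&2; exit 1; }
done
echo "node directory looks complete"
```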
    
    
    Storage Directories
    ~~~~~~~~~~~~~~~~~~~
    
    
    The user ciphertext is backed up using `Borg backup <https://borgbackup.readthedocs.io/>`_ to a separate location, currently a SaaS backup storage service (`borgbase.com <https://borgbase.com>`_).
    
    
    Borg backup uses a *RepoKey* secured by a *passphrase* to encrypt the backup data and an *SSH key* to authenticate against the backup storage service.
    
    Each Borg backup job requires one *backup repository*.
    
    The backups are automatically checked periodically.
    
    
    SSH keys
    ````````
    
    Borgbase `recommends creating ed25519 ssh keys with one hundred KDF rounds <https://www.borgbase.com/ssh>`_.
    We create one key pair per grid (not per host)::
    
        $ ssh-keygen -f borgbackup-appendonly-staging -t ed25519 -a 100
        $ ssh-keygen -f borgbackup-appendonly-production -t ed25519 -a 100
    
    
    Save the key without a passphrase and upload the public part to `Borgbase SSH keys <https://www.borgbase.com/ssh>`_.
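
To confirm that the right key was uploaded, its fingerprint can be compared against what the Borgbase UI displays. A sketch, generating a throwaway key with the same parameters as above and printing the fingerprint of its public half (``demo-borgbackup-key`` is a placeholder file name):

```shell
# Sketch: generate a key with the same parameters as above and print the
# fingerprint of its public half for comparison against the Borgbase UI.
# "demo-borgbackup-key" is a placeholder file name.
rm -f demo-borgbackup-key demo-borgbackup-key.pub
ssh-keygen -q -f demo-borgbackup-key -t ed25519 -a 100 -N ''
ssh-keygen -lf demo-borgbackup-key.pub
```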
    
    
    Passphrase
    ``````````
    
    
    Make up a passphrase with which to encrypt the repository key. Use computer help if you like::
    
    
        nix-shell --packages pwgen --command 'pwgen --secure 83 1'  # 83 is the year I was born. Very random.
    
    Create & initialize the backup repository
    `````````````````````````````````````````
    
    
    Borgbase.com offers a `borgbase.com GraphQL API <https://docs.borgbase.com/api/>`_.
    Since our current number of repositories is small, we save time by creating them with a few clicks in the `borgbase.com Web Interface <https://www.borgbase.com/repositories>`_:
    
    * Set up one repository per backup job.
    * Set the *Repository Name* to the FQDN of the host to be backed up.
    * Add the SSH key created earlier as *Append-Only Access* key.
    * Leave the other settings at their defaults.
    
    Then initialize those repositories with our chosen parameters::
    
    
        export BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"
        export BORG_RSH="ssh -i borgbackup-appendonly-staging"
        borg init -e repokey-blake2 xyxyx123@xyxyx123.repo.borgbase.com:repo
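
Once initialized, a first backup can be taken and the resulting archives listed. The following sketch reuses the staging credentials and placeholder repository address from the init step; the ciphertext path and the ``{hostname}-{now}`` archive name scheme are assumptions, not deployed policy, and the commands only run when ``borg`` and the key file are actually present:

```shell
# Sketch: first backup plus archive listing.  The repository address is
# the placeholder from the init step; the ciphertext path and the
# {hostname}-{now} archive name are assumptions, not deployed policy.
export BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"
export BORG_RSH="ssh -i borgbackup-appendonly-staging"
REPO="xyxyx123@xyxyx123.repo.borgbase.com:repo"
if command -v borg >/dev/null 2>&1 && [ -f borgbackup-appendonly-staging ]; then
    borg create --stats "$REPO::{hostname}-{now}" /var/db/tahoe-lafs/storage/storage
    borg list "$REPO"
else
    echo "borg or key file not available; skipping"
fi
```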
    
    
    Reliability checks
    ``````````````````
    
    Borg handles large amounts of data.
    
    Given enough bits, even rare, spurious bit flips become a problem.
    That is why regular runs of ``borg check`` are recommended
    (see the `borgbase FAQ <https://docs.borgbase.com/faq/#how-often-should-i-run-borg-check>`_).
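
Such a check can also be run by hand. A sketch against the placeholder staging repository from the init step, guarded so it only runs when ``borg`` and the key file are available:

```shell
# Sketch: manual integrity check against the placeholder staging repo.
export BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"
export BORG_RSH="ssh -i borgbackup-appendonly-staging"
REPO="xyxyx123@xyxyx123.repo.borgbase.com:repo"
if command -v borg >/dev/null 2>&1 && [ -f borgbackup-appendonly-staging ]; then
    # --repository-only verifies repository consistency quickly; drop it
    # to also verify archive metadata.
    borg check --repository-only "$REPO"
else
    echo "borg or key file not available; skipping"
fi
```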
    
    
    
    Recovery
    ````````
    
    Borg offers various methods to restore backups.
    
    A very convenient method is to mount a backup set using FUSE.
    
    Please consult the restore documentation at `Borgbase <https://docs.borgbase.com/restore/>`_ and `Borg <https://borgbackup.readthedocs.io/en/stable/usage/mount.html>`_.
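
For example, a mount-based restore might look like the following sketch; the repository address, key file, and mount point are placeholders, and the ``borg`` commands are guarded as before:

```shell
# Sketch: mount the repository read-only over FUSE, browse it with normal
# tools, then unmount.  Address and mount point are placeholders.
export BORG_RSH="ssh -i borgbackup-appendonly-staging"
REPO="xyxyx123@xyxyx123.repo.borgbase.com:repo"
MNT=/tmp/borg-restore
if command -v borg >/dev/null 2>&1 && [ -f borgbackup-appendonly-staging ]; then
    mkdir -p "$MNT"
    borg mount "$REPO" "$MNT"   # requires Borg's FUSE support
    ls "$MNT"                   # each archive appears as a directory
    borg umount "$MNT"
else
    echo "borg or key file not available; skipping"
fi
```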