diff --git a/docs/ops/README.rst b/docs/ops/README.rst
index 9ef2837548e272fffadc55130ec1f541d46acafa..8026e1673b0dbed3708b1c4b0e7b600c049fabde 100644
--- a/docs/ops/README.rst
+++ b/docs/ops/README.rst
@@ -3,11 +3,10 @@ Administrator documentation
 
 This contains documentation regarding running PrivateStorageio.
 
-.. include::
-   ../../morph/README.rst
+.. toctree::
+   :maxdepth: 2
 
-.. include::
-   monitoring.rst
-
-.. include::
-   generating-keys.rst
+   morph
+   monitoring
+   generating-keys
+   backup-recovery
diff --git a/docs/ops/backup-recovery.rst b/docs/ops/backup-recovery.rst
new file mode 100644
index 0000000000000000000000000000000000000000..a39c96dfa859203d6b54c1812e70414715b920e9
--- /dev/null
+++ b/docs/ops/backup-recovery.rst
@@ -0,0 +1,115 @@
+Backup/Recovery
+===============
+
+This document covers the details of backups of the data required for PrivateStorageio to operate.
+It describes the situations in which these backups are intended to be useful.
+It also explains how to use these backups to recover in these situations.
+
+Tahoe-LAFS Storage Nodes
+------------------------
+
+The state associated with a Tahoe-LAFS storage node consists of at least:
+
+1. the "node directory" containing
+   configuration,
+   logs,
+   public and private keys,
+   and service fURLs.
+2. the "storage" directory containing
+   user ciphertext,
+   garbage collector state,
+   and corruption advisories.
+
+Node Directories
+~~~~~~~~~~~~~~~~
+
+The "node directory" changes gradually over time.
+New logs are written (including incident reports).
+The announcement sequence number is incremented.
+The introducer cache is updated.
+
+The critical state necessary to reproduce an identical storage node does not change.
+This state consists of:
+
+* the node id (my_nodeid)
+* the node private key (private/node.privkey)
+* the node x509v3 certificate (private/node.pem)
+
+A backup of the node directory can be used to create a Tahoe-LAFS storage node with the same identity as the original storage node.
+It *cannot* be used to recover the user ciphertext held by the original storage node.
+Nor will it recover the state which gradually changes over time.
+
+Backup
+``````
+
+A one-time backup has been made of these directories in the PrivateStorageio 1Password account.
+The "Tahoe-LAFS Storage Node Backups" vault contains backups of staging and production node directories.
+The process for creating these backups is as follows:
+
+::
+
+   DOMAIN=private.storage
+   FILES="node.pubkey private/ tahoe.cfg my_nodeid tahoe-client.tac node.url permutation-seed"
+   DIR=/var/db/tahoe-lafs/storage
+   for n in $(seq 1 5); do
+       NODE=storage00${n}.${DOMAIN}
+       ssh $NODE tar vvjcf - -C $DIR $FILES > ${NODE}.tar.bz2
+   done
+
+   tar vvjcf ${DOMAIN}.tar.bz2 *.tar.bz2
+
+Recovery
+````````
+
+#. Prepare a system onto which to recover the node directory.
+   The rest of these steps assume that PrivateStorageio is deployed on the node.
+
+#. Download the backup tarball from 1Password.
+
+#. Extract the particular node directory backup to recover from ::
+
+      [LOCAL]$ tar xvf ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
+
+#. Upload the node directory backup to the system onto which recovery is taking place ::
+
+      [LOCAL]$ scp ${NODE}.${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}:recovery.tar.bz2
+
+#. Clean up the local copies of the backup files ::
+
+      [LOCAL]$ rm -iv ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
+
+#. The rest of the steps are executed on the system on which recovery is taking place.
+   Log in ::
+
+      [LOCAL]$ ssh ${NODE}.${DOMAIN}
+
+#. On the node, make sure there is no storage service running ::
+
+      [REMOTE]$ systemctl status tahoe.storage.service
+
+   If one is running, figure out why, and stop it if it is safe to do so ::
+
+      [REMOTE]$ systemctl stop tahoe.storage.service
+
+#. On the node, make sure there is no existing node directory ::
+
+      [REMOTE]$ stat /var/db/tahoe-lafs/storage
+
+   If one exists, figure out why, and remove it if it is safe to do so.
+
+#. Unpack the node directory backup into the correct location ::
+
+      [REMOTE]$ mkdir -p /var/db/tahoe-lafs/storage
+      [REMOTE]$ tar xvf recovery.tar.bz2 -C /var/db/tahoe-lafs/storage
+
+#. Mark the node directory as created and consistent ::
+
+      [REMOTE]$ touch /var/db/tahoe-lafs/storage.created
+
+#. Start the storage service ::
+
+      [REMOTE]$ systemctl start tahoe.storage.service
+
+#. Clean up the remote copy of the backup file ::
+
+      [REMOTE]$ rm -iv recovery.tar.bz2
diff --git a/docs/ops/morph.rst b/docs/ops/morph.rst
new file mode 100644
index 0000000000000000000000000000000000000000..5bcffb5fb5a6928a69300dcfb3ac4cb7126ba09a
--- /dev/null
+++ b/docs/ops/morph.rst
@@ -0,0 +1,2 @@
+.. include::
+   ../../morph/README.rst
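The recovery procedure documented above restores the node directory but never checks that the restored identity actually matches the backup. As a hedged sketch only (the `verify_node_id` helper is hypothetical and not part of the change above; the `recovery.tar.bz2` name and `/var/db/tahoe-lafs/storage` path follow the recovery steps), the critical `my_nodeid` file could be compared before starting the service:

```shell
# Hypothetical helper: check that the node id recorded in a backup tarball
# matches the one in a (restored) node directory before starting the service.
# Usage: verify_node_id BACKUP NODEDIR
verify_node_id() {
    backup=$1
    nodedir=$2
    workdir=$(mktemp -d)
    # The backup was created from the file names listed in FILES above,
    # so the node id is stored under the member name "my_nodeid".
    tar xjf "$backup" -C "$workdir" my_nodeid
    if cmp -s "$workdir/my_nodeid" "$nodedir/my_nodeid"; then
        echo "node id matches backup"
        rm -rf "$workdir"
    else
        echo "node id MISMATCH: do not start tahoe.storage.service" >&2
        rm -rf "$workdir"
        return 1
    fi
}

# Example usage on the node, before "systemctl start tahoe.storage.service":
#   verify_node_id recovery.tar.bz2 /var/db/tahoe-lafs/storage
```

A mismatch here would suggest the wrong per-node tarball was uploaded, in which case starting the service would announce a different identity to the grid.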