Skip to content
Snippets Groups Projects

Compare revisions

Changes are shown as if the source revision was being merged into the target revision. Learn more about comparing revisions.

Source

Select target project
No results found
Select Git revision
  • 118-borg-backup-not-running-as-it-should
  • 125.dont-set-static-datasource-uids
  • 125.silence-broken-backup-alerts
  • 133.give-access-to-prod-infra
  • 149.fix-bootloader
  • 157.authorize-new-hro-key
  • 162.flexible-grafana-module
  • 163.jp-to-ben-for-prod
  • 164.grafana-alert-rules
  • 190-our-regular-updates-fill-up-the-servers-boot-partitions
  • 207.payment-server-exception-reporting
  • 287.publish-tahoe-error-rate
  • 300.monitor-payment-server
  • 352.cachix
  • 42.update-nixpkgs
  • 445.update-zkapauthorizer
  • 62.openssl-111k
  • 67.rationalize-morph-names.2
  • 87.qemu-local-grid
  • 87.test-local-grid
  • 88.no-gui-for-qemu
  • also-alert-on-incoming-network-errors
  • develop
  • doc-fix
  • dont-use-etc-hosts
  • failsafe-payment-process
  • fix-repo-update-docs
  • flake
  • hro-cloud
  • localdev-qemu
  • make-sure-we-run-a-openzfs-compatible-kernel
  • meejah-develop-patch-44361
  • monitored-node
  • nixpkgs-upgrade-2022-07-13
  • nixpkgs-upgrade-2022-07-14
  • nixpkgs-upgrade-2022-07-22
  • nixpkgs-upgrade-2023-11-06
  • nixpkgs-upgrade-2024-02-12
  • nixpkgs-upgrade-2024-02-19
  • nixpkgs-upgrade-2024-02-26
  • nixpkgs-upgrade-2024-03-04
  • nixpkgs-upgrade-2024-03-11
  • nixpkgs-upgrade-2024-03-18
  • nixpkgs-upgrade-2024-03-25
  • nixpkgs-upgrade-2024-04-22
  • nixpkgs-upgrade-2024-05-13
  • nixpkgs-upgrade-2024-10-14
  • nixpkgs-upgrade-2024-12-23
  • nixpkgs-upgrade-2025-06-16
  • parallel-privatestorage-system-tests
  • payment-proxy-timeouts
  • per-node-monitor-config
  • production
  • reproduce-permission-errors
  • smaller-system-images
  • spending-node
  • spending-node-rebase
  • staging
  • upgrade-nixos-to-22.11_with-libvirt-localgrid
59 results

Target

Select target project
  • tomprince/PrivateStorageio
  • privatestorage/PrivateStorageio
2 results
Select Git revision
  • arion
  • develop
  • dont-use-etc-hosts
  • local-test-grid
  • no-morph-on-nodes
  • sec
  • simple-docs-build
  • simplify-grafana
  • stuff
9 results
Show changes
Showing
with 994 additions and 59 deletions
...@@ -20,7 +20,7 @@ ...@@ -20,7 +20,7 @@
# -- Project information ----------------------------------------------------- # -- Project information -----------------------------------------------------
project = 'PrivateStorageio' project = 'PrivateStorageio'
copyright = '2019, PrivateStorage.io, LLC' copyright = '2021, PrivateStorage.io, LLC'
author = 'PrivateStorage.io, LLC' author = 'PrivateStorage.io, LLC'
# The short X.Y version # The short X.Y version
...@@ -38,8 +38,10 @@ release = '0.0' ...@@ -38,8 +38,10 @@ release = '0.0'
# Add any Sphinx extension module names here, as strings. They can be # Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones. # ones.
extensions = [ extensions = [
"sphinx.ext.graphviz", "sphinx.ext.graphviz",
"sphinxcontrib.plantuml",
] ]
# Add any paths that contain templates here, relative to this directory. # Add any paths that contain templates here, relative to this directory.
...@@ -59,7 +61,7 @@ master_doc = 'index' ...@@ -59,7 +61,7 @@ master_doc = 'index'
# #
# This is also used if you do content translation via gettext catalogs. # This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases. # Usually you set "language" from the command line for these cases.
language = None language = 'en'
# List of patterns, relative to source directory, that match files and # List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files. # directories to ignore when looking for source files.
...@@ -82,6 +84,11 @@ html_theme = 'alabaster' ...@@ -82,6 +84,11 @@ html_theme = 'alabaster'
# documentation. # documentation.
# #
# html_theme_options = {} # html_theme_options = {}
html_theme_options = {
'logo': 'logo-ps.svg',
'description': " ", # ugly hack to get some white space below the logo
'fixed_sidebar': True,
}
# Add any paths that contain custom static files (such as style sheets) here, # Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files, # relative to this directory. They are copied after the builtin static files,
......
Developer documentation
=======================
Building
--------
The build system uses `Nix`_ which must be installed before anything can be built.
Start by setting up the development/operations environment::
$ nix-shell
Testing
-------
The test system uses `Nix`_ which must be installed before any tests can be run.
Unit tests are run using this command::
$ nix-build --attr unit-tests
Unit tests are also run on CI.
The system tests are run using this command::
$ nix-build --attr system-tests
The build requires > 10 GB of disk space,
and the VMs might be timing out on slow or busy machines.
If you run into timeouts,
try `raising the number of retries <https://whetstone.private.storage/privatestorage/PrivateStorageio/-/blob/e8233d2/nixos/modules/tests/run-introducer.py#L55-62>`_.
It is also possible go through the testing script interactively - useful for debugging::
$ nix-build --attr system-tests.private-storage.driver
This will give you a result symlink in the current directory.
Inside that is bin/nixos-test-driver which gives you a kind of REPL for interacting with the VMs.
The kind of `Python in this testScript <https://whetstone.private.storage/privatestorage/PrivateStorageio/-/blob/78881a3/nixos/modules/tests/private-storage.nix#L180>`_ is what you can enter into this REPL.
Consult the `official documentation on NixOS Tests <https://nixos.org/manual/nixos/stable/index.html#sec-nixos-tests>`_ for more information.
Updatings Pins
--------------
Nixpkgs
```````
To update the version of NixOS we deploy with, run::
nix-shell --run 'update-nixpkgs'
That will update ``nixpkgs.json`` to the latest release on the nixos release channel.
To update the channel, the script will need to be updated,
along with the filenames that have the channel in them.
To create a text summary of what an update changes - to put in Merge Requests, for example - run::
nix-build -A morph -o result-before
update-nixpkgs
nix-build -A morph -o result-after
nix-shell -p nixUnstable
nix --extra-experimental-features nix-command store diff-closures ./result-before/ ./result-after/
Gitlab Repositories
```````````````````
To update the version of packages we import from gitlab, run::
nix-shell --command 'update-gitlab-repo nixos/pkgs/<package>/repo.json'
That will update the package to point at the latest version of the project.\
The command uses branch and repository owner specified in the ``repo.json`` file,
but you can override them by passing the ``--branch`` or ``-owner`` arguments to the command.
A specific revision can also be pinned, by passing ``-rev``.
Interactions
------------
Storage-Time Purchase (ie Payment)
``````````````````````````````````
.. uml::
actor User as User
participant GridSync
participant ZKAPAuthorizer
database ZKAPAuthzDB as "ZKAPAuthorizer"
participant Browser
participant PaymentServer as "Payment Server"
database PaymentServerDB as "Payment Server"
participant WebServer as "Web Server"
participant Stripe
User -> GridSync : buy storage-time
activate User
GridSync -> GridSync : generate voucher
GridSync -> ZKAPAuthorizer : redeem voucher
activate ZKAPAuthorizer
ZKAPAuthorizer -> ZKAPAuthzDB : store voucher
ZKAPAuthorizer -> GridSync : acknowledge
GridSync -> Browser : open payment page
loop until redeemed
GridSync -> ZKAPAuthorizer : query voucher state
ZKAPAuthorizer -> GridSync : not paid
end
Browser -> WebServer : request payment form
WebServer -> Browser : payment form
Browser -> User : Payment form displayed
activate User
User -> Browser : Submit payment details
Browser -> Stripe : Submit payment details
alt payment details accepted
Stripe -> Browser : details okay, return card token
Browser -> PaymentServer : create charge using card token
PaymentServer -> Stripe : charge card using token
note left: the user has now paid for the service
Stripe -> PaymentServer : acknowledge
PaymentServer -> PaymentServerDB : store voucher paid state
else payment details rejected
Stripe -> Browser : payment failure
end
Browser -> User : payment processing results displayed
deactivate User
group repeat for each redemption group
ZKAPAuthorizer -> ZKAPAuthzDB : generate and store random tokens
ZKAPAuthorizer -> PaymentServer : redeem voucher with blinded tokens
PaymentServer -> ZKAPAuthorizer : return signatures for blinded tokens
ZKAPAuthorizer -> ZKAPAuthzDB : store unblinded signatures for tokens
note right: the user has now been authorized to use the service
end
deactivate ZKAPAuthorizer
loop until redeemed
GridSync -> ZKAPAuthorizer : query voucher state
ZKAPAuthorizer -> GridSync : fully redeemed
end
GridSync -> User : storage-time available displayed
deactivate User
Storage-Time Spending (ie Use)
``````````````````````````````
.. uml::
participant MagicFolder
participant TahoeLAFS as "Tahoe-LAFS"
participant ZKAPAuthorizer
database ZKAPAuthzDB as "ZKAPAuthorizer"
participant StorageNode as "Storage Node"
participant SpendingService as "Spending Service"
[-> MagicFolder: upload triggered
activate MagicFolder
MagicFolder -> TahoeLAFS : store some data
activate TahoeLAFS
TahoeLAFS -> ZKAPAuthorizer : store some data
activate ZKAPAuthorizer
loop until tokens accepted
ZKAPAuthorizer <- ZKAPAuthzDB : load some tokens
ZKAPAuthorizer -> StorageNode : store some data using these tokens
StorageNode -> SpendingService : spend these tokens
alt spent tokens
SpendingService -> StorageNode: already spent, rejected
StorageNode -> ZKAPAuthorizer: already spent, rejected
else fresh tokens
SpendingService -> StorageNode: accepted
end
end
StorageNode -> ZKAPAuthorizer: data stored
deactivate ZKAPAuthorizer
ZKAPAuthorizer -> ZKAPAuthzDB: discard spent tokens
ZKAPAuthorizer -> TahoeLAFS: data stored
deactivate TahoeLAFS
TahoeLAFS -> MagicFolder: data stored
deactivate MagicFolder
.. include::
../../morph/grid/local/README.rst
.. _Nix: https://nixos.org/nix
System Designs
--------------
.. toctree::
:maxdepth: 2
System Design Template <template>
$HEADLINE
=========
*The goal is to do the least design we can get away with while still making a quality product.*
*Think of this as a tool to help define the problem, analyze solutions, and share results.*
*Feel free to skip sections that you don't think are relevant*
*(but say that you are doing so).*
*Delete the bits in italics*
**Contacts:** *The primary contacts for this design.*
**Date:** *The last time this design was modified. YYYY-MM-DD*
*Short description of the feature.*
*Consider clarifying by also describing what it is not.*
Rationale
---------
*Why are we doing this now?*
*What value does this give to our users?*
*Which users?*
User Stories
------------
**$STORY NAME**
**Category:** *must / nice to have / must not*
As a **$PERSON** I want **$FEATURE** so that **$BENEFIT**.
**Acceptance Criteria:**
* *What concrete conditions must be met for the implementation to be acceptable?*
* *Surface assumptions about the user story that may not be shared across the team.*
* *Describe failure modes and negative scenarios when preconditions for using the feature are not met.*
* *Place the story in a performance/scaling context with real numbers.*
*Have as many as you like.*
*Group user stories together into meaningfully deliverable units.*
*Gather Feedback*
-----------------
*It might be a good idea to stop at this point & get feedback to make sure you're solving the right problem.*
Alternatives Considered
-----------------------
*What we've considered.*
*What trade-offs are involved with each choice.*
*Why we've chosen the one we did.*
Detailed Implementation Design
------------------------------
*Focus on:*
* external and internal interfaces
* how externally-triggered system events (e.g. sudden reboot; network congestion) will affect the system
* scalability and performance
Data Integrity
~~~~~~~~~~~~~~
*If we get this wrong once, we lose forever.*
*What data does the system need to operate on?*
*How will old data be upgraded to meet the requirements of the design?*
*How will data be upgraded to future versions of the implementation?*
Security
~~~~~~~~
*What threat model does this design take into account?*
*What new attack surfaces are added by this design?*
*What defenses are deployed with the implementation to keep those surfaces safe?*
Backwards Compatibility
~~~~~~~~~~~~~~~~~~~~~~~
*What existing systems are impacted by these changes?*
*How does the design ensure they will continue to work?*
Performance and Scalability
~~~~~~~~~~~~~~~~~~~~~~~~~~~
*How will performance of the implementation be measured?*
*After measuring it, record the results here.*
Further Reading
---------------
*Links to related things.*
*Other designs, tickets, epics, mailing list threads, etc.*
...@@ -6,18 +6,28 @@ ...@@ -6,18 +6,28 @@
Welcome to PrivateStorageio's documentation! Welcome to PrivateStorageio's documentation!
============================================ ============================================
Howdy!
We separated the documentation into parts addressing different audiences.
Please enjoy our docs for:
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
:caption: Contents:
README Administrators <ops/README>
architecture-overview Developers <dev/README>
morph System Designs <dev/designs/index>
Naming
------
The name of this ops software project is "PrivateStorageio".
There is another ops project that deals with the surrounding/supporting infrastructure called "PrivateStorageOps".
The domain name used for the website and other network services associated with this deployment is "privatestorage.io" (but may change to "private.storage" at some point).
Indices and tables Indices and tables
================== ------------------
* :ref:`genindex` * :ref:`genindex`
* :ref:`modindex` * :ref:`modindex`
......
Administrator documentation
###########################
This contains documentation regarding running PrivateStorageio.
.. toctree::
:maxdepth: 2
morph
monitoring
generating-keys
backup-recovery
stripe
Backup/Recovery
===============
This document covers the details of backups of the data required for PrivateStorageio to operate.
It describes the situations in which these backups are intended to be useful.
It also explains how to use these backups to recover in these situations.
Tahoe-LAFS Storage Nodes
------------------------
The state associated with a Tahoe-LAFS storage node consists of at least:
1. the "node directory" containing
configuration,
logs,
public and private keys,
and service fURLs.
2. the "storage" directory containing
user ciphertext,
garbage collector state,
and corruption advisories.
Node Directories
~~~~~~~~~~~~~~~~
The "node directory" changes gradually over time.
New logs are written (including incident reports).
The announcement sequence number is incremented.
The introducer cache is updated.
The critical state necessary to reproduce an identical storage node does not change.
This state consists of
* the node id (my_nodeid)
* the node private key (private/node.privkey)
* the node x509v3 certificate (private/node.pem)
A backup of the node directory can be used to create a Tahoe-LAFS storage node with the same identity as the original storage node.
It *cannot* be used to recover the user ciphertext held by the original storage node.
Nor will it recover the state which gradually changes over time.
Backup
``````
A one-time backup has been made of these directories in the PrivateStorageio 1Password account.
The "Tahoe-LAFS Storage Node Backups" vault contains backups of staging and production node directories.
The process for creating these backups is as follows:
::
DOMAIN=private.storage
FILES="node.pubkey private/ tahoe.cfg my_nodeid tahoe-client.tac node.url permutation-seed"
DIR=/var/db/tahoe-lafs/storage
for n in $(seq 1 5); do
NODE=storage00${n}.${DOMAIN}
ssh $NODE tar vvjcf - -C $DIR $FILES > ${NODE}.tar.bz2
done
tar vvjcf ${DOMAIN}.tar.bz2 *.tar.bz2
Recovery
````````
#. Prepare a system onto which to recover the node directory.
The rest of these steps assume that PrivateStorageio is deployed on the node.
#. Download the backup tarball from 1Password
#. Extract the particular node directory backup to recover from ::
[LOCAL]$ tar xvf ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
#. Upload the node directory backup to the system onto which recovery is taking place ::
[LOCAL]$ scp ${NODE}.${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}:recovery.tar.bz2
#. Clean up the local copies of the backup files ::
[LOCAL]$ rm -iv ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2
#. The rest of the steps are executed on the system on which recovery is taking place.
Log in ::
[LOCAL]$ ssh ${NODE}.${DOMAIN}
#. On the node make sure there is no storage service running ::
[REMOTE]$ systemctl status tahoe.storage.service
If there is then figure out why and stop it if it is safe to do so ::
[REMOTE]$ systemctl stop tahoe.storage.service
#. On the node make sure there is no existing node directory ::
[REMOTE]$ stat /var/db/tahoe-lafs/storage
If there is then figure out why and remove it if it is safe to do so.
#. Unpack the node directory backup into the correct location ::
[REMOTE]$ mkdir -p /var/db/tahoe-lafs/storage
[REMOTE]$ tar xvf recovery.tar.bz2 -C /var/db/tahoe-lafs/storage
#. Mark the node directory as created and consistent ::
[REMOTE]$ touch /var/db/tahoe-lafs/storage.created
#. Start the storage service ::
[REMOTE]$ systemctl start tahoe.storage.service
#. Clean up the remote copies of the backup file ::
[REMOTE]$ rm -iv recovery.tar.bz2
Storage Directories
~~~~~~~~~~~~~~~~~~~
The user ciphertext is backed up using `Borg backup <https://borgbackup.readthedocs.io/>`_ to a separate location - currently a SaaS backup storage service (`borgbase.com <https://borgbase.com>`_).
Borg backup uses a *RepoKey* secured by a *passphrase* to encrypt the backup data and an *SSH key* to authenticate against the backup storage service.
Each Borg backup job requires one *backup repository*.
The backups are automatically checked periodically.
SSH keys
````````
Borgbase `recommends creating ed25519 ssh keys with one hundred KDF rounds <https://www.borgbase.com/ssh>`_.
We create one key pair per grid (not per host)::
$ ssh-keygen -f borgbackup-appendonly-staging -t ed25519 -a 100
$ ssh-keygen -f borgbackup-appendonly-production -t ed25519 -a 100
Save the key without a passphrase and upload the public part to `Borgbase SSH keys <https://www.borgbase.com/ssh>`_.
Passphrase
``````````
Make up a passphrase to encrypt our repository key with. Use computer help if you like::
nix-shell --packages pwgen --command 'pwgen --secure 83 1' # 83 is the year I was born. Very random.
Create & initialize the backup repository
`````````````````````````````````````````
Borgbase.com offers a `borgbase.com GraphQL API <https://docs.borgbase.com/api/>`_.
Since our current number of repositories is small we save time by creating the repositories by clicking a few buttons in the `borgbase.com Web Interface <https://www.borgbase.com/repositories>`_:
* Set up one repository per backup job.
* Set the *Repository Name* to the FQDN of the host to be backed up.
* Add the SSH key created earlier as *Append-Only Access* key.
* Leave the other settings at their defaults.
Then initialize those repositories with our chosen parameters::
export BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"
export BORG_RSH="ssh -i borgbackup-appendonly-staging"
borg init -e repokey-blake2 xyxyx123@xyxyx123.repo.borgbase.com:repo
Reliability checks
``````````````````
Borg handles large amounts of data.
Given enough bits rare, spurious bit flips become a problem.
That is why regular runs of ``borg check`` are recommended
(see the `borgbase FAQ <https://docs.borgbase.com/faq/#how-often-should-i-run-borg-check>`_).
Recovery
````````
Borg offers various methods to restore backups.
A very convenient method is to mount a backup set using FUSE.
Please consult the restore documentation at `Borgbase <https://docs.borgbase.com/restore/>`_ and `Borg <https://borgbackup.readthedocs.io/en/stable/usage/mount.html>`_.
Generating keys
===============
There are example ``public-keys`` and ``private-keys`` repos in ``morph/grid/local/``.
``<grid>/config.json`` has the paths for the key files for the respective grid.
Create a symlink ``private-keys`` to your secret key repositories for the deployment you are working on.
Create a directory named ``public-keys`` containing the corresponding public keys for the deployment.
Stripe
``````
For the Stripe key any random bytes with a little light formatting "work" - at least to make our software happy - but if you want to be able to interact with Stripe and have payments (even pretend payments) move all the way through the system you should get a Stripe account and generate a key w/ them.
Lauri can get you added to our "dev" Stripe account, too, though I forget how important that is for ad hoc dev/testing.
I think this will work for generating random Stripe secret keys (that our software will load, I think, but Stripe will reject)::
>>> import base64, os
>>> print((b"sk_test_" + base64.b64encode(os.urandom(25)).strip(b"=")).decode("ascii"))
sk_test_Dr+XLVjkC0oO3Zw8Ws0yWtDLqR1sM+/fmw
Public keys are the same but "pk_test" instead of "sk_test" ("test" is for "test mode" key that can only process pretend txns; for real txns there are keys with "live" embedded).
ZKAP-Issuer Ristretto
`````````````````````
Here is a Ristretto key you can use, randomly generated just now::
SILOWzbnkBjxC1hGde9d5Q3Ir/4yLosCLEnEQGAxEQE=
Generate your own like this::
[flo@la:~/PrivateStorageio]$ nix-shell
[nix-shell:~/PrivateStorageio]$ nix-shell -p zkapissuer.components.exes.PaymentServer-generate-key
[nix-shell:~/PrivateStorageio]$ PaymentServer-generate-key
SILOWzbnkBjxC1hGde9d5Q3Ir/4yLosCLEnEQGAxEQE=
Make sure you write it into the key file `without any leading or trailing white space, also without newlines <https://github.com/LeastAuthority/python-challenge-bypass-ristretto/issues/37>`_.
For example::
echo -n "SILOWzbnkBjxC1hGde9d5Q3Ir/4yLosCLEnEQGAxEQE=" > ristretto.signing-key
Monitoring VPN
``````````````
Create all of the Wireguard VPN keys for a grid::
./tools/create-vpn-keys.sh morph/grid/testing/grid.nix
<mxfile host="app.diagrams.net" modified="2023-04-20T20:17:44.466Z" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" etag="8PyLTVr0G94q4Dna4Dsz" version="21.2.1" type="device">
<diagram name="Page-1" id="aaaa8250-4180-3840-79b5-4cada1eebb92">
<mxGraphModel dx="794" dy="476" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1920" pageHeight="1200" background="#ffffff" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="vhdg0YFc32S7_3H95Ew1-1" value="&lt;span&gt;Management VPN&lt;br&gt;(Wireshark, TINC...)&lt;/span&gt;" style="ellipse;shape=cloud;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="780" y="720" width="450" height="110" as="geometry" />
</mxCell>
<mxCell id="2mYkRctJDop23S32jJdh-2" value="" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="840" y="405.46" width="370" height="304.54" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-3" value="Loki" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application2;fillColor=#86E83A;strokeColor=#B0F373;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1116" y="592.9000000000001" width="62" height="53" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-4" value="Prometheus" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application;fillColor=#4286c5;strokeColor=#57A2D8;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="866" y="585" width="62" height="68.8" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-5" value="Grafana" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.ami2;aspect=fixed;fillColor=#FF9900;strokeColor=#ffffff;" parent="1" vertex="1">
<mxGeometry x="996" y="425" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-6" value="Node 1" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="779" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-7" value="Operator" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.end_user;strokeColor=#9673a6;fillColor=#e1d5e7;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1276" y="305" width="49" height="100.46" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-16" value="Node ..." style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="902" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-17" value="Node ..." style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1025" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-18" value="Node N" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1147.5" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-45" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="936" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-46" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="1046" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-47" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="1166" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-50" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="846" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-51" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="946" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-52" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="1056" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-53" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="1186" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-57" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="988" y="495" as="sourcePoint" />
<mxPoint x="928" y="565" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-58" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1078" y="495" as="sourcePoint" />
<mxPoint x="1136" y="565" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-61" value="View dashboards&lt;br&gt;in browser" style="shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;spacing=8;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1259.5" y="415" as="sourcePoint" />
<mxPoint x="1109.5" y="445" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-63" value="&lt;h1&gt;Monitoring architecture&amp;nbsp;&lt;/h1&gt;&lt;p&gt;Keep it simple, sunshine!&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;i&gt;Grafana&lt;/i&gt; retrieves metrics from &lt;i&gt;Prometheus&lt;/i&gt; and logs from&amp;nbsp;&lt;i&gt;Loki&lt;/i&gt;,&amp;nbsp;&lt;span&gt;shows dashboards (web) and does alerting (via eMail? Slack?)&lt;/span&gt;&lt;br&gt;&lt;p&gt;&lt;i&gt;Prometheus&lt;/i&gt; stores metrics it pulls from various &lt;i&gt;Exporters&lt;/i&gt; on nodes&lt;/p&gt;&lt;p&gt;&lt;i&gt;Promtail&lt;/i&gt; on nodes pushes logs to &lt;i&gt;Loki&lt;br&gt;&lt;br&gt;&lt;/i&gt;&lt;/p&gt;&lt;p&gt;We try to keep the system as simple as possible: All monitoring and alerting runs on a single machine.&lt;/p&gt;&lt;h2&gt;Changes&lt;/h2&gt;&lt;div&gt;v2: Add Github authentication to Grafana. Add management VPN.&lt;br&gt;&lt;br&gt;v1: Initial version&lt;/div&gt;" style="text;html=1;strokeColor=none;fillColor=none;spacing=5;spacingTop=-20;whiteSpace=wrap;overflow=hidden;rounded=0;shadow=0;comic=0;sketch=0;" parent="1" vertex="1">
<mxGeometry x="596" y="315" width="164" height="545" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-65" value="Send alerts" style="shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1086" y="405.46000000000004" as="sourcePoint" />
<mxPoint x="1236" y="375.46000000000004" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="2mYkRctJDop23S32jJdh-3" value="Monitoring server" style="text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;" parent="1" vertex="1">
<mxGeometry x="845" y="411" width="120" height="20" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-44" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="826" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="vhdg0YFc32S7_3H95Ew1-7" value="GitHub&lt;br&gt;OAuth2" style="ellipse;shape=cloud;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="984" y="305" width="98" height="60" as="geometry" />
</mxCell>
<mxCell id="vhdg0YFc32S7_3H95Ew1-8" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1032.76" y="417" as="sourcePoint" />
<mxPoint x="1032.76" y="372" as="targetPoint" />
</mxGeometry>
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=5,IE=9" ><![endif]-->
<!DOCTYPE html>
<html>
<head>
<title>monitoring-architecture.html</title>
<meta charset="utf-8"/>
</head>
<body>
<div class="mxgraph" style="max-width:100%;border:1px solid transparent;" data-mxgraph="{&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;resize&quot;:true,&quot;xml&quot;:&quot;&lt;mxfile host=\&quot;app.diagrams.net\&quot; modified=\&quot;2023-04-20T20:19:05.428Z\&quot; agent=\&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\&quot; etag=\&quot;4ETb3iIJGh8a_GDWm07E\&quot; version=\&quot;21.2.1\&quot; type=\&quot;device\&quot;&gt;&lt;diagram name=\&quot;Page-1\&quot; id=\&quot;aaaa8250-4180-3840-79b5-4cada1eebb92\&quot;&gt;&lt;mxGraphModel dx=\&quot;794\&quot; dy=\&quot;476\&quot; grid=\&quot;1\&quot; gridSize=\&quot;10\&quot; guides=\&quot;1\&quot; tooltips=\&quot;1\&quot; connect=\&quot;1\&quot; arrows=\&quot;1\&quot; fold=\&quot;1\&quot; page=\&quot;1\&quot; pageScale=\&quot;1\&quot; pageWidth=\&quot;1920\&quot; pageHeight=\&quot;1200\&quot; background=\&quot;#ffffff\&quot; math=\&quot;0\&quot; shadow=\&quot;0\&quot;&gt;&lt;root&gt;&lt;mxCell id=\&quot;0\&quot;/&gt;&lt;mxCell id=\&quot;1\&quot; parent=\&quot;0\&quot;/&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-1\&quot; value=\&quot;&amp;lt;span&amp;gt;Management VPN&amp;lt;br&amp;gt;(WireGuard)&amp;lt;/span&amp;gt;\&quot; style=\&quot;ellipse;shape=cloud;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;780\&quot; y=\&quot;720\&quot; width=\&quot;450\&quot; height=\&quot;110\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;2mYkRctJDop23S32jJdh-2\&quot; value=\&quot;\&quot; style=\&quot;rounded=0;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;840\&quot; y=\&quot;405.46\&quot; width=\&quot;370\&quot; height=\&quot;304.54\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-3\&quot; value=\&quot;Loki\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application2;fillColor=#86E83A;strokeColor=#B0F373;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1116\&quot; y=\&quot;592.9000000000001\&quot; width=\&quot;62\&quot; height=\&quot;53\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-4\&quot; value=\&quot;Prometheus\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application;fillColor=#4286c5;strokeColor=#57A2D8;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;866\&quot; y=\&quot;585\&quot; width=\&quot;62\&quot; height=\&quot;68.8\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-5\&quot; value=\&quot;Grafana\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.ami2;aspect=fixed;fillColor=#FF9900;strokeColor=#ffffff;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;996\&quot; y=\&quot;425\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-6\&quot; value=\&quot;Node 1\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;779\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-7\&quot; value=\&quot;Operator\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.end_user;strokeColor=#9673a6;fillColor=#e1d5e7;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1276\&quot; y=\&quot;305\&quot; width=\&quot;49\&quot; height=\&quot;100.46\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-16\&quot; value=\&quot;Node ...\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;902\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-17\&quot; value=\&quot;Node ...\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1025\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-18\&quot; value=\&quot;Node N\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1147.5\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-45\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;936\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-46\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1046\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-47\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1166\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-50\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;846\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-51\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;946\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-52\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1056\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-53\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1186\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-57\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;988\&quot; y=\&quot;495\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;928\&quot; y=\&quot;565\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-58\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1078\&quot; y=\&quot;495\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1136\&quot; y=\&quot;565\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-61\&quot; value=\&quot;View dashboards&amp;lt;br&amp;gt;in browser\&quot; style=\&quot;shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;spacing=8;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1259.5\&quot; y=\&quot;415\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1109.5\&quot; y=\&quot;445\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-63\&quot; value=\&quot;&amp;lt;h1&amp;gt;Monitoring architecture&amp;amp;nbsp;&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Keep it simple, sunshine!&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;i&amp;gt;Grafana&amp;lt;/i&amp;gt; retrieves metrics from &amp;lt;i&amp;gt;Prometheus&amp;lt;/i&amp;gt; and logs from&amp;amp;nbsp;&amp;lt;i&amp;gt;Loki&amp;lt;/i&amp;gt;,&amp;amp;nbsp;&amp;lt;span&amp;gt;shows dashboards (in a web browser) and does alerting (via Zulip)&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&amp;lt;p&amp;gt;&amp;lt;i&amp;gt;Prometheus&amp;lt;/i&amp;gt; stores metrics it pulls from various &amp;lt;i&amp;gt;Exporters&amp;lt;/i&amp;gt; on nodes&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;i&amp;gt;Promtail&amp;lt;/i&amp;gt; on nodes pushes logs to &amp;lt;i&amp;gt;Loki&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;We try to keep the system as simple as possible: All monitoring and alerting runs on a single machine.&amp;lt;/p&amp;gt;&amp;lt;h2&amp;gt;Changes&amp;lt;/h2&amp;gt;&amp;lt;div&amp;gt;v3: Fix WireGuard/Wireshark braino, Update Auth (Google, not GitHub)&amp;lt;/div&amp;gt;&amp;lt;div&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;div&amp;gt;v2: Add Github authentication to Grafana. Add management VPN.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;v1: Initial version&amp;lt;/div&amp;gt;\&quot; style=\&quot;text;html=1;strokeColor=none;fillColor=none;spacing=5;spacingTop=-20;whiteSpace=wrap;overflow=hidden;rounded=0;shadow=0;comic=0;sketch=0;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;596\&quot; y=\&quot;315\&quot; width=\&quot;164\&quot; height=\&quot;605\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-65\&quot; value=\&quot;Send alerts\&quot; style=\&quot;shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1086\&quot; y=\&quot;405.46000000000004\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1236\&quot; y=\&quot;375.46000000000004\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;UserObject label=\&quot;Monitoring server\&quot; link=\&quot;https://monitoring.private.storage/\&quot; id=\&quot;2mYkRctJDop23S32jJdh-3\&quot;&gt;&lt;mxCell style=\&quot;text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;845\&quot; y=\&quot;411\&quot; width=\&quot;120\&quot; height=\&quot;20\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;/UserObject&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-44\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;826\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-7\&quot; value=\&quot;GSuite&amp;lt;br&amp;gt;OAuth2\&quot; style=\&quot;ellipse;shape=cloud;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;984\&quot; y=\&quot;305\&quot; width=\&quot;98\&quot; height=\&quot;60\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-8\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1032.76\&quot; y=\&quot;417\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1032.76\&quot; y=\&quot;372\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;/root&gt;&lt;/mxGraphModel&gt;&lt;/diagram&gt;&lt;/mxfile&gt;&quot;,&quot;toolbar&quot;:&quot;pages zoom layers lightbox&quot;,&quot;page&quot;:0}"></div>
<script type="text/javascript" src="https://app.diagrams.net/js/viewer-static.min.js"></script>
</body>
</html>
Monitoring
==========
This section gives a high-level overview of the PrivateStorageio monitoring efforts.
Goals
`````
Alerting
Something might break soon, so somebody needs to do something.
Comparing over time or experiment groups
Is our service slower than it was last week? Does database B answer queries faster than database A?
Analyzing long-term trends
How big is my database and how fast is it growing? How quickly is my daily-active user count growing?
Architecture
````````````
Below you find a diagram of the software and systems that comprise our monitoring and alerting infrastructure.
It has intentionally been kept simple, yet is already surprisingly complex (at least if you are new to the monitoring world).
The software stack is industry standard and chosen so it would be easy to find solutions to problems and people who can help out.
Log in to `staging <https://monitoring.privatestorage-staging.com/>`_ and `production <https://monitoring.private.storage/>`_ Grafana via your GSuite Private.Storage session.
.. raw:: html
:file: monitoring-architecture.html
Introduction to our dashboards
``````````````````````````````
We have two groups of dashboards: Requests (external view, RED method) and Resources (internal view, USE method).
Resources like CPU and memory exist independently of one another (at least in theory) and their corresponding dashboards are listed in arbitrary order.
Services, on the other hand, often directly depend on other services:
A request might cause sub-requests, which in turn might call other services.
These dependencies can be visualized as a DAG (directed acyclic graph, like a tree but with directed edges) from external-facing to internal systems.
When a service fails, and an Alert is triggered, often the services which depend on the failing service will fail and trigger Alerts as well.
This can cause confusion and cost valuable time especially when the current on-call staff is not familiar with the inner workings of a particular machinery.
To mitigate this problem, we order our dashboards to resemble these dependencies according to a `breadth-first-search <https://en.wikipedia.org/wiki/Breadth-first_search>`_ of the service dependency DAG:
.. graphviz:: service-dag-to-dashboard-order.dot
:caption: DAG of services to resulting order of corresponding dashboards
This makes finding the first failing link, and thus the cause of the problem, quicker:
Problems of a failing service lowest in the DAG bubble "upwards".
Therefore, the "lowest" dashboard that indicates a problem has a high probability of highlighting the origin of the cascading failures.
Meaning of our metrics
``````````````````````
Google's *Monitoring Distributed Systems* book about what they call the `Four Golden Signals <https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals>`_ has a great explanation and definition of useful metrics:
Latency
Requests, but also errors take time, so don't discard them.
Traffic
What constitutes "Traffic" depends on the nature of your system.
Errors
(The *rate* of ) failed requests.
Saturation
How "full" your service is. Take action (i.e. page a human) before service degrades.
"If you measure all four golden signals and page a human when one signal is problematic (or, in the case of saturation, nearly problematic), your service will be at least decently covered by monitoring."
RED method for services ("request-scoped", "external view")
```````````````````````````````````````````````````````````
Request rate, Errors, Duration (+ Saturation?)
* Instrument everything that takes time and could fail
* "In contrast to logging, services should instrument every meaningful number available for capture." (Peter Bourgon)
* Plot 99th percentile, 50th percentile and average
* 50th and average should be close - else something is wrong
* Averages sum neatly - Service latency average should be sum of child service latencies
USE method for resources ("resource-scoped", "internal view")
`````````````````````````````````````````````````````````````
Utilization, Saturation, Errors:
* CPU saturation (Idea: max saturation value per machine, since our load is mostly single-core)
* Memory saturation
* Network saturation
* Disks
* Storage capacity
* I/O saturation
* Software resources
* File descriptors
Logging
```````
Peter Bourgon has a lot of wise things to say about logging in `his brilliant article Logging v. instrumentation <https://peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html#:~:text=Instrumentation%20is%20for%20all%20remaining,meaningful%20number%20available%20for%20capture.>`_.
* "[S]ervices should only log actionable information. That includes serious, panic-level errors that need to be consumed by humans, or structured data that needs to be consumed by machines."
* "Logs read by humans should be sparse, ideally silent if nothing is going wrong. Logs read by machines should be well-defined, ideally with a versioned schema."
* "A (service) never concerns itself with routing or storage of its output stream. It should not attempt to write to or manage logfiles. Instead, each running process writes its event stream, unbuffered, to stdout."
* "Finally, understand that logging is expensive." and "Resist the urge to log any information that doesn’t meet the above criteria. As a concrete example, logging each incoming HTTP request is almost certainly a mistake."
Alerts
``````
Nobody likes being alerted needlessly.
Don't give *Alert Fatigue* a chance!
Rob Ewaschuk gives some great advice in Google's `Monitoring Distributed Systems <https://sre.google/sre-book/monitoring-distributed-systems/#tying-these-principles-together-nqsJfw>`_: "a good starting point for writing or reviewing a new alert":
- Only alert on actionable and urgent events that negatively affect users consistently and that cannot wait & cannot be automated.
- Page one person at a time.
- Do only page on novel problems.
See also
````````
This methodology was inspired by (inter alia)
* `Brendan Gregg: The Utilization Saturation and Errors (USE) Method. 2007. <http://www.brendangregg.com/usemethod.html>`_
* `Rob Ewaschuk, Betsy Beyer: Monitoring Distributed Systems. 2017. <https://sre.google/sre-book/monitoring-distributed-systems/>`_ The Four Golden Signals SRE Book by Google.
* `Tom Wilkie (Kausal): The RED method. How To Instrument Your Services. Feb 2018. <https://www.youtube.com/watch?v=9dRSYjBPaZM>`_
* `Steve Mushero: How to Monitor the SRE Golden Signals. Nov 10, 2017. <https://steve-mushero.medium.com/linuxs-sre-golden-signals-af5aaa26ebae>`_
* `Cindy Sridharan: Logs and Metrics. Apr 30, 2017. <https://copyconstruct.medium.com/logs-and-metrics-6d34d3026e38>`_
* `Peter Bourgon: Logging v. instrumentation. 2016 02 07. <https://peter.bourgon.org/blog/2016/02/07/logging-v-instrumentation.html#:~:text=Instrumentation%20is%20for%20all%20remaining,meaningful%20number%20available%20for%20capture.>`_ What not to log.
File moved
digraph {
subgraph cluster01 {
label = "DAG of service dependencies";
1->2;
1->3;
3->4;
3->5;
}
subgraph cluster02 {
label = "Resulting order of dashboards";
node [ shape = box ];
edge [ style = invis ];
d1 [ label = 1 ];
d2 [ label = 2 ];
d3 [ label = 3 ];
d4 [ label = 4 ];
d5 [ label = 5 ];
d1->d2;
d2->d3;
d3->d4;
d4->d5;
}
}
Stripe
======
We use Stripe for payment processing.
We have test-mode keys for use in staging and live-mode keys for use in production.
There is "payment link" state in Stripe to facilitate the payment workflow.
This was created with ``admin/create-payment-link.sh``
(once for test-mode and once for live-mode).
The payment links can be found in PrivateStorageOps.
They are the values of the stage3 variables ``stripe_payment_link_staging`` and ``stripe_payment_link_production``.
There is also "webhook" state in Stripe so that PaymentServer receives notification of payment.
This was created with ``admin/create-webhook.sh``
(once for test-mode and once for live-mode).
The test-mode webhook is ``we_1LxwKnBHXBAMm9bPDJXJNcDN``.
The live-mode webhook is ``we_1LzioNA9OAm23rYOmAcp3V85``.
The webhook secrets can be found with the rest of the each grid's private keys in ``stripe.webhook-secret``.
.. include::
../../README.rst
!.gitignore
digraph subscriptions {
rankdir=LR
subgraph cluster_usercontrolled {
label = "User Operated"
rankdir=LR
GridSync [label="GridSync", shape=circle]
Browser [label="Browser", shape=circle]
TahoeLAFS [label="Tahoe-LAFS", shape=circle]
}
subgraph cluster_pscontrolled {
label = "PrivateStorage.io Operated"
rankdir = TB
PSWebServer [label="PrivateStorage.io Web Server", shape=box]
SubscriptionConfigWHPeer [label="Subscription Config Wormhole Peer", shape=box]
PaymentServer [label="Payment Server", shape=box]
SATIssuer [label="SAT Issuer", shape=box]
PSStorageGrid [label="PrivateStorage.io Storage Grid", shape=box]
}
User [label="User", shape=egg]
Stripe [label="Stripe", shape=pentagon]
User -> PSWebServer [label="1. Get wormhole code", fontcolor=red, color=red]
PSWebServer -> User [label="2. 7-petulant-banana", fontcolor=blue, color=blue]
User -> GridSync [label="3. 7-petulant-banana", fontcolor=brown, color=brown]
GridSync -> SubscriptionConfigWHPeer [label="4. Get configuration", fontcolor=black, color=black]
SubscriptionConfigWHPeer -> GridSync [label="5. Grid configuration", fontcolor=magenta, color=magenta]
GridSync -> TahoeLAFS [label="6. Instantiate", fontcolor=aquamarine3, color=aquamarine3]
GridSync -> TahoeLAFS [label="7. Redeem PRN", fontcolor=crimson, color=crimson]
TahoeLAFS -> PaymentServer [label="8. Redeem PRN", fontcolor=crimson, color=crimson]
PaymentServer -> TahoeLAFS [label="9. Payment required", fontcolor=gold3, color=gold3]
TahoeLAFS -> GridSync [label="10. Payment required", fontcolor=gold3, color=gold3]
GridSync -> Browser [label="11. Open payment window", fontcolor=gold3, color=gold3]
User -> Browser [label="12. Enter payment info", fontcolor=blue, color=blue]
Browser -> Stripe [label="13. Submit payment form", fontcolor=brown, color=brown]
Stripe -> Browser [label="14. Payment ok", fontcolor=black, color=black]
Stripe -> PaymentServer [label="15. Payment notification", fontcolor=magenta, color=magenta]
GridSync -> TahoeLAFS [label="16. Redeem PRN", fontcolor=aquamarine3, color=aquamarine3]
TahoeLAFS -> TahoeLAFS [label="17. Generate blinded tokens", fontcolor=crimson, color=crimson]
TahoeLAFS -> SATIssuer [label="18. Redeem PRN, blinded-tokens=xs", fontcolor=crimson, color=crimson]
SATIssuer -> PaymentServer [label="19. Check PRN", fontcolor=gold3, color=gold3]
PaymentServer -> SATIssuer [label="20. PRN Valid", fontcolor=gold3, color=gold3]
SATIssuer -> TahoeLAFS [label="21. PRN valid, signed-tokens=ys", fontcolor=crimson, color=crimson]
TahoeLAFS -> TahoeLAFS [label="22. Store signed tokens", fontcolor=crimson, color=crimson]
TahoeLAFS -> GridSync [label="23. PRN Redeemed", fontcolor=red, color=red]
TahoeLAFS -> PSStorageGrid [label="24. Use storage, passes=y", fontcolor=magenta, color=magenta]
}
Architecture Overview
=====================
.. graphviz:: architecture-overview.dot
...@@ -36,11 +36,68 @@ lib ...@@ -36,11 +36,68 @@ lib
--- ---
This contains Nix library code for defining the grids. This contains Nix library code for defining the grids.
It has all the details of how each type of node in our grid is configured.
It knows about morph (so defines ``deployment.secrets`` and has the logic for collecting data defined by other nodes).
It defines options (i.e. ``grid.*``) for things specific to how we configure grids (e.g. ``grid.publicKeyPath``).
It defines metadata about nodes that we use on other nodes (e.g. ``grid.monitoringvpnIPv4`` which is used to define various things on the monitoring node).
Each top-level module here defines one type of node with all (or at least most) of the configuration necessary for that node.
grid grid
---- ----
Specific grid definitions live in subdirectories beneath this directory. Specific grid definitions live in subdirectories beneath this directory.
They consist almost exclusively setting options defined in ``morph/lib`` (and few options defined elsewhere) and then delegating to the ``morph/lib`` modules.
private-keys
~~~~~~~~~~~~
This must be created and populated before the grid can be built or deployed.
This directory contains all of the secrets necessary to deploy the grid.
Secrets beneath this directory are referenced by ``config.json`` and ``grid.nix``
(and possibly elsewhere).
Some of the paths are configurable and some are just convention.
This path is **ignored** by git.
The intended workflow is that the secrets will be maintained on secure storage and a symlink to the correct location created here.
This keeps the secrets themselves out of the git working tree as an extra protection against unintentionally committing them.
An exception is the ``private-keys`` directory in the ``local`` morph grid:
That directory is fully populated, provided as an example, and mostly: not very secret.
Do not deploy these keys to machines reachable via the internet.
Strictly speaking,
this path is configurable in the grid's ``config.json`` but all three grids currently use this name.
public-keys
~~~~~~~~~~~
This must be created and populated before the grid can be built or deployed.
This directory contains any public key material necessary for operation of the grid.
This includes the public keys corresponding to any private keys held in ``private-keys``.
As for ``private-keys``,
this path can be configured in the grid's ``config.json``.
Star-crossed Keys
^^^^^^^^^^^^^^^^^
Where the system uses keypairs,
the public and private parts of those keypairs are stored in different locations
(``public-keys`` and ``private-keys`` mentioned above).
This somewhat complicates key management because any key rotation involves changing key material in two location instead of just one.
This complication is balanced against a specific operational goal:
that our build systems operate without copies of our private keys.
Our system configurations do currently have build-time dependencies on public keys.
Splitting public keys and private keys across two different storage locations provides a simple mechanism for providing build systems with the public keys but withholding the private keys.
In the future we may:
* be sufficiently confident in the security of our build systems to let them have our private keys; or
* remove the dependency upon public keys from the build process.
Either of these directions would let us re-unify public/private-key storage and remove this complication.
config.json config.json
~~~~~~~~~~~ ~~~~~~~~~~~
......
{ pkgs ? import ../nixpkgs.nix {} }:
let
lib = pkgs.lib;
gridlib = import ./lib;
inherit (gridlib.pkgs) ourpkgs;
grids-path = "${builtins.toString ./.}/grid";
grid-configs = lib.mapAttrs (n: v: grids-path + "/${n}/grid.nix") (lib.filterAttrs (n: v: v == "directory") (builtins.readDir grids-path));
# It would be useful if morph exposed this as a function.
# https://github.com/DBCDK/morph/pull/166
morph-eval = networkExpr: (import "${pkgs.morph.lib}/eval-machines.nix") { inherit networkExpr; };
grids = lib.mapAttrs (n: v: (morph-eval v)) grid-configs;
# Derivation with symlinks to the morph output for each grid.
output = pkgs.runCommand "privatestorage-morph"
{ preferLocalBuild = true; allowSubstitutes = false; passthru = { inherit gridlib ourpkgs grids; }; }
''
mkdir $out
${lib.concatStringsSep "\n" (
lib.mapAttrsToList (
name: morph:
let
output = morph.machines {
# It would be nice if we didn't need to write this data to a file.
# https://github.com/DBCDK/morph/pull/186
argsFile = pkgs.writeText "args" (builtins.toJSON { Names = lib.attrNames morph.nodes; });
};
in
''
ln -s ${output} $out/${lib.escapeShellArg name}
''
) grids
)}'';
in output