Showing
with 608 additions and 225 deletions
$HEADLINE
=========
*The goal is to do the least design we can get away with while still making a quality product.*
*Think of this as a tool to help define the problem, analyze solutions, and share results.*
*Feel free to skip sections that you don't think are relevant*
*(but say that you are doing so).*
*Delete the bits in italics*
**Contacts:** *The primary contacts for this design.*
**Date:** *The last time this design was modified. YYYY-MM-DD*
*Short description of the feature.*
*Consider clarifying by also describing what it is not.*
Rationale
---------
*Why are we doing this now?*
*What value does this give to our users?*
*Which users?*
User Stories
------------
**$STORY NAME**
**Category:** *must / nice to have / must not*
As a **$PERSON** I want **$FEATURE** so that **$BENEFIT**.
**Acceptance Criteria:**
* *What concrete conditions must be met for the implementation to be acceptable?*
* *Surface assumptions about the user story that may not be shared across the team.*
* *Describe failure modes and negative scenarios when preconditions for using the feature are not met.*
* *Place the story in a performance/scaling context with real numbers.*
*Have as many as you like.*
*Group user stories together into meaningfully deliverable units.*
*Gather Feedback*
-----------------
*It might be a good idea to stop at this point & get feedback to make sure you're solving the right problem.*
Alternatives Considered
-----------------------
*What we've considered.*
*What trade-offs are involved with each choice.*
*Why we've chosen the one we did.*
Detailed Implementation Design
------------------------------
*Focus on:*
* external and internal interfaces
* how externally-triggered system events (e.g. sudden reboot; network congestion) will affect the system
* scalability and performance
Data Integrity
~~~~~~~~~~~~~~
*If we get this wrong once, we lose forever.*
*What data does the system need to operate on?*
*How will old data be upgraded to meet the requirements of the design?*
*How will data be upgraded to future versions of the implementation?*
Security
~~~~~~~~
*What threat model does this design take into account?*
*What new attack surfaces are added by this design?*
*What defenses are deployed with the implementation to keep those surfaces safe?*
Backwards Compatibility
~~~~~~~~~~~~~~~~~~~~~~~
*What existing systems are impacted by these changes?*
*How does the design ensure they will continue to work?*
Performance and Scalability
~~~~~~~~~~~~~~~~~~~~~~~~~~~
*How will performance of the implementation be measured?*
*After measuring it, record the results here.*
Further Reading
---------------
*Links to related things.*
*Other designs, tickets, epics, mailing list threads, etc.*
......@@ -6,13 +6,16 @@
Welcome to PrivateStorageio's documentation!
============================================
Howdy! We separated the documentation into parts addressing different audiences. Please enjoy our docs for:
Howdy!
We separated the documentation into parts addressing different audiences.
Please enjoy our docs for:
.. toctree::
:maxdepth: 2
Administrators <ops/README>
Developers <dev/README>
System Designs <dev/designs/index>
Naming
......
......@@ -3,11 +3,11 @@ Administrator documentation
This contains documentation regarding running PrivateStorageio.
.. include::
../../../morph/README.rst
.. include::
monitoring.rst
.. include::
generating-keys.rst
.. toctree::
:maxdepth: 2
morph
monitoring
generating-keys
backup-recovery
stripe
Backup/Recovery
===============
This document covers the details of backups of the data required for PrivateStorageio to operate.
It describes the situations in which these backups are intended to be useful.
It also explains how to use these backups to recover in these situations.
Tahoe-LAFS Storage Nodes
------------------------
The state associated with a Tahoe-LAFS storage node consists of at least:
1. the "node directory" containing
   configuration,
   logs,
   public and private keys,
   and service fURLs.
2. the "storage" directory containing
   user ciphertext,
   garbage collector state,
   and corruption advisories.
Node Directories
~~~~~~~~~~~~~~~~
The "node directory" changes gradually over time.
New logs are written (including incident reports).
The announcement sequence number is incremented.
The introducer cache is updated.
The critical state necessary to reproduce an identical storage node does not change.
This state consists of
* the node id (my_nodeid)
* the node private key (private/node.privkey)
* the node x509v3 certificate (private/node.pem)
A backup of the node directory can be used to create a Tahoe-LAFS storage node with the same identity as the original storage node.
It *cannot* be used to recover the user ciphertext held by the original storage node.
Nor will it recover the state which gradually changes over time.
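Before relying on a node directory backup, it can help to confirm that the archive actually holds the three identity files listed above. The helper below is a minimal sketch and not part of PrivateStorageio: the function name ``check_node_backup`` is hypothetical, and it assumes a ``tar`` that auto-detects compression when reading (GNU tar and bsdtar do) and an archive whose paths are relative to the node directory.

```shell
#!/bin/sh
# Hypothetical helper: check that a node directory archive contains the
# identity files needed to recreate a storage node with the same identity.
check_node_backup() {
    archive=$1
    # List the archive members; give up if the archive cannot be read.
    listing=$(tar tf "$archive") || return 2
    missing=0
    for f in my_nodeid private/node.privkey private/node.pem; do
        # Accept members stored with or without a leading "./".
        if ! printf '%s\n' "$listing" | sed 's|^\./||' | grep -qxF "$f"; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # 0 if every file was found, 1 otherwise.
    return "$missing"
}
```

Called as ``check_node_backup storage001.private.storage.tar.bz2``, it prints one ``missing:`` line per absent file and returns a non-zero status if any are absent.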
Backup
``````
A one-time backup has been made of these directories in the PrivateStorageio 1Password account.
The "Tahoe-LAFS Storage Node Backups" vault contains backups of staging and production node directories.
The process for creating these backups is as follows:
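::

    DOMAIN=private.storage
    # Files and directories that make up the node's identity and configuration.
    FILES="node.pubkey private/ tahoe.cfg my_nodeid tahoe-client.tac node.url permutation-seed"
    DIR=/var/db/tahoe-lafs/storage
    for n in $(seq 1 5); do
        NODE=storage00${n}.${DOMAIN}
        # $FILES is deliberately unquoted so that it splits into separate arguments.
        ssh "$NODE" tar vvjcf - -C "$DIR" $FILES > "${NODE}.tar.bz2"
    done
    # Bundle the per-node archives into a single tarball.
    tar vvjcf "${DOMAIN}.tar.bz2" *.tar.bz2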
Recovery
````````
#. Prepare a system onto which to recover the node directory.
   The rest of these steps assume that PrivateStorageio is deployed on the node.

#. Download the backup tarball from 1Password.

#. Extract the particular node directory backup to recover from ::

     [LOCAL]$ tar xvf ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2

#. Upload the node directory backup to the system onto which recovery is taking place ::

     [LOCAL]$ scp ${NODE}.${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}:recovery.tar.bz2

#. Clean up the local copies of the backup files ::

     [LOCAL]$ rm -iv ${DOMAIN}.tar.bz2 ${NODE}.${DOMAIN}.tar.bz2

#. The rest of the steps are executed on the system on which recovery is taking place.
   Log in ::

     [LOCAL]$ ssh ${NODE}.${DOMAIN}

#. On the node make sure there is no storage service running ::

     [REMOTE]$ systemctl status tahoe.storage.service

   If there is then figure out why and stop it if it is safe to do so ::

     [REMOTE]$ systemctl stop tahoe.storage.service

#. On the node make sure there is no existing node directory ::

     [REMOTE]$ stat /var/db/tahoe-lafs/storage

   If there is then figure out why and remove it if it is safe to do so.

#. Unpack the node directory backup into the correct location ::

     [REMOTE]$ mkdir -p /var/db/tahoe-lafs/storage
     [REMOTE]$ tar xvf recovery.tar.bz2 -C /var/db/tahoe-lafs/storage

#. Mark the node directory as created and consistent ::

     [REMOTE]$ touch /var/db/tahoe-lafs/storage.created

#. Start the storage service ::

     [REMOTE]$ systemctl start tahoe.storage.service

#. Clean up the remote copies of the backup file ::

     [REMOTE]$ rm -iv recovery.tar.bz2
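The two pre-flight checks in the recovery steps (no running storage service, no existing node directory) could be combined into one guard that is run on the node before unpacking anything. This is only a sketch under assumptions: the function name ``preflight_recovery`` and the ``SYSTEMCTL`` override variable are hypothetical and exist purely for illustration. It uses ``systemctl is-active --quiet``, which reports the unit state through its exit status.

```shell
#!/bin/sh
# Hypothetical guard: refuse to continue with recovery if the storage service
# is running or if a node directory is already present.
preflight_recovery() {
    node_dir=${1:-/var/db/tahoe-lafs/storage}
    # SYSTEMCTL can be overridden (e.g. for a dry run); defaults to systemctl.
    if "${SYSTEMCTL:-systemctl}" is-active --quiet tahoe.storage.service; then
        echo "tahoe.storage.service is still running" >&2
        return 1
    fi
    if [ -e "$node_dir" ]; then
        echo "$node_dir already exists" >&2
        return 1
    fi
    echo "ok to recover into $node_dir"
}
```

The guard only reports. Deciding whether it is safe to stop the service or remove an existing directory remains a manual judgment, as the steps above describe.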
Storage Directories
~~~~~~~~~~~~~~~~~~~
The user ciphertext is backed up using `Borg backup <https://borgbackup.readthedocs.io/>`_ to a separate location - currently a SaaS backup storage service (`borgbase.com <https://borgbase.com>`_).
Borg backup uses a *RepoKey* secured by a *passphrase* to encrypt the backup data and an *SSH key* to authenticate against the backup storage service.
Each Borg backup job requires one *backup repository*.
The backups are automatically checked periodically.
SSH keys
````````
Borgbase `recommends creating ed25519 ssh keys with one hundred KDF rounds <https://www.borgbase.com/ssh>`_.
We create one key pair per grid (not per host)::
$ ssh-keygen -f borgbackup-appendonly-staging -t ed25519 -a 100
$ ssh-keygen -f borgbackup-appendonly-production -t ed25519 -a 100
Save the key without a passphrase and upload the public part to `Borgbase SSH keys <https://www.borgbase.com/ssh>`_.
Passphrase
``````````
Make up a passphrase to encrypt our repository key with. Use computer help if you like::
nix-shell --packages pwgen --command 'pwgen --secure 83 1' # 83 is the year I was born. Very random.
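If ``pwgen`` is not at hand, a passphrase can also be drawn from the kernel's random source and written straight to a file readable only by its owner. The following is a minimal sketch, not the project's procedure: the function name ``make_passphrase_file`` is hypothetical, and the choice of 48 random bytes (64 base64 characters) is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a random passphrase to a file with mode 0600.
make_passphrase_file() {
    out=$1
    # Never overwrite an existing passphrase: losing it means losing access
    # to the repository key it protects.
    if [ -e "$out" ]; then
        echo "refusing to overwrite $out" >&2
        return 1
    fi
    (
        # Restrict permissions before the file is created.
        umask 077
        # 48 random bytes encode to exactly 64 base64 characters.
        head -c 48 /dev/urandom | base64 | tr -d '\n' > "$out"
    )
}
```

For example, ``make_passphrase_file borgbackup-passphrase-staging`` would produce a file that ``BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"`` can read.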
Create & initialize the backup repository
`````````````````````````````````````````
Borgbase.com offers a `borgbase.com GraphQL API <https://docs.borgbase.com/api/>`_.
Since our current number of repositories is small we save time by creating the repositories by clicking a few buttons in the `borgbase.com Web Interface <https://www.borgbase.com/repositories>`_:
* Set up one repository per backup job.
* Set the *Repository Name* to the FQDN of the host to be backed up.
* Add the SSH key created earlier as an *Append-Only Access* key.
* Leave the other settings at their defaults.
Then initialize those repositories with our chosen parameters::
export BORG_PASSCOMMAND="cat borgbackup-passphrase-staging"
export BORG_RSH="ssh -i borgbackup-appendonly-staging"
borg init -e repokey-blake2 xyxyx123@xyxyx123.repo.borgbase.com:repo
Reliability checks
``````````````````
Borg handles large amounts of data.
Given enough bits, rare spurious bit flips become a problem.
That is why regular runs of ``borg check`` are recommended
(see the `borgbase FAQ <https://docs.borgbase.com/faq/#how-often-should-i-run-borg-check>`_).
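A manual run of ``borg check`` over several repositories might look like the sketch below. It is an illustration only and does not describe how the automatic checks are implemented: the function name ``check_repos`` and the ``BORG_BIN`` override are hypothetical, and the sketch assumes ``BORG_PASSCOMMAND`` and ``BORG_RSH`` are already exported as during repository initialization.

```shell
#!/bin/sh
# Hypothetical helper: run `borg check` against each repository given as an
# argument and report which ones failed.
check_repos() {
    failed=0
    for repo in "$@"; do
        # BORG_BIN can point at a different borg executable; defaults to borg.
        if "${BORG_BIN:-borg}" check "$repo"; then
            echo "ok: $repo"
        else
            echo "FAILED: $repo" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every repository passed its check.
    [ "$failed" -eq 0 ]
}
```

It would be invoked with one repository URL per argument, for example ``check_repos xyxyx123@xyxyx123.repo.borgbase.com:repo``.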
Recovery
````````
Borg offers various methods to restore backups.
A very convenient method is to mount a backup set using FUSE.
Please consult the restore documentation at `Borgbase <https://docs.borgbase.com/restore/>`_ and `Borg <https://borgbackup.readthedocs.io/en/stable/usage/mount.html>`_.
......@@ -42,17 +42,6 @@ For example::
echo -n "SILOWzbnkBjxC1hGde9d5Q3Ir/4yLosCLEnEQGAxEQE=" > ristretto.signing-key
ZKAP-Issuer TLS
```````````````
The ZKAPIssuer.service needs a working TLS certificate and expects it in the certbot directory for the domain you configured, in my case::
openssl req -x509 -newkey rsa:4096 -nodes -keyout privkey.pem -out cert.pem -days 3650
touch chain.pem
Move the three .pem files into the payments server's ``/var/lib/letsencrypt/live/payments.localdev/`` directory and run ``sudo systemctl restart zkapissuer.service``.
Monitoring VPN
``````````````
......
<mxfile host="app.diagrams.net" modified="2023-04-20T20:17:44.466Z" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" etag="8PyLTVr0G94q4Dna4Dsz" version="21.2.1" type="device">
<diagram name="Page-1" id="aaaa8250-4180-3840-79b5-4cada1eebb92">
<mxGraphModel dx="794" dy="476" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1920" pageHeight="1200" background="#ffffff" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="vhdg0YFc32S7_3H95Ew1-1" value="&lt;span&gt;Management VPN&lt;br&gt;(Wireshark, TINC...)&lt;/span&gt;" style="ellipse;shape=cloud;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="780" y="720" width="450" height="110" as="geometry" />
</mxCell>
<mxCell id="2mYkRctJDop23S32jJdh-2" value="" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="840" y="405.46" width="370" height="304.54" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-3" value="Loki" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application2;fillColor=#86E83A;strokeColor=#B0F373;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1116" y="592.9000000000001" width="62" height="53" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-4" value="Prometheus" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application;fillColor=#4286c5;strokeColor=#57A2D8;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="866" y="585" width="62" height="68.8" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-5" value="Grafana" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.ami2;aspect=fixed;fillColor=#FF9900;strokeColor=#ffffff;" parent="1" vertex="1">
<mxGeometry x="996" y="425" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-6" value="Node 1" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="779" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-7" value="Operator" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.end_user;strokeColor=#9673a6;fillColor=#e1d5e7;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1276" y="305" width="49" height="100.46" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-16" value="Node ..." style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="902" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-17" value="Node ..." style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1025" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-18" value="Node N" style="verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;" parent="1" vertex="1">
<mxGeometry x="1147.5" y="845" width="74" height="50" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-45" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="936" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-46" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="1046" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-47" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="1166" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-50" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="846" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-51" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="946" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-52" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="1056" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-53" value="" style="endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;" parent="1" edge="1">
<mxGeometry x="818.6666666666667" y="695" width="63.33333333333333" height="75" as="geometry">
<mxPoint x="1186" y="835" as="sourcePoint" />
<mxPoint x="1148" y="695" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-57" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="988" y="495" as="sourcePoint" />
<mxPoint x="928" y="565" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-58" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1078" y="495" as="sourcePoint" />
<mxPoint x="1136" y="565" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-61" value="View dashboards&lt;br&gt;in browser" style="shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;spacing=8;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1259.5" y="415" as="sourcePoint" />
<mxPoint x="1109.5" y="445" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-63" value="&lt;h1&gt;Monitoring architecture&amp;nbsp;&lt;/h1&gt;&lt;p&gt;Keep it simple, sunshine!&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;i&gt;Grafana&lt;/i&gt; retrieves metrics from &lt;i&gt;Prometheus&lt;/i&gt; and logs from&amp;nbsp;&lt;i&gt;Loki&lt;/i&gt;,&amp;nbsp;&lt;span&gt;shows dashboards (web) and does alerting (via eMail? Slack?)&lt;/span&gt;&lt;br&gt;&lt;p&gt;&lt;i&gt;Prometheus&lt;/i&gt; stores metrics it pulls from various &lt;i&gt;Exporters&lt;/i&gt; on nodes&lt;/p&gt;&lt;p&gt;&lt;i&gt;Promtail&lt;/i&gt; on nodes pushes logs to &lt;i&gt;Loki&lt;br&gt;&lt;br&gt;&lt;/i&gt;&lt;/p&gt;&lt;p&gt;We try to keep the system as simple as possible: All monitoring and alerting runs on a single machine.&lt;/p&gt;&lt;h2&gt;Changes&lt;/h2&gt;&lt;div&gt;v2: Add Github authentication to Grafana. Add management VPN.&lt;br&gt;&lt;br&gt;v1: Initial version&lt;/div&gt;" style="text;html=1;strokeColor=none;fillColor=none;spacing=5;spacingTop=-20;whiteSpace=wrap;overflow=hidden;rounded=0;shadow=0;comic=0;sketch=0;" parent="1" vertex="1">
<mxGeometry x="596" y="315" width="164" height="545" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-65" value="Send alerts" style="shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1086" y="405.46000000000004" as="sourcePoint" />
<mxPoint x="1236" y="375.46000000000004" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="2mYkRctJDop23S32jJdh-3" value="Monitoring server" style="text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;" parent="1" vertex="1">
<mxGeometry x="845" y="411" width="120" height="20" as="geometry" />
</mxCell>
<mxCell id="TrmSFti5pUnXIGkjMKb6-44" value="" style="endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;" parent="1" edge="1">
<mxGeometry x="806" y="695" width="50" height="50" as="geometry">
<mxPoint x="894.6666666666666" y="695" as="sourcePoint" />
<mxPoint x="826" y="835" as="targetPoint" />
</mxGeometry>
</mxCell>
<mxCell id="vhdg0YFc32S7_3H95Ew1-7" value="GitHub&lt;br&gt;OAuth2" style="ellipse;shape=cloud;whiteSpace=wrap;html=1;" parent="1" vertex="1">
<mxGeometry x="984" y="305" width="98" height="60" as="geometry" />
</mxCell>
<mxCell id="vhdg0YFc32S7_3H95Ew1-8" value="" style="endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;" parent="1" edge="1">
<mxGeometry width="50" height="50" relative="1" as="geometry">
<mxPoint x="1032.76" y="417" as="sourcePoint" />
<mxPoint x="1032.76" y="372" as="targetPoint" />
</mxGeometry>
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
<!--[if IE]><meta http-equiv="X-UA-Compatible" content="IE=5,IE=9" ><![endif]-->
<!DOCTYPE html>
<html>
<head>
<title>monitoring-architecture.html</title>
<meta charset="utf-8"/>
</head>
<body>
<div class="mxgraph" style="max-width:100%;border:1px solid transparent;" data-mxgraph="{&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;resize&quot;:true,&quot;xml&quot;:&quot;&lt;mxfile host=\&quot;app.diagrams.net\&quot; modified=\&quot;2023-04-20T20:19:05.428Z\&quot; agent=\&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\&quot; etag=\&quot;4ETb3iIJGh8a_GDWm07E\&quot; version=\&quot;21.2.1\&quot; type=\&quot;device\&quot;&gt;&lt;diagram name=\&quot;Page-1\&quot; id=\&quot;aaaa8250-4180-3840-79b5-4cada1eebb92\&quot;&gt;&lt;mxGraphModel dx=\&quot;794\&quot; dy=\&quot;476\&quot; grid=\&quot;1\&quot; gridSize=\&quot;10\&quot; guides=\&quot;1\&quot; tooltips=\&quot;1\&quot; connect=\&quot;1\&quot; arrows=\&quot;1\&quot; fold=\&quot;1\&quot; page=\&quot;1\&quot; pageScale=\&quot;1\&quot; pageWidth=\&quot;1920\&quot; pageHeight=\&quot;1200\&quot; background=\&quot;#ffffff\&quot; math=\&quot;0\&quot; shadow=\&quot;0\&quot;&gt;&lt;root&gt;&lt;mxCell id=\&quot;0\&quot;/&gt;&lt;mxCell id=\&quot;1\&quot; parent=\&quot;0\&quot;/&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-1\&quot; value=\&quot;&amp;lt;span&amp;gt;Management VPN&amp;lt;br&amp;gt;(WireGuard)&amp;lt;/span&amp;gt;\&quot; style=\&quot;ellipse;shape=cloud;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;780\&quot; y=\&quot;720\&quot; width=\&quot;450\&quot; height=\&quot;110\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;2mYkRctJDop23S32jJdh-2\&quot; value=\&quot;\&quot; style=\&quot;rounded=0;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;840\&quot; y=\&quot;405.46\&quot; width=\&quot;370\&quot; height=\&quot;304.54\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-3\&quot; value=\&quot;Loki\&quot; 
style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application2;fillColor=#86E83A;strokeColor=#B0F373;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1116\&quot; y=\&quot;592.9000000000001\&quot; width=\&quot;62\&quot; height=\&quot;53\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-4\&quot; value=\&quot;Prometheus\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.application;fillColor=#4286c5;strokeColor=#57A2D8;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;866\&quot; y=\&quot;585\&quot; width=\&quot;62\&quot; height=\&quot;68.8\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-5\&quot; value=\&quot;Grafana\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.ami2;aspect=fixed;fillColor=#FF9900;strokeColor=#ffffff;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;996\&quot; y=\&quot;425\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-6\&quot; value=\&quot;Node 1\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;779\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-7\&quot; 
value=\&quot;Operator\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.end_user;strokeColor=#9673a6;fillColor=#e1d5e7;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1276\&quot; y=\&quot;305\&quot; width=\&quot;49\&quot; height=\&quot;100.46\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-16\&quot; value=\&quot;Node ...\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;902\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-17\&quot; value=\&quot;Node ...\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1025\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-18\&quot; value=\&quot;Node N\&quot; style=\&quot;verticalLabelPosition=bottom;html=1;verticalAlign=top;strokeWidth=1;align=center;outlineConnect=0;dashed=0;outlineConnect=0;shape=mxgraph.aws3d.worker;fillColor=#ECECEC;strokeColor=#5E5E5E;aspect=fixed;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;1147.5\&quot; y=\&quot;845\&quot; width=\&quot;74\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell 
id=\&quot;TrmSFti5pUnXIGkjMKb6-45\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;936\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-46\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1046\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-47\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1166\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-50\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; 
width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;846\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-51\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;946\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-52\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1056\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-53\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#82b366;strokeWidth=1;fillColor=#d5e8d4;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;818.6666666666667\&quot; y=\&quot;695\&quot; width=\&quot;63.33333333333333\&quot; height=\&quot;75\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1186\&quot; y=\&quot;835\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1148\&quot; y=\&quot;695\&quot; 
as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-57\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;988\&quot; y=\&quot;495\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;928\&quot; y=\&quot;565\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-58\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1078\&quot; y=\&quot;495\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1136\&quot; y=\&quot;565\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-61\&quot; value=\&quot;View dashboards&amp;lt;br&amp;gt;in browser\&quot; style=\&quot;shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;spacing=8;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1259.5\&quot; y=\&quot;415\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1109.5\&quot; y=\&quot;445\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-63\&quot; value=\&quot;&amp;lt;h1&amp;gt;Monitoring architecture&amp;amp;nbsp;&amp;lt;/h1&amp;gt;&amp;lt;p&amp;gt;Keep it simple, 
sunshine!&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;i&amp;gt;Grafana&amp;lt;/i&amp;gt; retrieves metrics from &amp;lt;i&amp;gt;Prometheus&amp;lt;/i&amp;gt; and logs from&amp;amp;nbsp;&amp;lt;i&amp;gt;Loki&amp;lt;/i&amp;gt;,&amp;amp;nbsp;&amp;lt;span&amp;gt;shows dashboards (in a web browser) and does alerting (via Zulip)&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&amp;lt;p&amp;gt;&amp;lt;i&amp;gt;Prometheus&amp;lt;/i&amp;gt; stores metrics it pulls from various &amp;lt;i&amp;gt;Exporters&amp;lt;/i&amp;gt; on nodes&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&amp;lt;i&amp;gt;Promtail&amp;lt;/i&amp;gt; on nodes pushes logs to &amp;lt;i&amp;gt;Loki&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;We try to keep the system as simple as possible: All monitoring and alerting runs on a single machine.&amp;lt;/p&amp;gt;&amp;lt;h2&amp;gt;Changes&amp;lt;/h2&amp;gt;&amp;lt;div&amp;gt;v3: Fix WireGuard/Wireshark braino, Update Auth (Google, not GitHub)&amp;lt;/div&amp;gt;&amp;lt;div&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/div&amp;gt;&amp;lt;div&amp;gt;v2: Add Github authentication to Grafana. 
Add management VPN.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;v1: Initial version&amp;lt;/div&amp;gt;\&quot; style=\&quot;text;html=1;strokeColor=none;fillColor=none;spacing=5;spacingTop=-20;whiteSpace=wrap;overflow=hidden;rounded=0;shadow=0;comic=0;sketch=0;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;596\&quot; y=\&quot;315\&quot; width=\&quot;164\&quot; height=\&quot;605\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-65\&quot; value=\&quot;Send alerts\&quot; style=\&quot;shape=flexArrow;endArrow=classic;html=1;strokeColor=#9673a6;strokeWidth=1;fillColor=#e1d5e7;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1086\&quot; y=\&quot;405.46000000000004\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1236\&quot; y=\&quot;375.46000000000004\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;UserObject label=\&quot;Monitoring server\&quot; link=\&quot;https://monitoring.private.storage/\&quot; id=\&quot;2mYkRctJDop23S32jJdh-3\&quot;&gt;&lt;mxCell style=\&quot;text;html=1;strokeColor=none;fillColor=none;align=left;verticalAlign=middle;whiteSpace=wrap;rounded=0;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;845\&quot; y=\&quot;411\&quot; width=\&quot;120\&quot; height=\&quot;20\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;/UserObject&gt;&lt;mxCell id=\&quot;TrmSFti5pUnXIGkjMKb6-44\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#6c8ebf;strokeWidth=1;fillColor=#dae8fc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;806\&quot; y=\&quot;695\&quot; width=\&quot;50\&quot; height=\&quot;50\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;894.6666666666666\&quot; y=\&quot;695\&quot; 
as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;826\&quot; y=\&quot;835\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-7\&quot; value=\&quot;GSuite&amp;lt;br&amp;gt;OAuth2\&quot; style=\&quot;ellipse;shape=cloud;whiteSpace=wrap;html=1;\&quot; parent=\&quot;1\&quot; vertex=\&quot;1\&quot;&gt;&lt;mxGeometry x=\&quot;984\&quot; y=\&quot;305\&quot; width=\&quot;98\&quot; height=\&quot;60\&quot; as=\&quot;geometry\&quot;/&gt;&lt;/mxCell&gt;&lt;mxCell id=\&quot;vhdg0YFc32S7_3H95Ew1-8\&quot; value=\&quot;\&quot; style=\&quot;endArrow=classic;html=1;strokeColor=#d79b00;strokeWidth=1;fillColor=#ffe6cc;\&quot; parent=\&quot;1\&quot; edge=\&quot;1\&quot;&gt;&lt;mxGeometry width=\&quot;50\&quot; height=\&quot;50\&quot; relative=\&quot;1\&quot; as=\&quot;geometry\&quot;&gt;&lt;mxPoint x=\&quot;1032.76\&quot; y=\&quot;417\&quot; as=\&quot;sourcePoint\&quot;/&gt;&lt;mxPoint x=\&quot;1032.76\&quot; y=\&quot;372\&quot; as=\&quot;targetPoint\&quot;/&gt;&lt;/mxGeometry&gt;&lt;/mxCell&gt;&lt;/root&gt;&lt;/mxGraphModel&gt;&lt;/diagram&gt;&lt;/mxfile&gt;&quot;,&quot;toolbar&quot;:&quot;pages zoom layers lightbox&quot;,&quot;page&quot;:0}"></div>
<script type="text/javascript" src="https://app.diagrams.net/js/viewer-static.min.js"></script>
</body>
</html>
......@@ -17,6 +17,19 @@ Analyzing long-term trends
How big is my database and how fast is it growing? How quickly is my daily-active user count growing?
Architecture
````````````
Below you find a diagram of the software and systems that comprise our monitoring and alerting infrastructure.
It has intentionally been kept simple, yet is already surprisingly complex (at least if you are new to the monitoring world).
The software stack is industry standard and chosen so it would be easy to find solutions to problems and people who can help out.
Log in to `staging <https://monitoring.privatestorage-staging.com/>`_ and `production <https://monitoring.private.storage/>`_ Grafana via your GSuite Private.Storage session.
.. raw:: html
:file: monitoring-architecture.html
Introduction to our dashboards
``````````````````````````````
......
.. include::
../../morph/README.rst
Stripe
======
We use Stripe for payment processing.
We have test-mode keys for use in staging and live-mode keys for use in production.
There is "payment link" state in Stripe to facilitate the payment workflow.
This was created with ``admin/create-payment-link.sh``
(once for test-mode and once for live-mode).
The payment links can be found in PrivateStorageOps.
They are the values of the stage3 variables ``stripe_payment_link_staging`` and ``stripe_payment_link_production``.
There is also "webhook" state in Stripe so that PaymentServer receives notification of payment.
This was created with ``admin/create-webhook.sh``
(once for test-mode and once for live-mode).
The test-mode webhook is ``we_1LxwKnBHXBAMm9bPDJXJNcDN``.
The live-mode webhook is ``we_1LzioNA9OAm23rYOmAcp3V85``.
The webhook secrets can be found with the rest of each grid's private keys in ``stripe.webhook-secret``.
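The webhook secret is what lets the receiver check that a payment notification really came from Stripe.
The following is a minimal Python sketch of Stripe's documented signature scheme (HMAC-SHA256 over ``<timestamp>.<payload>``, carried in the ``Stripe-Signature`` header).
It is not PaymentServer's code; the function name ``verify_stripe_signature`` and the 300-second tolerance are assumptions made for illustration.

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload, header, secret, tolerance=300, now=None):
    """Return True if `header` carries a valid v1 signature for `payload`.

    payload: raw request body (bytes), exactly as received.
    header:  value of the Stripe-Signature header, e.g. "t=...,v1=...".
    secret:  the webhook signing secret (str).
    """
    timestamp = None
    candidates = []
    # The header is a comma-separated list of key=value pairs.
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)

    try:
        signed_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if not candidates:
        return False

    # Reject stale notifications to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - signed_at) > tolerance:
        return False

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

A test-mode secret only validates notifications from the test-mode webhook, and likewise for live mode, which is why each grid keeps its own ``stripe.webhook-secret``.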
!.gitignore
Developer documentation
=======================
Building
--------
The build system uses `Nix`_ which must be installed before anything can be built.
Start by setting up the development/operations environment::
$ nix-shell
Testing
-------
The test system uses `Nix`_ which must be installed before any tests can be run.
Unit tests are run using this command::
$ nix-build nixos/unit-tests.nix
Unit tests are also run on CI.
The system tests are run using this command::
$ nix-build nixos/system-tests.nix
The system tests boot QEMU VMs which prevents them from running on CI at this time.
The build requires > 10 GB of disk space,
and the VMs may time out on slow or busy machines.
If you run into timeouts,
try `raising the number of retries <https://whetstone.privatestorage.io/privatestorage/PrivateStorageio/-/blob/e8233d2/nixos/modules/tests/run-introducer.py#L55-62>`_.
It is also possible to go through the testing script interactively, which is useful for debugging::
$ nix-build -A private-storage.driver nixos/system-tests.nix
This will give you a result symlink in the current directory.
Inside that is ``bin/nixos-test-driver``, which gives you a kind of REPL for interacting with the VMs.
The kind of `Python in this testScript <https://whetstone.privatestorage.io/privatestorage/PrivateStorageio/-/blob/78881a3/nixos/modules/tests/private-storage.nix#L180>`_ is what you can enter into this REPL.
Consult the `official documentation on NixOS Tests <https://nixos.org/manual/nixos/stable/index.html#sec-nixos-tests>`_ for more information.
Updating Pins
-------------
Nixpkgs
```````
To update the version of NixOS we deploy with, run:
.. code:: shell
nix-shell --run 'update-nixpkgs'
That will update ``nixpkgs-2105.json`` to the latest release on the nixos-21.05 channel.
To update the channel, the script will need to be updated,
along with the filenames that have the channel in them.
Gitlab Repositories
```````````````````
To update the version of packages we import from gitlab, run:
.. code:: shell
nix-shell --command 'tools/update-gitlab nixos/pkgs/<package>/repo.json'
That will update the package to point at the latest version of the project.
The command uses branch and repository owner specified in the ``repo.json`` file,
but you can override them by passing the ``--branch`` or ``--owner`` arguments to the command.
A specific revision can also be pinned by passing ``--rev``.
Architecture overview
---------------------
.. graphviz:: architecture-overview.dot
.. include::
../../../morph/grid/local/README.rst
.. _Nix: https://nixos.org/nix
digraph subscriptions {
rankdir=LR
subgraph cluster_usercontrolled {
label = "User Operated"
rankdir=LR
GridSync [label="GridSync", shape=circle]
Browser [label="Browser", shape=circle]
TahoeLAFS [label="Tahoe-LAFS", shape=circle]
}
subgraph cluster_pscontrolled {
label = "PrivateStorage.io Operated"
rankdir = TB
PSWebServer [label="PrivateStorage.io Web Server", shape=box]
SubscriptionConfigWHPeer [label="Subscription Config Wormhole Peer", shape=box]
PaymentServer [label="Payment Server", shape=box]
SATIssuer [label="SAT Issuer", shape=box]
PSStorageGrid [label="PrivateStorage.io Storage Grid", shape=box]
}
User [label="User", shape=egg]
Stripe [label="Stripe", shape=pentagon]
User -> PSWebServer [label="1. Get wormhole code", fontcolor=red, color=red]
PSWebServer -> User [label="2. 7-petulant-banana", fontcolor=blue, color=blue]
User -> GridSync [label="3. 7-petulant-banana", fontcolor=brown, color=brown]
GridSync -> SubscriptionConfigWHPeer [label="4. Get configuration", fontcolor=black, color=black]
SubscriptionConfigWHPeer -> GridSync [label="5. Grid configuration", fontcolor=magenta, color=magenta]
GridSync -> TahoeLAFS [label="6. Instantiate", fontcolor=aquamarine3, color=aquamarine3]
GridSync -> TahoeLAFS [label="7. Redeem PRN", fontcolor=crimson, color=crimson]
TahoeLAFS -> PaymentServer [label="8. Redeem PRN", fontcolor=crimson, color=crimson]
PaymentServer -> TahoeLAFS [label="9. Payment required", fontcolor=gold3, color=gold3]
TahoeLAFS -> GridSync [label="10. Payment required", fontcolor=gold3, color=gold3]
GridSync -> Browser [label="11. Open payment window", fontcolor=gold3, color=gold3]
User -> Browser [label="12. Enter payment info", fontcolor=blue, color=blue]
Browser -> Stripe [label="13. Submit payment form", fontcolor=brown, color=brown]
Stripe -> Browser [label="14. Payment ok", fontcolor=black, color=black]
Stripe -> PaymentServer [label="15. Payment notification", fontcolor=magenta, color=magenta]
GridSync -> TahoeLAFS [label="16. Redeem PRN", fontcolor=aquamarine3, color=aquamarine3]
TahoeLAFS -> TahoeLAFS [label="17. Generate blinded tokens", fontcolor=crimson, color=crimson]
TahoeLAFS -> SATIssuer [label="18. Redeem PRN, blinded-tokens=xs", fontcolor=crimson, color=crimson]
SATIssuer -> PaymentServer [label="19. Check PRN", fontcolor=gold3, color=gold3]
PaymentServer -> SATIssuer [label="20. PRN Valid", fontcolor=gold3, color=gold3]
SATIssuer -> TahoeLAFS [label="21. PRN valid, signed-tokens=ys", fontcolor=crimson, color=crimson]
TahoeLAFS -> TahoeLAFS [label="22. Store signed tokens", fontcolor=crimson, color=crimson]
TahoeLAFS -> GridSync [label="23. PRN Redeemed", fontcolor=red, color=red]
TahoeLAFS -> PSStorageGrid [label="24. Use storage, passes=y", fontcolor=magenta, color=magenta]
}
......@@ -36,11 +36,17 @@ lib
---
This contains Nix library code for defining the grids.
It has all the details of how each type of node in our grid is configured.
It knows about morph (so defines ``deployment.secrets`` and has the logic for collecting data defined by other nodes).
It defines options (i.e. ``grid.*``) for things specific to how we configure grids (e.g. ``grid.publicKeyPath``).
It defines metadata about nodes that we use on other nodes (e.g. ``grid.monitoringvpnIPv4`` which is used to define various things on the monitoring node).
Each top-level module here defines one type of node with all (or at least most) of the configuration necessary for that node.
grid
----
Specific grid definitions live in subdirectories beneath this directory.
They consist almost exclusively of setting options defined in ``morph/lib`` (and a few options defined elsewhere) and then delegating to the ``morph/lib`` modules.
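Because each grid is just a subdirectory, the set of grids can be discovered from the directory listing alone, which is what ``default.nix`` does with ``builtins.readDir``.
The same rule can be sketched in Python, assuming each grid subdirectory holds a ``grid.nix``; the function name ``discover_grid_configs`` is hypothetical and exists only for this illustration.

```python
from pathlib import Path


def discover_grid_configs(grids_path):
    """Map each grid subdirectory name to the path of its grid.nix.

    Plain files and symlinks are skipped, mirroring the Nix filter that
    keeps only entries whose readDir type is "directory".
    """
    return {
        entry.name: entry / "grid.nix"
        for entry in sorted(Path(grids_path).iterdir())
        if entry.is_dir() and not entry.is_symlink()
    }
```

Adding a new grid therefore needs no registration step: creating the subdirectory with its ``grid.nix`` is enough for it to be picked up.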
private-keys
~~~~~~~~~~~~
......
{ pkgs ? import ../nixpkgs.nix {} }:
let
lib = pkgs.lib;
gridlib = import ./lib;
inherit (gridlib.pkgs) ourpkgs;
grids-path = "${builtins.toString ./.}/grid";
grid-configs = lib.mapAttrs (n: v: grids-path + "/${n}/grid.nix") (lib.filterAttrs (n: v: v == "directory") (builtins.readDir grids-path));
# It would be useful if morph exposed this as a function.
# https://github.com/DBCDK/morph/pull/166
morph-eval = networkExpr: (import "${pkgs.morph.lib}/eval-machines.nix") { inherit networkExpr; };
grids = lib.mapAttrs (n: v: (morph-eval v)) grid-configs;
# Derivation with symlinks to the morph output for each grid.
output = pkgs.runCommand "privatestorage-morph"
{ preferLocalBuild = true; allowSubstitutes = false; passthru = { inherit gridlib ourpkgs grids; }; }
''
mkdir $out
${lib.concatStringsSep "\n" (
lib.mapAttrsToList (
name: morph:
let
output = morph.machines {
# It would be nice if we didn't need to write this data to a file.
# https://github.com/DBCDK/morph/pull/186
argsFile = pkgs.writeText "args" (builtins.toJSON { Names = lib.attrNames morph.nodes; });
};
in
''
ln -s ${output} $out/${lib.escapeShellArg name}
''
) grids
)}'';
in output
......@@ -8,14 +8,18 @@ Issues with networking that looked like guest misconfigurations vanished after c
This requires `NixOS <https://nixos.org/>`_.
Nix without the OS will not work.
Use the local development environment
`````````````````````````````````````
0. Add VirtualBox to your NixOs system configuration at ``/etc/nixos/configuration.nix``::
0. Add to your NixOS system configuration at ``/etc/nixos/configuration.nix`` (and rebuild)::
virtualisation.virtualbox.host.enable = true;
# Save bytes and build time, optional but recommended:
virtualisation.virtualbox.host.headless = true;
# Enable libvirt - likely incompatible with virtualisation.virtualbox!
virtualisation.libvirtd.enable = true;
# Required for LibVirt
security.polkit.enable = true;
# Enable HW acceleration if (nested virtualisation is) available
#boot.kernelModules = [ "kvm-amd" "kvm-intel" ];
1. Enter the morph local grid directory::
......@@ -27,19 +31,27 @@ Use the local development environment
3. Build and start the VMs::
VAGRANT_DEFAULT_PROVIDER=virtualbox vagrant up
vagrant up --provider=libvirt
Optionally, to switch from QEMU to KVM virtualization, edit the virtual machine definition of all the machines and replace the "qemu" on the first line with "kvm"::
sudo virsh list
sudo virsh edit <machine id>  # once for every machine
vagrant halt
vagrant up
4. Then, add the Vagrant SSH configuration to your user's ``~/.ssh/config`` file::
install -d ~/.ssh ; vagrant ssh-config >> ~/.ssh/config
Latest Morph honors the ``SSH_CONFIG_FILE`` environment variable (`since 3f90aa88 (March 2020, v 1.5.0) <https://github.com/DBCDK/morph/commit/3f90aa885fac1c29fce9242452fa7c0c505744ef#diff-d155ad793bd62e6ea4c44ba985049ecb13a4f4f32f799791b2bce695a16c0101>`_), so in the future this should get a bit more convenient.
6. Create a ``public-keys/users.nix`` file with your SSH key (see ``public-keys/users.nix.example`` for the format) so you'll be able to log in after deploying the new configuration::
5. Create a ``public-keys/users.nix`` file with your SSH key (see ``public-keys/users.nix.example`` for the format) so you'll be able to log in after deploying the new configuration::
$EDITOR public-keys/users.nix
7. Then, build and deploy our software to the Vagrant VMs::
6. Then, build and deploy our software to the Vagrant VMs::
morph build grid.nix
morph push grid.nix
......@@ -48,4 +60,4 @@ Use the local development environment
vagrant up
morph upload-secrets grid.nix
You should now be able to log in with the users and keys you set in your ``users.nix`` file.
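The optional QEMU-to-KVM switch in step 3 amounts to changing one attribute on the opening ``<domain>`` tag of each machine definition.
Here is a minimal Python sketch of that substitution, assuming the definition starts with ``<domain type='qemu'>`` as libvirt normally writes it.
The function name ``switch_domain_to_kvm`` is hypothetical, and ``virsh edit`` remains the way to actually apply the change.

```python
import re


def switch_domain_to_kvm(domain_xml):
    """Change type='qemu' to type='kvm' on the opening <domain> tag only.

    Other occurrences of "qemu" (emulator paths, XML namespaces) are
    left untouched.
    """
    return re.sub(
        r"""(<domain\b[^>]*\btype=)(['"])qemu\2""",
        r"\g<1>\g<2>kvm\g<2>",
        domain_xml,
        count=1,
    )


example = (
    "<domain type='qemu'>\n"
    "  <name>payments</name>\n"
    "  <emulator>/usr/bin/qemu-system-x86_64</emulator>\n"
    "</domain>\n"
)
print(switch_domain_to_kvm(example).splitlines()[0])
```

KVM only works if the host exposes hardware virtualisation, which is what the commented-out ``boot.kernelModules`` line in step 0 is for.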
# -*- mode: ruby -*-
# vi: set ft=ruby :
# This Vagrantfile worked for Florian Sesser using Vagrant 2.2.16dev and
# the VirtualBox Hypervisor. Earlier Vagrant and LibVirt did not work.
# This Vagrantfile worked for Florian Sesser using Vagrant 2.2.19 and
# the LibVirt with QEmu Hypervisor. Earlier Vagrant and VirtualBox worked too.
# Get a dedicated LibVirt pool name or use default one
pool_name = ENV.has_key?('POOL_NAME') ? ENV['POOL_NAME'] : 'default'
# For instance, one could create such pool beforehand as follows:
# export POOL_NAME=morph_local_$(id -un)
# POOL_PATH="/path/to/your/storage"
# mkdir -p "${POOL_PATH}"
# sudo virsh pool-define-as ${POOL_NAME} --type dir --target "${POOL_PATH}"
# sudo virsh pool-autostart ${POOL_NAME}
# sudo virsh pool-start ${POOL_NAME}
Vagrant.configure("2") do |config|
# For a complete reference, please see the online documentation at
# https://docs.vagrantup.com.
config.vm.define "payments.localdev" do |config|
config.vm.hostname = "payments"
config.vm.box = "esselius/nixos"
config.vm.box_version = "20.09"
config.vm.box_check_update = false
# Select the base image
config.vm.box = "esselius/nixos"
config.vm.box_version = "20.09"
config.vm.box_check_update = false
# No need to sync the working dir. with the guest boxes
# Better use SFTP to transfer
config.vm.synced_folder ".", "/vagrant", disabled: true
# Tune LibVirt/QEmu guests
config.vm.provider :libvirt do |domain|
# The default of one CPU should work
# Increase to speed up boot/push/deploy
# domain.cpus = 1
# To use the self-updating deployment system you need more memory. Giving
# all of the VMs enough memory for this is rather taxing, though, and the
# self-updating deployment system is not particularly useful for local
# dev. But should you want to:
#
# config.vm.provider "virtualbox" do |v|
# v.memory = 4096
# end
# domain.memory = 4096
#
# Meanwhile, 1024 was apparently the default with VirtualBox
domain.memory = 1024
# Using a specific pool may help to manage the disk space
domain.storage_pool_name = pool_name
domain.snapshot_pool_name = pool_name
# No need of graphics - better use serial
domain.graphics_type = "none"
domain.video_type = "none"
end
config.vm.define "payments.localdev" do |config|
config.vm.hostname = "payments"
config.vm.network "private_network", ip: "192.168.67.21"
# Assign a static IP address inside the box host-only (Vagrant
# calls it "private") network. The address must be in the range
# VirtualBox allows.
# https://www.virtualbox.org/manual/ch06.html#network_hostonly says some
# things about this.
config.vm.network "private_network", ip: "192.168.56.21"
# Add self signed SSL key for zkap-issuer:
config.vm.provision "file", source: "private-keys/payments-localdev-ssl", destination: "/tmp/payments-localdev-ssl"
config.vm.provision "shell", inline: "sudo mkdir -p /var/lib/letsencrypt/live/payments.localdev/"
......@@ -32,35 +69,30 @@ Vagrant.configure("2") do |config|
config.vm.define "storage1.localdev" do |config|
config.vm.hostname = "storage1"
config.vm.box = "esselius/nixos"
config.vm.box_version = "20.09"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.67.22"
config.vm.network "private_network", ip: "192.168.56.22"
end
config.vm.define "storage2.localdev" do |config|
config.vm.hostname = "storage2"
config.vm.box = "esselius/nixos"
config.vm.box_version = "20.09"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.67.23"
config.vm.network "private_network", ip: "192.168.56.23"
end
config.vm.define "monitoring.localdev" do |config|
config.vm.hostname = "monitoring"
config.vm.box = "esselius/nixos"
config.vm.box_version = "20.09"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.67.24"
config.vm.network "private_network", ip: "192.168.56.24"
end
# To make the VMs assign the static IPs to the network interfaces we need a rebuild:
config.vm.provision "shell", inline: "echo '{nix.trustedUsers = [ \"@wheel\" \"root\" \"vagrant\" ];}' > /etc/nixos/custom-configuration.nix"
## Rename to 'nix.settings.trusted-users' after 20.09 or so:
config.vm.provision "shell",
inline: "echo '{ nix.trustedUsers = [ \"@wheel\" \"root\" \"vagrant\" ]; boot.kernelParams = [ \"console=tty0\" \"console=ttyS0,115200\" ]; }' > /etc/nixos/custom-configuration.nix"
config.vm.provision "shell", inline: "nixos-rebuild switch"
config.vm.provision "shell", inline: "systemctl stop firewall.service"
config.vm.provision "shell", inline: "systemctl start serial-getty@ttyS0.service"
config.trigger.after :up do |trigger|
trigger.info = "Hostname and IP address this host actually uses:"
trigger.run_remote = {inline: "echo `hostname` `ifconfig | egrep -o '192.168.67.[0-9]* '`"}
trigger.run_remote = {inline: "echo `hostname` `ifconfig | egrep -o '192.168.56.[0-9]* '`"}
end
end
......@@ -2,9 +2,10 @@
, "publicStoragePort": 8898
, "publicKeyPath": "./public-keys"
, "privateKeyPath": "./private-keys"
, "monitoringvpnEndpoint": "192.168.67.24:51820"
, "monitoringvpnEndpoint": "192.168.56.24:51820"
, "passValue": 1000000
, "issuerDomains": ["payments.localdev"]
, "monitoringDomains": ["monitoring.localdev"]
, "letsEncryptAdminEmail": "florian@privatestorage.io"
, "allowedChargeOrigins": [
"http://localhost:5000"
......
let
pkgs = import <nixpkgs> { };
gridlib = import ../../lib;
grid-config = pkgs.lib.trivial.importJSON ./config.json;
grid-config = builtins.fromJSON (builtins.readFile ./config.json);
ssh-users = let
ssh-users-file = ./public-keys/users.nix;
......@@ -59,6 +57,7 @@ let
grid = {
publicKeyPath = toString ./. + "/${grid-config.publicKeyPath}";
privateKeyPath = toString ./. + "/${grid-config.privateKeyPath}";
inherit (grid-config) monitoringvpnEndpoint letsEncryptAdminEmail;
};
# Configure deployment management authorization for all systems in the grid.
services.private-storage.deployment = {
......@@ -70,72 +69,71 @@ let
payments = {
imports = [
gridlib.issuer
(gridlib.customize-issuer (grid-config // {
monitoringvpnIPv4 = "172.23.23.11";
}))
grid-module
];
config = {
grid.publicIPv4 = "192.168.67.21";
grid.monitoringvpnIPv4 = "172.23.23.11";
grid.publicIPv4 = "192.168.56.21";
grid.issuer = {
inherit (grid-config) issuerDomains allowedChargeOrigins;
};
};
};
storage1 = {
imports = [
gridlib.storage
(gridlib.customize-storage (grid-config // {
monitoringvpnIPv4 = "172.23.23.12";
stateVersion = "19.09";
}))
grid-module
];
config = {
grid.publicIPv4 = "192.168.67.22";
grid.monitoringvpnIPv4 = "172.23.23.12";
grid.publicIPv4 = "192.168.56.22";
grid.storage = {
inherit (grid-config) passValue publicStoragePort;
};
system.stateVersion = "19.09";
};
};
storage2 = {
imports = [
gridlib.storage
(gridlib.customize-storage (grid-config // {
monitoringvpnIPv4 = "172.23.23.13";
stateVersion = "19.09";
}))
grid-module
];
config = {
grid.publicIPv4 = "192.168.67.23";
grid.monitoringvpnIPv4 = "172.23.23.13";
grid.publicIPv4 = "192.168.56.23";
grid.storage = {
inherit (grid-config) passValue publicStoragePort;
};
system.stateVersion = "19.09";
};
};
monitoring = {
imports = [
gridlib.monitoring
(gridlib.customize-monitoring {
inherit hostsMap vpnClientIPs nodeExporterTargets paymentExporterTargets;
inherit (grid-config) letsEncryptAdminEmail;
googleOAuthClientID = grid-config.monitoringGoogleOAuthClientID;
enableSlackAlert = false;
monitoringvpnIPv4 = "172.23.23.1";
stateVersion = "19.09";
})
grid-module
];
config = {
grid.publicIPv4 = "192.168.67.24";
grid.monitoringvpnIPv4 = "172.23.23.1";
grid.publicIPv4 = "192.168.56.24";
grid.monitoring = {
inherit paymentExporterTargets blackboxExporterHttpsTargets;
inherit (grid-config) monitoringDomains;
googleOAuthClientID = grid-config.monitoringGoogleOAuthClientID;
enableZulipAlert = false;
};
system.stateVersion = "19.09";
};
};
# TBD: derive these automatically:
hostsMap = {
"172.23.23.1" = [ "monitoring" "monitoring.monitoringvpn" ];
"172.23.23.11" = [ "payments" "payments.monitoringvpn" ];
"172.23.23.12" = [ "storage1" "storage1.monitoringvpn" ];
"172.23.23.13" = [ "storage2" "storage2.monitoringvpn" ];
};
vpnClientIPs = [ "172.23.23.11" "172.23.23.12" "172.23.23.13" ];
nodeExporterTargets = [ "monitoring" "payments" "storage1" "storage2" ];
paymentExporterTargets = [ "payments" ];
paymentExporterTargets = [ "payments.monitoringvpn" ];
blackboxExporterHttpsTargets = [
# "https://private.storage/"
# "https://payments.private.storage/"
];
in {
network = {
......