- Sep 12, 2022
Jean-Paul Calderone authored
Otherwise it can't validate TLS certificates when it is trying to download dependencies.
-
Jean-Paul Calderone authored
Put nix in the shell environment. See merge request !353
-
Jean-Paul Calderone authored
-
Jean-Paul Calderone authored
Previously we got nix from the host env without even realizing it. As soon as the host's nix was upgraded, things broke.
-
Jean-Paul Calderone authored
We don't want to inherit nix from the host environment because who knows if it is compatible with our software or not.
-
Jean-Paul Calderone authored
-
- Sep 08, 2022
Jean-Paul Calderone authored
Monitoring: Fix backup duration alert. See merge request !349
-
Florian Sesser authored
This should implement my actual intentions: alert when backups run for longer than 3h, and when the repo check runs for longer than 6h.
-
Jean-Paul Calderone authored
Add dashboards for Tahoe Incident Report count and rate. Closes privatestorageops#288. See merge request !347
-
Jean-Paul Calderone authored
Publish number of Tahoe Incident Reports. Closes privatestorageops#287. See merge request !346
-
Florian Sesser authored
-
Florian Sesser authored
Fixes privatestorageops#287
-
Jean-Paul Calderone authored
Also alert when hosts that run ZFS run out of RAM. See merge request !345
-
Jean-Paul Calderone authored
Prometheus should keep metrics as long as Loki keeps logs. See merge request !344
-
Florian Sesser authored
Forgot to add a second alert when I added the workaround that excludes the ZFS ARC from used memory :/
-
Florian Sesser authored
... which are both governed by our retention policy.
-
- Sep 07, 2022
Florian Sesser authored
Monitoring: Count Tahoe's corruption advisories. Closes privatestorageops#287. See merge request !341
-
Florian Sesser authored
Monitoring: Add Tahoe-LAFS corruption advisory count + rate + alert on rate > 0. Closes privatestorageops#288. See merge request !342
-
Florian Sesser authored
-
Jean-Paul Calderone authored
Monitoring: Fix resources dashboard. See merge request !343
-
- Sep 06, 2022
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 05, 2022
Florian Sesser authored
-
Florian Sesser authored
-
- Sep 02, 2022
Jean-Paul Calderone authored
Monitoring: Backup duration and backup set size. Closes privatestorageops#429. See merge request !339
-
- Aug 31, 2022
Florian Sesser authored
This is still a bit buggy in our version of Grafana, but already nice to look at and maybe useful. Refs privatestorageops#429
-
Florian Sesser authored
This adds alerting to the backup job duration graph: Grafana alerting works with systemd unit metrics, i.e. a backup job unit being "active" for too long. Use that fact for alerting on long-running backup jobs.
-
Florian Sesser authored
... instead of the default connected lines, also with a working label for the host. Refs privatestorageops#429
-
Florian Sesser authored
Refs privatestorageops#429
-
- Aug 29, 2022
Florian Sesser authored
Backup: Borg: All logs as JSON please. See merge request !338
-
Florian Sesser authored
That makes working with its output in Grafana easier. Good documentation can be found at https://borgbackup.readthedocs.io/en/stable/internals/frontends.html#json-output
-
Florian Sesser authored
-
Florian Sesser authored
Backup: Add --stats to borgbackup create options. See merge request !337
-
Florian Sesser authored
-
Florian Sesser authored
We want to have this in our own monitoring.
-
Florian Sesser authored
Monitoring: Backup: Clearer coloring. See merge request !336
-
Florian Sesser authored
Failed backups now have a filled red area instead of a thin yellow line. Refs privatestorageops#429.
-
- Aug 24, 2022
Florian Sesser authored
Disregard ZFS ARC cache when monitoring free RAM. Closes #119. See merge request !331
-
Florian Sesser authored
Monitoring: Add a "Node Exporter Full" dashboard. See merge request !332
-