Commits · cca975905fbbffe0532956125a49772fe9fcea1a · PrivateStorage / PrivateStorageio

Sep 08, 2022
- Merge branch 'fix-backup-duration-alert' into 'develop' · cca97590
  Jean-Paul Calderone authored 2 years ago
  
  Monitoring: Fix backup duration alert See merge request !349
  cca97590
- Monitoring: Fix backup duration alert · 3d87801c
  Florian Sesser authored 2 years ago
  
  This should implement my actual intentions: Alert when backups run for longer than 3h, and the repo check for more than 6h.
  3d87801c
- Merge branch '288.report-tahoe-incidents' into 'develop' · f529d819
  Jean-Paul Calderone authored 2 years ago
  
  Add dashboards for Tahoe Incident Report count and rate Closes privatestorageops#288 See merge request !347
  f529d819
- Merge branch '287.publish-tahoe-incidents-count-with-prometheus' into 'develop' · d00836d6
  Jean-Paul Calderone authored 2 years ago
  
  Publish number of Tahoe Incident Reports Closes privatestorageops#287 See merge request !346
  d00836d6
- Add dashboards for Tahoe Incident Report count and rate · 06b8e92b
  Florian Sesser authored 2 years ago
  
  06b8e92b
- Also publish number of Tahoe Incident Reports · ea3fd773
  Florian Sesser authored 2 years ago
  
  Fixes privatestorageops#287
  ea3fd773
- Merge branch 'alert-on-full-ram-with-zfs-too' into 'develop' · e17e8e8f
  Jean-Paul Calderone authored 2 years ago
  
  Also alert when hosts that run ZFS run out of RAM See merge request !345
  e17e8e8f
- Merge branch 'keep-metrics-as-long-as-logs' into 'develop' · 1bb44eb3
  Jean-Paul Calderone authored 2 years ago
  
  Prometheus should keep metrics as long as Loki keeps logs See merge request !344
  1bb44eb3
- Also alert when hosts that run ZFS run out of RAM · 83faabb7
  Florian Sesser authored 2 years ago
  
  Forgot to add a second alert when I added the workaround to not count the ZFS ARC into used memory :/
  83faabb7
- Prometheus should keep metrics as long as Loki keeps logs · cfcad885
  Florian Sesser authored 2 years ago
  
  ... which are both governed by our retention policy.
  cfcad885
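
The backup duration alert fix above (3d87801c) states two limits: 3h for backups and 6h for the repo check. A minimal sketch of that rule in Python follows; it is not the Grafana alert from the repository. The job names `backup` and `repo-check` and the function `should_alert` are hypothetical, and only the two durations come from the commit message.

```python
from datetime import timedelta

# Limits from the commit message; the job names are made up for this sketch.
MAX_ACTIVE = {
    "backup": timedelta(hours=3),
    "repo-check": timedelta(hours=6),
}


def should_alert(job: str, active_for: timedelta) -> bool:
    """Return True when a job has been active for longer than its limit."""
    return active_for > MAX_ACTIVE[job]
```

Keeping one limit per job type in a single table makes it harder to apply the backup limit to the repo check by accident.
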
Sep 07, 2022
- Merge branch '287.publish-tahoe-error-rate-with-prometheus' into 'develop' · 9c99fe11
  Florian Sesser authored 2 years ago
  
  Monitoring: Count Tahoe's corruption advisories Closes privatestorageops#287 See merge request !341
  9c99fe11
- Merge branch '288.report-tahoe-errors' into 'develop' · f8bc31bf
  Florian Sesser authored 2 years ago
  
  Monitoring: Add Tahoe-LAFS corruption advisory count + rate + alert on rate > 0 Closes privatestorageops#288 See merge request !342
  f8bc31bf
- whitespace · 7c80724d
  Florian Sesser authored 2 years ago
  
  7c80724d
- Apply 1 suggestion(s) to 1 file(s) · 0117b060
  Jean-Paul Calderone authored 2 years ago and Florian Sesser committed 2 years ago
  
  0117b060
- Merge branch '124.fix-resources-dashboard' into 'develop' · 86c7424c
  Jean-Paul Calderone authored 2 years ago
  
  Monitoring: Fix resources dashboard See merge request !343
  86c7424c
Sep 06, 2022
- Monitoring: Import in the web GUI, export, import again? Refs #124 · 4a5bebe1
  Florian Sesser authored 2 years ago
  
  4a5bebe1
- Monitoring: Tahoe-LAFS dashboard: These numbers seem more sensible · b7369157
  Florian Sesser authored 2 years ago
  
  b7369157
Sep 05, 2022
- Monitoring: Add Tahoe-LAFS corruption advisory count + rate + alert on rate > 0 · f8dbc058
  Florian Sesser authored 2 years ago
  
  f8dbc058
- Monitoring: Count Tahoe's corruption advisories · cfc67572
  Florian Sesser authored 2 years ago
  
  cfc67572
Sep 02, 2022
- Merge branch '429.monitoring-backup-durations' into 'develop' · 3af22871
  Jean-Paul Calderone authored 2 years ago
  
  Monitoring: Backup duration and backup set size Closes privatestorageops#429 See merge request !339
  3af22871
Aug 31, 2022

- Monitoring: Backup: Add a backup set size dash · 9eeabfef
  Florian Sesser authored 2 years ago
  
  This is a bit buggy still in our version of Grafana, but already nice to look at / maybe useful. Refs privatestorageops#429
  9eeabfef
- Monitoring: Backup: Add two dashboards for alerting on backup duration · b484865b
  Florian Sesser authored 2 years ago
  
  This adds alerting to the backup job duration graph: Grafana alerting works with systemd unit metrics, i.e. a backup job unit being "active" for too long. Use that fact for alerting on long-running backup jobs.
  b484865b
- Monitoring: Backup: Daily backup duration dash: Barchart · 7ccd55bb
  Florian Sesser authored 2 years ago
  
  ... instead of connected lines default, also with working label for host Refs privatestorageops#429
  7ccd55bb
- Monitoring: Backup: new dash: Daily backup duration · eed84424
  Florian Sesser authored 2 years ago
  
  Refs privatestorageops#429
  eed84424
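
Commit b484865b above alerts on a backup job unit being "active" for too long. The Python sketch below shows one way such a duration could be derived from periodic unit-state samples. The sample format, a list of `(unix_timestamp, is_active)` pairs, and the function name are assumptions for illustration; the real dashboards work on systemd unit metrics inside Grafana.

```python
def longest_active_seconds(samples):
    """Return the longest uninterrupted active span, in seconds.

    `samples` is a time-sorted list of (unix_timestamp, is_active) pairs,
    a hypothetical stand-in for scraped systemd unit state.
    """
    longest = 0
    run_start = None
    for timestamp, is_active in samples:
        if is_active:
            if run_start is None:
                run_start = timestamp  # a new active run begins here
            longest = max(longest, timestamp - run_start)
        else:
            run_start = None  # the unit went inactive, so the run ends
    return longest
```

The result can then be compared against a limit to decide whether a job counts as long-running.
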

Aug 29, 2022
- Merge branch '429.monitoring-backup-stats-json' into 'develop' · 4b0ca53c
  Florian Sesser authored 2 years ago
  
  Backup: Borg: All logs as JSON please See merge request !338
  4b0ca53c
- Backup: Borg: All logs as JSON please · bd2b86a4
  Florian Sesser authored 2 years ago
  
  That makes working with its output in Grafana easier. Good documentation can be found at https://borgbackup.readthedocs.io/en/stable/internals/frontends.html#json-output
  bd2b86a4
- Backup: Borg: Stats output as JSON please · c9f0e14b
  Florian Sesser authored 2 years ago
  
  c9f0e14b
- Merge branch '429.monitoring-backup-stats' into 'develop' · 146f4602
  Florian Sesser authored 2 years ago
  
  Backup: Add --stats to borgbackup create options See merge request !337
  146f4602
- Borg backup: Hopefully more clear explanation · b6ff34dc
  Florian Sesser authored 2 years ago
  
  b6ff34dc
- Backup: Add stats to borgbackup create options · b0858c16
  Florian Sesser authored 2 years ago
  
  We want to have this in our own monitoring
  b0858c16
- Merge branch '429.monitoring-backup-colors' into 'develop' · 53145b0b
  Florian Sesser authored 2 years ago
  
  Monitoring: Backup: Clearer coloring See merge request !336
  53145b0b
- Monitoring: Backup: Clearer coloring · 01fefd8e
  Florian Sesser authored 2 years ago
  
  Failed backups now have a filled red area instead of a thin yellow line. Refs privatestorageops#429.
  01fefd8e
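
Commits bd2b86a4 and c9f0e14b above switch Borg's output to JSON so it is easier to work with downstream. As a rough illustration, the Python sketch below pulls a few numbers out of a `borg create --json` style document. The field names follow the Borg frontends documentation linked in the commit message; the sample values and the helper name `summarize_borg_create` are invented for this example.

```python
import json


def summarize_borg_create(raw: str) -> dict:
    """Extract duration and size figures from Borg's JSON create output."""
    archive = json.loads(raw)["archive"]
    stats = archive["stats"]
    return {
        "name": archive["name"],
        "duration_seconds": archive["duration"],
        "original_bytes": stats["original_size"],
        "deduplicated_bytes": stats["deduplicated_size"],
    }


# Made-up sample, trimmed to the fields used above.
sample = json.dumps({
    "archive": {
        "name": "example-archive",
        "duration": 12.5,
        "stats": {
            "original_size": 1000,
            "compressed_size": 800,
            "deduplicated_size": 200,
            "nfiles": 3,
        },
    }
})

summary = summarize_borg_create(sample)
```
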
Aug 24, 2022
- Merge branch 'monitoring-disregard-arc-memory' into 'develop' · 7186fc73
  Florian Sesser authored 2 years ago
  
  Disregard ZFS ARC cache when monitoring free RAM Closes #119 See merge request !331
  7186fc73
- Merge branch 'monitoring-add-full-node-dashboard' into 'develop' · 0c0fd5cb
  Florian Sesser authored 2 years ago
  
  Monitoring: Add a "Node Exporter Full" dashboard See merge request !332
  0c0fd5cb
Aug 17, 2022
- Monitoring: Add a "Node Exporter Full" dashboard · 158822fd
  Florian Sesser authored 2 years ago
  
  , a dashboard that "displays a lot of data about one single host". This is
  158822fd
- undo irrelevant change · 2c261a44
  Florian Sesser authored 2 years ago
  
  2c261a44
- Workaround: Consider ZoL ARC "free" memory too · 4b0e1680
  Florian Sesser authored 2 years ago
  
  One query for hosts with ZFS and one for those without.
  4b0e1680
Aug 16, 2022

- Disregard ZFS ARC cache when monitoring free RAM · cae8812a
  Florian Sesser authored 2 years ago
  
  Since ZoL frees ARC under memory pressure, let's not count it as "used" but instead as "free" memory.
  cae8812a
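
The idea in cae8812a, counting the ZFS ARC as "free" rather than "used" memory, comes down to simple arithmetic. The Python sketch below assumes the reported available memory does not already include the ARC; the function and parameter names are hypothetical and do not mirror the actual monitoring queries.

```python
def used_memory_fraction(total_bytes, available_bytes, arc_bytes=0):
    """Fraction of RAM in use, treating the ZFS ARC as reclaimable.

    Pass arc_bytes=0 for hosts without ZFS. The sum is capped at
    total_bytes in case the two inputs overlap.
    """
    reclaimable = min(available_bytes + arc_bytes, total_bytes)
    return 1 - reclaimable / total_bytes
```

For a host with 16 units of RAM, 4 available and 8 held by the ARC, this gives 0.25 instead of the 0.75 a naive calculation would report.
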

Aug 15, 2022
- Merge branch '88.minimal-profile' into 'develop' · 5eff69ab
  Jean-Paul Calderone authored 2 years ago
  
  Set minimal nixos profile for our grid machines See merge request !330
  5eff69ab
- Set minimal nixos profile for our grid machines · 8989bcc7
  Florian Sesser authored 2 years ago
  
  8989bcc7