Server-side metric for spent ZKAPs for Prometheus

Merged Jean-Paul Calderone requested to merge 141.serverside-prometheus-metrics into main

Fixes #259 (closed)

This is broken down into a histogram by number of ZKAPs spent. This correlates relatively well with actual file size. It's also simpler than trying to fit two dimensions (file size and ZKAPs spent) into one histogram. If this provides insufficient visibility we can add another metric, but let's see how it works in practice first.
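For context, the metric under review is a prometheus_client Histogram observed with the number of ZKAPs spent per successful operation. A minimal sketch of that shape follows (the standalone registry and the observed value are illustrative; the metric name and power-of-two buckets mirror the diff further down):

```python
# Minimal sketch of the spending-success metric, assuming prometheus_client.
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()

spending_successes = Histogram(
    "zkapauthorizer_server_spending_successes",
    "ZKAP Spending Successes histogram",
    registry=registry,
    # Power-of-two upper bounds: 1, 2, 4, ..., 1024, then +inf.
    buckets=list(2 ** n for n in range(11)) + [float("inf")],
)

# On a successful spend, record how many ZKAPs the operation consumed.
spending_successes.observe(17)
```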

Activity

17 17 Tahoe-LAFS.
18 18 """
19 19
20 from __future__ import absolute_import
21
20 22 import random
21 23 from datetime import datetime
22 24 from functools import partial
23 25 from weakref import WeakValueDictionary
24 26
27 try:
28     from typing import Callable
  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 218         return list(2 ** n for n in range(11)) + [float("inf")]
    219
    220     def __attrs_post_init__(self):
    221         self._metric_spending_successes = Histogram(
    222             "zkapauthorizer_server_spending_successes",
    223             "ZKAP Spending Successes histogram",
    224             registry=self._registry,
    225             buckets=self._get_buckets(),
    226         )
    227
    228     def _clear_metrics(self):
    229         """
    230         Forget all recorded metrics.
    231         """
    232         # There is also a `clear` method but it raises an AttributeError.
    233         self._metric_spending_successes._metric_init()
    • Created by: tomprince

      The docstring for clear says "Remove all labelsets from the metric", which somewhat suggests it is only intended to be used with metrics that have labels. That said, it seems reasonable to want to use it with metrics that don't have labels as well. Is there a bug upstream for this?

    • There is an upstream bug, added a link in the comment.
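A hedged, self-contained illustration of the workaround being discussed (the reset call uses a private prometheus_client API, as the diff's comment notes; the metric definition mirrors the diff, the rest is illustrative):

```python
# Sketch of resetting an unlabelled Histogram, assuming prometheus_client.
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()
spending_successes = Histogram(
    "zkapauthorizer_server_spending_successes",
    "ZKAP Spending Successes histogram",
    registry=registry,
)

spending_successes.observe(3)
# Per the review discussion, the public clear() is aimed at labelled metrics,
# so the code under review falls back to the private _metric_init() to reset.
spending_successes._metric_init()
```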

  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 236 286             allocated_size,
    237 287         )
    238 288
    289         # XXX Some shares may exist already so those passes aren't necessarily
    290         # spent...
    291         self._metric_spending_successes.observe(len(validation.valid))
  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 277 331             validation,
    278 332             self._original,
    279 333         )
    334         self._metric_spending_successes.observe(len(validation.valid))
    • Created by: tomprince

      Similarly, I think this should be between the call to self._original.remote_add_lease and the return.
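A hedged sketch of that ordering, with stand-in names rather than the MR's actual wrapper, just to make the suggestion concrete: delegate first, observe the metric once the wrapped call has succeeded, then return its result.

```python
# Illustrative only: record spending after the wrapped operation succeeds.
from prometheus_client import CollectorRegistry, Histogram


class SpendingMetricsWrapper(object):
    """Hypothetical wrapper sketch; attribute names follow the diff above."""

    def __init__(self, original, registry):
        self._original = original
        self._metric_spending_successes = Histogram(
            "zkapauthorizer_server_spending_successes",
            "ZKAP Spending Successes histogram",
            registry=registry,
        )

    def remote_add_lease(self, passes, storage_index, renew_secret, cancel_secret):
        # In the real code the passes are validated first; here we just count them.
        valid_pass_count = len(passes)
        # Delegate first; if this raises, nothing is counted as spent.
        result = self._original.remote_add_lease(
            storage_index, renew_secret, cancel_secret
        )
        # Observe between the delegated call and the return, as suggested.
        self._metric_spending_successes.observe(valid_pass_count)
        return result
```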

  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 393 449             renew_leases=False,
    394 450         )
    395 451
    452         self._metric_spending_successes.observe(required_new_passes)
  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 455 514
    456 :return: ``None`` if the given number of passes is sufficient.
    515 :return: A mapping from share number to share size on the server if the
    516 number of passes given is sufficient.
    457 517 """
    458 518 allocated_sizes = dict(
    459 519 get_share_sizes(
    460 520 storage_server,
    461 521 storage_index,
    462 522 list(get_all_share_numbers(storage_server, storage_index)),
    463 523 ),
    464 ).values()
    465 check_pass_quantity(pass_value, validation, allocated_sizes)
    524 )
    525 check_pass_quantity(pass_value, validation, allocated_sizes.values())
    526 return allocated_sizes
  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 139 139 )
    140 140
    141 141
    142 def read_count(storage_server):
    143 buckets = storage_server._metric_spending_successes._buckets
    144 return sum(b.get() for b in buckets)
    145
    146
    147 def read_bucket(storage_server, size):
    148 bounds = storage_server._get_buckets()
    149 for bucket_number, upper_bound in enumerate(bounds):
    150 if size <= upper_bound:
    151 break
    152
    153 note("bucket_number {}".format(bucket_number))
    154 buckets = storage_server._metric_spending_successes._buckets
  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 639                 allocated_size=size,
    640                 canary=LocalReferenceable(None),
    641             ),
    642         )
    643
    644         after_count = read_count(self.storage_server)
    645         after_bucket = read_bucket(self.storage_server, num_passes)
    646
    647         self.expectThat(
    648             after_count,
    649             Equals(1),
    650             "Unexpected histogram sum value",
    651         )
    652         self.assertThat(
    653             after_bucket,
    654             Equals(1),
    • Created by: tomprince

      Should we be checking the other buckets are 0?

    • They must be 0 because the sum is 1. Or prometheus_client is broken. :)

    • Created by: tomprince

      I guess that is true, but if it isn't too much work, I might be inclined to make the strong assertion anyway; I think the test would be slightly easier to reason about that way. (We could possibly even skip the sum assertion instead).

    • That's reasonable. I started to do this and realized I gave an incorrect explanation - read_count does return the sum, but it's the sum we calculate by looking at all of the buckets, not the sum prometheus_client calculates. So the other buckets must be 0 or read_count (now read_spending_success_histogram_total) is broken.
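A hedged sketch of the stronger assertion being discussed, using only prometheus_client's public collect() API on a standalone registry (the metric definition mirrors the diff; the observed value is illustrative):

```python
# Assert each cumulative bucket individually rather than summing them.
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()
spending_successes = Histogram(
    "zkapauthorizer_server_spending_successes",
    "ZKAP Spending Successes histogram",
    registry=registry,
    buckets=list(2 ** n for n in range(11)) + [float("inf")],
)
spending_successes.observe(5)

# Buckets are cumulative: every bucket whose upper bound is >= 5 should
# report exactly 1 observation and every smaller bucket should report 0.
for metric in registry.collect():
    for sample in metric.samples:
        if sample.name.endswith("_bucket"):
            expected = 1.0 if float(sample.labels["le"]) >= 5 else 0.0
            assert sample.value == expected, sample
```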

  • Administrator
    Administrator @root started a thread on commit 9b0dd25b
  • 524 547             actual_sizes,
    525 548             Equals(expected_sizes),
    526 549         )
    550
    551     @given(
    552         storage_index=storage_indexes(),
    553         secrets=tuples(
    554             write_enabler_secrets(),
    555             lease_renew_secrets(),
    556             lease_cancel_secrets(),
    557         ),
    558         test_and_write_vectors_for_shares=slot_test_and_write_vectors_for_shares(),
    559     )
    560     def test_mutable_spending_metrics(
    560 def test_mutable_spending_metrics(
    • Created by: tomprince

      I would be strongly inclined to fold these tests into the existing happy path tests which live in test_storage_protocol, rather than having multiple tests that exercise these code paths in the same way.

    • I agree the factoring here is not great. I am somewhat inclined to leave it as-is for now, at least until the other metrics code we want to write is written and maybe until the spending service changes land, and then do a refactoring pass.

  • Created by: tomprince

    Review: Approved

    This looks good, with a couple of changes (two minor and one medium):

    • We are working around some prometheus-client bugs; we should link to an upstream bug for them.
    • There are a few names that should be more descriptive, given that we have Tahoe-LAFS buckets and now histogram buckets.
    • I'd like to avoid duplicating the happy-path test cases that already exist in test_storage_protocol. I think the new storage server tests here should be fairly easy to fold into the existing tests there.
  • Created by: tomprince

    (note that I left enough comments that some are hidden by default :frowning_face:)

  • Created by: tomprince

    Another option that occurred to me is that we could expose metrics via the client resource, rather than a file, though I guess the server isn't widely accessible enough to be scraped directly. (Not suggesting we change it here.)
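    For what it's worth, a minimal sketch of that alternative, assuming a prometheus_client registry and twisted.web (the resource class here is hypothetical, not part of this MR):

```python
# Hypothetical sketch: serve the collector registry over HTTP from a
# Twisted web resource instead of writing metrics to a file.
from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest
from twisted.web.resource import Resource


class MetricsResource(Resource):
    isLeaf = True

    def __init__(self, registry):
        Resource.__init__(self)
        self._registry = registry

    def render_GET(self, request):
        request.setHeader(b"content-type", CONTENT_TYPE_LATEST.encode("ascii"))
        # generate_latest renders the registry in the Prometheus text format.
        return generate_latest(self._registry)
```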

  • Jean-Paul Calderone
  • Created by: hacklschorsch

    Review: Commented

    Good stuff.

    I agree with Tom's comments about naming - we could have been more concrete.

    I'm available for another pairing session on this if you like @exarkun !

  • Great. I hope the last rev I pushed passes CI and I think we should merge if so. We still have the negative case metrics to implement (invalid signatures and whatever) in a follow-up PR.
