If a third party backs up the ZKAPAuthorizer database and later restores it as part of an overall failure recovery process, some of the ZKAPs in the database may already have been spent. If they are left in place, ZKAPAuthorizer will slowly churn through them, eventually reach some unspent ZKAPs, and everything will proceed as normal.
Depending on various factors (not least of which is how stale the backup is), it seems possible this churn phase could take a long, long time (a day seems likely in some plausible scenarios). There is also no way to indicate progress towards completion during this churn, since the one thing ZKAPAuthorizer specifically does not know is how many of the ZKAPs have already been spent.
Beyond ZKAPs, there is other significant state in the database that changes from time to time. There are vouchers in various states and there are records of ZKAPs that couldn't be spent for some reason (along with a record of that reason). All of this stuff should, ideally, be recorded in a backup so it is available after recovery.
https://sqlite.org/sessionintro.html might be a better basis for an incremental backup system. It doesn't rely on any domain knowledge of the database schema and it can more easily be applied to all persistent state instead of primarily focusing on the unblinded-tokens table.
However, it's significantly more complex. The Python bindings to SQLite3 seem to lack support for using it. If they supported it, ZKAPAuthorizer would have to adjust its SQLite3 usage to open a session around all database interactions. ZKAPAuthorizer would also need to be sure to use SQLite3 only in the ways that allow changes to be captured by the session mechanism (the limitations are mild but critical to respect). Then ZKAPAuthorizer would have to expose the changesets (or patchsets? patchsets are more compact but less robust against divergence ... of which we should have none...). Then a third party needs to poll that data source and push it to the backup site. Significantly, care needs to be taken not to lose a changeset as the Tahoe node is shutting down (orderly or disorderly!). Then, over the course of multiple runs of the Tahoe node, multiple changesets will accumulate. All of these need to be persisted along with the backup.
For recovery, the original backup file as well as all of the changesets need to be retrieved from the backup site and then the changesets applied to the database in the correct order.
At the end of all this, the recovered database should be identical to a recent version of the lost database. "Recent" is determined only by how fresh the last backed up changeset is.
A somewhat similar but simpler approach would be to use the database in WAL mode and take control of WAL checkpointing. The APIs controlling this are all available to Python already (they're mostly pragmas). Details can be found at https://sqlite.org/wal.html#activating_and_configuring_wal_mode
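For reference, those knobs can all be exercised from the standard library sqlite3 module. A minimal sketch (the database path is illustrative):

```python
import sqlite3

# isolation_level=None gives autocommit behaviour; journal_mode cannot be
# changed from inside a transaction.
conn = sqlite3.connect("privatestorage.sqlite3", isolation_level=None)

# Switch to WAL mode.  This setting is persistent in the database file.
assert conn.execute("PRAGMA journal_mode=WAL").fetchone()[0] == "wal"

# Disable automatic checkpointing so the WAL only shrinks when we say so.
conn.execute("PRAGMA wal_autocheckpoint=0")

# ... normal database usage; all changes accumulate in the -wal file ...

# After the WAL has been safely backed up, fold it into the main database
# file and truncate it.
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```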
This still requires somewhat careful coordination by ZKAPAuthorizer and a third-party like GridSync. The general workflow would go something like this:
ZKAPAuthorizer always uses the database in WAL mode with autocheckpoint disabled
GridSync tells ZKAPAuthorizer it wants to back up the database
ZKAPAuthorizer stops writing to the database
GridSync uploads a copy of the database to the grid
GridSync tells ZKAPAuthorizer it can resume
From time to time, GridSync tells ZKAPAuthorizer it wants to update the backup.
ZKAPAuthorizer stops writing to the database
GridSync uploads the WAL to the grid
GridSync tells ZKAPAuthorizer it is done
ZKAPAuthorizer runs a checkpoint
ZKAPAuthorizer resumes normal operation
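To make the incremental step concrete, here is a minimal sketch that collapses GridSync's part and ZKAPAuthorizer's part into one function (the upload callable is a stand-in for however the file actually gets to the grid):

```python
import sqlite3

DB_PATH = "privatestorage.sqlite3"   # illustrative
WAL_PATH = DB_PATH + "-wal"          # SQLite always uses this name

def update_backup(conn, upload_to_grid):
    """
    One incremental backup cycle.  Writes via ``conn`` must be suspended
    while this runs; ``upload_to_grid`` is a hypothetical callable.
    """
    # Ship the accumulated WAL to the backup site ...
    upload_to_grid(WAL_PATH)
    # ... and only once it is safely stored, checkpoint so the next cycle
    # starts from an empty WAL.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```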
This process results in a database file and a number of WAL files stored on the grid. The recovery workflow is like this:
GridSync downloads the database file and all of the WAL files
GridSync puts the database in place
While there are still WAL files, GridSync puts the oldest WAL file next to the database and tells SQLite3 to do a checkpoint
At the end of this process, the database has been brought up to the state as of the last WAL file (ie, it is as up-to-date as the last backup)
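A sketch of that recovery loop, assuming GridSync has already downloaded the database copy and the WAL files (all paths illustrative, WAL files given in upload order):

```python
import shutil
import sqlite3

def recover(db_copy, wal_copies, db_path):
    # Start from the last full copy of the database.
    shutil.copy(db_copy, db_path)
    for wal in wal_copies:   # oldest first
        # Put the WAL where SQLite expects to find it ...
        shutil.copy(wal, db_path + "-wal")
        # ... then open the database and checkpoint, which replays the
        # WAL's frames into the main database file.
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        conn.close()
```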
As with the above session-based solution, this involves no knowledge of the database schema and captures all database state.
It is somewhat safer because it does not involve ensuring a changeset is extracted from the session before ZKAPAuthorizer shuts down (SQLite3 takes care of always writing all changes to the WAL files).
It is still somewhat complex as it involves careful and exactingly correct handling of the database file and all of the WAL files. I assume that even a slight mistake will render all WAL files from that point on unusable and so seriously degrade the quality of the backup.
There is also probably some other important complexity in deciding when too many WAL files have accumulated and starting the process over from a fresh database backup. This is an optimization but it is necessary if we want to impose a bounded storage constraint on the system as a whole (which we probably do since storage costs money).
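That decision could be quite crude; for example (threshold invented purely for illustration):

```python
import os

def should_start_fresh(wal_paths, db_path):
    # Once the accumulated WAL files take more space than a fresh copy of
    # the database would, upload a new full copy and discard the WALs.
    wal_total = sum(os.path.getsize(p) for p in wal_paths)
    return wal_total > os.path.getsize(db_path)
```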
I think this is probably strictly superior to the session-based solution.
Another possible solution, suggested by @hacklschorsch , is to use a backup tool that supports efficient binary deltas - for example, borgbackup. In this case, GridSync would make an initial copy of the database file and then periodically compute a diff against that copy, uploading just the diff.
Unfortunately borgbackup doesn't support Windows, though we have a suspicion that it is not far from being able to do so. We're not sure what kind of footprint impact distributing borgbackup with GridSync would have though.
The ticket description is pretty good for the most part but it turns out the suggested action (there and in the summary) is probably wrong. I'm editing the ticket to just talk about the problem and then I'll work up some design ideas for a solution in a doc in the repo.
A modification to any of the above plans could be to wrap up all of the complexity inside ZKAPAuthorizer itself. So from a third-party's perspective, the backup process becomes roughly:
Send a request to a ZKAPAuthorizer HTTP endpoint telling it to maintain a backup and perhaps configuring some parameters. Some possible parameters:
frequency to update the backup,
preference for optimizing for storage space or bandwidth usage
a directory cap where the backup should be placed (the other two are nice features, this one is probably mandatory)
And the recovery process becomes roughly:
Send a request to a ZKAPAuthorizer HTTP endpoint giving it a directory cap holding a backup and telling it to recover its state from that backup
This seems a lot nicer than exposing all of the internal details related to checkpointing to the third party driving the backup process. The result is probably simpler integration and an easier path to migrate to a new backup system later should that prove desirable.
Some combination of this idea with the WAL idea seems plausible - or even with a simpler idea to start with (for example, monolithic database file backups - or even just monolithic database dump backups, which I like a lot better than the idea of keeping sqlite3 files on the grid).
It looks like SQLite3 also checkpoints on connection close. This doesn't necessarily stop us from using the checkpoint-based backup scheme but it does mean we need to be quite careful not to close the database connection without backing up the latest wal file. Crashing is okay because then SQLite3 clearly won't have a chance to do a checkpoint.
With very little zkap-authorizer context, etc., I generally like the last idea (of zkap-authorizer being responsible for putting stuff in the grid, and for restoring). It's easier for zkap-authorizer to co-ordinate things like "don't write to the database now, we're backing up" than anything else.
...and there's no "chicken and egg" problem (if you're starting fresh, and have just the read-cap for the backup) because we only charge for uploads (right?).
It also has the bonus of not depending very much on third-party software (e.g. gridsync) so can be used outside that (i.e. "more flexible").
A slight tweak: the "backup" HTTP endpoint "does" the backup and then returns the read-cap of the backup. Basically the tweak is that "something else" co-ordinates the schedule and is responsible for storing the readcap of the backup. (Then you don't need other APIs like "when was the last backup?" etc for UIs to show users comforting dates, if need be). A UI is in a better position to provide features like "my human is going offline for a while shortly, do a backup now".
A slight tweak: the "backup" HTTP endpoint "does" the backup and then returns the read-cap of the backup. Basically the tweak is that "something else" co-ordinates the schedule and is responsible for storing the readcap of the backup.
Okay. So, concretely:
POST /backup-endpoint

200 OK
Content-Type: application/json

{"success": true, "recovery-readcap": "URI:..."}
This implies one of two things, I think.
(1) A new, complete backup on every invocation.
(2) Extra local state inside ZKAPAuthorizer to remember where the last backup was made so it can compute an incremental update to it.
Ideally we would be able to do incremental backups with this system since the state is dozens or even hundreds of megabytes of data. So let's rule out (1) for the moment.
(2) is fine since ZKAPAuthorizer has local state already (that's why we're here!). So to expand on (2) a bit...
The backup endpoint is idempotent
Only one backup is ever maintained
This doesn't really go along with the idea:
"something else" co-ordinates the schedule
But I think that's better anyway. ZKAPAuthorizer has deep, intimate knowledge of when it makes sense to update the backup. It knows that it just inserted some new signatures into the database - that's a great time to update the backup (and in fact, it knows exactly what update is needed, no need to mess around with SQLite3 logs or anything - just append those ZKAPs to the on-grid mutable that holds signatures).
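For illustration only, supposing the backed-up signatures live in a single mutable file reachable through Tahoe's web API (the node URL and cap below are placeholders), that "append" is really a read-modify-write of the mutable:

```python
import requests

NODE = "http://127.0.0.1:3456"   # the Tahoe client node's web API
SIGNATURES_CAP = "URI:..."       # placeholder writecap for the signatures file

def append_signatures(new_signatures):
    url = NODE + "/uri/" + SIGNATURES_CAP
    # Read the current contents of the mutable ...
    current = requests.get(url).content
    # ... add the newly obtained signatures ...
    updated = current + b"".join(sig + b"\n" for sig in new_signatures)
    # ... and write the whole thing back in place.
    requests.put(url, data=updated)
```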
So then the overall workflow looks like:
Third-party creates a ZKAPAuthorizer-enabled Tahoe-LAFS node
Third-party posts to the backup endpoint and receives the recovery readcap
Third-party persists the recovery readcap somewhere durable (eg, GridSync can link it into the rootcap so it is reachable using GridSync's recovery key mechanism)
"Normal" usage proceeds. For example
Vouchers are added. The backup is incrementally updated with these.
Random tokens are generated. The backup is incrementally updated with these.
Signatures are obtained. The backup is incrementally updated with these.
Tokens/signatures are spent. The backup is incrementally updated to track this (this might be similar to the current system, where there is a kind of "cursor" that GridSync advances when spending happens so that spent tokens don't have to be removed from the backup but it is still easy to jump over spent tokens after recovery - but since ZKAPAuthorizer is now in charge of this part, we have more flexibility to try other schemes that depend on more internal knowledge without worrying about leaking that knowledge through a public interface).
The system crashes and all local state is lost.
Third-party creates a new ZKAPAuthorizer-enabled Tahoe-LAFS node
Third-party posts the recovery readcap to ZKAPAuthorizer
ZKAPAuthorizer downloads the backup and updates itself to match that state. Conveniently, the recovery readcap remains valid; the third-party does not need to update it, ZKAPAuthorizer can keep updating it with subsequent changes.
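Roughly, from the third party's side (the endpoint paths and the persistence helper below are placeholders, not a settled API):

```python
import requests

NODE = "http://127.0.0.1:3456"   # the ZKAPAuthorizer-enabled Tahoe node

# Initial setup: ask ZKAPAuthorizer to start maintaining a backup and hold
# on to the readcap it hands back (eg, link it into GridSync's rootcap).
response = requests.post(NODE + "/backup-endpoint")
recovery_readcap = response.json()["recovery-readcap"]
persist_somewhere_durable(recovery_readcap)   # hypothetical helper

# ... normal usage; ZKAPAuthorizer keeps the backup current on its own ...

# After total loss of local state, point a fresh node at the backup.
requests.post(
    NODE + "/recovery-endpoint",
    json={"recovery-readcap": recovery_readcap},
)
```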
There probably also needs to be some code to handle the case where ZKAPAuthorizer crashes between the local state being updated and the remote backup being updated. Some options:
Resync local database and grid state on every startup with some kind of exhaustive inspection logic.
Have a positive indicator of clean shutdown (eg write a marker file at the end of orderly shutdown, notice and delete it at startup). Do the above expensive re-sync only when it's not known that the last shutdown was clean.
Have an application-level write-ahead log that gets flushed to the grid independently of local SQLite3's state (like, a file that we append lines like "add voucher xyz" and "spend token abc" to).
Implement two-phase commit between SQLite3 and the grid backup (which basically blocks local state changes on grid updates) (noo don't do this).
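The clean-shutdown marker option is cheap to sketch (the filename and the resync helper are illustrative):

```python
import os

MARKER = "clean-shutdown"   # eg, a file in the node's private directory

def startup():
    if os.path.exists(MARKER):
        # The last run shut down in an orderly way; trust the backup.
        os.remove(MARKER)
    else:
        # Crash (or first run): do the expensive comparison of local
        # database state against the on-grid backup.
        resynchronize_backup()   # hypothetical

def shutdown():
    # Only reached on orderly shutdown, after the final backup update.
    open(MARKER, "w").close()
```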
Might (also) be a good idea to have some kind of "/wait-for-backup-to-be-current" endpoint (possibly as a future enhancement). Thinking here of the "nice" shutdown use-case, where the third party wants to be sure stuff is sync'd before killing Tahoe. Perhaps this also covers whatever the use-case is for "crash between local state and backup": there's always going to be a window where the backup is non-current (unless you do the "two-phase commit" thing) so the third party needs some way to communicate that to its human.
Maybe this is more of a "tahoe feature" though .. basically a "shutdown nicely now, take as long as you need" command/endpoint to finish active operations etc .. which could include giving plugins like ZKAPAuthorizer a chance to do stuff. Currently, a third-party could accomplish this for pure Tahoe operations by examining the "active operations" JSON if they're the only one using the client .. but now the plugin will also be using the client for grid operations, so that wouldn't be true in this backup scenario.
In user-story language: "as a third-party application running a zkap-enabled tahoe client I wish to know whether the on-grid backup is current so correct status can be shown to my human".
Thinking here of the "nice" shutdown use-case, where the third party wants to be sure stuff is sync'd before killing Tahoe.
Maybe this should just be the normal shutdown behavior? ie, the SIGINT / reactor.stop() behavior on POSIX and whatever the equivalent is on Windows - start shutting down, but don't finish until the backup is sync'd (the implementation mechanism here would be to return a Deferred from some IService.stopService that doesn't fire until whatever we wanted to do is done).
I'm not sure how tricky it would be to integrate this with the rest of Tahoe's services ... you'd want to make sure the node stops accepting new work so that it is guaranteed the backup sync will eventually catch up.
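A sketch of what that might look like with Twisted's service machinery (the flush method is hypothetical):

```python
from twisted.application.service import Service

class BackupService(Service):
    def stopService(self):
        super().stopService()
        # Returning a Deferred here makes the node's shutdown wait until
        # the last backup update has reached the grid.
        return self._flush_pending_backup_updates()   # hypothetical; fires when the backup is current
```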
Maybe this should just be the normal shutdown behavior? ie, the SIGINT / reactor.stop()
Yeah, that could be the way to invoke the API / behavior .. but as you noted, it might be a little tricky to get right (i.e. shutdown order might become very important).
The simplest thing right now is probably an extra endpoint for just this plugin (the very same code could be called when/if Tahoe grows a "services, finish your stuff" feature).
you'd want to make sure the node stops accepting new work so that it is guaranteed the backup sync will eventually catch up.
The third-party should be able to ensure this .. that is, they'll know if they're considering killing Tahoe so they can stop producing new stuff (e.g. kill magic-folders, not call any more WebUI endpoints, etc) then tell ZKAPAuthorizer to ensure the backup is current, then kill tahoe. So I guess another point for "simpler to do this outside Tahoe right now". (Put differently: if this were in Tahoe, it would have to turn off just the WebUI but still accept new work "internally" -- i.e. from the plugin(s) -- then let all that stuff finish, then shut down).
Related, you mentioned "Tokens/signatures are spent" as a point to make a backup .. so I guess care needs to be taken here to not induce an infinite loop, essentially. (That is, you'll spend tokens to upload the new bit of backup data which might cause the plugin to decide to do another incremental backup because token(s) were spent ... or I guess put another way, there will always be more spent tokens locally than in the on-grid backup).
Maybe this is what the "cursor" stuff is about though ...
The backup system should definitely be careful about the effects of spending during backup, yea. This is helped by the fact that a mutable can be rewritten at no cost as long as it doesn't grow past the next quantized size (so a 500,000 byte mutable can grow to <1,000,000 bytes without spending any more ZKAPs). Probably most backup updates will end up being free, with only occasional spending when those thresholds are crossed (it will be a bit easier to reason about this after a schema for the backup is determined, of course).
https://litestream.io/ is essentially exactly an implementation of the WAL-based idea I described above. It looks like they have a more robust implementation technique than I described which does not require direct cooperation from application code using the SQLite3 database. This seems like good evidence that the basic approach is reasonable. The project seems to be around 10kloc of Go. This includes extra functionality we don't need but it still suggests it's not a trivial undertaking. It's also implemented as a stand-alone process so that's an extra piece. If we wanted to maintain the external interface discussed above this means ZKAPAuthorizer would need to start managing a child process. This might be feasible but it's not my favorite idea.
Oh, litestream has a feature where you run it and it runs your application, which largely removes the extra process-management complexity - though this changes the interface between third parties and ZKAPAuthorizer-enabled Tahoe-LAFS (they need to run litestream instead of Tahoe ...) which is kind of weird.
Presumably litestream doesn't support streaming to Tahoe-LAFS right? So where would these backups be going to? (Or is there some plugin type of situation?)
Not explicitly, but it does support streaming to an SFTP server - so in practice, maybe it works (but Tahoe's SFTP server is a bit quirky so it's probably worth experimentally verifying this rather than assuming it will work).
BTW, how big is the database?
250GB worth of ZKAP material results in a database of around 150MB.