Architecture - End-to-end Checksumming

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Requirements

 * Implement Lustre network checksumming for data, with integrity checking at both the client and the server end.


 * Integrate the Lustre network checksumming with the DMU on-disk checksum mechanism. The plan is to use a Tiger Tree Hash algorithm, which allows individual pages to be checksummed on the client independently of the blocksize of the object.
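
The property that makes a tree hash attractive here can be sketched as follows. This is a minimal illustration, not the Lustre or DMU implementation: Tiger is not available in Python's standard library, so SHA-256 is swapped in as the node hash, and the function names (`leaf_hashes`, `combine`, `tree_hash`) are hypothetical. The 0x00/0x01 prefixes and the promotion of an unpaired node follow the usual Tree Hash EXchange (THEX) construction. The sketch assumes page-aligned data and a page size that is a power-of-two multiple of the leaf size.

```python
import hashlib

LEAF_SIZE = 1024  # assumed leaf size, matching the 1kB minimum blocksize discussed on this page


def _h(data):
    # Stand-in node hash: SHA-256 is used here only because Tiger is not in hashlib.
    return hashlib.sha256(data).digest()


def leaf_hashes(data, leaf_size=LEAF_SIZE):
    """Hash each leaf-sized chunk with a 0x00 prefix (THEX-style leaf hash)."""
    if not data:
        return [_h(b"\x00")]
    return [_h(b"\x00" + data[i:i + leaf_size])
            for i in range(0, len(data), leaf_size)]


def combine(nodes):
    """Reduce a list of node hashes to one root, pairing left to right.

    Internal nodes are hashed with a 0x01 prefix; an unpaired node is
    promoted to the next level unchanged.
    """
    nodes = list(nodes)
    while len(nodes) > 1:
        nxt = []
        for i in range(0, len(nodes) - 1, 2):
            nxt.append(_h(b"\x01" + nodes[i] + nodes[i + 1]))
        if len(nodes) % 2:
            nxt.append(nodes[-1])
        nodes = nxt
    return nodes[0]


def tree_hash(data, leaf_size=LEAF_SIZE):
    """Root hash over the whole buffer."""
    return combine(leaf_hashes(data, leaf_size))
```

Under these assumptions, a client can hash each 4kB page on its own with `tree_hash(page)` and the resulting page roots can later be merged with `combine(page_roots)` to obtain the same root as `tree_hash` over the whole object, regardless of how the object is split into blocks on disk.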

The Lustre part
Two pieces of functionality are required:
 * 1) new checksum algorithm (easily done because the protocol already has compatibility support for this)
 * 2) probably more space to store a larger checksum value (only 32 bits today), plus a mechanism for reads to force a buffer out of cache if the client detects a wrong checksum and the client-computed checksum matches that on the server

DMU part

 * 1) new checksum algorithm, with an interface for the caller to attach a checksum (possibly an array of checksums) to buffers for writes;
 * 2) DMU uses the checksum(s) on write buffers instead of computing its own checksum;
 * 3) DMU attaches checksum(s) to buffers on reads after verify/retry/rebuild;
 * 4) interface for the caller to extract the checksum from buffers (newly read or from cache) to send to the client;
 * 5) mechanism (if it doesn't already exist) to purge a buffer from cache if it is found to contain the wrong checksum.
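
As a rough illustration of how the five items above fit together, the toy model below keeps checksums attached to buffers across write, read, and cache purge. It is only a sketch: `Buffer`, `ToyStore`, and all method names are hypothetical and do not correspond to real DMU interfaces, SHA-256 stands in for the planned checksum algorithm, and the verify/retry/rebuild step is reduced to a single comparison.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data):
    # Stand-in checksum; the page above plans a Tiger Tree Hash instead.
    return hashlib.sha256(data).digest()


@dataclass
class Buffer:
    data: bytes
    checksums: list = field(default_factory=list)  # items 1 and 4: attached checksum(s)


class ToyStore:
    """Hypothetical in-memory stand-in for the DMU, for illustration only."""

    def __init__(self):
        self.disk = {}   # block id -> (data, checksum)
        self.cache = {}  # block id -> Buffer

    def write(self, blkid, buf):
        # Item 2: use the caller-supplied checksum rather than computing our own.
        csum = buf.checksums[0] if buf.checksums else checksum(buf.data)
        self.disk[blkid] = (buf.data, csum)
        self.cache[blkid] = Buffer(buf.data, [csum])

    def read(self, blkid):
        # Cached buffers are returned with the checksum that was verified earlier.
        if blkid in self.cache:
            return self.cache[blkid]
        data, csum = self.disk[blkid]
        if checksum(data) != csum:
            raise IOError("on-disk checksum mismatch")
        # Item 3: attach the verified checksum to the buffer on read.
        buf = Buffer(data, [csum])
        self.cache[blkid] = buf
        return buf

    def purge(self, blkid):
        # Item 5: drop a buffer that was found to carry the wrong checksum.
        self.cache.pop(blkid, None)
```

A caller would read `buf.checksums` to forward the checksum to the client (item 4) and call `purge` before retrying when the client reports a mismatch.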

Note: It would be reasonable to use a 1kB minimum blocksize (up to 4kB would be acceptable if this significantly simplifies implementation or improves performance). These per-block checksums would be aggregated to a single value carried in the RPC over the wire; the receiving side would then recompute the per-block checksums for the DMU and aggregate those to verify against the RPC checksum. The DMU buffers will pass the checksums down to the disk on writes, and will also keep the verified on-disk checksums on the buffers in memory so that reads from cache can be verified by the client.
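
The aggregate-then-verify step described in this note might look roughly like the sketch below. For brevity it aggregates by hashing the concatenated per-block checksums rather than building a hash tree, and it uses SHA-256 as a stand-in; both choices, along with the function names, are assumptions made for illustration and are not taken from the design.

```python
import hashlib

BLOCK_SIZE = 1024  # the 1kB minimum blocksize suggested in the note


def block_checksums(data, block_size=BLOCK_SIZE):
    """Per-block checksums, one per block_size chunk of the payload."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def aggregate(checksums):
    """Fold the per-block checksums into the single value carried in the RPC."""
    return hashlib.sha256(b"".join(checksums)).digest()


def client_prepare_write(data):
    """Sender side: return the payload and its aggregated RPC checksum."""
    return data, aggregate(block_checksums(data))


def server_verify_write(data, rpc_checksum):
    """Receiver side: recompute per-block checksums and check the aggregate.

    On success the per-block checksums are returned so they can be handed
    to the storage layer together with the data.
    """
    sums = block_checksums(data)
    if aggregate(sums) != rpc_checksum:
        raise ValueError("RPC checksum mismatch")
    return sums
```

Because the receiver recomputes the per-block values anyway in order to check the aggregate, it ends up holding exactly the checksums it needs to pass down with each block.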

Use Cases (TODO)
1. Write data: (a) calculate checksum; (b) send RPC with checksum; (c) receive RPC and check integrity; (d) calculate checksums for ZFS blocks; (e) pass data blocks along with checksums to ZFS.

2. Read data: (a) read data from ZFS, getting the checksum too; (b) check data integrity; (c) calculate checksums for Lustre data blocks if needed and check that they match; (d) send reply with data and checksum; (e) receive data with checksum and check integrity.
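
The read use case, combined with the requirement that a client be able to force a bad buffer out of the server cache, could be sketched as below. Everything here is hypothetical: `ToyServer`, `client_read`, and the `bypass_cache` flag are illustrative names rather than Lustre interfaces, SHA-256 stands in for the real checksum, and the on-disk copy is simply assumed to be correct.

```python
import hashlib


def checksum(data):
    # Stand-in checksum for illustration.
    return hashlib.sha256(data).digest()


class ToyServer:
    """Hypothetical server holding objects on 'disk' plus a read cache."""

    def __init__(self, disk):
        self.disk = dict(disk)  # object id -> bytes (assumed verified on disk)
        self.cache = {}         # object id -> (data, checksum)

    def read(self, oid, bypass_cache=False):
        # Steps (a)-(d): fetch data and checksum, then return both in the reply.
        if bypass_cache:
            self.cache.pop(oid, None)  # force the suspect buffer out of cache
        if oid not in self.cache:
            data = self.disk[oid]
            self.cache[oid] = (data, checksum(data))
        return self.cache[oid]


def client_read(server, oid, max_retries=1):
    """Step (e): verify the reply; on mismatch, retry while bypassing the cache."""
    bypass = False
    for _ in range(max_retries + 1):
        data, csum = server.read(oid, bypass_cache=bypass)
        if checksum(data) == csum:
            return data
        bypass = True
    raise IOError("checksum mismatch persists after retry")
```

In this model a buffer corrupted in server memory is caught by the client because the reply still carries the checksum verified at read time, and the retry re-reads the object from disk.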