Architecture - ZFS TinyZAP

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Definitions

 * ZAP : ZFS Attribute Processor, a hashed name=value lookup table that provides efficient and scalable attribute storage
 * MicroZAP : a form of ZAP used by ZFS that only allows the value to be a single __u64
 * TinyZAP : a compact ZAP format that allows arbitrary values to be stored
 * FatZAP : a form of ZAP used by ZFS that allows arbitrary name/value pairs but (as the name implies) consumes a lot of space

Implementation constraints
TinyZAP needs to be flexible enough to store arbitrary name/value data, including both Lustre LOV EAs and MDT directories with extended FID data. A MicroZAP cannot be used because it only allows a single __u64 value to be stored with each entry. A FatZAP is wasteful because it requires a full block just for the header and a separate block for the leaf data.
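The size mismatch behind the MicroZAP constraint can be sketched in a few lines of C. This is only an illustration: `struct mdt_dirent_value`, `MICROZAP_VALUE_BYTES` and `fits_in_microzap()` are hypothetical names invented here, `uint64_t` stands in for `__u64` so the sketch builds in userspace, and the `lu_fid` field layout (sequence, object id, version) is assumed to match the Lustre definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed Lustre FID layout: sequence, object id, version (16 bytes). */
struct lu_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Hypothetical MDT directory entry value: DMU object id plus the FID. */
struct mdt_dirent_value {
	uint64_t      mdv_objid;
	struct lu_fid mdv_fid;
};

/* A MicroZAP entry carries exactly one 64-bit value. */
#define MICROZAP_VALUE_BYTES sizeof(uint64_t)

/* Returns 1 if a value of `len` bytes fits in a single MicroZAP entry. */
static int fits_in_microzap(size_t len)
{
	assert(len > 0);
	return len <= MICROZAP_VALUE_BYTES;
}
```

Under these assumptions a bare object id fits, but an object id plus FID needs 24 bytes and does not, which is the case TinyZAP is meant to cover without the block overhead of a FatZAP.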

A preferred implementation would give the TinyZAP a structure similar to the existing zap_leaf_{phys_t,chunk}, since the leaf structure is reasonably compact, and reusing it may avoid a large amount of almost-identical code in the ZAP.

ZFS should be adapted (if necessary) to handle directories created with the TinyZAP layout, so that it can get the objid from the first __u64 and ignore the FID component of the directory entry.

The current ZAP implementation takes an object set and object number as parameters, whereas we will need to interface using a buffer that might be located in the dnode or in an external block. This might require some refactoring of the ZAP code.

TinyZAP needs to handle endian swabbing correctly, as all ZFS code does.

Questions and Issues
Should we "wrap" the FID data after the DMU object id in an MDT directory so that it is possible in the future to add other extra data in a directory without compromising compatibility? Something like:


#define ZAP_LUSTRE_FID 0x110f1d0f1d0f1d10

struct zap_dir_fid {
	__u64         zdf_magic;
	struct lu_fid zdf_fid;   /* or other data as appropriate */
};

This means we can skip (possibly unknown) extra directory info by skipping (zdf_len) bytes at a time looking for zdf_magic == ZAP_LUSTRE_FID.
#define zdf_len (zdf_magic & 0xff)
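The skipping scheme could look roughly like the sketch below. It rests on several assumptions that the text above does not settle: the entry value is treated as an array of 64-bit words with the DMU object id in word 0, each extra record is one magic word followed by a payload, the low byte of the magic gives the payload length in bytes (excluding the magic word itself), and payloads are multiples of 8 bytes. `ZDF_LEN()` and `zdf_find()` are hypothetical names, and `uint64_t` stands in for `__u64`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZAP_LUSTRE_FID 0x110f1d0f1d0f1d10ULL

/* ASSUMPTION: low byte of the magic is the payload length in bytes. */
#define ZDF_LEN(magic) ((size_t)((magic) & 0xff))

/*
 * Scan the records that follow the object id in a directory entry value.
 * Returns a pointer to the payload of the first record whose magic word
 * equals `magic`, or NULL if none is found or a record is malformed.
 */
static const uint64_t *zdf_find(const uint64_t *val, size_t nwords,
				uint64_t magic)
{
	size_t i = 1;	/* val[0] is the DMU object id */

	assert(val != NULL);
	while (i < nwords) {
		size_t len = ZDF_LEN(val[i]);
		size_t payload_words = len / sizeof(uint64_t);

		/* Reject unaligned lengths and records running past the end. */
		if (len % sizeof(uint64_t) != 0 ||
		    payload_words > nwords - i - 1)
			return NULL;
		if (val[i] == magic)
			return &val[i + 1];
		/* Unknown record: skip its magic word and its payload. */
		i += 1 + payload_words;
	}
	return NULL;
}
```

Under these assumptions, a reader that only wants the object id takes `val[0]` and stops, while a Lustre-aware reader calls `zdf_find(val, nwords, ZAP_LUSTRE_FID)` and steps over any record types it does not recognise. Note that `ZAP_LUSTRE_FID & 0xff` is 0x10, i.e. 16 bytes, which matches the size of a FID if the length is taken to exclude the magic word.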