Architecture - Wide Striping

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

There are several use cases in which Lustre needs to stripe a file across an exceptionally large number of objects:


 * 1) Major HPC installations may have many hundreds or thousands of OSTs, and we need to be able to stripe files over all of them.
 * 2) Server Network Striping (SNS) will use parity declustering, resulting in a very large number of objects making up the striped file.

Therefore, wide striping will be a commonly encountered case. The goal is to encode the striping information in a very compact way.

Definitions (see fid-hld)

 * A pool: defines an unordered set of OSTs and will be used to describe the striping in a manageable way.
 * fid seq number: part of fully specified FID, contains sequence in which object was created
 * fid number:    part of fully specified FID, contains object id within its sequence
 * object version: part of fully specified FID, contains object version number
 * FID:           fully specified object identification structure:   FID = {f-sequence, f-number, f-version}
 * FLDB:          FID Location DataBase, provides fid sequence to server (OST, MDS) mapping
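
The definitions above can be illustrated with a minimal sketch. The class names `Fid` and `Fldb`, the method names, and the server name "OST0000" are hypothetical and chosen only for illustration; they do not describe the actual Lustre data structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Fid:
    """Fully specified FID = {f-sequence, f-number, f-version}."""
    seq: int  # sequence in which the object was created
    num: int  # object id within its sequence
    ver: int  # object version number


class Fldb:
    """Toy FID Location DataBase: maps a fid sequence to a server name.

    Assumption: one server (OST or MDS) per sequence, held in memory.
    """

    def __init__(self) -> None:
        self._seq_to_server: Dict[int, str] = {}

    def assign(self, seq: int, server: str) -> None:
        self._seq_to_server[seq] = server

    def lookup(self, seq: int) -> str:
        return self._seq_to_server[seq]


fldb = Fldb()
fldb.assign(100, "OST0000")  # hypothetical sequence and server name
fid = Fid(seq=100, num=7, ver=0)
server = fldb.lookup(fid.seq)  # only the sequence is needed to find the server
```

Note that locating the server needs only the sequence part of the FID, which is what makes a compact description of many stripes possible.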

APIs required

 * 1) Get a consecutive set of fid sequence numbers from the FLDB
 * 2) define an on-disk EA that contains a pool name and other RAID striping parameters, for use as a default directory EA
 * 3) define an on-disk EA that contains a RAID type, RAID parameters, a starting fid sequence number, a count of objects over which the file may be striped, a sequence skip count, a single fid number used by this file in all specified sequences, the object version, and possibly the pool from which this object was allocated (for future reference)
 * 4) offsets within the file are {lov_offset, stripe_index} = fn(file_offset, raid_type, raid_parameters)
 * 5) individual objects OBJ{0, ..., num_obj - 1} in the file can be located:
   * OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)
   * OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip, fid_number, obj_version}
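
The per-file EA and the lookup formulas above can be sketched as follows. This is an illustration under stated assumptions, not the Lustre implementation: the names `WideStripeEA`, `obj_fid`, `ost_for`, and `raid0_map` are hypothetical, the FLDB is modelled as a plain dictionary, `stripe_size` stands in for the unspecified RAID parameters, and the offset mapping assumes simple round-robin RAID0 with `lov_offset` read as the offset within the stripe object.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Fid:
    seq: int
    num: int
    ver: int


@dataclass(frozen=True)
class WideStripeEA:
    """Compact layout: constant size regardless of num_obj."""
    raid_type: str
    stripe_size: int  # assumed RAID parameter (bytes per stripe unit)
    seq_start: int    # starting fid sequence number
    num_obj: int      # count of objects the file is striped over
    seq_skip: int     # sequence skip count
    fid_number: int   # single fid number used in every sequence
    obj_version: int
    pool: str = ""    # optional, for future reference


def obj_fid(ea: WideStripeEA, stripe_idx: int) -> Fid:
    """OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip, fid_number, obj_version}."""
    if not 0 <= stripe_idx < ea.num_obj:
        raise IndexError("stripe index out of range")
    return Fid(ea.seq_start + stripe_idx * ea.seq_skip, ea.fid_number, ea.obj_version)


def ost_for(ea: WideStripeEA, fldb: Dict[int, str], stripe_idx: int) -> str:
    """OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)."""
    return fldb[obj_fid(ea, stripe_idx).seq]


def raid0_map(ea: WideStripeEA, file_offset: int) -> Tuple[int, int]:
    """Return (lov_offset, stripe_index) assuming round-robin RAID0."""
    unit = file_offset // ea.stripe_size  # which stripe unit of the file
    stripe_index = unit % ea.num_obj
    lov_offset = (unit // ea.num_obj) * ea.stripe_size + file_offset % ea.stripe_size
    return lov_offset, stripe_index


# Hypothetical example: 4 objects, 1 MiB stripe units, every second sequence.
ea = WideStripeEA("raid0", 1 << 20, seq_start=1000, num_obj=4,
                  seq_skip=2, fid_number=42, obj_version=0)
fldb = {1000 + 2 * i: f"OST{i:04x}" for i in range(4)}
lov_offset, stripe_index = raid0_map(ea, 5 * (1 << 20) + 10)
```

The point of the design is visible in `WideStripeEA`: the EA stores a fixed handful of integers rather than one entry per object, so its size does not grow with the stripe count.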