Architecture - Wide Striping

Revision as of 14:01, 14 January 2010

There are several use cases in which Lustre needs to stripe a file over an exceptionally large number of objects:

Major HPC installations may have many hundreds or thousands of OSTs, and we need to be able to stripe files over all of them.
Server Network Striping (SNS) will use parity declustering, resulting in a very large number of objects making up the striped file.

Therefore, wide striping will be a commonly encountered case. The goal is to encode the striping information in a very compact way.

pool: defines an unordered set of OSTs, used to describe the striping in a manageable way
fid seq number: part of a fully specified FID, identifies the sequence in which the object was created
fid number: part of a fully specified FID, contains the object id within its sequence
object version: part of a fully specified FID, contains the object version number
FID: fully specified object identification structure: FID = {f-sequence, f-number, f-version}
FLDB: FID Location DataBase, provides fid sequence to server (OST, MDS) mapping
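
The definitions above can be sketched as two small data structures. This is only an illustration, not Lustre code: the names `Fid`, `Fldb`, `add_range` and `lookup` are hypothetical, and the assumption that the FLDB stores non-overlapping ranges of sequence numbers per server is ours, not something this page states.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Fid:
    """FID = {f-sequence, f-number, f-version} (field names are illustrative)."""
    seq: int      # fid seq number: sequence in which the object was created
    number: int   # fid number: object id within its sequence
    version: int  # object version


class Fldb:
    """Toy FID Location DataBase: maps fid sequences to a server name.

    Assumption: each server owns one or more non-overlapping ranges of
    sequence numbers. Overlap is not checked in this sketch.
    """

    def __init__(self):
        self._starts = []  # sorted range starts, parallel to _ranges
        self._ranges = []  # (start, end_exclusive, server)

    def add_range(self, start, count, server):
        idx = bisect_right(self._starts, start)
        self._starts.insert(idx, start)
        self._ranges.insert(idx, (start, start + count, server))

    def lookup(self, seq):
        # Find the last range whose start is <= seq, then check its end.
        idx = bisect_right(self._starts, seq) - 1
        if idx >= 0:
            start, end, server = self._ranges[idx]
            if start <= seq < end:
                return server
        raise KeyError(f"no server registered for sequence {seq}")
```

For example, after `fldb.add_range(100, 10, "OST0000")`, the call `fldb.lookup(Fid(105, 7, 0).seq)` returns `"OST0000"`.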

Get a consecutive set of fid sequence numbers from the FLDB.
Define an on-disk EA that contains a pool name and other RAID striping parameters, for use as a default directory EA.
Define an on-disk EA that contains: a RAID type; RAID parameters; a starting fid sequence number; a count of objects over which the file may be striped; a sequence skip count; a single fid number used by this file in all specified sequences; the object version; and possibly the pool from which this object was allocated (for future reference).
Offsets within the file are computed as {lov_offset, stripe_index} = fn(file_offset, raid_type, raid_parameters).
Individual objects OBJ{0, ..., num_obj - 1} in the file can be located as follows:
- OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)
- OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip,fid_number,obj_version}
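
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Lustre implementation: the class `WideStripeLayout` and the functions `object_fid`, `object_server` and `raid0_offset` are hypothetical names, the FLDB is stood in for by any callable that maps a sequence number to a server, and the offset function covers only plain round-robin striping (RAID0) with a single `stripe_size` parameter. The page itself leaves `fn` unspecified.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class WideStripeLayout:
    """Compact layout EA: its size does not grow with the number of objects."""
    raid_type: str     # e.g. "raid0" (the only type handled in this sketch)
    stripe_size: int   # assumed RAID parameter: bytes per stripe chunk
    seq_start: int     # starting fid sequence number
    num_obj: int       # count of objects the file is striped over
    seq_skip: int      # sequence skip count between consecutive objects
    fid_number: int    # single fid number used in all sequences
    obj_version: int   # object version
    pool: str = ""     # optional pool name, kept for future reference


def object_fid(layout: WideStripeLayout, stripe_idx: int) -> Tuple[int, int, int]:
    """OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip, fid_number, obj_version}."""
    if not 0 <= stripe_idx < layout.num_obj:
        raise IndexError("stripe index out of range")
    seq = layout.seq_start + stripe_idx * layout.seq_skip
    return (seq, layout.fid_number, layout.obj_version)


def object_server(layout: WideStripeLayout, stripe_idx: int,
                  fldb_lookup: Callable[[int], str]) -> str:
    """OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)."""
    seq, _, _ = object_fid(layout, stripe_idx)
    return fldb_lookup(seq)


def raid0_offset(layout: WideStripeLayout, file_offset: int) -> Tuple[int, int]:
    """Return (lov_offset, stripe_index) for round-robin striping."""
    chunk = file_offset // layout.stripe_size   # which chunk of the file
    stripe_index = chunk % layout.num_obj       # object holding that chunk
    # Offset inside the object: full chunks already stored there, plus remainder.
    lov_offset = (chunk // layout.num_obj) * layout.stripe_size \
        + file_offset % layout.stripe_size
    return (lov_offset, stripe_index)
```

With `layout = WideStripeLayout("raid0", 1 << 20, 1000, 4, 2, 7, 0)`, `object_fid(layout, 3)` gives `(1006, 7, 0)`, and a file offset of 5 MiB + 10 bytes maps to stripe index 1 at object offset 1 MiB + 10 bytes. Note that the layout stays the same size whether `num_obj` is 4 or 4000, which is the point of the compact encoding.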

Revision history: 07:44, 8 August 2007 by Adilger (add raid parameters to EA, offset calculations); 14:01, 14 January 2010 by Docadmin (1 revision).