Architecture - Free Space Management

Free Space Management
In Lustre 1.6, the MDS assigns file stripe objects to OSTs using a combination of load balancing and free-space considerations, in order to optimize both filesystem performance and OSS space utilization. Emptier OSTs are preferred for new stripes, and stripes are distributed evenly across OSSs to make better use of network bandwidth. The relative weighting between these two optimizations is user-adjustable.

There are two stripe allocation methods, and which one is used depends on how imbalanced free space is across the OSTs. When any two OSTs differ in free space by more than 20%, the weighted allocator is used; otherwise, the faster round-robin allocator is used. The round-robin order is chosen to maximize network balancing.
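A minimal Python sketch of the selection rule follows. The text does not say exactly how the 20% imbalance is measured, so this assumes it is the gap between the OSTs with the most and least free space, relative to the larger value. Because that pair is the most imbalanced pair under this measure, checking it is equivalent to checking every pair. The name `choose_allocator` is hypothetical and not part of Lustre.

```python
def choose_allocator(free_space, threshold=0.20):
    """Pick an allocator from per-OST free space (any consistent unit).

    ASSUMPTION: "imbalanced by more than 20%" is taken to mean
    (max_free - min_free) / max_free > threshold. The max/min pair is the
    most imbalanced pair, so checking it covers every pair of OSTs.
    """
    most, least = max(free_space), min(free_space)
    if most == 0:
        return "round-robin"  # every OST is full; nothing to weight by
    imbalance = (most - least) / most
    return "weighted" if imbalance > threshold else "round-robin"


print(choose_allocator([900, 850, 800]))  # → round-robin (about 11% apart)
print(choose_allocator([900, 400, 800]))  # → weighted (about 56% apart)
```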

Round-Robin Allocator
When OSTs have approximately the same free space (within 20%), an efficient round-robin allocator is used. The round-robin allocator alternates stripes between OSTs on different OSSs. Here are some example round-robin stripe orders (OSTs on the same OSS share the same letter):
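The alternation between OSSs can be illustrated with a simple interleave, sketched below in Python. This is an assumption for illustration only: the function `round_robin_order` is hypothetical, and Lustre's actual round-robin order may place stripes differently, particularly when OSSs hold unequal numbers of OSTs.

```python
from string import ascii_uppercase


def round_robin_order(osts_per_oss):
    """Interleave OSTs across OSSs, taking one OST from each OSS per pass.

    osts_per_oss[i] is the number of OSTs on OSS i; OSS i is labelled with
    the i-th capital letter, so the result shows which OSS each stripe hits.
    """
    if len(osts_per_oss) > len(ascii_uppercase):
        raise ValueError("this sketch labels at most 26 OSSs")
    remaining = list(osts_per_oss)
    order = []
    while any(remaining):
        # One pass: take a single OST from every OSS that still has one.
        for i, count in enumerate(remaining):
            if count > 0:
                order.append(ascii_uppercase[i])
                remaining[i] -= 1
    return "".join(order)


print(round_robin_order([3, 3]))  # → ABABAB
print(round_robin_order([3, 4]))  # → ABABABB
```

Taking one OST per OSS per pass keeps consecutive stripes on different OSSs for as long as possible. With unequal OSS sizes, the leftover OSTs of the larger OSS end up adjacent at the tail (as in ABABABB), which a more careful ordering could spread out.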

Weighted Allocator
When the difference in free space between OSTs is significant, a weighting algorithm is used to influence OST ordering based on both free space and location. Note that these weightings feed a randomized selection, so the allocator will not necessarily choose the "emptiest" OST every time; on average, however, the emptier OSTs are filled faster.
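A minimal Python sketch of weighted random stripe selection follows. The text says only that free space and location both feed the weights and that the selection is randomized. The specific combination used here, where an OST's weight is its free space, scaled by a hypothetical `prio_free` factor when its OSS already holds a stripe of the file, is an assumption for illustration, and `pick_stripes` is not a Lustre function.

```python
import random


def pick_stripes(osts, stripe_count, prio_free=0.9, rng=random):
    """Choose `stripe_count` distinct OSTs by weighted random selection.

    osts: list of (ost_name, oss_name, free_space) tuples.
    prio_free: fraction (0.0-1.0) standing in for the free-space priority.

    HYPOTHETICAL weighting, not Lustre's formula: an OST's weight is its
    free space, scaled down by prio_free if its OSS already holds a stripe
    of this file. At prio_free=1.0 location is ignored entirely.
    """
    candidates = list(osts)
    used_oss = set()
    chosen = []
    for _ in range(min(stripe_count, len(candidates))):
        weights = [
            free * (prio_free if oss in used_oss else 1.0)
            for _name, oss, free in candidates
        ]
        if sum(weights) <= 0:
            # Location penalty zeroed everything: fall back to free space alone.
            weights = [free for _name, _oss, free in candidates]
        if sum(weights) <= 0:
            # No free-space signal either: choose uniformly.
            weights = [1.0] * len(candidates)
        # Draw one index with probability proportional to its weight.
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        name, oss, _free = candidates.pop(idx)
        chosen.append(name)
        used_oss.add(oss)
    return chosen


osts = [("ost0", "A", 500), ("ost1", "A", 100), ("ost2", "B", 300), ("ost3", "B", 300)]
print(pick_stripes(osts, 2, rng=random.Random(7)))
```

Because each draw is random, a fuller OST can still be chosen, but emptier OSTs win more often and so fill faster on average. In this sketch, lowering `prio_free` pushes successive stripes of a file onto different OSSs.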

Adjusting the weighting between free space and location
This priority can be adjusted through the proc file /proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free. The default is planned to be 90% in future releases. On existing betas, you can set this permanently with this command on the MGS:

Increasing the value puts more weight on free space. When free-space priority is set to 100%, location is no longer used in the stripe-ordering calculation, and the weighting is based entirely on free space.
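A minimal Python sketch of changing this setting at runtime through the proc file named above. It assumes the file holds a bare integer percentage (a trailing `%` is tolerated when reading), that it is written as root on the node where the file exists (presumably the MDS), and that a value written this way does not persist across a restart, which is why a separate permanent command on the MGS is mentioned. The helper names are hypothetical.

```python
from pathlib import Path

# Path taken from the text. "lustre-mdtlov" is assumed to embed the
# filesystem name ("lustre"), so adjust it for other filesystems.
QOS_PRIO_FREE = Path("/proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free")


def read_qos_prio_free(path=QOS_PRIO_FREE):
    """Return the current free-space priority as an integer percentage."""
    text = Path(path).read_text().strip()
    return int(text.rstrip("%"))


def write_qos_prio_free(percent, path=QOS_PRIO_FREE):
    """Set the free-space priority (integer 0-100) for the running system only."""
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        raise ValueError(f"qos_prio_free must be an integer 0-100, got {percent!r}")
    Path(path).write_text(f"{percent}\n")


# Usage (as root): write_qos_prio_free(90); print(read_qos_prio_free())
```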

Note that setting the priority to 100% only means that OSS distribution no longer counts in the weighting; stripe assignment is still done by weighted random selection. If OST2 has twice as much free space as OST1, it is twice as likely to be used, but it is still not guaranteed to be used.
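The example above can be worked through under one assumption: at 100% free-space priority, each OST's chance of being chosen is proportional to its free space. Writing f_i for the free space on OST i and considering a single stripe chosen between just the two OSTs in the example:

```latex
P(\mathrm{OST}_i) = \frac{f_i}{\sum_j f_j},
\qquad
f_2 = 2 f_1 \;\Rightarrow\;
P(\mathrm{OST}_2) = \frac{2 f_1}{f_1 + 2 f_1} = \frac{2}{3},
\quad
P(\mathrm{OST}_1) = \frac{f_1}{f_1 + 2 f_1} = \frac{1}{3}
```

So OST1 is still picked about one time in three, even though OST2 has twice the free space.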