WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Architecture - Free Space Management

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Free Space Management

In Lustre 1.6, the MDS assigns file stripe objects to OSTs using a mix of load-balancing and free-space considerations in order to optimize filesystem performance and OSS space utilization. Emptier OSTs are preferred for stripes, and stripes are distributed evenly over OSSs to increase network bandwidth utilization. The weighting factor between these two optimizations is user-adjustable.

There are two stripe allocation methods, and the choice between them depends on the amount of free-space imbalance on the OSTs. The weighted allocator is used when any two OSTs are imbalanced by more than 20%. Until then, a faster round-robin allocator is used. The round-robin order maximizes network balancing.
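The selection rule can be sketched as follows. This is a minimal illustration, not the Lustre implementation: the function name choose_allocator is hypothetical, and the text does not say exactly how the 20% imbalance is measured, so the sketch assumes it is the gap between the fullest and emptiest OST relative to the larger free-space value.

```python
def choose_allocator(free_space, threshold=0.20):
    """Return which allocator would be used for the given OST free-space values.

    Assumption: imbalance = (max - min) / max over all OSTs. The source text
    only says "imbalanced by more than 20%"; the exact measure is a guess.
    """
    largest = max(free_space)
    smallest = min(free_space)
    if largest == 0:
        # No free space anywhere: nothing to weight, so treat as balanced.
        return "round-robin"
    imbalance = (largest - smallest) / largest
    return "weighted" if imbalance > threshold else "round-robin"


balanced = choose_allocator([100, 90, 85])   # 15% spread
skewed = choose_allocator([100, 70])         # 30% spread
```

Here the first call stays on the round-robin allocator and the second switches to the weighted allocator.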

Round-Robin Allocator

When OSTs have approximately the same free space (within 20%), an efficient round-robin allocator is used. The round-robin allocator alternates stripes between OSTs on different OSSs. Here are some example round-robin stripe orders (each letter stands for one OSS, and each occurrence of that letter is one of the OSTs on that OSS):

3:      AAA            (a single 3-OST OSS)
3+3:    ABABAB         (two 3-OST OSSs)
3+4:    BBABABA        (a 3-OST OSS (A) and a 4-OST OSS (B))
3+5:    BBABBABA
3+5+1:  BBABABABC
3+5+2:  BABABCBABC
4+6+2:  BABABCBABABC
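One way to generate orders like those listed above is to space each OSS's OSTs evenly across the slot array and slide forward to the next free slot on a collision. The sketch below is an illustration under assumptions rather than the actual allocator code: the function name round_robin_order is hypothetical, and processing the OSSs from most OSTs to fewest (ties in input order) is an assumption chosen because it reproduces the example orders.

```python
from string import ascii_uppercase


def round_robin_order(ost_counts):
    """Build a stripe order string for OSSs with the given OST counts.

    ost_counts[0] is OSS "A", ost_counts[1] is OSS "B", and so on.
    """
    total = sum(ost_counts)
    slots = [None] * total
    # Assumption: place the OSS with the most OSTs first. sorted() is stable,
    # so OSSs with equal counts keep their input order.
    order = sorted(range(len(ost_counts)), key=lambda i: -ost_counts[i])
    for oss in order:
        count = ost_counts[oss]
        for j in range(count):
            # Ideal evenly spaced position for the j-th OST of this OSS.
            pos = j * total // count
            # If the slot is taken, move forward (wrapping) to a free one.
            while slots[pos] is not None:
                pos = (pos + 1) % total
            slots[pos] = ascii_uppercase[oss]
    return "".join(slots)


examples = {
    (3,): round_robin_order([3]),
    (3, 3): round_robin_order([3, 3]),
    (3, 4): round_robin_order([3, 4]),
    (3, 5): round_robin_order([3, 5]),
    (3, 5, 1): round_robin_order([3, 5, 1]),
    (3, 5, 2): round_robin_order([3, 5, 2]),
    (4, 6, 2): round_robin_order([4, 6, 2]),
}
```

Each entry in examples matches the corresponding order in the list above, for instance BBABABA for the 3+4 layout.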

Weighted Allocator

When the free-space difference between OSTs is significant (more than 20%), a weighting algorithm is used to influence OST ordering based on free space and location. Note that these are weightings for a random algorithm, so it will not necessarily choose the "emptiest" OST every time. On average, it will fill the emptier OSTs faster.


Adjusting the weighting between free space and location

The free-space priority can be adjusted via the proc file /proc/fs/lustre/lov/lustre-mdtlov/qos_prio_free. The default in the future will be 90%. You can set this permanently on existing betas with this command on the MGS:

lctl conf_param <fsname>-MDT0000.lov.qos_prio_free=90

Increasing the value puts more weight on free space. At 100% free-space priority, location is no longer used in the stripe ordering calculations, and the weighting is based entirely on free space.

Note that setting the priority to 100% only means that OSS distribution does not count in the weighting; the stripe assignment is still done via a weighting. If OST2 has twice as much free space as OST1, it will be twice as likely to be used, but it is still not guaranteed to be used.
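The "twice as likely" behavior at 100% free-space priority can be illustrated with a weighted random pick. This is only a sketch of the idea: pick_ost and the free-space figures are hypothetical, and the real allocator's weighting formula is not given here.

```python
import random
from collections import Counter


def pick_ost(free_space, rng=random):
    """Pick one OST name at random, with probability proportional to free space."""
    names = list(free_space)
    weights = [free_space[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical free-space values: OST2 has twice the free space of OST1.
free_space = {"OST1": 100, "OST2": 200}
rng = random.Random(0)  # seeded so the run is repeatable
picks = Counter(pick_ost(free_space, rng) for _ in range(30000))
ratio = picks["OST2"] / picks["OST1"]
```

Over many allocations the ratio comes out close to 2, yet any single pick can still land on OST1.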