Architecture - Simple Space Balance Migration

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Overview
Migration is the process of moving individual objects or entire files between OSTs. It can be used for a wide variety of reasons, such as moving objects from full OSTs to less-full or less-busy OSTs, or evacuating OSTs for removal or replacement. Simple space balance migration is a subset of full data migration: it is limited to migrating files that are not currently in use (i.e. no client has the file open for read or write).

Definitions

 * migration : movement of individual objects or entire files between OSTs
 * agent : the process running on a Lustre client that migrates a single object/file (lfs migrate for Simple Space Balance Migration)
 * management node : a Lustre client used to spawn and dispatch instructions to agents on other nodes

Command-line Definitions

 * lfs find -obd OST_UUID [-obd OST2_UUID ...] -size +S -mtime +M -print0 : locate files with objects on any of the specified OST(s) with size bigger than S and older than M days. The purpose is to find large objects (e.g. 4MB or more) that were not recently modified. These are more suitable for migration because selecting them avoids many inefficient small-file move operations and reduces the chance of interruption. Any of the supported lfs find checks can be used; multiple checks are combined with conjunctive (AND) semantics.


 * lfs migrate [-0] --source-obd OST_UUID [--source-obd OST2_UUID ...] [--target-obd OST3_UUID ...] [--restripe|[--restripe-count] [--restripe-size]] {[--object N] pathname} ... : migrate the specified pathname(s) onto the specified target-obds (or all other OSTs if unspecified), and do not create new objects on the source-obds listed.
 * If only a pathname is passed, migrate the entire file. If a pair --object N pathname is given, move object N of pathname to one of the target OSTs.
 * Read the list of files to migrate from stdin if pathname is -. If -0 is given, files read from stdin are NUL-separated (from -print0 output).
 * By default the migrated file has the same stripe count and stripe size as the original file. If --restripe is given, then restripe the file(s) according to the default striping in the parent directory of each file. If --restripe-count and/or --restripe-size are given, use the specified stripe count and/or size per lfs setstripe for all migrated files.
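
The selection rule behind the lfs find invocation above can be sketched in a few lines of Python. This is only an illustration of the semantics described here (a file matches if it has an object on any listed source OST, and every other check must also pass), not the lfs find implementation. The Candidate record, its field names, and the helper functions are hypothetical and exist only for this example. The last two helpers show the NUL-separated format that -print0 would produce and that lfs migrate -0 is proposed to read from stdin.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-file record; not a Lustre data structure."""
    path: str
    size_bytes: int             # file size in bytes
    age_days: float             # days since last modification
    ost_uuids: FrozenSet[str]   # OSTs holding at least one object of the file


def matches(c: Candidate, source_osts: FrozenSet[str],
            min_size: int, min_age_days: float) -> bool:
    """Apply the checks with AND semantics.

    Multiple -obd arguments are alternatives: one object on any of the
    source OSTs is enough. The size and age checks must both pass as well.
    """
    on_source = bool(c.ost_uuids & source_osts)
    return on_source and c.size_bytes > min_size and c.age_days > min_age_days


def to_print0(paths: Iterable[str]) -> bytes:
    """Encode paths as NUL-terminated records, like -print0 output."""
    return b"".join(os.fsencode(p) + b"\x00" for p in paths)


def from_print0(data: bytes) -> List[str]:
    """Decode NUL-separated records, as a -0 reader would."""
    return [os.fsdecode(chunk) for chunk in data.split(b"\x00") if chunk]
```

NUL separation matters because pathnames may contain spaces or newlines; a newline-separated list would split such names incorrectly.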

Implementation Constraints

 * 1) lfs find needs to be modified to handle multiple -obd arguments in order to efficiently generate the list of files with objects on multiple source OSTs
 * 2) lfs find -size should only get the OST size information if all checks related to MDS resident data have already passed.
 * 3) lfs migrate  should run on a Lustre client filesystem, doing normal read operations on the source file.  lfs migrate can run on a normal client, or could optionally run on a client filesystem mounted on an OSS node in order to avoid sending the write IO over the network.
 * 4) lfs migrate should verify before starting the migration that the file being migrated has an object on one of the source OSTs, or report an error.  This avoids a race condition and useless work if the file is recreated between the time the lfs find is run and when the file begins migration.
 * 5) lfs migrate should have a mechanism to lock or otherwise monitor the source file on the MDS against updates or against being opened by another process.  The user-space process should be notified by the kernel (from the MDS), through a signal or otherwise, before the source file is modified.  lfs migrate should then stop the migration in a timely manner (finish if the file is small, or abort) and release the lock.  If lfs migrate does not release the lock in a timely manner, the client is evicted from the MDS and all OSTs (via existing evict-by-nid functionality).
 * 6) lfs migrate should create a temporary target file (perhaps with only a single object in it) and immediately unlink the file so that it is destroyed if the agent is terminated, to avoid leaking the object.  Once the data copy is complete, the object(s) (or entire file, if restriping) are moved over to the source file and removed from the temporary file (by manipulating the EAs, not by renaming), and the lock is released to signal migration completion.
 * 7) lfs migrate shall move all objects on the source OSTs to the target OSTs (or one of the least full OSTs, as determined in a manner similar to lfs df).  If one of the --restripe options is given and this would actually change the striping of the file, then the whole file will be read and rewritten; otherwise only the objects on the source OST(s) need to be moved.
 * 8) There should optionally be some mechanism for the lfs find output to be efficiently split among multiple (possibly remote) client invocations of lfs migrate.
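
As one possible reading of constraint 8, the management node could divide the NUL-separated lfs find output into one share per agent. The sketch below uses simple round-robin assignment; this is an assumption made for illustration, not a mechanism the design specifies, and the function name split_print0 is hypothetical. Each share stays NUL-separated so that it could be passed unchanged to the stdin of an lfs migrate -0 invocation with pathname -.

```python
from typing import List


def split_print0(data: bytes, n_agents: int) -> List[bytes]:
    """Split NUL-separated path records into n_agents round-robin shares.

    Paths are kept as raw bytes so that unusual filenames survive intact.
    Every returned share is itself NUL-terminated; a share may be empty
    if there are fewer paths than agents.
    """
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    paths = [p for p in data.split(b"\x00") if p]
    shares: List[List[bytes]] = [[] for _ in range(n_agents)]
    for i, path in enumerate(paths):
        shares[i % n_agents].append(path)
    return [b"".join(p + b"\x00" for p in share) for share in shares]
```

Round-robin balances the number of files per agent but not the number of bytes. A size-aware split would need the file sizes, which the -print0 output alone does not carry.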