WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Architecture - Simple Space Balance Migration

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.



Migration is the process of moving individual objects or entire files between OSTs. This can be used for a wide variety of reasons, such as moving objects from full OSTs to less-full or less-busy OSTs, or evacuating OSTs for removal/replacement. Simple space balance migration is a subset of full data migration in that it is limited to migrating files that are not currently in use (i.e. no clients have the file open for read or write).


migration 
movement of individual objects or entire files between OSTs
migration agent 
the process running on a Lustre client that migrates a single object/file (lfs migrate for Simple Space Balance Migration)
management node 
a Lustre client used to spawn and dispatch instructions to agents on other nodes

Simple Space Balance Migration Architecture

A. Open-unlinked files: When temporary files are created for holding the temporary migration object(s) they are immediately unlinked from the filesystem namespace. If the migration agent fails, or the node on which it is running fails, the objects will be destroyed by MDS-OST orphan recovery to avoid leaking space. In the future this may be implemented by an O_UNLINKED flag to open(2) to avoid putting the temporary file into the namespace at all.
B. Determine space imbalance: Use the lfs df output (or internal API equivalent from llapi_obd_statfs()) to determine if an imbalance exists in OST space usage.
C. Lock objects against modification: The files being migrated will be locked on the MDS, and the migration agent will be signalled if another process tries to open the file. The other opening process(es) will block until the agent releases the lock (either aborting the migration or completing it).
D. Maintain object attributes: The target object(s) of the migration should inherit the attributes of the source object(s) so that no change is visible to users. This includes mtime, atime, ctime, uid, gid, size, and the parent inode/generation stored in the fid EA (setting the latter will require a new mechanism).
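
Point A relies on ordinary open-unlinked file semantics. The following is a minimal sketch of that idea on a generic POSIX filesystem, not of Lustre's implementation: the helper name create_unlinked_temp and the ".migrate." prefix are hypothetical, and MDS-OST orphan recovery is not modelled. It only shows that the temporary file stays usable through its descriptor after its name is removed, so nothing remains in the namespace if the agent dies.

```python
import os
import tempfile


def create_unlinked_temp(directory):
    """Create a temporary file and immediately remove its name.

    Returns (fd, path). The descriptor stays usable; the path no longer
    exists, so the storage is reclaimed once the descriptor is closed or
    the process exits.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".migrate.")
    os.unlink(path)  # name removed; data lives only as long as fd is open
    return fd, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fd, path = create_unlinked_temp(workdir)
        os.write(fd, b"copied data")
        os.lseek(fd, 0, os.SEEK_SET)
        print(os.path.exists(path), os.read(fd, 64))
        os.close(fd)
```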

Use cases


id | quality attribute | summary
migrate_from_osts | performance, availability | One or more OSTs is too full, while others are much less full. Migrate objects from the full OST(s) to the remaining OSTs to more evenly balance space.
open_migrating_file | availability | Another process opens a file undergoing migration. The migration should complete quickly or be aborted.
cluster_migration_failure | availability | The cluster on which migrations are active suffers a failure. Partial migrations are reverted.
agent_failure | availability | The agent performing a migration suffers a failure. The partial migration is reverted.

Migration from OSTs

Scenario: One or more OSTs is too full
Business Goals: Avoid job failure due to out-of-space; allow decommissioning of OSTs
Relevant QA's: Performance & Availability
Stimulus: One or more OSTs is too full (e.g. more than 20% higher than mean OST Use%), administrator or script begins migration.
Stimulus source: Administrator or automated monitoring script.
Environment: OST(s) above space imbalance threshold
Artifact: OST space usage.
Response: Disable source OSTs on MDS to avoid new objects being created there. Run lfs find to generate list of files to migrate. Start one or more migration processes running lfs migrate to migrate selected files until space usage is within variance threshold.
Response measure: OST space usage is below variance threshold (e.g. within 5% of mean OST Use%).
Questions: None.
Issues: None.
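
The start and stop thresholds in this scenario can be sketched as two small checks. This is an illustration under stated assumptions, not the design's algorithm: the percentages are read as percentage points relative to the mean Use%, the stop check treats deviation in either direction as imbalance, and the function names and OST labels are hypothetical. The per-OST Use% values would come from lfs df or llapi_obd_statfs(); obtaining them is not shown.

```python
from statistics import mean


def find_full_osts(use_pct, start_threshold=20.0):
    """Return OSTs whose Use% exceeds the mean by more than start_threshold points."""
    avg = mean(use_pct.values())
    return sorted(name for name, pct in use_pct.items()
                  if pct - avg > start_threshold)


def is_balanced(use_pct, stop_threshold=5.0):
    """True when every OST is within stop_threshold points of the mean Use%."""
    avg = mean(use_pct.values())
    return all(abs(pct - avg) <= stop_threshold for pct in use_pct.values())


if __name__ == "__main__":
    # Hypothetical Use% values keyed by hypothetical OST names.
    usage = {"OST0000": 90.0, "OST0001": 50.0, "OST0002": 55.0, "OST0003": 45.0}
    print(find_full_osts(usage))  # candidate source OSTs
    print(is_balanced(usage))     # whether migration can stop
```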

Open migrated file

Scenario: The file being migrated is opened by another process
Business Goals: Avoid blocking jobs using the file
Relevant QA's: Availability
Stimulus: A signal is sent to the agent.
Stimulus source: Another process on any client opens the file being migrated.
Environment: File currently undergoing migration
Artifact: Migration agent.
Response: Migration is stopped (completed or aborted) within specified timeframe and lock is released. Temporary objects are removed. Other process can open file.
Response measure: Stop migration within specified time period (10s).
Questions: None.
Issues: None.
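
The agent's response in this scenario can be sketched as a chunked copy loop that checks a stop flag between chunks. Everything here is an assumption made for illustration: the use of SIGUSR1, the names install_stop_handler and copy_until_stopped, and the 4 MiB "small enough to finish" cut-off are not from the design, which only requires that the agent be notified "through a signal or otherwise" and stop within the time limit (10s).

```python
import signal
import threading
import time

stop_requested = threading.Event()


def install_stop_handler(signum=None):
    """Arrange for a signal (assumed SIGUSR1, POSIX only) to request a stop."""
    if signum is None:
        signum = signal.SIGUSR1
    signal.signal(signum, lambda _sig, _frame: stop_requested.set())


def copy_until_stopped(src, dst, total_size, chunk_size=1 << 20,
                       finish_below=4 << 20, deadline_s=10.0,
                       stop=stop_requested, clock=time.monotonic):
    """Copy src to dst; return True if completed, False if aborted.

    After a stop request the copy continues only while the remaining data
    is small (<= finish_below bytes) and the deadline has not passed.
    """
    copied = 0
    stop_seen_at = None
    while copied < total_size:
        if stop.is_set():
            if stop_seen_at is None:
                stop_seen_at = clock()  # first time the request is noticed
            remaining = total_size - copied
            if remaining > finish_below or clock() - stop_seen_at >= deadline_s:
                return False  # abort; caller discards the temporary object
        chunk = src.read(min(chunk_size, total_size - copied))
        if not chunk:
            return False  # source ended early; treat as an aborted copy
        dst.write(chunk)
        copied += len(chunk)
    return True
```

Because the flag is only checked between chunks, the deadline is measured from when the request is noticed, so the chunk size bounds how late that can be.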

Command-line Definitions

lfs find -obd OST_UUID [-obd OST2_UUID ...] -size +S -mtime +M -print0 
locate files with objects on any of the specified OST(s) that are larger than S and were last modified more than M days ago. The purpose is to find large objects (e.g. 4MB or more) that were not recently modified; these are more suitable for migration because moving them avoids many inefficient small-file move operations and reduces the chance of interruption. Any of the supported lfs find checks can be used; they are combined with conjunctive (AND) semantics.
lfs migrate [-0] --source-obd OST_UUID [--source-obd OST2_UUID ...] [--target-obd OST3_UUID ...] [--restripe|[--restripe-count] [--restripe-size]] {[--object N] pathname} ... 
migrate the specified pathname(s) onto the specified target-obds (or all other OSTs if unspecified), and do not create new objects on the source-obds listed.
  • If only a pathname is passed, migrate the entire file. If a pair --object N pathname is given move object N in the pathname to one of the target OSTs.
  • Read the list of files to migrate from stdin if pathname is -. If -0 is given, files read from stdin are NUL-separated (from -print0 output).
  • By default the migrated file has the same stripe count and stripe size as the original file. If --restripe is given, then restripe the file(s) according to the default striping in the parent directory of each file. If --restripe-count and/or --restripe-size are given, use the specified stripe count and/or size per lfs setstripe for all migrated files.
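
The stdin handling described above (pathname "-", with -0 selecting NUL separators) could look roughly like the sketch below. The function name read_pathnames is hypothetical and this is not the lfs implementation. Pathnames are kept as bytes because file names need not be valid text, which is also why NUL separation is safer than newline separation.

```python
import io


def read_pathnames(stream, nul_separated=False):
    """Yield pathnames from a binary stream, split on NUL or newline."""
    sep = b"\0" if nul_separated else b"\n"
    pending = b""
    while True:
        block = stream.read(65536)
        if not block:
            break
        pending += block
        # Everything before the last separator is complete; keep the tail.
        *complete, pending = pending.split(sep)
        for name in complete:
            if name:
                yield name
    if pending:
        yield pending  # final name without a trailing separator


if __name__ == "__main__":
    # Hypothetical input resembling "lfs find ... -print0" output.
    sample = io.BytesIO(b"/mnt/lustre/a\0/mnt/lustre/b c\0")
    for name in read_pathnames(sample, nul_separated=True):
        print(name)
```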

Implementation Constraints

  1. lfs find needs to be modified to handle multiple -obd arguments in order to efficiently generate the list of files with objects on multiple source OSTs.
  2. lfs find -size should only get the OST size information if all checks related to MDS resident data have already passed.
  3. lfs migrate should run on a Lustre client filesystem, doing normal read operations on the source file. lfs migrate can run on a normal client, or could optionally run on a client filesystem mounted on an OSS node in order to avoid sending the write IO over the network.
  4. lfs migrate should verify before starting the migration that the file being migrated has an object on one of the source OSTs, or report an error. This avoids a race condition and useless work if the file is recreated between the time the lfs find is run and when the file begins migration.
  5. lfs migrate should have a mechanism to lock or otherwise monitor the source file on the MDS against updates or against being opened by another process. The user-space process should be notified by the kernel (from the MDS), through a signal or otherwise, before the source file is modified. lfs migrate should then stop the migration in a timely manner (finish if file is small, or abort) and release the lock. If lfs migrate does not release the lock in a timely manner the client is evicted from the MDS and all OSTs (via existing evict-by-nid functionality).
  6. lfs migrate should create a temporary target file (perhaps with only a single object in it) and immediately unlink the file so that it is destroyed if the agent is terminated, to avoid leaking the object. Once the data copy is complete, the object(s) (or the entire file, if restriping) are moved over to the source file and removed from the temporary file (by manipulating the EAs, not by renaming), and the lock is released to signal migration completion.
  7. lfs migrate shall move all objects on the source OSTs to the target OSTs (or one of the least full OSTs, as determined in a manner similar to lfs df). If one of the --restripe options is given and this would actually change the striping of the file then the whole file will be read and rewritten, otherwise only the objects on the source OST(s) need to be moved.
  8. There should optionally be some mechanism for the lfs find output to be efficiently split among multiple (possibly remote) client invocations of lfs migrate.
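
Constraint 8 leaves the splitting mechanism open. One possible approach, shown here only as a sketch, is a greedy largest-first partition that gives each file to the invocation with the least data assigned so far. The function name split_by_size and the (pathname, size) input form are assumptions; the design does not specify how the list is divided or dispatched.

```python
import heapq


def split_by_size(files, workers):
    """Partition (pathname, size) pairs into `workers` lists of similar total size."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    heap = [(0, i) for i in range(workers)]  # (bytes assigned, worker index)
    batches = [[] for _ in range(workers)]
    # Largest files first, each to the currently least-loaded worker.
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        assigned, i = heapq.heappop(heap)
        batches[i].append(path)
        heapq.heappush(heap, (assigned + size, i))
    return batches


if __name__ == "__main__":
    # Hypothetical file list with sizes in bytes.
    candidates = [("a", 8), ("b", 7), ("c", 3), ("d", 2)]
    for batch in split_by_size(candidates, 2):
        print(batch)
```

Each resulting batch could then be fed to one lfs migrate invocation on stdin.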


The parent tracking bug for Space Management is bug 13107.
