Architecture - Migration (2)

Summary

Migration is the process of moving data and metadata within one cluster (one namespace), as well as to and from external non-Lustre storage servers. HSM, free-space balancing, and file restriping are examples of migration.

Definitions

Agent: performs the actual copying of objects
Feed: an enumeration of the objects to migrate
Feed Generator: produces the object enumeration (feed) for a migration agent
Coordinator: manages migrations and the running migration agents
Initiator: initiates a migration by issuing a migration request
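The Feed Generator role can be illustrated with a minimal sketch. It assumes a toy namespace stored as a dict from path to an integer object ID, which stands in for a FID. `feed_generator`, the selection predicate and the batch size are hypothetical and are not Lustre interfaces. The generator walks the namespace, keeps only the paths the migration applies to, and yields the resulting object enumeration (the Feed) in fixed-size batches that an agent can consume.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical toy model: path -> object ID (a stand-in for a Lustre FID).
Namespace = Dict[str, int]


def feed_generator(ns: Namespace,
                   select: Callable[[str], bool],
                   batch_size: int) -> Iterator[List[int]]:
    """Yield batches of object IDs whose paths satisfy `select`."""
    batch: List[int] = []
    for path in sorted(ns):          # deterministic enumeration order
        if select(path):
            batch.append(ns[path])
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:                        # flush the final, partial batch
        yield batch
```

With `ns = {"/a/x": 1, "/a/y": 2, "/b/z": 3, "/a/w": 4}`, the call `feed_generator(ns, lambda p: p.startswith("/a/"), 2)` enumerates `/a/w`, `/a/x` and `/a/y` in sorted order, yielding `[4, 1]` and then `[2]`. Because the feed is lazy, the coordinator can start agents before the whole namespace has been scanned.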

Use cases

ID	Quality Attribute	Summary
simple migration	usability, performance	a simple migration within one name space
duplicate requests are merged	performance	duplicate migration requests are processed as one request
conflicting requests abort in-progress migration	performance	when an object is being migrated to HSM, any access to the object aborts the migration
coherent access to moving objects	usability	moving object continues to be accessible to clients
propagate punch and trunc to source, as well as llog on sink	performance	don't copy truncated data
recovery	availability	restore moving object state after a server crash
single namespace at all times	usability	moving and migrated objects are in the same namespace
scalability	scalability	adding servers makes migration faster
reconnect with dirty cache after server migration has completed (FLDB)	usability	reintegrate a dirty cache when the objects have already been moved
support partial file reclamations	performance, usability	the file cannot fit entirely in the cache
cache full / master full management (grants?)	usability	the destination server must not run out of space in the middle of a migration
IO optimization	performance	agents should issue large IOs (mechanism not yet defined)
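The "propagate punch and trunc" use case ("don't copy truncated data") can be sketched as follows. The sketch assumes a file held as two in-memory byte buffers. `MigratingFile`, its methods, and the list used as a stand-in for the llog on the sink are all hypothetical. Truncates and punches that arrive during migration are logged and applied to both copies. The agent then copies only up to the current end of file and skips punched ranges.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MigratingFile:
    source: bytearray
    target: bytearray = field(default_factory=bytearray)
    # Stand-in for the llog on the sink: (op, arg1, arg2) records.
    log: List[Tuple[str, int, int]] = field(default_factory=list)
    holes: List[Tuple[int, int]] = field(default_factory=list)

    def truncate(self, size: int) -> None:
        """Record the truncate, then apply it to target and source."""
        self.log.append(("trunc", size, 0))
        del self.target[size:]
        del self.source[size:]

    def punch(self, start: int, end: int) -> None:
        """Record a hole [start, end) and zero it in both copies."""
        self.log.append(("punch", start, end))
        self.holes.append((start, end))
        for buf in (self.target, self.source):
            stop = min(end, len(buf))
            buf[start:stop] = bytes(max(0, stop - start))

    def copy_chunk(self, offset: int, length: int) -> int:
        """Agent side: copy [offset, offset+length); return bytes copied."""
        end = min(offset + length, len(self.source))  # never copy past EOF
        if len(self.target) < end:
            self.target.extend(bytes(end - len(self.target)))
        copied = 0
        for pos in range(offset, end):
            if any(s <= pos < e for s, e in self.holes):
                continue                              # punched: nothing to copy
            self.target[pos] = self.source[pos]
            copied += 1
        return copied
```

If a truncate to 6 bytes arrives after the first 4 bytes of an 8-byte file have been copied, the next `copy_chunk(4, 4)` moves only 2 bytes, and the discarded tail is never transferred.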

Quality Attribute Scenarios

Simple migration

Scenario:		simple data and metadata migration within one namespace
Business Goals:		advanced control over Lustre object placement
Relevant QA's:		usability, performance
details	Stimulus:	a migration request
	Stimulus source:	administrator, lustre control utilities, client
	Environment:	a Lustre cluster
	Artifact:	migration coordinator
	Response:	an administrator issues a migration request to a coordinator. The coordinator starts or wakes up one or more migration agents and asks them to move the Lustre objects with the given IDs. The migration agents perform the actual migration.
	Response measure:	successful migration, achieving good performance by moving objects in parallel
Questions:
Issues:
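The response above (the coordinator wakes several agents and hands each of them object IDs to move in parallel) can be sketched with a thread pool standing in for the agents. This is an assumption for illustration only. `coordinate` and `move_object` are hypothetical names, and `move_object` stands in for an agent's datamover plugin.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def coordinate(object_ids: List[int],
               move_object: Callable[[int], bool],
               n_agents: int) -> Dict[int, bool]:
    """Move objects in parallel using `n_agents` workers.

    Returns a map from object ID to whether its move succeeded.
    """
    with ThreadPoolExecutor(max_workers=n_agents) as agents:
        # map() hands IDs to idle workers and yields results in input order.
        results = agents.map(move_object, object_ids)
        return dict(zip(object_ids, results))
```

The per-object result map supports the "successful migration" response measure. Failed IDs could be fed back to the coordinator for a retry.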

duplicate requests are merged

Scenario:		a second migration request is issued while the object is being migrated
Business Goals:		handle duplicated requests efficiently
Relevant QA's:		performance
details	Stimulus:	a migration request
	Stimulus source:	client, administrator, a control utility
	Environment:	an object being migrated
	Artifact:	coordinator
	Response:	the coordinator detects that the requests are duplicates and executes them as one
	Response measure:	both requests complete successfully; the duplicate requests do not interfere with each other
Questions:
Issues:
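A minimal sketch of request merging follows. It assumes the coordinator keys in-flight migrations by (object ID, target). A second request for the same key attaches its requester to the existing migration instead of starting another copy, and every requester is notified when that single copy completes. The class and method names are hypothetical.

```python
import threading
from typing import Dict, List, Tuple


class Coordinator:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # (object ID, target) -> requesters waiting on that migration
        self._inflight: Dict[Tuple[int, str], List[str]] = {}
        self.copies_started = 0

    def request(self, obj_id: int, target: str, requester: str) -> bool:
        """Return True if this call started a new migration."""
        key = (obj_id, target)
        with self._lock:
            if key in self._inflight:
                self._inflight[key].append(requester)  # merge the duplicate
                return False
            self._inflight[key] = [requester]
            self.copies_started += 1
            return True

    def complete(self, obj_id: int, target: str) -> List[str]:
        """Finish a migration and return every requester to notify."""
        with self._lock:
            return self._inflight.pop((obj_id, target))
```

Both requesters see a successful completion, but the object is copied only once.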

conflicting requests abort in-progress migration

Scenario:		archiving to tape is aborted if a conflicting migration request is issued
Business Goals:		eliminate useless archive operations
Relevant QA's:		performance
details	Stimulus:	client's file access (for the HSM case)
	Stimulus source:	client application
	Environment:	HSM, a file is being archived to a tape, someone wants to write to the file
	Artifact:	coordinator
	Response:	the coordinator aborts the archiving operation
	Response measure:	the archiving operation is aborted
Questions:
Issues:
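One way to realize this scenario is a cooperative abort flag, sketched below. This is an assumption, not a documented design. The archive job checks a `threading.Event` between chunks, and the coordinator sets the event when a client asks to write the file. `ArchiveJob`, `Coordinator.on_client_write` and the `after_chunk` hook are hypothetical. The hook only exists so the sketch can simulate a write that arrives mid-archive.

```python
import threading
from typing import Callable, Dict, List, Optional


class ArchiveJob:
    """Copies a file to tape chunk by chunk, checking for an abort between chunks."""

    def __init__(self, chunks: List[bytes]) -> None:
        self.chunks = chunks
        self.aborted = threading.Event()
        self.tape: List[bytes] = []            # stands in for the tape copy

    def run(self, after_chunk: Optional[Callable[[int], None]] = None) -> bool:
        """Return True if the archive completed, False if it was aborted."""
        for i, chunk in enumerate(self.chunks):
            if self.aborted.is_set():
                self.tape.clear()              # discard the now-useless partial copy
                return False
            self.tape.append(chunk)
            if after_chunk is not None:
                after_chunk(i)                 # hook to simulate concurrent events
        return True


class Coordinator:
    def __init__(self) -> None:
        self.archiving: Dict[int, ArchiveJob] = {}   # file ID -> running job

    def on_client_write(self, file_id: int) -> None:
        """A client wants to write: any in-progress archive is stale, so abort it."""
        job = self.archiving.pop(file_id, None)
        if job is not None:
            job.aborted.set()
```

Checking the flag only between chunks bounds the abort latency to one chunk copy. It also keeps each tape write uninterrupted.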

coherent access to moving objects

Scenario:		transparent access to objects being migrated
Business Goals:		concurrent access to the moving objects
Relevant QA's:		performance, scalability
details	Stimulus:	file access
	Stimulus source:	client application
	Environment:	an object being moved, a client application accesses the object
	Artifact:	moving object
	Response:	the migration agent(s) move the objects in chunks, protect the chunks currently being moved with exclusive locks, and redirect application requests to the appropriate data location (source or target)
	Response measure:	a client is able to access an object being moved; client access is not blocked for the duration of the object migration
Questions:		always redirect to target?
Issues:

Empty UC

Scenario:
Business Goals:
Relevant QA's:
details	Stimulus:
	Stimulus source:
	Environment:
	Artifact:
	Response:
	Response measure:
Questions:
Issues:

Implementation details

IMP1: all IO involving Lustre servers uses the client API (exploiting the existing locking and sync LLITE infrastructure)
- avoids reimplementing the client
IMP2: separate the problem into coordinators and agents
- agents have datamover plugins
IMP3: use a pull model if the target is a Lustre OSD (run the agent on the sink)
IMP4: if the target is a Lustre OSD, send all requests to the target (blocking client requests until they can be filled on the target); the source OSD must redirect
- let the MDT maintain the redirection, in line with flash cache
- call back the layout (stripe descriptor) when we initiate migration: send blocking ASTs to all clients holding any locks on the file
- 3 phases: old layout, dual layout, final layout
IMP5: we need a lock bit for layouts
- this means we need to drop the client lock when we flush the inode
IMP6: creation of the target object results in an llog entry with the old and new EA (SOM-style recovery)
IMP7: record (llog) and execute trunc/punch on the target, then propagate them to the source
IMP8: a bit (or extent log) on the MDT (master copy) indicating whether the copy or the tape is current ("can I reclaim this space?"); one llog of extents indicating which files are on tape (for fast space reclamation). Similar to WBC on clients
IMP9: use a commit callback on the EA holding the tape FID to tell the tape "not an orphan" (i.e. "we're counting on this tape FID")
IMP10: MDS inode objects never change fids
IMP11: locks a migrator might take automatically interact with locks other clients may take
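The 3-phase layout change in IMP4 can be modelled as a small state machine. In this minimal sketch, every transition first calls back each client's cached layout (the blocking AST), and clients must then re-fetch the layout. Once migration starts, IO is routed to the target, as IMP4 proposes. `Phase`, `FileLayout` and the callback shape are hypothetical; real layout locks would be LDLM locks, which this sketch does not model.

```python
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    OLD = "old layout"
    DUAL = "dual layout"
    FINAL = "final layout"


class FileLayout:
    def __init__(self) -> None:
        self.phase = Phase.OLD
        # Clients caching the layout; each entry stands in for a blocking AST.
        self.layout_holders: List[Callable[[Phase], None]] = []

    def grant_layout_lock(self, on_blocking_ast: Callable[[Phase], None]) -> Phase:
        """A client caches the layout and is told which phase it is in."""
        self.layout_holders.append(on_blocking_ast)
        return self.phase

    def _change(self, new: Phase) -> None:
        for cb in self.layout_holders:   # call back every cached layout first
            cb(new)
        self.layout_holders.clear()      # clients must re-fetch the layout
        self.phase = new

    def start_migration(self) -> None:
        if self.phase is not Phase.OLD:
            raise RuntimeError("migration already started")
        self._change(Phase.DUAL)

    def finish_migration(self) -> None:
        if self.phase is not Phase.DUAL:
            raise RuntimeError("migration not in progress")
        self._change(Phase.FINAL)

    def route(self) -> str:
        """Where client IO goes: in DUAL the target pulls missing data from the source."""
        return "source" if self.phase is Phase.OLD else "target"
```

Because the layout is revoked before each phase change, no client can keep issuing IO against a stale stripe descriptor. This is the reason IMP5 asks for a dedicated layout lock bit.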

References

bug 14698

