Architecture - Fileset

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Summary

A user application (or a Lustre internal feature) may want to perform an action on a very large set of files. Such actions might include migration to slower storage, purging of old files, or replication to a proxy server. A fileset is an efficient representation of the file identifiers (fids) of such a set of files.

The definition of any particular fileset is left to an external agent; no search features will be included in Lustre itself (excluding Least Recently Used files, which can probably be tracked efficiently only within Lustre). Typically, searches for files with particular metadata characteristics will be done by a database that mirrors the Lustre file tree via a ChangeLog. The files matching these criteria will be added to a fileset via a Lustre fileset API.

Filesets will generally come in two flavors: arbitrary collections of files, or a full file tree. See Enumeration below.

Definitions

Fileset: an arbitrary subset of files from within a single filesystem's namespace.
Consumer: an entity acting on the contents of a fileset
Internal consumer: a Lustre internal feature using a fileset (e.g. fileset client mount, maybe replicator, migrator)
External consumer: an entity external to Lustre using a fileset. This may be limited to a user of a fileset client mount, in which case no access to any other representation of a fileset is needed. See Client Access below.
type: fileset type, see Enumeration below

Qualities

Description	Quality	Semantics
coherence	usability	file modifications are reflected in the fileset (e.g. unlink, rename)
permanence	usability, scalability	when filesets are discarded.
synchronization	usability	the list of files in the set may change.
physiology	scalability	internal representation must be used efficiently
hashing	scalability	actions on a fileset may need to be distributed across multiple servers for scalability
modification	usability	the contents of a fileset may be modified over time to add or remove items

Use Cases

id	quality attribute	summary
compliance	usability, scalability	delete all files modified in 2002
workset	availability	the files in the fileset are available on a remote proxy server
backup	scalability	filesystem must be subdivided into manageable chunks for backup / replication

compliance

Scenario:		delete all files modified in 2002
Business Goals:		Provide an API to facilitate filesystem operations based on database search output
Relevant QA's:		Usability, scalability
details	Stimulus:	The fileset and requested operation are fed to the API
	Stimulus source:	Compliance policy dictates removal of old files
	Environment:	Database has recent FS information (from watching a ChangeLog)
	Artifact:	Fileset, type 1
	Response:	Lustre performs the requested operation on each of the files in the fileset
	Response measure:	fileset is created, operation is completed on all elements of the fileset
Questions:		Are all operations executed from userspace on a client (external), or some directly on Lustre via an API?
Issues:

workset

Scenario:		all files with the words "bunny rabbit" are replicated at a dozen remote analysis clusters
Business Goals:		Provide current access to dynamic set of files on a proxy server
Relevant QA's:		Availability
details	Stimulus:	Search results are fed to the API
	Stimulus source:	External search or project directory
	Environment:	Database has recent FS information (e.g. from watching a ChangeLog)
	Artifact:	Fileset, type 1 or type 2
	Response:	Lustre creates an internal representation of the fileset and makes it available for export.
	Response measure:	Fileset is created
Questions:		Is a small time lag acceptable, or must proxies / filesets be absolutely synchronous?
Issues:

backup

Scenario:		filesystem must be subdivided into manageable chunks for backup / replication
Business Goals:		User requires particular backup policies on particular sets of files
Relevant QA's:		Feature, Scalability
details	Stimulus:	External app reads all files in a fileset
	Stimulus source:	External HSM or backup application
	Environment:	Client access to a limited, defined list of files
	Artifact:	Fileset, type 2
	Response:	All files in fileset are backed up
	Response measure:	Backup time, minor filesystem load during backup
Questions:		Subdivision of migration work seems like it should be handled by migration architecture; doesn't seem to really have anything to do with filesets
Issues:

Requirements

Dynamic

Search results may be returned slowly, or new files that meet the search criteria may be added to the filesystem. In those cases, it should be possible to add items to (or remove items from) an existing fileset. The fileset should in turn notify its consumers of the change. Alternatively, some filesets may be defined to be static.

Persistence

The workset case implies a fileset must be persistent across server / client reboots.

Specification

It may be desirable for a remote site to specify a fileset that should be locally proxied (i.e. pull instead of push). A fileset name is probably useful for this (e.g. a client requests a mirror of fileset 'bunnyrabbit' on local proxy servers).

Coherence

Files referenced in the fileset must be coherent with the original file. E.g. if a file referenced by a fileset is moved, the fileset should reflect the new file location. If a file in a fileset is deleted, the file should disappear from the fileset. Maybe this can be achieved by having the fileset take appropriate locks on the original files.

Coherence requirements:

 - unlink
 - rename
 - move to a new directory
 - file metadata (access time, perms, owner, etc.)

Note that if changing any of the above would cause a file to no longer meet the original search criteria that generated the fileset, it is up to the search generator to (eventually) remove it from the fileset. There are two exceptions to this rule, where the file should be removed from the fileset automatically:

1. unlink
2. move of a file included by virtue of its location in a file tree to a location outside of that tree (see Enumeration below)
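The coherence rules above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `Namespace` and `Fileset` are hypothetical names, the namespace is a plain dictionary from fid to path, and tree membership is evaluated on demand rather than enforced with locks as the text speculates.

```python
from pathlib import PurePosixPath


class Namespace:
    """Toy stand-in for the filesystem namespace: maps fid -> current path."""

    def __init__(self):
        self.paths = {}
        self.filesets = []

    def create(self, fid, path):
        self.paths[fid] = PurePosixPath(path)

    def rename(self, fid, new_path):
        # The fid is stable across rename, so fid-keyed members follow the file.
        self.paths[fid] = PurePosixPath(new_path)

    def unlink(self, fid):
        del self.paths[fid]
        for fileset in self.filesets:
            # Exception 1: an unlinked file leaves every fileset automatically.
            fileset.explicit.discard(fid)


class Fileset:
    """Hypothetical fileset: explicit fids plus inclusive tree roots."""

    def __init__(self, ns, explicit_fids=(), tree_roots=()):
        self.ns = ns
        self.explicit = set(explicit_fids)
        self.tree_roots = [PurePosixPath(r) for r in tree_roots]
        ns.filesets.append(self)

    def members(self):
        """Return {fid: current path} for every file currently in the fileset."""
        result = {}
        for fid, path in self.ns.paths.items():
            # Exception 2: a file included only by location drops out as soon
            # as its path is no longer below any tree root.
            in_tree = any(root in path.parents for root in self.tree_roots)
            if fid in self.explicit or in_tree:
                result[fid] = str(path)
        return result
```

In this model an explicitly listed file keeps its membership across a rename and simply reports its new path, while a file that was included only because it lived under a tree root disappears once it is moved outside that tree.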

Fileset as Object

Depending on the intended use, some filesets may be represented more efficiently than others, or may require different descriptors or methods. Implementing filesets as objects with variable attributes and methods may provide broad but efficient coverage of the range of uses. For example, one common type of fileset may be "a user's home directory", which could be efficiently represented as a single directory fid.

Hashing

When performing an action on large filesets or large numbers of filesets, we must be able to distribute load across multiple servers to ensure performant operation. This is true for internal consumers, but perhaps this function should be offloaded to a distributed application for external consumers.

For example, 10,000 filesets are to be replicated independently. A changelog per fileset may not scale well, and instead we may need a scalable algorithm to find the results for each fileset from a global changelog.
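One simple way to spread the work of a fileset action across servers is to hash each fid to a server index. The sketch below assumes fids are available as strings and that a fixed number of servers is known in advance; `server_for` and `partition` are hypothetical helper names, and the design does not prescribe this particular scheme.

```python
import hashlib
from collections import defaultdict


def server_for(fid, num_servers):
    """Map a fid string to a server index using a stable hash.

    hashlib is used instead of the built-in hash() so that the mapping is
    identical across processes and restarts.
    """
    digest = hashlib.sha256(fid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def partition(fids, num_servers):
    """Group the fids of a fileset into per-server work lists."""
    shards = defaultdict(list)
    for fid in fids:
        shards[server_for(fid, num_servers)].append(fid)
    return dict(shards)
```

A plain modulo mapping reshuffles most fids when the server count changes; a consistent-hashing scheme would reduce that movement, at the cost of more bookkeeping.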

ChangeLog

It may be useful to have a per-fileset changelog maintained for audit or replication purposes. A fileset-specific changelog could be used to provide migration/replication-related events specific to the fileset to migration agents. The agents would then use this information e.g. to abort / commence copying a file.

However, maintaining a per-fileset changelog may not scale. At some point, it may make more sense to process a common global changelog.
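As a rough illustration of the global-changelog alternative, the sketch below makes a single pass over a global stream of records and fans each record out to the filesets that contain its fid. The `Record` layout and the function names are assumptions made for this example; real Lustre changelog records carry more fields, and only explicitly enumerated members are handled here.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified record layout used only for this sketch.
Record = namedtuple("Record", ["index", "op", "fid"])


def build_reverse_index(filesets):
    """Map each fid to the names of the filesets that contain it.

    `filesets` is a dict of fileset name -> iterable of member fids.
    """
    index = defaultdict(set)
    for name, fids in filesets.items():
        for fid in fids:
            index[fid].add(name)
    return index


def dispatch(records, reverse_index):
    """Fan one pass over the global changelog out into per-fileset event lists."""
    per_fileset = defaultdict(list)
    for rec in records:
        # Records for fids that belong to no fileset are skipped.
        for name in reverse_index.get(rec.fid, ()):
            per_fileset[name].append(rec)
    return dict(per_fileset)
```

The cost of this approach is one reverse-index lookup per changelog record, independent of the number of filesets, whereas per-fileset changelogs would multiply the write path by the number of filesets a file belongs to.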

Multiple Membership

A file may be part of multiple filesets. A type 2 fileset may implicitly include other type 2 filesets. Operations on a file should affect all filesets it belongs to, and vice-versa.

Fileset API

The user API for filesets should include the following functionality:

Start a new fileset
Add items to a fileset
Remove items from a fileset
Delete a fileset
Initiate activity of an internal consumer (e.g. migrate fileset bunny from poolA to poolB)
Provide client access to a fileset (see Client Access below)
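To make the list above concrete, here is a minimal in-memory sketch of such an API. It is not the Lustre interface: the class name `FilesetRegistry`, its method names, and the use of plain fid strings are all assumptions for illustration, and an internal consumer is modelled as an ordinary callable applied to each member.

```python
class FilesetRegistry:
    """Hypothetical in-memory model of the proposed fileset operations."""

    def __init__(self):
        self._sets = {}

    def new(self, name):
        """Start a new, empty fileset."""
        if name in self._sets:
            raise ValueError("fileset already exists: " + name)
        self._sets[name] = set()

    def add(self, name, fids):
        """Add items to a fileset."""
        self._sets[name].update(fids)

    def remove(self, name, fids):
        """Remove items from a fileset; unknown fids are ignored."""
        self._sets[name].difference_update(fids)

    def delete(self, name):
        """Delete the fileset definition (member files are untouched)."""
        del self._sets[name]

    def run(self, name, action):
        """Apply an internal consumer's action to every member, in fid order."""
        return [action(fid) for fid in sorted(self._sets[name])]

    def members(self, name):
        """Expose the current membership, e.g. to back a client view."""
        return sorted(self._sets[name])
```

A caller would typically create the fileset once, stream search results into `add` as they arrive, and only then hand the fileset to a consumer through `run`.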

Implementation Notes

Enumeration

Fileset enumeration should be handled in two ways:

Type 1. An explicit enumeration of files or directories. Files within directories are not included in the fileset unless explicitly listed as well.
Type 2. Inclusive file trees. All files / subdirectories below enumerated directories are included in the fileset.

We should have provision for using both types of filesets. In fact, with some per-entry flags, we can define "mixed" filesets including both of the above: each entry in a fileset may be type 1 (flat = single file) or type 2 (tree). Perhaps a third type, "not_included", would be a useful definition as well, to specifically exclude a particular subdirectory from a type 2 fileset.
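The per-entry flags described above could be evaluated roughly as follows. This sketch works on paths rather than fids to stay self-contained, and the names `EntryType`, `Entry` and `contains` are hypothetical. It also assumes one particular precedence rule that the design does not spell out: an exact flat entry always matches, and otherwise the deepest tree or not_included entry covering the path decides.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class EntryType(Enum):
    FLAT = 1          # type 1: exactly the named file or directory
    TREE = 2          # type 2: the named directory and everything below it
    NOT_INCLUDED = 3  # proposed third type: carve a subtree out of a TREE


@dataclass(frozen=True)
class Entry:
    path: PurePosixPath
    kind: EntryType


def contains(entries, path):
    """Return True if `path` is a member of the mixed fileset `entries`."""
    path = PurePosixPath(path)
    best = None  # deepest TREE / NOT_INCLUDED entry covering the path
    for entry in entries:
        if entry.kind is EntryType.FLAT:
            if entry.path == path:
                return True  # an explicit listing always matches
            continue
        if entry.path == path or entry.path in path.parents:
            if best is None or len(entry.path.parts) > len(best.path.parts):
                best = entry
    return best is not None and best.kind is EntryType.TREE
```

With entries for a tree at `/proj`, a not_included entry at `/proj/tmp` and a flat entry for the directory `/data`, this reports `/proj/a/b` as a member, `/proj/tmp/x` as excluded, and `/data/file` as not a member because a flat directory entry does not pull in its contents.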

Storage

Permanent fileset definitions would probably be stored on the MDT (as opposed to the MGS) for scalability and namespace-related locking.

UI

Maintenance

The UI for maintaining filesets might reasonably be run through lfs similar to pools:

lfs fileset_new <fileset name>	Define a new fileset
lfs fileset_add <fileset name> <options> <filename1> <filename2> ...	Add the named files to the fileset; define type 1 or type 2
lfs fileset_remove <fileset name> <filename>	Remove the named file from the fileset
lfs fileset_destroy <fileset name>	Remove the definition of the fileset

Client Access

For arbitrary user access to the files in a fileset, a mechanism like mount(8) seems like it would provide a clear, simple way to retrieve a fileset. (Command format might be "mount -t lustre mgs://fsname/fileset mntpt")

For type 1 filesets, a hierarchical namespace defined by the files and directories in the fileset would be constructed locally. Directories would all be read/execute-only; a client cannot add new entries into the fileset by creating files in the fileset hierarchy. Regular files would keep their normal access permissions.

For type 2 filesets, the mount point would act exactly like a subtree of the full lustre fs.
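For the type 1 case, the locally constructed namespace could be assembled along these lines. The sketch assumes the client already knows the path and mode of each regular file in the fileset; `build_view` and the nested-dictionary layout are illustrative choices, not part of the design.

```python
from pathlib import PurePosixPath

DIR_MODE = 0o555  # read/execute-only, so clients cannot create new entries


def build_view(files):
    """Build a nested namespace from (path, mode) pairs of regular files.

    Directories become {"mode": DIR_MODE, "children": {...}} nodes;
    regular files become {"mode": <their original mode>} leaves.
    """
    root = {"mode": DIR_MODE, "children": {}}
    for path, mode in files:
        # Drop the leading "/" component of absolute paths.
        parts = [p for p in PurePosixPath(path).parts if p != "/"]
        node = root
        for name in parts[:-1]:
            node = node["children"].setdefault(
                name, {"mode": DIR_MODE, "children": {}}
            )
        node["children"][parts[-1]] = {"mode": mode}  # file keeps its own mode
    return root
```

Only directories that lie on the path to some member file appear in the view, which matches the idea that the hierarchy is defined by the fileset contents rather than by the full filesystem tree.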

References

bug 14168
server changelogs
