Architecture - Fileset
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
A user application (or a Lustre internal feature) may want to perform an action on a very large set of files. Such actions might include migration to slower storage, purging of old files, or replication to a proxy server. A fileset is an efficient representation of the file identifiers (fids) that make up such a set.
The definition of any particular fileset is left to an external agent; no search features will be included in Lustre itself (excluding Least Recently Used files, which can probably only be tracked efficiently within Lustre). Typically, searches for files with particular metadata characteristics will be done by a database that mirrors the Lustre file tree via a ChangeLog. The files matching these criteria will be added to a fileset via a Lustre fileset API.
Filesets will generally come in two flavors: arbitrary collections of files, or a full file tree. See Enumeration below.
- an arbitrary subset of files from within a single filesystem's namespace.
- an entity acting on the contents of a fileset
- Internal consumer
- a Lustre internal feature using a fileset (e.g. fileset client mount, maybe replicator, migrator)
- External consumer
- an entity external to Lustre using a fileset. This may be limited to a user of a fileset client mount, in which case no access to any other representation of a fileset is needed. See Client Access below.
- fileset type, see Enumeration below
|coherence||usability||file modifications are reflected in the fileset (e.g. unlink, rename)|
|permanence||usability, scalability||when filesets are discarded.|
|synchronization||usability||the list of files in the set may change.|
|physiology||scalability||internal representation must be used efficiently|
|hashing||scalability||actions on a fileset may need to be distributed across multiple servers for scalability|
|modification||usability||the contents of a fileset may be modified over time to add or remove items|
|compliance||usability, scalability||delete all files modified in 2002|
|workset||availability||the files in the fileset are available on a remote proxy server|
|backup||scalability||filesystem must be subdivided into manageable chunks for backup / replication|
|Scenario:||delete all files modified in 2002|
|Business Goals:||Provide an API to facilitate filesystem operations based on database search output|
|Relevant QA's:||Usability, scalability|
|details||Stimulus:||The fileset and requested operation are fed to the API|
|Stimulus source:||Compliance policy dictates removal of old files|
|Environment:||Database has recent FS information (from watching a ChangeLog)|
|Artifact:||Fileset, type 1|
|Response:||Lustre performs the requested operation on each of the files in the fileset|
|Response measure:||fileset is created, operation is completed on all elements of the fileset|
|Questions:||Are all operations executed from userspace on a client (external), or some directly on Lustre via an API?|
|Scenario:||all files with the words "bunny rabbit" are replicated at a dozen remote analysis clusters|
|Business Goals:||Provide current access to dynamic set of files on a proxy server|
|details||Stimulus:||Search results are fed to the API|
|Stimulus source:||External search or project directory|
|Environment:||Database has recent FS information (e.g. from watching a ChangeLog)|
|Artifact:||Fileset, type 1 or type 2|
|Response:||Lustre creates an internal representation of the fileset and makes it available for export.|
|Response measure:||Fileset is created|
|Questions:||Is a small time lag acceptable, or must proxies / filesets be absolutely synchronous?|
|Scenario:||filesystem must be subdivided into manageable chunks for backup / replication|
|Business Goals:||User requires particular backup policies on particular sets of files|
|Relevant QA's:||Feature, Scalability|
|details||Stimulus:||External app reads all files in a fileset|
|Stimulus source:||External HSM or backup application|
|Environment:||Client access to a limited, defined list of files|
|Artifact:||Fileset, type 2|
|Response:||All files in fileset are backed up|
|Response measure:||Backup time, minor filesystem load during backup|
|Questions:||Subdivision of migration work seems like it should be handled by migration architecture; doesn't seem to really have anything to do with filesets|
Search results may be returned slowly, or new files that meet the search criteria may be added to the filesystem. In those cases, it should be possible to add items to (or remove items from) an existing fileset. The fileset should in turn notify its consumers of the change. Alternatively, some filesets may be defined to be static.
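The modify-and-notify behavior described above could be sketched roughly as follows. This is only an illustration in Python, not a Lustre interface: the class `NotifyingFileset`, the `(event, fid)` callback shape, and the `freeze()` method for static filesets are all hypothetical names introduced here.

```python
from typing import Callable, List, Set

# Hypothetical callback signature: listener(event, fid), event is "add" or "remove".
Listener = Callable[[str, str], None]


class NotifyingFileset:
    """Toy in-memory fileset that tells its consumers about membership changes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.static = False            # becomes True once the set is frozen
        self.fids: Set[str] = set()
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def freeze(self) -> None:
        """Turn this into a static fileset; later changes are rejected."""
        self.static = True

    def add(self, fid: str) -> None:
        self._change("add", fid)

    def remove(self, fid: str) -> None:
        self._change("remove", fid)

    def _change(self, event: str, fid: str) -> None:
        if self.static:
            raise PermissionError(f"fileset {self.name!r} is static")
        present = fid in self.fids
        if event == "add" and not present:
            self.fids.add(fid)
        elif event == "remove" and present:
            self.fids.discard(fid)
        else:
            return                     # no actual change, so no notification
        for listener in self._listeners:
            listener(event, fid)
```

A consumer such as a replicator would call `subscribe()` once and then react to each `("add", fid)` or `("remove", fid)` event as search results trickle in.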
The workset case implies a fileset must be persistent across server / client reboots.
It may be desirable for a remote site to specify a fileset that should be locally proxied (i.e. pull instead of push). A fileset name is probably useful for this. (e.g. a client requests mirror fileset 'bunnyrabbit' on local proxy servers)
Files referenced in the fileset must be coherent with the original file. E.g. if a file referenced by a fileset is moved, the fileset should reflect the new file location. If a file in a fileset is deleted, the file should disappear from the fileset. Maybe this can be achieved by having the fileset take appropriate locks on the original files.
- unlink
- rename
- move to a new directory
- file metadata (access time, perms, owner, etc.)
Note that if changing the above would cause a file to no longer meet the original search criteria that generated the fileset, it is up to the search generator to (eventually) remove it from the fileset. There are two exceptions to this rule, where the file should be removed from the fileset automatically:
- 1. unlink
- 2. move of a file included by virtue of its location in a file tree to a location outside of that tree (see Enumeration below)
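The membership rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: files are identified by POSIX-style paths rather than fids, and the `Change` record and `keeps_membership` function are hypothetical names, not part of any Lustre API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Change:
    kind: str                        # "unlink", "rename", or "setattr" (hypothetical names)
    new_path: Optional[str] = None   # destination path, set for "rename" only


def keeps_membership(change: Change, tree_root: Optional[str]) -> bool:
    """Return True if the file stays in the fileset after `change`.

    tree_root is the directory whose subtree pulled the file into the
    fileset, or None if the file was listed explicitly.
    """
    if change.kind == "unlink":
        return False                 # exception 1: unlinked files always drop out
    if change.kind == "rename" and tree_root is not None:
        # exception 2: a tree member must still live somewhere below the root
        return PurePosixPath(tree_root) in PurePosixPath(change.new_path).parents
    # Anything else is left to the search generator to clean up later.
    return True
```

For example, renaming a file to a path below the same tree root keeps it in the fileset, while an explicitly listed file stays a member wherever it is moved.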
Fileset as Object
Depending on the intended use, some filesets may be represented more efficiently than others, or may require different descriptors or methods. Implementing filesets as objects with variable attributes and methods may provide broad but efficient coverage of the range of uses. For example, one common type of fileset may be "a user's home directory", which could be efficiently represented as a single directory fid.
When performing an action on large filesets or large numbers of filesets, we must be able to distribute load across multiple servers to ensure performant operation. This is true for internal consumers, but perhaps this function should be offloaded to a distributed application for external consumers.
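One simple way to spread such work is to partition the fids by a stable hash. The sketch below assumes fids are plain strings and that the caller supplies a list of server names; the modulo-hash scheme and the function names `server_for` and `partition` are illustrative choices, not something this design prescribes.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def server_for(fid: str, servers: List[str]) -> str:
    """Pick a server for a fid using a hash that is stable across runs."""
    digest = hashlib.sha1(fid.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def partition(fids: Iterable[str], servers: List[str]) -> Dict[str, List[str]]:
    """Group fids by the server that should act on them."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for fid in fids:
        buckets[server_for(fid, servers)].append(fid)
    return dict(buckets)
```

Plain modulo hashing reassigns most fids when the number of servers changes; a consistent-hashing scheme would reduce that churn at the cost of more bookkeeping.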
For example, 10,000 filesets are to be replicated independently. A changelog per fileset may not scale well, and instead we may need a scalable algorithm to find the results for each fileset from a global changelog.
It may be useful to have a per-fileset changelog maintained for audit or replication purposes. A fileset-specific changelog could be used to provide migration/replication-related events specific to the fileset to migration agents. The agents would then use this information e.g. to abort / commence copying a file.
However, maintaining a per-fileset changelog may not scale. At some point, it may make more sense to process a common global changelog.
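One way to picture processing a common global changelog is a single pass that fans each record out to the filesets containing its fid. The sketch below is an assumption-laden toy: the `(index, event, fid)` record layout is hypothetical and does not reflect the real Lustre changelog format, and only explicitly listed fids are considered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (record index, event name, fid).
Record = Tuple[int, str, str]


def build_index(filesets: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert {fileset name: fids} into {fid: fileset names}."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, fids in filesets.items():
        for fid in fids:
            index[fid].add(name)
    return index


def route(records: Iterable[Record],
          index: Dict[str, Set[str]]) -> Dict[str, List[Record]]:
    """Read the global log once and collect the records relevant to each fileset."""
    per_fileset: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        for name in index.get(record[2], ()):
            per_fileset[name].append(record)
    return dict(per_fileset)
```

With the inverted index, the cost of one pass depends on the length of the log and the number of filesets each file belongs to, not on the total number of filesets.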
A file may be part of multiple filesets. A type 2 fileset may implicitly include other type 2 filesets. Operations on a file should affect all filesets it belongs to, and vice-versa.
The user API for filesets should include the following functionality:
- Start a new fileset
- Add items to a fileset
- Remove items from a fileset
- Delete a fileset
- Initiate activity of an internal consumer (e.g. migrate fileset bunny from poolA to poolB)
- Provide client access to a fileset (see Client Access below)
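As a rough illustration of the first five operations listed above, here is a minimal in-memory sketch. The class `FilesetRegistry` and its method names are hypothetical; a real implementation would live in Lustre and persist its state, and client access is not modeled here.

```python
from typing import Callable, Dict, FrozenSet, Set


class FilesetRegistry:
    """Toy stand-in for a fileset API, keyed by fileset name."""

    def __init__(self) -> None:
        self._sets: Dict[str, Set[str]] = {}

    def start(self, name: str) -> None:
        """Start a new, empty fileset."""
        if name in self._sets:
            raise ValueError(f"fileset {name!r} already exists")
        self._sets[name] = set()

    def add(self, name: str, *fids: str) -> None:
        self._sets[name].update(fids)

    def remove(self, name: str, *fids: str) -> None:
        self._sets[name].difference_update(fids)

    def delete(self, name: str) -> None:
        del self._sets[name]

    def members(self, name: str) -> FrozenSet[str]:
        return frozenset(self._sets[name])

    def run(self, name: str, action: Callable[[str], None]) -> int:
        """Apply a consumer's action to every member; return how many were visited."""
        fids = sorted(self._sets[name])
        for fid in fids:
            action(fid)
        return len(fids)
```

Here `run()` stands in for initiating an internal consumer: the caller passes whatever per-file action (migrate, purge, replicate) the consumer implements.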
Fileset enumeration should be handled in two ways:
- Type 1. An explicit enumeration of files or directories. Files within directories are not included in the fileset unless explicitly listed as well.
- Type 2. Inclusive file trees. All files / subdirectories below enumerated directories are included in the fileset.
We should have provision for using both types of filesets. In fact, with some per-entry flags, we can define "mixed" filesets including both of the above: each entry in a fileset may be type 1 (flat = single file) or type 2 (tree). Perhaps a third type, "not_included", would be a useful definition as well, to specifically exclude a particular subdirectory from a type 2 fileset.
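A membership test for such a mixed fileset might look like the sketch below. It assumes entries are keyed by POSIX-style path rather than fid, and that the nearest enclosing tree or exclusion entry wins; both are assumptions made for illustration, as are the names `EntryKind` and `contains`.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Dict


class EntryKind(Enum):
    FLAT = 1           # type 1: this file or directory only
    TREE = 2           # type 2: this directory and everything below it
    NOT_INCLUDED = 3   # exclusion carved out of an enclosing tree


def contains(entries: Dict[str, EntryKind], path: str) -> bool:
    """Decide whether `path` belongs to the fileset described by `entries`."""
    p = PurePosixPath(path)
    kind = entries.get(str(p))
    if kind is not None:
        # The path itself is listed: included unless it is an exclusion.
        return kind is not EntryKind.NOT_INCLUDED
    # Otherwise walk up from the nearest ancestor to the root.
    for ancestor in p.parents:
        kind = entries.get(str(ancestor))
        if kind is EntryKind.TREE:
            return True
        if kind is EntryKind.NOT_INCLUDED:
            return False
        # A FLAT directory does not pull in its children; keep climbing.
    return False
```

For instance, with `/proj` as a tree entry and `/proj/scratch` as an exclusion, files under `/proj/src` are members while files under `/proj/scratch` are not.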
Permanent fileset definitions would probably be stored on the MDT (as opposed to the MGS) for scalability and namespace-related locking.
The UI for maintaining filesets might reasonably be run through lfs similar to pools:
- lfs fileset_new <fileset name> Define a new fileset
- lfs fileset_add <fileset name> <options> <filename1> <filename2> ... Add the named files to the fileset; define type 1 or type 2
- lfs fileset_remove <fileset name> <filename> Remove the named file from the fileset
- lfs fileset_destroy <fileset name> Remove the definition of the fileset
For arbitrary user access to the files in a fileset, a mechanism like mount(8) seems like it would provide a clear, simple way to retrieve a fileset. (Command format might be "mount -t lustre mgs://fsname/fileset mntpt")
For type 1 filesets, a hierarchical namespace defined by the files and directories in the fileset would be constructed locally. Directories would all be read/execute-only; a client cannot add new entries into the fileset by creating files in the fileset hierarchy. Regular files would keep their normal access permissions.
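The local namespace for a type 1 fileset could be assembled along these lines. The sketch assumes the fileset is given as absolute paths of regular files mapped to their permission bits, and represents the view as nested dictionaries; `build_view` and this layout are hypothetical and say nothing about how a real client mount would be implemented.

```python
import stat
from pathlib import PurePosixPath
from typing import Dict

DIR_MODE = stat.S_IFDIR | 0o555   # read/execute-only: no new entries can be created


def _new_dir() -> dict:
    return {"mode": DIR_MODE, "children": {}}


def build_view(files: Dict[str, int]) -> dict:
    """Build a directory tree containing only the listed files.

    `files` maps an absolute path to that file's original permission bits.
    """
    root = _new_dir()
    for path, perms in files.items():
        parts = PurePosixPath(path).parts[1:]          # drop the leading "/"
        node = root
        for name in parts[:-1]:
            # Intermediate directories are synthesized as read/execute-only.
            node = node["children"].setdefault(name, _new_dir())
        # Regular files keep their normal access permissions.
        node["children"][parts[-1]] = {"mode": stat.S_IFREG | perms}
    return root
```

Because every synthesized directory carries mode 0555, a client browsing the view can traverse and read it but cannot add entries to the fileset by creating files.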
For type 2 filesets, the mount point would act exactly like a subtree of the full Lustre filesystem.