Subsystem Map
libcfs | |
Summary | Libcfs provides an API comprising fundamental primitives and subsystems - e.g. process management and debugging support - that is used throughout LNET, Lustre, and associated utilities. This API defines a portable runtime environment that is implemented consistently on all supported build targets. |
Code |
lustre/lnet/libcfs/**/*.[ch] |
lnet | |
Summary | LNET = the Lustre Networking subsystem.
See the Lustre Networking white paper for details. |
Code |
lustre/lnet/**/*.[ch] |
ptlrpc | |
Summary | Ptlrpc implements Lustre communications over LNET.
All communication between Lustre processes is handled by RPCs, in which a request is sent to an advertised service, and the service processes the request and returns a reply. Note that a service may be offered by any Lustre process - e.g. the OST service on an OSS processes I/O requests and the AST service on a client processes notifications of lock conflicts. The initial request message of an RPC is special - it is received into the first available request buffer at the destination. All other communications involved in an RPC are like RDMAs - the peer targets them specifically. For example, in a bulk read, the OSC posts reply and bulk buffers and sends descriptors for them (the LNET matchbits used to post them) in the RPC request. After the server has received the request, it GETs or PUTs the bulk data and PUTs the RPC reply directly. Ptlrpc ensures all resources involved in an RPC are freed in finite time. If the RPC does not complete within a timeout, all buffers associated with the RPC must be unlinked. These buffers are still accessible to the network until their completion events have been delivered. |
Code |
lustre/ptlrpc/*.[ch] lustre/ldlm/ldlm_lib.c |
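The bulk-read flow described above can be illustrated with a small, self-contained sketch. Everything here (struct rpc_request, the matchbits counter, the function names) is hypothetical and only models the idea that the client posts its reply and bulk buffers first and advertises their matchbits in the request, after which the server targets those buffers directly; it is not the real ptlrpc API.

/* Simplified model of the bulk-read flow described above.
 * All names here are illustrative; the real code lives in lustre/ptlrpc/. */
#include <stdint.h>
#include <stdio.h>

struct rpc_request {
    uint64_t reply_matchbits;   /* where the client will accept the reply */
    uint64_t bulk_matchbits;    /* where the client will accept bulk data */
    uint64_t object_id;
    uint64_t offset, length;    /* extent the client wants to read */
};

/* Client side: post reply/bulk buffers first, then send the request.
 * The request itself is the only message received into a "first
 * available" buffer on the server; everything else is targeted. */
static struct rpc_request client_build_read(uint64_t obj, uint64_t off,
                                            uint64_t len, uint64_t *next_mb)
{
    struct rpc_request req;
    req.reply_matchbits = (*next_mb)++;  /* buffer posted before sending */
    req.bulk_matchbits  = (*next_mb)++;  /* buffer posted before sending */
    req.object_id = obj;
    req.offset = off;
    req.length = len;
    return req;
}

/* Server side: having received the request, it targets the client's
 * buffers directly - conceptually a PUT of the bulk data and a PUT of
 * the reply, both addressed by the matchbits carried in the request. */
static void server_handle_read(const struct rpc_request *req)
{
    printf("PUT %llu bytes of object %llu to bulk matchbits %llu\n",
           (unsigned long long)req->length,
           (unsigned long long)req->object_id,
           (unsigned long long)req->bulk_matchbits);
    printf("PUT reply to reply matchbits %llu\n",
           (unsigned long long)req->reply_matchbits);
}

int main(void)
{
    uint64_t next_matchbits = 1;
    struct rpc_request req = client_build_read(42, 0, 1048576, &next_matchbits);
    server_handle_read(&req);
    return 0;
}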
llog | |
Summary |
Overview LLog is the generic logging mechanism in Lustre. It allows Lustre to store records in an appropriate format and access them later using a reasonable API. LLog is used in various cases. The main LLog use cases are the following:
General design Each llog type has two main parts:
|
Code |
obdclass/llog.c obdclass/llog_cat.c obdclass/llog_lvfs.c obdclass/llog_obd.c obdclass/llog_swab.c obdclass/llog_test.c lov/lov_log.c ptlrpc/llog_client.c ptlrpc/llog_server.c ptlrpc/llog_net.c |
obdclass | |
Summary | The obdclass code implements generic Lustre configuration and device handling. Different functional parts of the Lustre code are split into obd devices which can be configured and connected in various ways to form a server or client filesystem.
Several examples of obd devices include:
The obdclass code provides services used by all Lustre devices for configuration, memory allocation, generic hashing, kernel interface routines, random number generation, etc. |
Code |
lustre/obdclass/class_hash.c - scalable hash code for imports lustre/obdclass/class_obd.c - base device handling code lustre/obdclass/debug.c - helper routines for dumping data structs lustre/obdclass/genops.c - device allocation/configuration/connection lustre/obdclass/linux-module.c - linux kernel module handling lustre/obdclass/linux-obdo.c - pack/unpack obdo and other IO structs lustre/obdclass/linux-sysctl.c - /proc/sys configuration parameters lustre/obdclass/lprocfs_status.c - /proc/fs/lustre configuration/stats, helpers lustre/obdclass/lustre_handles.c - wire opaque pointer handlers lustre/obdclass/lustre_peer.c - peer target identification by UUID lustre/obdclass/obd_config.c - configuration file parsing lustre/obdclass/obd_mount.c - server filesystem mounting lustre/obdclass/obdo.c - more obdo handling helpers lustre/obdclass/statfs_pack.c - statfs helpers for wire pack/unpack lustre/obdclass/uuid.c - UUID pack/unpack lustre/lvfs/lvfs_common.c - kernel interface helpers lustre/lvfs/lvfs_darwin.c - darwin kernel helper routines lustre/lvfs/lvfs_internal.h - lvfs internal function prototypes lustre/lvfs/lvfs_lib.c - statistics lustre/lvfs/lvfs_linux.c - linux kernel helper routines lustre/lvfs/lvfs_userfs.c - userspace helper routines lustre/lvfs/prng.c - long period pseudo-random number generator lustre/lvfs/upcall_cache.c - supplementary group upcall for MDS |
luclass | |
Summary | luclass is a body of data-type definitions and functions implementing support for a layered object, that is, an entity where every layer in the Lustre device stack (both data and metadata, and both client and server side) can maintain its own private state and modify the behavior of the compound object in a systematic way.
Specifically, data-types are introduced representing a device type (struct lu_device_type, a layer in the Lustre stack), a device (struct lu_device, a specific instance of the type), and an object (struct lu_object). The following lu_object functionality is implemented by generic code:
In addition to objects and devices, luclass includes lu_context, which provides a way to efficiently allocate space without consuming stack space. The luclass design is specified in the MD API DLD. |
Code |
include/lu_object.h obdclass/lu_object.c |
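The layered-object idea can be sketched with a much-simplified model: each layer of the device stack owns one slice of a compound object, and generic code walks the slices so every layer can act on the object in turn. The real types are struct lu_device_type, struct lu_device and struct lu_object in include/lu_object.h; the names below (layer_object, layer_ops, compound_print) are illustrative only.

/* Much-simplified model of a layered ("compound") object: every layer in
 * the device stack contributes its own per-object slice.  Everything here
 * is illustrative; the real types live in lu_object.h. */
#include <stdio.h>

struct layer_object;                      /* one layer's slice of the object */

struct layer_ops {
    void (*print)(const struct layer_object *slice);
};

struct layer_object {
    const char             *layer_name;   /* which device/layer owns the slice */
    const struct layer_ops *ops;
    struct layer_object    *next;         /* next layer's slice */
};

/* Generic code walks the list of slices and lets every layer act in turn -
 * this is the "systematic" per-layer behaviour mentioned above. */
static void compound_print(struct layer_object *top)
{
    for (struct layer_object *o = top; o != NULL; o = o->next)
        o->ops->print(o);
}

static void generic_print(const struct layer_object *o)
{
    printf("layer %s: private state lives here\n", o->layer_name);
}

static const struct layer_ops ops = { .print = generic_print };

int main(void)
{
    struct layer_object osc = { "osc", &ops, NULL };
    struct layer_object lov = { "lov", &ops, &osc };
    struct layer_object vvp = { "vvp", &ops, &lov };

    compound_print(&vvp);   /* each layer of the stack gets its turn */
    return 0;
}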
ldlm | |
Summary | The Lustre Distributed Lock Manager (LDLM) is the Lustre locking infrastructure; it handles locks between clients and servers as well as locks local to a node. Different kinds of locks are available with different properties. For historical reasons, ldlm also contains some of the generic connection and service code (both server and client). |
Code |
interval_tree.c - used by extent locks to maintain interval trees (bug 11300). l_lock.c - resource locking primitives. ldlm_extent.c - extent locking code, used for locking regions inside objects. ldlm_flock.c - BSD and POSIX file locking lock types. ldlm_inodebits.c - inodebits locks used for metadata locking. ldlm_lib.c - target and client connect/reconnect/recovery code. Does not really belong to ldlm, but is historically placed there; should be in ptlrpc instead. ldlm_lock.c - mostly functions dealing with struct ldlm_lock. ldlm_lockd.c - functions that handle incoming lock-related RPCs, both on the server (lock enqueue/cancel/...) and on the client (AST handling). ldlm_plain.c - plain locks, the predecessor to inodebits locks; not widely used now. ldlm_pool.c - pools of locks, related to dynamic LRUs and freeing locks on demand. ldlm_request.c - collection of functions for working with lock handles as opposed to the lock structures themselves. ldlm_resource.c - functions operating on namespaces and lock resources. include/lustre_dlm.h - important defines and declarations for ldlm. |
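As a minimal illustration of what extent locks decide, the sketch below checks whether two byte-range locks on the same resource conflict, assuming the usual semantics that overlapping ranges conflict unless both locks are read locks. The real implementation in ldlm_extent.c works with the full set of DLM lock modes and interval trees; the types and names here are hypothetical.

/* Minimal illustration of extent-lock conflict detection: two locks on the
 * same resource conflict when their byte ranges overlap and at least one
 * of them is a write lock. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum lock_mode { LOCK_READ, LOCK_WRITE };

struct extent_lock {
    uint64_t       start;
    uint64_t       end;     /* inclusive end of the locked extent */
    enum lock_mode mode;
};

static bool extents_overlap(const struct extent_lock *a,
                            const struct extent_lock *b)
{
    return a->start <= b->end && b->start <= a->end;
}

static bool locks_conflict(const struct extent_lock *a,
                           const struct extent_lock *b)
{
    if (!extents_overlap(a, b))
        return false;
    return a->mode == LOCK_WRITE || b->mode == LOCK_WRITE;
}

int main(void)
{
    struct extent_lock reader = { 0,      1048575, LOCK_READ  };
    struct extent_lock writer = { 524288, 2097151, LOCK_WRITE };

    printf("conflict: %s\n", locks_conflict(&reader, &writer) ? "yes" : "no");
    return 0;
}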
fids | |
Summary | A FID is the unique object identifier used throughout the cluster since Lustre 1.7. It has a few properties, the main ones being the following:
A FID consists of 3 fields:
|
Code |
fid/fid_request.c fid/fid_lib.c fld/*.[ch] |
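In essence a FID is a <sequence, object id, version> triple; the sketch below mirrors that layout and the conventional [seq:oid:ver] textual form. The struct name example_fid is illustrative - the real definition is struct lu_fid in the Lustre headers.

/* A FID is the <sequence, object id, version> triple; field widths here
 * follow the on-wire layout of struct lu_fid in the Lustre headers. */
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

struct example_fid {
    uint64_t f_seq;  /* sequence the FID belongs to; one sequence lives
                      * entirely on one MDS, so the FLD maps seq -> server */
    uint32_t f_oid;  /* object id within the sequence */
    uint32_t f_ver;  /* object version, reserved for future use */
};

int main(void)
{
    struct example_fid fid = { .f_seq = 0x200000400ULL, .f_oid = 17, .f_ver = 0 };

    /* The conventional textual form is [seq:oid:ver]. */
    printf("[0x%"PRIx64":0x%x:0x%x]\n", fid.f_seq, fid.f_oid, fid.f_ver);
    return 0;
}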
seq | |
Summary | Overview
Sequence management is a basic mechanism in the new MDS server related to managing FIDs. A FID is a unique object identifier in Lustre starting from version 1.7. All FIDs are organized into sequences; one sequence is a range of FIDs. Sequences are granted/allocated to clients by servers, and FIDs are allocated by clients inside the granted sequence. All FIDs inside one sequence live on the same MDS server and as such form one "migration unit" and one "indexing unit", meaning that the FLD (FID Location Database) indexes them all by that one sequence and thus has only one mapping entry for all FIDs in the sequence. Please see the section devoted to FIDs elsewhere in this table for more information on the FLD service and FIDs. A sequence has a limit on the number of FIDs that may be allocated in it; when this limit is reached, a new sequence is allocated. Upon disconnect, the server allocates a new sequence to the client when it comes back, and the previously used sequence is abandoned even if it was not exhausted. Sequences are a valuable resource, but in the case of recovery, using a new sequence makes things easier and also allows FIDs and objects to be grouped by working session: new connection, new sequence. Code description Server side code is divided into two parts:
Client side code allocates new sequences from the granted meta-sequence. When the meta-sequence is exhausted, a new one is allocated on the server and sent to the client. The client code consists of an API for working with both server-side parts, not only with the sequence manager: as all servers need to talk to the sequence controller, they also use the client API for this. One important part of the client API is FID allocation. A new FID is allocated in the currently granted sequence until the sequence is exhausted. |
Code |
fid/fid_handler.c - server side sequence management code; fid/fid_request.c - client side sequence management code; fid/fid_lib.c - fids related miscellaneous stuff. |
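A minimal sketch of the client-side allocation described above: FIDs are handed out from the currently granted sequence until it is exhausted, at which point a fresh sequence is obtained from the server and the old one is abandoned. The sequence width, the names, and the stand-in server_alloc_sequence() call are all illustrative assumptions, not the real fid/fid_request.c API.

/* Sketch of client-side FID allocation: FIDs come from the currently
 * granted sequence until its width is exhausted, then a new sequence is
 * requested from the server.  Names and the width are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define SEQ_WIDTH 128   /* illustrative: max FIDs allocated per sequence */

struct client_seq {
    uint64_t seq;       /* sequence currently granted by the server */
    uint32_t next_oid;  /* next object id to hand out in that sequence */
};

/* Stand-in for the RPC that asks the server for a fresh sequence. */
static uint64_t server_alloc_sequence(void)
{
    static uint64_t next_seq = 0x400;
    return next_seq++;
}

static void fid_alloc(struct client_seq *cs, uint64_t *seq, uint32_t *oid)
{
    if (cs->seq == 0 || cs->next_oid > SEQ_WIDTH) {
        cs->seq = server_alloc_sequence();   /* old sequence is abandoned */
        cs->next_oid = 1;
    }
    *seq = cs->seq;
    *oid = cs->next_oid++;
}

int main(void)
{
    struct client_seq cs = { 0, 0 };
    for (int i = 0; i < 3; i++) {
        uint64_t seq; uint32_t oid;
        fid_alloc(&cs, &seq, &oid);
        printf("FID [0x%llx:0x%x:0x0]\n", (unsigned long long)seq, oid);
    }
    return 0;
}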
mountconf | |
Summary | MountConf is how servers and clients are set up, started, and configured. A MountConf usage document is here.
The major subsystems are the MGS, MGC, and the userspace tools mount.lustre and mkfs.lustre. The basic idea is:
|
Code |
MountConf file areas: lustre/mgs/* lustre/mgc/* lustre/obdclass/obd_mount.c lustre/utils/mount_lustre.c lustre/utils/mkfs_lustre.c |
liblustre | |
Summary | Liblustre is a userspace library, used along with libsysio (developed by Sandia), that allows Lustre usage just by linking (or ld_preload'ing) applications with it. Liblustre does not require any kernel support. It is also used on old Cray XT3 machines (and not so old, in the case of Sandia), where all applications are just linked with the library and loaded into memory as the only code to run. Liblustre does not support async operations of any kind due to a lack of interrupts and other notifiers from lower levels to Lustre. Liblustre includes another set of LNDs that are able to work from userspace. |
Code |
dir.c - directory operations file.c - file handling operations (like open) llite_lib.c - general support (init/cleanup/parse options) lutil.c - supplementary code to get IP addresses and init various structures needed to emulate a normal Linux process from other layers' perspective. namei.c - metadata operations code. rw.c - I/O code, including read/write super.c - "superblock" operations - mounting/unmounting, inode operations tests/ - directory with liblustre-specific tests. |
echo client/server | |
Summary | The echo_client and obdecho are OBD devices which help testing and performance measurement.
They were implemented originally for network testing - obdecho can replace obdfilter and echo_client can exercise any downstream configurations. They are normally used in the following configurations:
|
Code |
lustre/obdecho/ |
client vfs | |
Summary | The client VFS interface, also called llite, is the bridge between the Linux kernel and the underlying Lustre infrastructure represented by the LOV, MDC, and LDLM subsystems. This includes mounting the client filesystem, handling name lookups, starting file I/O, and handling file permissions.
The Linux VFS interface shares a lot in common with the liblustre interface, which is used in the Catamount environment; as of yet, the code for these two subsystems is not common and contains a lot of duplication. |
Code |
lustre/llite/dcache.c - Interface with Linux dentry cache/intents lustre/llite/dir.c - readdir handling, filetype in dir, dir ioctl lustre/llite/file.c - File handles, file ioctl, DLM extent locks lustre/llite/llite_close.c - File close for opencache lustre/llite/llite_internal.h - Llite internal function prototypes, structures lustre/llite/llite_lib.c - Majority of request handling, client mount lustre/llite/llite_mmap.c - Memory-mapped I/O lustre/llite/llite_nfs.c - NFS export from clients lustre/llite/lloop.c - Loop-like block device export from object lustre/llite/lproc_llite.c - /proc interface for tunables, statistics lustre/llite/namei.c - Filename lookup, intent handling lustre/llite/rw24.c - Linux 2.4 IO handling routines lustre/llite/rw26.c - Linux 2.6 IO handling routines lustre/llite/rw.c - Linux generic IO handling routines lustre/llite/statahead.c - Directory statahead for "ls -l" and "rm -r" lustre/llite/super25.c - Linux 2.6 VFS file method registration lustre/llite/super.c - Linux 2.4 VFS file method registration lustre/llite/symlink.c - Symbolic links lustre/llite/xattr.c - User-extended attributes |
| colspan="2" valign="top |
libcfs
|-
| Summary
| Libcfs provides an API comprising fundamental primitives and subsystems - e.g. process management and debugging support which is used throughout LNET, Lustre, and associated utilities. This API defines a portable runtime environment that is implemented consistently on all supported build targets.
|-
| Code
|
lustre/lnet/libcfs/**/*.[ch]
|}
lnet | |
Summary | LNET = the Lustre Networking subsystem.
See the Lustre Networking white paper for details. |
Code |
lustre/lnet/**/*.[ch] |
ptlrpc | |
Summary | Ptlrpc implements Lustre communications over LNET.
All communication between Lustre processes are handled by RPCs, in which a request is sent to an advertised service, and the service processes the request and returns a reply. Note that a service may be offered by any Lustre process - e.g. the OST service on an OSS processes I/O requests and the AST service on a client processes notifications of lock conflicts. The initial request message of an RPC is special - it is received into the first available request buffer at the destination. All other communications involved in an RPC are like RDMAs - the peer targets them specifically. For example, in a bulk read, the OSC posts reply and bulk buffers and sends descriptors for them (the LNET matchbits used to post them) in the RPC request. After the server has received the request, it GETs or PUTs the bulk data and PUTs the RPC reply directly. Ptlrpc ensures all resources involved in an RPC are freed in finite time. If the RPC does not complete within a timeout, all buffers associated with the RPC must be unlinked. These buffers are still accessible to the network until their completion events have been delivered. |
Code |
lustre/ptlrpc/*.[ch] lustre/ldlm/ldlm_lib.c |
llog | |
Summary |
Overview LLog is the generic logging mechanism in Lustre. It allows Lustre to store records in an appropriate format and access them later using a reasonable API. LLog is used is various cases. The main LLog use cases are the following:
General design Each llog type has two main parts:
|
Code |
obdclass/llog.c obdclass/llog_cat.c obdclass/llog_lvfs.c obdclass/llog_obd.c obdclass/llog_swab.c obdclass/llog_test.c lov/lov_log.c ptlrpc/llog_client.c ptlrpc/llog_server.c ptlrpc/llog_net.c |
obdclass | |
Summary | The obdclass code is generic Lustre configuration and device handling. Different functional parts of the Lustre code are split into obd devices which can be configured and connected in various ways to form a server or client filesystem.
Several examples of obd devices include:
The obdclass code provides services used by all Lustre devices for configuration, memory allocation, generic hashing, kernel interface routines, random number generation, etc. |
Code |
lustre/obdclass/class_hash.c - scalable hash code for imports lustre/obdclass/class_obd.c - base device handling code lustre/obdclass/debug.c - helper routines for dumping data structs lustre/obdclass/genops.c - device allocation/configuration/connection lustre/obdclass/linux-module.c - linux kernel module handling lustre/obdclass/linux-obdo.c - pack/unpack obdo and other IO structs lustre/obdclass/linux-sysctl.c - /proc/sys configuration parameters lustre/obdclass/lprocfs_status.c - /proc/fs/lustre configuration/stats, helpers lustre/obdclass/lustre_handles.c - wire opaque pointer handlers lustre/obdclass/lustre_peer.c - peer target identification by UUID lustre/obdclass/obd_config.c - configuration file parsing lustre/obdclass/obd_mount.c - server filesystem mounting lustre/obdclass/obdo.c - more obdo handling helpers lustre/obdclass/statfs_pack.c - statfs helpers for wire pack/unpack lustre/obdclass/uuid.c - UUID pack/unpack lustre/lvfs/lvfs_common.c - kernel interface helpers lustre/lvfs/lvfs_darwin.c - darwin kernel helper routines lustre/lvfs/lvfs_internal.h - lvfs internal function prototypes lustre/lvfs/lvfs_lib.c - statistics lustre/lvfs/lvfs_linux.c - linux kernel helper routines lustre/lvfs/lvfs_userfs.c - userspace helper routines lustre/lvfs/prng.c - long period pseudo-random number generator lustre/lvfs/upcall_cache.c - supplementary group upcall for MDS |
luclass | |
Summary | luclass is a body of data-type definitions and functions implementing support for a layered object, that is an entity where every layer in the Lustre device stack (both data and meta-data, and both client and server side) can maintain its own private state, and modify a behavior of a compound object in a systematic way.
Specifically, data-types are introduced, representing a device type (struct lu_device_type, layer in the Lustre stack), a device (struct lu_device, a specific instance of the type), and object (struct lu_object). Following lu_object functionality is implemented by a generic code:
In addition to objects and devices, luclass includes lu_context, which is a way to efficiently allocate space, without consuming stack space. luclass design is specified in the MD API DLD. |
Code |
include/lu_object.h obdclass/lu_object.c |
ldlm | |
Summary | The Lustre Distributed Lock Manager (LDLM) is the Lustre locking infrastructure; it handles locks between clients and servers and locks local to a node. Different kinds of locks are available with different properties. Also as a historic heritage, ldlm happens to have some of the generic connection service code (both server and client). |
Code |
interval_tree.c - this is used by extent locks to maintain interval trees (bug 11300). l_lock.c - resourse locking primitives. ldlm_extent.c - extents locking code used for locking regions inside objects. ldlm_flock.c - bsd and posix locking lock types. ldlm_inodebits.c - inodebis locks used for metadata locking. ldlm_lib.c - target and client connecting/reconnecting/recovery code. Does not really belong to ldlm, but is historically placed there. Should be in ptlrpc instead. ldlm_lock.c - this source file mostly has functions dealing with struct. ldlm_lock ldlm_lockd.c - functions that imply replying to incoming lock-related rpcs (that could be both on server (lock enq/cancel/...) and client (ast handling)). ldlm_plain.c - plain locks, predecessor to inodebits locks; not widely used now. ldlm_pool.c - pools of locks, related to dynamic lrus and freeing locks on demand. ldlm_request.c - collection of functions to work with locks based handles as opposed to lock structures themselves. ldlm_resource.c - functions operating on namespaces and lock resources. include/lustre_dlm.h - important defines and declarations for ldlm. |
fids | |
Summary | FID is unique object identifier in cluster since 1.7. It has few properties, main of them are the following:
FID consists of 3 fields:
|
Code |
fid/fid_request.c fid/fid_lib.c fld/*.[ch] |
seq | |
Summary | Overview
Sequence management is a basic mechanism in new MDS server which is related to managing FIDs. FID is an unique object identifier in Lustre starting from version 1.7. All FIDs are organized into sequences. One sequence is number of FIDs. Sequences are granted/allocated to clients by servers. FIDs are allocated by clients inside granted sequence. All FIDs inside one sequence live on same MDS server and as such are one "migration unit" and one "indexing unit", meaning that FLD (FIDs Location Database) indexes them all using one sequence and thus has only one mapping entry for all FIDs in sequence. Please read section devoted to FIDs bellow in the root table to find more info on FLD service and FIDs. A sequence has the limit of FIDs to be allocated in it. When this limit is reached, new sequence is allocated. Upon disconnect, server allocates new sequence to the client when it comes back. Previously used sequence is abandoned even if it was not exhausted. Sequences are valuable resource but in the case of recovery, using new sequence makes things easier and also allows to group FIDs and objects by working sessions, new connection - new sequence. Code description Server side code is divided into two parts:
Client side code allocates new sequences from granted meta-sequence. When meta-sequence is exhausted, new one is allocated on server and sent to the client. Client code consists of API for working with both server side parts, not only with sequence manager as all servers need to talk to sequence controller, they also use client API for this. One important part of client API is FIDs allocation. New FID is allocated in currently granted sequence until sequence is exhausted. |
Code |
fid/fid_handler.c - server side sequence management code; fid/fid_request.c - client side sequence management code; fid/fid_lib.c - fids related miscellaneous stuff. |
mountconf | |
Summary | MountConf is how servers and clients are set up, started, and configured. A MountConf usage document is here.
The major subsystems are the MGS, MGC, and the userspace tools mount.lustre and mkfs.lustre. The basic idea is:
|
Code |
MountConf file areas: lustre/mgs/* lustre/mgc/* lustre/obdclass/obd_mount.c lustre/utils/mount_lustre.c lustre/utils/mkfs_lustre.c |
liblustre | |
Summary | Liblustre is a userspace library, used along with libsysio (developed by Sandia), that allows Lustre usage just by linking (or ld_preload'ing) applications with it. Liblustre does not require any kernel support. It is also used on old Cray XT3 machines (and not so old, in the case of Sandia), where all applications are just linked with the library and loaded into memory as the only code to run. Liblustre does not support async operations of any kind due to a lack of interrupts and other notifiers from lower levels to Lustre. Liblustre includes another set of LNDs that are able to work from userspace. |
Code |
dir.c - directory operations file.c - file handling operations (like open) llite_lib.c - general support (init/cleanp/parse options) lutil.c - supplementary code to get IP addresses and init various structures needed to emulate the normal Linux process from other layers' perspective. namei.c - metadata operations code. rw.c - I/O code, including read/write super.c - "superblock" operation - mounting/umounting, inode operations.tests - directory with liblustre-specific tests. |
echo client/server | |
Summary | The echo_client and obdecho are OBD devices which help testing and performance measurement.
They were implemented originally for network testing - obdecho can replace obdfilter and echo_client can excercise any downstream configurations. They are normally used in the following configurations...
|
Code |
lustre/obdecho/ |
client vfs | |
Summary | The client VFS interface, also called llite is the bridge between the Linux kernel and the underlying Lustre infrastructure represented by the LOV, MDC, and LDLM subsystems. This includes mounting the client filesystem, handling name lookups, starting file IO, and handling file permissions.
The Linux VFS interface shares a lot in common with the liblustre interface, which is used in the Catamount environment, but as yet the code for these two subsystems is not common and contains a lot of duplication. |
Code |
lustre/llite/dcache.c - Interface with Linux dentry cache/intents lustre/llite/dir.c - Readdir handling, filetype in dir, dir ioctl lustre/llite/file.c - File handles, file ioctl, DLM extent locks lustre/llite/llite_close.c - File close for opencache lustre/llite/llite_internal.h - Llite internal function prototypes, structures lustre/llite/llite_lib.c - Majority of request handling, client mount lustre/llite/llite_mmap.c - Memory-mapped IO lustre/llite/llite_nfs.c - NFS export from clients lustre/llite/lloop.c - Loop-like block device export from object lustre/llite/lproc_llite.c - /proc interface for tunables, statistics lustre/llite/namei.c - Filename lookup, intent handling lustre/llite/rw24.c - Linux 2.4 IO handling routines lustre/llite/rw26.c - Linux 2.6 IO handling routines lustre/llite/rw.c - Linux generic IO handling routines lustre/llite/statahead.c - Directory statahead for "ls -l" and "rm -r" lustre/llite/super25.c - Linux 2.6 VFS file method registration lustre/llite/super.c - Linux 2.4 VFS file method registration lustre/llite/symlink.c - Symbolic links lustre/llite/xattr.c - User extended attributes |
client vm | |
Summary | Client code interacts with VM/MM subsystems of the host OS kernel to cache data (in the form of pages), and to react to various memory-related events, like memory pressure.
Two key components of this interaction are:
|
Code |
This describes the next generation Lustre client I/O code, which is expected to appear in Lustre 2.0. The code location is not finalized. The cfs_page_t interface is defined and implemented in:
The generic part of cl_page will be located in:
The Linux kernel implementation is currently in:
|
client I/O | |
Summary | Client I/O is a group of interfaces used by various layers of a Lustre client to manage file data (as opposed to metadata). The main functions of these interfaces are:
The client I/O subsystem interacts with VFS, VM/MM, DLM, and PTLRPC. The client I/O interfaces are based on the following data-types:
|
Code |
This describes the next generation Lustre client I/O code. The code location is not finalized. The generic part is at:
Layer-specific methods are currently at
where LAYER is one of llite, lov, osc. |
client metadata | |
Summary | The Meta Data Client (MDC) is the client-side interface for all operations related to the Meta Data Server (MDS). In current configurations there is a single MDC on the client for each filesystem mounted on the client. The MDC is responsible for enqueueing metadata locks (via LDLM), and for packing and unpacking messages on the wire.
In order to ensure a recoverable system, the MDC is limited at the client to only a single filesystem-modifying operation in flight at one time. This includes operations like create, rename, link, unlink, and setattr. For non-modifying operations like getattr and statfs the client can have multiple RPC requests in flight at one time, limited by a tunable on the client, to avoid overwhelming the MDS. |
Code |
lustre/mdc/lproc_mdc.c - /proc interface for stats/tuning lustre/mdc/mdc_internal.h - Internal header for prototypes/structs lustre/mdc/mdc_lib.c - Packing of requests to MDS lustre/mdc/mdc_locks.c - Interface to LDLM and client VFS intents lustre/mdc/mdc_reint.c - Modifying requests to MDS lustre/mdc/mdc_request.c - Non-modifying requests to MDS |
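The flight-control rule above can be modelled with two counting semaphores: a single slot for filesystem-modifying requests and a larger, tunable number of slots for non-modifying ones. This is only a conceptual sketch using POSIX semaphores; MAX_RPCS_IN_FLIGHT and the function names are assumptions, and the real client uses its own in-flight accounting.

/* Conceptual model of MDC flight control: modifying metadata requests are
 * serialized (one in flight), non-modifying requests share a larger,
 * tunable limit.  POSIX semaphores are used purely for illustration. */
#include <semaphore.h>
#include <stdio.h>

#define MAX_RPCS_IN_FLIGHT 8   /* illustrative stand-in for the client tunable */

static sem_t modify_slot;      /* create/rename/link/unlink/setattr */
static sem_t readonly_slots;   /* getattr/statfs/... */

static void send_modifying_rpc(const char *what)
{
    sem_wait(&modify_slot);        /* at most one such RPC outstanding */
    printf("sending modifying RPC: %s\n", what);
    /* ... wait for reply ... */
    sem_post(&modify_slot);
}

static void send_readonly_rpc(const char *what)
{
    sem_wait(&readonly_slots);     /* bounded by the in-flight tunable */
    printf("sending non-modifying RPC: %s\n", what);
    /* ... wait for reply ... */
    sem_post(&readonly_slots);
}

int main(void)
{
    sem_init(&modify_slot, 0, 1);
    sem_init(&readonly_slots, 0, MAX_RPCS_IN_FLIGHT);

    send_modifying_rpc("mkdir");
    send_readonly_rpc("getattr");
    return 0;
}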
client lmv | |
Summary | LMV is a module which implements the CMD (clustered metadata) client-side abstraction device. It allows the client to work with many MDSes without any changes in the llite module, and even without llite knowing that CMD is supported. Llite just translates Linux VFS requests into metadata API calls and forwards them down the stack.
As LMV needs to know which MDS to talk to for any particular operation, it uses some new services introduced in CMD3:
LMV supports split objects. This means that for every split directory it creates a special in-memory structure which contains information about the object stripes, including the MDS number, FID, etc. All subsequent operations use these structures to determine which MDS should be used for a particular action (create, take a lock, etc.). |
Code |
lmv/*.[ch] |
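A sketch of the stripe-selection step described above: hash the file name and use the result to pick one of the directory's stripes, each of which records the MDS index and the FID of its stripe object. The hash function and the stripe_info structure are illustrative; the real hashing and layout live in lmv/.

/* Sketch of stripe selection in a split directory: the file name is hashed
 * and the hash picks one of the directory's stripes.  Illustrative only. */
#include <stdint.h>
#include <stdio.h>

struct stripe_info {
    int      mds_index;      /* which MDS holds this stripe */
    uint64_t stripe_fid_seq; /* FID (sequence part) of the stripe object */
};

/* Simple djb2-style string hash, purely for illustration. */
static uint32_t name_hash(const char *name)
{
    uint32_t h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h;
}

static const struct stripe_info *
pick_stripe(const struct stripe_info *stripes, int count, const char *name)
{
    return &stripes[name_hash(name) % count];
}

int main(void)
{
    struct stripe_info dir_stripes[] = {
        { 0, 0x400 }, { 1, 0x401 }, { 2, 0x402 },
    };
    const struct stripe_info *s = pick_stripe(dir_stripes, 3, "datafile.0001");

    printf("operate on MDS %d (stripe seq 0x%llx)\n",
           s->mds_index, (unsigned long long)s->stripe_fid_seq);
    return 0;
}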
lov | |
Summary | The LOV device presents a single virtual device interface to upper layers (llite, liblustre, MDS). The LOV code is responsible for splitting requests across the correct OSTs based on the striping information (lsm), and for merging the replies into a single result to pass back to the higher layer.
It calculates per-object membership and offsets for read/write/truncate based on the virtual file offset passed from the upper layer. It is also responsible for splitting the locking across all servers as needed. The LOV on the MDS is also involved in object allocation. |
Code |
lustre/lov/lov_ea.c - Striping attributes pack/unpack/verify lustre/lov/lov_internal.h - Header for internal function prototypes/structs lustre/lov/lov_merge.c - Struct aggregation from many objects lustre/lov/lov_obd.c - Base LOV device configuration lustre/lov/lov_offset.c - File offset and object calculations lustre/lov/lov_pack.c - Pack/unpack of striping attributes lustre/lov/lov_qos.c - Object allocation for different OST loading lustre/lov/lov_request.c - Request handling/splitting/merging lustre/lov/lproc_lov.c - /proc/fs/lustre/lov tunables/statistics |
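The per-object membership and offset calculation is, in essence, RAID-0 style striping arithmetic. The sketch below maps a virtual file offset to the stripe (object) index and the offset within that object, assuming a fixed stripe size and round-robin placement; the structure and field names are illustrative, not the real lsm layout.

/* Round-robin (RAID-0 style) striping arithmetic: a virtual file offset is
 * mapped to the OST object holding that byte and the offset within it. */
#include <stdint.h>
#include <stdio.h>

struct stripe_map {
    uint64_t stripe_size;    /* e.g. 1 MiB */
    uint32_t stripe_count;   /* number of OST objects backing the file */
};

static void file_offset_to_object(const struct stripe_map *lsm, uint64_t off,
                                  uint32_t *stripe_index, uint64_t *obj_off)
{
    uint64_t chunk = off / lsm->stripe_size;        /* which stripe chunk */
    *stripe_index  = chunk % lsm->stripe_count;     /* which object */
    *obj_off       = (chunk / lsm->stripe_count) * lsm->stripe_size
                     + off % lsm->stripe_size;      /* offset in that object */
}

int main(void)
{
    struct stripe_map lsm = { .stripe_size = 1048576, .stripe_count = 4 };
    uint32_t idx; uint64_t ooff;

    file_offset_to_object(&lsm, 5 * 1048576 + 4096, &idx, &ooff);
    printf("file offset 5MiB+4KiB -> stripe %u, object offset %llu\n",
           idx, (unsigned long long)ooff);
    return 0;
}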
quota | |
Summary | Quotas allow a system administrator to limit the maximum amount of disk space a user or group can consume. Quotas are set by root, and can be specified for individual users and/or groups. Quota limits can be set on both blocks and inodes.
Lustre quota enforcement differs from standard Linux quota support in several ways:
|
Code |
Quota core:
Interactions with the underlying ldiskfs filesystem:
Hooks under:
Regression tests:
|
security-gss | |
Summary | Secure ptlrpc (sptlrpc) is a framework inside the ptlrpc layer. It acts on both sides of each ptlrpc connection between two nodes, transforming every RPC message and thereby turning the connection into a secure communication link. By using GSS, sptlrpc is able to support multiple authentication mechanisms, but currently only Kerberos 5 is supported.
Supported security flavors:
|
Code |
lustre/ptlrpc/sec*.c lustre/ptlrpc/gss/ lustre/utils/gss/ |
security-capa | |
Summary | Capabilities are pieces of data generated by one service (the master service), passed to a client, and presented by the client to another service (the slave service) to authorize an action. This mechanism is independent of the R/W/X permission-based file operation authorization. |
Code |
lustre/llite/llite_capa.c lustre/mdt/mdt_capa.c lustre/obdfilter/filter_capa.c lustre/obdclass/capa.c lustre/include/lustre_capa.h |
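A toy model of the capability mechanism described above: the master service binds a FID and an operation mask to an expiry time and seals them with a key it shares with the slave service, which later verifies what the client presents. The structure, the shared-key handling and especially the toy_mac() mixing function are illustrative stand-ins for the real keyed-hash code.

/* Toy model of a capability: issued by the master service (e.g. the MDS),
 * presented by the client, verified by the slave service (e.g. the OST). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

struct capability {
    uint64_t fid_seq;     /* object the capability covers */
    uint32_t fid_oid;
    uint32_t op_mask;     /* e.g. read/write bits */
    uint64_t expiry;      /* absolute expiry time, seconds */
    uint64_t mac;         /* seal computed by the master service */
};

/* Trivial mixing function standing in for a real keyed hash. */
static uint64_t toy_mac(const struct capability *c, uint64_t key)
{
    uint64_t m = key;
    m = m * 1000003 + c->fid_seq;
    m = m * 1000003 + c->fid_oid;
    m = m * 1000003 + c->op_mask;
    m = m * 1000003 + c->expiry;
    return m;
}

/* Master side: issue a capability to the client. */
static struct capability issue(uint64_t key, uint64_t seq, uint32_t oid,
                               uint32_t ops, uint64_t lifetime)
{
    struct capability c = { seq, oid, ops, (uint64_t)time(NULL) + lifetime, 0 };
    c.mac = toy_mac(&c, key);
    return c;
}

/* Slave side: verify what the client presents. */
static bool verify(const struct capability *c, uint64_t key, uint32_t op)
{
    return c->mac == toy_mac(c, key) &&
           (c->op_mask & op) == op &&
           (uint64_t)time(NULL) <= c->expiry;
}

int main(void)
{
    uint64_t shared_key = 0xdeadbeef;
    struct capability c = issue(shared_key, 0x400, 7, 0x3 /* rw */, 600);

    printf("write allowed: %s\n", verify(&c, shared_key, 0x2) ? "yes" : "no");
    return 0;
}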
security-identity | |
Summary | Lustre identity is a framework for Lustre file operation authorization. Generally, it can be divided into two parts:
|
Code |
lustre/llite/llite_rmtacl.c lustre/mdt/mdt_identity.c lustre/mdt/mdt_idmap.c lustre/mdt/mdt_lib.c lustre/obdclass/idmap.c lustre/utils/l_getidentity.c lustre/include/lustre_idmap.h lustre/llite/xattr.c lustre/mdt/mdt_xattr.c lustre/cmm/cmm_object.c lustre/cmm/mdc_object.c lustre/mdd/mdd_permission.c lustre/mdd/mdd_object.c lustre/mdd/mdd_dir.c lustre/obdclass/acl.c lustre/include/lustre_eacl.h |
OST | |
Summary | OST is a very thin layer of the data server. Its main responsibility is to translate RPCs into local calls to obdfilter, i.e. RPC parsing. |
Code |
lustre/ost/*.[ch] |
ldiskfs | |
Summary | ldiskfs is a local disk filesystem built on top of ext3. It adds extents support, a multiblock allocator, multi-mount protection, and the iopen feature to ext3. |
Code |
There is no ldiskfs code in CVS. Instead, the ext3 code is copied from the kernel, the patches are applied, and then the whole thing is renamed to ldiskfs. For details, see ldiskfs/. |
fsfilt | |
Summary | The fsfilt layer abstracts the backing filesystem specifics away from the obdfilter and MDS code in Lustre 1.4 and 1.6. This avoids linking the obdfilter and MDS directly against the filesystem module and in theory allows different backing filesystems, but in practice this was never implemented. In Lustre 1.8 and later this code is replaced by the OSD layer.
There is a core fsfilt module which can auto-load the backing filesystem type based on the type specified during configuration. This loads a filesystem-specific fsfilt_{fstype} module with a set of methods for that filesystem. There are a number of different kinds of methods:
|
Code |
The files used for the fsfilt code reside in: lustre/lvfs/fsfilt.c - interface used by obdfilter/MDS, module autoloading lustre/lvfs/fsfilt_ext3.c - interface to ext3/ldiskfs filesystem The fsfilt_ldiskfs.c file is auto-generated from fsfilt_ext3.c in lustre/lvfs/autoMakefile.am using sed to replace instances of ext3 and EXT3 with ldiskfs, and a few other replacements to avoid symbol clashes. |
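The auto-loading described above follows a familiar ops-table pattern: each backing filesystem registers a named table of methods, and the core code looks the table up by type name at configuration time (in the kernel, failing to find it is the point where the fsfilt_{fstype} module would be loaded). The sketch below shows the pattern with illustrative names and a trivial "ldiskfs" implementation; it is not the real fsfilt_operations definition.

/* Ops-table pattern: a backing filesystem registers a named method table,
 * the core looks it up by type name at configuration time.  Illustrative. */
#include <stdio.h>
#include <string.h>

struct fsfilt_ops {
    const char *fs_type;
    void *(*start_transaction)(void *inode, int op);
    int   (*commit_transaction)(void *handle);
    int   (*setattr)(void *inode, void *attrs);
};

static const struct fsfilt_ops *registered[8];
static int nr_registered;

static void fsfilt_register(const struct fsfilt_ops *ops)
{
    registered[nr_registered++] = ops;
}

static const struct fsfilt_ops *fsfilt_get_ops(const char *fs_type)
{
    for (int i = 0; i < nr_registered; i++)
        if (strcmp(registered[i]->fs_type, fs_type) == 0)
            return registered[i];
    return NULL;    /* kernel code would try to load the module here */
}

/* A trivial "ldiskfs" implementation for the sake of the example. */
static void *ldiskfs_start(void *inode, int op)  { (void)inode; (void)op; return "handle"; }
static int   ldiskfs_commit(void *handle)        { (void)handle; return 0; }
static int   ldiskfs_setattr(void *i, void *a)   { (void)i; (void)a; return 0; }

static const struct fsfilt_ops ldiskfs_ops = {
    "ldiskfs", ldiskfs_start, ldiskfs_commit, ldiskfs_setattr,
};

int main(void)
{
    fsfilt_register(&ldiskfs_ops);

    const struct fsfilt_ops *ops = fsfilt_get_ops("ldiskfs");
    printf("found methods for %s\n", ops ? ops->fs_type : "(none)");
    return 0;
}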
ldiskfs OSD | |
Summary | ldiskfs-OSD is an implementation of the dt_{device,object} interfaces on top of the (modified) ldiskfs filesystem.
It uses standard ldiskfs/ext3 code to do file I/O. It supports 2 types of indices (in the same filesystem):
ldiskfs-OSD uses a read-write mutex to serialize compound operations. |
Code |
lustre/include/dt_object.h lustre/osd/osd_internal.h lustre/osd/osd_handler.c |
DMU OSD | |
Summary | This is another implementation of the OSD API for userspace DMU. It uses DMU's ZAP for indices. |
Code |
dmu-osd/*.[ch] in b_hd_dmu branch |
DMU | |
Summary | The DMU is one of the layers in Sun's ZFS filesystem which is responsible for presenting a transactional object store to its consumers. It is used as Lustre's backend object storage mechanism for the userspace MDSs and OSSs.
The ZFS community page has a source tour (ZFS source) which is useful as an introduction to the several ZFS layers. There are many useful resources on that community page. For reference, here's a list of DMU features:
|
Code |
src/ -> source code src/cmd/ -> ZFS/DMU related programs src/cmd/lzfs/ -> lzfs, the filesystem administration utility src/cmd/lzpool/ -> lzpool, the pool administration utility src/cmd/lzdb/ -> lzdb, the zfs debugger src/cmd/lztest/ -> lztest, the DMU test suite src/cmd/lzfsd/ -> lzfsd, the ZFS daemon src/lib/ -> Libraries src/lib/port/ -> Portability layer src/lib/solcompat/ -> Solaris -> Linux portability layer (deprecated, use libport instead) src/lib/avl/ -> AVL trees, used in many places in the DMU code src/lib/nvpair/ -> Name-value pairs, used in many places in the DMU code src/lib/umem/ -> Memory management library src/lib/zpool/ -> Main ZFS/DMU code src/lib/zfs/ -> ZFS library used by the lzfs and lzpool utilities src/lib/zfscommon/ -> Common ZFS code between libzpool and libzfs src/lib/ctl/ -> Userspace control/management interface src/lib/udmu/ -> Lustre uDMU code (thin library around the DMU) src/scons/ -> local copy of SCons tests/regression/ -> Regression tests. misc/ -> miscellaneous files/scripts |
obdfilter | |
Summary | obdfilter is a core component of the OST (data server), making the underlying disk filesystem part of the distributed system:
|
Code |
lustre/obdfilter/*.[ch] |
MDS | |
Summary | The MDS service in Lustre 1.4 and 1.6 is a monolithic body of code that provides multiple functions related to filesystem metadata. It handles the incoming RPCs and service threads for metadata operations (create, rename, unlink, readdir, etc), interfaces with the Lustre lock manager (DLM), and also manages the underlying filesystem (via the fsfilt interface).
The MDS is the primary point of access control for clients; it allocates the objects belonging to a file (in conjunction with the LOV) and passes that information to the clients when they access the file. The MDS is also ultimately responsible for deleting objects on the OSTs, either by passing the object information to the client that removes the last link or open reference on a file and having the client destroy the objects, or by destroying the objects on the OSTs itself in case the client fails to do so. In the 1.8 and later releases, the functionality provided by the MDS code has been split into multiple parts (MDT, MDD, OSD) in order to allow stacking of the metadata devices for clustered metadata. |
Code |
lustre/mds/commit_confd.c lustre/mds/handler.c - RPC request handler lustre/mds/lproc_mds.c - /proc interface for stats/control lustre/mds/mds_fs.c - Mount/configuration of underlying filesystem lustre/mds/mds_internal.h - Header for internal declarations lustre/mds/mds_join.c - Handle join_file operations lustre/mds/mds_lib.c - Unpack of wire structs from requests lustre/mds/mds_log.c - Lustre log interface (llog) for unlink/setattr lustre/mds/mds_lov.c - Interface to LOV for create and orphan lustre/mds/mds_open.c - File open/close handling lustre/mds/mds_reint.c - Reintegration of changes made by clients lustre/mds/mds_unlink_open.c - Handling of open-unlinked files (PENDING dir) lustre/mds/mds_xattr.c - User-extended attribute handling |
MDT | |
Summary | MDT stands for MetaData Target. This is the top-most layer in the MD server device stack. The MDT is responsible for everything networking-related as far as metadata are concerned:
Theoretically, MDT is an optional layer: a completely local Lustre setup, with a single metadata server and a locally mounted client, could exist without MDT (and still use networking for non-metadata access). |
Code |
lustre/mdt/mdt.mod.c lustre/mdt/mdt_capa.c lustre/mdt/mdt_handler.c lustre/mdt/mdt_identity.c lustre/mdt/mdt_idmap.c lustre/mdt/mdt_internal.h lustre/mdt/mdt_lib.c lustre/mdt/mdt_lproc.c lustre/mdt/mdt_open.c lustre/mdt/mdt_recovery.c lustre/mdt/mdt_reint.c lustre/mdt/mdt_xattr.c |
CMM | |
Summary | Overview
The CMM is a new layer in the MDS which takes care of all clustered metadata issues and relationships. The CMM does the following:
CMM functionality: The CMM chooses all servers involved in an operation and sends dependent requests if needed. Calling a remote MDS is a new feature related to CMD. The CMM maintains the list of MDCs used to connect with all other MDSes. Objects: The CMM can allocate two types of objects - local and remote. A remote object can occur during metadata operations with more than one object involved; such an operation is called a cross-ref operation. |
Code |
lustre/cmm |
MDD | |
Summary | MDD is the metadata layer in the new MDS stack; it is the only layer that operates on metadata in the MDS. The implementation is similar to VFS metadata operations but is based on OSD storage. The MDD API is currently only used in the new MDS stack, called by the CMM layer.
In theory, MDD should be a purely local metadata layer, but for compatibility with the old MDS stack and to reuse some MDS code (llog and lov), an mds device is created and connected to the mdd. The llog and lov code in mdd therefore still uses the original code through this temporary mds device; it will be removed when the new llog and lov layers in the new MDS stack are implemented. |
Code |
lustre/lustre/mdd/ |
recovery | |
Summary |
Overview: Client recovery starts when no server reply is received within a given timeout, or when the server tells the client that it is not connected (the client was evicted on the server earlier for whatever reason). Recovery consists of trying to connect to the server and then stepping through several recovery states during which various client-server state is synchronized, namely all requests that were already sent to the server but not yet confirmed as received, and DLM locks. Should any problem arise during the recovery process (be it a timeout or the server's refusal to recognise the client again), recovery is restarted from the very beginning. During recovery, new requests are not sent to the server; instead they are added to a special delayed-requests queue that is sent once recovery completes successfully. Replay and Resend
|
Code |
Recovery code is scattered through almost all of the code. The most important pieces are: ldlm/ldlm_lib.c - generic server recovery code ptlrpc/ - client recovery code |
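The request synchronization mentioned above relies on replay bookkeeping: the client keeps every request it has sent until the server's replies indicate (via the last-committed transaction number) that its effects are on stable storage, and after a reconnect everything still uncovered is replayed in order. The sketch below models just that bookkeeping with illustrative names; the real code is spread across ptlrpc/.

/* Sketch of replay bookkeeping: sent requests are kept until the server
 * reports them committed; uncommitted ones are replayed after reconnect. */
#include <stdint.h>
#include <stdio.h>

#define MAX_REQS 16

struct sent_request {
    uint64_t transno;      /* transaction number assigned by the server */
    const char *name;
};

struct replay_list {
    struct sent_request reqs[MAX_REQS];
    int nr;
};

/* Drop everything the server has durably committed. */
static void prune_committed(struct replay_list *rl, uint64_t last_committed)
{
    int kept = 0;
    for (int i = 0; i < rl->nr; i++)
        if (rl->reqs[i].transno > last_committed)
            rl->reqs[kept++] = rl->reqs[i];
    rl->nr = kept;
}

/* After reconnecting, everything still on the list is replayed in order. */
static void replay_all(const struct replay_list *rl)
{
    for (int i = 0; i < rl->nr; i++)
        printf("replaying transno %llu (%s)\n",
               (unsigned long long)rl->reqs[i].transno, rl->reqs[i].name);
}

int main(void)
{
    struct replay_list rl = {
        .reqs = { { 101, "mkdir" }, { 102, "create" }, { 103, "setattr" } },
        .nr = 3,
    };

    prune_committed(&rl, 101);  /* a reply said last_committed = 101 */
    replay_all(&rl);            /* 102 and 103 get replayed after reconnect */
    return 0;
}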
version recovery | |
Summary |
Version Based Recovery: This recovery technique is based on using versions of objects (inodes) to allow clients to recover later than the ordinary server recovery timeframe.
|
Code |
Recovery code is scattered through almost all of the code. The most important pieces are: ldlm/ldlm_lib.c - generic server recovery code ptlrpc/ - client recovery code |
IAM | |
Summary | IAM stands for 'Index Access Module': it is an extension to the ldiskfs directory code, adding generic indexing capability.
A file system directory can be thought of as an index mapping keys, which are strings (file names), to records, which are integers (inode numbers). IAM removes limitations on key and record size and format, providing the abstraction of a transactional container that maps arbitrary opaque keys to opaque records. Implementation notes:
IAM is used by ldiskfs-OSD to implement dt_index_operations interface. |
Code |
lustre/ldiskfs/kernel_patches/patches/ext3-iam-2.6-sles10.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-ops.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-2.6.18-rhel5.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-rhel4.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-2.6.18-vanilla.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-separate.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-2.6.9-rhel4.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-sles10.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-common.patch lustre/ldiskfs/kernel_patches/patches/ext3-iam-uapi.patch |
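The abstraction IAM provides - a container mapping opaque, fixed-size keys to opaque, fixed-size records - can be sketched with a flat in-memory table offering insert and lookup. In the real code the container is a tree-like structure stored in an ldiskfs file and every update runs inside a journal transaction; the names and the linear search below are purely illustrative.

/* Sketch of a transactional-container-like interface: opaque fixed-size
 * keys map to opaque fixed-size records.  Flat table for illustration. */
#include <stdio.h>
#include <string.h>

#define KEY_SIZE    16
#define REC_SIZE    32
#define MAX_ENTRIES 64

struct iam_like_container {
    int  count;
    unsigned char keys[MAX_ENTRIES][KEY_SIZE];
    unsigned char recs[MAX_ENTRIES][REC_SIZE];
};

static int index_insert(struct iam_like_container *c,
                        const void *key, const void *rec)
{
    if (c->count == MAX_ENTRIES)
        return -1;
    memcpy(c->keys[c->count], key, KEY_SIZE);
    memcpy(c->recs[c->count], rec, REC_SIZE);
    c->count++;
    return 0;
}

static const void *index_lookup(const struct iam_like_container *c,
                                const void *key)
{
    for (int i = 0; i < c->count; i++)
        if (memcmp(c->keys[i], key, KEY_SIZE) == 0)
            return c->recs[i];
    return NULL;
}

int main(void)
{
    static struct iam_like_container c;
    unsigned char key[KEY_SIZE] = "fid-as-key";
    unsigned char rec[REC_SIZE] = "inode-or-record-payload";

    index_insert(&c, key, rec);
    printf("lookup: %s\n", (const char *)index_lookup(&c, key));
    return 0;
}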
SOM | |
Summary | Size-on-MDS (SOM) is a metadata improvement which caches the inode size, blocks, ctime and mtime on the MDS. Such attribute caching allows clients to avoid making RPCs to the OSTs to obtain the attributes encoded in the file objects kept on those OSTs, which results in significantly improved performance when listing directories. |
Code |
llite/llite_close.c -- client side SOM code liblustre/file.c -- liblustre SOM code mdt/mdt_handler.c -- general handling of SOM-related RPCs mdt/mdt_open.c -- MDS side SOM code mdt/mdt_recovery.c -- MDS side SOM recovery code obdfilter/filter_log.c -- OST side IO epoch logging code |