WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Subsystem Map
libcfs | |
Summary | Libcfs provides an API comprising fundamental primitives and subsystems (e.g. process management and debugging support) that is used throughout LNET, Lustre, and associated utilities. This API defines a portable runtime environment that is implemented consistently on all supported build targets. |
Code |
lustre/lnet/libcfs/**/*.[ch] |
lnet | |
Summary | LNET = the Lustre Networking subsystem.
See the Lustre Networking white paper for details. |
Code |
lustre/lnet/**/*.[ch] |
ptlrpc | |
Summary | Ptlrpc implements Lustre communications over LNET.
All communication between Lustre processes is handled by RPCs, in which a request is sent to an advertised service, and the service processes the request and returns a reply. Note that a service may be offered by any Lustre process: for example, the OST service on an OSS processes I/O requests, and the AST service on a client processes notifications of lock conflicts.
The initial request message of an RPC is special: it is received into the first available request buffer at the destination. All other communications involved in an RPC are like RDMAs, in that the peer targets them specifically. For example, in a bulk read, the OSC posts reply and bulk buffers and sends descriptors for them (the LNET matchbits used to post them) in the RPC request. After the server has received the request, it GETs or PUTs the bulk data and PUTs the RPC reply directly.
Ptlrpc ensures all resources involved in an RPC are freed in finite time. If the RPC does not complete within a timeout, all buffers associated with the RPC must be unlinked. These buffers are still accessible to the network until their completion events have been delivered. |
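The bulk read exchange described above can be sketched as a toy, in-memory model. Every name in it (`Node`, `post_buffer`, `put`, `unlink`, `client_bulk_read`) is hypothetical and chosen for illustration only; none of it is the real LNET or ptlrpc API, and the matchbits are plain integers.

```python
"""Toy model of an RPC with a bulk read. All names are hypothetical."""
import itertools


class Node:
    """A process with an untargeted request queue and targeted, posted buffers."""

    def __init__(self, name):
        self.name = name
        self.request_queue = []   # requests land in the first available slot
        self.posted = {}          # matchbits -> buffer contents (None until written)
        self._matchbits = itertools.count(1)

    def post_buffer(self):
        """Post a buffer that a peer can target; return its matchbits."""
        mb = next(self._matchbits)
        self.posted[mb] = None
        return mb

    def put(self, matchbits, data):
        """A peer writes into one specific posted buffer (RDMA-like)."""
        if matchbits not in self.posted:
            raise KeyError("buffer was unlinked or never posted")
        self.posted[matchbits] = data

    def unlink(self, matchbits):
        """Withdraw a posted buffer and return whatever it holds."""
        return self.posted.pop(matchbits, None)


def client_bulk_read(client, server, object_data):
    # 1. The client posts reply and bulk buffers and sends their
    #    descriptors (the matchbits) inside the request.
    reply_mb = client.post_buffer()
    bulk_mb = client.post_buffer()
    server.request_queue.append(
        {"op": "read", "reply_mb": reply_mb, "bulk_mb": bulk_mb})

    # 2. The server picks up the request, then targets the client's
    #    buffers directly: first the bulk data, then the reply.
    req = server.request_queue.pop(0)
    client.put(req["bulk_mb"], object_data)
    client.put(req["reply_mb"], "OK")

    # 3. The client consumes the results and unlinks both buffers.
    return client.unlink(reply_mb), client.unlink(bulk_mb)
```

The model unlinks buffers immediately. It does not attempt to represent the point made above that a buffer stays accessible to the network until its completion event has been delivered.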
Code |
lustre/ptlrpc/*.[ch] lustre/ldlm/ldlm_lib.c |
llog | |
Summary |
Overview
LLog is the generic logging mechanism in Lustre. It allows Lustre to store records in an appropriate format and access them later using a reasonable API. LLog is used in various cases. The main LLog use cases are the following:
General design
Each llog type has two main parts:
|
Code |
obdclass/llog.c obdclass/llog_cat.c obdclass/llog_lvfs.c obdclass/llog_obd.c obdclass/llog_swab.c obdclass/llog_test.c lov/lov_log.c ptlrpc/llog_client.c ptlrpc/llog_server.c ptlrpc/llog_net.c |
obdclass | |
Summary | The obdclass code is generic Lustre configuration and device handling. Different functional parts of the Lustre code are split into obd devices which can be configured and connected in various ways to form a server or client filesystem.
Several examples of obd devices include:
The obdclass code provides services used by all Lustre devices for configuration, memory allocation, generic hashing, kernel interface routines, random number generation, etc. |
Code |
lustre/obdclass/class_hash.c - scalable hash code for imports
lustre/obdclass/class_obd.c - base device handling code
lustre/obdclass/debug.c - helper routines for dumping data structs
lustre/obdclass/genops.c - device allocation/configuration/connection
lustre/obdclass/linux-module.c - linux kernel module handling
lustre/obdclass/linux-obdo.c - pack/unpack obdo and other IO structs
lustre/obdclass/linux-sysctl.c - /proc/sys configuration parameters
lustre/obdclass/lprocfs_status.c - /proc/fs/lustre configuration/stats, helpers
lustre/obdclass/lustre_handles.c - wire opaque pointer handlers
lustre/obdclass/lustre_peer.c - peer target identification by UUID
lustre/obdclass/obd_config.c - configuration file parsing
lustre/obdclass/obd_mount.c - server filesystem mounting
lustre/obdclass/obdo.c - more obdo handling helpers
lustre/obdclass/statfs_pack.c - statfs helpers for wire pack/unpack
lustre/obdclass/uuid.c - UUID pack/unpack
lustre/lvfs/lvfs_common.c - kernel interface helpers
lustre/lvfs/lvfs_darwin.c - darwin kernel helper routines
lustre/lvfs/lvfs_internal.h - lvfs internal function prototypes
lustre/lvfs/lvfs_lib.c - statistics
lustre/lvfs/lvfs_linux.c - linux kernel helper routines
lustre/lvfs/lvfs_userfs.c - userspace helper routines
lustre/lvfs/prng.c - long period pseudo-random number generator
lustre/lvfs/upcall_cache.c - supplementary group upcall for MDS |
luclass | |
Summary | luclass is a body of data-type definitions and functions implementing support for a layered object, that is, an entity where every layer in the Lustre device stack (both data and metadata, and both client and server side) can maintain its own private state and modify the behavior of a compound object in a systematic way.
Specifically, data types are introduced representing a device type (struct lu_device_type, a layer in the Lustre stack), a device (struct lu_device, a specific instance of the type), and an object (struct lu_object). The following lu_object functionality is implemented by generic code:
In addition to objects and devices, luclass includes lu_context, which is a way to efficiently allocate space without consuming stack space. The luclass design is specified in the MD API DLD. |
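The idea of a layered object, where each layer of the stack keeps its own private state inside one compound object, can be illustrated with a minimal sketch. The class names (`LayerSlice`, `LayeredObject`) and the layer names `top` and `bottom` are hypothetical. They are not the luclass data types, and the sketch does not attempt to cover lu_context or the generic lu_object functionality.

```python
"""Minimal sketch of a layered (compound) object. All names are hypothetical."""


class LayerSlice:
    """The part of a compound object owned by one layer of the stack."""

    def __init__(self, layer_name):
        self.layer_name = layer_name
        self.state = {}           # private to this layer

    def describe(self):
        return f"{self.layer_name}{self.state}"


class LayeredObject:
    """A compound object: one slice per layer, ordered from top to bottom."""

    def __init__(self, layer_names):
        self.slices = [LayerSlice(name) for name in layer_names]

    def slice_for(self, layer_name):
        """Return the slice belonging to the named layer."""
        for s in self.slices:
            if s.layer_name == layer_name:
                return s
        raise KeyError(layer_name)

    def describe(self):
        """Each layer contributes its own part to the compound description."""
        return " -> ".join(s.describe() for s in self.slices)
```

For example, after `obj = LayeredObject(["top", "bottom"])`, setting `obj.slice_for("bottom").state["blocks"] = 4` changes only the bottom layer's state and leaves the top layer's slice untouched.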
Code |
include/lu_object.h obdclass/lu_object.c |
ldlm | |
Summary | The Lustre Distributed Lock Manager (LDLM) is the Lustre locking infrastructure; it handles locks between clients and servers and locks local to a node. Different kinds of locks are available with different properties. For historical reasons, ldlm also contains some of the generic connection service code (both server and client). |
Code |
interval_tree.c - used by extent locks to maintain interval trees (bug 11300).
l_lock.c - resource locking primitives.
ldlm_extent.c - extent locking code used for locking regions inside objects.
ldlm_flock.c - BSD and POSIX file lock types.
ldlm_inodebits.c - inodebits locks used for metadata locking.
ldlm_lib.c - target and client connecting/reconnecting/recovery code. Does not really belong to ldlm, but is historically placed there. Should be in ptlrpc instead.
ldlm_lock.c - this source file mostly has functions dealing with struct ldlm_lock.
ldlm_lockd.c - functions that imply replying to incoming lock-related RPCs (which could be both on the server (lock enq/cancel/...) and on the client (AST handling)).
ldlm_plain.c - plain locks, predecessor to inodebits locks; not widely used now.
ldlm_pool.c - pools of locks, related to dynamic LRUs and freeing locks on demand.
ldlm_request.c - collection of functions that work with locks based on handles, as opposed to the lock structures themselves.
ldlm_resource.c - functions operating on namespaces and lock resources.
include/lustre_dlm.h - important defines and declarations for ldlm. |
fids | |
Summary | A FID is a unique object identifier in the cluster, used since version 1.7. It has a few properties; the main ones are the following:
A FID consists of 3 fields:
|
Code |
fid/fid_request.c fid/fid_lib.c fld/*.[ch] |
seq | |
Summary | Overview
Sequence management is a basic mechanism in the new MDS server that is related to managing FIDs. A FID is a unique object identifier in Lustre starting from version 1.7. All FIDs are organized into sequences. A sequence is a group of FIDs. Sequences are granted (allocated) to clients by servers. FIDs are allocated by clients inside the granted sequence.
All FIDs inside one sequence live on the same MDS server and as such form one "migration unit" and one "indexing unit", meaning that the FLD (FIDs Location Database) indexes them all by their sequence and thus has only one mapping entry for all FIDs in a sequence. See the fids section of this table for more information on the FLD service and FIDs.
A sequence has a limit on the number of FIDs that can be allocated in it. When this limit is reached, a new sequence is allocated. After a disconnect, the server allocates a new sequence to the client when it reconnects. The previously used sequence is abandoned even if it was not exhausted. Sequences are a valuable resource, but in the case of recovery, using a new sequence makes things easier and also allows FIDs and objects to be grouped by working session: new connection, new sequence.
Code description
Server side code is divided into two parts:
Client side code allocates new sequences from the granted meta-sequence. When the meta-sequence is exhausted, a new one is allocated on the server and sent to the client. The client code consists of an API for working with both server side parts, not only with the sequence manager: because all servers need to talk to the sequence controller, they also use the client API for this. One important part of the client API is FID allocation. A new FID is allocated in the currently granted sequence until the sequence is exhausted. |
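The allocation rules described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code in fid/fid_request.c: the class names, `seq_width`, and `locate` are hypothetical, a FID is modelled as a plain `(sequence, object id)` pair, and the meta-sequence level is left out so that the server grants sequences directly.

```python
"""Simplified sketch of sequence-based FID allocation. All names are hypothetical."""


class SequenceServer:
    """Grants each sequence number exactly once."""

    def __init__(self, first_seq=1):
        self._next_seq = first_seq

    def grant(self):
        seq = self._next_seq
        self._next_seq += 1
        return seq


class FidClient:
    """Allocates FIDs inside the currently granted sequence."""

    def __init__(self, server, seq_width):
        self.server = server
        self.seq_width = seq_width    # assumed limit of FIDs per sequence
        self._new_sequence()

    def _new_sequence(self):
        self.seq = self.server.grant()
        self.next_oid = 1

    def alloc_fid(self):
        # When the sequence is exhausted, obtain a new one first.
        if self.next_oid > self.seq_width:
            self._new_sequence()
        fid = (self.seq, self.next_oid)
        self.next_oid += 1
        return fid

    def reconnect(self):
        # New connection, new sequence: the old one is abandoned
        # even if it was not exhausted.
        self._new_sequence()


def locate(fld, fid):
    """Look up the server for a FID with one FLD entry per sequence."""
    return fld[fid[0]]
```

Because `locate` keys on the sequence alone, a single entry in the `fld` mapping covers every FID allocated in that sequence, which mirrors the "one mapping entry" property described above.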
Code |
fid/fid_handler.c - server side sequence management code
fid/fid_request.c - client side sequence management code
fid/fid_lib.c - miscellaneous FID-related code |
mountconf | |
Summary | MountConf is how servers and clients are set up, started, and configured. A MountConf usage document is here.
The major subsystems are the MGS, MGC, and the userspace tools mount.lustre and mkfs.lustre. The basic idea is:
|
Code |
MountConf file areas: lustre/mgs/* lustre/mgc/* lustre/obdclass/obd_mount.c lustre/utils/mount_lustre.c lustre/utils/mkfs_lustre.c |