Preparing to Install Lustre

(Updated: Feb 2010)

The installation prerequisites described in this section must be met to successfully install and run the Lustre™ software.

Supported Operating System, Platform and Interconnect
Lustre 1.8 supports the following operating systems, platforms and interconnects. To install Lustre from downloaded packages (RPMs), you must use a supported configuration.

For Lustre 1.6.x requirements, see the Lustre 1.6 Operations Manual.

Required Lustre Software
To install Lustre, the following packages, which can be downloaded from the Lustre download site, are required:
 * Linux kernel patched with Lustre-specific patches (the patched Linux kernel is required only on Lustre MDSs and OSSs)
 * Lustre modules compiled for the Linux kernel
 * Lustre utilities required for Lustre configuration
 * (Optional) Network-specific kernel modules and libraries (for example, kernel modules and libraries required for an InfiniBand interconnect)

Required Tools and Utilities
Several third-party utilities are required:
 * e2fsprogs - Lustre requires a recent version of e2fsprogs that understands extents. Use e2fsprogs-1.41-6 or later, available on the Lustre download site.
   Note: The Lustre-patched e2fsprogs utility needs to be installed only on machines that mount backend (ldiskfs) file systems, such as the OSS, MDS and MGS nodes. It does not need to be installed on clients.
 * Perl - Various userspace utilities are written in Perl. Any recent version of Perl will work with Lustre.

(Optional) High-Availability Software
If you plan to enable failover server functionality with Lustre (either on an OSS or an MDS), you must add high-availability (HA) software to your cluster software. For more information, see Chapter 8: Failover in the Lustre Operations Manual.

Environmental Requirements
Make sure the following environmental requirements are met before installing Lustre:
 * (Recommended) Provide remote shell access to clients. Although not strictly required to run Lustre, we recommend that all cluster nodes have remote shell client access, to facilitate the use of Lustre configuration and monitoring scripts. Parallel Distributed SHell (pdsh) is preferable, although Secure SHell (SSH) is acceptable.
 * Ensure client clocks are synchronized. Lustre uses client clocks for timestamps. If clocks are out-of-sync between clients and servers, timeouts and client evictions will occur. Drifting clocks can also cause problems by, for example, making it difficult to debug multi-node issues or correlate logs, which depend on timestamps. We recommend that you use Network Time Protocol (NTP) to keep client and server clocks in sync with each other. For more information about NTP, go to ntp.org.
 * Maintain uniform file access permissions on all cluster nodes. Use the same user IDs (UID) and group IDs (GID) on all clients. If use of supplemental groups is required, verify that the group_upcall requirements have been met. For more information about User/Group Cache Upcall, see Chapter 29: Lustre Programming Interfaces in the Lustre Operations Manual.
 * (Recommended) Disable Security-Enhanced Linux (SELinux) on servers and clients. Lustre does not support SELinux. Therefore, disable the SELinux system extension on all Lustre nodes, and make sure that other security extensions (such as Novell AppArmor) and network packet filtering tools (such as iptables) do not interfere with Lustre.
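
The uniform UID/GID requirement above can be spot-checked by comparing account entries collected from two nodes. The following is a minimal sketch, not a Lustre tool: the function names `parse_passwd` and `id_mismatches` and the sample accounts are hypothetical, and it assumes the input is /etc/passwd-style text that you have already gathered from each node (for example, with pdsh).

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from /etc/passwd-style text."""
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed entries
        ids[fields[0]] = (int(fields[2]), int(fields[3]))
    return ids


def id_mismatches(reference, other):
    """Return users whose (uid, gid) differ on `other`, or who are missing there."""
    return {
        user: (ids, other.get(user))
        for user, ids in reference.items()
        if other.get(user) != ids
    }


# Hypothetical sample data standing in for two nodes' account files.
node_a = "alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1002:1002::/home/bob:/bin/bash\n"
node_b = "alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1003:1002::/home/bob:/bin/bash\n"

mismatches = id_mismatches(parse_passwd(node_a), parse_passwd(node_b))
print(mismatches)  # {'bob': ((1002, 1002), (1003, 1002))}
```

An empty result means every account on the reference node has the same UID and GID on the other node. The sketch does not cover supplemental groups or the group_upcall configuration.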

Memory Requirements
This section describes the memory requirements for Lustre.

MDS Memory Requirements
MDS memory requirements are determined by the following factors:
 * Number of clients
 * Size of the directories
 * Extent of load

For example, for a single MDT on an MDS with 1,000 clients, 16 interactive nodes, and a 2 million file working set (of which 400,000 files are cached on the clients), memory requirements include:
 * File system journal = 400 MB
 * 1,000 * 4-core clients * 100 files/core * 2 kB = 800 MB
 * 16 interactive clients * 10,000 files * 2 kB = 320 MB
 * 1,600,000 file extra working set * 1.5 kB/file = 2,400 MB

Thus, the minimum requirement for a system with this configuration is 4 GB of RAM (the items above total about 3.9 GB). However, additional memory may significantly improve performance. For information about determining MDS memory requirements for more complex systems, see Section 3.1.7: Memory Requirements in the Lustre Operations Manual.
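
The arithmetic in the MDS example can be reproduced with a short script. This is a rough sketch under stated assumptions: the function name `mds_memory_mb` and its parameters are illustrative only (they are not Lustre settings), and it uses decimal units (1 MB = 1,000 kB), which is what the figures above imply.

```python
KB_PER_MB = 1000  # decimal units, as implied by the example figures


def mds_memory_mb(clients, cores_per_client, files_per_core,
                  interactive_clients, files_per_interactive,
                  extra_working_set_files, journal_mb=400,
                  kb_per_cached_file=2.0, kb_per_extra_file=1.5):
    """Sum the four items from the MDS example and return the total in MB."""
    client_cache_mb = (clients * cores_per_client * files_per_core
                       * kb_per_cached_file / KB_PER_MB)
    interactive_mb = (interactive_clients * files_per_interactive
                      * kb_per_cached_file / KB_PER_MB)
    extra_mb = extra_working_set_files * kb_per_extra_file / KB_PER_MB
    return journal_mb + client_cache_mb + interactive_mb + extra_mb


# Values from the example: 1,000 four-core clients, 16 interactive clients,
# and a 1,600,000-file extra working set.
total_mb = mds_memory_mb(1000, 4, 100, 16, 10_000, 1_600_000)
print(total_mb)  # 3920.0
```

The result, about 3.9 GB, is consistent with the 4 GB minimum stated for this configuration.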

OSS Memory Requirements
When planning the hardware for an OSS node, consider the memory usage of several components in the Lustre system (such as the journal, service threads, and file system metadata). Also, consider the effect of the OSS read cache feature (new in Lustre 1.8), which consumes memory as it caches data on the OSS node.
 * Journal size - By default, each Lustre ldiskfs file system has a 400 MB journal. This can pin up to an equal amount of RAM on the OSS node per file system.
 * Service threads - The service threads on the OSS node pre-allocate a 1 MB I/O buffer for each ost_io service thread, so these buffers do not need to be allocated and freed for each I/O request.
 * File system metadata - A reasonable amount of RAM needs to be available for file system metadata. While no hard limit can be placed on the amount of file system metadata, the more RAM is available, the less often disk I/O is needed to retrieve the metadata.
 * Network transport - If you are using TCP or another network transport that uses system memory for send/receive buffers, this memory must also be taken into consideration.
 * Failover configuration - If the OSS node will be used for failover from another node, then the RAM for each journal should be doubled, so the backup server can handle the additional load if the primary server fails.
 * OSS read cache - OSS read cache provides read-only caching of data on an OSS, using the regular Linux page cache to store the data. Just like caching from a regular file system in Linux, OSS read cache uses as much physical memory as is available.

Because of these memory demands, treat the following calculations as the absolute minimum RAM required in an OSS node.

Calculating OSS Memory Requirements
The minimum recommended RAM size for an OSS with two OSTs is computed below:
 * 1.5 MB per OST IO thread * 512 threads = 768 MB
 * e1000 RX descriptors, RxDescriptors=4096 for 9000 byte MTU = 128 MB
 * Operating system overhead = 512 MB
 * 400 MB journal size * 2 OST devices = 800 MB
 * 600 MB file system metadata cache * 2 OSTs = 1200 MB

This consumes about 1,700 MB just for the pre-allocated buffers, and an additional 2 GB for minimal file system and kernel usage. Therefore, for a non-failover configuration, the minimum RAM would be 4 GB for an OSS node with two OSTs. While it is not strictly required, adding memory on the OSS will improve the performance of reading smaller, frequently accessed files.
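
The itemized OSS figures can be summed as follows. This is a sketch, not an official sizing tool: the function name `oss_min_memory_mb` and its parameter names are illustrative, and the grouping into "pre-allocated" (thread buffers, network buffers, journals) and "file system and kernel" (operating system overhead, metadata cache) is an assumption inferred from the totals quoted above.

```python
def oss_min_memory_mb(num_osts, io_threads=512, mb_per_io_thread=1.5,
                      network_buffers_mb=128, os_overhead_mb=512,
                      journal_mb=400, metadata_cache_mb=600):
    """Return (pre-allocated MB, file system and kernel MB) for one OSS."""
    preallocated = (io_threads * mb_per_io_thread
                    + network_buffers_mb
                    + journal_mb * num_osts)
    fs_and_kernel = os_overhead_mb + metadata_cache_mb * num_osts
    return preallocated, fs_and_kernel


pre_mb, base_mb = oss_min_memory_mb(num_osts=2)
print(pre_mb, base_mb, pre_mb + base_mb)  # 1696.0 1712 3408.0
```

For two OSTs, the itemized values sum to roughly 3.4 GB, which the text rounds up to a 4 GB minimum.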

For a failover configuration, the minimum RAM would be at least 6 GB. For 4 OSTs on each OSS in a failover configuration, 10 GB of RAM is reasonable. When the OSS is not handling any failed-over OSTs, the extra RAM will be used as a read cache. As a reasonable rule of thumb, about 2 GB of base memory plus 1 GB per OST can be used. In failover configurations, about 2 GB per OST is needed.
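
The rule of thumb can be written as a one-line estimate. This sketch assumes that the 2 GB of base memory also applies in failover configurations; that reading is an interpretation, chosen because it reproduces the 4 GB, 6 GB and 10 GB figures given above. The function name `oss_rule_of_thumb_gb` is hypothetical.

```python
def oss_rule_of_thumb_gb(num_osts, failover=False, base_gb=2):
    """Estimate OSS RAM in GB: base memory plus 1 GB (or 2 GB with failover) per OST."""
    per_ost_gb = 2 if failover else 1
    return base_gb + per_ost_gb * num_osts


print(oss_rule_of_thumb_gb(2))                 # 4
print(oss_rule_of_thumb_gb(2, failover=True))  # 6
print(oss_rule_of_thumb_gb(4, failover=True))  # 10
```

Treat the result as a starting point. Memory beyond this estimate is used by the OSS read cache.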