WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Lustre 1.8
Latest revision as of 11:24, 20 January 2011
(Updated: Jan 2010)
Lustre™ 1.8.0.1 introduces several robust new features and improved system functionality. This page provides feature descriptions and lists the benefits offered by upgrading to the Lustre 1.8 release branch. The change log and release notes are on the Change_Log_1.8 page.
Adaptive Timeouts
The adaptive timeouts feature (enabled by default) causes Lustre to use an adaptive mechanism to set RPC timeouts, so users no longer have to tune the obd_timeout value. RPC service time histories are tracked on all servers for each service, and estimates for future RPCs are reported back to clients. Clients use these service time estimates along with their own observations of the network delays to set future RPC timeout values.
If server request processing slows down, its estimates increase and the clients allow more time for RPC completion before retrying. If RPCs queued up on the server approach their timeouts, the server sends early replies to the client, telling it to allow more time. Conversely, as the load on the server is reduced, the RPC timeout values decrease, allowing faster client detection of non-responsive servers and faster attempts to reconnect to a server's failover partner.
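The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Lustre's actual algorithm: the class and function names, the history length, the 25% margin, and the timeout floor are all assumptions chosen for the example.

```python
from collections import deque


class ServiceEstimator:
    """Server side: keeps a short history of service times for one service.

    Hypothetical model; the history length is an assumption.
    """

    def __init__(self, history=4):
        self.samples = deque(maxlen=history)

    def record(self, service_time):
        self.samples.append(service_time)

    def estimate(self):
        # Report the slowest recent service time as the estimate for future RPCs.
        return max(self.samples) if self.samples else 0.0


def client_timeout(service_estimate, network_latency, floor=1.0):
    """Client side: combine the server's estimate with observed network delay.

    The 25% margin and the floor are illustrative assumptions.
    """
    return max(floor, service_estimate * 1.25 + network_latency)


est = ServiceEstimator()
for t in (2.0, 8.0, 3.0):          # server under load
    est.record(t)
busy = client_timeout(est.estimate(), network_latency=0.5)

for t in (1.0, 1.0, 1.0, 1.0):     # load drops; slow samples age out of the history
    est.record(t)
idle = client_timeout(est.estimate(), network_latency=0.5)

print(busy, idle)
```

In this sketch the timeout grows while slow samples remain in the history and shrinks once they age out, which mirrors the behavior described above: more patience under load, faster detection of an unresponsive server when load is low.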
Why should I upgrade to Lustre 1.8 to get it?
Adaptive timeouts offers these benefits:
- Simplified management for small and large clusters.
- Automatically adjusts RPC timeouts as network conditions and server load change.
- Reduces server recovery time in cases where the server load is low at time of failure.
Additional Resources
For more information about adaptive timeouts, see:
- Architecture page - Adaptive timeouts (use cases)
- HLD - Adaptive RPC timeouts
Client Interoperability
The client interoperability feature enables Lustre 1.8 clients to work with a new network protocol that will be introduced in Lustre 2.0. This feature allows transparent client, server, network and storage interoperability when migrating from 1.6 architecture-based clusters to clusters with 2.0 architecture-based servers. Lustre 1.8.x clients will interoperate with Lustre 2.0 servers.
Why should I upgrade to Lustre 1.8 to get it?
Client interoperability offers this benefit:
- When Lustre 2.x is released, Lustre 1.8.x users will be able to upgrade to 2.x servers while the Lustre filesystem is up and running. This transparent upgrade feature will enable users to upgrade their servers to Lustre 2.x and reboot them without disturbing applications using the filesystem on clients. It will no longer be necessary to unmount clients from the filesystem to upgrade servers to the new software. After the 2.x upgrade, Lustre 2.x servers will interoperate with 1.8.x clients.
Additional Resources
For more information on client interoperability, see:
- Architecture page - Interoperability FIDs and ZFS
- HLD - Interoperability at the Server Side
- HLD - Sptlrpc interoperability
- DLD - Interoperable Client Recovery
- DLD - Sptlrpc interoperability
OSS Read Cache
The OSS read cache feature provides read-only caching of data on an OSS. It uses the regular Linux page cache to store the data. OSS read cache improves Lustre performance when several clients access the same data set and the data fits in the OSS cache (which can occupy most of the available memory). The overhead of OSS read cache is very low on modern CPUs, and cache misses do not negatively impact performance compared to Lustre releases before OSS read cache was available.
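As a rough illustration of why repeated reads benefit, the following Python sketch models a read-through cache in front of a slow backing store. It is a toy under stated assumptions: the `ReadCache` class, the LRU eviction policy, and the `fake_disk` function are hypothetical and do not describe how the Linux page cache or the OSS is implemented.

```python
from collections import OrderedDict


class ReadCache:
    """Toy read-through cache keyed by (object id, block number).

    LRU eviction is an assumption made for simplicity.
    """

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.pages:
            # Cache hit: serve from memory and mark the page as recently used.
            self.pages.move_to_end(key)
            self.hits += 1
            return self.pages[key]
        # Cache miss: go to disk, then keep the data for later readers.
        self.misses += 1
        data = self.read_from_disk(key)
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


disk_reads = []


def fake_disk(key):
    """Stand-in for a disk read; records each access."""
    disk_reads.append(key)
    return f"data-{key}"


cache = ReadCache(capacity=2, read_from_disk=fake_disk)
for client in range(3):            # three clients read the same block
    cache.read(("obj1", 0))

print(cache.hits, cache.misses, len(disk_reads))
```

Only the first of the three reads reaches the simulated disk; the other two are served from memory.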
Why should I upgrade to Lustre 1.8 to get it?
OSS read cache can improve Lustre performance, and offers these benefits:
- Allows OSTs to cache read data more frequently
- Improves repeated reads to match network speeds instead of disk speeds
- Provides the building block for OST write cache (small write aggregation)
Additional Resources
For more information on OSS read cache, see:
- Architecture page - Caching OSS
OST Pools
The OST pools feature allows the administrator to name a group of OSTs for file striping purposes. For instance, a group of local OSTs could be defined for faster access; a group of higher-performance OSTs could be defined for specific applications; a group of non-RAID OSTs could be defined for scratch files; or groups of OSTs could be defined for particular users.
Pools are defined by the system administrator, using regular Lustre tools (lctl). Pool usage is specified and stored along with other striping information (e.g., stripe count, stripe size) for directories or individual files (lfs setstripe or llapi_create_file()). Traditional automated OST selection optimizations (QOS) occur within a pool (e.g., free-space leveling within the pool). OSTs can be added to or removed from a pool at any time; existing files always remain in place and available.
OST pools characteristics include:
- An OST can be associated with multiple pools
- No ordering of OSTs is implied or defined within a pool
- OST membership in a pool can change over time
- A directory can default to a specific pool, and new files/subdirectories created therein will use that pool
NOTE: In its current implementation, the OST pools feature does not implement an automated policy or restrict users from creating files in any of the pools; it must be managed directly by the administrator or user. It is a building block for policy-managed storage.
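The pool characteristics above can be modeled with a small Python sketch. This is a conceptual illustration only: `PoolRegistry` and its methods are hypothetical names, not the lctl or lfs interface, and "pick the OSTs with the most free space" is a simplified stand-in for free-space leveling within a pool.

```python
class PoolRegistry:
    """Hypothetical model of named OST pools (not the Lustre API)."""

    def __init__(self, free_space):
        self.free_space = dict(free_space)  # OST name -> free space (arbitrary units)
        self.pools = {}                     # pool name -> unordered set of OST names

    def pool_add(self, pool, *osts):
        self.pools.setdefault(pool, set()).update(osts)

    def pool_remove(self, pool, ost):
        self.pools[pool].discard(ost)

    def pick_osts(self, pool, stripe_count):
        # Selection happens only among pool members; prefer the OSTs with the
        # most free space (ties broken by name so the result is deterministic).
        members = self.pools[pool]
        ranked = sorted(members, key=lambda o: (-self.free_space[o], o))
        return ranked[:stripe_count]


reg = PoolRegistry({"OST0000": 500, "OST0001": 900, "OST0002": 700, "OST0003": 100})
reg.pool_add("fast", "OST0000", "OST0001")
reg.pool_add("scratch", "OST0001", "OST0002", "OST0003")  # OST0001 is in both pools

chosen = reg.pick_osts("scratch", stripe_count=2)

reg.pool_remove("scratch", "OST0001")  # membership can change over time
chosen_after = reg.pick_osts("scratch", stripe_count=2)

print(chosen, chosen_after)
```

Pools are stored as sets here to reflect that no ordering of OSTs is implied within a pool, and the same OST may appear in several pools.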
Why should I upgrade to Lustre 1.8 to get it?
OST pools offers these benefits:
- Allows sets of OSTs to be managed via named groups
- Pools can separate heterogeneous OSTs within the same filesystem
  - Fast vs. slow disks
  - Local network vs. remote network (e.g. WAN)
  - RAID 1 vs. RAID 5 backing storage, etc.
  - Specific OSTs for users/groups/applications (by directory)
- Easier disk usage policy implementation for administrators
- Hardware can be more closely optimized for particular usage patterns
- Human-readable stripe mappings
Additional Resources
For more information on OST pools, see:
- Architecture page - OST pools
- DLD - OST Pools
- Test plan - OST pools
Version-Based Recovery
Version-based Recovery (VBR) improves the robustness of client recovery operations and allows Lustre to recover even if multiple clients fail at the same time as the server. With VBR, recovery is more flexible; not all clients are evicted if some miss recovery, and a missed client may try to recover after the server recovery window.
Why should I upgrade to Lustre 1.8 to get it?
VBR functionality in Lustre 1.8 allows more flexible recovery after a failure. Previous Lustre releases enforced a strict, in-order replay condition that required all clients to reconnect during the recovery period. If a client was missing and the recovery period timed out, then the remaining clients were evicted. With VBR, conditional out-of-order replay is allowed. VBR uses versions to detect conflicting transactions. If an object's version matches what is expected, the transaction is replayed. If there is a version mismatch, clients attempting to modify the object are stopped. Recovery continues even if some clients do not reconnect (the missed clients can try to recover later). With VBR, Lustre clients may successfully recover in a wider variety of failure scenarios.
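The version check can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the `Tx` record, the `replay` function, and the rule that a replayed transaction bumps the version by one are hypothetical and are not taken from the Lustre implementation.

```python
from collections import namedtuple

# A replayable transaction: which client sent it, which object it modifies,
# and the object version the client saw before making the change.
Tx = namedtuple("Tx", ["client", "obj", "pre_version"])


def replay(objects, transactions):
    """Replay transactions whose expected version matches; stop clients that conflict.

    objects maps object id -> current version and is updated in place.
    Returns (list of replayed transactions, set of stopped clients).
    """
    replayed, stopped = [], set()
    for tx in transactions:
        if tx.client in stopped:
            continue  # a stopped client gets no further replay
        if objects.get(tx.obj) == tx.pre_version:
            # Version matches: apply the change (version + 1 is an illustrative simplification).
            objects[tx.obj] = tx.pre_version + 1
            replayed.append(tx)
        else:
            # Version mismatch: the change depended on a transaction that was never replayed.
            stopped.add(tx.client)
    return replayed, stopped


# State on disk after the server restarts. Client c1 never reconnects, so its
# update that would have moved object B from version 5 to 6 is missing.
objects = {"A": 1, "B": 5}
transactions = [
    Tx("c2", "A", 1),  # independent of c1: replayed
    Tx("c3", "B", 6),  # depends on c1's lost update: mismatch, c3 is stopped
    Tx("c2", "A", 2),  # follows c2's own earlier change: replayed
]

replayed, stopped = replay(objects, transactions)
print(replayed, stopped, objects)
```

In this example recovery proceeds for c2 even though c1 is absent, and only c3, whose change conflicts with the missing update, is stopped.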
VBR offers these benefits:
- Improves the robustness of client recovery operations
- Allows Lustre recovery to continue even if multiple clients fail at the same time as the server
- Provides a building block for disconnected client operations
Additional Resources
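For more information on VBR, see:
- Section 30.4: Version-based Recovery (http://wiki.lustre.org/manual/LustreManual20_HTML/LustreRecovery.html#50438268_pgfId-1287769) in the Lustre Operations Manual (http://wiki.lustre.org/manual/LustreManual20_HTML/index.html)
- Architecture page - VBR
- HLD - VBR
- DLD - VBR
- Test plan - VBR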