
Architecture - Recovery Failures


Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Recovery Terminology

Replay: A client resends a transaction that had previously been executed and acknowledged with a reply, but whose effects were lost in a server crash because the transaction had not been committed.
Collecting clients: A server waits for a short period to allow more clients to connect during recovery.
Version controlled replay: Clients may connect at any time and are allowed to replay an update to any object that has not changed since the update was originally made before the server failure.
Version based cache revalidation: Cached client data may be retained if the transactions it depended on were replayed.
Commit on sharing: Before new or updated metadata is shared with a node that is not the updater or creator, the metadata is committed.
Commit on sharing sequences: Attributes or data of objects that belong to the same inode sequence are committed before nodes other than the updater or creator can access them.
Eviction: A client flushes its cache because the server indicates that it no longer has the cached data and possibly cannot re-create it.
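
The version controlled replay rule can be illustrated with a minimal sketch. It assumes, purely for illustration, that every object carries an integer version and that a replay request records the version the client saw before its update; the names `ReplayRequest` and `try_replay` are hypothetical and not part of Lustre.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReplayRequest:
    object_id: str
    pre_version: int   # version the client observed before its update (assumed field)
    post_version: int  # version the update produced before the crash (assumed field)


def try_replay(server_versions: Dict[str, int], req: ReplayRequest) -> bool:
    """Apply the replay only if the object is unchanged since the original update."""
    current = server_versions.get(req.object_id, 0)
    if current != req.pre_version:
        # Something else changed the object in the meantime: refuse the replay.
        return False
    server_versions[req.object_id] = req.post_version
    return True
```

A replay against an object still at its pre-update version succeeds; a second replay that assumes the same pre-update version is then refused, because the object has moved on.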

Recovery Architectures

A. All clients reconnect: After a server failure, the server collects clients. If all clients reconnect, they replay all missing transactions, and all data lost in the server failure is restored on the server and in client caches. (Currently implemented by Lustre on Linux.)
B. Version recovery: After a server failure, method A is tried first. If it fails, all connecting clients are subject to collecting clients, version controlled replay and version based cache revalidation. Nodes that see mismatching versions for cached items are evicted.
C. Version checking (aka relaxed recovery requirement): This is version based cache revalidation without replay. It is not discussed further.
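
The fallback from method A to method B can be sketched roughly as follows. This is a simplification under stated assumptions: client and object names are strings, versions are integers, and `server` is taken to hold the object versions after any version controlled replay has already run. The function name `recover_after_failure` is hypothetical.

```python
from typing import Dict, Set, Tuple


def recover_after_failure(
    expected: Set[str],
    reconnected: Set[str],
    cached: Dict[str, Dict[str, int]],
    server: Dict[str, int],
) -> Tuple[str, Set[str], Set[str]]:
    """Return (method used, recovered clients, evicted clients)."""
    if reconnected == expected:
        # Method A: every client came back, so full replay restores all state.
        return "A", set(reconnected), set()

    # Method B: revalidate each connecting client's cache by version.
    recovered: Set[str] = set()
    evicted: Set[str] = set()
    for client in sorted(reconnected):
        items = cached.get(client, {})
        if all(server.get(obj) == ver for obj, ver in items.items()):
            recovered.add(client)
        else:
            # A mismatching version for a cached item leads to eviction.
            evicted.add(client)
    return "B", recovered, evicted
```

With three expected clients of which only two reconnect, the sketch falls back to method B and evicts only the client whose cached version disagrees with the server.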

These architectures can be combined with other mechanisms:

  • Commit on sharing or
  • Commit on sharing sequences - just mentioned for historical relevance, not further documented or discussed.
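
Commit on sharing, as defined in the terminology above, can be sketched with a toy server. The class `MetadataServer` and its methods are hypothetical and only model the rule that uncommitted metadata is committed before a node other than its writer touches it.

```python
from typing import Dict, Optional, Tuple


class MetadataServer:
    """Toy model: tracks uncommitted metadata together with its writer."""

    def __init__(self) -> None:
        self.committed: Dict[str, str] = {}
        self.pending: Dict[str, Tuple[str, str]] = {}  # name -> (writer, value)
        self.commits = 0

    def _commit_if_shared(self, client: str, name: str) -> None:
        entry = self.pending.get(name)
        if entry is not None and entry[0] != client:
            # Another node is about to see this metadata: commit it first.
            self.committed[name] = entry[1]
            del self.pending[name]
            self.commits += 1

    def update(self, client: str, name: str, value: str) -> None:
        self._commit_if_shared(client, name)
        self.pending[name] = (client, value)

    def access(self, client: str, name: str) -> Optional[str]:
        self._commit_if_shared(client, name)
        entry = self.pending.get(name)
        if entry is not None:
            return entry[1]
        return self.committed.get(name)

    def crash(self) -> None:
        # Uncommitted state is lost in a server failure.
        self.pending.clear()
```

In this model, metadata that was read by a second client survives a crash, because the read forced a commit, while metadata only its writer has seen is lost and must come back through replay.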

Use cases


id | quality attribute | summary
create_in_one_dir | performance, availability | Many clients create files in one directory.
race_crash_and_share | availability | A client makes a lot of updates and then starts computing. The server crashes, and another client wants to access these updates.

Sharing among clients

Scenario: Clients sharing recently modified/created metadata
Business Goals: Maximize availability
Relevant QA's: Performance & Availability
Details:
Stimulus: One client creates or modifies metadata, another one accesses it
Stimulus source: two or more client systems, one making updates.
Environment: server failures
Artifact: recoverable state in client caches
Response: see below
Response measure: How many clients recover?
Questions: What does customer want?
Issues: None.


A: Unless both clients reconnect, both will be evicted.
B: If the client making the updates/creates reconnects first, both clients will recover, even if the second client's reconnect is delayed.

Individual client performance

Scenario: A single client makes a burst of file system updates, or many clients make bursts of file system updates, without sharing metadata. Other clients (accessing clients) access some of the updates.
Business Goals: Maximize performance & availability
Relevant QA's: Performance & Availability
Details:
Stimulus: One client creates or modifies metadata, another one accesses it
Stimulus source: one or more client systems making updates.
Environment: server failures
Artifact: recoverable state in the cluster
Response: throughput of cluster during the updates and count of recoverable clients
Response measure: Performance
Questions: What does customer want?
Issues: None.


A: Fast performance. All clients must be present to avoid evicting all clients.
B: Fast performance. If the updates are independent of each other, all updating clients can reconnect at any time and perform a replay. An accessing client will recover if and only if it reconnects after all updating clients preceding its access have recovered. On Catamount this leads to a high likelihood of recovery failures.
B with commit on sharing: Every accessing client will cause a cache flush, possibly degrading performance heavily. Any accessing client that reconnects will recover.
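
The ordering rule for architecture B stated above can be expressed as a small simulation. It is a sketch under assumptions: `depends_on` is a hypothetical map from each accessing client to the updating clients whose uncommitted updates it read before the crash, and clients without an entry are treated as making independent updates.

```python
from typing import Dict, List, Set


def recovered_clients(
    reconnect_order: List[str],
    depends_on: Dict[str, Set[str]],
) -> Set[str]:
    """Return the clients that recover, given the order in which they reconnect."""
    recovered: Set[str] = set()
    for client in reconnect_order:
        deps = depends_on.get(client, set())
        # A client recovers only if every updater it depends on has
        # already recovered; independent updaters have no dependencies.
        if deps <= recovered:
            recovered.add(client)
    return recovered
```

For example, with `depends_on = {"reader": {"w1", "w2"}}`, the order `["w1", "w2", "reader"]` recovers all three clients, while `["w1", "reader", "w2"]` recovers only the two writers. Under commit on sharing the data an accessing client read was already committed, which in this model corresponds to an empty `depends_on`, so every reconnecting client recovers.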