WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Architecture - Recovery Failures
From Obsolete Lustre Wiki
				
				
Latest revision as of 14:22, 22 January 2010
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Recovery Terminology
| Replay | A client resends a transaction that the server had previously executed and replied to, but that was lost in a server crash because it had not been committed. | 
| Collecting clients | A server waits for a short period during recovery to allow more clients to connect. | 
| Version controlled replay | Clients may connect at any time and are allowed to replay updates to anything that has not changed since the update was made before the server failure. | 
| Version based cache revalidation | Cached client data may be retained if the transactions it depended on were replayed. | 
| Commit on sharing | Before new or updated metadata is shared with a node that is not the updater or creator, the metadata is committed. | 
| Commit on sharing sequences | Attributes or data of objects that belong to one and the same inode sequence are committed before nodes other than the updater or creator can access them. | 
| Eviction | A client flushes its cache because the server indicates that it no longer has the cached data and possibly cannot re-create it. | 
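The version check behind version controlled replay and version based cache revalidation can be illustrated with a minimal sketch. This is not Lustre code: the `Server` and `ReplayRequest` classes, the per-object integer version, and the method names are all hypothetical and serve only to make the definitions above concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayRequest:
    obj_id: str
    pre_version: int  # version the client observed before making its update
    new_value: str


@dataclass
class Server:
    # Hypothetical model: obj_id -> (version, value), holding only the state
    # that was committed before the crash.
    objects: dict = field(default_factory=dict)

    def replay(self, req: ReplayRequest) -> bool:
        """Accept a replay only if the object is unchanged since the update."""
        version, _ = self.objects.get(req.obj_id, (0, None))
        if version != req.pre_version:
            return False  # version mismatch: the replay is refused
        self.objects[req.obj_id] = (version + 1, req.new_value)
        return True

    def revalidate(self, obj_id: str, cached_version: int) -> bool:
        """A cached item may be kept only if its version matches the server's."""
        version, _ = self.objects.get(obj_id, (0, None))
        return version == cached_version


server = Server({"f": (3, "old")})
accepted = server.replay(ReplayRequest("f", 3, "new"))    # versions match
refused = server.replay(ReplayRequest("f", 3, "stale"))   # object moved on
```

In this model a client whose cached version fails `revalidate` would be a candidate for eviction.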
Recovery Architectures
| A. All clients reconnect: | After a server failure, the server collects clients. If all clients reconnect, they replay all missing transactions. All data lost in the server failure is restored on the server and in client caches. (Currently implemented by Lustre on Linux) | 
| B. Version recovery: | After a server failure, method A is tried first. If it fails, all connecting clients are subject to collecting clients, version controlled replay and version based cache revalidation. Nodes that see mismatching versions for cached items are evicted. | 
| C. Version checking (aka relaxed recovery requirement) | This is version based cache revalidation, without replay. Not further discussed. | 
This can be combined with other mechanisms:
- Commit on sharing or
- Commit on sharing sequences - just mentioned for historical relevance, not further documented or discussed.
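As a rough illustration of commit on sharing, the sketch below forces a commit whenever a client other than the updater touches uncommitted metadata. The `MetadataServer` class, its fields, and the commit counter are assumptions made for this example; they do not describe the actual Lustre implementation.

```python
class MetadataServer:
    """Toy model: uncommitted updates are committed before being shared."""

    def __init__(self):
        self.uncommitted = {}  # name -> (owner, value), would be lost in a crash
        self.committed = {}    # name -> value, survives a crash
        self.commits = 0       # number of forced commits (a cost indicator)

    def _commit(self, name):
        _, value = self.uncommitted.pop(name)
        self.committed[name] = value
        self.commits += 1

    def update(self, client, name, value):
        # A second client modifying the item also counts as sharing.
        if name in self.uncommitted and self.uncommitted[name][0] != client:
            self._commit(name)
        self.uncommitted[name] = (client, value)

    def read(self, client, name):
        if name in self.uncommitted:
            owner, value = self.uncommitted[name]
            if owner == client:
                return value      # the updater may see its own uncommitted state
            self._commit(name)    # commit before any other node sees it
        return self.committed.get(name)


mds = MetadataServer()
mds.update("client1", "dir/file", "mode=0644")
own_view = mds.read("client1", "dir/file")     # no commit needed
shared_view = mds.read("client2", "dir/file")  # forces a commit first
```

Because everything another node has seen is already committed, an accessing client never depends on another client's replay. The price is one forced commit per shared item.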
Use cases
Summary
| id | quality attribute | summary | 
|---|---|---|
| create_in_one_dir | performance, availability | Many clients create files in one directory. | 
| race_crash_and_share | availability | A client makes a lot of updates and then starts computing. The server crashes, and another client wants to access these updates. | 
Sharing among clients
| Scenario: | Clients sharing recently modified/created metadata | |
| Business Goals: | Maximize availability | |
| Relevant QA's: | Performance & Availability | |
| details | Stimulus: | One client creates or modifies metadata, another one accesses it | 
| Stimulus source: | two or more client systems, one making updates. | |
| Environment: | server failures | |
| Artifact: | recoverable state in client caches | |
| Response: | see below | |
| Response measure: | How many clients recover? | |
| Questions: | What does customer want? | |
| Issues: | None. | |
Responses
| A | Unless both clients reconnect, both will be evicted. | 
| B | If the client making the updates/creates reconnects first, both clients will recover, even if the second client reconnects with a delay. | 
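The two responses can be compared with a small sketch for one updating and one accessing client. The function names and the result strings are hypothetical. The source does not say what happens to the updater under B when the accessor reconnects first, so the sketch assumes the updater can still replay its own updates.

```python
def outcome_a(expected, reconnected):
    """Architecture A: recovery succeeds only if every expected client returns."""
    status = "recovered" if set(reconnected) == set(expected) else "evicted"
    return {client: status for client in expected}


def outcome_b(updater, accessor, reconnect_order):
    """Architecture B: the accessor recovers only after the updater has replayed."""
    result = {}
    updater_recovered = False
    for client in reconnect_order:
        if client == updater:
            updater_recovered = True
            result[client] = "recovered"   # assumption: an updater can always replay
        elif client == accessor:
            result[client] = "recovered" if updater_recovered else "evicted"
    for client in (updater, accessor):
        result.setdefault(client, "absent")  # never reconnected
    return result


a_partial = outcome_a(["writer", "reader"], ["writer"])
b_in_order = outcome_b("writer", "reader", ["writer", "reader"])
b_reversed = outcome_b("writer", "reader", ["reader", "writer"])
```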
Individual client performance
| Scenario: | A single client makes a burst of file system updates, or many clients make bursts of file system updates without sharing metadata. Other clients (accessing clients) access some of the updates. | |
| Business Goals: | Maximize performance & availability | |
| Relevant QA's: | Performance & Availability | |
| details | Stimulus: | One client creates or modifies metadata, another one accesses it | 
| Stimulus source: | one or more client systems making updates. | |
| Environment: | server failures | |
| Artifact: | recoverable state in the cluster | |
| Response: | throughput of cluster during the updates and count of recoverable clients | |
| Response measure: | Performance | |
| Questions: | What does customer want? | |
| Issues: | None. | |
Responses
| A | Fast performance. All clients must be present to avoid evicting all clients. | 
| B | Fast performance. If the updates are independent of each other, all updating clients can reconnect at any time and perform a replay. An accessing client will recover if and only if it reconnects after all updating clients preceding its access have recovered. On Catamount this leads to a high likelihood of recovery failures. | 
| B with commit on sharing | Every accessing client will cause a cache flush, possibly degrading performance heavily. Any accessing client that reconnects will recover. | 
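The "if and only if" rule in response B generalizes to several updating clients. The sketch below walks through a reconnect order under that rule. The `recover_b` function and the `depends_on` mapping (accessing client to the set of updaters whose changes it read) are hypothetical names introduced for illustration, and the updates are assumed to be independent of each other.

```python
def recover_b(reconnect_order, depends_on):
    """Return (recovered, evicted) client sets for a given reconnect order.

    A client recovers only if every updater it depends on has already
    recovered when it reconnects; otherwise it is evicted.
    """
    recovered = set()
    evicted = set()
    for client in reconnect_order:
        needed = depends_on.get(client, set())  # updaters have no dependencies
        if needed <= recovered:
            recovered.add(client)
        else:
            evicted.add(client)
    return recovered, evicted


depends_on = {"reader": {"w1", "w2"}}
late_reader = recover_b(["w1", "w2", "reader"], depends_on)
early_reader = recover_b(["w1", "reader", "w2"], depends_on)
```

In the second run the reader reconnects before `w2` has replayed and is evicted, while both updaters still recover.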

