WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Architecture - Commit on Share
Latest revision as of 13:00, 22 January 2010
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Summary
Commit-on-Share (CoS) is intended to improve recoverability in environments where clients miss the reconnect window.
Definitions
- Dependent transactions
- Two transactions are dependent if the second one cannot be executed until the first one has been executed.
- Isolation
- The level at which transactions are considered dependent:
- per-object -- all changes to the same object are considered dependent,
- fine-grained -- some changes to the same object are considered independent.
- Uncommitted object
- An object with cached changes that have not yet been committed to disk.
- Dependency resolution
- Removing a request from the replay queue by committing it to persistent storage.
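The definitions above can be illustrated with a minimal sketch of per-object isolation. The `Txn` record, its field names, and both helper functions are hypothetical and exist only for this example; treating committed transactions as "no longer a dependency" is an assumption of the sketch, not a statement about the Lustre implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; field names are illustrative only."""
    transno: int      # order in which the server executed the transaction
    client: str       # client that issued the request
    obj: str          # object the transaction modifies
    committed: bool   # True once the change is on persistent storage


def depends_on(second: Txn, first: Txn) -> bool:
    """Per-object isolation: any two changes to the same object are dependent.

    A committed transaction never has to be replayed, so it is not counted
    as a dependency here (an assumption of this sketch).
    """
    return (
        first.transno < second.transno
        and first.obj == second.obj
        and not first.committed
    )


def uncommitted_objects(txns: list[Txn]) -> set[str]:
    """Objects that still have changes cached and not committed to disk."""
    return {t.obj for t in txns if not t.committed}
```

With fine-grained isolation, `depends_on` would compare something narrower than the whole object, so some pairs of changes to the same object would be reported as independent.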
Requirements
- Provide a mechanism to avoid non-recoverable requests.
- The mechanism must be optional (possibly switchable at runtime; this is still an open question) so that users can choose between performance and reliability.
- No changes to the wire protocol are allowed.
- Provide compatibility for old clients.
Use Cases
ID | Quality Attribute | Summary |
---|---|---|
dependent request from same client | performance | performance shouldn't suffer with CoS enabled |
independent request from different client | performance | performance shouldn't suffer with CoS enabled |
dependent request from different client | availability | no dependency may remain after the request is executed; it must be resolved beforehand |
set of independent requests | availability, performance | with fine-grained isolation, a request can be independent of all other requests except a particular one |
commit | availability, performance | commit event |
CoS enable | usability | when and how we can enable CoS |
CoS disable | usability | when and how we can disable CoS |
Quality Attribute Scenarios
Dependent request from same client
Scenario: | Dependent request from same client | |
Business Goals: | application's performance doesn't drop | |
Relevant QA's: | performance | |
details | Stimulus source: | application |
Stimulus: | request modifying file system | |
Environment: | the object has an uncommitted modification and the new request depends on it | |
Artifact: | a record for dependency tracker | |
Response: | immediate execution, no dependency resolution is required | |
Response measure: | roughly the same performance as with CoS disabled | |
Questions: |
Independent request from different client
Scenario: | Independent request from different client | |
Business Goals: | application's performance doesn't drop | |
Relevant QA's: | performance | |
details | Stimulus source: | application |
Stimulus: | request modifying file system | |
Environment: | the object has no uncommitted modifications that the new request depends on | |
Artifact: | a record for dependency tracker | |
Response: | immediate execution, no dependency resolution required | |
Response measure: | roughly the same performance as with CoS disabled | |
Questions: |
Dependent request from different client
Scenario: | Dependent request from different client | |
Business Goals: | prevent recovery failure if a client doesn't reconnect in time | |
Relevant QA's: | availability | |
details | Stimulus source: | application |
Stimulus: | request modifying file system | |
Environment: | the object has an uncommitted modification and the new request depends on it | |
Artifact: | old dependency records are released, a new one is created | |
Response: | the server resolves the dependency by flushing uncommitted changes and suspends the current operation until the commit event | |
Response measure: | performance degrades compared to non-CoS operation | |
Questions: |
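A toy model of this scenario may help. It assumes per-object isolation and a synchronous `commit()` that stands in for "flush and wait for the commit event"; the class and method names are hypothetical and not taken from the Lustre code.

```python
class CosServer:
    """Toy model of a server with Commit-on-Share; all names are hypothetical."""

    def __init__(self):
        self.uncommitted = {}   # object -> client holding the uncommitted change
        self.commits = 0        # number of forced commits, for illustration

    def commit(self):
        """Stand-in for flushing all cached changes to persistent storage."""
        self.uncommitted.clear()
        self.commits += 1

    def modify(self, client, obj):
        """Execute a modifying request; return True if it waited for a commit."""
        owner = self.uncommitted.get(obj)
        waited = False
        if owner is not None and owner != client:
            # Dependent request from a different client: resolve the
            # dependency before executing the request, never after.
            self.commit()
            waited = True
        # Any old dependency records were released by the commit;
        # record the new uncommitted change.
        self.uncommitted[obj] = client
        return waited
```

In this model a request from the same client, or a request against an object without uncommitted changes, executes immediately, which matches the two performance scenarios above.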
Set of independent requests
Scenario: | Set of independent requests | |
Business Goals: | allow a few independent requests against the same object to coexist | |
Relevant QA's: | performance, availability | |
details | Stimulus source: | application |
Stimulus: | a few requests modifying the file system | |
Environment: | a few clients issue requests against the same object | |
Artifact: | a few records for the dependency tracker | |
Response: | immediate execution, no dependency resolution required, but each request must be checked to confirm that it doesn't depend on any other request in the set | |
Response measure: | performance doesn't degrade significantly | |
Questions: | is it really a requirement for current CoS? |
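One way to picture the per-request check under fine-grained isolation is sketched below. It assumes each change can be described by a `(client, object, part)` tuple, where `part` is a hypothetical sub-object key such as a directory entry name; this page does not define what the fine-grained unit actually is. Changes by the same client are skipped, following the "dependent request from same client" scenario.

```python
def conflicts(new, pending):
    """Return the pending changes that the new change depends on.

    Each change is a (client, object, part) tuple. `part` is a hypothetical
    sub-object key; two changes to the same object are treated as dependent
    only if they also touch the same part.
    """
    client, obj, part = new
    return [
        p for p in pending
        if p[0] != client and p[1] == obj and p[2] == part
    ]
```

An empty result means the request can execute immediately; a non-empty result means those dependencies would have to be resolved first.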
Commit
Scenario: | Commit | |
Business Goals: | block dependent operations for as short a time as possible | |
Relevant QA's: | availability, performance | |
details | Stimulus source: | underlying disk file system |
Stimulus: | all previous changes are committed to storage | |
Environment: | ||
Artifact: | all dependencies become resolved | |
Response: | the dependency tracker gets a new "committed" border and resumes suspended operations | |
Response measure: | Dependency tables don't grow indefinitely | |
Questions: |
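The "committed" border can be sketched as a single number that advances on each commit event. The sketch assumes every change carries a monotonically increasing transaction number (`transno` here) and that the tracker keeps one record per object; both are assumptions of the example, and all names are hypothetical.

```python
class DependencyTracker:
    """Hypothetical tracker keyed by transaction number (transno)."""

    def __init__(self):
        self.committed_border = 0   # highest transno known to be on disk
        self.records = {}           # object -> transno of last uncommitted change
        self.suspended = []         # (transno waited for, operation name)

    def record(self, obj, transno):
        """Remember the latest uncommitted change to an object."""
        self.records[obj] = transno

    def suspend(self, obj, op):
        """Park an operation until the change it depends on is committed."""
        self.suspended.append((self.records[obj], op))

    def on_commit(self, last_committed):
        """Commit event: move the border, drop resolved records, resume waiters."""
        self.committed_border = max(self.committed_border, last_committed)
        border = self.committed_border
        # Records at or below the border are resolved, so the table shrinks.
        self.records = {o: t for o, t in self.records.items() if t > border}
        resumed = [op for t, op in self.suspended if t <= border]
        self.suspended = [(t, op) for t, op in self.suspended if t > border]
        return resumed
```

Dropping every record at or below the border on each commit is what keeps the dependency table from growing indefinitely in this model.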
CoS enable
Scenario: | CoS enable | |
Business Goals: | allow customers to control CoS | |
Relevant QA's: | usability | |
details | Stimulus source: | administrator |
Stimulus: | request through control utility and/or procfs | |
Environment: | CoS is disabled | |
Artifact: | per-server flag enabling CoS | |
Response: | from now on, the dependency tracker checks whether each incoming operation depends on any uncommitted ones | |
Response measure: | dependent operations are slow, but recovery during the server's reconnect window is guaranteed to succeed | |
Questions: | do we need this at run time? if so, we need to take care of possible races here |
CoS disable
Scenario: | CoS disable | |
Business Goals: | Allow customers to control CoS | |
Relevant QA's: | usability | |
details | Stimulus source: | administrator |
Stimulus: | request through control utility and/or procfs | |
Environment: | CoS is enabled | |
Artifact: | per-server flag disabling CoS | |
Response: | from now on, the dependency tracker considers all operations independent | |
Response measure: | dependent operations are fast, but recovery may fail if a client misses the reconnect window | |
Questions: | do we need this at run time? |
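If run-time control is wanted, the enable and disable scenarios could be modelled as a per-server flag consulted by the dependency check. The sketch below is one possible shape, not the actual control interface: the class, its methods, and the choice to guard the flag and the check with one lock (as a simple answer to the race question) are all assumptions.

```python
import threading


class CosSwitch:
    """Hypothetical per-server CoS flag guarding the dependency check."""

    def __init__(self, enabled=False):
        self._lock = threading.Lock()
        self._enabled = enabled
        self._uncommitted = {}   # object -> client with an uncommitted change

    def set_enabled(self, enabled):
        # Taking the same lock as the check means the flag cannot flip in
        # the middle of a dependency check.
        with self._lock:
            self._enabled = enabled

    def note_change(self, client, obj):
        # Changes are recorded even while CoS is off, so that enabling it
        # later still sees existing uncommitted changes.
        with self._lock:
            self._uncommitted[obj] = client

    def must_commit_first(self, client, obj):
        """True if the request depends on another client's uncommitted change."""
        with self._lock:
            if not self._enabled:
                return False   # CoS disabled: all operations count as independent
            owner = self._uncommitted.get(obj)
            return owner is not None and owner != client
```

Recording changes while CoS is disabled is a design choice of this sketch; without it, operations executed just before CoS is enabled would be invisible to the first checks made afterwards.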
- QAS template
Scenario: | ||
Business Goals: | ||
Relevant QA's: | ||
details | Stimulus source: | |
Stimulus: | ||
Environment: | ||
Artifact: | ||
Response: | ||
Response measure: | ||
Questions: |
Memos for HLD
- is it possible to cancel the client's lock and do the sync in parallel?
Questions
- runtime control?