Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information. 
Summary
Commit-on-Share (CoS) is intended to improve recoverability in environments where clients miss the reconnect window.
Definitions
- Dependent transactions: two transactions are dependent if the second one cannot be executed until the first one has been executed.
- Isolation: defines the level at which we consider transactions dependent:
  - per-object -- all changes to the same object are considered dependent;
  - fine-grained -- some changes to the same object are considered independent.
- Uncommitted object: an object with changes that are cached but not yet committed to disk.
- Dependency resolution: removing a request from the replay queue by committing it to persistent storage.
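A minimal C sketch of the definitions above under per-object isolation. It assumes that the server assigns a transaction number (transno) to each change and that the disk file system reports a single "last committed" transno. The `cos_txn` structure and both helpers are hypothetical names for illustration, not existing Lustre code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one modifying transaction. */
struct cos_txn {
	uint64_t object_id;  /* object (e.g. inode) the transaction modified */
	uint64_t client_id;  /* client that issued the request */
	uint64_t transno;    /* transaction number assigned by the server */
};

/* A transaction is uncommitted while its transno is above the last
 * transno the backing file system reports as committed. */
static bool cos_txn_uncommitted(const struct cos_txn *t,
				uint64_t last_committed)
{
	return t->transno > last_committed;
}

/* Per-object isolation: a new request against the same object depends on
 * an earlier transaction for as long as that transaction is uncommitted.
 * Committing it (dependency resolution) removes the dependency. */
static bool cos_depends_per_object(const struct cos_txn *earlier,
				   uint64_t new_object_id,
				   uint64_t last_committed)
{
	return earlier->object_id == new_object_id &&
	       cos_txn_uncommitted(earlier, last_committed);
}
```

Under fine-grained isolation, the object-id comparison would be refined so that some changes to the same object do not count as dependent.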
Requirements
- Provide a mechanism to avoid non-recoverable requests.
- The mechanism must be optional (possibly switchable at runtime; still an open question) so that users can choose between performance and reliability.
- No changes to the wire protocol are allowed.
- Provide compatibility with old clients.
Use Cases
| ID | Quality Attribute | Summary | 
| dependent request from same client | performance | performance shouldn't suffer with CoS enabled |
| independent request from different client | performance | performance shouldn't suffer with CoS enabled |
| dependent request from different client | availability | no dependency may remain after the request executes; it must be resolved beforehand |
| set of independent requests | availability, performance | with fine-grained isolation, a request can be independent of all other requests except a specific one |
| commit | availability, performance | commit event |
| CoS enable | usability | when and how CoS can be enabled |
| CoS disable | usability | when and how CoS can be disabled |
Quality Attribute Scenarios
Dependent request from same client
| Scenario: | Dependent request from same client | 
| Business Goals: | application's performance doesn't drop | 
| Relevant QA's: | performance | 
| details | Stimulus source: | application | 
| Stimulus: | request modifying the file system |
| Environment: | the object has an uncommitted modification and the new request depends on it |
| Artifact: | a record in the dependency tracker |
| Response: | immediate execution; no dependency resolution is required, since the same client replays both requests in order |
| Response measure: | roughly the same performance as with CoS disabled |
| Questions: |  | 
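The check implied by this scenario and the two that follow can be sketched as a single decision function. This is a hypothetical C sketch: `cos_object`, `cos_check()` and the `cos_action` values are illustrative names. It assumes per-object isolation where only the last modifier and its transno are kept per object. The same-client case executes immediately on the assumption that a client replays its own requests in transno order, which preserves such a dependency during recovery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the dependency check for an incoming request. */
enum cos_action {
	COS_EXEC_NOW,         /* execute immediately, no resolution needed */
	COS_COMMIT_AND_WAIT,  /* force a commit and suspend until it completes */
};

/* Hypothetical per-object state: last modifier and its transno. */
struct cos_object {
	uint64_t last_client;
	uint64_t last_transno;
};

static enum cos_action cos_check(const struct cos_object *obj,
				 uint64_t client_id, uint64_t last_committed,
				 bool cos_enabled)
{
	if (!cos_enabled)
		return COS_EXEC_NOW;        /* all requests treated as independent */
	if (obj->last_transno <= last_committed)
		return COS_EXEC_NOW;        /* nothing uncommitted to depend on */
	if (obj->last_client == client_id)
		return COS_EXEC_NOW;        /* same client replays both in order */
	return COS_COMMIT_AND_WAIT;         /* depends on another client's change */
}
```

A service thread would call `cos_check()` before executing a modifying request. On `COS_COMMIT_AND_WAIT`, it would start a flush and suspend the request until the commit event.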
Independent request from different client
| Scenario: | Independent request from different client | 
| Business Goals: | application's performance doesn't drop | 
| Relevant QA's: | performance | 
| details | Stimulus source: | application | 
| Stimulus: | request modifying the file system |
| Environment: | the object has no uncommitted modifications that the new request depends on |
| Artifact: | a record in the dependency tracker |
| Response: | immediate execution, no dependency resolution required |
| Response measure: | roughly the same performance as with CoS disabled |
| Questions: |  | 
Dependent request from different client
| Scenario: | Dependent request from different client | 
| Business Goals: | prevent recovery failure if client doesn't re-connect in time | 
| Relevant QA's: | availability |
| details | Stimulus source: | application |
| Stimulus: | request modifying the file system |
| Environment: | the object has an uncommitted modification and the new request depends on it |
| Artifact: | old dependency records are released and a new one is created |
| Response: | the server resolves the dependency by flushing uncommitted changes and suspends the current operation until the commit event |
| Response measure: | performance degrades compared to non-CoS operation |
| Questions: |  | 
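The "suspend until commit" response could be implemented with a condition variable that the commit callback signals. This is a minimal sketch assuming POSIX threads and a single, monotonically increasing "last committed" transno. `cos_tracker`, `cos_wait_committed()` and `cos_committed()` are hypothetical names, and the code that starts the flush is not shown.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical tracker state shared by service threads and the commit
 * callback: the highest transno known to be on disk. */
struct cos_tracker {
	pthread_mutex_t lock;
	pthread_cond_t  committed_cv;
	uint64_t        last_committed;
};

static void cos_tracker_init(struct cos_tracker *t, uint64_t committed)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->committed_cv, NULL);
	t->last_committed = committed;
}

/* Called by a service thread that found a dependency on another client's
 * uncommitted change: block until that transno is committed. */
static void cos_wait_committed(struct cos_tracker *t, uint64_t transno)
{
	pthread_mutex_lock(&t->lock);
	while (t->last_committed < transno)
		pthread_cond_wait(&t->committed_cv, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Called from the (hypothetical) commit callback: advance the committed
 * boundary, never move it backwards, and wake every waiting thread. */
static void cos_committed(struct cos_tracker *t, uint64_t transno)
{
	pthread_mutex_lock(&t->lock);
	if (transno > t->last_committed)
		t->last_committed = transno;
	pthread_cond_broadcast(&t->committed_cv);
	pthread_mutex_unlock(&t->lock);
}
```

A broadcast is used rather than a signal because several suspended requests may be waiting on different transnos covered by the same commit.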
Set of independent requests
| Scenario: | Set of independent requests | 
| Business Goals: | Allow several independent requests against the same object to coexist |
| Relevant QA's: | performance, availability |
| details | Stimulus source: | application |
| Stimulus: | several requests modifying the file system |
| Environment: | several clients issue requests against the same object |
| Artifact: | several records in the dependency tracker |
| Response: | immediate execution with no dependency resolution required, but each request must be checked to ensure it does not depend on any request in the set |
| Response measure: | performance doesn't degrade significantly |
| Questions: | is this really a requirement for the current CoS? |
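One possible model of fine-grained isolation, assumed here purely for illustration, tags each uncommitted change with a bitmask of the parts of the object it touches. Two changes count as dependent only if their masks overlap. The attribute bits, `cos_rec` and `cos_depends_fine()` are hypothetical. The loop shows why each new request has to be checked against every uncommitted record in the set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bits describing which part of an object a change touches. */
enum {
	COS_ATTR_MODE  = 1u << 0,
	COS_ATTR_SIZE  = 1u << 1,
	COS_ATTR_NAMES = 1u << 2,  /* directory entries */
};

/* Hypothetical record of one change to a single object. */
struct cos_rec {
	uint64_t client_id;
	uint64_t transno;
	uint32_t mask;  /* parts of the object this change touched */
};

/* Fine-grained isolation: a new request depends on an uncommitted record
 * from another client only if both touch an overlapping part of the
 * object. Every record in the set has to be examined. */
static bool cos_depends_fine(const struct cos_rec *recs, size_t nr,
			     uint64_t client_id, uint32_t mask,
			     uint64_t last_committed)
{
	for (size_t i = 0; i < nr; i++) {
		const struct cos_rec *r = &recs[i];

		if (r->transno <= last_committed)
			continue;          /* already committed: resolved */
		if (r->client_id == client_id)
			continue;          /* same client: replayed in order */
		if (r->mask & mask)
			return true;       /* overlapping uncommitted change */
	}
	return false;
}
```

The cost of this check grows with the number of uncommitted records per object, which is relevant to the "doesn't degrade significantly" measure above.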
Commit
| Scenario: | Commit | 
| Business Goals: | Block dependent operations for as short a time as possible |
| Relevant QA's: | availability, performance | 
| details | Stimulus source: | underlying disk file system | 
| Stimulus: | all previous changes are committed to storage | 
| Environment: |  | 
| Artifact: | all dependencies become resolved | 
| Response: | the dependency tracker gets a new "committed" boundary and resumes suspended operations |
| Response measure: | Dependency tables don't grow indefinitely | 
| Questions: |  | 
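On a commit event, the tracker can discard every record at or below the new committed boundary. This keeps the dependency tables bounded. The following is a minimal sketch over a plain array, assuming each record carries the transno it was assigned. `cos_rec` and `cos_prune()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dependency record: issuing client and assigned transno. */
struct cos_rec {
	uint64_t client_id;
	uint64_t transno;
};

/* Drop every record whose transno is at or below the new "committed"
 * boundary, compacting the surviving records to the front of the array
 * in their original order. Returns the number of records kept. */
static size_t cos_prune(struct cos_rec *recs, size_t nr, uint64_t committed)
{
	size_t kept = 0;

	for (size_t i = 0; i < nr; i++)
		if (recs[i].transno > committed)
			recs[kept++] = recs[i];
	return kept;
}
```

For example, pruning records with transnos 5, 9, 12 and 7 at boundary 8 keeps only 9 and 12.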
CoS enable
| Scenario: | CoS enable | 
| Business Goals: | allow customers to control CoS | 
| Relevant QA's: | usability | 
| details | Stimulus source: | administrator | 
| Stimulus: | request through control utility and/or procfs | 
| Environment: | CoS is disabled | 
| Artifact: | per-server flag enabling CoS | 
| Response: | from now on, the dependency tracker checks whether each incoming operation depends on any uncommitted ones |
| Response measure: | dependent operations are slow, but recovery during the server's reconnect window is guaranteed to succeed |
| Questions: | do we need this at run time? If so, we need to handle possible races here |
CoS disable
| Scenario: | CoS disable | 
| Business Goals: | Allow customers to control CoS | 
| Relevant QA's: | usability | 
| details | Stimulus source: | administrator | 
| Stimulus: | request through control utility and/or procfs | 
| Environment: | CoS is enabled | 
| Artifact: | per-server flag disabling CoS | 
| Response: | from now on, the dependency tracker considers all operations independent |
| Response measure: | dependent operations are fast, but recovery may fail if a client misses the reconnect window |
| Questions: | do we need this at run time? |
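For the enable and disable scenarios, one simple approach is a per-server flag that each request reads exactly once, so that the request sees a consistent value even if an administrator toggles CoS mid-flight. This sketch uses C11 atomics. `cos_server`, `cos_set_enabled()` and `cos_snapshot()` are hypothetical, and the procfs or control-utility handler that would call the setter is not shown. The sketch does not by itself answer the open race question: changes made while CoS was disabled may still be uncommitted when it is enabled. Forcing a commit at enable time is one possible way to handle that.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-server CoS flag, toggled at run time by an admin. */
struct cos_server {
	atomic_bool cos_enabled;
};

/* Called from the (hypothetical) administrative handler. */
static void cos_set_enabled(struct cos_server *srv, bool on)
{
	atomic_store(&srv->cos_enabled, on);
}

/* Read the flag once per request and pass the snapshot down, so a single
 * request never sees CoS both enabled and disabled while it runs. */
static bool cos_snapshot(struct cos_server *srv)
{
	return atomic_load(&srv->cos_enabled);
}
```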
QAS template
| Scenario: |  | 
| Business Goals: |  | 
| Relevant QA's: |  | 
| details | Stimulus source: |  | 
| Stimulus: |  | 
| Environment: |  | 
| Artifact: |  | 
| Response: |  | 
| Response measure: |  | 
| Questions: |  | 
Memos for HLD
- Is it possible to cancel the client's lock and perform the sync in parallel?
Questions