WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Architecture - Commit on Share


Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Summary

Commit-on-Share (CoS) is intended to improve recoverability in environments where clients miss the reconnect window.

Definitions

Dependent transactions
Two transactions are dependent if the second one cannot be executed until the first one is executed.
Isolation
The level at which transactions are considered dependent:
  • per-object -- all changes to the same object are considered dependent,
  • fine-grained -- some changes to the same object are considered independent.
Uncommitted object
An object with cached changes that have not yet been committed to disk.
Dependency resolution
Removing a request from the replay queue by committing it to persistent storage.
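
The two isolation levels can be illustrated with a small predicate. This is only a sketch: the `Change` type, the `parts` field (a stand-in for "which part of the object a change touches") and the function name are assumptions made for illustration, since the page does not say which changes to the same object count as independent.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    PER_OBJECT = "per-object"
    FINE_GRAINED = "fine-grained"


@dataclass(frozen=True)
class Change:
    object_id: int      # object the transaction modifies
    parts: frozenset    # hypothetical: parts of the object the change touches


def dependent(first: Change, second: Change, isolation: Isolation) -> bool:
    """Return True if `second` cannot run until `first` has been executed."""
    if first.object_id != second.object_id:
        return False
    if isolation is Isolation.PER_OBJECT:
        # Per-object: any two changes to the same object are dependent.
        return True
    # Fine-grained (assumed rule): only overlapping parts make changes dependent.
    return bool(first.parts & second.parts)
```

Under this assumed rule, two changes touching different parts of the same object are dependent with per-object isolation and independent with fine-grained isolation.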

Requirements

  1. Provide a mechanism to avoid non-recoverable requests.
  2. The mechanism should be optional (at run time???) so that users can choose between performance and reliability.
  3. No changes to the wire protocol are allowed.
  4. Provide compatibility for old clients.

Use Cases

ID | Quality Attribute | Summary
dependent request from same client | performance | performance shouldn't suffer with CoS enabled
independent request from different client | performance | performance shouldn't suffer with CoS enabled
dependent request from different client | availability | no dependency may remain after request execution; it must be resolved beforehand
set of independent requests | availability, performance | with fine-grained isolation, a request can be independent of all other requests except some
commit | availability, performance | commit event
CoS enable | usability | when and how we can enable CoS
CoS disable | usability | when and how we can disable CoS

Quality Attribute Scenarios

Dependent request from same client

Scenario: Dependent request from same client
Business Goals: application's performance doesn't drop
Relevant QA's: performance
Stimulus source: application
Stimulus: request modifying the file system
Environment: the object has an uncommitted modification and the new request depends on it
Artifact: a record for the dependency tracker
Response: immediate execution; no dependency resolution is required
Response measure: roughly the same performance as with CoS disabled
Questions:


Independent request from different client

Scenario: Independent request from different client
Business Goals: application's performance doesn't drop
Relevant QA's: performance
Stimulus source: application
Stimulus: request modifying the file system
Environment: the object has no modifications that the new request depends on
Artifact: a record for the dependency tracker
Response: immediate execution; no dependency resolution is required
Response measure: roughly the same performance as with CoS disabled
Questions:

Dependent request from different client

Scenario: Dependent request from different client
Business Goals: prevent recovery failure if client doesn't re-connect in time
Relevant QA's: availability
Stimulus source: application
Stimulus: request modifying the file system
Environment: the object has an uncommitted modification and the new request depends on it
Artifact: old dependency records are released and a new one is created
Response: the server resolves the dependency by flushing uncommitted changes and suspends the current operation until the commit event
Response measure: performance degrades compared to non-CoS
Questions:
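
The three request scenarios above can be summarized as one decision in the dependency tracker. The following is a minimal sketch under per-object isolation; the `Tracker` class, its fields and `force_commit` (a stand-in for flushing to disk and waiting for the commit event) are hypothetical names, not Lustre code.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    # object id -> (client id, transaction number) of the last uncommitted change
    uncommitted: dict = field(default_factory=dict)
    committed_transno: int = 0
    next_transno: int = 1
    forced_commits: int = 0

    def force_commit(self):
        """Stand-in for flushing changes and waiting for the commit event."""
        self.forced_commits += 1
        self.committed_transno = self.next_transno - 1
        self.uncommitted.clear()  # old dependency records are released

    def execute(self, client, obj):
        owner = self.uncommitted.get(obj)
        if owner is not None and owner[0] != client:
            # Dependent request from a different client: resolve before executing.
            self.force_commit()
        # Same client, or nothing uncommitted on the object: execute immediately.
        transno = self.next_transno
        self.next_transno += 1
        self.uncommitted[obj] = (client, transno)  # new dependency record
        return transno
```

In this sketch only the cross-client dependent case pays for a commit; the same-client and independent cases only add a record, which is what the performance scenarios ask for.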

Set of independent requests

Scenario: Set of independent requests
Business Goals: allow a few independent requests against the same object to co-exist
Relevant QA's: performance, availability
Stimulus source: application
Stimulus: a few requests modifying the file system
Environment: a few clients issue requests against the same object
Artifact: a few records for the dependency tracker
Response: immediate execution with no dependency resolution required, but each request must be checked to confirm it does not depend on any request in the set
Response measure: performance doesn't degrade significantly
Questions: is it really a requirement for current CoS?
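
A rough sketch of what checking a request against the whole set could look like. It assumes that each uncommitted record carries the set of object parts it touched and that two requests conflict only when those sets overlap; the class and method names are hypothetical.

```python
from collections import defaultdict


class FineGrainedTracker:
    """Keeps several uncommitted records per object (hypothetical structure)."""

    def __init__(self):
        # object id -> list of (client id, frozenset of touched parts)
        self.records = defaultdict(list)

    def depends_on_any(self, client, obj, parts):
        """True if the request conflicts with a record from another client."""
        parts = frozenset(parts)
        return any(
            owner != client and touched & parts
            for owner, touched in self.records[obj]
        )

    def try_execute(self, client, obj, parts):
        """Record and run the request if it is independent of the whole set."""
        if self.depends_on_any(client, obj, parts):
            return False  # caller would have to resolve the dependency first
        self.records[obj].append((client, frozenset(parts)))
        return True
```

The cost of the check grows with the number of records kept per object, which is one reason the response measure only asks that performance not degrade significantly.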

Commit

Scenario: Commit
Business Goals: block dependent operations for as short a time as possible
Relevant QA's: availability, performance
Stimulus source: underlying disk file system
Stimulus: all previous changes are committed to storage
Environment:
Artifact: all dependencies become resolved
Response: the dependency tracker gets a new "committed" border and resumes suspended operations
Response measure: Dependency tables don't grow indefinitely
Questions:
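
The commit scenario can be sketched as follows, assuming dependency records are keyed by transaction number and the commit event reports the last committed transaction. `DependencyTable` and its methods are illustrative names only.

```python
class DependencyTable:
    def __init__(self):
        self.records = {}     # transaction number -> object id
        self.committed = 0    # the "committed" border
        self.suspended = []   # (transaction number waited for, resume callback)

    def add(self, transno, obj):
        self.records[transno] = obj

    def suspend(self, waits_for, resume):
        self.suspended.append((waits_for, resume))

    def on_commit(self, last_committed):
        """Advance the border, drop resolved records, resume waiting operations."""
        self.committed = max(self.committed, last_committed)
        # Records at or below the border are resolved, so the table stays bounded.
        self.records = {t: o for t, o in self.records.items() if t > self.committed}
        ready = [r for w, r in self.suspended if w <= self.committed]
        self.suspended = [(w, r) for w, r in self.suspended if w > self.committed]
        for resume in ready:
            resume()
```

Dropping every record at or below the border on each commit is what keeps the table from growing indefinitely in this sketch.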

CoS enable

Scenario: CoS enable
Business Goals: allow customers to control CoS
Relevant QA's: usability
Stimulus source: administrator
Stimulus: request through control utility and/or procfs
Environment: CoS is disabled
Artifact: per-server flag enabling CoS
Response: from now on, the dependency tracker checks whether each incoming operation depends on any uncommitted ones
Response measure: dependent operations are slow, but recovery during the server's reconnect window is guaranteed to succeed
Questions: do we need this at run time? If so, we need to take care of possible races here

CoS disable

Scenario: CoS disable
Business Goals: Allow customers to control CoS
Relevant QA's: usability
Stimulus source: administrator
Stimulus: request through control utility and/or procfs
Environment: CoS is enabled
Artifact: per-server flag disabling CoS
Response: from now on, the dependency tracker considers all operations independent
Response measure: dependent operations are fast, but recovery may fail if a client misses the reconnect window
Questions: do we need this at run time?
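
The enable and disable scenarios amount to a per-server flag that gates the dependency check. A minimal sketch, with a hypothetical `Server` class and no real tunable name assumed; it does not address the races mentioned in the Questions line.

```python
class Server:
    """Hypothetical server state holding the per-server CoS flag."""

    def __init__(self):
        self.cos_enabled = False

    def set_cos(self, enabled):
        # Stand-in for the administrator's request via control utility or procfs.
        self.cos_enabled = bool(enabled)

    def needs_resolution(self, request_client, uncommitted_owner):
        """Decide whether a request must wait for a commit before executing."""
        if not self.cos_enabled:
            return False  # CoS disabled: all operations are treated as independent
        return uncommitted_owner is not None and uncommitted_owner != request_client
```

Here `uncommitted_owner` is the client that made the last uncommitted change to the object, or `None` if there is none.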


QAS template
Scenario:
Business Goals:
Relevant QA's:
Stimulus source:
Stimulus:
Environment:
Artifact:
Response:
Response measure:
Questions:

Memos for HLD

  1. Is it possible to cancel the client's lock and do the sync in parallel?

Questions

  • runtime control?