Architecture - Sub Tree Locks
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Summary
A subtree lock is a lock on a directory that protects the entire namespace (or a part of it) rooted at that directory. Subtree locks are intended to be optimal for workloads where clients work in isolated directories, and not to make things worse for highly contended workloads, where they fall back to the current client-server locking protocol.
Definitions
- STL
- Sub tree lock
- strong STL
- an STL that invalidates all conflicting locks inside the subtree. It is of little practical use because of its high acquisition and cancellation latencies
- weak STL
- an STL that delays lock conflict resolution until the STL holder actually accesses (fetches into its cache) a conflicting object
- extra weak STL, EW STL
- an optimization of the weak STL mode in which the STL holder may respond to a BAST by dropping the object from its cache.
- path revalidation
- scanning the file name components up to the root for possible conflicts with STL locks
Requirements
- Performance
- reduce lock RPC traffic for STL-locked objects.
- Scalability
- reduce the load on the DLM server and the memory consumption on servers and clients
- Correctness
- provide a correct interaction between STLs and ordinary DLM locks.
- Usability
- be usable by other components (WBC, disconnected operations).
Details
Strong vs Weak STL
Strong STL has two disadvantages. First, it is too strong: acquiring it immediately affects all locks below it and might force large caches to flush. Second, the Strong STL approach requires the ability to find all conflicting locks below an STL. Even in the non-CMD case this looks like a resource-hungry task. That makes Weak STL the primary candidate for implementation. Below in this document, "STL" or "subtree lock" means Weak STL.
What does a subtree lock protect
A subtree lock on a directory explicitly protects the directory itself (both attributes and body). All other objects in the namespace below it are protected unless they are open files, hardlinked files, mount points, or locked by other clients.
What does a subtree lock not protect
Open files, hardlinked files, mount points and locked objects are not protected by a subtree lock. In all of these cases except mount points, the subtree lock owner has to obtain an ordinary lock on the object.
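The two paragraphs above amount to a simple predicate on an object. The following is a minimal sketch of that rule, not Lustre code: the `FsObject` record, its field names, and the `covered_by_stl` function are hypothetical and exist only for illustration. It also assumes that an ordinary lock held by the STL holder itself does not exclude the object, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FsObject:
    """Hypothetical view of the per-object state the rule depends on."""
    name: str
    is_open: bool = False
    is_hardlinked: bool = False
    is_mount_point: bool = False
    # Clients currently holding ordinary (non-STL) locks on this object.
    lock_holders: Set[str] = field(default_factory=set)


def covered_by_stl(obj: FsObject, stl_holder: str) -> bool:
    """Return True if obj needs no lock of its own under stl_holder's STL."""
    if obj.is_open or obj.is_hardlinked or obj.is_mount_point:
        return False
    # Assumption: only locks held by *other* clients exclude the object.
    return not (obj.lock_holders - {stl_holder})
```

When the predicate is false, the STL holder would have to obtain an ordinary lock on the object (mount points excepted, as noted above).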
Subtree lock acquiring policy
Both the server and the client contribute to the policy.
STL locking rules
- any lock (STL or non-STL) can be acquired after a lookup from the fs root or after a successful path revalidation procedure.
- when an STL holder accesses hardlinked files or objects under conflicting ordinary locks, the thread falls back to ordinary lock mode (non-STL).
- when an STL holder's lookup operation crosses an ordinary-locked directory, the STL stops working under that directory and the thread should continue with ordinary locks.
- taking a lock on a parent directory when starting from an ordinary lock, or leaving an STL-protected area, requires revoking all conflicting STL locks above. The revalidation may stop when an ordinary directory lock is met.
- the holder of a non-STL directory lock can perform lookups and take more ordinary locks under the directory.
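As an illustration of the third rule, the sketch below walks a lookup path downwards from the STL root and reports, for each component, whether the STL still applies or ordinary locks are needed. It is a simplified model under stated assumptions: paths are plain strings, `locked_by_others` is a hypothetical set of directories on which other clients hold ordinary locks, and hardlinks and open files are ignored.

```python
from typing import Iterable, List, Set, Tuple


def lookup_modes(path_from_stl_root: Iterable[str],
                 locked_by_others: Set[str]) -> List[Tuple[str, str]]:
    """Return (component, mode) pairs, where mode is "stl" or "ordinary".

    Once the lookup crosses a directory that another client holds an
    ordinary lock on, that directory and everything below it is handled
    with ordinary locks.
    """
    modes: List[Tuple[str, str]] = []
    under_stl = True
    for component in path_from_stl_root:
        if component in locked_by_others:
            under_stl = False  # the STL stops working from here downwards
        modes.append((component, "stl" if under_stl else "ordinary"))
    return modes
```

For an STL on "/a/b" and an ordinary lock held by another client on "/a/b/c", a lookup of "/a/b/c/d" would use the STL only for "/a/b" and ordinary locks for the two lower components.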
Path revalidation
A procedure that recovers an object's full path and guarantees (at its completion) that there are no STL locks above the object.
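A minimal sketch of such a procedure follows. It is not the Lustre implementation: the `parent` map, the `stl_holders` map and the `ordinary_locked_dirs` set are hypothetical stand-ins for server-side state, every STL held by another client is treated as conflicting, and the early stop at an ordinary directory lock follows the locking rule stated earlier.

```python
from typing import Dict, List, Optional, Set, Tuple


def revalidate_path(obj: str,
                    requester: str,
                    parent: Dict[str, Optional[str]],
                    stl_holders: Dict[str, Set[str]],
                    ordinary_locked_dirs: Set[str]
                    ) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Walk from obj towards the root and collect STLs to revoke.

    Returns the ancestors visited (nearest first) and the
    (directory, client) pairs whose STLs conflict with the requester.
    """
    visited: List[str] = []
    to_revoke: List[Tuple[str, str]] = []
    current = parent.get(obj)
    while current is not None:
        visited.append(current)
        for client in sorted(stl_holders.get(current, set()) - {requester}):
            to_revoke.append((current, client))
        if current in ordinary_locked_dirs:
            break  # an ordinary directory lock ends the revalidation
        current = parent.get(current)
    return visited, to_revoke
```

The caller would revoke every returned STL before the requested lock is granted.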
Use cases
ID | Quality attribute | Summary |
---|---|---|
Acquire STL | performance, scalability | traverse path and get subtree lock for the last directory path component |
Object access under subtree lock | usability | subtree lock "expands" as client does LOOKUP/GETATTR within the subtree |
Concurrent lookup | usability | a PR STL holder and a non-STL holder concurrently look up the same object |
Access to ".." | usability | a client accesses ".." under another client's STL |
CMD | usability | the directory protected by an STL and its children are spread over several MDS servers |
Policy | usability | the MDS applies a lock granting policy that takes into account the client's request and its own considerations |
Callback to lock | usability | client 1 holds an ordinary lock on an object X within a subtree of directory Y, client 2 acquires subtree lock on directory Y |
Callback to subtree lock | usability | client 1 holds a subtree lock on a directory X, client 2 locks object within the directory X |
Subtree lock and migration | usability | migration involves data which might be cached under subtree lock |
Persistent subtree lock | usability | persistent subtree lock is used for disconnected operations |
Split subtree lock | performance | on lock contention, instead of yielding the whole lock, take locks on subdirs; the client contributes to the lock split policy |
Subtree lock and proxy | recovery | if a proxy server is disconnected and its top-level STL gets broken, some of its sub-STLs could survive and help the subsequent cache revalidation procedure |
Quality Attribute Scenarios
Acquire STL
Scenario: | traverse path and get subtree lock for the last directory path component | |
Business Goals: | reduce DLM overhead | |
Relevant QA's: | performance, scalability | |
details | Stimulus: | filesystem operation system call, mkdir, open, etc |
Stimulus source: | client application | |
Environment: | normal use | |
Artifact: | DLM | |
Response: | the client traverses the path element by element, requesting ordinary locks from the MDS for every path element except the last directory component, for which it requests an STL. If the MDS notices a lock conflict, it sends a BAST to the conflicting client and then grants the requested lock. Conflicting locks that might exist below the STL root are not checked. | |
Response measure: | the client is able to access files under the STL without any additional locking (except for already locked files, open and hardlinked files, and mount points) | |
Questions: | ||
Issues: | the client does not hold any other subtree lock on the path it traverses |
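The traversal described in the response can be sketched as a lock plan for a directory path. This is only an illustration: `lock_plan` is a hypothetical helper, paths are plain "/"-separated strings, and the filesystem root itself is left out of the plan.

```python
from typing import List, Tuple


def lock_plan(dir_path: str) -> List[Tuple[str, str]]:
    """Return (path prefix, lock kind) pairs for a directory path.

    Every component gets an ordinary lock except the last directory
    component, for which an STL is requested.
    """
    parts = [p for p in dir_path.split("/") if p]
    plan: List[Tuple[str, str]] = []
    prefix = ""
    for index, part in enumerate(parts):
        prefix += "/" + part
        kind = "stl" if index == len(parts) - 1 else "ordinary"
        plan.append((prefix, kind))
    return plan
```

Nothing in the plan inspects objects below the last component, which mirrors the statement that conflicting locks below the STL root are not checked at acquisition time.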
Object access under subtree lock
Scenario: | client does LOOKUP/GETATTR within the subtree protected by STL | |
Business Goals: | to maintain STL and ordinary locks consistency | |
Relevant QA's: | usability | |
details | Stimulus: | filesystem operation system call, stat, chmod, etc |
Stimulus source: | client application | |
Environment: | the client holds STL and accesses an object within the subtree, the object is not cached on the client | |
Artifact: | filesystem object, DLM | |
Response: | the client sends a getattr request to an MDS; the MDS looks up the object and, if it is protected by the STL (that is, it is not open, not hardlinked, not a mount point, and not locked by another client), returns the attributes to the client. Otherwise, the ordinary lock acquisition scheme is applied. | |
Response measure: | consistent cache of objects under STL and ordinary locks | |
Questions: | when an object is protected by an STL, does the server not record that the STL holder fetched the object? | |
Issues: |
Concurrent lookup
Scenario: | A PR STL holder and a non-STL (ordinary lock) holder concurrently look up the same object;
client application C1 has CWD "/a/b/c" and holds a PR lock on "/a/b/c"; another client C2 holds a PR STL on "/a/b" and is doing a lookup for "/a/b/c/d"; | |
Business Goals: | to maintain STL and ordinary locks consistency | |
Relevant QA's: | usability | |
details | Stimulus: | the non-STL holder creates the file "d/f". |
Stimulus source: | client application | |
Environment: | normal | |
Artifact: | filesystem object, DLM, MDS | |
Response: | When the STL holder sees the ordinary lock on "/a/b/c", it continues the lookup with ordinary locks; the non-STL holder takes a PW lock on "/a/b/c/d". C1 and C2 then find a conflict between their ordinary locks on "/a/b/c/d", and one of them (say C1) revokes the other's lock, or vice versa. | |
Response measure: | there should be a lock conflict between C1 and C2 for the "/a/b/c/d" | |
Questions: | ||
Issues: | If C2 did not use ordinary locks, C1 and C2 could (incorrectly) take conflicting locks on "/a/b/c/d": C1 gets the object attributes and finds nothing to indicate that "/a/b/c/d" is protected under an STL, and C2 takes an ordinary PW lock on the same object. |
Access to ".."
Scenario: | a client accesses ".." which is under STL held by another client | |
Business Goals: | take a lock correctly regarding STL lock above | |
Relevant QA's: | usability, correctness | |
details | Stimulus: | filesystem operation system call |
Stimulus source: | client application | |
Environment: | a client holds subtree lock on a directory, another client stays inside of that directory and goes up via ".." | |
Artifact: | the ".." directory | |
Response: | the client asks the server to perform path revalidation before taking the lock on ".." and to revoke all conflicting STL locks above the object. The server then grants a lock on ".." | |
Response measure: | no conflicts with STL locks are missed. | |
Questions: | ||
Issues: |
CMD
Scenario: | the filesystem has a cluster of metadata servers | |
Business Goals: | ||
Relevant QA's: | ||
details | Stimulus: | filesystem configuration |
Stimulus source: | filesystem administrator | |
Environment: | normal use | |
Artifact: | ||
Response: | the subtree lock is granted on the MDS that stores the root directory to be locked. There is nothing to do on the other MDSs. | |
Response measure: | ||
Questions: | ||
Issues: |
- interaction between subtree locks and CMD: what happens when a subtree lock is given on a directory whose subdirectories live on other servers.
Policy
this is about when _not_ to grant subtree lock
Scenario: | client requests subtree lock, server decides whether to grant it | |
Business Goals: | avoid lock acquisition ping-pong effect | |
Relevant QA's: | performance | |
details | Stimulus: | client request |
Stimulus source: | client application | |
Environment: | normal use | |
Artifact: | DLM | |
Response: | there is a client policy and a server policy: the client decides to ask for a subtree lock on the last directory path component, and the server decides whether to grant it based on the history of accesses to the object | |
Response measure: | fewer callbacks between server and clients | |
Questions: | ||
Issues: |
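One way to picture the server side of this policy is a per-directory history of recent callbacks, with STLs refused while a directory looks contended. The sketch below is an assumption-laden illustration, not the designed policy: the class name, the sliding window and both thresholds are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from collections import deque
from typing import Deque, Dict


class StlGrantPolicy:
    """Hypothetical policy: refuse STLs on directories with recent callbacks."""

    def __init__(self, window_sec: float = 10.0, max_callbacks: int = 3) -> None:
        self.window_sec = window_sec        # assumed length of the history window
        self.max_callbacks = max_callbacks  # assumed contention threshold
        self._callbacks: Dict[str, Deque[float]] = {}

    def record_callback(self, directory: str, now: float) -> None:
        """Remember that a callback was sent for a lock on this directory."""
        self._callbacks.setdefault(directory, deque()).append(now)

    def should_grant_stl(self, directory: str, now: float) -> bool:
        """Grant only if few callbacks happened within the window."""
        history = self._callbacks.get(directory, deque())
        while history and now - history[0] > self.window_sec:
            history.popleft()  # forget callbacks older than the window
        return len(history) < self.max_callbacks
```

A refused request would simply be served with an ordinary lock, so a contended directory never enters the grant and revoke ping-pong the business goal mentions.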
Callback to lock
Scenario: | STL to ordinary lock conflict | |
Business Goals: | Achieving lock correctness | |
Relevant QA's: | usability | |
details | Stimulus: | a getattr request for a fs object under STL |
Stimulus source: | a client | |
Environment: | An STL lock held by a client, another ordinary lock inside STL | |
Artifact: | the server, DLM, the ordinary lock | |
Response: | the STL owner falls back to ordinary lock mode and a BAST is sent to the lock owner | |
Response measure: | the ordinary lock owner gets BAST | |
Questions: | ||
Issues: |
Callback to subtree lock
Scenario: | one client holds subtree lock, another client accesses an object in the namespace protected by the subtree lock | |
Business Goals: | Achieving lock correctness | |
Relevant QA's: | usability | |
details | Stimulus: | client request |
Stimulus source: | client application | |
Environment: | ||
Artifact: | DLM | |
Response: | the other client has to assume that the subtree lock holder has cached all objects in the subtree and has to send a BAST to the subtree lock holder for the particular object it needs; the subtree lock holder has to flush the object if it was changed and drop it from its cache, so that it can be cached again later if necessary. Alternatively, the subtree lock can be split into subtree locks on subdirectories | |
Response measure: | the requesting client caches the objects it needs; the subtree lock holder has dropped them | |
Questions: | ||
Issues: | one can't get a new lock without subtree traversal, therefore this can only happen when trying to lock the root of the subtree |
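The holder's side of this exchange (flush if changed, then de-cache) can be sketched as follows. This is an illustrative model only: `StlHolderCache` and its methods are hypothetical names, and appending to the `flushed` list stands in for writing the changes back to the server.

```python
from typing import Dict, List


class StlHolderCache:
    """Hypothetical client cache of objects fetched under an STL."""

    def __init__(self) -> None:
        self.entries: Dict[str, bool] = {}  # object -> dirty flag
        self.flushed: List[str] = []        # stand-in for write-back to the server

    def cache(self, obj: str, dirty: bool = False) -> None:
        self.entries[obj] = dirty

    def on_bast(self, obj: str) -> bool:
        """Handle a BAST for one object; return True if it was cached."""
        if obj not in self.entries:
            return False  # never fetched, nothing to give up
        if self.entries.pop(obj):
            self.flushed.append(obj)  # changed objects are flushed first
        return True
```

The rest of the cache stays intact, and the dropped object can be fetched and cached again later if the holder needs it.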
Subtree lock and migration
Scenario: | data protected by STL is being migrated | |
Business Goals: | correct migration | |
Relevant QA's: | usability | |
details | Stimulus: | STL revocation request |
Stimulus source: | migration agent | |
Environment: | Clients use STLs and may have dirty caches; a running migration encounters an STL | |
Artifact: | STL locks, DLM | |
Response: | Client flushes caches and cancels STLs | |
Response measure: | STLs are revoked | |
Questions: | ||
Issues: |
Persistent subtree lock
persistent subtree lock is granted after commit
Scenario: | Acquiring persistent STL lock | |
Business Goals: | support of disconnected operations | |
Relevant QA's: | availability | |
details | Stimulus: | a lock request for a persistent STL |
Stimulus source: | a client | |
Environment: | a Lustre cluster | |
Artifact: | a server, DLM, the directory object | |
Response: | the server grants the PSTL after properly logging the lock operation on disk and only when the underlying fs transaction is committed | |
Response measure: | PSTL survives server crash | |
Questions: | ||
Issues: |
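The ordering constraint in the response (log first, grant only after commit) can be sketched with a toy in-memory journal. Everything here is hypothetical: the class, its method names, and the idea that the commit notification carries the number of records now on disk are assumptions made for the example.

```python
from typing import List, Tuple


class PersistentStlServer:
    """Toy model: a PSTL is granted only once its log record is committed."""

    def __init__(self) -> None:
        self.journal: List[Tuple[str, str]] = []       # logged lock operations
        self.pending: List[Tuple[int, str, str]] = []  # (record number, client, dir)
        self.granted: List[Tuple[str, str]] = []

    def request_pstl(self, client: str, directory: str) -> None:
        """Log the lock operation; the grant is deferred until commit."""
        self.journal.append((client, directory))
        self.pending.append((len(self.journal), client, directory))

    def on_commit(self, committed_records: int) -> None:
        """Grant every pending PSTL whose record is among the committed ones."""
        still_pending: List[Tuple[int, str, str]] = []
        for record_no, client, directory in self.pending:
            if record_no <= committed_records:
                self.granted.append((client, directory))
            else:
                still_pending.append((record_no, client, directory))
        self.pending = still_pending
```

Because no client ever sees a grant whose record is not on disk, a server crash cannot leave a client holding a PSTL the server has forgotten.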
Split subtree lock
Scenario: | releasing an STL lock due to another client request and keeping STLs on the directory children | |
---|---|---|
Business Goals: | avoid whole STL lock flushing | |
Relevant QA's: | performance | |
details | Stimulus: | a lock request conflicting with the STL lock |
Stimulus source: | a client | |
Environment: | a directory under an STL lock; another client accesses the directory with a conflicting ordinary lock | |
Artifact: | STL lock, DLM | |
Response: | Using information from the lock request about what is targeted inside the STL, and information from the client about which sub-STLs are more important to it, the server splits the STL in the most optimal way. | |
Response measure: | the lock is granted and the STL gets split | |
Questions: | ||
Issues: |
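A possible shape for the split decision is sketched below. It is not the designed algorithm: `split_stl`, the integer priorities supplied by the client, and the `max_sub_stls` cap are all hypothetical, and the sketch simply keeps the highest-priority children that do not contain the target of the conflicting request.

```python
from typing import Dict, List


def split_stl(children: List[str],
              target: str,
              client_priority: Dict[str, int],
              max_sub_stls: int) -> List[str]:
    """Pick child directories to keep under sub-STLs when the parent STL is released.

    The child containing the conflicting request's target is excluded;
    the rest are ranked by the client's stated priority (higher first).
    """
    candidates = [
        child for child in children
        if not (target == child or target.startswith(child + "/"))
    ]
    candidates.sort(key=lambda child: (-client_priority.get(child, 0), child))
    return candidates[:max_sub_stls]
```

For children "/d/x", "/d/y" and "/d/z" with a conflicting request on "/d/y/file", the holder would give up "/d/y" and keep sub-STLs on the other two, ordered by its own priorities.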
Subtree lock and proxy
Scenario: | an STL owner (a proxy server) with dirty cached data was disconnected for some time and now reconnects to the cluster | |
Business Goals: | fast revalidation of the proxy cache content; a better cache revalidation result (more cached data saved) | |
Relevant QA's: | performance, availability | |
details | Stimulus: | proxy cache reconnect event |
Stimulus source: | proxy cache | |
Environment: | the proxy server had an STL lock protecting the dirty cached data, then the proxy disconnected and the lock was broken and transformed into still valid sub-STL locks | |
Artifact: | the proxy cache server | |
Response: | the proxy cache is informed how the lock was transformed and starts a cache revalidation procedure for the parts of the STL lock that were lost during the disconnection period | |
Response measure: | cache integration speed, minimum data loss | |
Questions: | ||
Issues: |
Implementation details
1. inode protected with subtree lock (during lookup) protects all objects
- if you take a subtree lock on MDT, everything underneath is now unreachable.
- may already be existing locks under subtree, can't expand them up.
- if you haven't getattr on an element of subtree, there may be a lock on it already
2. if caching under an STL hits an open file, open dir, hardlink or mount point, an ordinary lock is granted.
- this is because once a file is open, client has fid access, doesn't need to traverse anymore, so it will not see that file is protected by subtree lock.
3. any use of ".." on the client requires path revalidation: either a new fs method on the client, or done on the server (harder on the server with CMD)
- this is because a client sitting in a subdir under an STL held by a different client and doing, for example, stat(..), does not traverse through the STL: the client knows the fid, so the stat is done by fid and bypasses name traversal, and the conflict with the STL is not seen. Path revalidation (on the server?) is needed.
4. when storage management operates by FID on directories, all subtree locks are revoked
- object is cached on client without server knowing it
- or maybe migration is fine, we just mark it dirty after we flush subtree lock
- layout lock bit must be protected? client must lock layout before using during migration - must update it on the mds anyhow
???5. during migration STL cached data is "layout" invalidated (everything with a new layout must be flushed) - and data,
- on all clients (broadcast!) (degraded performance during migration)
6. Every lookup based on stl includes fid of STL root??
7. If stl1 is called back
- flush update cache
- take stl(i)'s on children of stl1, callback on stl1 then client requests N stli's for children with N < ...
- release stl1
- (client policy)
- do this so that e.g. ls -l on parent can finish without having to flush big proxy cache
8. collect access statistics on server in order to avoid subtree locks on highly contended resources.
- If stl(i) sees cb's > x msec then no more stl(i)'s (server)
9. persistent STL is granted after commit