WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Architecture - Sub Tree Locks


Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.


A subtree lock (STL) is a lock on a directory that protects the entire namespace, or a part of it, rooted at that directory. Subtree locks are intended to be optimal for workloads where clients work in isolated directories, and not to make things worse under highly contended workloads, where they fall back to the current client-server locking protocol.


Sub tree lock
strong STL
an STL that invalidates all conflicting locks inside the subtree. This is of no practical use because of its high acquisition and cancellation latencies.
weak STL
an STL that delays lock conflict resolution until the STL holder actually accesses (fetches into its cache) a conflicting object.
extra weak STL, EW STL
an optimization of the weak STL mode in which the STL holder may respond to a BAST by dropping the object from its cache.
path revalidation
scanning the file name components up to the root for possible conflicts with STL locks.


  • reduce lock RPC traffic for STL-locked objects.
  • reduce the load on the DLM server and the memory consumption on servers and clients.
  • provide correct interaction between STLs and ordinary DLM locks.
  • be usable by other components (WBC, disconnected operations).


Strong vs Weak STL

Strong STL has two disadvantages. First, it is too strong: acquiring it immediately affects all locks beneath it and may force large caches to flush. Second, the strong STL approach requires the ability to find all conflicting locks beneath an STL. Even in the non-CMD case this looks like a resource-hungry task. That makes weak STL the primary candidate for implementation. In the rest of this document, "STL" or "subtree lock" means weak STL.

What does a subtree lock protect

A subtree lock on a directory explicitly protects the directory itself (both attributes and body). All other objects in the subtree are protected unless they are open files, hardlinked files, mount points, or locked by other clients.

What does a subtree lock not protect

Open files, hardlinked files, mount points and locked objects are not protected by a subtree lock. In all of these cases except mount points, the subtree lock owner has to obtain an ordinary lock on the object.
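
The protection rules in the two sections above can be read as a simple predicate. The following is a minimal sketch, not Lustre code: the `FsObject` type, its fields, and both function names are hypothetical and only mirror the conditions listed in the text.

```python
from dataclasses import dataclass


@dataclass
class FsObject:
    """Hypothetical server-side view of a namespace object (illustrative only)."""
    name: str
    is_directory: bool = False
    open_count: int = 0                  # number of open handles
    link_count: int = 1                  # number of names (hard links)
    is_mount_point: bool = False
    locked_by_other_client: bool = False


def covered_by_stl(obj: FsObject) -> bool:
    """True if an STL on an ancestor directory is enough to protect obj."""
    if obj.open_count > 0:                            # open object
        return False
    if not obj.is_directory and obj.link_count > 1:   # hardlinked file
        return False
    if obj.is_mount_point:
        return False
    if obj.locked_by_other_client:
        return False
    return True


def needs_ordinary_lock(obj: FsObject) -> bool:
    """True if the STL owner must take an ordinary lock on obj.

    Mount points are not protected by the STL, but they are the one
    case in which the text does not require an ordinary lock.
    """
    return not covered_by_stl(obj) and not obj.is_mount_point
```

Under these assumptions, a plain unopened file is covered by the STL, while an open file or a file with two hard links makes `needs_ordinary_lock` return `True`.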

Subtree lock acquiring policy

Both the server and the client contribute to the policy.

STL locking rules

  • any lock (STL or non-STL) can be acquired after a lookup from the fs root or after a successful path revalidation procedure.
  • when an STL holder accesses hardlinked files or objects under conflicting ordinary locks, the thread falls back to ordinary lock mode (non-STL).
  • when an STL holder's lookup operation crosses a directory held under an ordinary lock, the STL stops working under that directory and the thread should continue with ordinary locks.
  • taking a lock on a parent directory when starting from an ordinary lock, or when leaving an STL-protected area, requires revoking all conflicting STL locks above. The revalidation may stop when an ordinary directory lock is met.
  • a holder of a non-STL directory lock can do lookups and take more ordinary locks under the directory.
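
As an illustration of the fallback rules above, the sketch below walks a lookup below an STL root and reports which lock mode applies to each component. It is a simplified model under stated assumptions: `lookup_modes` and its parameters are hypothetical names, and conflicting ordinary locks are represented as a plain set of paths.

```python
def lookup_modes(stl_root, components, ordinary_locked):
    """Return [(path, mode)] for each component looked up below stl_root.

    stl_root        -- path of the directory the caller holds an STL on
    components      -- names looked up below stl_root, in order
    ordinary_locked -- set of paths on which other clients hold ordinary locks
    """
    modes = []
    under_stl = True
    path = stl_root.rstrip("/")
    for name in components:
        path = path + "/" + name
        if path in ordinary_locked:
            # Crossing an object held under an ordinary lock: the STL stops
            # working from here down, so the lookup continues with
            # ordinary locks.
            under_stl = False
        modes.append((path, "stl" if under_stl else "ordinary"))
    return modes
```

For example, with an STL on "/a/b" and another client's ordinary lock on "/a/b/c", a lookup of "c/d" uses ordinary locks for both "/a/b/c" and "/a/b/c/d".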

Path revalidation

A procedure that recovers an object's full path and guarantees, at its completion, that there are no STL locks above the object.
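
A minimal sketch of path revalidation, assuming a hypothetical in-memory `Node` tree with parent pointers. It rebuilds the full path and collects the STLs above the object that would have to be revoked. Following the locking rules, it stops looking for STLs once a directory held under an ordinary lock is met. Whether STLs held by the requester itself count as conflicts is an assumption made here, not something the text specifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical namespace node; the root has name "" and no parent."""
    name: str
    parent: Optional["Node"] = None
    stl_holders: List[str] = field(default_factory=list)
    ordinary_holders: List[str] = field(default_factory=list)


def revalidate_path(obj: Node, requester: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (full_path, stls_to_revoke) for obj.

    stls_to_revoke lists (directory name, client) pairs for STLs above
    obj that are held by clients other than the requester.
    """
    components = [obj.name]
    to_revoke = []
    scanning = True
    node = obj.parent
    while node is not None:
        components.append(node.name)
        if scanning:
            for client in node.stl_holders:
                if client != requester:      # assumption: own STLs do not conflict
                    to_revoke.append((node.name, client))
            if node.ordinary_holders:
                # An ordinary directory lock was met: no need to look for
                # STLs further up, but keep walking to rebuild the path.
                scanning = False
        node = node.parent
    path = "/".join(reversed(components)) or "/"
    return path, to_revoke
```

Once the returned STLs have been revoked, the guarantee stated above holds for the object.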

Use cases

ID | Quality attribute | Summary
Acquire STL | performance, scalability | traverse the path and get a subtree lock on the last directory path component
Object access under subtree lock | usability | the subtree lock "expands" as the client does LOOKUP/GETATTR within the subtree
Concurrent lookup | usability | a PR STL holder and a non-STL lock holder concurrently look up the same object
Access to ".." | usability | a client accesses ".." under an outsider's STL
CMD | usability | the directory and its children protected by an STL are spread over several MDS servers
Policy | usability | the MDS uses a lock granting policy that takes into account the client's wishes and its own considerations
Callback to lock | usability | client 1 holds an ordinary lock on an object X within a subtree of directory Y, client 2 acquires a subtree lock on directory Y
Callback to subtree lock | usability | client 1 holds a subtree lock on a directory X, client 2 locks an object within the directory X
Subtree lock and migration | usability | migration involves data which might be cached under a subtree lock
Persistent subtree lock | usability | a persistent subtree lock is used for disconnected operations
Split subtree lock | performance | on lock contention, instead of yielding the whole lock, take locks on subdirectories; the client contributes to the lock split policy
Subtree lock and proxy | recovery | if the proxy server is disconnected and the top-level STL gets broken, some of its sub-STLs could survive and help the later cache revalidation procedure

Quality Attribute Scenarios

Acquire STL

Scenario: traverse the path and get a subtree lock on the last directory path component
Business Goals: reduce DLM overhead
Relevant QA's: performance, scalability
Stimulus: filesystem operation system call (mkdir, open, etc.)
Stimulus source: client application
Environment: normal use
Artifact: DLM
Response: the client traverses the path element by element. It requests ordinary locks from the MDS for all path elements except the last directory path element, for which an STL is requested. If the MDS notices a lock conflict, it sends a BAST to the conflicting client and grants the requested lock to the client. Conflicting locks that might exist below the STL root are not checked.
Response measure: the client is able to access files under the STL without any additional locking (except for already locked files, open and hardlinked files, and mount points)
Issues: the client does not hold any other subtree lock on the path it traverses
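
The traversal described in this scenario can be sketched as a plan of lock requests. This is only an illustration: `plan_lock_requests` is a hypothetical helper, lock modes are reduced to the strings "ordinary" and "stl", and the path is assumed to name the directory that should receive the STL.

```python
def plan_lock_requests(dir_path):
    """Return [(path, mode)] for a client that wants an STL on dir_path.

    Every path element gets an ordinary lock request, except the last
    directory component, for which an STL is requested.
    """
    parts = [p for p in dir_path.split("/") if p]
    if not parts:
        raise ValueError("expected a directory below the filesystem root")
    requests = []
    prefix = ""
    for i, part in enumerate(parts):
        prefix += "/" + part
        mode = "stl" if i == len(parts) - 1 else "ordinary"
        requests.append((prefix, mode))
    return requests
```

For "/a/b/c" this yields ordinary lock requests for "/a" and "/a/b" and an STL request for "/a/b/c". Nothing below "/a/b/c" is checked at acquisition time, which matches the weak STL behaviour.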

Object access under subtree lock

Scenario: the client does LOOKUP/GETATTR within the subtree protected by an STL
Business Goals: maintain consistency between STLs and ordinary locks
Relevant QA's: usability
Stimulus: filesystem operation system call (stat, chmod, etc.)
Stimulus source: client application
Environment: the client holds an STL and accesses an object within the subtree; the object is not cached on the client
Artifact: filesystem object, DLM
Response: the client sends a getattr request to an MDS. The MDS looks up the object, and if it is protected by the STL (that is, it is not open, not a hard link, not a mount point, and not locked by another client), the attributes are returned to the client. Otherwise, the ordinary lock acquisition scheme is applied.
Response measure: consistent cache of objects under STLs and ordinary locks
Questions: when an object is protected by an STL, does the server not record that the STL holder fetched the object?

Concurrent lookup

Scenario: a PR STL holder and an ordinary (non-STL) lock holder concurrently look up the same object.

Client application C1 has CWD "/a/b/c" and holds a PR lock on "/a/b/c". Another client, C2, holds a PR STL on "/a/b" and is doing a lookup of "/a/b/c/d".

Business Goals: maintain consistency between STLs and ordinary locks
Relevant QA's: usability
Stimulus: the non-STL holder creates the "d/f" file.
Stimulus source: client application
Environment: normal
Artifact: filesystem object, DLM, MDS
Response: when C1 sees a lock on "/a/b/c", it continues the lookup operation with ordinary locks and takes a PW lock on "/a/b/c/d". C1 and C2 find a conflict between their ordinary locks on "/a/b/c/d", and one of them (suppose C1) revokes C2's lock, or vice versa.
Response measure: there should be a lock conflict between C1 and C2 on "/a/b/c/d"
Issues: if C2 didn't use ordinary locks, C1 and C2 could (incorrectly) take conflicting locks on "/a/b/c/d": C1 gets the object attributes and finds nothing to indicate that "/a/b/c/d" is protected under an STL, and C2 takes an ordinary PW lock on the same object.

Access to ".."

Scenario: a client accesses ".." which is under an STL held by another client
Business Goals: take the lock correctly with respect to the STL above
Relevant QA's: usability, correctness
Stimulus: filesystem operation system call
Stimulus source: client application
Environment: a client holds a subtree lock on a directory; another client is inside that directory and goes up via ".."
Artifact: the ".." directory
Response: the client asks the server to perform path revalidation before taking the ".." lock and to revoke all conflicting STL locks above the object. The server then grants a lock on "..".
Response measure: no conflicts with STL locks are missed.


CMD

Scenario: the filesystem has a cluster of metadata servers
Business Goals:
Relevant QA's:
Stimulus: filesystem configuration
Stimulus source: filesystem administrator
Environment: normal use
Response: the subtree lock is granted on the MDS that stores the root directory to be locked. There is nothing to do on the other MDSs.
Response measure:
Interaction between subtree locks and CMD: what happens when a subtree lock is granted on a directory whose subdirectories live on other servers?


Policy

This is about when not to grant a subtree lock.

Scenario: the client requests a subtree lock, the server decides whether to grant it
Business Goals: avoid the lock acquisition ping-pong effect
Relevant QA's: performance
Stimulus: client request
Stimulus source: client application
Environment: normal use
Artifact: DLM
Response: there is a client policy and a server policy. The client decides to ask for a subtree lock on the last directory path component; the server decides whether to grant the subtree lock based on the history of accesses to the object.
Response measure: fewer callbacks between server and clients
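
One way the server side of this policy could look is sketched below. The design only says that the decision is based on access history. Counting recent STL callbacks per directory within a time window, and the class and parameter names used here, are assumptions made for illustration.

```python
from collections import defaultdict


class StlGrantPolicy:
    """Hypothetical server policy: refuse STLs on recently contended directories."""

    def __init__(self, max_recent_callbacks=3, window_seconds=60.0):
        self.max_recent_callbacks = max_recent_callbacks
        self.window_seconds = window_seconds
        self._callbacks = defaultdict(list)   # directory fid -> callback timestamps

    def record_callback(self, dir_fid, now):
        """Note that an STL on dir_fid had to be called back at time `now`."""
        self._callbacks[dir_fid].append(now)

    def should_grant_stl(self, dir_fid, client_wants_stl, now):
        """Grant only if the client asked for an STL and contention is low."""
        if not client_wants_stl:
            return False
        recent = [t for t in self._callbacks[dir_fid]
                  if now - t <= self.window_seconds]
        self._callbacks[dir_fid] = recent     # forget callbacks outside the window
        return len(recent) < self.max_recent_callbacks
```

When the policy refuses, the server would grant an ordinary lock instead, which avoids repeatedly granting and revoking an STL on a contended directory.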

Callback to lock

Scenario: STL to ordinary lock conflict
Business Goals: achieving lock correctness
Relevant QA's: usability
Stimulus: a getattr request for a fs object under an STL
Stimulus source: a client
Environment: an STL held by a client, another client's ordinary lock inside the STL
Artifact: the server, DLM, the ordinary lock
Response: the STL owner falls back to ordinary lock mode and a BAST is sent to the lock owner
Response measure: the ordinary lock owner gets a BAST

Callback to subtree lock

Scenario: one client holds a subtree lock, another client accesses an object in the namespace protected by the subtree lock
Business Goals: achieving lock correctness
Relevant QA's: usability
Stimulus: client request
Stimulus source: client application
Artifact: DLM
Response: the other client has to assume that the subtree lock holder has cached all objects in the subtree, and a BAST is sent to the subtree lock holder for the particular object the other client needs. The subtree lock holder has to flush the object if it was changed and drop it from its cache, so that it can be cached again later if necessary. Alternatively, the subtree lock can be split into subtree locks on subdirectories.
Response measure: the client caches the needed objects; the subtree lock holder has lost them
Issues: one can't get a new lock without subtree traversal, therefore this can only happen when trying to lock the root of the subtree
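
The holder's side of this callback can be sketched as follows. This is an illustrative model, not Lustre code: `StlHolderCache` and its methods are hypothetical, objects are keyed by an opaque fid string, and "flushing" is modelled as appending to a list.

```python
class StlHolderCache:
    """Hypothetical client cache of objects fetched under one STL."""

    def __init__(self):
        self.entries = {}    # fid -> {"attrs": ..., "dirty": bool}
        self.flushed = []    # (fid, attrs) pairs written back to the server

    def cache(self, fid, attrs, dirty=False):
        self.entries[fid] = {"attrs": attrs, "dirty": dirty}

    def on_blocking_ast(self, fid):
        """Handle a BAST for one object without giving up the whole STL."""
        entry = self.entries.pop(fid, None)
        if entry is None:
            return "not-cached"          # nothing to do, the object was never fetched
        if entry["dirty"]:
            self.flushed.append((fid, entry["attrs"]))   # write back changes first
            return "flushed"
        return "dropped"
```

Only the named object leaves the cache. The rest of the subtree stays cached, and the object can be fetched again later under ordinary locking rules.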

Subtree lock and migration

Scenario: data protected by an STL is being migrated
Business Goals: correct migration
Relevant QA's: usability
Stimulus: STL revocation request
Stimulus source: migration agent
Environment: clients use STLs and may have dirty caches; a running migration meets an STL
Artifact: STL locks, DLM
Response: the client flushes its caches and cancels its STLs
Response measure: STLs are revoked

Persistent subtree lock

A persistent subtree lock (PSTL) is granted after commit.

Scenario: acquiring a persistent STL
Business Goals: support of disconnected operations
Relevant QA's: availability
Stimulus: a lock request for a persistent STL
Stimulus source: a client
Environment: a Lustre cluster
Artifact: a server, DLM, the directory object
Response: the server grants the PSTL after properly logging the lock operation on disk, and only when the underlying fs transaction is committed
Response measure: the PSTL survives a server crash
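
The ordering constraint in this scenario (log first, grant only after commit) can be illustrated with a toy model. Everything here is an assumption made for illustration: `PersistentStlServer` is a hypothetical class, the journal is a Python list, and `commit` stands in for the underlying filesystem transaction commit.

```python
class PersistentStlServer:
    """Toy model: a PSTL is granted only once its log record is committed."""

    def __init__(self):
        self.journal = []      # logged lock records, not yet committed
        self.committed = []    # records that would survive a server crash
        self.granted = []      # (client, dir_fid) pairs granted so far

    def request_pstl(self, client, dir_fid):
        """Log the lock operation; the lock is NOT granted yet."""
        self.journal.append(("pstl", client, dir_fid))

    def commit(self):
        """Simulate the transaction commit and grant the pending PSTLs."""
        newly_granted = []
        for record in self.journal:
            self.committed.append(record)
            newly_granted.append((record[1], record[2]))
        self.journal.clear()
        self.granted.extend(newly_granted)
        return newly_granted

    def recover(self):
        """After a crash, rebuild the granted PSTLs from committed records only."""
        return [(client, fid) for kind, client, fid in self.committed
                if kind == "pstl"]
```

Because a grant never precedes the commit, every PSTL a client has seen granted can be reconstructed by `recover`.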

Split subtree lock

Scenario: releasing an STL because of another client's request while keeping STLs on the directory's children
Business Goals: avoid flushing the whole STL
Relevant QA's: performance
Stimulus: a lock request conflicting with the STL
Stimulus source: a client
Environment: a directory under an STL; another client accesses the directory with a conflicting ordinary lock
Artifact: STL lock, DLM
Response: using the information in the lock request about what is targeted inside the STL, and information from the client about which sub-STLs are more important to it, the server splits the STL in the most optimal way.
Response measure: the lock is granted and the STL gets split
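
A minimal sketch of the split decision, assuming the server knows the child directories of the STL root, the child targeted by the conflicting request, and a client-supplied priority order. The function name, its parameters, and the cap on the number of sub-STLs are hypothetical.

```python
def split_stl(children, target, client_priority, max_sub_locks):
    """Choose the child directories that keep a sub-STL after a split.

    children        -- names of the child directories of the STL root
    target          -- child targeted by the conflicting request (or None)
    client_priority -- child names, most important to the client first
    max_sub_locks   -- upper bound on the number of sub-STLs to take
    """
    # The contended child cannot stay under an STL.
    candidates = [c for c in children if c != target]
    rank = {name: i for i, name in enumerate(client_priority)}
    # Children the client did not rank go last, in their original order
    # (list.sort is stable).
    candidates.sort(key=lambda c: rank.get(c, len(rank)))
    return candidates[:max_sub_locks]
```

The client would then request sub-STLs on the returned children and release the original STL, so only the contended part of the cache has to be given up.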

Subtree lock and proxy

Scenario: an STL owner (a proxy server) with dirty cached data was disconnected for some time and now reconnects to the cluster
Business Goals: fast revalidation of the proxy cache content, achieving a better cache revalidation result (saving more cached data)
Relevant QA's: performance, availability
Stimulus: proxy cache reconnect event
Stimulus source: proxy cache
Environment: the proxy server had an STL protecting the dirty cached data; the proxy then disconnected, and the lock was broken and transformed into still valid sub-STLs
Artifact: the proxy cache server
Response: the proxy cache gets information on how the lock was transformed and starts a cache revalidation procedure for the parts of the STL that were lost during the disconnection period
Response measure: cache integration speed, minimum data loss
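
To illustrate the response above, the sketch below computes which cached paths a reconnecting proxy would need to revalidate, given the roots of the sub-STLs that survived. `paths_to_revalidate` is a hypothetical helper, and representing cached objects by absolute path strings is an assumption.

```python
def paths_to_revalidate(cached_paths, surviving_stl_roots):
    """Return cached paths that are no longer covered by any surviving sub-STL."""

    def covered(path):
        for root in surviving_stl_roots:
            prefix = root.rstrip("/") + "/"
            # A path is covered if it is the sub-STL root itself or lies below it.
            if path == root or path.startswith(prefix):
                return True
        return False

    return sorted(p for p in cached_paths if not covered(p))
```

Cached data under a surviving sub-STL can be kept as is; only the remaining paths go through the revalidation procedure.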

Implementation details

1. inode protected with subtree lock (during lookup) protects all objects

  • if you take a subtree lock on MDT, everything underneath is now unreachable.
  • there may already be existing locks under the subtree; they can't be expanded upward.
  • if you haven't done a getattr on an element of the subtree, there may already be a lock on it

2. if caching under an STL hits an open file, open directory, hardlink or mount point, an ordinary lock is granted.

  • this is because once a file is open, the client has fid access and doesn't need to traverse anymore, so it will not see that the file is protected by a subtree lock.

3. any use of ".." on the client requires path revalidation: a new fs method on the client, or it can be done on the server (harder on the server with CMD)

  • this is because when we are in a subdirectory under an STL held by a different client and do, for example, stat(..), we don't traverse through the STL: the client knows the fid, so we do the stat by fid, which bypasses name traversal, so we don't see the conflict with the STL. Path revalidation (on the server?) is needed.

4. when storage management operates by FID on directories, all subtree locks are revoked

  • object is cached on client without server knowing it
  • or maybe migration is fine, we just mark it dirty after we flush subtree lock
  • layout lock bit must be protected? client must lock layout before using during migration - must update it on the mds anyhow

???5. during migration, STL cached data is "layout" invalidated (everything with a new layout must be flushed) - and data, on all clients (broadcast!) (degraded performance during migration)

6. Every lookup based on stl includes fid of STL root??

7. If stl1 is called back

  • flush update cache
  • take stl(i)'s on children of stl1, callback on stl1 then client requests N stli's for children with N < ...
  • release stl1
  • (client policy)
  • do this so that e.g. ls -l on parent can finish without having to flush big proxy cache

8. collect access statistics on server in order to avoid subtree locks on highly contended resources.

  • If stl(i) sees cb's > x msec then no more stl(i)'s (server)

9. persistent STL is granted after commit


bug 14176