Architecture - Sub Tree Locks

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Summary

A subtree lock is a lock on a directory that protects the entire namespace (or part of it) rooted at that directory. Subtree locks are intended to be optimal for workloads where clients work in isolated directories, and to avoid making things worse in highly contended workloads by falling back to the current client-server locking protocol.

Definitions

STL: Sub tree lock
strong STL: an STL which invalidates all conflicting locks inside the subtree. It is of no practical use because of its high acquisition and cancellation latencies.
weak STL: an STL which delays lock conflict resolution until the STL holder actually accesses (fetches into its cache) a conflicting object.
extra weak STL, EW STL: an optimization of the weak STL mode in which the STL holder may respond to a BAST by dropping the object from its cache.
path revalidation: scanning the file name components up to the root for possible conflicts with STL locks

Requirements

Performance: reduce lock RPC traffic for STL-locked objects.
Scalability: reduce the load on the DLM server and the memory consumption on servers and clients.
Correctness: provide a correct interaction between STLs and ordinary DLM locks.
Usability: usable by other components (WBC, disconnected operations).

Details

Strong vs Weak STL

Strong STL has two disadvantages. First, it is too strong: acquiring it immediately affects all locks below it and might force large caches to flush. Second, the strong STL approach requires the ability to find all conflicting locks below an STL. Even in the non-CMD case this looks like a resource-hungry task. That makes weak STL the primary candidate for implementation. Below in this document, "STL" or "subtree lock" means weak STL.

What does a subtree lock protect

A subtree lock on a directory explicitly protects the directory itself (both attributes and body). All other objects in the subtree are protected unless they are open files, hardlinked files, mount points, or objects locked by other clients.

What does a subtree lock not protect

Open files, hardlinked files, mount points and objects locked by other clients are not protected by a subtree lock. In all these cases except mount points, the subtree lock owner has to obtain an ordinary lock on the object.
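
The protection rules of the last two sections can be condensed into a single predicate. The Python sketch below is illustrative only: the `FsObject` fields and the `covered_by_stl` name are hypothetical and are not taken from the Lustre sources. It also assumes that "hardlinked" means a non-directory with a link count above one.

```python
from dataclasses import dataclass, field


@dataclass
class FsObject:
    """Hypothetical in-memory view of a namespace object (not a Lustre structure)."""
    name: str
    open_count: int = 0            # number of open handles on the object
    nlink: int = 1                 # hard link count
    is_dir: bool = False
    is_mount_point: bool = False
    lock_holders: set = field(default_factory=set)  # clients holding ordinary locks


def covered_by_stl(obj: FsObject, stl_holder: str) -> bool:
    """Return True if obj, lying below an STL root, is implicitly protected by the STL."""
    if obj.open_count > 0:
        return False               # open files and directories need an ordinary lock
    if not obj.is_dir and obj.nlink > 1:
        return False               # hardlinked file: reachable via another path
    if obj.is_mount_point:
        return False
    if obj.lock_holders - {stl_holder}:
        return False               # another client already holds an ordinary lock
    return True
```

When the predicate is false for anything but a mount point, the STL owner would go on to request an ordinary lock on the object.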

Subtree lock acquiring policy

Both the server and the client contribute to the policy.

STL locking rules

any lock (STL or non-STL) can be acquired after a lookup from the fs root or after a successful path revalidation procedure.
when an STL holder accesses hardlinked files or objects under conflicting ordinary locks, the thread falls back to ordinary lock mode (non-STL).
when an STL holder's lookup operation crosses a directory under an ordinary lock, the STL stops working below that directory and the thread should continue with ordinary locks.
taking a lock on a parent directory starting from an ordinary lock, or leaving an STL-protected area, requires revoking all conflicting STL locks above. The revalidation may stop when an ordinary directory lock is met.
a holder of a non-STL directory lock can do lookups and take more ordinary locks under the directory.
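
The fallback rules for an STL holder's lookup can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and the `needs_ordinary` set (names that are hardlinked or under a conflicting ordinary lock) are hypothetical, and the sketch treats every fallback as permanent for the rest of the lookup.

```python
def lock_modes_along_path(components, needs_ordinary):
    """Decide the lock mode for each component of a lookup below an STL root.

    components     -- names below the STL root, in lookup order
    needs_ordinary -- names that hit a fallback condition (hypothetical input)
    Returns a list of (name, mode) pairs, mode being "stl" or "ordinary".
    """
    modes = []
    stl_active = True
    for name in components:
        if stl_active and name in needs_ordinary:
            # From here on the thread continues with ordinary locks only.
            stl_active = False
        modes.append((name, "stl" if stl_active else "ordinary"))
    return modes
```

For example, if the holder of an STL on /a/b looks up c/d/e and d is under another client's ordinary lock, c stays covered by the STL while d and e are handled with ordinary locks.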

Path revalidation

A procedure which recovers an object's full path and guarantees (at its completion) that there are no STL locks above the object.
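
A minimal sketch of such a walk, assuming the server can follow parent pointers and knows which directories carry STLs or ordinary locks. All names here (`revalidate_path`, `parent`, `stl_holders`, `ordinary_locked`) are hypothetical; the early stop reflects the locking rule that revalidation may stop when an ordinary directory lock is met.

```python
def revalidate_path(start, parent, stl_holders, ordinary_locked, requester):
    """Walk from `start` towards the root and collect the STLs that must be revoked.

    parent          -- dict: path -> parent path (the root maps to None)
    stl_holders     -- dict: directory -> client holding an STL on it
    ordinary_locked -- set of directories under an ordinary directory lock
    Returns a list of (directory, holder) pairs, nearest ancestor first.
    """
    to_revoke = []
    node = parent.get(start)
    while node is not None:
        holder = stl_holders.get(node)
        if holder is not None and holder != requester:
            to_revoke.append((node, holder))
        if node in ordinary_locked:
            break  # the source allows revalidation to stop here
        node = parent.get(node)
    return to_revoke
```

Only after every STL in the returned list has been revoked would the requested lock be granted.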

Use cases

ID	Quality attribute	Summary
Acquire STL	performance, scalability	traverse path and get subtree lock for the last directory path component
Object access under subtree lock	usability	subtree lock "expands" as client does LOOKUP/GETATTR within the subtree
Concurrent lookup	usability	a PR STL holder and a non-STL lock holder concurrently look up the same object
Access to ".."	usability	client accesses ".." under another client's STL
CMD	usability	the directory protected by STL and its children are spread over several MDS servers
Policy	usability	MDS uses a lock granting policy that takes into account the client's request and its own considerations
Callback to lock	usability	client 1 holds an ordinary lock on an object X within a subtree of directory Y, client 2 acquires subtree lock on directory Y
Callback to subtree lock	usability	client 1 holds a subtree lock on a directory X, client 2 locks object within the directory X
Subtree lock and migration	usability	migration involves data which might be cached under subtree lock
Persistent subtree lock	usability	persistent subtree lock is used for disconnected operations
Split subtree lock	performance	on lock contention, instead of yielding the whole lock, take locks on subdirs; the client contributes to the lock split policy
Subtree lock and proxy	recovery	if a proxy server is disconnected and its top-level STL gets broken, some of its sub-STLs could survive and help the subsequent cache revalidation procedure

Quality Attribute Scenarios

Acquire STL

Scenario:		traverse path and get subtree lock for the last directory path component
Business Goals:		reduce DLM overhead
Relevant QA's:		performance, scalability
details	Stimulus:	filesystem operation system call, mkdir, open, etc
	Stimulus source:	client application
	Environment:	normal use
	Artifact:	DLM
	Response:	the client traverses the path element by element. It requests ordinary locks from the MDS for all path elements except the last directory path element, for which an STL is requested. If the MDS notices a lock conflict, it sends a BAST to the conflicting client and then grants the requested lock. Conflicting locks which might exist below the STL root are not checked.
	Response measure:	the client is able to access files under the STL without any additional locking (except for already locked files, open and hardlinked files, and mount points)
Questions:
Issues:		the client does not have other subtree lock on the path it traverses
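
The client side of this scenario can be sketched as a plan of lock requests along the path. This is an illustrative sketch only: `plan_lock_requests` is a hypothetical helper, and it assumes the path names a directory, so that its last component is the STL root.

```python
def plan_lock_requests(path):
    """Return (path_prefix, lock_kind) pairs for a traversal ending in an STL.

    Every component but the last gets an ordinary lock; the last directory
    component gets the STL request.
    """
    parts = [p for p in path.split("/") if p]
    requests = []
    current = ""
    for i, name in enumerate(parts):
        current += "/" + name
        kind = "STL" if i == len(parts) - 1 else "ordinary"
        requests.append((current, kind))
    return requests
```

For "/a/b/c" the plan is an ordinary lock on /a, an ordinary lock on /a/b, and an STL request on /a/b/c.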

Object access under subtree lock

Scenario:		client does LOOKUP/GETATTR within the subtree protected by STL
Business Goals:		to maintain STL and ordinary locks consistency
Relevant QA's:		usability
details	Stimulus:	filesystem operation system call, stat, chmod, etc
	Stimulus source:	client application
	Environment:	the client holds STL and accesses an object within the subtree, the object is not cached on the client
	Artifact:	filesystem object,DLM
	Response:	the client sends a getattr request to an MDS. The MDS looks up the object and, if it is protected by the STL (that is, it is not open, not hardlinked, not a mount point and not locked by another client), returns the attributes to the client. Otherwise, the ordinary lock acquisition scheme is applied.
	Response measure:	consistent cache of objects under STL and ordinary locks
Questions:		when an object is protected by the STL, does the server not make a note that the STL holder fetched the object?
Issues:

Concurrent lookup

Scenario:		A PR STL holder and a non-STL (ordinary) lock holder concurrently look up the same object. Client application C1 has CWD /a/b/c and holds a PR lock on "/a/b/c"; another client C2 holds a PR STL on /a/b and is doing a lookup for "/a/b/c/d".
Business Goals:		to maintain STL and ordinary locks consistency
Relevant QA's:		usability
details	Stimulus:	the non-STL holder creates "d/f" file.
	Stimulus source:	client application
	Environment:	normal
	Artifact:	filesystem object, DLM, MDS
	Response:	When C1 sees a lock on /a/b/c it continues the lookup op with ordinary locks and takes a PW lock on "/a/b/c/d". C1 and C2 find a conflict between ordinary locks on "/a/b/c/d", and one of them (suppose C1) revokes C2's lock, or vice versa.
	Response measure:	there should be a lock conflict between C1 and C2 on "/a/b/c/d"
Questions:
Issues:		If C2 didn't use ordinary locks, there would be a possibility for C1 and C2 to (incorrectly) take conflicting locks on "/a/b/c/d": C1 gets the object attributes and finds nothing to suggest that /a/b/c/d is protected under STL, and C2 takes an ordinary PW lock on the same object.

Access to ".."

Scenario:		a client accesses ".." which is under STL held by another client
Business Goals:		take a lock correctly with respect to STL locks above
Relevant QA's:		usability, correctness
details	Stimulus:	filesystem operation system call
	Stimulus source:	client application
	Environment:	a client holds a subtree lock on a directory; another client is inside that directory and goes up via ".."
	Artifact:	the ".." directory
	Response:	the client asks the server to perform path revalidation before taking the ".." lock and to revoke all conflicting STL locks above the object. The server then grants a lock on "..".
	Response measure:	no conflicts with STL locks are missed.
Questions:
Issues:

CMD

Scenario:		filesystem has a cluster of metadata servers
Business Goals:
Relevant QA's:
details	Stimulus:	filesystem configuration
	Stimulus source:	filesystem administrator
	Environment:	normal use
	Artifact:
	Response:	the subtree lock is granted on the MDS which stores the root directory to be locked. There is nothing to do on the other MDSs.
	Response measure:
Questions:
Issues:

Interaction between subtree lock and CMD: what happens when a subtree lock is given on a directory whose subdirectories live on other servers.

Policy

this is about when _not_ to grant subtree lock

Scenario:		client requests subtree lock, server decides whether to grant it
Business Goals:		avoid lock acquisition ping-pong effect
Relevant QA's:		performance
details	Stimulus:	client request
	Stimulus source:	client application
	Environment:	normal use
	Artifact:	DLM
	Response:	there are a client policy and a server policy: the client decides to ask for a subtree lock on the last directory path component, and the server decides whether to grant it based on the history of accesses to the object
	Response measure:	fewer callbacks between server and clients
Questions:
Issues:
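
One possible shape for the server-side half of this policy is sketched below. It is an assumption, not the documented design: the class name, the sliding window and the callback threshold are all hypothetical tuning knobs, chosen only to show how an access history could be turned into a grant decision.

```python
from collections import deque


class StlGrantPolicy:
    """Refuse STLs on directories that recently caused many lock callbacks."""

    def __init__(self, window, max_callbacks):
        self.window = window                # seconds of history to consider
        self.max_callbacks = max_callbacks  # callbacks tolerated within the window
        self.callbacks = {}                 # directory -> deque of timestamps

    def record_callback(self, directory, now):
        """Note that a lock under `directory` had to be called back at time `now`."""
        self.callbacks.setdefault(directory, deque()).append(now)

    def should_grant(self, directory, now):
        """Return True if `directory` is quiet enough to be given an STL."""
        history = self.callbacks.get(directory, deque())
        while history and now - history[0] > self.window:
            history.popleft()               # forget callbacks outside the window
        return len(history) <= self.max_callbacks
```

A directory that keeps bouncing between clients accumulates callbacks and stops being eligible for an STL until it has been quiet for one full window, which is one way to avoid the ping-pong effect named in the business goal.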

Callback to lock

Scenario:		STL to ordinary lock conflict
Business Goals:		Achieving lock correctness
Relevant QA's:		usability
details	Stimulus:	a getattr request for a fs object under STL
	Stimulus source:	a client
	Environment:	An STL lock held by a client, another ordinary lock inside STL
	Artifact:	the server, DLM, the ordinary lock
	Response:	the STL owner falls back to ordinary lock mode and sends a BAST to the ordinary lock owner
	Response measure:	the ordinary lock owner gets BAST
Questions:
Issues:

Callback to subtree lock

Scenario:		one client holds subtree lock, another client accesses an object in the namespace protected by the subtree lock
Business Goals:		Achieving lock correctness
Relevant QA's:		usability
details	Stimulus:	client request
	Stimulus source:	client application
	Environment:
	Artifact:	DLM
	Response:	the other client has to assume that the subtree lock holder has cached all objects in the subtree, and sends a BAST to the subtree lock holder for the particular object it needs. The subtree lock holder has to flush the object if it was changed and drop it from its cache (it can be cached again later if necessary). Alternatively, the subtree lock can be split into subtree locks on subdirectories.
	Response measure:	the requesting client caches the needed object; the subtree lock holder no longer caches it
Questions:
Issues:		one can't get a new lock without a subtree traversal, therefore this can only happen when trying to lock the root of the subtree
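
The holder's side of this exchange (the extra weak STL behaviour from the definitions) can be sketched as below. The `StlHolderCache` class is hypothetical, and the `flushed` list merely stands in for a real write-back to the server.

```python
class StlHolderCache:
    """Toy client cache for objects fetched under an STL."""

    def __init__(self):
        self.entries = {}   # path -> (attrs, dirty)
        self.flushed = []   # paths written back to the server, in order

    def cache(self, path, attrs, dirty=False):
        self.entries[path] = (attrs, dirty)

    def on_bast(self, path):
        """Handle a blocking callback for one object under the STL.

        Returns True if the object was cached and has now been dropped.
        """
        entry = self.entries.pop(path, None)
        if entry is None:
            return False                 # never fetched: nothing to give up
        _attrs, dirty = entry
        if dirty:
            self.flushed.append(path)    # stand-in for a real write-back RPC
        return True
```

The STL itself stays in place; only the one contended object leaves the holder's cache and can be fetched again later under an ordinary lock.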

Subtree lock and migration

Scenario:		data protected by STL is being migrated
Business Goals:		correct migration
Relevant QA's:		usability
details	Stimulus:	STL revocation request
	Stimulus source:	migration agent
	Environment:	Clients use STL and may have dirty caches; a running migration meets an STL
	Artifact:	STL locks, DLM
	Response:	Client flushes caches and cancels STLs
	Response measure:	STLs are revoked
Questions:
Issues:

Persistent subtree lock

persistent subtree lock is granted after commit

Scenario:		Acquiring persistent STL lock
Business Goals:		support of disconnected operations
Relevant QA's:		availability
details	Stimulus:	a lock request for a persistent STL
	Stimulus source:	a client
	Environment:	a Lustre cluster
	Artifact:	a server, DLM, the directory object
	Response:	the server grants the PSTL after properly logging the lock operation on disk and only when the underlying fs transaction is committed
	Response measure:	PSTL survives server crash
Questions:
Issues:
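
The ordering constraint in this scenario (log first, grant only on commit) can be sketched with a toy server. The `PstlServer` class and its in-memory "journal" are hypothetical stand-ins for the on-disk log and the underlying filesystem transaction.

```python
class PstlServer:
    """Toy model: a persistent STL is granted only once its log record is committed."""

    def __init__(self):
        self.journal = []     # lock records logged but not yet committed
        self.committed = []   # records that reached stable storage
        self.granted = set()  # (client, directory) pairs currently granted

    def request_pstl(self, client, directory):
        """Log the lock operation; the lock is not granted yet."""
        self.journal.append((client, directory))

    def commit(self):
        """Commit the pending transaction and grant the locks it contains."""
        for record in self.journal:
            self.committed.append(record)
            self.granted.add(record)
        self.journal.clear()

    def recover_after_crash(self):
        """Uncommitted records are lost; committed PSTLs are restored."""
        self.journal.clear()
        self.granted = set(self.committed)
```

Because a lock is never visible to the client before its record is committed, every PSTL a client has been granted is one the server can rebuild after a crash.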

Split subtree lock

Scenario:		releasing an STL lock due to another client request and keeping STLs on the directory children
Business Goals:		avoid whole STL lock flushing
Relevant QA's:		performance
details	Stimulus:	a lock request conflicting with the STL lock
	Stimulus source:	a client
	Environment:	a directory under an STL lock; another client accesses the directory with a conflicting ordinary lock
	Artifact:	STL lock, DLM
	Response:	Using the lock request information about what is targeted inside the STL, and information from the client about which sub-STLs are more important to it, the server splits the STL in the most optimal way.
	Response measure:	the lock is granted and the STL gets split
Questions:
Issues:

Subtree lock and proxy

Scenario:		an STL lock owner (a proxy server) with dirty cached data was disconnected for some time and now reconnects to the cluster
Business Goals:		fast proxy cache content revalidation; achieving a better cache revalidation result (saving more cached data)
Relevant QA's:		performance, availability
details	Stimulus:	proxy cache reconnect event
	Stimulus source:	proxy cache
	Environment:	the proxy server had an STL lock protecting the dirty cached data, then the proxy disconnected and the lock was broken and transformed into still-valid sub-STL locks
	Artifact:	the proxy cache server
	Response:	the proxy cache gets information on how the lock was transformed and starts a cache revalidation procedure for the parts of the STL lock which were lost during the disconnection period
	Response measure:	cache integration speed, minimum data loss
Questions:
Issues:

Implementation details

1. inode protected with subtree lock (during lookup) protects all objects

if you take a subtree lock on MDT, everything underneath is now unreachable.
there may already be existing locks under the subtree; they can't be expanded up.
if you haven't done a getattr on an element of the subtree, there may be a lock on it already

2. if caching under an STL hits an open file, open dir, hardlink or mount point, an ordinary lock is granted.

this is because once a file is open, the client has fid access and doesn't need to traverse anymore, so it will not see that the file is protected by a subtree lock.

3. any use of ".." on the client requires path revalidation: a new fs method on the client, or it can be done on the server (harder on the server with CMD)

this is because when a client is in a subdir under an STL held by a different client and does, for example, stat(..), we don't traverse through the STL: the client knows the fid, so we do stat by fid, which bypasses name traversal, so we don't see the conflict with the STL. Path revalidation (on server?) is needed.

4. when storage management operates by FID on directories, all subtree locks are revoked

object is cached on client without server knowing it
or maybe migration is fine, we just mark it dirty after we flush subtree lock
layout lock bit must be protected? client must lock layout before using during migration - must update it on the mds anyhow

???5. during migration STL cached data is "layout" invalidated (everything with a new layout must be flushed) - and data,

on all clients (broadcast!) (degraded performance during migration)

6. Every lookup based on stl includes fid of STL root??

7. If stl1 is called back

flush update cache
take stl(i)'s on children of stl1, callback on stl1 then client requests N stli's for children with N < ...
release stl1
(client policy)
do this so that e.g. ls -l on parent can finish without having to flush big proxy cache
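
The sequence in item 7 can be written down as an ordered list of client actions. This is a sketch under assumptions: `split_stl` is a hypothetical helper, `wanted` stands for whatever the client policy considers worth keeping, and `limit` is a placeholder for the bound on N, which the notes above leave unspecified.

```python
def split_stl(root, children, wanted, limit):
    """Return the ordered actions a client takes when the STL on `root` is called back.

    children -- child directories of `root`, in a stable order
    wanted   -- children the client policy wants to keep under an STL
    limit    -- maximum number of child STLs to request (hypothetical knob)
    """
    actions = [("flush", root)]                       # flush the update cache first
    keep = [c for c in children if c in wanted][:limit]
    actions += [("acquire_stl", c) for c in keep]     # take STLs on selected children
    actions.append(("release_stl", root))             # only then give up the parent STL
    return actions
```

Releasing the parent STL last means the children the client cares about are never left unprotected between the two steps.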

8. collect access statistics on server in order to avoid subtree locks on highly contended resources.

If stl(i) sees cb's > x msec then no more stl(i)'s (server)

9. persistent STL is granted after commit

References

bug 14176
