
Architecture - Request Redirection

From Obsolete Lustre Wiki
Revision as of 13:22, 22 January 2010 by Docadmin (talk | contribs)

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.


Target OST
the OST to which file data were initially written. It receives data access requests from clients, manages locks, maintains information about data cached by clients and dedicated servers, and maintains persistent and in-core redirect information.
Client
an application which accesses data from the target OST
Migration
moving file data from one set of OSTs to another
Collaborative cache
read-only cache distributed over clients or dedicated cache servers. Target OSTs maintain in-core information about which clients and cache servers are caching which data, and redirect read accesses to an appropriate cache server.
Flash cache
write only cache. There are dedicated flash cache servers. Target OSTs maintain (persistent?) information about data cached by flash cache servers and redirect write accesses to them.


Request redirection is a mechanism that allows a target OST to redirect client requests to other servers. A target OST may choose to redirect a request in the hope of improving system throughput, or it may have to redirect because it no longer stores the requested data.


the same request redirection mechanism redirects clients to instances of data, regardless of how those instances were created
the request redirection mechanism can allow clients to specify preferences (for example, "do not redirect me", or "let me choose myself")
adding new ways to create instances of data should require no or minimal changes to the request redirection mechanism
requested data are accessible either at the servers to which a client is redirected or at the target OST
the target OST keeps all the information about possible redirections; all lock requests are sent to it; it does the locking and can optionally redirect a client to another server, where data access happens without further locking
redirection information can be stored persistently when it has to survive reboots (for example, in the case of migration), or it can be maintained in memory only (for example, in the case of collaborative cache)
the request redirection mechanism provides means for clients to send redirection information to and receive it from a target OST, and means for the target OST to store that information
there may be several instances of the same data, and the request redirection mechanism should be able to deal with that; for example, in the case of collaborative cache, the target OST has to be able to find the clients which are caching the requested data and to choose where to redirect
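
The requirements above can be illustrated with a minimal sketch of the redirect information a target OST might keep. Everything here is an assumption made for illustration: the names `Redirect` and `RedirectTable`, the byte-offset extents, and the server identifiers do not come from the design. The sketch only shows persistent versus in-memory records and several instances of the same extent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Redirect:
    start: int        # first byte of the extent (inclusive)
    end: int          # end of the extent (exclusive)
    server: str       # hypothetical server identifier
    persistent: bool  # True if the record must survive a reboot (e.g. migration)


@dataclass
class RedirectTable:
    """Hypothetical view of the redirections known to one target OST."""
    records: list = field(default_factory=list)

    def add(self, redirect):
        self.records.append(redirect)

    def lookup(self, start, end):
        """Return every record whose extent overlaps [start, end)."""
        return [r for r in self.records if r.start < end and start < r.end]

    def after_reboot(self):
        """Model a reboot: in-memory records are lost, persistent ones kept."""
        return RedirectTable([r for r in self.records if r.persistent])


table = RedirectTable()
table.add(Redirect(0, 4096, "cache-client-a", persistent=False))
table.add(Redirect(0, 4096, "cache-client-b", persistent=False))
table.add(Redirect(4096, 8192, "ost-7", persistent=True))

# Two instances of the same extent are both reported.
print([r.server for r in table.lookup(0, 100)])
# Only the migration record survives a reboot.
print([r.server for r in table.after_reboot().lookup(0, 8192)])
```

A new way of creating instances of data would only add records to such a table, which is one way to read the "no or minimal changes" requirement.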

Use Cases

id | quality attribute | summary
collaborative cache populating | performance, scalability | client holds a read lock on data extent, sends read request to target OST
collaborative cache redirection | performance, scalability | client sends read lock request to target OST
flash cache redirection | performance | client sends write lock request, there is flash server in the filesystem
client reads or writes migrated data | availability | client sends lock request to target OST for data extent, data are not on target OST due to migration
data server crash | availability | data server crashes and comes back up
update persistent redirect information | availability | migration is in progress: data extent
collaborative cache populating
Scenario: a client sends read request to target OST, there are no other caches for the data, the client is running OST service
Business Goals: populate collaborative cache
Relevant QA's: performance, scalability
details Stimulus: client read request
Stimulus source: client
Environment: the requested data are not cached by anybody
Artifact: requested data
Response: target OST reads the requested data and sends them to the client. If the client node runs an OST service, the target OST makes an in-core record that the requested data extent is cached by this client node
Response measure: target OST knows that certain data are cached by certain client node
Issues: A client node may not want to participate in the collaborative cache; this can be controlled with preferences
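
The populating scenario above can be sketched as follows. This is a hedged illustration, not Lustre code: the class `TargetOST`, the `runs_ost_service` and `opt_out` parameters, and the dictionary used as the in-core record are all hypothetical names chosen for the example.

```python
class TargetOST:
    """Hypothetical target OST that tracks which clients cache which extents."""

    def __init__(self, data):
        self.data = data      # bytes stored on this OST
        self.cached_by = {}   # in-core only: (start, end) -> set of client ids

    def read(self, client_id, start, end, runs_ost_service, opt_out=False):
        """Serve a read and, if the client is eligible, record it as a cache."""
        payload = self.data[start:end]
        # Only a client node running an OST service can serve the data later,
        # and a client may decline through its preferences (opt_out).
        if runs_ost_service and not opt_out:
            self.cached_by.setdefault((start, end), set()).add(client_id)
        return payload


ost = TargetOST(b"0123456789")
ost.read("client-a", 0, 4, runs_ost_service=True)
ost.read("client-b", 0, 4, runs_ost_service=False)
ost.read("client-c", 0, 4, runs_ost_service=True, opt_out=True)
print(ost.cached_by)
```

Only `client-a` ends up in the record, because `client-b` runs no OST service and `client-c` opted out.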
collaborative cache redirection
Scenario: a client sends read lock request to target OST, the requested data extent is cached by another client
Business Goals: offload target OST by redirecting read request to hopefully less loaded node
Relevant QA's: performance, scalability
details Stimulus: client read lock request
Stimulus source: client
Environment: the requested data extent is cached by another client
Artifact: requested data
Response: target OST grants the lock, checks its records and sees that the data are available via collaborative cache. The request is serviced in accordance with the client's preferences: a list of nodes caching the data range can be returned, or a client node in the local network can be chosen, etc.
Response measure: lock is granted, the client knows where the data can be fetched
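
A minimal sketch of how the response could depend on client preferences, assuming the preference values quoted earlier on this page plus one hypothetical default (`OST_CHOOSES`). The function name, the `(node, network)` pairs, and the `TARGET_OST` identifier are illustrative assumptions.

```python
from enum import Enum


class Preference(Enum):
    NO_REDIRECT = "do not redirect me"
    LET_ME_CHOOSE = "let me choose myself"
    OST_CHOOSES = "target OST chooses"  # hypothetical default, not named in the text


TARGET_OST = "target-ost"  # hypothetical identifier for the target OST itself


def answer_read_lock(cachers, client_network, preference):
    """Grant the read lock and say where the data can be fetched.

    cachers: list of (node, network) pairs known to cache the extent.
    Returns (lock_granted, list_of_nodes).
    """
    lock_granted = True  # the lock is granted whether or not a redirect happens
    if preference is Preference.NO_REDIRECT or not cachers:
        return lock_granted, [TARGET_OST]
    if preference is Preference.LET_ME_CHOOSE:
        return lock_granted, [node for node, _ in cachers]
    # OST_CHOOSES: prefer a caching node in the client's own network.
    local = [node for node, net in cachers if net == client_network]
    return lock_granted, [local[0] if local else cachers[0][0]]


cachers = [("client-a", "net-1"), ("client-b", "net-2")]
print(answer_read_lock(cachers, "net-2", Preference.OST_CHOOSES))
print(answer_read_lock(cachers, "net-2", Preference.LET_ME_CHOOSE))
print(answer_read_lock(cachers, "net-2", Preference.NO_REDIRECT))
```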
flash cache redirection
Scenario: a client sends write lock request to target OST, there is a flash cache in the filesystem
Business Goals:
Relevant QA's: performance
details Stimulus: client write lock request
Stimulus source: client
Environment: flash cache is able to perform this write
Artifact: write lock
Response: target OST grants the lock (all other locks are revoked, caches are released), checks its records to see if the data were already written to a flash cache server, selects a flash cache server which is able to do this write, and redirects the client to the selected flash cache server
Response measure: the client has the write lock and knows where to send the write request
Questions: the client has to send a notification to the target OST when the flash server completes the write, so that the target OST can make the appropriate redirect record. Only after that does the client release the write lock.
Issues: How is it guaranteed that the flash server will complete the write?
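
The ordering discussed under Questions (write, notify the target OST, then release the lock) can be sketched as below. This is an assumption-laden model: the class `FlashRedirectOST`, its method names, and the use of free bytes to decide whether a flash server can take a write are all hypothetical. It does not answer the open issue of how write completion is guaranteed.

```python
class FlashRedirectOST:
    """Hypothetical model of the target OST's bookkeeping for flash writes."""

    def __init__(self, flash_servers):
        self.flash_servers = flash_servers  # name -> free bytes (assumed metric)
        self.redirects = {}                 # extent -> flash server holding it
        self.pending = {}                   # extent -> (client, server to record)

    def write_lock(self, client, extent):
        """Grant the write lock and pick a flash server able to take the write."""
        size = extent[1] - extent[0]
        server = self.redirects.get(extent)  # data already on a flash server?
        if server is None:
            server = next(
                (name for name, free in self.flash_servers.items() if free >= size),
                None,  # no capable flash server: the write goes to the target OST
            )
        self.pending[extent] = (client, server)
        return server

    def write_done(self, client, extent):
        """Client notification: the flash server completed the write."""
        owner, server = self.pending[extent]
        if owner != client:
            raise PermissionError("only the lock holder may report completion")
        if server is not None:
            self.redirects[extent] = server
        self.pending[extent] = (client, None)  # nothing left to record

    def release_lock(self, client, extent):
        """Release is refused until the redirect record has been made."""
        owner, server = self.pending[extent]
        if owner != client:
            raise PermissionError("only the lock holder may release the lock")
        if server is not None:
            raise RuntimeError("redirect record not made yet; notify first")
        del self.pending[extent]


ost = FlashRedirectOST({"flash-1": 1024, "flash-2": 1 << 20})
extent = (0, 4096)
print(ost.write_lock("client-a", extent))  # flash-1 is too small for 4096 bytes
ost.write_done("client-a", extent)
ost.release_lock("client-a", extent)
print(ost.redirects)
```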

client accesses a file which migrates
Scenario: client accesses a file which is migrating to another OST, and the accessed extent of data is no longer available on the target OST
Business Goals: allow migration and client access to work simultaneously
Relevant QA's: Availability
details Stimulus: Client needs to access data to do its job
Stimulus source: Client
Environment: Data requested by the client were migrated to another data server or to several servers
Artifact: lock request
Response: target OST grants the lock, checks its redirect records and sees that the data are already on another server, redirects the client to that server
Response measure: lock is granted to the client, client knows where it can fetch data from, client does not have to wait until migration completes, migration continues
Questions: What does the target OST do if the requested data extent has migrated to several servers? It can return either an array of redirections or a redirection for the first part of the extent only
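
The two options raised in the question above can be compared with a small sketch. The function `redirections_for`, the `(start, end, server)` tuples, and the `first_only` flag are hypothetical; the sketch assumes non-overlapping redirect records with half-open byte extents.

```python
def redirections_for(extent, records, first_only=False):
    """Map a requested extent onto the servers that now hold it.

    records: non-overlapping (start, end, server) tuples for migrated data.
    Returns a list of (start, end, server) pieces covering the request.
    """
    start, end = extent
    pieces = []
    for r_start, r_end, server in sorted(records):
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:  # this record overlaps the requested extent
            pieces.append((lo, hi, server))
    # Option 2 from the question: redirect only the first part of the extent.
    return pieces[:1] if first_only else pieces


records = [(100, 300, "ost-5"), (0, 100, "ost-2")]
print(redirections_for((50, 200), records))
print(redirections_for((50, 200), records, first_only=True))
```

With the first-part-only option the client would have to come back to the target OST for the remainder of the extent, which trades a simpler reply for extra round trips.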
data server crash
Scenario: a data server crashes while a migration agent copies a file hosted by the data server
Business Goals: incorrect redirection is not allowed
Relevant QA's: availability
details Stimulus: OST crash
Stimulus source: power failure
Environment: at time of crash a migration agent worked with data hosted on the crashed data server
Artifact: data server
Response: when the data server is up again, none of RID update requests from the agent are lost
Response measure:
Questions: is there anything to do about recovering? I guess no, if agents follow simple rules interacting with a data server.
Issues: Migration agent is responsible for resending RID update requests which the data server did not complete before crash
agent sends a RID update request to data server
Scenario: agent sends to data server a request to update the RID
Business Goals: keep data server aware of real data location
Relevant QA's: availability
details Stimulus: Agent copied data somewhere
Stimulus source: Agent
Environment: Agent copied data to the target data server, so the source data server's RID has to be updated
Artifact: RID of data server
Response: data server adds a new record to its RID and sends a completion notification to the agent
Response measure:
Issues: Migration agent has to send RID update request after copied data are written to disk on target data server
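
The two migration-agent rules stated in the crash and RID update scenarios (send the update only after the copied data are on disk, and resend any update the data server did not complete) can be sketched as below. The classes, the dictionary standing in for the RID, and the boolean return value standing in for the completion notification are all assumptions for illustration.

```python
class DataServer:
    """Hypothetical source data server holding a persistent RID."""

    def __init__(self):
        self.rid = {}    # stands in for the persistent RID: extent -> location
        self.up = True

    def update_rid(self, extent, location):
        """Apply an update; True stands in for the completion notification."""
        if not self.up:
            return False                # request lost, no notification sent
        self.rid[extent] = location     # applying the same update twice is harmless
        return True


class MigrationAgent:
    """Hypothetical agent that remembers updates until they are acknowledged."""

    def __init__(self, server):
        self.server = server
        self.unacked = {}               # extent -> location, awaiting completion

    def copied(self, extent, location, on_disk):
        if not on_disk:
            raise ValueError("copied data must be on disk before the RID update")
        self.unacked[extent] = location
        self.flush()

    def flush(self):
        """Send or resend every update that has not been acknowledged yet."""
        for extent, location in list(self.unacked.items()):
            if self.server.update_rid(extent, location):
                del self.unacked[extent]


server = DataServer()
agent = MigrationAgent(server)
agent.copied((0, 4096), "ost-7", on_disk=True)
server.up = False                       # crash while migration is in progress
agent.copied((4096, 8192), "ost-7", on_disk=True)
server.up = True                        # data server is up again
agent.flush()                           # agent resends the lost update
print(server.rid, agent.unacked)
```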


1. Is there a need for the request redirection mechanism to be involved in filesystem replication? Hopefully not, because of the significant overhead of RID maintenance.

2. In the case of replication, a data server may be able either to serve a request locally or to redirect it somewhere else. Who makes that choice?


bug 14174

Simple Space Balance Migration