Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Definitions
- Target OST
- an OST to which file data were initially written. It receives data access requests from clients, manages locks, maintains information about data cached by clients and dedicated servers, and maintains persistent and in-core redirect information.
- Client
- an application which accesses data from a target OST
- Migration
- moving file data from one set of OSTs to another
- Collaborative cache
- read-only cache distributed over clients or dedicated cache servers. Target OSTs maintain in-core information about clients and cache servers and the data they are caching, and redirect read accesses to an appropriate cache server.
- Flash cache
- write-only cache. There are dedicated flash cache servers. Target OSTs maintain (persistent?) information about data cached by flash cache servers and redirect write accesses to them.
Summary
Request redirection is a mechanism which allows a target OST to redirect client requests to other servers. A target OST may decide to redirect a request in the hope of improving system throughput, or may have to redirect when it no longer stores the requested data.
Requirements
- Universality
- clients are redirected to instances of data through the same request redirection mechanism, regardless of how those instances were created
- Flexibility
- the request redirection mechanism can allow clients to specify preferences (for example, "do not redirect me", or "let me choose myself")
- Extensibility
- adding new ways to create instances of data should require no or minimal changes to the request redirection mechanism
- Availability
- requested data are accessible either at servers to which a client is redirected or at target OST
- Centralization
- the target OST keeps all the information about possible redirections. All lock requests are sent to it; it does the locking and can optionally redirect a client to another server, where data access happens without further locking
- Modes
- redirection information can be stored persistently when it has to survive reboots (for example in case of migration) or it can be maintained in memory only (for example in case of collaborative cache)
- API
- the request redirection mechanism provides means for clients to send redirection information to and receive it from a target OST, and means for the target OST to store that information
- Multiplexing
- there may be several instances of the same data, and the request redirection mechanism should be able to deal with that. For example, with collaborative cache the target OST has to be able to find the clients which are caching the requested data and to choose where to redirect.
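The Modes and Multiplexing requirements can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Redirect` and `RedirectStore`, the half-open byte extents, and the use of a plain list as a stand-in for stable storage are not part of the design above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Redirect:
    start: int        # first byte of the extent
    end: int          # one past the last byte (assumed convention)
    server: str       # node holding an instance of the data
    persistent: bool  # True if the record has to survive reboots


class RedirectStore:
    """Hypothetical per-target-OST store of redirect records."""

    def __init__(self) -> None:
        self._disk: List[Redirect] = []  # stand-in for stable storage
        self._core: List[Redirect] = []  # in-core view, lost on reboot

    def add(self, rec: Redirect) -> None:
        if rec.persistent:
            self._disk.append(rec)
        self._core.append(rec)

    def lookup(self, start: int, end: int) -> List[Redirect]:
        # Multiplexing: several instances may overlap the same extent.
        return [r for r in self._core if r.start < end and start < r.end]

    def reboot(self) -> None:
        # Only persistent records are reloaded after a restart.
        self._core = list(self._disk)
```

In this sketch a migration record would be added with `persistent=True` and a collaborative cache record with `persistent=False`, so a reboot drops only the latter.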
Use Cases
id | quality attribute | summary
collaborative cache populating | performance, scalability | client holds a read lock on data extent, sends read request to target OST
collaborative cache redirection | performance, scalability | client sends read lock request to target OST
flash cache redirection | performance | client sends write lock request, there is a flash server in the filesystem
client reads or writes migrated data | availability | client sends lock request to target OST for data extent, data are not on target OST due to migration
data server crash | availability | data server crashes and comes back up
update persistent redirect information | availability | migration is in progress: data extent
- collaborative cache populating
Scenario: | a client sends a read request to the target OST, there are no other caches for the data, and the client is running an OST service
Business Goals: | populate collaborative cache
Relevant QA's: | performance, scalability
details |
Stimulus: | client read request
Stimulus source: | client
Environment: | the requested data are not cached by anybody
Artifact: | requested data
Response: | the target OST reads the requested data and sends them to the client. If the client node runs an OST service, the target OST makes an in-core record that the requested data extent is cached by this client node
Response measure: | the target OST knows that certain data are cached by a certain client node
Questions: |
Issues: | A client node may not want to participate in collaborative cache; this can be controlled with preferences
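The response in this use case can be sketched as follows. This is only an illustration under stated assumptions: the class name `TargetOST`, the `runs_ost_service` and `participates` parameters, and the extent-keyed dictionary are hypothetical, and locking is left out.

```python
from collections import defaultdict


class TargetOST:
    """Hypothetical target OST that remembers which clients cache what."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        # In-core only: (start, end) extent -> set of client nodes caching it.
        self.cached_by = defaultdict(set)

    def read(self, client: str, start: int, end: int,
             runs_ost_service: bool, participates: bool = True) -> bytes:
        payload = self.data[start:end]
        # Record the client as a cache holder only if it can serve other
        # clients and has not opted out through its preferences.
        if runs_ost_service and participates:
            self.cached_by[(start, end)].add(client)
        return payload
```

A client that does not run an OST service, or that opts out, still gets its data but leaves no record behind.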
- collaborative cache redirection
Scenario: | a client sends a read lock request to the target OST, the requested data extent is cached by another client
Business Goals: | offload the target OST by redirecting the read request to a hopefully less loaded node
Relevant QA's: | performance, scalability
details |
Stimulus: | client read lock request
Stimulus source: | client
Environment: | the requested data extent is cached by another client
Artifact: | requested data
Response: | the target OST grants the lock, checks its records and sees that the data are available via collaborative cache. The request is serviced in accordance with the client's preferences: a list of nodes caching the data range can be returned, or a client node in the local network can be chosen, etc.
Response measure: | the lock is granted, the client knows where the data can be fetched
Questions: |
Issues: |
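A minimal sketch of how the response could depend on preferences is shown below. The names `LockReply` and `handle_read_lock`, the `want_list` flag, and the `node_net` map from node to network are assumptions for illustration; the design does not specify how locality is determined.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Extent = Tuple[int, int]


@dataclass
class LockReply:
    granted: bool
    redirect_to: List[str] = field(default_factory=list)  # empty: read from target OST


def handle_read_lock(cached_by: Dict[Extent, Set[str]],
                     node_net: Dict[str, str],
                     extent: Extent,
                     requester: str,
                     want_list: bool = False) -> LockReply:
    # Nodes other than the requester that cache exactly this extent.
    holders = sorted(n for n in cached_by.get(extent, set()) if n != requester)
    if not holders:
        return LockReply(granted=True)
    if want_list:
        # "Let me choose myself": return every node caching the range.
        return LockReply(True, holders)
    # Otherwise prefer a node in the requester's own network, if known.
    net = node_net.get(requester)
    local = [n for n in holders if net is not None and node_net.get(n) == net]
    return LockReply(True, [(local or holders)[0]])
```

The lock is granted in every branch; only the redirect list differs, which keeps locking centralized on the target OST.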
- flash cache redirection
Scenario: | a client sends a write lock request to the target OST, there is a flash cache in the filesystem
Business Goals: |
Relevant QA's: | performance
details |
Stimulus: | client write lock request
Stimulus source: | client
Environment: | the flash cache is capable of performing this write
Artifact: | write lock
Response: | the target OST grants the lock (all other locks are revoked, caches are released), checks its records to see if the data were already written to a flash cache server, selects a flash cache server which is able to do this write, and redirects the client to the selected flash cache server
Response measure: | the client has the write lock and knows where to send the write request
Questions: | the client has to send a notification to the target OST when the flash server completes the write, so that the target OST can make an appropriate redirect record. Only after that does the client release the write lock.
Issues: | How is it guaranteed that the flash server will complete the write?
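The sequence of write lock, redirect, completion notification, and lock release can be sketched as below. The class and method names, the free-space figure used to select a flash server, and the omission of capacity accounting are all assumptions made for illustration, not part of the design.

```python
class TargetOST:
    """Hypothetical target OST handling flash cache redirection."""

    def __init__(self, flash_free: dict) -> None:
        self.flash_free = dict(flash_free)  # server -> free bytes (assumed metric)
        self.lock_holder = {}     # extent -> client holding the write lock
        self.cached_by = {}       # extent -> nodes holding read caches
        self.flash_redirect = {}  # extent -> flash server holding the data

    def write_lock(self, client, extent):
        self.cached_by.pop(extent, None)   # read caches are released
        self.lock_holder[extent] = client  # any previous lock is revoked
        server = self.flash_redirect.get(extent)  # already in flash?
        if server is None:
            size = extent[1] - extent[0]
            fits = [s for s, free in self.flash_free.items() if free >= size]
            server = max(fits, key=self.flash_free.get) if fits else None
        return server  # None: no flash server can take it, write to target OST

    def write_done(self, client, extent, server):
        # Notification from the client; must arrive before the lock is released.
        if self.lock_holder.get(extent) != client:
            raise PermissionError("write lock not held")
        self.flash_redirect[extent] = server

    def unlock(self, client, extent):
        if self.lock_holder.get(extent) == client:
            del self.lock_holder[extent]
```

The sketch does not answer the open issue above: if the flash server never completes the write, `write_done` is never called and no redirect record is made.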
- client accesses a file which migrates
Scenario: | a client accesses a file which is migrating to another OST, and the accessed extent of data is not available on the target OST
Business Goals: | allow migration and client access to work simultaneously
Relevant QA's: | availability
details |
Stimulus: | Client needs to access data to do its job
Stimulus source: | Client
Environment: | Data requested by the client were migrated to another data server or to several servers
Artifact: | lock request
Response: | the target OST grants the lock, checks its redirect records and sees that the data are already on another server, and redirects the client to that server
Response measure: | the lock is granted to the client, the client knows where it can fetch data from, the client does not have to wait until migration completes, migration continues
Questions: | What does the target OST do if the requested data extent has migrated to several servers? It can return either an array of redirections or a redirection for the first part of the extent only
Issues: |
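The two options raised in the question can be compared with a small sketch. It assumes that redirect records are `(start, end, server)` tuples over half-open byte ranges; the function name `redirections` and the `first_only` flag are hypothetical.

```python
from typing import List, Tuple

Piece = Tuple[int, int, str]  # start, end (exclusive), server


def redirections(records: List[Piece], start: int, end: int,
                 first_only: bool = False) -> List[Piece]:
    """Clip each redirect record to the requested extent [start, end)."""
    out = []
    for r_start, r_end, server in sorted(records):
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:  # the record overlaps the request
            out.append((lo, hi, server))
    # Parts of the request not covered by any piece are assumed to be
    # still on the target OST.
    return out[:1] if first_only else out
```

Returning the whole array saves round trips to the target OST, while returning only the first piece keeps the reply small and lets the target OST reconsider when the client asks for the rest.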
- data server crash
Scenario: | a data server crashes while a migration agent copies a file hosted by the data server
Business Goals: | incorrect redirection is not allowed
Relevant QA's: | availability
details |
Stimulus: | OST crash
Stimulus source: | power failure
Environment: | at the time of the crash a migration agent was working with data hosted on the crashed data server
Artifact: | data server
Response: | when the data server is up again, none of the RID update requests from the agent are lost
Response measure: |
Questions: | Is there anything to do about recovery? I guess not, if agents follow simple rules when interacting with a data server.
Issues: | The migration agent is responsible for resending RID update requests which the data server did not complete before the crash
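One possible reading of the "simple rules" for agents is sketched here: the agent keeps every RID update until it is acknowledged and resends the rest after the server returns. The classes, the `up` flag that simulates a crash, and the use of a set to make resends harmless are assumptions for illustration.

```python
class DataServer:
    """Hypothetical data server; `up` is toggled to simulate a crash."""

    def __init__(self) -> None:
        self.rid = set()  # stand-in for the persistent RID
        self.up = True

    def update_rid(self, record) -> bool:
        if not self.up:
            raise ConnectionError("data server is down")
        self.rid.add(record)  # set insert: a resent record changes nothing
        return True           # completion notification


class Agent:
    """Hypothetical migration agent that resends unacknowledged updates."""

    def __init__(self, server: DataServer) -> None:
        self.server = server
        self.pending = []  # updates without a completion notification

    def submit(self, record) -> None:
        self.pending.append(record)
        self.flush()

    def flush(self) -> None:
        still_pending = []
        for rec in self.pending:
            try:
                self.server.update_rid(rec)
            except ConnectionError:
                still_pending.append(rec)
        self.pending = still_pending
```

Because an update stays in `pending` until it is acknowledged, the server needs no recovery step of its own in this sketch; the agent only has to call `flush()` once the server is reachable again.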
- agent sends a RID update request to data server
Scenario: | the agent sends the data server a request to update the RID
Business Goals: | keep the data server aware of the real data location
Relevant QA's: | availability
details |
Stimulus: | Agent copied data somewhere
Stimulus source: | Agent
Environment: | Agent copied data to the target data server, so the source data server's RID has to be updated
Artifact: | RID of data server
Response: | the data server adds a new record to its RID and sends a completion notification to the agent
Response measure: |
Questions: |
Issues: | The migration agent has to send the RID update request only after the copied data are written to disk on the target data server
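The ordering constraint from the issue above can be sketched as follows: copy, force the data to disk on the target data server, and only then ask the source data server to update its RID. All names here are hypothetical, and `sync()` merely stands in for whatever mechanism makes the copy durable.

```python
class TargetServer:
    """Hypothetical destination of a migration."""

    def __init__(self) -> None:
        self.buffered = {}  # written but not yet on disk
        self.on_disk = {}

    def write(self, extent, data: bytes) -> None:
        self.buffered[extent] = data

    def sync(self) -> None:
        self.on_disk.update(self.buffered)
        self.buffered.clear()


class SourceServer:
    """Hypothetical source data server owning the RID."""

    def __init__(self) -> None:
        self.rid = []

    def update_rid(self, extent, server_name: str) -> str:
        self.rid.append((extent, server_name))
        return "done"  # completion notification to the agent


def migrate_extent(source, target, target_name, extent, data):
    target.write(extent, data)
    target.sync()  # the data must be on disk before the RID points at them
    if extent not in target.on_disk:
        raise RuntimeError("copy is not durable; RID must not be updated")
    return source.update_rid(extent, target_name)
```

If the RID were updated before the sync, a crash of the target data server could leave the source redirecting clients to data that do not exist, which is the incorrect redirection the crash use case forbids.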
Questions
1. Is there a need for the request redirection mechanism to be involved in filesystem replication? Hopefully not, because of the significant overhead of RID maintenance.
2. In the case of replication, a data server may be able either to serve a request locally or to redirect the request somewhere else. Who is to make the choice?
References
bug 14174
Simple Space Balance Migration