Architecture - Flash Cache
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Summary
Flash cache is a write-only cache. When clients write, servers may redirect those writes to a flash cache server. When clients read data that were written to the flash cache, the data must first be flushed from the flash cache to the data server.
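To make this routing rule concrete, here is a minimal C sketch; all of the names (fc_route_io, io_type, io_target) are hypothetical illustrations, not actual Lustre symbols.

```c
/* Hypothetical sketch of the flash cache routing rule above; none of
 * these names are real Lustre symbols. */
enum io_type   { IO_READ, IO_WRITE };
enum io_target { TARGET_MASTER_OST, TARGET_FLASH_CACHE };

/* Writes may be redirected to a flash cache server; reads always go
 * to the master OST, after any dirty cached data has been flushed. */
static enum io_target fc_route_io(enum io_type type, int fc_available)
{
	if (type == IO_WRITE && fc_available)
		return TARGET_FLASH_CACHE;	/* write-only cache */
	return TARGET_MASTER_OST;	/* data are never read from the cache */
}
```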
Details
Flash servers
A filesystem may have a set of flash cache servers, which are typically very fast flash storage with capacity smaller than an OST's. Whenever a client wants to flush dirty data to storage, it sends the data to a flash server. Data are never read from flash cache servers.
Layouts
When a client opens a file, it gets two file data layouts from the MDS. The client keeps both layouts in the redirection layer; the appropriate layout is chosen depending on whether a read or a write is in progress.
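A minimal sketch of how the redirection layer might hold and select between the two layouts follows; struct lov_stripe_md is used only as an opaque type, and the redir_layouts and redir_select_layout names are assumptions for illustration.

```c
#include <stddef.h>

struct lov_stripe_md;	/* opaque LOV layout */

/* Hypothetical container for the two layouts returned by the MDS on
 * open: the master OST layout (EA) and the flash cache layout (EA2). */
struct redir_layouts {
	struct lov_stripe_md *rl_master;	/* layout on master OSTs   */
	struct lov_stripe_md *rl_cache;		/* layout on flash servers */
};

/* The redirection layer picks a layout per I/O: writes use the flash
 * cache layout when one exists, reads always use the master layout. */
static struct lov_stripe_md *
redir_select_layout(struct redir_layouts *rl, int is_write)
{
	if (is_write && rl->rl_cache != NULL)
		return rl->rl_cache;
	return rl->rl_master;
}
```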
Locking
The flash cache server runs LDLM, and clients send write lock requests to it. The flash cache server in turn takes a lock on the master OST and then grants the lock to the client. In the case of a read, clients send read lock requests to the master OST, which forces flash servers to flush the data if necessary.
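The lock flow might look like the following sketch. The function names are stubs invented for illustration, not LDLM APIs; only the LCK_PR/LCK_PW modes mirror Lustre's read/write lock modes.

```c
/* Illustrative two-level lock flow; all functions are invented stubs. */
enum lock_mode { LCK_PR, LCK_PW };

struct extent { unsigned long start, end; };

/* Stubs standing in for the actual enqueue/flush RPCs. */
static int master_ost_enqueue(enum lock_mode mode, struct extent *e)
{ (void)mode; (void)e; return 0; }
static int fcs_grant_to_client(enum lock_mode mode, struct extent *e)
{ (void)mode; (void)e; return 0; }
static int fcs_flush_overlapping(struct extent *e)
{ (void)e; return 0; }

/* Write: the flash cache server first takes a PW lock on the master
 * OST, then grants the client's PW lock on the cached object. */
static int fc_enqueue_write(struct extent *e)
{
	int rc = master_ost_enqueue(LCK_PW, e);
	if (rc)
		return rc;
	return fcs_grant_to_client(LCK_PW, e);
}

/* Read: the client locks the master OST directly; the conflict with
 * the cache server's PW lock forces dirty extents to be flushed to
 * the master before the PR lock is granted. */
static int fc_enqueue_read(struct extent *e)
{
	int rc = fcs_flush_overlapping(e);
	if (rc)
		return rc;
	return master_ost_enqueue(LCK_PR, e);
}
```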
Use cases
ID | Quality attribute | Summary |
---|---|---|
remapping extents | performance | the object has different layouts on OSTs and FCSs |
how to take locks | coherency | client uses DLM extent locking during read/write |
how to keep cache and master consistent | correctness, usability | clients see consistent file data in the face of concurrent read-write accesses to the master and proxy servers |
cache miss in cobd | ?? | ?? |
local cache coherency | correctness | for a file accessed by given client only, write followed by the read of the same data returns last written data |
how to acquire EA2 | performance | flash cache layout is obtained from MDS |
powerloss with cached writes in flash | availability | data cached on a flash cache server aren't lost in the case of flash cache server failure |
file size recovery/consistency | consistency | all clients see the correct file size; file size is recovered in case of flash cache server failure |
mmap | ?? | |
cache is full | usability | flash cache server free space management (grants?) |
lose OSTc | fault tolerance | filesystem survives the loss of a flash cache server |
Quality Attribute Scenarios
- remapping extents
- how to take locks
- how to keep cache and master consistent
- cache miss in cobd
- local cache coherency
- how to acquire EA2
- powerloss with cached writes in flash
- file size recovery/consistency
- mmap
- cache is full
Scenario: | | |
Business Goals: | | |
Relevant QA's: | | |
details | Stimulus: | |
| Stimulus source: | |
| Environment: | |
| Artifact: | |
| Response: | |
| Response measure: | |
Questions: | | |
Issues: | | |
Implementation details
1. add a new layer "redir" between llite and lov to redirect write requests to the flash cache and let read requests go to OSTs
2. flash cache is a feature of the filesystem: the lov descriptor contains a flash descriptor
- revoke the config lock for dynamic retrieval of EA2
- write: PW lock on ostC, ostC's lov takes PW lock on ostM
- lockless IO with write lock on MDT (close to WBC locks, per-dir write locks) (MDT locks everything with a single extent lock bit). (Good for file-per-process).
3. use extent-lock bit on MDT to protect whole file (data as well)
4. hierarchical locks
5. (retracted) client with lovC lock implies data is still on ostC (no consequence to ostM)
- ostM lock held long time
- a client doing a read might have to wait longer, since the flash cache may hold a lot of data
- client doing read gets ostM lock, write gets ostC lock
6. ostC locks are automatically non-overlapping. Don't hand out optimistic ostM extent locks that violate this.
7. after data are flushed, remove them from the cache so that clean data are not recovered (flash cache only); see the sketch after this list
8. map lovC to lovM without aliasing
9. lovC obtains the maximum grant from ostM
- special grant RPC
10. all updates go through cache
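As a sketch of item 7 above, a flush pass on the flash cache server might drop each extent from the cache as soon as it reaches the master OST, so that recovery never needs to replay clean data. All names here (fc_extent, write_to_master, cache_remove, fcs_flush_object) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-object dirty extent list on a flash cache server. */
struct fc_extent {
	struct fc_extent *next;
	unsigned long start, end;
};

/* Stubs for the write-back RPC to ostM and cache space release. */
static int write_to_master(struct fc_extent *e) { (void)e; return 0; }
static void cache_remove(struct fc_extent *e) { (void)e; }

/* Flush each dirty extent to the master OST and immediately drop it
 * from the cache, so only dirty data ever needs to be recovered. */
static int fcs_flush_object(struct fc_extent *dirty_list)
{
	struct fc_extent *e, *next;

	for (e = dirty_list; e != NULL; e = next) {
		next = e->next;
		int rc = write_to_master(e);
		if (rc)
			return rc;	/* leave remaining extents dirty */
		cache_remove(e);	/* clean data are never recovered */
	}
	return 0;
}
```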