Architecture - Flash Cache

Summary
Flash cache is a write-only cache. When clients write, servers may redirect those writes to a flash cache server. When clients read data that were written to the flash cache, those data must first be flushed from the flash cache to the data server.

Flash servers
A filesystem may have a set of flash cache servers, typically backed by very fast flash storage with less capacity than an OST. Whenever a client wants to flush dirty data to storage, it sends the data to a flash server. Data are never read from flash cache servers.
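
The write-only behaviour can be illustrated with a minimal sketch. The names FlashServer and DataServer, the offset-keyed dictionaries, and the choice to reject reads with an exception are all assumptions made for illustration, not part of the design. Dropping flushed data from the cache follows implementation detail 7 below.

```python
class DataServer:
    """Hypothetical stand-in for an OST: a plain offset -> bytes store."""

    def __init__(self):
        self.objects = {}


class FlashServer:
    """Hypothetical write-only cache in front of a data server."""

    def __init__(self, backing):
        self.backing = backing
        self.dirty = {}  # offset -> bytes not yet on the data server

    def write(self, offset, payload):
        # Clients send dirty data here instead of to the OST.
        self.dirty[offset] = payload

    def read(self, offset):
        # Data are never read from the flash cache server.
        raise PermissionError("flash cache is write-only; read from the data server")

    def flush(self):
        # Move everything to the data server, then forget it locally.
        for offset in sorted(self.dirty):
            self.backing.objects[offset] = self.dirty[offset]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed


ost = DataServer()
flash = FlashServer(ost)
flash.write(0, b"hello")
```

Until flush() runs, the data server holds nothing for offset 0, which is why a reader has to trigger a flush before it can see the data.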

Layouts
When a client opens a file, it gets two file data layouts from the MDS. The client keeps those layouts in the redirection layer, which chooses the appropriate layout depending on whether a read or a write is in progress.
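
A minimal sketch of the layout choice, assuming a layout can be reduced to a tuple of target names. RedirLayer, Layout, IoKind and the target names are hypothetical and only illustrate the selection rule described above.

```python
from dataclasses import dataclass
from enum import Enum


class IoKind(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for a file data layout."""
    targets: tuple


@dataclass
class RedirLayer:
    """Holds both layouts handed out by the MDS at open time."""
    master_layout: Layout  # layout over the master OSTs
    cache_layout: Layout   # layout over the flash cache servers

    def layout_for(self, kind: IoKind) -> Layout:
        # Writes are redirected to the flash cache; reads go to the OSTs.
        if kind is IoKind.WRITE:
            return self.cache_layout
        return self.master_layout


redir = RedirLayer(
    master_layout=Layout(("ostM0", "ostM1")),
    cache_layout=Layout(("ostC0",)),
)
```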

Locking
The flash cache server runs LDLM, and clients send write lock requests to it. The flash cache server in turn takes a lock on the master OST and then grants the lock to the client. For reads, clients send read lock requests to the master OST, which makes the flash servers flush the data if necessary.
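
The order of operations can be sketched as an event trace. This is not LDLM code: the classes, method names and event strings are hypothetical, lock cancellation and conflict handling are not modelled, and the sketch flushes unconditionally where the real design flushes only if necessary.

```python
class MasterOST:
    """Hypothetical master OST that records lock events."""

    def __init__(self, events):
        self.events = events
        self.cache_servers = []

    def lock_for_cache(self, extent):
        # The flash cache server takes its lock here before granting to a client.
        self.events.append(("ostM grants write lock to ostC", extent))

    def read_lock(self, extent):
        # A read lock request makes the flash servers flush first.
        for cache in self.cache_servers:
            cache.flush(extent)
        self.events.append(("ostM grants read lock to client", extent))


class FlashCacheServer:
    """Hypothetical flash cache server that fronts write locks."""

    def __init__(self, master, events):
        self.master = master
        self.events = events
        master.cache_servers.append(self)

    def write_lock(self, extent):
        self.master.lock_for_cache(extent)
        self.events.append(("ostC grants write lock to client", extent))

    def flush(self, extent):
        self.events.append(("ostC flushes", extent))


events = []
master = MasterOST(events)
cache = FlashCacheServer(master, events)
cache.write_lock((0, 4096))   # client write path
master.read_lock((0, 4096))   # client read path
```

The trace shows the two invariants from the text: the master lock is taken before the client's write lock is granted, and the flush happens before the reader's lock is granted.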

Quality Attribute Scenarios

 * remapping extents
 * how to take locks
 * how to keep cache and master consistent
 * cache miss in cobd
 * local cache coherency
 * how to acquire EA2
 * powerloss with cached writes in flash
 * file size recovery/consistency
 * mmap
 * cache is full

Implementation details
1. add a new layer "redir" between llite and lov to redirect write requests to the flash cache and to let read requests go to the OSTs

2. flash is a feature of the filesystem - the lov descriptor contains the flash desc
 * revoke config lock for dynamic retrieval of EA2
 * write: PW lock on ostC, ostC's lov takes PW lock on ostM
 * lockless IO with write lock on MDT (close to WBC locks, per-dir write locks) (MDT locks everything with a single extent lock bit). (Good for file-per-process).

3. use extent-lock bit on MDT to protect whole file (data as well)

4. hierarchical locks

5. (retracted) client with lovC lock implies data is still on ostC (no consequence to ostM)
 * ostM lock held long time
 * client doing reading might have to wait longer - flash cache may have a lot of data.
 * client doing read gets ostM lock, write gets ostC lock

6. ostC locks are automatically non-overlapping. Don't hand out optimistic ostM extent locks that violate this.

7. after data are flushed, remove them from the cache (so we don't recover clean data). flash cache only.

8. map lovC to lovM without aliasing

9. lovC obtains max grant from ostM
 * special grant rpc

10. all updates go through cache
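
Item 6 can be read as a clipping rule: when ostM would optimistically grow a requested extent, the grown extent must stop short of any extent already held. The sketch below shows one possible interpretation, assuming half-open (start, end) extents and a grown extent that contains the requested one. The function name clip_optimistic and its parameters are hypothetical.

```python
def clip_optimistic(requested, optimistic, held):
    """Shrink an optimistic extent so it overlaps no held extent.

    Extents are half-open (start, end) tuples. The result still covers
    the requested extent, or ValueError is raised if the request itself
    conflicts with a held extent.
    """
    start, end = optimistic
    for h_start, h_end in held:
        if h_end <= requested[0]:
            # Held extent lies entirely before the request: raise the floor.
            start = max(start, h_end)
        elif h_start >= requested[1]:
            # Held extent lies entirely after the request: lower the ceiling.
            end = min(end, h_start)
        else:
            raise ValueError("requested extent conflicts with a held lock")
    return (start, end)


clipped = clip_optimistic((100, 200), (0, 1000), [(0, 50), (300, 400)])
# -> (50, 300)
```

With no held extents the optimistic extent is returned unchanged, so the rule only costs anything when neighbouring locks exist.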