Architecture - Userspace Servers

Summary
A userspace server is a Lustre server (OSS, MDS, MGS, ?) that runs in user space rather than in kernel space.

Definitions

 * DMU: Data Management Unit, the core of ZFS, capable of running in userspace
 * control request: a request from Lustre utilities to start, stop, or configure services
 * profile: a file listing the actions needed to set up the services associated with a given storage device

Requirements

 * 1) run Lustre services in userspace
 * 2) make most of the Lustre code platform-independent
 * 3) confine all platform-dependent code to a few components with well-defined APIs (in order to improve portability)
 * 4) keep the same recovery model (atomic updates, executed-once semantics, clients retain non-committed requests)
 * 5) achieve performance comparable to in-kernel Lustre
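
The last clause of requirement 4 (clients retain non-committed requests) can be illustrated with a small sketch. It assumes, beyond what this page states, that every request carries an increasing transaction number and that server replies report the highest committed one. The names replay_list, replay_retain, and replay_prune are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLAY_MAX 16   /* arbitrary capacity chosen for this sketch */

/* Client-side list of requests the server has executed but not yet committed. */
struct replay_list {
    uint64_t transno[REPLAY_MAX];
    size_t   count;
};

/* Remember a request until the server reports it as committed.
 * Returns 0 on success, -1 if the list is full. */
int replay_retain(struct replay_list *rl, uint64_t transno)
{
    if (rl->count == REPLAY_MAX)
        return -1;
    rl->transno[rl->count++] = transno;
    return 0;
}

/* Drop every request at or below last_committed; the rest stay available
 * for replay after a server restart. Returns how many are still retained. */
size_t replay_prune(struct replay_list *rl, uint64_t last_committed)
{
    size_t kept = 0;

    for (size_t i = 0; i < rl->count; i++) {
        if (rl->transno[i] > last_committed)
            rl->transno[kept++] = rl->transno[i];
    }
    rl->count = kept;
    return kept;
}
```

Nothing in this logic depends on where the server runs, which is why the same recovery model should carry over to userspace unchanged.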

Details
The core idea is to provide an environment similar to the kernel's:
 * 1) single address space
 * 2) ioctl-like interface (control)
 * 3) an API to control threads, memory, timers, etc.

We divide all components into two categories:
 * 1) platform-dependent: control, libcfs, OSD, lnet, build system?
 * 2) platform-independent: everything else, including MDT, MDD, CMM, obdfilter, ldlm, llog, ptlrpc, obdclass, utilities, etc.



Now that we have identified the platform-dependent components, we describe them in detail.

Control
We introduce a special interface that lets utilities communicate with other components. Together with libcfs, this component forms the "kernel" from the point of view of a Lustre service.

This kernel is started by an administrator or by scripts before any Lustre utility is invoked.

The kernel contains a set of threads that handle control requests.
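
As a minimal sketch of what handling a control request might look like, the fragment below dispatches start, stop, and configure requests through a handler table, in the spirit of an ioctl switch. All names here (ctl_cmd, ctl_request, service, ctl_dispatch) and the choice of error codes are assumptions made for illustration, and the threads that would pull requests off a queue are omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command codes for control requests. */
enum ctl_cmd { CTL_START = 0, CTL_STOP, CTL_CONFIGURE, CTL_NR };

enum svc_state { SVC_STOPPED = 0, SVC_RUNNING };

/* Minimal stand-in for a Lustre service. */
struct service {
    enum svc_state state;
    char           param[64];
};

struct ctl_request {
    enum ctl_cmd cmd;
    const char  *arg;   /* only used by CTL_CONFIGURE */
};

typedef int (*ctl_handler_t)(struct service *svc, const struct ctl_request *req);

static int ctl_start(struct service *svc, const struct ctl_request *req)
{
    (void)req;
    if (svc->state == SVC_RUNNING)
        return -EALREADY;
    svc->state = SVC_RUNNING;
    return 0;
}

static int ctl_stop(struct service *svc, const struct ctl_request *req)
{
    (void)req;
    if (svc->state == SVC_STOPPED)
        return -EALREADY;
    svc->state = SVC_STOPPED;
    return 0;
}

static int ctl_configure(struct service *svc, const struct ctl_request *req)
{
    if (req->arg == NULL || strlen(req->arg) >= sizeof(svc->param))
        return -EINVAL;
    strcpy(svc->param, req->arg);   /* length was checked above */
    return 0;
}

/* One handler per command, indexed by command code. */
static const ctl_handler_t ctl_handlers[CTL_NR] = {
    [CTL_START]     = ctl_start,
    [CTL_STOP]      = ctl_stop,
    [CTL_CONFIGURE] = ctl_configure,
};

/* Entry point a control thread would call for each request it receives. */
int ctl_dispatch(struct service *svc, const struct ctl_request *req)
{
    if ((unsigned int)req->cmd >= (unsigned int)CTL_NR)
        return -ENOTTY;             /* unknown command, as ioctl() reports it */
    return ctl_handlers[req->cmd](svc, req);
}
```

Returning negative errno values keeps the calling convention close to what in-kernel code already expects.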

libcfs
libcfs provides other components with a platform-independent API, including functions to control threads, memory, etc. See the internal wiki page for libcfs details.
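
To make the idea concrete, here is a rough sketch of a memory wrapper of the kind such a layer could expose in userspace. It is not the actual libcfs API: plat_alloc, plat_free, and plat_mem_used are hypothetical names, the backend is simply calloc/free, and the accounting is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every allocation; the union keeps the
 * payload aligned for any object type. */
union plat_mem_hdr {
    size_t      size;
    max_align_t align;
};

static size_t plat_mem_in_use;   /* bytes currently handed out */

/* Allocate zeroed memory and account for it. Returns NULL on failure. */
void *plat_alloc(size_t size)
{
    union plat_mem_hdr *hdr;

    if (size > SIZE_MAX - sizeof(*hdr))
        return NULL;
    hdr = calloc(1, sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    plat_mem_in_use += size;
    return hdr + 1;
}

/* Release memory obtained from plat_alloc(); NULL is ignored. */
void plat_free(void *ptr)
{
    union plat_mem_hdr *hdr;

    if (ptr == NULL)
        return;
    hdr = (union plat_mem_hdr *)ptr - 1;
    plat_mem_in_use -= hdr->size;
    free(hdr);
}

/* Report how much memory callers currently hold. */
size_t plat_mem_used(void)
{
    return plat_mem_in_use;
}
```

Platform-independent code would call only these wrappers, so a port to another platform replaces the bodies and leaves the callers untouched.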

OSD
OSD provides access to persistent storage through a well-defined API. For userspace we plan to use an OSD built on top of the DMU. We consider local caching (blocks, inodes) an internal component of the OSD. See DMU OSD details.
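
The sketch below shows one way a storage API of this kind could be shaped: callers see only a table of operations, and the backend behind it can be swapped. The names osd_ops, osd_device, and mem_osd_init are hypothetical, and a fixed in-memory buffer stands in for the DMU so that the example stays self-contained; the real OSD interface is considerably richer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct osd_device;

/* Hypothetical operation table that every OSD backend would fill in. */
struct osd_ops {
    int (*write)(struct osd_device *dev, size_t off, const void *buf, size_t len);
    int (*read)(struct osd_device *dev, size_t off, void *buf, size_t len);
};

struct osd_device {
    const struct osd_ops *ops;
    unsigned char         data[256];   /* storage of the in-memory stand-in */
};

/* True when [off, off + len) lies inside the device. */
static int mem_range_ok(const struct osd_device *dev, size_t off, size_t len)
{
    return off <= sizeof(dev->data) && len <= sizeof(dev->data) - off;
}

static int mem_write(struct osd_device *dev, size_t off, const void *buf, size_t len)
{
    if (!mem_range_ok(dev, off, len))
        return -ENOSPC;
    memcpy(dev->data + off, buf, len);
    return 0;
}

static int mem_read(struct osd_device *dev, size_t off, void *buf, size_t len)
{
    if (!mem_range_ok(dev, off, len))
        return -EINVAL;
    memcpy(buf, dev->data + off, len);
    return 0;
}

static const struct osd_ops mem_osd_ops = {
    .write = mem_write,
    .read  = mem_read,
};

/* Bind a device to the in-memory backend and clear its storage. */
void mem_osd_init(struct osd_device *dev)
{
    memset(dev->data, 0, sizeof(dev->data));
    dev->ops = &mem_osd_ops;
}
```

Upper layers call dev->ops->write() and dev->ops->read() without knowing which backend is attached, which is what lets caching remain internal to the OSD.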

Implementation details

 * 1) poor control over I/O in POSIX (AIO, elevator, merging)
 * 2) poor control over memory management in POSIX (no way for the kernel to communicate memory pressure to the application)
 * 3) synchronization primitives (on the majority of platforms we can't use spinlocks)
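
For item 3, one plausible approach is to keep a spinlock-style API for the platform-independent code and back it with a sleeping lock in userspace, since a userspace thread can be preempted while holding a lock and busy-waiting would then waste CPU time. The sketch below assumes POSIX threads; the plat_spin_* names are hypothetical and are not taken from libcfs.

```c
#include <pthread.h>

/* Spinlock-like type backed by a POSIX mutex. */
typedef struct {
    pthread_mutex_t mutex;
} plat_spinlock_t;

/* Returns 0 on success or a negative errno value. */
static inline int plat_spin_init(plat_spinlock_t *lock)
{
    return -pthread_mutex_init(&lock->mutex, NULL);
}

static inline void plat_spin_lock(plat_spinlock_t *lock)
{
    pthread_mutex_lock(&lock->mutex);
}

static inline void plat_spin_unlock(plat_spinlock_t *lock)
{
    pthread_mutex_unlock(&lock->mutex);
}

/* Returns 1 if the lock was acquired and 0 otherwise, following the
 * convention of the kernel's spin_trylock(). */
static inline int plat_spin_trylock(plat_spinlock_t *lock)
{
    return pthread_mutex_trylock(&lock->mutex) == 0;
}

static inline void plat_spin_destroy(plat_spinlock_t *lock)
{
    pthread_mutex_destroy(&lock->mutex);
}
```

Code written against this API keeps its locking structure, but critical sections may now sleep, so assumptions about short, non-blocking hold times need to be reviewed case by case.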