WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.
Architecture - Userspace Servers
Revision as of 14:25, 22 January 2010
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
Summary
A userspace server is a Lustre server (OSS, MDS, MGS, ?) that runs in user space rather than in kernel space.
Definitions
- DMU: the Data Management Unit, the core of ZFS, which is capable of running in userspace
- control request: a request from the Lustre utilities to start, stop, or configure services
- profile: a file listing the actions needed to set up the services associated with a given storage device
Requirements
- run Lustre services in userspace
- make most of the Lustre code platform-independent
- put all platform-dependent code into a few components with well-defined APIs (to improve portability)
- keep the same recovery model (atomic updates, executed-once semantics, clients retain non-committed requests)
- achieve performance comparable to in-kernel Lustre
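One way to picture the executed-once requirement: the server remembers, per client, the transaction id (xid) of the last request it executed together with the reply it sent, and answers a resent request from that record instead of executing it a second time. The sketch below is a simplified, hypothetical illustration (all names are invented for this example). It keeps the record in memory, whereas a real server would have to commit it to storage atomically with the update itself for it to survive a restart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-client reply record; names are illustrative only. */
#define MAX_CLIENTS 16

struct client_slot {
    int      in_use;
    uint64_t last_xid;   /* xid of the last request executed for this client */
    int      last_reply; /* reply that was sent for that request */
};

static struct client_slot clients[MAX_CLIENTS];
static int counter;      /* server state that requests modify */

/* Execute a request at most once: a resend carrying the same xid is
 * answered from the saved reply instead of being executed again. */
static int handle_request(unsigned int client, uint64_t xid, int delta)
{
    assert(client < MAX_CLIENTS);
    struct client_slot *c = &clients[client];

    if (c->in_use && xid == c->last_xid)
        return c->last_reply;          /* resend: reconstruct the reply */

    counter += delta;                  /* the actual update */
    c->in_use = 1;
    c->last_xid = xid;
    c->last_reply = counter;           /* a real server persists this atomically with the update */
    return c->last_reply;
}
```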
Details
The core idea is to provide an environment similar to the in-kernel one:
- single address space
- ioctl-like interface (control)
- API to control threads, memory, timers, etc
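Within a single address space, an "ioctl-like interface" can be as simple as a table of handlers indexed by a command code, with each request carrying its arguments in a fixed structure. The following is a hypothetical sketch: the command codes, the structure layout and the handler names are invented for illustration and are not Lustre's actual ioctl numbers or structures.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical command codes, not the real Lustre ioctl numbers. */
enum { CTL_PING = 1, CTL_MOUNT = 2, CTL_UMOUNT = 3, CTL_MAX };

struct ctl_req {
    unsigned int cmd;       /* which operation to perform */
    char         dev[64];   /* argument: storage device path */
};

typedef int (*ctl_handler_t)(struct ctl_req *req);

/* The device argument must be non-empty and NUL-terminated. */
static int ctl_dev_valid(const struct ctl_req *req)
{
    return req->dev[0] != '\0' &&
           memchr(req->dev, '\0', sizeof req->dev) != NULL;
}

static int ctl_ping(struct ctl_req *req)   { (void)req; return 0; }
static int ctl_mount(struct ctl_req *req)  { return ctl_dev_valid(req) ? 0 : -EINVAL; }
static int ctl_umount(struct ctl_req *req) { return ctl_dev_valid(req) ? 0 : -EINVAL; }

/* Handler table indexed by command code; slot 0 is intentionally empty. */
static const ctl_handler_t ctl_table[CTL_MAX] = {
    [CTL_PING]   = ctl_ping,
    [CTL_MOUNT]  = ctl_mount,
    [CTL_UMOUNT] = ctl_umount,
};

/* Single entry point for control requests, analogous to ioctl(2). */
static int ctl_dispatch(struct ctl_req *req)
{
    if (req->cmd >= CTL_MAX || ctl_table[req->cmd] == NULL)
        return -ENOTTY;   /* errno that ioctl(2) conventionally uses for unknown requests */
    return ctl_table[req->cmd](req);
}
```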
We break all components into two categories:
- platform-dependent: control, libcfs, OSD, lnet, build system?
- platform-independent: everything else, including MDT, MDD, CMM, obdfilter, ldlm, llog, ptlrpc, obdclass, utilities, etc
Having identified the platform-dependent components, we now describe each of them in detail.
Decomposition
Control
We introduce a special interface that allows utilities to communicate with other components. Together with libcfs, this component forms the "kernel" from a Lustre service's point of view.
The kernel is started by the administrator (or by scripts) before any Lustre utility is invoked.
The kernel contains a set of threads that handle control requests.
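A minimal sketch of the kernel's control threads, assuming a POSIX threads environment: a fixed pool of workers sleeps on a condition variable and drains a bounded queue of control requests. Requests are plain integers here, and the pool size, queue length and all names are hypothetical choices made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_LEN 32   /* hypothetical queue capacity */
#define NTHREADS  4    /* hypothetical number of control threads */

static struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             items[QUEUE_LEN];
    int             head, tail, count;
    bool            stopping;
    int             handled;       /* requests completed so far */
} ctl_q = { .lock = PTHREAD_MUTEX_INITIALIZER, .cond = PTHREAD_COND_INITIALIZER };

static pthread_t ctl_threads[NTHREADS];

static void handle_ctl(int req) { (void)req; /* start/stop/configure a service */ }

static void *ctl_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ctl_q.lock);
    for (;;) {
        while (ctl_q.count == 0 && !ctl_q.stopping)
            pthread_cond_wait(&ctl_q.cond, &ctl_q.lock);
        if (ctl_q.count == 0)      /* stopping and nothing left to do */
            break;
        int req = ctl_q.items[ctl_q.head];
        ctl_q.head = (ctl_q.head + 1) % QUEUE_LEN;
        ctl_q.count--;
        pthread_mutex_unlock(&ctl_q.lock);
        handle_ctl(req);           /* run the request without holding the lock */
        pthread_mutex_lock(&ctl_q.lock);
        ctl_q.handled++;
    }
    pthread_mutex_unlock(&ctl_q.lock);
    return NULL;
}

/* Queue a request; fails if the queue is full or the kernel is stopping. */
static int ctl_submit(int req)
{
    int rc = -1;
    pthread_mutex_lock(&ctl_q.lock);
    if (ctl_q.count < QUEUE_LEN && !ctl_q.stopping) {
        ctl_q.items[ctl_q.tail] = req;
        ctl_q.tail = (ctl_q.tail + 1) % QUEUE_LEN;
        ctl_q.count++;
        pthread_cond_signal(&ctl_q.cond);
        rc = 0;
    }
    pthread_mutex_unlock(&ctl_q.lock);
    return rc;
}

static void ctl_start(void)
{
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&ctl_threads[i], NULL, ctl_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Stop accepting requests, let workers drain the queue, then join them. */
static void ctl_stop(void)
{
    pthread_mutex_lock(&ctl_q.lock);
    ctl_q.stopping = true;
    pthread_cond_broadcast(&ctl_q.cond);
    pthread_mutex_unlock(&ctl_q.lock);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(ctl_threads[i], NULL);
}
```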
Use Cases
| ID | Quality Attribute | Summary | 
|---|---|---|
| kernel start | usability | start kernel component | 
| kernel stop | usability | stop all running services and kernel component | 
| mount | usability | start all services associated with given storage device | 
| umount | usability | stop all services associated with given storage device | 
| forced umount | availability | stop all services associated with given storage device, forcefully disconnecting all clients | 
| control request | usability | handle control request from utilities | 
| stats | usability | access server and storage statistics | 
Quality Attribute Scenarios
kernel start
| Scenario: | kernel start |
|---|---|
| Business Goals: | allow customers to run Lustre servers in userspace |
| Relevant QA's: | usability |
| Stimulus source: | administrator |
| Stimulus: | lustre.kernel start command |
| Environment: | no kernel has been started yet |
| Artifact: | kernel is running, control interface is set up |
| Response: | |
| Response measure: | Lustre utilities can talk to the control interface |
| Questions: | |
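For the response measure (utilities can talk to the control interface), the kernel needs an IPC channel that utilities can reach. A plausible choice on POSIX systems is a Unix-domain socket. The sketch below assumes that, uses socketpair() so the example is self-contained (a real kernel would more likely bind a named socket that utilities connect to), and invents a two-field message format and command codes purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wire format and command codes. */
enum { CTL_PING = 1, CTL_SHUTDOWN = 2 };

struct ctl_msg {
    uint32_t cmd;      /* set by the utility */
    int32_t  status;   /* set by the kernel in the reply */
};

/* Utility side: send one request and wait for its reply. */
static int ctl_call(int fd, uint32_t cmd, int32_t *status)
{
    struct ctl_msg m = { .cmd = cmd, .status = 0 };
    if (send(fd, &m, sizeof m, 0) != (ssize_t)sizeof m)
        return -1;
    if (recv(fd, &m, sizeof m, 0) != (ssize_t)sizeof m)
        return -1;
    *status = m.status;
    return 0;
}

/* Kernel side: serve requests until a shutdown command arrives. */
static void *ctl_kernel_main(void *arg)
{
    int fd = *(int *)arg;
    struct ctl_msg m;

    while (recv(fd, &m, sizeof m, 0) == (ssize_t)sizeof m) {
        m.status = (m.cmd == CTL_PING || m.cmd == CTL_SHUTDOWN) ? 0 : -1;
        if (send(fd, &m, sizeof m, 0) != (ssize_t)sizeof m)
            break;
        if (m.cmd == CTL_SHUTDOWN)
            break;
    }
    return NULL;
}
```

A datagram socket is used so that each request and reply arrives as one whole message, which keeps the framing trivial.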
kernel stop
| Scenario: | kernel stop |
|---|---|
| Business Goals: | allow customers to run Lustre servers in userspace |
| Relevant QA's: | usability |
| Stimulus source: | administrator |
| Stimulus: | lustre.kernel stop command |
| Environment: | kernel is running |
| Artifact: | no kernel is running |
| Response: | |
| Response measure: | no Lustre service remains running |
| Questions: | |
mount
| Scenario: | mount |
|---|---|
| Business Goals: | allow customers to start services on a given storage device |
| Relevant QA's: | usability |
| Stimulus source: | administrator |
| Stimulus: | lustre.mount [device] command |
| Environment: | kernel is running, device is not in use yet |
| Artifact: | services are ready to handle requests |
| Response: | OSD starts on the given storage device; mountconf reads the profile and starts all services associated with the device |
| Response measure: | clients can talk to the new services |
| Questions: | |
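In the mount response, mountconf reads the device's profile and starts each listed service. The profile format is not specified here, so the sketch below assumes a hypothetical plain-text format with one "<type> <name>" pair per line and "#" comments, and parses it from an in-memory string. Starting each parsed service, in file order, is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SERVICES 8   /* hypothetical limit */

struct svc_entry {
    char type[16];   /* e.g. "mgs", "mdt", "ost" */
    char name[48];   /* service name */
};

/* Parse a hypothetical profile: one "<type> <name>" pair per line, '#'
 * starts a comment. Returns the number of entries, or -1 on a malformed
 * line or when more than max entries are present. */
static int profile_parse(const char *text, struct svc_entry *out, int max)
{
    int n = 0;

    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[128];

        if (len >= sizeof line)
            return -1;
        memcpy(line, text, len);
        line[len] = '\0';
        text += eol ? len + 1 : len;

        char *hash = strchr(line, '#');
        if (hash != NULL)
            *hash = '\0';                      /* drop the comment */

        struct svc_entry e;
        char extra;
        int f = sscanf(line, "%15s %47s %c", e.type, e.name, &extra);
        if (f <= 0)
            continue;                          /* blank or comment-only line */
        if (f != 2 || n == max)
            return -1;                         /* missing/extra field, or table full */
        out[n++] = e;
    }
    return n;
}
```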
umount
| Scenario: | umount |
|---|---|
| Business Goals: | allow customers to stop services on a given storage device |
| Relevant QA's: | usability |
| Stimulus source: | administrator |
| Stimulus: | lustre.umount [device] command |
| Environment: | kernel is running, services are running |
| Artifact: | services and device are no longer accessible |
| Response: | mountconf stops all services associated with the device; OSD stops on the device |
| Response measure: | clients can no longer talk to these services |
| Questions: | |
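In the umount response, every service on the device is stopped before the OSD itself shuts down. The sketch below assumes, as a design choice not stated in this page, that services are stopped in the reverse of their start order, so a service started later (which may depend on an earlier one) is stopped first. The registry structure and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SVC 8   /* hypothetical per-device limit */

struct svc {
    const char *name;
    int         running;
};

/* Hypothetical record of the services started on one device. */
struct device_svcs {
    struct svc svc[MAX_SVC];
    int        nr;
};

static int svc_start(struct device_svcs *d, const char *name)
{
    if (d->nr == MAX_SVC)
        return -1;
    d->svc[d->nr].name = name;
    d->svc[d->nr].running = 1;
    d->nr++;
    return 0;
}

/* Stop services in reverse start order, recording the order in stop_log
 * (which must hold at least d->nr entries). The device's OSD would be
 * shut down after this returns. */
static void device_umount(struct device_svcs *d, const char **stop_log)
{
    for (int i = d->nr - 1; i >= 0; i--) {
        d->svc[i].running = 0;
        *stop_log++ = d->svc[i].name;
    }
    d->nr = 0;
}
```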
forced umount
| Scenario: | forced umount |
|---|---|
| Business Goals: | |
| Relevant QA's: | |
| Stimulus source: | |
| Stimulus: | |
| Environment: | |
| Artifact: | |
| Response: | |
| Response measure: | |
| Questions: | |
control request
| Scenario: | control request |
|---|---|
| Business Goals: | |
| Relevant QA's: | |
| Stimulus source: | |
| Stimulus: | |
| Environment: | |
| Artifact: | |
| Response: | |
| Response measure: | |
| Questions: | |
libcfs
libcfs provides other components with a platform-independent API, including functions to control threads, memory, etc. See the internal wiki page for libcfs details.
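In userspace, a libcfs-style thread call would most likely be a thin wrapper over POSIX threads that keeps kernel-style conventions such as returning negative errno values. The names below (ucfs_thread_start() and friends) are hypothetical and are not the real libcfs API; example_worker() exists only to show usage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical libcfs-style thread API; not the real libcfs functions. */
typedef int (*ucfs_thread_fn)(void *arg);

struct ucfs_thread {
    pthread_t tid;
};

struct ucfs_start_args {
    ucfs_thread_fn fn;
    void          *arg;
};

/* Adapts the int-returning thread body to the pthread signature. */
static void *ucfs_trampoline(void *p)
{
    struct ucfs_start_args a = *(struct ucfs_start_args *)p;
    free(p);
    return (void *)(intptr_t)a.fn(a.arg);
}

/* Start a thread; returns 0 or a negative errno, kernel-style. */
static int ucfs_thread_start(struct ucfs_thread *t, ucfs_thread_fn fn, void *arg)
{
    struct ucfs_start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return -ENOMEM;
    a->fn = fn;
    a->arg = arg;
    int rc = pthread_create(&t->tid, NULL, ucfs_trampoline, a);
    if (rc != 0) {
        free(a);
        return -rc;
    }
    return 0;
}

/* Wait for a thread; returns its result or a negative errno. */
static int ucfs_thread_join(struct ucfs_thread *t)
{
    void *ret;
    int rc = pthread_join(t->tid, &ret);
    return rc != 0 ? -rc : (int)(intptr_t)ret;
}

/* Example thread body, used only to demonstrate the wrapper. */
static int example_worker(void *arg)
{
    (*(int *)arg)++;
    return 0;
}
```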
Use Cases
| ID | Quality Attribute | Summary | 
|---|---|---|
| spinlock | performance | some platforms allow real spinlocks in userspace | 
| swapping | performance | protect all allocated memory from swapping | 
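The spinlock use case can be handled by choosing the lock implementation at compile time: POSIX spin locks where the platform advertises them through `_POSIX_SPIN_LOCKS`, and an ordinary mutex otherwise, both behind one interface. This is a hypothetical sketch (the `ucfs_*` names are invented). A userspace spinlock only pays off for very short critical sections, because a preempted lock holder leaves the waiters spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical ucfs_spinlock_t: a real spinlock where POSIX spin locks
 * are supported, a mutex everywhere else. */
#if defined(_POSIX_SPIN_LOCKS) && _POSIX_SPIN_LOCKS > 0
typedef pthread_spinlock_t ucfs_spinlock_t;
static int  ucfs_spin_init(ucfs_spinlock_t *l)   { return pthread_spin_init(l, PTHREAD_PROCESS_PRIVATE); }
static void ucfs_spin_lock(ucfs_spinlock_t *l)   { pthread_spin_lock(l); }
static void ucfs_spin_unlock(ucfs_spinlock_t *l) { pthread_spin_unlock(l); }
#else
typedef pthread_mutex_t ucfs_spinlock_t;
static int  ucfs_spin_init(ucfs_spinlock_t *l)   { return pthread_mutex_init(l, NULL); }
static void ucfs_spin_lock(ucfs_spinlock_t *l)   { pthread_mutex_lock(l); }
static void ucfs_spin_unlock(ucfs_spinlock_t *l) { pthread_mutex_unlock(l); }
#endif

static ucfs_spinlock_t counter_lock;
static long counter;

/* Example user: a short, non-blocking critical section. */
static void *bump_counter(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        ucfs_spin_lock(&counter_lock);
        counter++;
        ucfs_spin_unlock(&counter_lock);
    }
    return NULL;
}
```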
Quality Attribute Scenarios
OSD
OSD provides access to persistent storage through a well-defined API. For userspace, we plan to use an OSD built on top of the DMU. We consider local caching (blocks, inodes) an internal component of the OSD (see DMU OSD details).
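A well-defined OSD API lets the platform-independent layers reach storage only through a table of operations, so the backend can change without touching them. The sketch below shows just that pattern, with a toy in-memory backend standing in for the DMU. The operation set and all names are hypothetical, and a real OSD interface would also need objects and transactions to provide the atomic updates listed in the requirements.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical OSD operation table: callers never see the backend. */
struct osd_ops {
    ssize_t (*read)(void *priv, uint64_t off, void *buf, size_t len);
    ssize_t (*write)(void *priv, uint64_t off, const void *buf, size_t len);
    int     (*sync)(void *priv);
};

struct osd_device {
    const struct osd_ops *ops;
    void                 *priv;   /* backend-private state */
};

/* Toy in-memory backend standing in for a DMU-based one. */
#define MEM_OSD_SIZE 4096

struct mem_osd {
    unsigned char data[MEM_OSD_SIZE];
};

static int mem_range_ok(uint64_t off, size_t len)
{
    return off <= MEM_OSD_SIZE && len <= MEM_OSD_SIZE - off;
}

static ssize_t mem_read(void *priv, uint64_t off, void *buf, size_t len)
{
    struct mem_osd *m = priv;
    if (!mem_range_ok(off, len))
        return -EINVAL;
    memcpy(buf, m->data + off, len);
    return (ssize_t)len;
}

static ssize_t mem_write(void *priv, uint64_t off, const void *buf, size_t len)
{
    struct mem_osd *m = priv;
    if (!mem_range_ok(off, len))
        return -EINVAL;
    memcpy(m->data + off, buf, len);
    return (ssize_t)len;
}

static int mem_sync(void *priv) { (void)priv; return 0; /* nothing to flush */ }

static const struct osd_ops mem_osd_ops = { mem_read, mem_write, mem_sync };

/* Platform-independent code goes through these wrappers only. */
static ssize_t osd_read(struct osd_device *d, uint64_t off, void *buf, size_t len)
{
    return d->ops->read(d->priv, off, buf, len);
}

static ssize_t osd_write(struct osd_device *d, uint64_t off, const void *buf, size_t len)
{
    return d->ops->write(d->priv, off, buf, len);
}
```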
Use Cases
| ID | Quality Attribute | Summary | 
|---|---|---|
| async IO | performance | use async IO where possible | 
| 0-copy IO | performance | use 0-copy IO where possible | 
| swapping | performance | local structures and cache should be locked in memory to prevent swapping | 
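For the swapping use case, POSIX offers mlock() to pin a specific region (such as the OSD cache) in RAM, and mlockall(MCL_CURRENT | MCL_FUTURE) to pin the whole process. Locking can fail when the process exceeds its memory-lock limit or lacks the required privilege, so the hypothetical helper below (names invented for illustration) records whether pinning succeeded rather than treating failure as fatal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pinned cache buffer. */
struct pinned_buf {
    void  *addr;
    size_t len;
    int    locked;   /* 1 if mlock() succeeded */
};

/* Allocate a page-aligned, zeroed region and try to pin it in RAM. */
static int pinned_alloc(struct pinned_buf *b, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -EINVAL;
    int rc = posix_memalign(&b->addr, (size_t)page, len);
    if (rc != 0)
        return -rc;
    b->len = len;
    memset(b->addr, 0, len);
    b->locked = (mlock(b->addr, len) == 0);   /* may fail under RLIMIT_MEMLOCK */
    return 0;
}

static void pinned_free(struct pinned_buf *b)
{
    if (b->locked)
        munlock(b->addr, b->len);
    free(b->addr);
    b->addr = NULL;
}
```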
Quality Attribute Scenarios
lnet
Build system
Implementation details
- poor control over IO in POSIX (AIO, elevator, merging)
- poor control over memory management in POSIX (no way to communicate memory pressure from the kernel)
- limited synchronization primitives (on most platforms, real spinlocks cannot be used in userspace)


