Lustre testing is the cornerstone of our organization. In early 2004 we rolled out the ltest deployment, and it was very successful. Ltest operates on packages built by lbuild and reports on them to the buffalo system. The buffalo system was extended to also allow the queueing of test requests. CFS has successfully run hundreds of tests daily with this system, and the results published by buffalo have often been sufficient to make debugging problems much simpler. Few regressions have entered our software, and on the whole testing has been successful.
Recently, more and more requirements have been placed on testing, in particular on the ltest system. Key new requirements are to handle the new mountconf configuration system of Lustre; to test interoperability of multiple versions of Lustre; to easily test more realistic configurations, such as real failover, software RAID and multiple networks; and to test Lustre in conjunction with CIFS, NFS and other export software. More flexible choices of operating system, Lustre software, hardware and test configurations are desired, and it is necessary to prioritize items in the test queues for purposes of releases and to temporarily stop testing and resume it. The ltest system has not found adoption by developers and has been difficult to deploy elsewhere.
CFS attempted to address these issues by modifying ltest. Progress in making modifications to accommodate them has been insufficient to warrant continuing this effort. We will replace ltest with a new system.
The operation of the test system involves many processes which our company and our customers need to repeat routinely, e.g.
Several of these are CFS specific. Building Lustre is something controlled by engineering. An initiative is under way to define Lustre configurations flexibly with spreadsheets and to configure them with CFS-supplied tools. Other items are of a commodity nature: distributing and updating software is a mainstream Linux activity using systems like apt, yum, etc. Running programs on clusters is something our customers do all the time under job scheduler control. Gathering and reporting results can be a combination of CFS-supplied infrastructure and standard reporting tools.
CFS has traditionally been weak at providing good tools for customers to handle these tasks. There is a tremendous opportunity to leverage the new test system to provide first-rate tools to tackle these tasks.

| BO-1: | Design a modular system which is usable for Lustre testing and customers alike. |
| BO-2: | Meet all requirements as soon as feasible. |
| BO-3: | Build something that is loved, not hated, and completely intuitive. |
| BO-4: | CFS retains some competitive advantage, for example by withholding its reporting infrastructure while distributing all other components. |

| SC-1: | First components of the system are in use for testing at CFS by July 1. |
| SC-2: | Components of the system are used by customers to maintain Lustre installations by Jan 2007. |
| SC-3: | One engineer can maintain the system; all engineers use and fully understand the system. |
| SC-4: | 5 sites outside CFS use the system by Jan 2007. |

| RI-1: | Testing has become a critical bottleneck in our engineering organization, which has already interfered or could seriously interfere with our release capabilities. |
| RI-2: | If improperly designed, too few engineers and partners may use the system. |
| RI-3: | If the system is too complex, its delayed rollout will pose a major risk. |
| RI-4: | If the system is too simplistic, it cannot meet the complex feature requirements. |

For CFS partners who integrate the Lustre software, the Lustre Testing System (LTS) is a modular suite of tools that will provide the same means as those available to CFS engineering. Unlike the current ltest system, it will see adoption in the community: our product will be designed to be easily integrated into partners' engineering environments.
For CFS customers who run Lustre servers on appliances or systems, the Lustre Testing System (LTS) is a modular suite of tools that will provide a means of easily installing, configuring and upgrading software and of performing monitoring, diagnostics and information gathering. Unlike previous attempts to provide this to customers, our product will incorporate requirements posed by customers to ensure easy adoption.

| FE-1: | Use the lbuild system to provide packages |
| FE-2: | Use standard Linux mechanisms, including proxies, to update or install systems with new software |
| FE-3: | Create, view, modify, and delete Lustre configurations using a web version of mountconf tools |
| FE-4: | The system is very modular and leverages existing components where possible |
| FE-5: | The system is usable with disk installs, flash installs and pxe-booting |
| FE-6: | The queueing system can manage priorities, choose from multiple available resources and suspend processing (similar to PBS (pro?) / Maui/Moab) |
| FE-7: | The system uses secure internet infrastructure and offers privacy for reports |
| FE-8: | Complex test definitions involving Lustre with NFS, CIFS exports, Windows are possible |
| FE-9: | Tests can report to buffalo as before, or in a similar way, with privacy and quality statistics |
| FE-10: | Diagnostic information gathering is extended and available in the reports and through monitoring tools |
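
The queueing behaviour asked for in FE-6 (priorities, a choice among several available resources, and suspend/resume) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the class names, the resource names and kinds, and the convention that a lower number means a more urgent request are hypothetical and are not part of ltest, buffalo, PBS or Maui/Moab.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class TestRequest:
    priority: int                      # lower number = more urgent (assumption)
    seq: int                           # tie-breaker: keeps submission order
    name: str = field(compare=False)
    needs: str = field(compare=False)  # hypothetical resource kind, e.g. "failover-pair"


class TestQueue:
    """Hypothetical priority queue of test requests with suspend/resume."""

    def __init__(self, resources):
        # resources: mapping of resource name -> resource kind (both hypothetical)
        self._free = dict(resources)
        self._heap = []
        self._counter = itertools.count()
        self.suspended = False

    def submit(self, name, needs, priority=10):
        request = TestRequest(priority, next(self._counter), name, needs)
        heapq.heappush(self._heap, request)

    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False

    def release(self, resource, kind):
        # Return a resource to the free pool once its test has finished.
        self._free[resource] = kind

    def dispatch(self):
        """Return (test name, resource) for the most urgent runnable request, or None."""
        if self.suspended:
            return None
        skipped, result = [], None
        while self._heap:
            request = heapq.heappop(self._heap)
            match = next((r for r, kind in self._free.items() if kind == request.needs), None)
            if match is not None:
                del self._free[match]
                result = (request.name, match)
                break
            skipped.append(request)    # no free resource of this kind right now
        for request in skipped:
            heapq.heappush(self._heap, request)
        return result


queue = TestQueue({"node-a": "single-node", "node-b": "failover-pair"})
queue.submit("sanity", "single-node")
queue.submit("release-candidate", "failover-pair", priority=1)
queue.suspend()
print(queue.dispatch())   # None while the queue is suspended
queue.resume()
print(queue.dispatch())   # ('release-candidate', 'node-b')
print(queue.dispatch())   # ('sanity', 'node-a')
```

A request whose resource kind is busy is skipped rather than blocking the queue, so a high-priority release test waiting for scarce hardware does not stall lower-priority tests that can run elsewhere. Whether that policy is wanted is a design decision this sketch does not settle.
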

| AS-1: | The system is designed to handle testing from Lustre 1.6, not necessarily for earlier versions. |
| AS-2: | While only parts of the system are available, manual effort will supplement the automated parts. |

| DE-1: | |

| Feature | Desc | Release 1 | Release 2 | Release 3 |
|---|---|---|---|---|
| FE-1 | lbuild usage | lbuild for current configurations | include Windows, Debian, Ubuntu and others into lbuild | |
| FE-2 | software update | included for rpms | Fully implemented | Fully implemented |
| FE-3 | mountconf config tools | Upload of config CSV multiple Lustre versions | GUI implemented | |
| FE-4 | modular | yes | yes | yes |
| FE-5 | disk/pxe/flash installs | pxe booting & disk install | | flash |
| FE-6 | job scheduler | run by hand | initial trial | fully implemented |
| FE-7 | secure | fully implemented | | |
| FE-8 | complex tests | Not implemented | Fully implemented | |
| FE-9 | buffalo reports | tests report, privacy | Quality metrics | |
| FE-10 | diagnostics | into buffalo | extended with performance | monitoring tool included |

| LI-1: | Automating Windows support is possibly out of reach |
| LI-2: | ? |

| Stakeholder | Major Value | Attitudes | Major Interests | Constraints |
|---|---|---|---|---|
| Corporate Management | improved product quality and opportunities | strong commitment; top corporate priority | considerable QA improvements; adoption immediate; no runaway projects | none identified |
| QA department | more efficient use of staff time; higher customer satisfaction; more tests, less tinkering | eager to overcome inability to address corporate requirements | job satisfaction | need leadership from more experienced developers |
| Lustre engineers | tools they can use and love | strong enthusiasm, but might not use it as much as expected | simplicity of use; reliability of delivery | flexible and lightweight |
| Partners | share testing effort with others | receptive but cautious | cost savings | no resources yet committed |
| Customers | tools for deployment and monitoring | receptive but cautious | minimal new technology needed; concern about CFS capabilities in this area | can only use proven systems |

| Dimension | Driver | Constraint | Degree of Freedom |
|---|---|---|---|
| Schedule | Prove modularity through incremental improvements | Little tolerance for slips | |
| Features | Original incentive to redesign | All features scheduled for release 1.0 must be fully operational | Little |
| Quality | Only extreme quality & usability will lead to adoption. | | |
| Staff | Payoff will offset any amount of resources applied | Sufficient to get into operation very fast | Flexible |
| Cost | Within normal operating procedures, resources will be made available as needed | | |