FAQ - Release Testing and Upgrading
(Updated: Dec 2009)
How does the Lustre group fix issues?
The Lustre group approaches bug tracking and fixing seriously and methodically:
- Regression testing: A test is written to reproduce the problem and is added to the ongoing test suite.
- Architecture and design: Depending on how severe or invasive the problem is, an update to the architecture description may be written and reviewed by senior architects. A detailed design description for the patch is then written and reviewed by principal engineers.
- Implementation: Fixes are implemented according to the design description and added to a bug for review and inspection.
- Review and inspection: A developer or development team will review the code first and then submit it for a methodical inspection by senior and principal engineers.
- Testing: The developer runs a small suite of tests before the code leaves his or her desk. Then it's added to a branch for regression testing and final release testing.
This process can be tracked closely via Lustre Bugzilla.
What testing does each version undergo prior to release?
Sun and its vendor and customer partners run a large suite of tests on a number of systems, architectures, kernels, and interconnects, including clusters as large as 400 nodes. Major updates receive testing on the largest clusters available to us, around 1,000 nodes.
Are Lustre releases backwards and forward compatible on the disk? On the wire?
Special care is taken to ensure that any disk format changes -- which are rare to begin with -- are handled transparently in previous and subsequent releases. Before the disk format changes, we release versions which support the new format, so you can safely roll back in case of problems. After the format change, new versions continue to support the old formats for some time and transparently update disk structures when old versions are encountered.
Support for running with older protocols is removed in every second major release, so a 1.4.x release will not interoperate with a 1.8.x release. Similarly, a 1.6.x release will not interoperate with a 2.0.x release.
When making a release, the Lustre group always tests release 1.x.y against the preceding 1.x.(y-1) release and against the older 1.(x-2).latest version. It isn't possible to exhaustively test all release combinations. As a result, we cannot guarantee that all releases are fully interoperable, even though we strive through design and code review to ensure that any 1.x release will work with any 1.(x-2) release.
Do you have to reboot to upgrade?
Not unless you upgrade your kernel. Upgrading is usually a simple matter of unmounting the file system or stopping the server, as the case may be, installing the new RPMs, and then remounting or restarting.
Some of our customers upgrade servers between wire-compatible releases using failover: a service is failed over, the software is updated on the stopped node, the service is failed back, and the failover partner is then upgraded in the same way.
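The ordering of a failover-based upgrade can be written out as a short Python sketch. It only builds a list of human-readable steps and runs no Lustre commands. The function names and the node names oss1 and oss2 are hypothetical, and the sketch assumes a two-node failover pair running wire-compatible releases.

```python
from typing import List


def upgrade_steps(node: str, partner: str) -> List[str]:
    """Steps to upgrade one node while its partner carries the service."""
    return [
        f"fail over services from {node} to {partner}",
        f"update software on {node}",
        f"fail services back to {node}",
    ]


def rolling_upgrade_plan(node_a: str, node_b: str) -> List[str]:
    """Upgrade node_a first, then its failover partner in the same way."""
    return upgrade_steps(node_a, node_b) + upgrade_steps(node_b, node_a)


# Hypothetical node names, for illustration only.
for number, step in enumerate(rolling_upgrade_plan("oss1", "oss2"), start=1):
    print(f"{number}. {step}")
```

At each step one node of the pair is still serving clients, which is why the two releases must be wire-compatible.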