WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Testing Lustre Code

From Obsolete Lustre Wiki

Revision as of 08:52, 2 September 2009

We recommend a "test early, test often" approach to testing.

  • If you are developing a new feature for Lustre, designing tests to exercise the new feature early in the development process will allow you to test your code as you develop it.
  • If you are fixing a bug in Lustre, creating a regression test up front ensures that you can reproduce the reported problem and then verify that it has been fixed. It also saves you the effort of testing the fix manually and creating a separate regression test later to submit with your bug fix.

We provide several tools to help with testing Lustre code:

  • Acceptance test suite. The Lustre testing framework, from which a suite of acceptance tests called "acceptance-small" can be run, is described in the next section, Using the Lustre Testing Framework.
  • POSIX compliance test suite. Instructions for using the POSIX Compliance Test Suite to test the Lustre file system are given in POSIX Testing.

Using the Lustre Testing Framework

Before you submit code, it must pass the Lustre acceptance test suite, called "acceptance-small". We recommend you run the test suite often so that you can find out as soon as possible if your code changes result in a regression.

The acceptance-small test suite is run using the script acceptance-small.sh, which is located in the lustre/tests directory of a compiled Lustre tree. For more details, see Acceptance Small (acc-sm) Testing on Lustre.

Note: To indicate this, please set the "acc-sm passed" flag in Bugzilla on the attachment for each individual branch that was tested (refer to the Submitting Code topic).

Test Scripts

While working on a bug, ensure that an existing test exercises the failing code, or add a new test that exercises the functionality, called from one of the existing test scripts:

sanity.sh: tests that verify operation under normal operating conditions
sanityN.sh: tests that verify operations from two clients under normal operating conditions
liblustre/tests/sanity: runs a test linked to a liblustre client library
recovery-small.sh: tests that verify RPC replay after communications failure (message loss)
replay-single.sh: tests that verify recovery after MDS failure
replay-dual.sh: tests that verify recovery from two clients after server failure
replay-ost-single.sh: tests that verify recovery after OST failure
lfscktest.sh: tests e2fsck and lfsck detection and repair of filesystem corruption
insanity.sh: tests multiple concurrent failure conditions

In most cases, when a defect is found in Lustre, no existing test covers that functionality. It is easiest to start by developing a test case (a scripted subtest in one of the scripts above) that hits the bug, then fix the bug, and finally verify the fix by passing the new test.
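A new subtest in one of these scripts typically follows the framework's test_NN()/run_test pattern. The sketch below shows only the shape: the test number 102 and bug number NNNN are hypothetical, and the minimal error() and run_test() stand-ins shown here are provided by lustre/tests/test-framework.sh in a real run, where DIR points at the mounted Lustre client.

```shell
#!/bin/sh
# Minimal stand-ins for helpers supplied by lustre/tests/test-framework.sh
# in a real run (error() aborts the subtest, run_test executes it).
DIR=${DIR:-$(mktemp -d)}       # in a real run, DIR is the Lustre mount point
tfile=f102
error() { echo "FAIL: $*" >&2; exit 1; }
run_test() { "test_$1" && echo "PASS: $2"; }

# The subtest itself: reproduce the reported problem, and fail via error()
# if the buggy behavior is observed.
test_102() {
    touch $DIR/$tfile || error "touch $DIR/$tfile failed"
    # ... exercise the code path that triggered the bug here ...
    rm $DIR/$tfile || error "rm $DIR/$tfile failed"
}
run_test 102 "regression test for bug NNNN"
```

Once the subtest is in place, it can be run in isolation with ONLY=102 while iterating on the fix, and it ships with the fix as a permanent regression test.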

Bypassing Failures

  • If you regularly hit failures in any of these tests that also occur in the unmodified parent branch, check whether a bug has already been filed for the failure, or file a new bug if one has not yet been opened.
  • If the bug prevents you from completing the tests, set environment variables to skip the affected tests until you or someone else fixes them.
    • For example, to skip sanity.sh subtest 36g and 65, replay-single.sh subtest 42, and all of insanity.sh set in your environment:
      export SANITY_EXCEPT="36g 65"
      export REPLAY_SINGLE_EXCEPT=42
      export INSANITY=no
    • You can also skip tests on the command line. For example, when running acceptance-small:
      SANITY_EXCEPT="36g 65" REPLAY_SINGLE_EXCEPT=42 INSANITY=no ./acceptance-small.sh
    • The test framework is very flexible and offers an easy "hands-off" way of running tests while you are doing other things, such as coding.
    • Questions/problems with the test framework should be emailed to the lustre-discuss mailing list, so all Lustre users can benefit from improving and documenting it.
  • If you do not run the whole test suite regularly, you cannot tell whether a failure was introduced by your code, and you will waste a lot of time finding out.

Test Framework Options

The examples below show how to run a full test or sub-tests from the acceptance-small suite.

  • Run all tests, including the "standard" tests (sanity*, liblustre) with the lov.sh setup, recovery*.sh, and replay*.sh.
$ cd lustre/tests
$ sh acceptance-small.sh
  • Run only the recovery-small.sh, replay-single.sh, and conf-sanity.sh tests.
$ ACC_SM_ONLY="recovery-small replay-single conf-sanity" sh acceptance-small.sh
  • Run acceptance-small with a different configuration (expects myth.sh to generate myth.xml).
$ CONFIGS="myth" sh acceptance-small.sh
  • Run only tests 1, 3, 4, 6, 9 in sanity.sh with the lov.sh configuration.
$ ONLY="1 3 4 6 9" NAME=lov sh sanity.sh
  • Skip tests 1 ... 30 and run remaining tests in sanity.sh.
$ EXCEPT="`seq 1 30`" sh sanity.sh
  • Clean up after a lov.sh test failure (normally the system is left mounted for debugging after a failure).
$ NAME=lov sh llmountcleanup.sh
  • Clean up replay-single.sh after a test failure (normally the system is left mounted for debugging after a failure).
$ ONLY=cleanup sh replay-single.sh

Adding New Tests

  • Adding a test to one of the above scripts is easy. Failures can be injected directly into the Lustre kernel code via OBD_FAIL_CHECK(), OBD_FAIL_RACE(), and OBD_FAIL_TIMEOUT(), and triggered at run time with sysctl -w lustre.fail_loc.

Note: You can use the OBD_FAIL_CHECK() or OBD_FAIL_TIMEOUT() hooks in Lustre to monitor failures.
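The fail_loc value is the numeric ID of the OBD_FAIL check you want to trigger. A common pattern is to OR the check ID with the OBD_FAIL_ONCE flag so the injected failure fires only once. A minimal sketch, assuming OBD_FAIL_ONCE is 0x80000000 (as defined in obd_support.h) and a hypothetical check ID of 0x119:

```shell
#!/bin/sh
# Compute a one-shot fail_loc value: OR the (hypothetical) check ID 0x119
# with the OBD_FAIL_ONCE flag so the injected failure triggers only once.
FAIL_ONCE=0x80000000
CHECK_ID=0x119
FAIL_LOC=$(printf '0x%x' $(( FAIL_ONCE | CHECK_ID )))
echo $FAIL_LOC                              # 0x80000119

# On a Lustre test node (requires root and a running Lustre):
#   sysctl -w lustre.fail_loc=$FAIL_LOC     # arm the failure
#   ... run the operation expected to hit the OBD_FAIL_CHECK() ...
#   sysctl -w lustre.fail_loc=0             # disarm
```

Clearing fail_loc back to 0 after the test matters: a leftover value will silently perturb every later test run on the same node.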