WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Acceptance Small (acc-sm) Testing on Lustre

(Difference between revisions)
m (HEAD branch)
(Do you have to run every acc-sm test?)
 
(97 intermediate revisions by 5 users not shown)
Line 1: Line 1:
The Lustre QE group and developers use acceptance-small (acc-sm) tests to catch bugs early in the development cycle. Within the Lustre group, acc-sm tests are run on YALA, an automated test system. This information is being published to describe the steps to perform acceptance small testing and encourage wider acc-sm testing in the Lustre community.
+
<small>''(Updated: Feb 2010)''</small>
 +
__TOC__
 +
The Lustre™ QE group and developers use acceptance-small (acc-sm) tests to catch bugs early in the development cycle. Within the Lustre group, acc-sm tests are run on YALA, an automated test system. This information is being published to describe the steps to perform acceptance small testing and encourage wider acc-sm testing in the Lustre community.
  
'''NOTE''': For your convenience, this document is also available as a [http://wiki.lustre.org/images/c/c6/AccSm_Testing.pdf PDF].
+
'''NOTE''': For your convenience, this document is also available as a [[Media:AccSm_Testing.pdf|PDF]].  
  
 
==What is acc-sm testing and why do we use it for Lustre?==
 
==What is acc-sm testing and why do we use it for Lustre?==
Line 13: Line 15:
 
==What tests comprise the acc-sm test suite?==
 
==What tests comprise the acc-sm test suite?==
  
Each Lustre CVS branch contains a lustre/tests sub-directory; all acc-sm tests are stored here. The acceptance-small.sh file contains a list of all tests in the acc-sm suite. To get the list, run:
+
Each Lustre tree contains a lustre/tests sub-directory; all acc-sm tests are stored here. The acceptance-small.sh file contains a list of all tests in the acc-sm suite. To get the list, run:
  
 
  $ grep TESTSUITE_LIST acceptance-small.sh
 
  $ grep TESTSUITE_LIST acceptance-small.sh
  
The acc-sm tests are listed below, by CVS branch.
+
The acc-sm tests are listed below, by branch.
  
 
====b1_6 branch====
 
====b1_6 branch====
  
This branch includes 17 acc-sm test suites.
+
This branch includes 18 acc-sm test suites.
  
 
  $ grep TESTSUITE_LIST acceptance-small.sh
 
  $ grep TESTSUITE_LIST acceptance-small.sh
 
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK
 
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK
  LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE
+
  LIBLUSTRE RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE
 
  REPLAY_DUAL INSANITY SANITY_QUOTA PERFORMANCE_SANITY"
 
  REPLAY_DUAL INSANITY SANITY_QUOTA PERFORMANCE_SANITY"
  
 
====b1_8_gate branch====
 
====b1_8_gate branch====
  
This branch includes 18 acc-sm test suites.
+
This branch includes 28 acc-sm test suites.
  
 
  $ grep TESTSUITE_LIST acceptance-small.sh
 
  $ grep TESTSUITE_LIST acceptance-small.sh
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK
+
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE
  LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE
+
  RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR  
REPLAY_DUAL REPLAY_VBR INSANITY SANITY_QUOTA PERFORMANCE_SANITY"
+
INSANITY SANITY_QUOTA PERFORMANCE_SANITY LARGE_SCALE RECOVERY_MDS_SCALE
 +
RECOVERY_DOUBLE_SCALE RECOVERY_RANDOM_SCALE PARALLEL_SCALE METADATA_UPDATES OST_POOLS
 +
SANITY_BENCHMARK LNET_SELFTEST"
  
 
====HEAD branch====
 
====HEAD branch====
  
This branch includes 19 acc-sm test suites.
+
This branch includes 30 acc-sm test suites.
  
 
  $ grep TESTSUITE_LIST acceptance-small.sh
 
  $ grep TESTSUITE_LIST acceptance-small.sh
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK
+
  export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE
  LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE
+
  RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR
  REPLAY_DUAL INSANITY SANITY_QUOTA SANITY_SEC SANITY_GSS
+
  INSANITY SANITY_QUOTA SANITY_SEC SANITY_GSS PERFORMANCE_SANITY LARGE_SCALE
  PERFORMANCE_SANITY"
+
  RECOVERY_MDS_SCALE RECOVERY_DOUBLE_SCALE RECOVERY_RANDOM_SCALE PARALLEL_SCALE
 
+
LUSTRE_RSYNC_TEST METADATA_UPDATES OST_POOLS SANITY_BENCHMARK"
  
 
To see the test cases in a particular acc-sm test, run:
 
To see the test cases in a particular acc-sm test, run:
Line 60: Line 64:
 
  run_test 130e "FIEMAP (test continuation FIEMAP calls)"
 
  run_test 130e "FIEMAP (test continuation FIEMAP calls)"
  
==For each acc-sm test, what does it measure or show?==
+
==What does each acc-sm test measure or show?==
  
The acc-sm test suite are described below.
+
The acc-sm test suites are described below.
  
 
;'''RUNTESTS'''
 
;'''RUNTESTS'''
Line 90: Line 94:
 
;'''LIBLUSTRE'''
 
;'''LIBLUSTRE'''
 
:Runs a test linked to a liblustre client library.
 
:Runs a test linked to a liblustre client library.
 +
 +
;'''RACER'''
 +
:Tests for filesystem race conditions by concurrently creating, moving, deleting, etc. a set of files.
  
 
;'''REPLAY_SINGLE'''
 
;'''REPLAY_SINGLE'''
Line 105: Line 112:
 
;'''REPLAY_DUAL'''
 
;'''REPLAY_DUAL'''
 
:Verifies recovery from two clients after a server failure.
 
:Verifies recovery from two clients after a server failure.
 +
 +
;'''REPLAY_VBR'''
 +
:Verifies the version-based recovery feature.
  
 
;'''INSANITY'''
 
;'''INSANITY'''
Line 112: Line 122:
 
:Verifies filesystem quotas.
 
:Verifies filesystem quotas.
  
==How do you get the acc-sm tests?==
+
;'''SANITY_SEC'''
 +
:Verifies Lustre identity features.
  
The acc-sm test suite is stored in the lustre/tests sub-directory on each CVS branch (b1_6, b1_8, and HEAD).
+
;'''SANITY_GSS'''
 +
:Verifies GSS/Kerberos authentication features.
  
==Do you have to run every acc-sm test?==
+
;'''PERFORMANCE_SANITY'''
 +
:Performance mdsrate tests (small file create/open/delete, large file create/open/delete, lookup rate 10M file dir, lookup rate 10M file 10 dir, getattr small file, and getattr large files).
  
No. You can choose to run only specified acc-sm tests. Tests can be run either with or without the acceptance-sm.sh (acc-sm.sh) wrapper script. Here are several examples:
+
;'''LARGE_SCALE'''
 +
:Large-scale tests that verify version-based recovery features.
  
To only run the RUNTESTS and SANITY.sh tests:
+
;'''RECOVERY_MDS_SCALE'''
 +
:The server failover test: for a duration of 24 hours, repeatedly fail over a random facet (MDS or OST) at 10 minute intervals and verify that no application errors occur.
  
ACC_SM_ONLY=”RUNTESTS” sh acceptance-small.sh
+
;'''RECOVERY_DOUBLE_SCALE'''
 +
:Failover test for all pair-wise combinations of node failures.
  
- OR -
+
;'''RECOVERY_RANDOM_SCALE'''
 +
:Verifies that a client failure does not affect other clients.
  
sh runtests
+
;'''PARALLEL_SCALE'''
 +
:Runs functional tests (connectathon, cascading_rw, write_disjoint, write_append_truncate, parallel_grouplock, statahead), performance tests (IOR, compilebench and metabench), and a stress test (simul).
  
To only run test_1 and test_2 of the SANITY.sh tests:
+
;'''LUSTRE_RSYNC_TEST'''
 +
:Verifies the lustre_rsync (replication) feature.
 +
 
 +
;'''METADATA_UPDATES'''
 +
:Distributed Metadata Update Test to verify that distributed metadata updates are properly completed when multiple clients create/delete files and modify the attributes of files.
 +
 
 +
;'''OST_POOLS'''
 +
:Verifies the OST pools feature.
 +
 
 +
==How do you get the acc-sm tests?==
 +
 
 +
The acc-sm test suite is stored in the lustre/tests subdirectory.
 +
 
 +
==Do you have to run every acc-sm test?==
 +
 
 +
No. You can choose to run only specified acc-sm tests, start the test suite from a defined test, or stop the test suite at a defined test. Tests can be run either with or without the acceptance-small.sh (acc-sm) wrapper script. Here are several examples:
 +
 
 +
To only run the RUNTESTS and SANITY tests:
 +
 
 +
  ACC_SM_ONLY="RUNTESTS SANITY" sh acceptance-small.sh
 +
 
 +
To only run test_1 and test_2 of the SANITYN tests:
  
 
  ACC_SM_ONLY="SANITYN" ONLY="1 2" sh acceptance-small.sh
 
  ACC_SM_ONLY="SANITYN" ONLY="1 2" sh acceptance-small.sh
Line 136: Line 175:
 
  ACC_SM_ONLY="REPLAY_SINGLE" REPLAY_SINGLE_EXCEPT="3 4" sh acceptance-small.sh
 
  ACC_SM_ONLY="REPLAY_SINGLE" REPLAY_SINGLE_EXCEPT="3 4" sh acceptance-small.sh
  
To only run the conf-sanity.sh script (without the acceptance-small.sh wrapper script):
+
To only run conf-sanity.sh tests after #15 (without the acceptance-small.sh wrapper script):
 +
 
 +
  CONF_SANITY_EXCEPT="$(seq 15)" sh conf-sanity.sh
 +
 
 +
To start the test suite from a defined test, use START_AT. For example:
 +
 
 +
ACC_SM_ONLY=SANITY_BENCHMARK START_AT=fsx sh acceptance-small.sh
 +
 
 +
-or-
 +
 
 +
ACC_SM_ONLY=SANITY START_AT=24c sh acceptance-small.sh
 +
 
 +
To stop the test suite at a defined test, use STOP_AT. For example:
 +
 
 +
  ACC_SM_ONLY=SANITY STOP_AT=77j sh acceptance-small.sh
 +
 
 +
To start and stop the test suite to define a range of tests, use START_AT and STOP_AT. For example, to run sanity tests from 23b to 24f:  
  
  CONF_SANITY_EXCEPT=”$ (seq 15) “ sh conf-sanity.sh
+
  ACC_SM_ONLY=SANITY START_AT=23b STOP_AT=24f sh acceptance-small.sh
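
To make the combined effect of START_AT and STOP_AT concrete, the sketch below selects a window from an ordered list of test IDs. It is only an illustration, not the test framework's own code: select_range is a hypothetical helper, the sample IDs other than 23b and 24f are made up, and it assumes both ends of the range are inclusive.

```shell
# Hypothetical helper: print the IDs from START to STOP (both inclusive).
# Usage: select_range START STOP ID...
select_range() {
    start=$1
    stop=$2
    shift 2
    active=no
    selected=""
    for id in "$@"; do
        if [ "$id" = "$start" ]; then
            active=yes
        fi
        if [ "$active" = yes ]; then
            selected="$selected $id"
        fi
        if [ "$id" = "$stop" ]; then
            break
        fi
    done
    # Strip the leading space left by the first append.
    echo "${selected# }"
}

# Sample IDs are illustrative; only 23b and 24f come from the example above.
select_range 23b 24f 23a 23b 24a 24b 24f 25a
```

With this sample list, the helper prints the IDs between 23b and 24f and drops 23a and 25a.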
  
 
==Do the acc-sm tests have to be run in a specific order?==
 
==Do the acc-sm tests have to be run in a specific order?==
Line 148: Line 203:
 
Currently, the QE group and Lustre developers run acc-sm as the main test suite for Lustre testing. Acc-sm tests are run on YALA, the automated test system, with test reports submitted to Buffalo (a web interface that allows for browsing various Lustre test results). We welcome external contributions to the Lustre acc-sm test efforts – either of the Lustre code base or new testing platforms.
 
Currently, the QE group and Lustre developers run acc-sm as the main test suite for Lustre testing. Acc-sm tests are run on YALA, the automated test system, with test reports submitted to Buffalo (a web interface that allows for browsing various Lustre test results). We welcome external contributions to the Lustre acc-sm test efforts – either of the Lustre code base or new testing platforms.
  
==What type of Lustre environment is needed to run the acc-sm tests? Is anything special needed?==
+
==What type of Lustre environment is needed to run the acc-sm tests?==  
  
The default Lustre configuration for acc-sm testing is a single node setup with one MDS and two OSTs. All devices are loop-back devices. YALA, the automated test system, uses a non-default configuration.
+
The default Lustre configuration is a single-node setup with mdscount=1 and ostcount=2. All devices are loopback devices. YALA does not use the default configuration.
  
 
To run the acc-sm test suite on a non-default Lustre configuration, you have to modify the default settings in the acc-sm configuration file, lustre/tests/cfg/local.sh. The configuration variables include mds_HOST, ost_HOST, OSTCOUNT, MDS_MOUNT_OPTS and OST_MOUNT_OPTS, among others.
 
To run the acc-sm test suite on a non-default Lustre configuration, you have to modify the default settings in the acc-sm configuration file, lustre/tests/cfg/local.sh. The configuration variables include mds_HOST, ost_HOST, OSTCOUNT, MDS_MOUNT_OPTS and OST_MOUNT_OPTS, among others.
Line 158: Line 213:
 
  cp cfg/local.sh cfg/my_config.sh
 
  cp cfg/local.sh cfg/my_config.sh
  
Edit the necessary variables in the configuration file (my_config.sh) and run acc-sm as: NAME=my_config sh acceptance-small.sh
+
Edit the necessary variables in the configuration file (my_config.sh) and run acc-sm as:  
 +
 
 +
NAME=my_config sh acceptance-small.sh
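
As an illustration, the edited my_config.sh might override a handful of the variables named above. This is a hypothetical fragment, not the contents of the shipped cfg/local.sh: the host names are borrowed from the non-default example later on this page, and the plain assignment style is an assumption.

```shell
# Hypothetical overrides for cfg/my_config.sh (values are examples only).
mds_HOST=sfire7                    # node that hosts the MDS
ost_HOST=sfire8                    # node that hosts the OSTs
OSTCOUNT=1                         # number of OSTs to configure
MDS_MOUNT_OPTS="-o user_xattr"
OST_MOUNT_OPTS="-o user_xattr"

# Print the resulting settings so they can be checked before a run.
echo "MDS on $mds_HOST, $OSTCOUNT OST(s) on $ost_HOST"
```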
 +
 
 +
==Are there other acc-sm requirements or is anything special needed?==
 +
 
 +
Acc-sm testing requires the following programs be installed:
 +
 
 +
* iozone
 +
* dbench
 +
* bonnie++
  
 
==What are the steps to run acc-sm?==
 
==What are the steps to run acc-sm?==
Line 192: Line 257:
 
Here is an example of running acc-sm on a non-default Lustre configuration (MDS is sfire7, OST is sfire8, OSTCOUNT=1, etc.). In this example, only the SANITY test cases are being run.
 
Here is an example of running acc-sm on a non-default Lustre configuration (MDS is sfire7, OST is sfire8, OSTCOUNT=1, etc.). In this example, only the SANITY test cases are being run.
  
  ACC_SM_ONLY=SANITY mds_HOST=sfire7 ost8_HOST=sfire8 MDSDEV1=/dev/sda1
+
  ACC_SM_ONLY=SANITY mds_HOST=sfire7 ost1_HOST=sfire8 MDSDEV1=/dev/sda1
 
  OSTCOUNT=1 OSTDEV1=/dev/sda1 MDSSIZE=5000000 OSTSIZE=5000000
 
  OSTCOUNT=1 OSTDEV1=/dev/sda1 MDSSIZE=5000000 OSTSIZE=5000000
 
  MDS_MOUNT_OPTS="-o user_xattr" OST_MOUNT_OPTS=" -o user_xattr"
 
  MDS_MOUNT_OPTS="-o user_xattr" OST_MOUNT_OPTS=" -o user_xattr"
Line 201: Line 266:
 
* If you regularly hit a failure in any of these tests, check if a bug has been reported on the failure or file a new bug if one has not yet been opened.
 
* If you regularly hit a failure in any of these tests, check if a bug has been reported on the failure or file a new bug if one has not yet been opened.
 
* If the bug prevents you from completing the tests, set the environment variables to skip the specific test(s) until you or someone else fixes them.
 
* If the bug prevents you from completing the tests, set the environment variables to skip the specific test(s) until you or someone else fixes them.
** For example, to skip sanity.sh subtest 36g and 65, replay-single.sh subtest 42, and all of insanity.sh set in your environment:
+
:* For example, to skip sanity.sh subtest 36g and 65, replay-single.sh subtest 42, and all of insanity.sh set in your environment:
**:
+
**:: <pre><nowiki>
+
:<pre>
**::
+
;export SANITY_EXCEPT="36g 65"
**:: export SANITY_EXCEPT="36g 65"
+
;export REPLAY_SINGLE_EXCEPT=42
**:: export REPLAY_SINGLE_EXCEPT=42
+
;export INSANITY=no
**:: export INSANITY=no
+
</pre>
**:: </nowiki></pre>
+
 
**::
+
:* You can also skip tests on the command line. For example, when running acceptance-small:
** You can also skip tests on the command line. For example, when running acceptance-small:
+
**:
+
:<pre>
**:: <pre><nowiki>
+
;SANITY_EXCEPT="36g 65" REPLAY_SINGLE_EXCEPT=42 INSANITY=no ./acceptance-small.sh
**:: SANITY_EXCEPT="36g 65" REPLAY_SINGLE_EXCEPT=42 INSANITY=no ./acceptance-small.sh
+
</pre>
**:: </nowiki></pre>
+
 
** The test framework is very flexible, and it is a very easy "hands-off" way of running testing while you are doing other things, like coding.
+
:* The test framework is very flexible, and it is a very easy "hands-off" way of running testing while you are doing other things, like coding.
** Questions/problems with the test framework should be emailed to the lustre-discuss mailing list, so all Lustre users can benefit from improving and documenting it.
+
:* Questions/problems with the test framework should be emailed to the [[Lustre Mailing Lists|''lustre-discuss'' mailing list]], so all Lustre users can benefit from improving and documenting it.
 
* If you do not run the entire test suite regularly, you will have no idea whether a bug is added from your code or not, and you will waste a lot of time looking.
 
* If you do not run the entire test suite regularly, you will have no idea whether a bug is added from your code or not, and you will waste a lot of time looking.
  
Line 227: Line 292:
 
==How do you run acc-sm with and without reformat?==
 
==How do you run acc-sm with and without reformat?==
  
By default, the acc-sm test suite does not reformat Lustre. If you want to reformat Lustre, run acc-sm with REFORMAT="--reformat":
+
By default, the acc-sm test suite does not reformat Lustre. If this is a new system or if you are using new devices and want to reformat Lustre, run acc-sm with REFORMAT="--reformat":
  
 
  REFORMAT="--reformat" sh acceptance-small.sh
 
  REFORMAT="--reformat" sh acceptance-small.sh
Line 247: Line 312:
 
==What is the SLOW variable and how is it used with acc-sm?==
 
==What is the SLOW variable and how is it used with acc-sm?==
  
The SLOW variable is used to run a subset of acc-sm tests. By default, the variable is set to SLOW=no, which causes some of the longer acc-sm tests to be skipped and acc-sm test run to complete in less than 2 hours. To run all of the acc-sm tests, set the variable to SLOW=yes:
+
The SLOW variable is used to run a subset of acc-sm tests.  
 +
 
 +
*By default, SLOW is set to "no" (SLOW=no), which skips some of the longer acc-sm tests so that the acc-sm test run completes in less than 2 hours. 
 +
*If SLOW is set to "yes" (SLOW=yes), then all acc-sm tests are run.
  
 
  SLOW=yes sh acceptance-small.sh
 
  SLOW=yes sh acceptance-small.sh
Line 253: Line 321:
 
==What is the FAIL_ON_ERROR variable and how is it used with acc-sm?==
 
==What is the FAIL_ON_ERROR variable and how is it used with acc-sm?==
  
The FAIL_ON_ERROR variable is used to "stop" or "continue" running acc-sm tests after a test failure occurs. If the variable is set to "true" (FAIL_ON_ERROR=true), then acc-sm stops after test_N fails and test_N+1 does not run. If the variable is set to "false" (FAIL_ON_ERROR=false), then acc-sm continues after test_N fails and test_N+1 does run.
+
The FAIL_ON_ERROR variable is used to "stop" or "continue" running acc-sm tests after a test failure occurs.  
  
FALSE_ON_ERROR=false, by default, for the sanity, sanityn and sanity-quota tests. FALSE_ON_ERROR=true for the replay/recovery tests.
+
* If FAIL_ON_ERROR is set to "true" (FAIL_ON_ERROR=true), then acc-sm stops after test_N fails and test_N+1 does not run. By default, FAIL_ON_ERROR=true for the REPLAY and RECOVERY tests.
 +
 
 +
* If FAIL_ON_ERROR is set to "false" (FAIL_ON_ERROR=false), then acc-sm continues after test_N fails and test_N+1 does run. By default, FAIL_ON_ERROR=false for the SANITY, SANITYN and SANITY_QUOTA tests.
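
The stop/continue behavior can be pictured with a small loop. This is a sketch of the semantics only, not the framework's implementation: fake_test and run_all are hypothetical names, and the test named 2 is made to fail on purpose.

```shell
# Hypothetical stand-in for a test case: the test named "2" fails, all others pass.
fake_test() {
    [ "$1" != "2" ]
}

# Run tests 1..3 and report which ones ran, honoring a FAIL_ON_ERROR-style flag.
run_all() {
    fail_on_error=$1
    ran=""
    for t in 1 2 3; do
        ran="$ran $t"
        if ! fake_test "$t" && [ "$fail_on_error" = "true" ]; then
            # Stop at the first failure; later tests do not run.
            break
        fi
    done
    echo "ran:$ran"
}

run_all true     # test 3 is not reached
run_all false    # test 3 still runs
```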
  
 
==What is the PDSH variable and how is it used with acc-sm?==
 
==What is the PDSH variable and how is it used with acc-sm?==
Line 264: Line 334:
  
 
If the client has no access to the servers, you can run acc-sm without PDSH, but the tests which need PDSH access are skipped. A summary report is generated which lists the skipped tests.
 
If the client has no access to the servers, you can run acc-sm without PDSH, but the tests which need PDSH access are skipped. A summary report is generated which lists the skipped tests.
 +
 +
==What is the LOAD_MODULES_REMOTE variable and how is it used with acc-sm?==
 +
 +
The LOAD_MODULES_REMOTE variable is used to load/unload modules on remote nodes.
 +
 +
*By default, LOAD_MODULES_REMOTE is set to "false" (LOAD_MODULES_REMOTE=false), and modules are not loaded or unloaded on remote nodes during acceptance small testing.
 +
 +
*If LOAD_MODULES_REMOTE is set to "true" (LOAD_MODULES_REMOTE=true), then modules are loaded/unloaded on remote nodes when running the acc-sm tests:
 +
 +
LOAD_MODULES_REMOTE=true sh acceptance-small.sh
 +
 +
==What is the EXCEPT_LIST_FILE variable and how is it used with acc-sm?==
 +
 +
In Lustre 1.8.2 and later, the EXCEPT_LIST_FILE variable can be used to specify the tests-to-skip file, which tracks the tests to skip during acc-sm runs. To specify the EXCEPT_LIST_FILE parameter, set the following in your Lustre environment:
 +
 +
EXCEPT_LIST_FILE=/full/path/to/skip/file
 +
 +
The tests-to-skip file can also be specified by having a file named tests-to-skip.sh in the LUSTRE/tests/cfg directory. The EXCEPT_LIST_FILE variable will be used if it is defined. Otherwise, the script looks for LUSTRE/tests/cfg/tests-to-skip.sh and uses this file, if it exists.
 +
 +
If a tests-to-skip file is found, its contents are dumped to stdout before it is read into the test-framework (t-f) environment, so the file's contents are visible in the test results. By commenting skip entries in a structured format, the tests-to-skip.sh file can serve as a log of test failures and help track the bugs associated with those failures.
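
The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the framework's code: resolve_skip_file is a hypothetical helper, and LUSTRE is taken to mean the lustre directory of the source tree.

```shell
# Hypothetical helper: print the skip file that would be used, if any.
# $1: value of EXCEPT_LIST_FILE (may be empty)
# $2: path standing in for LUSTRE
resolve_skip_file() {
    if [ -n "$1" ]; then
        # An explicitly defined EXCEPT_LIST_FILE always wins.
        echo "$1"
    elif [ -f "$2/tests/cfg/tests-to-skip.sh" ]; then
        # Otherwise fall back to the conventional location, if it exists.
        echo "$2/tests/cfg/tests-to-skip.sh"
    fi
}

# Demonstration against a throwaway directory standing in for the Lustre tree.
tree=$(mktemp -d)
mkdir -p "$tree/tests/cfg"
touch "$tree/tests/cfg/tests-to-skip.sh"

resolve_skip_file "/full/path/to/skip/file" "$tree"   # explicit file is used
resolve_skip_file "" "$tree"                          # fallback file is used

rm -rf "$tree"
```

When neither source exists, the helper prints nothing, which corresponds to running without a tests-to-skip file.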
 +
 +
This is a sample tests-to-skip file:
 +
 +
## SAMPLES for ONLYs
 +
#export ACC_SM_ONLY="METADATA_UPDATES"
 +
#export ONLY="25 26 27 28 29"
 +
 +
export SANITY_EXCEPT="${SANITY_EXCEPT} 71" # requires dbench
 +
export SANITY_EXCEPT="${SANITY_EXCEPT} 117" # bz-21361 crashes on raven, single-node acc-sm
 +
export SANITY_EXCEPT="${SANITY_EXCEPT} 900" # does not seem to work on raven
 +
 +
export SANITYN_EXCEPT="${SANITYN_EXCEPT} 16" # bz-21173 test_16 fails with 120 running fsx
 +
 +
export REPLAY_SINGLE_EXCEPT="${REPLAY_SINGLE_EXCEPT} 70b" # bz-19480 - hitting on raven
 +
export OST_POOLS_EXCEPT="${OST_POOLS_EXCEPT} 23"        # bz-21224 - uses lfs quotacheck which crashes the node
 +
 +
# entries may be commented out to test fixes when available like this line below
 +
#export REPLAY_DUAL_EXCEPT="${REPLAY_DUAL_EXCEPT} 14b" # bz-19884
 +
 +
# the lines above turn on/off individual test cases
 +
# the lines below turn on/off entire test suites
 +
# suites whose lines are commented out will be run
 +
# lines which are not commented and set the name of the test suite to "no" will be skipped.
 +
 +
export SLOW="no"
 +
# export RUNTESTS="no"
 +
# export SANITY="no"
 +
# export FSX="no"
 +
# export DBENCH="no"
 +
# export BONNIE="no"
 +
# export IOZONE="no"
 +
# export SANITYN="no"
 +
export LFSCK="no"              # 1.8.1: bz 19477
 +
# export LIBLUSTRE="no"
 +
# export RACER="no"
 +
# export REPLAY_SINGLE="no"
 +
# export CONF_SANITY="no"
 +
# export RECOVERY_SMALL="no"
 +
# export REPLAY_OST_SINGLE="no"
 +
# export REPLAY_DUAL="no"
 +
# export REPLAY_VBR="no"
 +
# export INSANITY="no"
 +
# export LARGE_SCALE="no"
 +
export SANITY_QUOTA="no"        # bz-21224
 +
# export RECOVERY_MDS_SCALE="no"
 +
# export RECOVERY_DOUBLE_SCALE="no"
 +
# export RECOVERY_RANDOM_SCALE="no"
 +
# export PARALLEL_SCALE="no"
 +
# export METADATA_UPDATES="no"
 +
# export OST_POOLS="no"
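
The VAR="${VAR} N" pattern used in the sample appends to whatever the variable already holds, so each skipped test can sit on its own commented line. A minimal sketch of that accumulation, using three entries from the sample; the empty initial value is only there so the demonstration starts from a known state.

```shell
# Start from an empty list for the demonstration.
SANITY_EXCEPT=""
SANITY_EXCEPT="${SANITY_EXCEPT} 71"     # requires dbench
SANITY_EXCEPT="${SANITY_EXCEPT} 117"
SANITY_EXCEPT="${SANITY_EXCEPT} 900"

# Word splitting drops the leading space, leaving one test ID per word.
for id in $SANITY_EXCEPT; do
    echo "skip sanity test $id"
done
```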
 +
 +
==What is the NFSCLIENT variable and how is it used with acc-sm?==
 +
 +
The NFSCLIENT variable is used to skip any Lustre-related check, setup or cleanup (on both servers and clients). NFSCLIENT can also be used to run acc-sm tests on any type of file system mounted on $MOUNT (like an NFS client).
 +
 +
Use this setting to switch acc-sm to NFSCLIENT mode:
 +
 +
NFSCLIENT=yes
 +
 +
'''Note''':
 +
* Acc-sm testing does not set up or start the NFS servers and clients; it only runs the specified tests.
 +
 +
* It does not make sense to run Lustre-specific tests in this mode.
 +
 +
==What is the CLIENTONLY variable and how is it used with acc-sm?==
 +
 +
The CLIENTONLY variable is used to skip any actions on servers, such as unmounting server devices, formatting the devices, etc.
 +
 +
Use this setting to switch acc-sm to CLIENTONLY mode.
 +
 +
CLIENTONLY=yes
 +
 +
==What is the SHARED_DIR_LOGS variable and how is it used with acc-sm?==
 +
 +
The SHARED_DIR_LOGS variable is used to gather acc-sm logs. By default, the SHARED_DIR_LOGS variable is empty; it has no default value.
 +
 +
* If SHARED_DIR_LOGS is not set (empty), then logs from all cluster nodes are copied into the $TMP directory of the local client using the ''rsync -az'' command.
 +
 +
* If SHARED_DIR_LOGS is set, then logs from all cluster nodes are not copied to the local client, and the user can grab them directly from this shared directory.
 +
 +
To specify the SHARED_DIR_LOGS parameter, set the following in your testing environment:
 +
 +
SHARED_DIR_LOGS=/path/to/shared/dir
 +
 +
==What is the SHARED_DIRECTORY variable and how is it used with acc-sm?==
 +
 +
The SHARED_DIRECTORY variable is used by the following recovery scale test suites to monitor client load processes:
 +
 +
* recovery-mds-scale.sh (RECOVERY_MDS_SCALE)
 +
* recovery-random-scale.sh (RECOVERY_RANDOM_SCALE)
 +
* recovery-double-scale.sh (RECOVERY_DOUBLE_SCALE)
 +
 +
By default, SHARED_DIRECTORY is empty; it has no default value.
 +
 +
The SHARED_DIRECTORY variable must be set to a reasonable value; otherwise, the test suites listed above will fail.
 +
 +
To specify SHARED_DIRECTORY, set the following in your test environment:
 +
 +
SHARED_DIRECTORY=/path/to/shared/dir
 +
 +
==What is the LOADS variable and how is it used with acc-sm?==
 +
 +
The LOADS variable specifies the list of utilities (utils) which are run on clients during recovery scale tests. The default list of utils is LOADS="dd tar dbench iozone IOR".
 +
 +
Each util is run by a corresponding script:
 +
 +
* run_dd.sh
 +
* run_tar.sh
 +
* run_dbench.sh
 +
* run_IOR.sh
 +
* run_iozone.sh
 +
 +
A user can change this list to contain only the utils they want to run. For example, if mpirun and iozone are not installed on the clients, the user can remove IOR and iozone from the list and use LOADS="dd tar dbench".
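
One way to trim the list is to keep only the utilities that are actually found. The sketch below is a hedged illustration: available_loads is a hypothetical helper, it checks the node it runs on rather than the remote clients, and it assumes each util's executable has the same name as its LOADS entry.

```shell
# Hypothetical helper: keep only the utilities that are found on PATH.
# $1: space-separated candidate list
available_loads() {
    kept=""
    for util in $1; do
        if command -v "$util" >/dev/null 2>&1; then
            kept="$kept $util"
        fi
    done
    # Strip the leading space left by the first append.
    echo "${kept# }"
}

LOADS=$(available_loads "dd tar dbench iozone IOR")
echo "LOADS=\"$LOADS\""
```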
 +
 +
==What is the FAILURE_MODE variable and how is it used with acc-sm?==
 +
 +
The FAILURE_MODE variable is used to set the mode in which acc-sm emulates facet failure.
 +
 +
*By default, FAILURE_MODE is set to "SOFT" (FAILURE_MODE=SOFT), which causes acc-sm to perform umount -f facet.
 +
 +
*If FAILURE_MODE is set to "HARD" (FAILURE_MODE=HARD), then acc-sm runs the functions specified by the following variables:
 +
 +
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
 +
POWER_UP=${POWER_UP:-"powerman --on"}
 +
 +
The user can define their own "power down" and "power up" functions:
 +
 +
export POWER_DOWN="pdsh -S -w 10.8.0.118 /usr/bin/powerman -0";
 +
export POWER_UP="pdsh -S -w 10.8.0.118 /usr/bin/powerman -1";
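
The ${VAR:-default} form shown above only supplies the default when the variable is unset or empty, which is why an exported value takes precedence. A minimal sketch of both cases, reusing the commands from this section:

```shell
# Case 1: nothing exported, so the default is used.
unset POWER_DOWN
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
echo "default:  $POWER_DOWN"

# Case 2: a user-defined value is kept as-is.
POWER_DOWN="pdsh -S -w 10.8.0.118 /usr/bin/powerman -0"
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
echo "override: $POWER_DOWN"
```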
  
 
==What is the CMD configuration for HEAD?==
 
==What is the CMD configuration for HEAD?==
  
For the HEAD branch, specify the MDSCOUNT variable (number of MDTs). By default, the variable is set to 1. If you have a Lustre configuration with several MDT nodes, they need to be specified in the configuration file as mds1_HOST, mds2_HOST, ...
+
For the HEAD branch, specify the MDSCOUNT variable (number of MDTs). By default, MDSCOUNT is set to 1. If the Lustre configuration has several MDT nodes, they need to be specified in the configuration file as mds1_HOST, mds2_HOST, ...
  
 
By default, all of these variables are set to the mds_HOST value.
 
By default, all of these variables are set to the mds_HOST value.
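
The fallback to mds_HOST can be pictured with the sketch below. It is an assumption about the effect, not the configuration file's actual code: the loop and the host name sfire9 are made up for illustration.

```shell
mds_HOST=sfire7
MDSCOUNT=2
mds2_HOST=sfire9      # hypothetical second MDT node, set explicitly

i=1
while [ "$i" -le "$MDSCOUNT" ]; do
    # Indirect default: mdsN_HOST falls back to mds_HOST when unset or empty.
    eval "mds${i}_HOST=\${mds${i}_HOST:-\$mds_HOST}"
    i=$((i + 1))
done

echo "mds1_HOST=$mds1_HOST mds2_HOST=$mds2_HOST"
```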
Line 273: Line 492:
 
==What do we do with the acc-sm test results?==
 
==What do we do with the acc-sm test results?==
  
Acc-sm results are sent to Buffalo, a web interface for Lustre test results. The default Buffalo display shows a summary of tests run on different hardware configurations for various CVS branches for the past 24 hours, with links to the various reports. For more information on reporting test results
+
If an acc-sm test fails, the failure is investigated. If the investigation reveals there is a Lustre defect, a bug is opened in [https://bugzilla.lustre.org/ Bugzilla] to fix the problem and the acc-sm defect.
to Buffalo, see [http://wiki.lustre.org/index.php?title=Buffalizing_Tests Buffalizing Tests].
+
 
+
If an acc-sm test fails, then the failure is investigated. If the investigation reveals there is a Lustre defect, a bug is opened in Bugzilla to fix the problem and also the acc-sm issue.
+

Latest revision as of 17:46, 10 March 2010

(Updated: Feb 2010)

The Lustre™ QE group and developers use acceptance-small (acc-sm) tests to catch bugs early in the development cycle. Within the Lustre group, acc-sm tests are run on YALA, an automated test system. This information is being published to describe the steps to perform acceptance small testing and encourage wider acc-sm testing in the Lustre community.

NOTE: For your convenience, this document is also available as a PDF.

What is acc-sm testing and why do we use it for Lustre?

Acceptance small (acc-sm) testing is a suite of test cases used to verify different aspects of Lustre functionality.

  • These tests are run using the acceptance-small.sh script.
  • The script is run from the lustre/tests directory in a compiled Lustre tree.
  • The acceptance-small.sh script runs a number of test scripts that are also run by the ltest (Buffalo) test harness on Lustre test clusters.

What tests comprise the acc-sm test suite?

Each Lustre tree contains a lustre/tests sub-directory; all acc-sm tests are stored here. The acceptance-small.sh file contains a list of all tests in the acc-sm suite. To get the list, run:

$ grep TESTSUITE_LIST acceptance-small.sh

The acc-sm tests are listed below, by branch.

b1_6 branch

This branch includes 18 acc-sm test suites.

$ grep TESTSUITE_LIST acceptance-small.sh
export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK
LIBLUSTRE RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE
REPLAY_DUAL INSANITY SANITY_QUOTA PERFORMANCE_SANITY"

b1_8_gate branch

This branch includes 28 acc-sm test suites.

$ grep TESTSUITE_LIST acceptance-small.sh
export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE 
RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR 
INSANITY SANITY_QUOTA PERFORMANCE_SANITY LARGE_SCALE RECOVERY_MDS_SCALE 
RECOVERY_DOUBLE_SCALE RECOVERY_RANDOM_SCALE PARALLEL_SCALE METADATA_UPDATES OST_POOLS 
SANITY_BENCHMARK LNET_SELFTEST"

HEAD branch

This branch includes 30 acc-sm test suites.

$ grep TESTSUITE_LIST acceptance-small.sh
export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE 
RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR 
INSANITY SANITY_QUOTA SANITY_SEC SANITY_GSS PERFORMANCE_SANITY LARGE_SCALE 
RECOVERY_MDS_SCALE RECOVERY_DOUBLE_SCALE RECOVERY_RANDOM_SCALE PARALLEL_SCALE 
LUSTRE_RSYNC_TEST METADATA_UPDATES OST_POOLS SANITY_BENCHMARK"
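
As a quick cross-check of the suite counts stated above, the names in a TESTSUITE_LIST value can be counted with ordinary word splitting. A minimal sketch, using the HEAD list copied from above; count_suites is a hypothetical helper.

```shell
# Hypothetical helper: count the whitespace-separated suite names in a list.
count_suites() {
    # Unquoted expansion is intentional: word splitting yields one argument per suite.
    set -- $1
    echo "$#"
}

TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE
RACER REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR
INSANITY SANITY_QUOTA SANITY_SEC SANITY_GSS PERFORMANCE_SANITY LARGE_SCALE
RECOVERY_MDS_SCALE RECOVERY_DOUBLE_SCALE RECOVERY_RANDOM_SCALE PARALLEL_SCALE
LUSTRE_RSYNC_TEST METADATA_UPDATES OST_POOLS SANITY_BENCHMARK"

count_suites "$TESTSUITE_LIST"
```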

To see the test cases in a particular acc-sm test, run:

$ grep run_ <test suite script>

For example, to see the last 3 test cases that comprise the SANITY test:

$ grep run_ sanity.sh | tail -3
run_test 130c "FIEMAP (2-stripe file with hole)"
run_test 130d "FIEMAP (N-stripe file)"
run_test 130e "FIEMAP (test continuation FIEMAP calls)"
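
The subtest identifiers are what the ONLY and *_EXCEPT variables refer to, so it can help to list them on their own. The sketch below is a hedged illustration: list_test_ids is a hypothetical helper, and it assumes every subtest is registered on a line that starts with run_test, as in the output above.

```shell
# Hypothetical helper: read a suite script on stdin and print each subtest ID
# (the second field of every run_test line).
list_test_ids() {
    awk '$1 == "run_test" { print $2 }'
}

# Stand-in input: the three lines shown in the example above.
list_test_ids <<'EOF'
run_test 130c "FIEMAP (2-stripe file with hole)"
run_test 130d "FIEMAP (N-stripe file)"
run_test 130e "FIEMAP (test continuation FIEMAP calls)"
EOF
```

Against a real tree, the same helper could be fed a suite script, for example with list_test_ids < sanity.sh.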

What does each acc-sm test measure or show?

The acc-sm test suites are described below.

RUNTESTS
A basic regression test with unmount/remount.
SANITY
Verifies Lustre operation under normal operating conditions.
DBENCH
Dbench benchmark for simulating N clients to produce the filesystem load.
BONNIE
Bonnie++ benchmark for creating, reading and deleting many small files.
IOZONE
IOzone benchmark for generating and measuring a variety of file operations.
FSX
Filesystem exerciser.
SANITYN
Verifies operations from two clients under normal operating conditions.
LFSCK
Tests e2fsck and lfsck to detect and fix filesystem corruption.
LIBLUSTRE
Runs a test linked to a liblustre client library.
RACER
Tests for filesystem race conditions by concurrently creating, moving, deleting, etc. a set of files.
REPLAY_SINGLE
Verifies recovery after an MDS failure.
CONF_SANITY
Verifies various Lustre configurations (including wrong ones), where the system must behave correctly.
RECOVERY_SMALL
Verifies RPC replay after a communications failure (message loss).
REPLAY_OST_SINGLE
Verifies recovery after an OST failure.
REPLAY_DUAL
Verifies recovery from two clients after a server failure.
REPLAY_VBR
Verifies the version-based recovery feature.
INSANITY
Tests multiple concurrent failure conditions.
SANITY_QUOTA
Verifies filesystem quotas.
SANITY_SEC
Verifies Lustre identity features.
SANITY_GSS
Verifies GSS/Kerberos authentication features.
PERFORMANCE_SANITY
Performance mdsrate tests (small file create/open/delete, large file create/open/delete, lookup rate 10M file dir, lookup rate 10M file 10 dir, getattr small file, and getattr large files).
LARGE_SCALE
Large-scale tests that verify version-based recovery features.
RECOVERY_MDS_SCALE
The server failover test: for a duration of 24 hours, repeatedly fail over a random facet (MDS or OST) at 10 minute intervals and verify that no application errors occur.
RECOVERY_DOUBLE_SCALE
Failover test for all pair-wise combinations of node failures.
RECOVERY_RANDOM_SCALE
Verifies that a client failure does not affect other clients.
PARALLEL_SCALE
Runs functional tests (connectathon, cascading_rw, write_disjoint, write_append_truncate, parallel_grouplock, statahead), performance tests (IOR, compilebench and metabench), and a stress test (simul).
LUSTRE_RSYNC_TEST
Verifies the lustre_rsync (replication) feature.
METADATA_UPDATES
Distributed Metadata Update Test to verify that distributed metadata updates are properly completed when multiple clients create/delete files and modify the attributes of files.
OST_POOLS
Verifies the OST pools feature.

How do you get the acc-sm tests?

The acc-sm test suite is stored in the lustre/tests subdirectory.

Do you have to run every acc-sm test?

No. You can choose to run only specified acc-sm tests, start the test suite from a defined test, or stop the test suite at a defined test. Tests can be run either with or without the acceptance-small.sh (acc-sm) wrapper script. Here are several examples:

To run only the RUNTESTS and SANITY tests:

ACC_SM_ONLY="RUNTESTS SANITY" sh acceptance-small.sh

To run only test_1 and test_2 of the SANITYN tests:

ACC_SM_ONLY="SANITYN" ONLY="1 2" sh acceptance-small.sh

To run only the replay-single.sh test and exclude (not run) the test_3* and test_4* tests:

ACC_SM_ONLY="REPLAY_SINGLE" REPLAY_SINGLE_EXCEPT="3 4" sh acceptance-small.sh

To run only the conf-sanity.sh tests after #15 (without the acceptance-small.sh wrapper script):

CONF_SANITY_EXCEPT="$(seq 15)" sh conf-sanity.sh
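
The command substitution $(seq 15) produces the numbers 1 through 15, so the except list covers the first 15 test numbers. A small sketch that makes the expanded value visible before it is passed to a test script (except_list is an illustrative variable name, not one used by the test scripts):

```shell
# seq prints one number per line; paste joins the lines with spaces
# so the resulting list is easy to read and to log.
except_list=$(seq 15 | paste -sd' ' -)
echo "CONF_SANITY_EXCEPT=\"$except_list\""
```

Printing the list first is a cheap way to confirm which test numbers will be excluded.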

To start the test suite from a defined test, use START_AT. For example:

ACC_SM_ONLY=SANITY_BENCHMARK START_AT=fsx sh acceptance-small.sh 

-or-

ACC_SM_ONLY=SANITY START_AT=24c sh acceptance-small.sh 

To stop the test suite at a defined test, use STOP_AT. For example:

ACC_SM_ONLY=SANITY STOP_AT=77j sh acceptance-small.sh

To start and stop the test suite to define a range of tests, use START_AT and STOP_AT. For example, to run sanity tests from 23b to 24f:

ACC_SM_ONLY=SANITY START_AT=23b STOP_AT=24f sh acceptance-small.sh

Do the acc-sm tests have to be run in a specific order?

The test order is defined in the acceptance-small.sh script and in each test script. Users do not have to (and should not) do anything to change the order of tests.

Who runs the acc-sm tests?

Currently, the QE group and Lustre developers run acc-sm as the main test suite for Lustre testing. Acc-sm tests are run on YALA, the automated test system, with test reports submitted to Buffalo (a web interface for browsing Lustre test results). We welcome external contributions to the Lustre acc-sm test effort, whether changes to the Lustre code base or new testing platforms.

What type of Lustre environment is needed to run the acc-sm tests?

The default Lustre configuration is a single-node setup with mdscount=1 and ostcount=2. All devices are loopback devices. YALA does not use the default configuration.

To run the acc-sm test suite on a non-default Lustre configuration, you have to modify the default settings in the acc-sm configuration file, lustre/tests/cfg/local.sh. The configuration variables include mds_HOST, ost_HOST, OSTCOUNT, MDS_MOUNT_OPTS and OST_MOUNT_OPTS, among others.

To create your own configuration file, copy cfg/local.sh to cfg/my_config.sh:

cp cfg/local.sh cfg/my_config.sh

Edit the necessary variables in the configuration file (my_config.sh) and run acc-sm as:

NAME=my_config sh acceptance-small.sh
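
Settings can also be overridden on the command line without editing the file, provided the configuration file assigns its variables with the ${VAR:-default} form. The sketch below illustrates that mechanism only. The stand-in file, its two variables and their default values are assumptions made for the demonstration; the real cfg/local.sh is not reproduced here.

```shell
# Stand-in configuration file; the real cfg/local.sh is not reproduced here.
cfgdir=$(mktemp -d)
cat > "$cfgdir/my_config.sh" <<'EOF'
# Keep a value that is already set in the environment;
# otherwise fall back to the default after ":-".
mds_HOST=${mds_HOST:-localhost}
OSTCOUNT=${OSTCOUNT:-2}
EOF

# Small script: source the file given as $1 and print the resulting settings.
print_cfg='. "$1"; echo "$mds_HOST $OSTCOUNT"'

# Without overrides, the defaults from the file apply.
defaults=$(unset mds_HOST OSTCOUNT; sh -c "$print_cfg" sh "$cfgdir/my_config.sh")
# Variables given on the command line take precedence over the file.
overridden=$(mds_HOST=sfire7 OSTCOUNT=1 sh -c "$print_cfg" sh "$cfgdir/my_config.sh")

echo "defaults:   $defaults"
echo "overridden: $overridden"
rm -rf "$cfgdir"
```

Keeping site-specific values in my_config.sh and using command-line overrides only for one-off runs keeps test runs reproducible.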

Are there other acc-sm requirements or is anything special needed?

Acc-sm testing requires the following programs be installed:

  • iozone
  • dbench
  • bonnie++
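
Before starting a long run, it can be useful to check that these programs are available. The following is a minimal sketch; missing_tools is a hypothetical helper, and it assumes that the executables are installed under the same names as listed above.

```shell
# Print each named program that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools iozone dbench bonnie++)
if [ -n "$missing" ]; then
    # Unquoted on purpose so that all names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All benchmark programs found"
fi
```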

What are the steps to run acc-sm?

There are two methods to run the acc-sm tests.

1. Check out a Lustre branch (b1_6, b1_8 or HEAD).

2. Change directory to lustre/tests:

cd lustre/tests

3. Build lustre/tests.

4. Run acc-sm on a local, default Lustre configuration (1 MGS/MDT, 1 OST and 1 client):

sh acceptance-small.sh 2>&1 | tee /tmp/output

- OR -

1. Install the lustre-tests RPM (available at lts-head:/var/cache/cfs/PACKAGE/rpm/lustre).

2. Change directory to lustre/tests:

cd /usr/lib/lustre/tests

3. Create your own configuration file and edit it for your configuration.

cp cfg/local.sh cfg/my_config.sh

4. Run acc-sm on a local Lustre configuration.

Here is an example of running acc-sm on a non-default Lustre configuration (MDS is sfire7, OST is sfire8, OSTCOUNT=1, etc). In this example, only the SANITY test cases are being run.

ACC_SM_ONLY=SANITY mds_HOST=sfire7 ost1_HOST=sfire8 MDSDEV1=/dev/sda1 \
OSTCOUNT=1 OSTDEV1=/dev/sda1 MDSSIZE=5000000 OSTSIZE=5000000 \
MDS_MOUNT_OPTS="-o user_xattr" OST_MOUNT_OPTS=" -o user_xattr" \
REFORMAT="--reformat" PDSH="pdsh -S -w" sh acceptance-small.sh

What if I hit a failure on an acc-sm test?

  • If you regularly hit a failure in any of these tests, check if a bug has been reported on the failure or file a new bug if one has not yet been opened.
  • If the bug prevents you from completing the tests, set the environment variables to skip the specific test(s) until you or someone else fixes them.
  • For example, to skip sanity.sh subtest 36g and 65, replay-single.sh subtest 42, and all of insanity.sh set in your environment:
export SANITY_EXCEPT="36g 65"
export REPLAY_SINGLE_EXCEPT=42
export INSANITY=no
  • You can also skip tests on the command line. For example, when running acceptance-small:
SANITY_EXCEPT="36g 65" REPLAY_SINGLE_EXCEPT=42 INSANITY=no ./acceptance-small.sh
  • The test framework is very flexible and provides an easy, "hands-off" way of running tests while you are doing other things, like coding.
  • Questions/problems with the test framework should be emailed to the lustre-discuss mailing list, so all Lustre users can benefit from improving and documenting it.
  • If you do not run the entire test suite regularly, you will have no idea whether a bug was introduced by your code or not, and you will waste a lot of time looking.

How do you run acc-sm on a mounted Lustre system?

To run acc-sm on a Lustre system that is already mounted, you need to use the correct configuration file (according to the mounted Lustre system) and run acc-sm as:

SETUP=: CLEANUP=: FORMAT=: NAME=<config> sh acceptance-small.sh
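
The value ":" is the shell's built-in no-op command, so setting a step to ":" replaces it with a command that does nothing. The sketch below shows the general idea under the assumption that each step is stored in a variable and then executed as a command; default_setup is a hypothetical placeholder, not the framework's real setup function.

```shell
# default_setup is a hypothetical placeholder for the real setup step.
default_setup() { echo "setting up Lustre"; }

unset SETUP
SETUP=${SETUP:-default_setup}   # nothing supplied: the default step is used
out_default=$($SETUP)

SETUP=:                         # ":" is the shell's built-in no-op command
out_noop=$($SETUP)

echo "default step output: '$out_default'"
echo "no-op step output:   '$out_noop'"
```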

How do you run acc-sm with and without reformat?

By default, the acc-sm test suite does not reformat Lustre. If this is a new system or if you are using new devices and want to reformat Lustre, run acc-sm with REFORMAT="--reformat":

REFORMAT="--reformat" sh acceptance-small.sh

If needed, you can also run acc-sm with WRITECONF="writeconf":

WRITECONF="writeconf" sh acceptance-small.sh

How do you run acc-sm in a Lustre configuration with several clients?

The default configuration file for acc-sm is cfg/local.sh, which uses only one client (local). To use additional remote clients, specify the RCLIENTS list and use the cfg/ncli.sh configuration file (or your own copy of ncli configuration).

NAME=ncli RCLIENTS=<space-separated list of remote clients> sh acceptance-small.sh

For example:

NAME=ncli RCLIENTS="client2 client3 client11" sh acceptance-small.sh

What is the SLOW variable and how is it used with acc-sm?

The SLOW variable is used to run a subset of acc-sm tests.

  • By default, SLOW is set to "no" (SLOW=no), which causes some of the longer acc-sm tests to be skipped and the acc-sm test run to complete in less than 2 hours.
  • If SLOW is set to "yes" (SLOW=yes), then all acc-sm tests are run.
SLOW=yes sh acceptance-small.sh

What is the FAIL_ON_ERROR variable and how is it used with acc-sm?

The FAIL_ON_ERROR variable is used to "stop" or "continue" running acc-sm tests after a test failure occurs.

  • If FAIL_ON_ERROR is set to "true" (FAIL_ON_ERROR=true), then acc-sm stops after test_N fails and test_N+1 does not run. By default, FAIL_ON_ERROR=true for the REPLAY and RECOVERY tests.
  • If FAIL_ON_ERROR is set to "false" (FAIL_ON_ERROR=false), then acc-sm continues after test_N fails and test_N+1 does run. By default, FAIL_ON_ERROR=false for the SANITY, SANITYN and SANITY_QUOTA tests.
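
The difference between the two settings can be illustrated with a toy runner. This is only a sketch of the stop/continue behavior described above, not the framework's implementation; run_suite, the test names and the pass/fail outcomes are all made up for the example.

```shell
# Hypothetical mini-runner. Arguments: fail_on_error, then name:result pairs.
run_suite() {
    fail_on_error=$1
    shift
    ran=""
    for entry in "$@"; do
        name=${entry%%:*}      # text before the first colon
        result=${entry#*:}     # text after the first colon
        ran="${ran:+$ran }$name"
        # Stop after a failed test only when fail_on_error is "true".
        if [ "$result" = "fail" ] && [ "$fail_on_error" = "true" ]; then
            break
        fi
    done
    echo "$ran"
}

stopped=$(run_suite true test_1:pass test_2:fail test_3:pass)
continued=$(run_suite false test_1:pass test_2:fail test_3:pass)
echo "FAIL_ON_ERROR=true  ran: $stopped"
echo "FAIL_ON_ERROR=false ran: $continued"
```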

What is the PDSH variable and how is it used with acc-sm?

The PDSH variable is used to provide remote shell access. If acc-sm is run on a Lustre configuration with remote servers, specify PDSH like this:

PDSH="pdsh -S -w" sh acceptance-small.sh

If the client has no access to the servers, you can run acc-sm without PDSH, but the tests which need PDSH access are skipped. A summary report is generated which lists the skipped tests.

What is the LOAD_MODULES_REMOTE variable and how is it used with acc-sm?

The LOAD_MODULES_REMOTE variable is used to load/unload modules on remote nodes.

  • By default, LOAD_MODULES_REMOTE is set to "false" (LOAD_MODULES_REMOTE=false), and modules are not loaded or unloaded on remote nodes during acceptance small testing.
  • If LOAD_MODULES_REMOTE is set to "true" (LOAD_MODULES_REMOTE=true), then modules are loaded/unloaded on remote nodes when running the acc-sm tests:
LOAD_MODULES_REMOTE=true sh acceptance-small.sh

What is the EXCEPT_LIST_FILE variable and how is it used with acc-sm?

In Lustre 1.8.2 and later, the EXCEPT_LIST_FILE variable can be used to specify the tests-to-skip file, which tracks the tests to skip during acc-sm runs. To specify the EXCEPT_LIST_FILE parameter, set the following in your Lustre environment:

EXCEPT_LIST_FILE=/full/path/to/skip/file

The tests-to-skip file can also be specified by having a file named tests-to-skip.sh in the LUSTRE/tests/cfg directory. The EXCEPT_LIST_FILE variable will be used if it is defined. Otherwise, the script looks for LUSTRE/tests/cfg/tests-to-skip.sh and uses this file, if it exists.

If a tests-to-skip file is found, its contents are dumped to stdout before it is read into the t-f environment, so the file's contents are visible in the test results. By following a structured format of commenting skip entries, the tests-to-skip.sh file can serve as a log of test failures and help track bugs associated with those failures (for easy reference).
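
The lookup order described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the test framework's own code; find_skip_file is an invented name, and the temporary directory stands in for the LUSTRE directory.

```shell
# Hypothetical lookup: EXCEPT_LIST_FILE wins if defined, otherwise fall back
# to tests/cfg/tests-to-skip.sh under the given Lustre directory, if present.
find_skip_file() {
    lustre_dir=$1
    if [ -n "${EXCEPT_LIST_FILE:-}" ]; then
        echo "$EXCEPT_LIST_FILE"
    elif [ -f "$lustre_dir/tests/cfg/tests-to-skip.sh" ]; then
        echo "$lustre_dir/tests/cfg/tests-to-skip.sh"
    fi
}

# Demo tree containing an empty default skip file.
demo=$(mktemp -d)
mkdir -p "$demo/tests/cfg"
: > "$demo/tests/cfg/tests-to-skip.sh"

unset EXCEPT_LIST_FILE
from_default=$(find_skip_file "$demo")

EXCEPT_LIST_FILE=/full/path/to/skip/file
from_env=$(find_skip_file "$demo")
unset EXCEPT_LIST_FILE

echo "default lookup:  $from_default"
echo "variable lookup: $from_env"
rm -rf "$demo"
```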

This is a sample tests-to-skip file:

## SAMPLES for ONLYs 
#export ACC_SM_ONLY="METADATA_UPDATES" 
#export ONLY="25 26 27 28 29" 

export SANITY_EXCEPT="${SANITY_EXCEPT} 71" # requires dbench 
export SANITY_EXCEPT="${SANITY_EXCEPT} 117" # bz-21361 crashes on raven, single-node acc-sm 
export SANITY_EXCEPT="${SANITY_EXCEPT} 900" # does not seem to work on raven 

export SANITYN_EXCEPT="${SANITYN_EXCEPT} 16" # bz-21173 test_16 fails with 120 running fsx 

export REPLAY_SINGLE_EXCEPT="${REPLAY_SINGLE_EXCEPT} 70b" # bz-19480 - hitting on raven 
export OST_POOLS_EXCEPT="${OST_POOLS_EXCEPT} 23"        # bz-21224 - uses lfs quotacheck which crashes the node 

# entries may be commented out to test fixes when available like this line below 
#export REPLAY_DUAL_EXCEPT="${REPLAY_DUAL_EXCEPT} 14b" # bz-19884 

# the lines above turn on/off individual test cases 
# the lines below turn on/off entire test suites 
# test suites on commented-out lines will be run 
# uncommented lines that set the name of the test suite to "no" cause that suite to be skipped. 

export SLOW="no" 
# export RUNTESTS="no" 
# export SANITY="no" 
# export FSX="no" 
# export DBENCH="no" 
# export BONNIE="no" 
# export IOZONE="no" 
# export SANITYN="no" 
export LFSCK="no"               # 1.8.1: bz 19477 
# export LIBLUSTRE="no" 
# export RACER="no" 
# export REPLAY_SINGLE="no" 
# export CONF_SANITY="no" 
# export RECOVERY_SMALL="no" 
# export REPLAY_OST_SINGLE="no" 
# export REPLAY_DUAL="no" 
# export REPLAY_VBR="no" 
# export INSANITY="no" 
# export LARGE_SCALE="no" 
export SANITY_QUOTA="no"        # bz-21224 
# export RECOVERY_MDS_SCALE="no" 
# export RECOVERY_DOUBLE_SCALE="no" 
# export RECOVERY_RANDOM_SCALE="no" 
# export PARALLEL_SCALE="no" 
# export METADATA_UPDATES="no" 
# export OST_POOLS="no"
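
Each entry in the sample file appends to the existing value (for example, "${SANITY_EXCEPT} 71") rather than overwriting it, so entries accumulate and any exclusions already set in the environment are preserved. A short sketch of that accumulation, using the three SANITY entries from the sample:

```shell
# Start from an empty list for the demonstration.
unset SANITY_EXCEPT
export SANITY_EXCEPT="${SANITY_EXCEPT} 71"
export SANITY_EXCEPT="${SANITY_EXCEPT} 117"
export SANITY_EXCEPT="${SANITY_EXCEPT} 900"

# Unquoted expansion splits the list on whitespace, one word per test number.
set -- $SANITY_EXCEPT
count=$#
echo "$count tests excluded: $*"
```

The leading space left by the first append is harmless, because the list is split on whitespace when it is used this way.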

What is the NFSCLIENT variable and how is it used with acc-sm?

The NFSCLIENT variable is used to skip any Lustre-related check, setup or cleanup (on both servers and clients). NFSCLIENT can also be used to run acc-sm tests on any type of file system mounted on $MOUNT (like an NFS client).

Use this setting to switch acc-sm to NFSCLIENT mode:

NFSCLIENT=yes

Note:

  • Acc-sm testing does not set up or start the NFS servers and clients; it only runs the specified tests.
  • It does not make sense to run Lustre-specific tests in this mode.

What is the CLIENTONLY variable and how is it used with acc-sm?

The CLIENTONLY variable is used to skip any actions on servers, such as unmounting server devices, formatting the devices, etc.

Use this setting to switch acc-sm to CLIENTONLY mode.

CLIENTONLY=yes

What is the SHARED_DIR_LOGS variable and how is it used with acc-sm?

The SHARED_DIR_LOGS variable is used to gather acc-sm logs. By default, the SHARED_DIR_LOGS variable is empty; it has no default value.

  • If SHARED_DIR_LOGS is not set (empty), then logs from all cluster nodes are copied into the $TMP directory of the local client using the rsync -az command.
  • If SHARED_DIR_LOGS is set, then logs from all cluster nodes are not copied to the local client, and the user can grab them directly from this shared directory.

To specify the SHARED_DIR_LOGS parameter, set the following in your testing environment:

SHARED_DIR_LOGS=/path/to/shared/dir

What is the SHARED_DIRECTORY variable and how is it used with acc-sm?

The SHARED_DIRECTORY variable is used by the following recovery scale test suites to monitor client load processes:

  • recovery-mds-scale.sh (RECOVERY_MDS_SCALE)
  • recovery-random-scale.sh (RECOVERY_RANDOM_SCALE)
  • recovery-double-scale.sh (RECOVERY_DOUBLE_SCALE)

By default, SHARED_DIRECTORY is empty; it has no default value.

The SHARED_DIRECTORY variable must be set to a valid directory; otherwise, the test suites listed above will fail.

To specify SHARED_DIRECTORY, set the following in your test environment:

SHARED_DIRECTORY=/path/to/shared/dir

What is the LOADS variable and how is it used with acc-sm?

The LOADS variable specifies the list of utilities (utils) which are run on clients during recovery scale tests. The default list of utils is LOADS="dd tar dbench iozone IOR".

Each util is run by a corresponding script:

  • run_dd.sh
  • run_tar.sh
  • run_dbench.sh
  • run_IOR.sh
  • run_iozone.sh

You can change this list to contain only the utils that you want to run. For example, if mpirun and iozone are not installed on the clients, remove IOR and iozone from the list and use LOADS="dd tar dbench".
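
The mapping from LOADS entries to scripts follows the run_<util>.sh naming shown above. A minimal sketch that derives the script names for a reduced list (the scripts variable is illustrative only):

```shell
# Reduced list for clients without mpirun and iozone.
LOADS="dd tar dbench"

# Build the list of scripts that correspond to the selected utils.
scripts=""
for load in $LOADS; do
    scripts="${scripts:+$scripts }run_${load}.sh"
done
echo "$scripts"
```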

What is the FAILURE_MODE variable and how is it used with acc-sm?

The FAILURE_MODE variable is used to set the mode in which acc-sm emulates facet failure.

  • By default, FAILURE_MODE is set to "SOFT" (FAILURE_MODE=SOFT), which causes acc-sm to perform umount -f facet.
  • If FAILURE_MODE is set to "HARD" (FAILURE_MODE=HARD), then acc-sm runs the functions specified by the following variables:
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
POWER_UP=${POWER_UP:-"powerman --on"}

You can define your own "power down" and "power up" functions:

export POWER_DOWN="pdsh -S -w 10.8.0.118 /usr/bin/powerman -0";
export POWER_UP="pdsh -S -w 10.8.0.118 /usr/bin/powerman -1";
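
The selection between the two modes can be pictured with a dry-run helper that prints the command that would be chosen instead of running it. This is a sketch under assumptions: failure_action is a hypothetical name, and how the framework passes the target facet or node to these commands is not shown.

```shell
# Hypothetical dry-run helper: prints the chosen action, performs nothing.
failure_action() {
    mode=${FAILURE_MODE:-SOFT}
    power_down=${POWER_DOWN:-"powerman --off"}
    case "$mode" in
        SOFT) echo "umount -f" ;;
        HARD) echo "$power_down" ;;
        *)    echo "unsupported FAILURE_MODE: $mode" >&2; return 1 ;;
    esac
}

unset FAILURE_MODE POWER_DOWN
soft=$(failure_action)
hard=$(FAILURE_MODE=HARD; failure_action)
custom=$(FAILURE_MODE=HARD; POWER_DOWN="pdsh -S -w 10.8.0.118 /usr/bin/powerman -0"; failure_action)

echo "SOFT:          $soft"
echo "HARD:          $hard"
echo "HARD (custom): $custom"
```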

What is the CMD configuration for HEAD?

For the HEAD branch, specify the MDSCOUNT variable (number of MDTs). By default, MDSCOUNT is set to 1. If the Lustre configuration has several MDT nodes, they need to be specified in the configuration file as mds1_HOST, mds2_HOST, ...

By default, all of these variables are set to the mds_HOST value.
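
The defaulting of mds1_HOST, mds2_HOST, ... to mds_HOST can be sketched with a loop. This illustrates the behavior described above rather than the framework's actual code, and the host name sfire9 is a made-up example value.

```shell
mds_HOST=sfire7
MDSCOUNT=2
unset mds1_HOST
mds2_HOST=sfire9    # hypothetical second MDT node, set explicitly

# Give every mdsN_HOST that is still unset the value of mds_HOST.
i=1
while [ "$i" -le "$MDSCOUNT" ]; do
    eval "mds${i}_HOST=\${mds${i}_HOST:-\$mds_HOST}"
    i=$((i + 1))
done

echo "mds1_HOST=$mds1_HOST mds2_HOST=$mds2_HOST"
```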

What do we do with the acc-sm test results?

If an acc-sm test fails, the failure is investigated. If the investigation reveals a Lustre defect, a bug is opened in Bugzilla to fix the problem and any related acc-sm test defect.
