WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Lustre DDN Tuning

From Obsolete Lustre Wiki
Jump to navigationJump to search

DISCLAIMER - EXTERNAL CONTRIBUTOR CONTENT

This content was submitted by an external contributor. We provide this information as a resource for the Lustre™ open-source community, but we make no representation as to the accuracy, completeness or reliability of this information.


This page provides guidelines to configure DDN storage arrays for use with Lustre. For more complete information on DDN tuning, refer to the performance management section of the DDN manual for your product.

This section covers the following DDN arrays:

  • S2A 8500
  • S2A 9500
  • S2A 9550

Setting Readahead and MF

For the S2A DDN 8500 storage array, we recommend that you disable readahead. In a 1000-client system, if each client has up to 8 read RPCs in flight, then this is 8 * 1000 * 1 MB = 8 GB of reads in flight. With a DDN cache in the range of 2 to 5 GB (depending on the model), it is unlikely that the LUN-based readahead would have ANY cache hits even if the file data were contiguous on disk (generally, file data is not contiguous). The Multiplication Factor (MF) also influences the readahead; you should disable it.

CLI commands for the DDN are:

cache prefetch=0
cache MF=off

For the S2A 9500 and S2A 9550 DDN storage arrays, we recommend that you use the above commands to disable readahead.

Setting Segment Size

The cache segment size noticeably affects I/O performance. Set the cache segment size differently on the MDT (which does small, random I/O) and on the OST (which does large, contiguous I/O). In customer testing, we have found the optimal values to be 64 KB for the MDT and 1 MB for the OST.

Note: The cache size parameter is common to all LUNs on a single DDN and cannot be changed on a per-LUN basis.

These are CLI commands for the DDN.

  • For the MDT LUN:
$ cache size=64
size is in KB, 64, 128, 256, 512, 1024, and 2048. Default 128
  • For the OST LUN:
$ cache size=1024

Setting Write-Back Cache

Performance is noticeably improved by running Lustre with write-back cache turned on. However, there is a risk that when the DDN controller crashes you need to run e2fsck. Still, it takes less time than the performance hit from running with the write-back cache turned off.

For increased data security and in failover configurations, you may prefer to run with write-back cache off. However, you might experience performance problems with the small writes during journal flush. In this mode, it is highly beneficial to increase the number of OST service threads option ost ost_num_threads=512 in /etc/modprobe.conf. The OST should have enough RAM (about 1.5 MB /thread is preallocated for I/O buffers). Having more I/O threads allows you to have more I/O requests in flight, waiting for the disk to complete the synchronous write.

You have to decide whether performance is more important than the slight risk of data loss and downtime in case of a hardware/software problem on the DDN.

Note: There is no risk from an OSS/MDS node crashing, only if the DDN itself fails.

Setting maxcmds

For the S2A DDN 8500 array, changing maxcmds to 4 (from the default 2) improved write performance by as much as 30 percent in a particular case. This only works with SATA-based disks and when only one controller of the pair is actually accessing the shared LUNs.

However, this setting comes with a warning. DDN support does not recommend changing this setting from the default. By increasing the value to 5, the same setup experienced some serious problems.

The CLI command for the DDN client is provided below (default value is 2).

$ disk maxcmds=3

For S2A DDN 9500/9550 hardware and above, you can safely change the default from 6 to 16. Although the maximum value is 32, values higher than 16 are not currently recommended by DDN support.

Note: For help determining an appropriate maxcmds value, refer to the PDF provided with the DDN firmware. This PDF lists recommended values for that specific firmware version.

maxcmds

S2A 8500:

One customer experienced a 30% improvement in write performance by changing this value from the default 2 to 4. This works only with SATA-based disks and _only_ if you can guarantee that only one controller of the pair will be actually accessing the shared LUNs

This information comes with a warning, as DDN support do not recommend changing this setting from the default. By increasing the value to 5, the same customer experienced some serious problems.

The DDN cli commands needed are:

  disk   maxcmds=3                       # default is 2       

S2A 9500/9550: For this hardware, a value of 16 is recommended. The default value is 6. The maximum value is 32 but values above 16 are not currently recommended by DDN support.

Illustration - one OST per Tier

                                   Capacity  Block
LUN  Label    Owner  Status        (Mbytes)  Size  Tiers Tier list
------------------------------------------------------------------
 0              1    Ready             512    1  1
 1              1    Ready             512    1  2
 2              1    Ready             512    1  3
 3              1    Ready             512    1  4
 4              2    Ready [GHS]        1  5
 5              2    Ready [GHS]        1  6
 6              2    Critical          512    1  7
 7              2    Critical           1  8
10              1  Cache Locked          64   512    1  1
11              1  Cache Locked          64   512    1  2
12              1  Cache Locked          64   512    1  3
13              1  Cache Locked          64   512    1  4
14              2    Ready [GHS]         64   512    1  5
15              2    Ready [GHS]         64   512    1  6
16              2    Critical            64   512    1  7
17              2    Critical            64   512    1  8
 System verify extent: 16 Mbytes
 System verify delay:  30