Using Red Hat Cluster Manager with Lustre


(Updated: Dec 2010)

DISCLAIMER - EXTERNAL CONTRIBUTOR CONTENT

This content was submitted by an external contributor. We provide this information as a resource for the Lustre™ open-source community, but we make no representation as to the accuracy, completeness or reliability of this information.


This page describes how to configure and use Red Hat Cluster Manager with Lustre failover. Sven Trautmann has contributed this content.

For more about Lustre failover, see Configuring Lustre for Failover.


Preliminary Notes

This document is based on RedHat Cluster version 2.0, which is part of RedHat Enterprise Linux version 5.5. For other versions or RHEL-based distributions, the syntax or methods to set up and run RedHat Cluster may differ.

Compared with other HA solutions, the RedHat Cluster release shipped with RHEL 5.5 is fairly dated. We recommend using a more recent HA solution such as Pacemaker, if possible.

It is assumed that two Lustre server nodes share a number of Lustre targets. Each Lustre node provides a number of Lustre targets and, in case of a failure, the surviving node takes over the Lustre targets of the failed node and makes them available to the Lustre clients.

Furthermore, to make sure the Lustre targets are mounted on only one of the Lustre server nodes at a time, STONITH fencing is implemented. This requires a way to make sure a failed node is shut down. In the examples below, it is assumed that the Lustre server nodes are equipped with a service processor that allows a failed node to be shut down using IPMI. For other methods of fencing, refer to the RedHat Cluster documentation.
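
Before configuring the cluster, it is worth confirming that each node can actually reach the other node's service processor. A minimal check, assuming ipmitool is installed; the address and credentials are placeholders matching the fencing example further below:

ipmitool -I lanplus -H 10.0.1.2 -U root -P supersecretpassword chassis power status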

Setting Up RedHat Cluster

Setting up RedHat Cluster consists of three steps:

  • set up openais,
  • configure the cluster, and
  • start the RedHat Cluster services.

Setting Up the openais Communication Stack

The openais package is distributed with RHEL and can be installed using

rpm -i /path/to/RHEL-DVD/Server/openais0.80.6-16.el5.x86_64.rpm

or

yum install openais

if yum is configured to access the RHEL repository.

Once installed, the software looks for a configuration in the file /etc/ais/openais.conf.

Complete the following steps to set up the openais communication stack:

1. Edit the totem section of the openais.conf configuration file to designate the IP address and netmask of the interface(s) to be used. The totem section of the configuration file describes the way openais communicates between nodes.

totem {
	version: 2
	secauth: off
	threads: 0
	interface {
		ringnumber: 0
		bindnetaddr: 10.0.0.0
		mcastaddr: 226.94.1.1
		mcastport: 5405
	}
}

Openais uses the option bindnetaddr to determine which interface is to be used for cluster communication. In the example shown above, it is assumed that one of the node’s interfaces is configured on the network 10.0.0.0. The value of the option is calculated by ANDing the interface's IP address with its network mask (IP & MASK), so the host bits of the address are cleared. Thus the configuration file is independent of any particular node and can be copied to all nodes.
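
For example, assuming a node's cluster interface has the IP address 10.0.0.1 with netmask 255.0.0.0 (placeholder values), the bitwise AND yields the bindnetaddr used above. On RHEL, the ipcalc utility can perform this calculation (a sketch; option names may vary between ipcalc versions):

ipcalc -n 10.0.0.1 255.0.0.0
NETWORK=10.0.0.0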

2. Create an AIS key

# /usr/sbin/ais-keygen
OpenAIS Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Writing openais key to /etc/ais/authkey.
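
The generated key in /etc/ais/authkey is only used when secauth is enabled in openais.conf. If you turn secauth on, the same key file must be present on both nodes; a minimal sketch, assuming root SSH access between the nodes:

scp /etc/ais/authkey lustre2:/etc/ais/authkey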

Installing RedHat Cluster

The minimum installation of RedHat Cluster consists of the Cluster Manager package cman and the Resource Group Manager package rgmanager. The cman package can be found in the RHEL repository. The rgmanager package is part of the Cluster repository; it can be found on the RHEL DVD/ISO image in the Cluster sub-directory and may need to be added to the yum configuration manually. With yum configured correctly, RedHat Cluster can be installed using:

yum install cman rgmanager

If yum is not set up correctly, the rpm packages and their dependencies need to be installed manually.
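
Alternatively, the Cluster directory of the DVD/ISO can be made available to yum as a local repository. A minimal sketch, assuming the installation media is mounted under /media/RHEL (a placeholder path) and saved as a hypothetical file /etc/yum.repos.d/rhel-cluster.repo:

[rhel-cluster]
name=RHEL Cluster
baseurl=file:///media/RHEL/Cluster/
enabled=1
gpgcheck=0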

Installing the Lustrefs resource script

The rgmanager package includes a number of resource scripts (in /usr/share/cluster) which are used to integrate resources like network interfaces or file systems with rgmanager. Unfortunately, no resource script for Lustre is included.

Luckily, Giacomo Montagner posted a resource script on the lustre-discuss mailing list:

http://lists.lustre.org/pipermail/lustre-discuss/attachments/20090623/7799de37/attachment-0001.bin

After downloading this file, it needs to be copied to /usr/share/cluster/lustrefs.sh. Make sure the script is executable.
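
A possible way to fetch and install the script (a sketch; wget is assumed to be available, and the URL is the one given above):

wget -O /usr/share/cluster/lustrefs.sh http://lists.lustre.org/pipermail/lustre-discuss/attachments/20090623/7799de37/attachment-0001.bin
chmod 755 /usr/share/cluster/lustrefs.sh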

Configure your Cluster

RedHat Cluster uses /etc/cluster/cluster.conf as its central configuration file. This file is in XML format. The complete schema of the XML file can be found at http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html.

The basic structure of a cluster.conf file may look like this:

<?xml version="1.0" ?>
<cluster config_version="1" name="Lustre">
...
</cluster>

In this example, the name of the cluster is set to Lustre and the version is initialized as 1. Whenever the cluster configuration is updated, the config_version attribute must be increased on all nodes in this cluster (see the ccs_tool example below). RedHat Cluster is usually used with more than two nodes providing resources. To tell RedHat Cluster to work with only two nodes, the following cman attributes need to be set:

  <cman expected_votes="1" two_node="1"/>

This tells cman that there are only two nodes in the cluster and that one vote is enough to declare a node failed.
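
As noted above, the config_version attribute must be increased whenever cluster.conf changes. Once the cluster is running, the new version can be pushed to all nodes with ccs_tool (a sketch; see also the tool overview at the end of this page):

ccs_tool update /etc/cluster/cluster.conf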

Nodes

Next, the nodes which form the cluster need to be specified. Each cluster node is specified separately, wrapped in a surrounding clusternodes tag.

  <clusternodes>
    <clusternode name="lustre1" nodeid="1">
      <fence>
        <method name="single">
          <device lanplus="1" name="lustre1-sp"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="lustre2" nodeid="2">
      <fence>
        <method name="single">
          <device lanplus="1" name="lustre2-sp"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

Each cluster node is given a name, which must be its hostname or IP address. Additionally, a unique node ID needs to be specified. The fence tag assigned to each node specifies a fence device to use to shut down this cluster node. The fence devices are defined elsewhere in cluster.conf (see below for details).
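
The node names should resolve on every node to the address of the interface used for cluster communication. A hypothetical /etc/hosts fragment (addresses are placeholders):

10.0.0.1    lustre1
10.0.0.2    lustre2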

Fencing

Fencing is essential to keep data on the Lustre file system consistent. Even with Multiple Mount Protection (MMP) enabled, fencing can make sure that a node in an unclear state is brought down for further analysis by the administrator.

To configure fencing, first some fence daemon options need to be specified. The fence_daemon tag is a direct child of the cluster tag.

  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <fence_daemon clean_start="0"/>

Depending on the hardware configuration, these values may differ for different installations. Please see the notes in the cluster_schema_rhel5 document (linked above) for details.

Each Lustre node in a cluster should be equipped with a fencing device. RedHat Cluster supports a number of devices. More details on which devices are supported and how to configure them can be found in the cluster schema document. For this example, IPMI-based fencing devices are used. The fencedevices section may look like this:

  <fencedevices>
    <fencedevice name="lustre1-sp" agent="fence_ipmilan" auth="password" ipaddr="10.0.1.1" login="root" passwd="supersecretpassword" option="off"/>
    <fencedevice name="lustre2-sp" agent="fence_ipmilan" auth="password" ipaddr="10.0.1.2" login="root" passwd="supersecretpassword" option="off"/>
  </fencedevices>

Every fence device has a number of attributes: name defines a name for this fencing device, which is referred to in the fence part of the clusternode definition (see above). The agent defines the kind of fencing device to use; in this example an IPMI-over-LAN device is used. The remaining attributes are specific to the ipmilan device and are self-explanatory.
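
It is a good idea to test a fence agent manually before relying on it in the cluster. A sketch using the values from the example above; the option names may differ between fence_ipmilan versions, so check fence_ipmilan(8) first (the status action only queries the power state and does not shut anything down):

fence_ipmilan -a 10.0.1.1 -l root -p supersecretpassword -P -o status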

Resource Manager

The resource manager block of the cluster.conf is wrapped in a rm tag:

  <rm>
    ..
  </rm>

It contains definitions of resources, failover domains, and services.

Resources

In the resources block of the cluster.conf file all Lustre targets of both clustered nodes are specified. In this example, four Lustre object storage targets are defined:

    <resources>
      <lustrefs name="target1" mountpoint="/mnt/ost1" device="/path/to/ost1/device" force_fsck="0" force_unmount="0" self_fence="1"/>
      <lustrefs name="target2" mountpoint="/mnt/ost2" device="/path/to/ost2/device" force_fsck="0" force_unmount="0" self_fence="1"/>
      <lustrefs name="target3" mountpoint="/mnt/ost3" device="/path/to/ost3/device" force_fsck="0" force_unmount="0" self_fence="1"/>
      <lustrefs name="target4" mountpoint="/mnt/ost4" device="/path/to/ost4/device" force_fsck="0" force_unmount="0" self_fence="1"/>
    </resources>

To use the lustrefs resource definition it is essential that the lustrefs.sh script is installed in /usr/share/cluster as described above. To verify that the script is installed correctly and has the correct permissions, run

# /usr/share/cluster/lustrefs.sh --help
usage: /usr/share/cluster/lustrefs.sh {start|stop|status|monitor|restart|meta-data|verify-all}

Each lustrefs resource has a number of attributes: name defines how the resource can be addressed, while mountpoint and device specify where and which Lustre target is mounted.

Failover Domains

Usually, RedHat Cluster is used to provide a service on a number of nodes, where one node takes over the service of a failed node. In this example, a number of Lustre targets are provided by each of the Lustre server nodes. To allow such a configuration, two failover domains need to be defined. The definition of failoverdomains may look like this:

  <failoverdomains>
    <failoverdomain name="first_first" ordered="1" restricted="1">
      <failoverdomainnode name="lustre1" priority="1"/>
      <failoverdomainnode name="lustre2" priority="2"/>
    </failoverdomain>
    <failoverdomain name="second_first" ordered="1" restricted="1">
      <failoverdomainnode name="lustre1" priority="2"/>
      <failoverdomainnode name="lustre2" priority="1"/>
    </failoverdomain>
  </failoverdomains>

In this example, two failover domains are created by adding the same nodes to each domain, but with different priorities assigned to the nodes.


Services

As a final configuration step, the resources defined earlier are assigned to their failover domains. This is done by defining a service for each of the Lustre nodes in the cluster and assigning a domain to it. For the resources and failover domains defined earlier, this may look like this:

    <service autostart="1" exclusive="0" recovery="relocate" domain="first_first" name="lustre_2">
      <lustrefs ref="target1"/>
      <lustrefs ref="target2"/>
    </service>

    <service autostart="1" exclusive="0" recovery="relocate" domain="second_first" name="lustre_1">
      <lustrefs ref="target3"/>
      <lustrefs ref="target4"/>
    </service>

In this example target1 and target2 are assigned to the first node and target3 and target4 are assigned to the second node by default.


Start RedHat Cluster

Before bringing up RedHat Cluster, make sure cluster.conf is updated on both Lustre server nodes. Usually, cluster.conf should be the same on both nodes; the only exception is if the device paths differ between the nodes.
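
Since the cluster software is not yet running when the initial configuration is created, the simplest way to get an identical cluster.conf onto the second node is to copy it over manually, for example (assuming root SSH access from lustre1 to lustre2):

scp /etc/cluster/cluster.conf lustre2:/etc/cluster/cluster.conf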

cman service

With cluster.conf in place on both nodes, it is time to start the cman service. This is done by running

service cman start

on both clustered nodes. To verify that cman is running, clustat can be used:

bash-3.2# clustat 
Cluster Status for Lustre @ Tue Dec 14 11:27:36 2010
Member Status: Quorate

 Member Name                                      ID   Status
 ------ ----                                      ---- ------
 lustre1                                             1 Online, Local
 lustre2                                             2 Online

To enable the cman service permanently, run:

chkconfig cman on

rgmanager service

With cman up and running, it is time to start the resource group manager rgmanager by running

service rgmanager start

rgmanager will then start to bring up the Lustre targets assigned to each of the Lustre nodes.
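
As with cman, rgmanager can be enabled permanently so that the Lustre services are brought up automatically after a reboot (assuming this is the desired behaviour):

chkconfig rgmanager on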

Verifying RedHat Cluster

To verify the state of the cluster, run clustat again. With the above configuration, the output should look like this:

bash-3.2# clustat
Cluster Status for Lustre @ Tue Dec 14 13:12:07 2010
Member Status: Quorate

 Member Name                                    ID   Status
 ------ ----                                    ---- ------
 lustre1                                           1 Online, Local, rgmanager
 lustre2                                           2 Online, rgmanager

 Service Name                       Owner (Last)                       State         
 ------- ----                       ----- ------                       -----         
 service:lustre_1                   lustre1                            started       
 service:lustre_2                   lustre2                            started       

Relocate services

It may be necessary to relocate running Lustre services manually. This can be done using clusvcadm, as shown in the example below. At first, the service lustre_2 runs on node lustre2. After calling clusvcadm -r lustre_2, the service is relocated to node lustre1, as shown in the last clustat output.

bash-3.2# clustat
Cluster Status for Lustre @ Tue Dec 14 15:00:00 2010
Member Status: Quorate

 Member Name                            ID   Status
 ------ ----                            ---- ------
 lustre1                                   1 Online, Local, rgmanager
 lustre2                                   2 Online, rgmanager

 Service Name                   Owner (Last)                  State         
 ------- ----                   ----- ------                  -----         
 service:lustre_1               lustre1                       started       
 service:lustre_2               lustre2                       started       
bash-3.2# clusvcadm -r lustre_2  
Trying to relocate service:lustre_2...Success
service:lustre_2 is now running on ldk-2-2-eth2
bash-3.2# clustat
Cluster Status for Lustre @ Tue Dec 14 15:01:00 2010
Member Status: Quorate

 Member Name                            ID   Status
 ------ ----                            ---- ------
 lustre1                                   1 Online, Local, rgmanager
 lustre2                                   2 Online, rgmanager

 Service Name                   Owner (Last)                  State         
 ------- ----                   ----- ------                  -----         
 service:lustre_1               lustre1                       started       
 service:lustre_2               lustre1                       started       
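
clusvcadm can also relocate a service to a specific node by naming the target member explicitly; a sketch (see clusvcadm(8) for details):

clusvcadm -r lustre_2 -m lustre2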

Other tools to use with RedHat Cluster

RedHat Cluster is a complex system of programs and services. A number of tools are available that make interacting and working with RedHat Cluster easier. In this section, some of these tools are presented. For more details, read the man pages.

cman_tool
manages the cman subsystem; it can be used, for example, to add nodes to or remove nodes from a cluster configuration
ccs_tool
may be used to update the configuration of the running cluster
clustat
shows the status of the cluster and whether and where services are currently running
clusvcadm
can be used to enable, disable or relocate services in a cluster
system-config-cluster
a graphical user interface for cluster configuration