Using Pacemaker with Lustre: Difference between revisions

Revision as of 12:52, 15 January 2010

DISCLAIMER - EXTERNAL CONTRIBUTOR CONTENT

This content was submitted by an external contributor. We provide this information as a resource for the Lustre™ open-source community, but we make no representation as to the accuracy, completeness or reliability of this information.

This page describes how to configure and use Pacemaker with Lustre Failover.

Setting Up Cluster Communications

Communication between the nodes of the cluster allows all nodes to “see” each other. In modern clusters, OpenAIS, or more specifically, its communication stack corosync, is used for this task. All communication paths in the cluster should be redundant so that a failure of a single path is not fatal for the cluster.

An introduction to the setup, configuration and operation of a Pacemaker cluster can be found in:

Pacemaker 1.0 Configuration Explained at www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained.
M. Schwartzkopff, Clusterbau: Hochverfügbarkeit mit pacemaker, OpenAIS, heartbeat und LVS, O'Reilly Vlg. Gmbh & Co., Dec 2009. (German)

Setting Up the corosync Communication Stack

The corosync communication stack, developed as part of the OpenAIS project, supports all the communication needs of the cluster. The package is included in all recent Linux distributions. If it is not included in your distribution, you can find precompiled binaries at www.clusterlabs.org/rpm. It is also possible to compile OpenAIS from source and install it on all HA nodes by running ./configure, make, and make install.

Note: If corosync is not included in your distribution, your distribution may include the complete OpenAIS package. From the cluster point of view, the only difference is that all files and commands start with openais rather than corosync. The configuration file is located in /etc/ais/openais.conf.

Once installed, the software looks for a configuration in the file /etc/corosync/corosync.conf.

Complete the following steps to set up the corosync communication stack:

1. Edit the totem section of the corosync.conf (or openais.conf) configuration file to designate the IP address and netmask of the interface(s) to be used. The totem section of the configuration file describes the way corosync communicates between nodes.

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

Corosync uses the bindnetaddr option to determine which interface is to be used for cluster communication. The example above assumes that one of the node's interfaces is configured on the network 10.0.0.0. The value of the option is the bitwise AND of the interface's IP address and its network mask (IP & MASK), so the host bits of the address are cleared. Because the result is the same on every node attached to that network, the configuration file is independent of any node and can be copied to all nodes.

2. Edit the aisexec section of the configuration file to designate which user can start the service. The user must be root:

aisexec {
    user: root
    group: root
}

3. In the service section of the configuration file, add the services that corosync is to administer. In this example, only pacemaker is included:

service {
    name: pacemaker
    version: 0
}

4. (Optional) To use the Pacemaker GUI, add the mgmt daemon to the service section:

service {
    name: pacemaker
    version: 0
    use_mgmtd: yes
}
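The bindnetaddr value described in step 1 can be derived by hand, but a small helper makes the IP & MASK calculation explicit. The following is a minimal sketch; the function name network_addr and the sample addresses are illustrative assumptions, not part of corosync.

```shell
# Hypothetical helper: print the network address (IP AND netmask)
# that would be used as the bindnetaddr value.
network_addr() (
    ip=$1
    mask=$2
    IFS=.
    # Unquoted on purpose: split both dotted quads into eight fields.
    set -- $ip $mask
    # AND each octet of the address with the matching octet of the mask.
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)

# Example: an interface at 10.0.0.42 with netmask 255.255.255.0
network_addr 10.0.0.42 255.255.255.0
```

Every node on the same network produces the same result regardless of its own host address, which is why a single configuration file can be shared.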

The corosync service starts as part of the normal init process. It can also be started manually by entering:

/etc/init.d/corosync start

After corosync has started, the following lines should be visible in the system log file:

(...) [MAIN ] Corosync (...) started and ready to provide service. 
(...) [TOTEM ] The network interface [...] is now up.
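To check for these two messages without reading the log by eye, a short filter can be used. This is a sketch only: the function name corosync_log_ok is hypothetical, and the location of the system log (for example /var/log/messages) is an assumption that varies between distributions.

```shell
# Hypothetical helper: succeeds only if both start-up messages
# appear in the log text supplied on standard input.
corosync_log_ok() (
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'started and ready to provide service' &&
        printf '%s\n' "$log" | grep -q 'The network interface .* is now up'
)

# Usage (log path is an assumption; adjust for your distribution):
#   corosync_log_ok < /var/log/messages && echo "corosync start-up messages found"
```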

You can also check for correct functioning of the network stack by entering:

# corosync-cfgtool -s

The following should be displayed:

Printing ring status. 
Local node ID (...)
RING ID 0 
	id		= (...)
	status				= ring 0 active with no faults
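For use in scripts, the same check can be reduced to an exit status. The sketch below assumes the output format shown above (one "status = ..." line per ring containing the text "no faults" when healthy); the function name rings_healthy is hypothetical, and other corosync versions may word the output differently.

```shell
# Hypothetical helper: reads "corosync-cfgtool -s" output on standard
# input and succeeds only if every ring reports "no faults".
rings_healthy() (
    out=$(cat)
    # Match only the per-ring status lines, not "Printing ring status."
    pat='^[[:space:]]*status[[:space:]]*='
    # At least one ring must be reported.
    printf '%s\n' "$out" | grep -Eq "$pat" || return 1
    # No ring's status line may be missing the "no faults" text.
    if printf '%s\n' "$out" | grep -E "$pat" | grep -qv 'no faults'; then
        return 1
    fi
    return 0
)

# Usage:
#   corosync-cfgtool -s | rings_healthy && echo "all rings healthy"
```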

Setting up Redundant Communication Using Bonding

It is recommended that you set up the cluster communication via two or more redundant paths. One way to achieve this is to use bonding interfaces. Please consult the documentation for your distribution for information about how to configure bonding interfaces.

Setting up Redundant Communication within corosync

The corosync package provides its own means of redundant communication. If two or more interfaces are available for cluster communication, an administrator can configure multiple interface{} sections in the configuration file, each with a different ringnumber. The rrp_mode option tells the cluster how to use these interfaces. If the value is set to active, corosync uses all interfaces actively. If the value is set to passive, corosync uses the second interface only if the first ring fails.
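As an illustration for two network interfaces, a totem section with two rings might look like the sketch below. Note that corosync spells the option rrp_mode (redundant ring protocol). The second network (192.168.1.0) and the second multicast address and port are hypothetical values chosen for the example; substitute the values for your own networks.

```
totem {
    version: 2
    secauth: off
    threads: 0
    # Use the second ring only if the first one fails.
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        # Hypothetical second network for the redundant ring.
        ringnumber: 1
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
    }
}
```

With both rings configured, corosync-cfgtool -s should list a status line for each ring.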


