WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Difference between revisions of "Guidelines for Setting Up a Cluster"

(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Some tips we've collected while working on clusters that can lead to a more useful debugging experience.
+
<small>''(Updated: Dec 2009)''</small>
  
# '''Shared home directories''' <br/>Having a shared namespace comes in handy all the time.  It's useful for bringing up Lustre builds, collecting logs, blatting configuration files, etc.  Sharing /home is the least surprising choice.
+
Listed below are some guidelines for setting up a cluster to make it easier to manage and debug.
# '''PDSH '''<br/>pdsh is an absolute requirement. Bonus points for being able to pdsh to all nodes from any node.
 
# '''Regular naming'''<br/>A node naming scheme that involves a short prefix and regularly incrementing decimal node numbers (e.g. n0001, n0002, etc.) combines very well with automation like pdsh.  As machines tend to take on different roles as different people use the cluster, it doesn't make a lot of sense to give hostnames based on roles in the Lustre universe (mds, ost, etc.).  It is useful to have a map available of hostname to Lustre function, though.
 
# '''Serial Consoles'''<br/>As in any data center, they're essential.  Log their output for later retrieval should the kernel go wrong.  Provide a useful front end like 'conman' or 'conserver'.  Make sure the front end can send breaks to the kernel's sysrq facility over the serial console.  In 2.6 kernels there are also reliable network-based consoles that allow sending (nearly) all of the kernel messages to a remote system, even for oops messages.  In 2.6.5 this is called "netconsole", and in 2.6.9 and later it is "netdump" (which supersedes netconsole).  The "netdump" code also allows doing kernel crash dumps over the network to another host, which can be invaluable for debugging node-crashing problems.
 
# '''Collect syslogs in one place'''<br/>It's nice to be able to watch one log for errors that are reported to syslog across the cluster.
 
# '''Remote Power Management'''<br/>If a machine wedges, one needs to be able to reboot it without physically flipping a switch.  Any number of vendors offer serial-controlled power widgets; ones that work with 'powerman' are most useful.  This is a requirement for doing automated failover (STONITH).
 
# '''Automated Disaster Recovery'''<br/>It's nice to be able to reimage a node via netbooting and network software installs.  It's a low-frequency endeavour, though.
 
# '''Boot Quickly'''
 
## Disable non-essential services that would otherwise start at boot time
 
## Minimize hardware checks the BIOS may do
 
## Especially avoid things like RH's Kudzu which can ask for user input before proceeding
 
  
----
+
*  '''Set up shared home directories.''' A shared namespace is useful for bringing up Lustre™ builds and collecting logs.  The most commonly shared namespace is ''/home''.
* '''FrontPage'''
+
*  '''Use ''pdsh''.''' This parallel-distributed, multithreaded remote shell enables efficient execution of commands on multiple remote hosts in parallel.
 +
*  '''Use a regular node naming scheme.''' A node naming scheme consisting of a short prefix combined with regularly incremented decimal node numbers (e.g., n0001, n0002, etc.) works well with an automated tool like ''pdsh''.  Also, machines tend to be used for different roles in a cluster over time, so hostnames based on roles in the Lustre file system (mds, ost, etc.) are not always practical. However, documenting how hostnames map to Lustre functions is useful.
 +
*  '''Use serial consoles.''' A serial console enables output to be logged for later retrieval in case a problem occurs. It can be provided with a useful front end like ''conman'' or ''conserver''. A front end that can send breaks to the kernel's ''sysrq'' facility over the serial console is preferable.
 +
* '''Send kernel crash dumps and kernel messages to a remote system.''' Linux provides tools for this, such as netconsole, which forwards kernel messages over the network, and netdump, which captures crash dumps on a remote host. See [[Diagnostic and Debugging Tools]] for more information.
 +
* '''Collect syslogs in one place.''' In addition to collecting logs on a per-node basis, collecting syslogs in one location lets an administrator monitor a single log for errors reported to ''syslog'' from across the cluster.
 +
* '''Set up remote power management.''' If a machine wedges, it must be possible to reboot it without physically flipping a switch.  Various vendors offer serial-controlled power widgets. Power widgets that work with ''powerman'' are the most useful.  Remote power management is a requirement for doing automated failover (STONITH).
 +
 
 +
* '''Automate node provisioning.''' Although infrequently used, it's convenient to be able to reimage a node via netbooting and network software installs.
 +
 
 +
* '''Boot quickly.''' To be able to boot quickly, do the following:
 +
** Prevent non-essential services from starting at boot time.
 +
** Minimize hardware checks made by the BIOS.
 +
** Avoid starting utilities at boot time that ask for user input before proceeding.

Latest revision as of 06:10, 22 February 2010

(Updated: Dec 2009)

Listed below are some guidelines for setting up a cluster to make it easier to manage and debug.

  • Set up shared home directories. A shared namespace is useful for bringing up Lustre™ builds and collecting logs. The most commonly shared namespace is /home.
  • Use pdsh. This parallel-distributed, multithreaded remote shell enables efficient execution of commands on multiple remote hosts in parallel.
  • Use a regular node naming scheme. A node naming scheme consisting of a short prefix combined with regularly incremented decimal node numbers (e.g., n0001, n0002, etc.) works well with an automated tool like pdsh. Also, machines tend to be used for different roles in a cluster over time, so hostnames based on roles in the Lustre file system (mds, ost, etc.) are not always practical. However, documenting how hostnames map to Lustre functions is useful.
  • Use serial consoles. A serial console enables output to be logged for later retrieval in case a problem occurs. It can be provided with a useful front end like conman or conserver. A front end that can send breaks to the kernel's sysrq facility over the serial console is preferable.
  • Send kernel crash dumps and kernel messages to a remote system. Linux provides tools for this, such as netconsole, which forwards kernel messages over the network, and netdump, which captures crash dumps on a remote host. See Diagnostic and Debugging Tools for more information.
  • Collect syslogs in one place. In addition to collecting logs on a per-node basis, collecting syslogs in one location lets an administrator monitor a single log for errors reported to syslog from across the cluster.
  • Set up remote power management. If a machine wedges, it must be possible to reboot it without physically flipping a switch. Various vendors offer serial-controlled power widgets. Power widgets that work with powerman are the most useful. Remote power management is a requirement for doing automated failover (STONITH).
  • Automate node provisioning. Although infrequently used, it's convenient to be able to reimage a node via netbooting and network software installs.
  • Boot quickly. To be able to boot quickly, do the following:
    • Prevent non-essential services from starting at boot time.
    • Minimize hardware checks made by the BIOS.
    • Avoid starting utilities at boot time that ask for user input before proceeding.
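
The node naming and pdsh guidelines above combine naturally, and a small script can illustrate how. The following is only a sketch under stated assumptions: the prefix "n", the four-digit width, the contents of ROLE_MAP, and all function names are hypothetical and chosen for illustration. It relies on pdsh accepting a comma-separated host list or a bracketed host range through its -w option. The script only builds the command and does not run it.

```python
from collections import defaultdict


def node_name(index, prefix="n", width=4):
    """Return a regular node name such as n0001 (prefix and width are assumptions)."""
    return f"{prefix}{index:0{width}d}"


def pdsh_range(first, last, prefix="n", width=4):
    """Return a bracketed host range such as n[0001-0004] for use with pdsh -w."""
    if first > last:
        raise ValueError("first must not exceed last")
    return f"{prefix}[{first:0{width}d}-{last:0{width}d}]"


# Hypothetical hostname-to-role map. Hostnames stay role-neutral, and the
# current Lustre function of each node is documented here instead.
ROLE_MAP = {
    "n0001": "mds",
    "n0002": "ost",
    "n0003": "ost",
    "n0004": "client",
}


def hosts_by_role(role_map):
    """Group hostnames by their documented Lustre role, in sorted host order."""
    grouped = defaultdict(list)
    for host, role in sorted(role_map.items()):
        grouped[role].append(host)
    return dict(grouped)


def pdsh_command(hosts, remote_command):
    """Build (but do not execute) a pdsh argument list targeting the given hosts."""
    return ["pdsh", "-w", ",".join(hosts), remote_command]


if __name__ == "__main__":
    print(node_name(1))
    print(pdsh_range(1, 4))
    osts = hosts_by_role(ROLE_MAP)["ost"]
    print(" ".join(pdsh_command(osts, "uptime")))
```

Keeping the role map separate from the hostnames means a node can change from OST to client without being renamed; only the map entry changes, and the same pdsh targeting logic keeps working.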