WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Diagnostic and Debugging Tools

From Obsolete Lustre Wiki

Latest revision as of 07:53, 20 January 2011

(Updated: Feb 2010)

A variety of diagnostic and analysis tools are available to debug issues with the Lustre™ software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project.

Lustre Debugging Tools

The following in-kernel debug mechanisms are incorporated into the Lustre software:

  • Debug logs. A circular debug buffer to which Lustre writes its internal debug messages (in contrast to error messages, which are printed to the syslog or console). The mask set in /proc/sys/lnet/debug controls which messages are written to the debug log. The log size defaults to 5 MB per CPU. Because a busy system can overwrite 5 MB quickly, the size can be increased. When the buffer fills, the oldest information is discarded.
  • Debug daemon. The debug daemon controls continuous logging of debug messages to a file on disk, so that they are not lost when the in-memory buffer wraps.
  • /proc/sys/lnet/debug. This file contains a mask that selects which types of debugging information are written to the kernel debug log.
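The wrap-around behaviour of the debug log can be modelled with a fixed-size ring buffer. The Python sketch below is an illustration only, not Lustre's implementation: RingLog and total_log_mb are hypothetical names, and capacity is counted in messages rather than bytes. It also shows the arithmetic behind the per-CPU default: 8 CPUs at 5 MB each give 40 MB of debug log in total.

```python
from __future__ import annotations

from collections import deque

DEFAULT_MB_PER_CPU = 5  # default per-CPU debug log size quoted on this page


def total_log_mb(num_cpus: int, mb_per_cpu: int = DEFAULT_MB_PER_CPU) -> int:
    """Total debug log memory when each CPU has its own buffer."""
    return num_cpus * mb_per_cpu


class RingLog:
    """Toy circular buffer; capacity is counted in messages, not bytes."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest entry when a new one arrives at capacity.
        self._buf: deque[str] = deque(maxlen=capacity)

    def write(self, msg: str) -> None:
        self._buf.append(msg)

    def dump(self) -> list[str]:
        # Oldest surviving message first, as in a dumped log.
        return list(self._buf)


if __name__ == "__main__":
    log = RingLog(capacity=3)
    for i in range(5):
        log.write(f"msg {i}")
    print(log.dump())       # ['msg 2', 'msg 3', 'msg 4']
    print(total_log_mb(8))  # 40
```

Raising the per-CPU size only delays the wrap; it does not stop it, which is why long captures are usually handed to the debug daemon instead.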

For more information about using these tools to debug Lustre issues, see Lustre Debugging Procedures.
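Writing to /proc/sys/lnet/debug changes the mask. The sketch below models the mask as a Python set under these assumed rules: a flag prefixed with + is added, a flag prefixed with - is removed, an operator carries over to the bare flags that follow it, and a list with no leading operator replaces the mask. It is not the libcfs parser. It ignores numeric masks such as -1 and any flags the kernel always keeps enabled, and apply_debug_mask is a hypothetical name. Flag names such as rpctrace and neterror are examples; the names valid on a given system are the ones shown when the file is read.

```python
from __future__ import annotations


def apply_debug_mask(current: set[str], spec: str) -> set[str]:
    """Return the mask that writing `spec` would produce (assumed rules)."""
    tokens = spec.split()
    if not tokens:
        return set(current)
    # A leading operator means "relative to the current mask";
    # otherwise the written list replaces the mask.
    relative = tokens[0][0] in "+-"
    mask = set(current) if relative else set()
    op = "+"
    for tok in tokens:
        if tok[0] in "+-":
            op, tok = tok[0], tok[1:]
            if not tok:  # a bare "+" or "-" only sets the operator
                continue
        if op == "+":
            mask.add(tok)
        else:
            mask.discard(tok)
    return mask


if __name__ == "__main__":
    now = {"error", "warning", "neterror"}
    print(sorted(apply_debug_mask(now, "+rpctrace -neterror")))  # ['error', 'rpctrace', 'warning']
    print(sorted(apply_debug_mask(now, "error warning")))        # ['error', 'warning']
```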

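These tools are also provided with the Lustre software:

  • lctl. Dumps the Lustre debug log manually (using the debug_kernel option) and post-processes debug logs that were dumped automatically. For more information about the lctl tool, see Section 28.2.2: Using the lctl Tool to View Debug Messages and Section 36.3: lctl in the Lustre Operations Manual.
  • Lustre subsystem asserts. A panic-style assertion (LBUG) in the kernel causes Lustre to dump the debug log to the file /tmp/lustre-log.<timestamp>, where it can be retrieved after a reboot. For more information, see Section 26.1.2: Viewing Error Messages in the Lustre Operations Manual.
  • lfs. This utility provides access to the extended attributes (EAs) of a Lustre file, along with other information. For more information about lfs, see Section 32.1: lfs in the Lustre Operations Manual.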

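When the debug log is dumped as text (for example with lctl debug_kernel), each entry is one line of colon-separated header fields followed by (file:line:function()) and the message. The parser below assumes the field order subsystem, debug mask, CPU, seconds.microseconds, stack size, pid, host pid. Treat that order as an assumption to check against output from your Lustre version; parse_debug_line, DebugEntry, and the sample line are made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed layout of one text line from the Lustre debug log:
# subsys:mask:cpu:sec.usec:stack:pid:hostpid:(file:line:function()) message
_LINE_RE = re.compile(
    r"^(?P<subsys>[0-9a-f]+):(?P<mask>[0-9a-f]+):(?P<cpu>[^:]+):"
    r"(?P<time>\d+\.\d+):(?P<stack>\d+):(?P<pid>\d+):(?P<hostpid>\d+):"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s?(?P<msg>.*)$"
)


@dataclass
class DebugEntry:
    subsystem: int   # hex field converted to int
    mask: int        # hex field converted to int
    cpu: str
    timestamp: float
    pid: int
    file: str
    line: int
    function: str
    message: str


def parse_debug_line(text: str) -> DebugEntry | None:
    """Split one dumped log line into fields; None if it is not in the assumed format."""
    m = _LINE_RE.match(text.rstrip("\n"))
    if m is None:
        return None
    return DebugEntry(
        subsystem=int(m["subsys"], 16),
        mask=int(m["mask"], 16),
        cpu=m["cpu"],
        timestamp=float(m["time"]),
        pid=int(m["pid"]),
        file=m["file"],
        line=int(m["line"]),
        function=m["func"],
        message=m["msg"],
    )


if __name__ == "__main__":
    sample = ("00000100:00100000:1.0:1304026063.234591:0:26339:0:"
              "(example.c:42:example_func()) example message")
    entry = parse_debug_line(sample)
    print(entry.function, entry.line, entry.message)  # example_func 42 example message
```

Parsing into fields makes it easy to filter a large dump by subsystem, pid, or time window before reading it.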
External debugging tools

Tools for administrators and developers

The tools described in this section are provided in the Linux kernel or are available at an external website. For information about using some of these tools for Lustre debugging, see Lustre Debugging Procedures and Lustre Debugging for Developers.

Some general debugging tools provided as part of standard Linux distributions are:

  • strace. Traces the system calls made, and the signals received, by a process.
  • /var/log/messages. syslogd writes fatal or serious messages to this log.
  • Crash dumps. On kernels with crash dumps enabled, the SysRq c command produces a crash dump. Lustre adds to this crash dump by writing the last 64 KB of its debug log to the console.
  • debugfs. Interactive file system debugger.
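Lustre console messages that reach syslog are normally tagged Lustre: or LustreError:, so a quick first pass over /var/log/messages is to pull out the LustreError: lines. A minimal Python sketch, assuming that tag convention; lustre_errors is a hypothetical name and the sample lines are invented.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def lustre_errors(lines: Iterable[str]) -> Iterator[str]:
    """Yield syslog lines that carry a LustreError: tag (assumed convention)."""
    for line in lines:
        if "LustreError:" in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Invented sample lines; in practice pass an open /var/log/messages file object.
    sample = [
        "Jan 20 07:53:01 oss1 kernel: Lustre: example informational message\n",
        "Jan 20 07:53:02 oss1 kernel: LustreError: example error message\n",
    ]
    for hit in lustre_errors(sample):
        print(hit)
```

Because the function accepts any iterable of lines, it reads a large log file lazily instead of loading it into memory.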

The following logging and data collection tools can be used to collect information for debugging Lustre kernel issues:

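  • kdump. A kexec-based kernel crash dump mechanism, useful for debugging a system running Red Hat Enterprise Linux. For more information about kdump, see the Red Hat knowledge base article How do I configure kexec/kdump on Red Hat Enterprise Linux 5? To download kdump, go to the Fedora Project Download site.
  • netconsole. Supports kernel-level network logging over UDP. Using the system request (SysRq) key, users can collect relevant data through netconsole. For more information, see Netconsole.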
  • netdump. A crash dump utility from Red Hat that allows memory images to be dumped over a network to a central server for analysis. The netdump utility was replaced by kdump in RHEL 5. For more information about netdump, see Red Hat, Inc.'s Network Console and Crash Dump Facility.
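On the node being debugged, netconsole is configured with a parameter of the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], as described in the Linux kernel's netconsole documentation; on the collecting host, any program listening on the UDP target port (6666 by default) can receive the output. The Python sketch below builds such a parameter string and reads datagrams from a UDP socket. The helper names, addresses, and interface name are placeholders, and the loopback send only stands in for a real kernel.

```python
from __future__ import annotations

import socket


def netconsole_param(tgt_ip: str, tgt_port: int = 6666, src_port: int | None = None,
                     src_ip: str = "", dev: str = "", tgt_mac: str = "") -> str:
    """Build [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr].

    Empty fields are left blank so the kernel falls back to its defaults.
    """
    src_port_s = "" if src_port is None else str(src_port)
    return f"{src_port_s}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{tgt_mac}"


def receive_one(sock: socket.socket, bufsize: int = 4096) -> str:
    """Read one UDP datagram of console output and decode it as text."""
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder addresses and interface name.
    print("netconsole=" + netconsole_param("10.0.0.2", 6666, 4444, "10.0.0.1", "eth1"))

    # Loopback self-test: a local socket stands in for the kernel sender.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))  # a real collector would bind the target port, e.g. 6666
    rx.settimeout(5.0)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(b"example console line\n", rx.getsockname())
    print(receive_one(rx), end="")
    tx.close()
    rx.close()
```

A collector that writes every datagram to a file keeps the console output, including SysRq dumps, even when the node being debugged later hangs or reboots.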

Tools for developers

The tools described in this section may be useful for debugging Lustre™ in a development environment.

Of general interest is:

  • leak_finder.pl. This program provided with Lustre is useful for finding memory leaks in the code.
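The idea behind a leak finder of this kind is to match each allocation recorded in a log against a later free; whatever is still unmatched at the end of the log is a leak candidate. The Python sketch below shows only that matching step. The "alloc <addr> <size>" and "free <addr>" line formats are hypothetical and are not the messages Lustre actually logs, so the regular expressions would need adapting before use on a real debug log; find_leaks is likewise a hypothetical name.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical line formats, for illustration only (not Lustre's real messages).
_ALLOC = re.compile(r"\balloc (?P<addr>0x[0-9a-f]+) (?P<size>\d+)")
_FREE = re.compile(r"\bfree (?P<addr>0x[0-9a-f]+)")


def find_leaks(lines: Iterable[str]) -> tuple[dict[str, int], list[str]]:
    """Match allocations to frees.

    Returns ({address: size} still allocated at the end of the log,
             [addresses freed without a recorded allocation]).
    """
    live: dict[str, int] = {}
    unknown_frees: list[str] = []
    for line in lines:
        m = _ALLOC.search(line)
        if m:
            live[m["addr"]] = int(m["size"])
            continue
        m = _FREE.search(line)
        if m and live.pop(m["addr"], None) is None:
            unknown_frees.append(m["addr"])
    return live, unknown_frees


if __name__ == "__main__":
    sample = ["alloc 0x1000 64", "alloc 0x2000 128", "free 0x1000", "free 0x3000"]
    leaks, strays = find_leaks(sample)
    print(leaks, sum(leaks.values()), strays)  # {'0x2000': 128} 128 ['0x3000']
```

Frees with no matching allocation are reported separately because they usually mean the log wrapped before the allocation was recorded, not that memory was freed twice.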

A virtual machine is often used to create an isolated development and test environment. Some commonly-used virtual machines are:

  • VirtualBox Open Source Edition. Provides enterprise-class virtualization capability for all major platforms and is available free from Sun Microsystems at Get Sun Virtual Box.
  • VMware Server. Virtualization platform available as free introductory software at Download VMware Server.
  • Xen. A para-virtualized environment with virtualization capabilities similar to VMware Server and VirtualBox. In addition, Xen can run modified kernels to provide near-native performance, and it can emulate shared storage. For more information, see Using Xen with Lustre or go to xen.org.

A variety of debuggers and analysis tools are available, including:

  • kgdb. The Linux Kernel Source Level Debugger kgdb is used in conjunction with the GNU Debugger gdb for debugging the Linux kernel. For more information about using kgdb with gdb, see Chapter 6. Running Programs Under gdb in the Red Hat Linux 4 Debugging with GDB guide.
  • crash. Used to analyze saved crash dump data after a system has panicked, locked up, or become unresponsive. For more information about using crash to analyze a crash dump, see:
      - Red Hat Magazine article A quick overview of Linux kernel crash dump analysis.
      - Crash Usage: A Case Study from the white paper Red Hat Crash Utility by David Anderson.
      - Kernel Trap forum entry Linux: Kernel Crash Dumps.
      - White paper A Quick Overview of Linux Kernel Crash Dump Analysis.