WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Diagnostic and Debugging Tools

Revision as of 11:05, 26 January 2010

A variety of diagnostic and analysis tools are available to debug issues with the Lustre™ software. Some of these are provided in Linux distributions, while others have been developed and are made available by the Lustre project.

Lustre Debugging Tools

The following in-kernel debug mechanisms are incorporated into the Lustre software:

Debug logs. A circular debug buffer is set up when the kernel module is first inserted and holds a substantial amount of debugging information (megabytes or more). When this buffer fills up, the oldest information is discarded. Lustre debug messages are written to this kernel log.

The debug log holds Lustre internal logging content, which is different from the error messages printed to syslog or console. Entries to the Lustre debug log are controlled by the mask set by /proc/sys/lnet/debug. The log defaults to 5 MB per CPU and is a ring buffer. Newer messages overwrite older ones. The default log size can be increased, as a busy system will quickly overwrite the 5 MB default.

Debug daemon. The debug daemon controls logging of debug messages.

/proc/sys/lnet/debug. This file contains a mask that limits the debugging information written to the kernel debug logs.
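
The ring-buffer behavior of the debug log can be pictured with a short Python sketch. This is a toy model for illustration only, not Lustre code: the class name DebugRing and its methods are hypothetical, and capacity is counted in messages rather than bytes. The helper default_log_size_mb simply multiplies the 5 MB per-CPU default mentioned above by a CPU count.

```python
from collections import deque


def default_log_size_mb(num_cpus, per_cpu_mb=5):
    """Total default debug log size, assuming the 5 MB per CPU default."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return num_cpus * per_cpu_mb


class DebugRing:
    """Toy ring buffer (hypothetical): keeps only the newest messages."""

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest entry once it is full.
        self._entries = deque(maxlen=capacity)

    def log(self, message):
        self._entries.append(message)

    def dump(self):
        """Return the retained messages, oldest first."""
        return list(self._entries)


ring = DebugRing(capacity=3)
for i in range(5):
    ring.log(f"message {i}")

print(ring.dump())             # ['message 2', 'message 3', 'message 4']
print(default_log_size_mb(8))  # 40
```

With a capacity of three, the first two messages are overwritten, which is why a busy system benefits from a larger buffer.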

These tools are also provided with the Lustre software:

lctl. This tool is used to manually dump the log and to post-process logs that are dumped automatically. Used with the debug_kernel option, lctl dumps the Lustre debug log.

Lustre subsystem asserts. When an assertion fails, a log is written to /tmp/lustre_log.<timestamp>.

lfs. This Lustre utility provides access to the extended attributes of a Lustre file (among other things).
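
As a rough illustration of the two log sources above, the following Python sketch builds an lctl debug_kernel invocation and looks for assertion logs under /tmp. The function names and the output path /tmp/lustre_debug.txt are hypothetical, the lustre_log.* pattern is taken from the path given above, and actually running the lctl command assumes that lctl is installed and that the caller has sufficient privileges.

```python
import glob
import os
import subprocess


def debug_kernel_command(output_path):
    """Build the argv for dumping the Lustre debug log to a file."""
    return ["lctl", "debug_kernel", output_path]


def dump_debug_log(output_path):
    """Run lctl debug_kernel; raises CalledProcessError on failure."""
    subprocess.run(debug_kernel_command(output_path), check=True)


def find_assert_logs(directory="/tmp"):
    """Return lustre_log.* files in `directory`, oldest first by mtime."""
    pattern = os.path.join(directory, "lustre_log.*")
    return sorted(glob.glob(pattern), key=os.path.getmtime)


if __name__ == "__main__":
    # Hypothetical output path, shown only to illustrate the command line.
    print(" ".join(debug_kernel_command("/tmp/lustre_debug.txt")))
    for path in find_assert_logs():
        print(path)
```

Sorting by modification time avoids making any assumption about the format of the timestamp in the file name.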

External debugging tools

Tools for administrators and developers

The tools described in this section are provided in the Linux kernel or are available at an external website.

strace. This tool traces the system calls made by a process.

/var/log/messages. syslogd writes fatal or serious messages to this log.

Crash dumps. On crash-dump enabled kernels, sysrq c produces a crash dump. Lustre enhances this crash dump with a log dump (the last 64 KB of the log) to the console.

debugfs. Interactive file system debugger.


The following logging and data collection tools can be used to collect information for debugging Lustre kernel issues.

kdump. A Linux kernel crash utility useful for debugging a system running Red Hat Enterprise Linux. For more information about kdump, see the Red Hat knowledge base article How do I configure kexec/kdump on Red Hat Enterprise Linux 5?. To download kdump, go to the Fedora Project Download site.

netdump. A crash dump utility from Red Hat that allows memory images to be dumped over a network to a central server for analysis. It is now obsolete and has been replaced by kdump.

netconsole. Supports kernel-level network logging over UDP. A system request (SysRq) allows users to collect relevant data through netconsole. For more information, see Netconsole.

Tools for developers

The tools described below may be useful for debugging Lustre™ in a development environment.

leak_finder.pl. This program is useful for finding memory leaks in the code.


A virtual machine is often used to create an isolated development and test environment.

VirtualBox Open Source Edition. Provides enterprise-class virtualization capability for all major platforms and is available free from Sun Microsystems at Get Sun Virtual Box.

VMware Server. Virtualization platform available as free introductory software at Download VMware Server.

Xen. A para-virtualized environment with virtualization capabilities similar to VMware Server and VirtualBox. However, Xen allows the use of modified kernels to provide near-native performance and the ability to emulate shared storage. For more information, see Using Xen with Lustre.


Debuggers and Analysis Tools

kgdb. A source-level kernel debugger that allows remote debugging using conman. kgdb provides a special set of hooks that let gdb attach to a Linux kernel from another machine over a serial console. kgdb patches are provided for some kernels, such as RHEL 4, alongside the Lustre patches (they are not applied by default).

For more information, see KGDB and Using kgdb with UDP.

Also see Chapter 6. Running Programs Under gdb in the Red Hat Linux 4 Debugging with GDB guide.


NOTES - KGDB topic - ask Alex BZZZ or Robert Reid - instructions are old and not specific to Lustre - do we want to keep these around or find a link to an external site? The SourceForge site has a ton of information.

1. Get patches from ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/

- VMware instructions on this page are specific to using cdb with VMware - but are OLD!

- For other external tools - provide pointer rather than maintain documentation on wiki. See IX4

lcrash. A utility that generates detailed kernel information and can generate reports about system crash dumps. For more information, see the lcrash man page.

crash. This tool is used to analyze saved crash dump data.

Enter:

crash vmlinux crash_dump
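
A small Python wrapper can check that both files exist before launching crash. This is only a sketch: the function names are hypothetical, the arguments stand for the vmlinux and crash_dump placeholders in the command above, and run_crash assumes that crash is installed and on the PATH.

```python
import os
import subprocess


def crash_command(vmlinux, crash_dump):
    """Build the argv for `crash vmlinux crash_dump` after checking inputs."""
    for path in (vmlinux, crash_dump):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"missing input file: {path}")
    return ["crash", vmlinux, crash_dump]


def run_crash(vmlinux, crash_dump):
    """Start an interactive crash session and return its exit status."""
    return subprocess.call(crash_command(vmlinux, crash_dump))
```

Checking the inputs first gives a clear error message when a path is mistyped, rather than relying on the tool's own diagnostics.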

For more information about using crash to analyze crash dump output, see:


NOTES See Tien's suggestion BZ 21334 www.hpc.ufl.edu/index.php/Lustre