Architecture - Profiling Tools for IO

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Definitions
Ganglia - a distributed monitoring system (http://ganglia.info/)

Background
The profiling tool should be part of LRE, and it will also be used at ORNL to profile the I/O status of their XT4/XT3 cluster. We decided to implement the whole profiling system on top of Ganglia.

Collect profile information
Stats collection
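
As an illustration of stats collection feeding Ganglia, the following is a minimal sketch rather than the actual collector. It assumes stats lines of the form "<name> <count> samples [<unit>] <min> <max> <sum>" (the exact layout should be checked against the Lustre release in use), and it only builds command lines for Ganglia's gmetric utility instead of running them. The sample text, the function names and the "lustre_" metric prefix are hypothetical.

```python
# Sketch only: parse stats text and build gmetric command lines.
# ASSUMPTION: each counter line looks like
#   "<name> <count> samples [<unit>] <min> <max> <sum>"
# with min/max/sum optional. Sample values below are made up.
SAMPLE_STATS = """\
snapshot_time 1200000000.000000 secs.usecs
read_bytes 4 samples [bytes] 4096 1048576 2105344
write_bytes 2 samples [bytes] 4096 8192 12288
open 7 samples [reqs]
"""


def parse_stats(text):
    """Return {counter_name: {"count", "unit", "sum"}} for counter lines."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip lines that are not counters (e.g. the snapshot_time line).
        if len(fields) < 4 or fields[2] != "samples":
            continue
        name = fields[0]
        count = int(fields[1])
        unit = fields[3].strip("[]")
        # The sum column is only present for counters that track sizes.
        total = int(fields[6]) if len(fields) >= 7 else None
        stats[name] = {"count": count, "unit": unit, "sum": total}
    return stats


def gmetric_commands(stats, prefix="lustre_"):
    """Build one gmetric argument list per counter (hypothetical naming)."""
    commands = []
    for name, entry in sorted(stats.items()):
        # Publish the sum when available, otherwise the sample count.
        value = entry["sum"] if entry["sum"] is not None else entry["count"]
        commands.append([
            "gmetric",
            "--name", prefix + name,
            "--value", str(value),
            "--type", "double",
            "--units", entry["unit"],
        ])
    return commands


if __name__ == "__main__":
    for cmd in gmetric_commands(parse_stats(SAMPLE_STATS)):
        print(" ".join(cmd))
```

On a node where gmetric is installed, each argument list could be passed to subprocess.run from a periodic job. Because the counters are discovered from the stats text rather than hard-coded, adding or removing a stat would not require changing the collector.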

Trace logs collection job

Analyse profile information
items

graphs
Note: A trace log analysis tool will be needed to retrieve the call execution times from the trace log.
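
The kind of trace log analysis mentioned in the note can be sketched as follows. This is an assumption-laden example: the log format "<timestamp_usec> <pid> <enter|exit> <call>" is a hypothetical simplification, not the actual Lustre trace log layout, and the function name and sample data are made up. It pairs each enter event with the matching exit event of the same process and accumulates the elapsed time per call.

```python
from collections import defaultdict

# HYPOTHETICAL trace format: "<timestamp_usec> <pid> <enter|exit> <call>".
# This is not the real Lustre trace log layout; the values are made up.
SAMPLE_TRACE = """\
1000000 42 enter read
1000250 42 exit read
1000300 43 enter write
1000400 42 enter read
1000900 43 exit write
1001150 42 exit read
"""


def call_times(text):
    """Return {call: {"calls": n, "usec": total_elapsed}} from a trace."""
    pending = {}  # (pid, call) -> timestamp of the unmatched enter event
    totals = defaultdict(lambda: {"calls": 0, "usec": 0})
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore lines that do not match the assumed format
        stamp, pid, event, call = int(fields[0]), fields[1], fields[2], fields[3]
        key = (pid, call)
        if event == "enter":
            # Assumes a process does not re-enter the same call recursively.
            pending[key] = stamp
        elif event == "exit" and key in pending:
            totals[call]["calls"] += 1
            totals[call]["usec"] += stamp - pending.pop(key)
    return dict(totals)


if __name__ == "__main__":
    for call, entry in sorted(call_times(SAMPLE_TRACE).items()):
        print(call, entry["calls"], entry["usec"])
```

The per-call totals produced this way are the sort of values that could then be published as metrics and graphed.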

Output profile information
Output the graphs obtained above through the Ganglia PHP Web Frontend.

Implementation constraints

 * 1) Reuse existing utilities and architecture as much as possible, so that the tool is available for use as soon as possible.
 * 2) Implement the whole profiling system based on Ganglia.
 * 3) It should also work with Lustre 1.4 (ORNL may be stuck on that version for a long time).
 * 4) Easily extensible: we may want to add or remove some stats in the future.