Architecture - Profiling Tools for IO

Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.

Definitions

Ganglia - distributed monitoring system (http://ganglia.info/)

Background

The profiling tool should be part of LRE, and it will also be used at ORNL to profile the I/O status of their XT4/XT3 cluster. We decided to implement the whole profiling system on top of Ganglia.

Use cases

Collect profile information	Performance & Usability	Collect I/O and other stats from the servers or from all nodes.
Analyse profile information	Usability	Generate graphs from the collected profiling information.
Output profile information	Usability	Present these graphs to the end user.

Collect profile information

Stats collection

Scenario:		Collecting stats from the servers and clients.
Goals		Overhead & Usability
Details	OST_Req_Handle_Info	req_qdepth, req_active, req_waittime (server load) /proc/fs/lustre/ost/OSS/ost_io/stats
	OST_Read/Write_Info	OST read/write count from each client /proc/fs/lustre/ost/OSS/ost_io/req_history
	Read/Write_Req_Info	Request detail information (percentage of 1 MB RPCs) /proc/fs/lustre/obdfilter/lustre-OST0001/brw_stats
	Client_Cache_Avaiblity	Client cache stats information /proc/fs/lustre/obdfilter/lustre-OSTXXXX/exports/NID@nettype/UUID/cur_grant(dirty)_bytes
	Client_RPC_Frequency	Client RPC stats /proc/fs/lustre/osc(mdc)/stats
	MDS_OPS_Stats	MDS operation stats /proc/fs/lustre/mds/stats
	Ldlm_Stats	/proc/fs/lustre/ldlm/services/ldlm_cbd/stats
Implementation constraints		Stats are collected by the Ganglia monitor daemon
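
The collection step above could be sketched as follows. This is a minimal illustration, not the actual collector: it assumes each counter line of a stats file has the layout "name count samples [unit] min max sum", which this page does not specify, and the helper names parse_stats, mean and gmetric_args are hypothetical. The gmetric command line is only built here, not executed.

```python
import re

# Assumed line layout of a Lustre stats file (not specified on this page):
#   <name> <count> samples [<unit>] <min> <max> <sum> [<sumsq>]
STAT_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<count>\d+)\s+samples\s+\[(?P<unit>[^\]]*)\]"
    r"(?P<rest>(?:\s+\d+)*)\s*$"
)


def parse_stats(text):
    """Return {name: {"count", "unit", "min", "max", "sum"}} per counter line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m is None:
            continue  # skips snapshot_time and any unrecognised line
        values = [int(v) for v in m.group("rest").split()]
        entry = {"count": int(m.group("count")), "unit": m.group("unit")}
        for key, value in zip(("min", "max", "sum"), values):
            entry[key] = value
        stats[m.group("name")] = entry
    return stats


def mean(entry):
    """Average sample value, or None when there are no samples or no sum."""
    if entry["count"] == 0 or "sum" not in entry:
        return None
    return entry["sum"] / entry["count"]


def gmetric_args(name, value, units):
    """Build (but do not run) a gmetric command that publishes one value."""
    return ["gmetric", f"--name={name}", f"--value={value}",
            "--type=double", f"--units={units}"]


# Made-up sample content, for illustration only.
SAMPLE = """\
snapshot_time 1190000000.123456 secs.usecs
req_waittime 4 samples [usec] 10 90 200 12500
req_qdepth 4 samples [reqs] 0 3 6 14
req_active 4 samples [reqs] 1 2 6 10
"""

if __name__ == "__main__":
    parsed = parse_stats(SAMPLE)
    for name in ("req_qdepth", "req_active", "req_waittime"):
        entry = parsed[name]
        print(gmetric_args("ost_io_" + name, mean(entry), entry["unit"]))
```

On a real node the text would come from a file such as /proc/fs/lustre/ost/OSS/ost_io/stats, and the resulting argument list could be handed to subprocess.run so that the value reaches the Ganglia monitor daemon.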

Trace logs collection job

Scenario:		Generating the trace logs on each node.
Goals		Usability (getting individual call trace information)
Details	VFS trace call logs	Get VFS trace logs by enabling D_VFSTRACE on each client.
	Server RPC trace log	Get OST RPC handler trace logs by enabling D_RPCTRACE on each OST.
Implementation constraints		Trace logging is enabled and disabled by the Ganglia monitor daemon

Analyse profile information

Items

ID	Description
OST_Load	Represents the OST load over time
Client_IO_Efficiency	Represents client I/O RPC efficiency (percentage of 1 MB RPCs)
Client_Cache_Stats	Represents whether the client cache (grant) is used efficiently
Client_RPC	Represents client RPC frequency
VFS_trace	Execution time of individual VFS calls (each call type has its own color)
Server RPC trace	Handling time of individual OST RPCs
Ldlm Stats	Represents lock (enqueue) conflict status over time
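
As an illustration of the Client_IO_Efficiency item, the share of 1 MB RPCs can be derived from a histogram of RPC sizes. The sketch below assumes the histogram has already been extracted from brw_stats as a mapping of pages per RPC to RPC count, and assumes a 4096-byte page. Neither detail is stated on this page, and the function names are hypothetical.

```python
PAGE_SIZE = 4096          # assumed page size in bytes
FULL_RPC_BYTES = 1 << 20  # 1 MB, the RPC size this item is concerned with


def size_percentages(histogram):
    """Map each RPC size in bytes to its share of all RPCs, in percent."""
    total = sum(histogram.values())
    if total == 0:
        return {}
    return {pages * PAGE_SIZE: 100.0 * count / total
            for pages, count in sorted(histogram.items())}


def full_rpc_percentage(histogram):
    """Percentage of RPCs that carry exactly 1 MB."""
    return size_percentages(histogram).get(FULL_RPC_BYTES, 0.0)


# Hypothetical histogram: {pages per RPC: number of RPCs}
sample = {1: 10, 16: 30, 256: 160}

if __name__ == "__main__":
    print(size_percentages(sample))
    print(full_rpc_percentage(sample))
```

Keeping the per-size percentages separate from the single 1 MB figure lets the same data feed both the efficiency item and a per-size graph.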

Graphs

Graph	Input	X axis	Y axis
OST_Load	OST_Req_Handle_Info	time	req_qdepth + req_active, req_waittime
Client_OST_IO_Efficiency	Read/Write_Req_Info	time	percentage of requests of each size
Client_Cache_Stats	Client_Cache_Avaiblity	time	client cache (grant) availability
Client_RPC	Client_RPC_Frequency	time	read_req_read_count
ldlm stats	Ldlm_Stats	time	count of lock blocking_ast handler calls on the server
VFS trace info	VFS trace logs	time	execution time of individual VFS calls (each call type has its own color)
Server RPC trace	OST RPC trace logs	time	handling time of individual OST RPCs

Note: A trace log analysis tool is needed to extract the call execution times from the trace logs.
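
A rough sketch of what such an analysis tool might do is given below. It does not parse the real Lustre debug log format. It assumes the log has already been reduced to hypothetical (timestamp, pid, call, phase) records, where phase is "enter" or "leave", and it pairs them to obtain one duration per call. The record layout and the function name call_durations are assumptions made for illustration.

```python
from collections import defaultdict


def call_durations(events):
    """Pair enter/leave events per (pid, call); return one record per completed call."""
    open_calls = defaultdict(list)  # (pid, call) -> stack of start timestamps
    durations = []
    for timestamp, pid, call, phase in sorted(events):
        key = (pid, call)
        if phase == "enter":
            open_calls[key].append(timestamp)
        elif phase == "leave" and open_calls[key]:
            start = open_calls[key].pop()
            durations.append({"call": call, "pid": pid,
                              "start": start, "seconds": timestamp - start})
        # a "leave" without a matching "enter" is ignored
    return durations


# Hypothetical, already tokenised trace records.
sample_events = [
    (100.0, 7, "ll_file_read", "enter"),
    (100.25, 7, "ll_file_read", "leave"),
    (100.5, 8, "ll_file_write", "enter"),
    (101.0, 8, "ll_file_write", "leave"),
    (101.5, 7, "ll_file_read", "leave"),  # unmatched, ignored
]

if __name__ == "__main__":
    for record in call_durations(sample_events):
        print(record)
```

Each record carries the call name, so a plotting step could assign one color per call type, as the VFS trace graph requires.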

Output profile information

Output the graphs through the Ganglia PHP Web Frontend.

Implementation constraint

Use the current utilities and architecture as much as possible, so that the tool is available as soon as possible.
Implement the whole profiling system on top of Ganglia.
It should also work with Lustre 1.4 (ORNL may be stuck on that version for a long time).
Easily extensible: we may want to add or remove some stats in the future.
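
One possible way to meet the extensibility constraint is to keep the list of stats in a single table, so that adding or removing a stat is a one-line change. The sketch below is only an assumption about how this could look: the paths are copied from the stats collection table, while the names METRIC_SOURCES, collect and read_proc_file are hypothetical. Sources that cannot be read are skipped, which is one simple way to tolerate nodes or Lustre versions that lack a given file.

```python
# Metric ID -> stats source, copied from the stats collection table.
METRIC_SOURCES = {
    "OST_Req_Handle_Info": "/proc/fs/lustre/ost/OSS/ost_io/stats",
    "OST_Read/Write_Info": "/proc/fs/lustre/ost/OSS/ost_io/req_history",
    "MDS_OPS_Stats": "/proc/fs/lustre/mds/stats",
    "Ldlm_Stats": "/proc/fs/lustre/ldlm/services/ldlm_cbd/stats",
}


def read_proc_file(path):
    """Read one stats file as text."""
    with open(path) as handle:
        return handle.read()


def collect(registry, read_text=read_proc_file):
    """Return {metric_id: raw text} for every source that can be read."""
    snapshot = {}
    for metric_id, path in registry.items():
        try:
            snapshot[metric_id] = read_text(path)
        except OSError:
            continue  # source absent on this node or this Lustre version
    return snapshot


if __name__ == "__main__":
    # Removing a stat is a dictionary edit; no collector code changes.
    registry = dict(METRIC_SOURCES)
    del registry["MDS_OPS_Stats"]
    print(sorted(collect(registry)))
```

Passing the reader as a parameter keeps the collector testable on machines without Lustre mounted.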
