<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://wiki.old.lustre.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Wangdi</id>
	<title>Obsolete Lustre Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.old.lustre.org/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Wangdi"/>
	<link rel="alternate" type="text/html" href="http://wiki.old.lustre.org/index.php?title=Special:Contributions/Wangdi"/>
	<updated>2026-04-15T02:10:23Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.7</generator>
	<entry>
		<id>http://wiki.old.lustre.org/index.php?title=Architecture_-_Profiling_Tools_for_IO&amp;diff=9747</id>
		<title>Architecture - Profiling Tools for IO</title>
		<link rel="alternate" type="text/html" href="http://wiki.old.lustre.org/index.php?title=Architecture_-_Profiling_Tools_for_IO&amp;diff=9747"/>
		<updated>2007-09-24T20:47:44Z</updated>

		<summary type="html">&lt;p&gt;Wangdi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Definitions ==&lt;br /&gt;
Ganglia - a distributed monitoring system (http://ganglia.info/)&lt;br /&gt;
&lt;br /&gt;
== Background ==&lt;br /&gt;
The profiling tool should be part of LRE, and it will also be used at ORNL to profile the I/O status of their XT4/XT3 cluster. We decided to implement the whole profiling system based on Ganglia.&lt;br /&gt;
&lt;br /&gt;
== Use cases ==&lt;br /&gt;
{|border=1  cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|Collect profile information|| Performance &amp;amp; Usability || Collect I/O and other stats from the servers or all nodes.&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|Analyse profile information || Usability || Generate graphs from the collected profiling information.&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|Output profile information || Usability || Present these graphs to the end user.&lt;br /&gt;
|}&lt;br /&gt;
=== Collect profile information ===&lt;br /&gt;
Stats collection&lt;br /&gt;
{|border=1  cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;Scenario:&#039;&#039;&#039; || Collecting stats from the servers and clients. &lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;goals&#039;&#039;&#039; || Overhead &amp;amp; Usability&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|rowspan=&amp;quot;7&amp;quot; writing-mode=&amp;quot;vertical&amp;quot;|&#039;&#039;&#039;details&#039;&#039;&#039;&lt;br /&gt;
|&#039;&#039;&#039;OST_Req_Handle_Info&#039;&#039;&#039; || req_qdepth, req_active, req_waittime /proc/fs/lustre/ost/OSS/ost_io/stats (server load)&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;OST_Read/Write_Info&#039;&#039;&#039; || ost read/write count from each client /proc/fs/lustre/ost/OSS/ost_io/req_history&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Read/Write_Req_Info&#039;&#039;&#039; || req detail information (percentage of 1 MB RPCs) /proc/fs/lustre/obdfilter/lustre-OST0001/brw_stats&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Client_Cache_Avaiblity&#039;&#039;&#039; || client cache stats information/proc/fs/lustre/obdfilter/lustre-OSTXXXX/exports/NID@nettype/UUID/cur_grant(dirty)_bytes&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Client_RPC_Frequency&#039;&#039;&#039; || Client RPC stats /proc/fs/lustre/osc(mdc)/stats&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;MDS_OPS_Stats&#039;&#039;&#039; || MDS stats ops /proc/fs/lustre/mds/stats&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Ldlm_Stats&#039;&#039;&#039; || /proc/fs/lustre/ldlm/services/ldlm_cbd/stats&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;Implementation constraints&#039;&#039;&#039; || Stats are collected by the Ganglia monitor daemon&lt;br /&gt;
|}&lt;br /&gt;
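The stats files listed above are plain-text counters. A minimal sketch of the parsing step a Ganglia metric collector might perform, assuming the usual "name count samples [unit] min max sum" line layout (the exact fields vary by Lustre version, so treat this as illustrative, not as the implementation):

```python
# Minimal sketch of a stats parser a Ganglia metric module might use.
# SAMPLE_STATS mimics /proc/fs/lustre/ost/OSS/ost_io/stats; the exact
# columns vary by Lustre version, so this layout is an assumption.

SAMPLE_STATS = """\
snapshot_time             1190666864.223566 secs.usecs
req_waittime              1285 samples [usec] 31 4255 117330
req_qdepth                1285 samples [reqs] 0 17 842
req_active                1285 samples [reqs] 1 9 3610
"""

def parse_stats(text):
    """Return {metric: {"count", "min", "max", "sum"}} for counter lines,
    skipping the snapshot_time header line."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        # Counter lines have exactly: name count "samples" [unit] min max sum
        if len(fields) == 7 and fields[2] == "samples":
            metrics[fields[0]] = {
                "count": int(fields[1]),
                "min": int(fields[4]),
                "max": int(fields[5]),
                "sum": int(fields[6]),
            }
    return metrics

stats = parse_stats(SAMPLE_STATS)
# Average wait time in usec, as a gmond-style gauge value.
print(stats["req_waittime"]["sum"] // stats["req_waittime"]["count"])
```

The daemon would read each /proc file on a timer, run a parser like this, and push the derived values as Ganglia metrics.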
&lt;br /&gt;
Trace logs collection job&lt;br /&gt;
{|border=1  cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;Scenario:&#039;&#039;&#039; || Generating the trace logs on each node.&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;goals&#039;&#039;&#039; || Usability (getting individual call trace information)&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot; writing-mode=&amp;quot;vertical&amp;quot;|&#039;&#039;&#039;details&#039;&#039;&#039;&lt;br /&gt;
|&#039;&#039;&#039;VFS trace call logs&#039;&#039;&#039; || Get VFS trace logs by enabling D_VFSTRACE on each client.&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;Server RPC trace log&#039;&#039;&#039; || Get OST RPC handler trace logs by enabling D_RPCTRACE on each OST.&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|colspan=2|&#039;&#039;&#039;Implementation constraints&#039;&#039;&#039; || Trace logging is enabled/disabled by the Ganglia monitor daemon&lt;br /&gt;
|}&lt;br /&gt;
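The monitor daemon can toggle these debug flags by rewriting the symbolic kernel debug mask. The sketch below only computes the new mask string; the file it would be written back to (/proc/sys/lnet/debug on Lustre 1.4-era systems) and the exact flag spellings are assumptions:

```python
# Hypothetical helper for the Ganglia monitor daemon: compute the new
# symbolic debug mask after enabling or disabling one flag. Writing the
# result back to the mask file is left to the caller.

DEBUG_MASK_PATH = "/proc/sys/lnet/debug"  # assumed location, Lustre 1.4 era

def toggle_debug(current_mask, flag, enable):
    """current_mask is the space-separated flag list read from the mask
    file, e.g. "ioctl neterror"; flag is e.g. "vfstrace" or "rpctrace"."""
    flags = set(current_mask.split())
    if enable:
        flags.add(flag)
    else:
        flags.discard(flag)
    return " ".join(sorted(flags))
```

For example, toggle_debug("ioctl neterror", "vfstrace", True) yields a mask with vfstrace added, and passing enable=False removes it again without disturbing the other flags.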
&lt;br /&gt;
===Analyse profile information ===&lt;br /&gt;
&#039;&#039;&#039;items&#039;&#039;&#039;&lt;br /&gt;
{|border=1  cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;ID&#039;&#039;&#039; || Description&lt;br /&gt;
|-&lt;br /&gt;
|OST_Load || Represent the OST load over time&lt;br /&gt;
|-&lt;br /&gt;
|Client_IO_Efficiency || Represent client I/O RPC efficiency (1 MB RPC percentage)&lt;br /&gt;
|-&lt;br /&gt;
|Client_Cache_Stats || Represent whether the client cache (grant) is used efficiently&lt;br /&gt;
|-&lt;br /&gt;
|Client_RPC || Represent client RPC frequency&lt;br /&gt;
|-&lt;br /&gt;
|VFS_trace || Individual VFS trace call execution time (different calls have different colors)&lt;br /&gt;
|-&lt;br /&gt;
|Server RPC trace || Individual OST RPC handler time&lt;br /&gt;
|-&lt;br /&gt;
|Ldlm Stats || Represent lock (enqueue) conflict status over time&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;graphs&#039;&#039;&#039;&lt;br /&gt;
{|border=1  cellspacing=&amp;quot;0&amp;quot;&lt;br /&gt;
|-align=&amp;quot;left&amp;quot;&lt;br /&gt;
|&#039;&#039;&#039;graphs&#039;&#039;&#039; || input || x axis || y axis&lt;br /&gt;
|-&lt;br /&gt;
|OST_Load || OST_Req_Handle_Info || time || req_qdepth + req_active, req_waittime&lt;br /&gt;
|-&lt;br /&gt;
|Client_OST_IO_Efficiency || Read/Write_Req_Info || time || each size req percent&lt;br /&gt;
|-&lt;br /&gt;
|Client_Cache_Stats || Client_Cache_Avaiblity || time || Client cache (grant) availability.&lt;br /&gt;
|-&lt;br /&gt;
|Client_RPC || Client_RPC_Frequency || time || read_req_read_count&lt;br /&gt;
|-&lt;br /&gt;
|ldlm stats || ldlm_stats || time || lock blocking_ast handler count on server.&lt;br /&gt;
|-&lt;br /&gt;
|VFS trace info || VFS trace logs || time || Individual VFS trace call execution time (different calls have different colors)&lt;br /&gt;
|-&lt;br /&gt;
|Server RPC trace || OST RPC trace logs || time || Individual OST RPC handler time&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: A trace log analysis tool will be needed to retrieve call execution times from the trace logs.&lt;br /&gt;
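The core of that analysis tool is pairing entry/exit markers per process and function to recover each call's duration. A sketch under simplified assumptions - real debug-log lines carry more fields than the 4-tuple events used here, and timestamps are shown as plain microsecond integers:

```python
# Sketch of the trace-log analysis step: match "entered"/"leaving"
# markers per (pid, function) to recover per-call execution time.
# The event format is a simplified assumption, not the real log layout.

SAMPLE_EVENTS = [
    (100, 1234, "ll_file_write", "entered"),
    (120, 1235, "ll_file_read", "entered"),
    (250, 1234, "ll_file_write", "leaving"),
    (400, 1235, "ll_file_read", "leaving"),
]

def call_durations(events):
    """Return a list of (function, usecs) for every matched entry/exit pair,
    in completion order; unmatched entries are silently dropped."""
    open_calls = {}  # maps (pid, function) to its entry timestamp
    durations = []
    for ts, pid, func, kind in events:
        key = (pid, func)
        if kind == "entered":
            open_calls[key] = ts
        elif kind == "leaving" and key in open_calls:
            durations.append((func, ts - open_calls.pop(key)))
    return durations
```

The per-call (function, duration) pairs produced this way are exactly what the VFS trace and server RPC trace graphs above would plot against time.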
&lt;br /&gt;
=== Output profile information ===&lt;br /&gt;
&lt;br /&gt;
Present the graphs via the Ganglia PHP web frontend.&lt;br /&gt;
&lt;br /&gt;
== Implementation constraint ==&lt;br /&gt;
# Use current utilities and architecture as much as possible, and be available to use as soon as possible.&lt;br /&gt;
# Implement the whole profiling system based on Ganglia&lt;br /&gt;
# It should also work with Lustre 1.4 (ORNL may remain on that version for a long time)&lt;br /&gt;
# Easily extensible - realize that we may want to add or remove some stats in the future.&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
[[Category:Architecture|Profiling Tools for IO]]&lt;/div&gt;</summary>
		<author><name>Wangdi</name></author>
	</entry>
</feed>