Documenting Code

(Updated: Dec 2009)

Lustre™ code documentation helps engineers working on the code to read and correctly modify the code. The reader is expected to have a good overall grasp of the Lustre architecture and internals. The code documentation provides reference information on the application programming interfaces (APIs) and describes significant internal features of each Lustre subsystem.

Lustre code documentation consists of stylized comments embedded in the source code, which helps keep the documentation consistent with the code as it is developed. The embedded comments can be processed by doxygen into online, browsable (HTML) documentation.

Requirements
The minimum requirement for documenting Lustre code is to describe subsystem APIs (the datatypes, procedures and globals that a subsystem exports to the rest of Lustre) and significant internal datatypes. These should be described as follows:


 * Datatypes (structs, typedefs, enums)
   * What it is for
   * Structure members
   * Usage constraints
 * Procedures
   * What it does
   * Parameters
   * Return values
   * Usage constraints
   * Subtle implementation details
 * Globals
   * What it is for
   * Usage constraints

The most important items to include are "Usage constraints" and "Subtle implementation details".

"Usage constraints" are restrictions on how and when you call a procedure or operate on a data structure. These include concurrency control, reference counting, and permitted caller context.

"Subtle implementation details" are anything done in the code that might not be transparently obvious, such as code that ensures the last thread in a pool of workers is held in reserve for deadlock avoidance.

A well-chosen descriptive name can allow other information, such as what the procedure does or what a parameter means, to be quite brief or even omitted. But usage constraints and implementation subtleties must always be spelled out, e.g. by describing an object's entire lifecycle from creation through to destruction, so that the next engineer to maintain or use the code does it safely and correctly.
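As a minimal sketch of what spelling out usage constraints and an object's lifecycle can look like, consider the reference-counted object below. The names demo_obj, demo_obj_alloc, demo_obj_get and demo_obj_put are hypothetical and are not part of Lustre; the point is only the kind of information the comments carry.

```c
#include <assert.h>
#include <stdlib.h>

/**
 * Hypothetical reference-counted object (illustration only, not Lustre code).
 *
 * Lifecycle: created by demo_obj_alloc() with one reference held by the
 * caller; freed by the demo_obj_put() call that drops the last reference.
 */
struct demo_obj {
        /** Reference count. Not atomic: callers must serialize access. */
        int do_ref;
};

/**
 * Allocates an object.
 *
 * Usage constraints: the returned reference must eventually be released
 * with demo_obj_put().
 *
 * \retval NULL     allocation failed
 * \retval non-NULL new object; the caller owns one reference
 */
struct demo_obj *demo_obj_alloc(void)
{
        struct demo_obj *obj = malloc(sizeof(*obj));

        if (obj != NULL)
                obj->do_ref = 1;
        return obj;
}

/**
 * Takes an additional reference on \a obj.
 *
 * \pre obj->do_ref > 0, i.e. the caller already holds a reference
 */
void demo_obj_get(struct demo_obj *obj)
{
        assert(obj->do_ref > 0);
        obj->do_ref++;
}

/**
 * Drops a reference on \a obj.
 *
 * Subtle: the object is freed here when the last reference goes away, so
 * \a obj must not be touched after this call unless the caller holds
 * another reference.
 *
 * \retval 1 the object was freed
 * \retval 0 other references remain
 */
int demo_obj_put(struct demo_obj *obj)
{
        assert(obj->do_ref > 0);
        if (--obj->do_ref > 0)
                return 0;
        free(obj);
        return 1;
}
```

The names say what each function does, so the descriptions stay short; the comments spend their words on who owns a reference and when the object may no longer be touched.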

Each time you make a change to the Lustre code or inspect a patch, you must review the changes to ensure:


 * Sufficient documentation exists.
 * The documentation is accurate and up to date.

Examples
Doxygen comments start with /** (like in javadoc).

Doxygen commands are placed in doxygen comments to control how doxygen formats the output. Commands start with a backslash (\) or at-sign (@), but we typically use the backslash and reserve the at-sign for group blocks (see below). Don't use doxygen commands unnecessarily.

The main purpose of code documentation is to be available in the code for you to read when you're working on the code. So it's important that the comments read like real C comments and not formatting gibberish.

Procedures and Globals
Document procedures and globals in the .c files, rather than in headers.

    /**
     * Owns a page by IO.
     *
     * Waits until \a pg is in cl_page_state::CPS_CACHED state, and then switches it
     * into cl_page_state::CPS_OWNED state.
     *
     * \param io IO context which wants to own the page
     * \param pg page to be owned
     *
     * \pre  !cl_page_is_owned(pg, io)
     * \post result == 0 iff cl_page_is_owned(pg, io)
     *
     * \retval 0   success
     *
     * \retval -ve failure, e.g., page was destroyed (and landed in
     *             cl_page_state::CPS_FREEING instead of cl_page_state::CPS_CACHED).
     *
     * \see cl_page_disown
     * \see cl_page_operations::cpo_own
     */
    int cl_page_own(const struct lu_env *env,
                    struct cl_io *io, struct cl_page *pg)

Notes:
 * Start with a brief description, which continues to the first '.' (period or full stop).
 * Follow the brief description with a detailed description.
 * Descriptions are written in the third person singular, e.g. "does this and that", "represents such and such a concept".
 * To refer to a function argument, use the \a argname syntax.
 * To refer to another function, use the funcname() syntax. This will produce a cross-reference.
 * To refer to a field or an enum value use the SCOPE::NAME syntax.
 * Describe possible return values with \retval.
 * Mention all concurrency control restrictions here (such as locks that the function expects to be held, or holds on exit).
 * If possible, specify a (weakest) pre-condition and (strongest) post-condition for the function. If conditions cannot be expressed as a C language expression, provide an informal description.
 * Enumerate related functions and datatypes in the \see section. Note that doxygen automatically cross-references all places where a given function is called (but not calls through a function pointer) and all functions that it calls, so there is no need to enumerate these.
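The notes above can be sketched on a small, compilable example. The stack below is hypothetical (demo_stack and its functions are invented for illustration and are not Lustre code), and it assumes a POSIX errno.h that defines ENOSPC and ENOENT. Where a pre-condition can be written as a C expression, the same expression is also checked with assert().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/** Capacity of a demo_stack (hypothetical value). */
#define DEMO_STACK_CAP 4U

/** Hypothetical fixed-capacity stack of ints (illustration only). */
struct demo_stack {
        /** Number of valid entries in demo_stack::ds_items. */
        unsigned int ds_count;
        /** Storage. Entries at index >= ds_count are undefined. */
        int          ds_items[DEMO_STACK_CAP];
};

/**
 * Pushes a value onto a stack.
 *
 * Stores \a val on top of \a st if there is room. The stack is not locked
 * internally: the caller must serialize calls on the same \a st.
 *
 * \param st  stack to push onto
 * \param val value to store
 *
 * \pre st != NULL && st->ds_count <= DEMO_STACK_CAP
 * \post result == 0 iff st->ds_count grew by one
 *
 * \retval 0       success
 * \retval -ENOSPC the stack is full; \a st is unchanged
 *
 * \see demo_stack_pop
 */
int demo_stack_push(struct demo_stack *st, int val)
{
        assert(st != NULL && st->ds_count <= DEMO_STACK_CAP);
        if (st->ds_count == DEMO_STACK_CAP)
                return -ENOSPC;
        st->ds_items[st->ds_count++] = val;
        return 0;
}

/**
 * Pops the top value off a stack.
 *
 * The caller must serialize calls on the same \a st.
 *
 * \param st  stack to pop from
 * \param val where to store the popped value
 *
 * \pre st != NULL && val != NULL
 *
 * \retval 0       success; \a val holds the popped value
 * \retval -ENOENT the stack is empty; \a val is not written
 *
 * \see demo_stack_push
 */
int demo_stack_pop(struct demo_stack *st, int *val)
{
        assert(st != NULL && val != NULL);
        if (st->ds_count == 0)
                return -ENOENT;
        *val = st->ds_items[--st->ds_count];
        return 0;
}
```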

Datatypes
Document datatypes where they are declared.

    /**
     * "Compound" object, consisting of multiple layers.
     *
     * Compound object with given fid is unique with given lu_site.
     *
     * Note, that object does *not* necessarily correspond to the real object in the
     * persistent storage: object is an anchor for locking and method calling, so
     * it is created for things like not-yet-existing child created by mkdir or
     * create calls. lu_object_operations::loo_exists can be used to check
     * whether object is backed by persistent storage entity.
     */
    struct lu_object_header {
            /**
             * Object flags from enum lu_object_header_flags. Set and checked
             * atomically.
             */
            unsigned long     loh_flags;
            /**
             * Object reference count. Protected by lu_site::ls_guard.
             */
            atomic_t          loh_ref;
            /**
             * Fid, uniquely identifying this object.
             */
            struct lu_fid     loh_fid;
            /**
             * Common object attributes, cached for efficiency. From enum
             * lu_object_header_attr.
             */
            __u32             loh_attr;
            /**
             * Linkage into per-site hash table. Protected by lu_site::ls_guard.
             */
            struct hlist_node loh_hash;
            /**
             * Linkage into per-site LRU list. Protected by lu_site::ls_guard.
             */
            struct list_head  loh_lru;
            /**
             * Linkage into list of layers. Never modified once set (except lately
             * during object destruction). No locking is necessary.
             */
            struct list_head  loh_layers;
    };

Describe datatype invariants (preferably formally).

    /**
     * Fields are protected by the lock on cfs_page_t, except for atomics and
     * immutables.
     *
     * \invariant Datatype invariants are in cl_page_invariant. Basically:
     * cl_page::cp_parent and cl_page::cp_child are a well-formed double-linked
     * list, consistent with the parent/child pointers in the cl_page::cp_obj and
     * cl_page::cp_owner (when set).
     */
    struct cl_page {
            /** Reference counter. */
            atomic_t           cp_ref;
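One way to state an invariant formally is to put it in a checking function and name that function in the \invariant text. The following is a minimal sketch with a hypothetical two-level page (demo_page and its functions are invented for illustration and are much simpler than the real cl_page).

```c
#include <assert.h>
#include <stddef.h>

/**
 * Hypothetical two-level page (illustration only, not the Lustre cl_page).
 *
 * \invariant Checked by demo_page_invariant(): demo_page::dp_parent and
 * demo_page::dp_child form a well-formed double-linked pair, i.e.
 * pg->dp_child == NULL || pg->dp_child->dp_parent == pg, and
 * pg->dp_parent == NULL || pg->dp_parent->dp_child == pg.
 */
struct demo_page {
        /** Page one layer up, or NULL for a top-level page. */
        struct demo_page *dp_parent;
        /** Page one layer down, or NULL for a bottom-level page. */
        struct demo_page *dp_child;
};

/**
 * Checks the datatype invariant of \a pg.
 *
 * \retval 1 the invariant holds
 * \retval 0 the invariant is violated
 */
int demo_page_invariant(const struct demo_page *pg)
{
        return (pg->dp_child == NULL || pg->dp_child->dp_parent == pg) &&
               (pg->dp_parent == NULL || pg->dp_parent->dp_child == pg);
}

/**
 * Links \a child below \a parent.
 *
 * \pre parent->dp_child == NULL && child->dp_parent == NULL
 * \post demo_page_invariant(parent) && demo_page_invariant(child)
 */
void demo_page_link(struct demo_page *parent, struct demo_page *child)
{
        assert(parent->dp_child == NULL && child->dp_parent == NULL);
        parent->dp_child = child;
        child->dp_parent = parent;
        assert(demo_page_invariant(parent) && demo_page_invariant(child));
}
```

Keeping the invariant in code means the comment and the check cannot silently drift apart, and functions that modify the structure can assert it on exit.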

Describe concurrency control mechanisms for structure fields.

            /** An object this page is a part of. Immutable after creation. */
            struct cl_object  *cp_obj;
            /** Logical page index within the object. Immutable after creation. */
            pgoff_t            cp_index;
            /** List of slices. Immutable after creation. */
            struct list_head   cp_layers;
            ...
    };

Specify when fields are valid.

            /**
             * Owning IO in cl_page_state::CPS_OWNED state. Sub-page can be owned
             * by sub-io.
             */
            struct cl_io      *cp_owner;
            /**
             * Owning IO request in cl_page_state::CPS_PAGEOUT and
             * cl_page_state::CPS_PAGEIN states. This field is maintained only in
             * the top-level pages.
             */
            struct cl_req     *cp_req;

You can use the @{...@} syntax to define a subset of fields or enum values that should be grouped together.

    struct cl_object_header {
            /** Standard lu_object_header. cl_object::co_lu::lo_header points
             * here. */
            struct lu_object_header  coh_lu;
            /** \name locks
             * \todo XXX move locks below to the separate cache-lines, they are
             * mostly useless otherwise.
             */
            /** @{ */
            /** Lock protecting page tree. */
            spinlock_t               coh_page_guard;
            /** Lock protecting lock list. */
            spinlock_t               coh_lock_guard;
            /** @} locks */
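The same grouping can be applied to enum values. The sketch below uses a hypothetical enum (demo_page_state and the DPS_ values are invented for illustration) and follows the comment layout of the excerpt above; how the groups are rendered depends on the doxygen version and configuration.

```c
#include <assert.h>

/** Hypothetical page states (illustration only, not Lustre code). */
enum demo_page_state {
        /** \name Stable states
         * States in which a page can stay indefinitely.
         */
        /** @{ */
        /** Page is cached and not owned by any IO. */
        DPS_CACHED,
        /** Page is owned by an IO. */
        DPS_OWNED,
        /** @} */

        /** \name Transient states
         * States a page passes through during a transfer.
         */
        /** @{ */
        /** Page is being written out. */
        DPS_PAGEOUT,
        /** Page is being read in. */
        DPS_PAGEIN,
        /** @} */

        /** Number of states. Must stay last. */
        DPS_NR
};

/**
 * Tells whether \a state is one of the stable states.
 *
 * \pre state < DPS_NR
 *
 * \retval 1 \a state is DPS_CACHED or DPS_OWNED
 * \retval 0 otherwise
 */
int demo_page_state_is_stable(enum demo_page_state state)
{
        assert(state < DPS_NR);
        return state == DPS_CACHED || state == DPS_OWNED;
}
```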

By default, a documenting comment goes immediately before the entity being commented. If it is necessary to place this comment separately (e.g., to streamline comments in the header file), use the following syntax.

    /** \struct cl_page
     * Layered client page.
     *
     * cl_page: represents a portion of a file, cached in the memory. All pages
     *   of the given file are of the same size, and are kept in the radix tree
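For comparison, here is a complete, compilable sketch of a detached comment. The demo_range type is hypothetical and not part of Lustre; the \struct command names the entity the comment belongs to, so the comment does not have to sit directly in front of the declaration.

```c
#include <assert.h>
#include <stddef.h>

/** \struct demo_range
 * Half-open byte range.
 *
 * demo_range: covers bytes from dr_start up to, but not including, dr_end.
 * An empty range has dr_start == dr_end.
 *
 * \invariant dr_start <= dr_end
 */

/* Other declarations may sit between the comment above and the struct. */

struct demo_range {
        /** First byte covered. */
        size_t dr_start;
        /** First byte past the end. */
        size_t dr_end;
};

/**
 * Returns the number of bytes covered by \a r.
 *
 * \pre r != NULL && r->dr_start <= r->dr_end
 */
size_t demo_range_len(const struct demo_range *r)
{
        assert(r != NULL && r->dr_start <= r->dr_end);
        return r->dr_end - r->dr_start;
}
```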

Subsystem Overview
To document a subsystem, add the following comment to the header file that contains the definitions of its key datatypes. This will group all the documentation in the @{...@} block.

    /** \defgroup component_name Component Name
     *
     * overall module documentation
     * ...
     *
     * @{
     */
    datatype definitions...
    exported functions...
    /** @} component_name */

The single-word name component_name identifies a group to doxygen. Component Name is the printable title of the group. It extends to the end of the line. See \defgroup for more details.

To separate a logical part of a larger component, add the following somewhere within the \defgroup of the component:

    /**
     * \name Printable Title of sub-component
     *
     * Description of a sub-component
     */
    /** @{ */
    datatype definitions...
    exported functions...
    /** @} */

If an exported function prototype in a header is located within some group, the appropriate function definition in a .c file is automatically assigned to the same group.

A set of comments that is not lexically a part of a group can be included into it with the \addtogroup command. It works just like \defgroup, but the printable group title is optional. See \addtogroup for full details.

    /** \addtogroup cl_object
     * @{ */
    /**
     * "Data attributes" of cl_object. Data attributes can be updated
     * independently for a sub-object, and top-object's attributes are calculated
     * from sub-objects' ones.
     */
    struct cl_attr {
            /** Object size, in bytes */
            loff_t cat_size;
            ...
    };
    ...
    /** @} cl_object */
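Putting the pieces of this section together, a small subsystem might be laid out roughly as follows. This is a sketch only: demo_counter is a hypothetical component, and the header part and the .c part are shown in one listing so that it compiles on its own. The prototypes sit inside the \defgroup block, while the definitions are documented outside it, as the section on procedures recommends.

```c
#include <assert.h>
#include <stddef.h>

/** \defgroup demo_counter Demo Counter
 *
 * Hypothetical subsystem used only to illustrate grouping: a counter that
 * saturates at a fixed maximum instead of wrapping.
 *
 * @{
 */

/** Saturating counter. */
struct demo_counter {
        /** Current value. Never exceeds demo_counter::dc_max. */
        unsigned int dc_value;
        /** Upper bound. Immutable after demo_counter_init(). */
        unsigned int dc_max;
};

void demo_counter_init(struct demo_counter *c, unsigned int max);
unsigned int demo_counter_inc(struct demo_counter *c);

/** @} demo_counter */

/* In a real tree the definitions below would live in a .c file. */

/**
 * Initializes a counter.
 *
 * \param c   counter to initialize
 * \param max value at which \a c saturates
 *
 * \pre c != NULL
 * \post c->dc_value == 0 && c->dc_max == max
 */
void demo_counter_init(struct demo_counter *c, unsigned int max)
{
        assert(c != NULL);
        c->dc_value = 0;
        c->dc_max = max;
}

/**
 * Increments a counter.
 *
 * Adds one to \a c unless it has already reached demo_counter::dc_max.
 *
 * \pre c != NULL && c->dc_value <= c->dc_max
 *
 * \return the value of \a c after the call
 */
unsigned int demo_counter_inc(struct demo_counter *c)
{
        assert(c != NULL && c->dc_value <= c->dc_max);
        if (c->dc_value < c->dc_max)
                c->dc_value++;
        return c->dc_value;
}
```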

Running Doxygen
You need to install the Graphviz package before you can run doxygen.

Doxygen uses a configuration file to control how it builds documentation. See Doxygen Configuration for details.

Lustre comes with two configuration files:
 * build/doxyfile.ref produces a short form of the documentation set, suitable as a reference. Output is placed into the doxygen.ref/ directory.
 * build/doxyfile.api produces a full documentation set, more suitable for learning code structure. In addition to the short form, this set includes call-graphs and source code excerpts. Output is placed into the doxygen.api/ directory.

If the version of doxygen you are running is newer than the one last used to generate the configuration files, run the following commands to upgrade:

    doxygen -s -u build/doxyfile.api
    doxygen -s -u build/doxyfile.ref

To build all the documentation, in the top-level lustre directory, run:

    doxygen build/doxyfile.api
    doxygen build/doxyfile.ref

There are also phony Makefile targets: doxygen-api and doxygen-ref run the respective commands, and doxygen runs both.

Note that doxygen currently gives many warnings about undocumented entities. These should abate as we improve the code documentation.

Publishing Documentation
The build/publish_doxygen script publishes a local version of the documentation at http://wiki.lustre.org/doxygen:

build/publish_doxygen [-b branchname] [-l additional-label] [-d] [-u user] [-p port]

The default branch is "master". The user and port are used to ssh into shell.lustre.sun.com. User defaults to your $USER environment variable and port defaults to 922. The -d option instructs the script to use the current date as a label.

Documentation is uploaded into...

user@shell.lustre.sun.com:/home/www/doxygen/$branch$label

where $label is the concatenation of all labels given on the command line, in order. The parent directory is rsync-ed to wiki.lustre.org regularly, and the documentation can be browsed at...

http://wiki.lustre.org/doxygen

When adding a new branch/label, you have to edit index.html in the doxygen directory on shell.lustre.sun.com.

Doxygen References
Doxygen Home

Doxygen Manual

Doxygen Special Commands