WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Coding Guidelines: Difference between revisions

From Obsolete Lustre Wiki
Jump to navigationJump to search
(→‎Lustre Guidelines: add comment block and console error messages)
 
(36 intermediate revisions by 4 users not shown)
Line 1: Line 1:
<small>''(Updated: Jan 2010)''</small>
== Beautiful Code == 


'''''Note:''''' ''All Lustre developers should follow the guidelines in this page very strictly to avoid problems during code merges later on. Please make the required changes to the default formatting rules in the editor you use to comply to the guidelines below.
''A note from Eric Barton, a Lustre pioneer:''
           
More important than the physical layout of code (which is covered in detail below) is the idea that the code should be ''beautiful'' to read.


== Beautiful Code ==
What makes code beautiful to me?  Fundamentally, it's readability and obviousness.  The code must not have secrets but should flow easily, pleasurably and ''accurately'' off the page and into the mind of the reader.


''A note from Eric Barton, our lead Lustre developer:''
How do I think beautiful code is written?  Like this...


More important than the physical layout of code (which is covered in detail below) is the thought that code is ''beautiful'' to read.  What makes code "beautiful" to me?  Fundamentally, I'm talking about readability and obviousness - the code must not have secrets.
* The author must be confident and knowledgeable and proud of her work.  She must understand what the code should do, the environment it must work in, all the combinations of inputs, all the valid outputs, all the possible races and all the reachable states.  She must [http://en.wikipedia.org/wiki/Grok grok] it.


* Separate "ideas" are clearly separated in the program layout -  e.g. blank lines group related statements and separate unrelated statements.
* Names must be well chosen.  The meaning a human reader attaches to a name can be orthogonal to what the compiler does with it, so it's just as easy to mislead as it is to inform.  ''[http://en.wikipedia.org/wiki/Does_what_it_says_on_the_tin "Does exactly what it says on the tin"]'' is a popular UK English expression describing something that does ''exactly'' what it tells you it's going to do, no more and no less.  For example, if I open a tin labeled "soap", I expect the contents to help me wash and maybe even smell nice. If it's no good at removing dirt, I'll be disappointed. If it removes the dirt but burns off a layer of skin with it, I'll be positively upset.  The name of a procedure, a variable or a structure member should tell you something informative about the entity without misleading - just "what it says on the tin".


* Declarations are easy to readThe declarations should always be separated from the functional codeDeclarations should be made as locally as possible to the code they are being usedI prefer one variable per line, with the names aligned vertically - it makes checking types as you read quicker and IMHO it's a pleasure to the eye.
* Names must be well chosenLocal, temporary variables can almost always remain relatively short and anonymous, while names in global scope must be unique.  In general, the wider the context you expect to use the name in, the more unique and informative the name should be.  Don't be scared of long names if they help to ''make_the_code_clearer'', but ''do_not_let_things_get_out_of_hand'' either - we don't write COBOLRelated names should be obvious, unambiguous and avoid naming conflicts with other unrelated names, e.g. by using a consistent prefix.  This applies to all API procedures (if not all procedures period) within a given subsystemSimilarly, unique member names for global structures, using a prefix to identify the parent structure type, helps readability.


* If it's a choice between breaking the 80-chars-per-line rule and code that doesn't look good (e.g. breaks indentation rules or doesn't visually contain procedure call parameters between the brackets or spreads over too many lines), then break the 80-chars-per-line rule.  We _don't_ want stupidly long lines, but we all have enough screen resolution these days to not have to wrap lines at 80 chars.
* Names must be well chosen.  Don't choose names that are easily confused - especially not if the compiler can't even tell the difference when you make a spelling mistake. ''i'' and ''j'' aren't the worst example - ''rq_reqmsg'' and ''rq_repmsg'' are much worse (and taken from our own code!!!).


* Names are well chosen.  ''"Does exactly what it says on the tin"'' is an English (UK) expression describing something that tells you what it's going to do and then does _exactly_ that.  For example, when I open a box labelled "soap", I expect it to help me wash and maybe even smell nice.  I'll get upset if it's no good at removing dirt, and really upset if it makes me break out in a rash.  The name of a procedure or variable or structure member should tell you something about the entity, without giving you misleading information - just "what it says on the tin".
* Names must be well chosen.  I can't emphasize this issue enough - I hope you get the point.


* Names are well chosenThis time I mean that you don't choose names that easily become a different but valid (to the compiler) name if you make a spelling mistake.  '''i''' and '''j''' aren't the worst example - '''req_portal''' and '''rep_portal''' are much worse (and taken from our own code!!!).
* Assertions must be used intelligentlyThey combine the roles of ''active comment'' and ''software fuse''.  As an ''active comment'' they tell you something about the program that you can trust more than a comment.  And as a ''software fuse'', they provide fault isolation between subsystems by letting you know when and where invariant assumptions are violated.  Overuse must be avoided - it hurts performance without helping readability - and any other use is just plain wrong.  For example, assertions must '''never''' be used to validate data read from disk or the network.  Network and disk hardware ''does'' fail and Lustre has to handle that - it can't just crash.  The same goes for user input. Checking data copied in from userspace with assertions just opens the door for a denial of service attack.


* Names are well chosenDon't be scared of long names, but_do_not_go_to_extremes_either - we don't write COBOL.
* Formatting and indentation rules should be followed intelligently.  The visual layout of the code on the page should lend itself to being read easily and accurately - it just looks clean and good.
** Separate "ideas" should be separated clearly in the code layout using blank lines that group related statements and separate unrelated statements.
** Procedures should not ramble on.  You must be able to take in the meaning of a procedure without scrolling past page after page of code or parsing deeply nested conditionals and loops.  The 80-column rule is there for a reason.
** Declarations are easier to refer to while scanning the code if placed in a block locally to, but visually separate from, the code that uses themReadability is further enhanced by limiting declarations to one per line and aligning types and names vertically.
** Parameters in multi-line procedure calls should be aligned so that they are visually contained by their brackets.
** Brackets should be used in complex expressions to make operator precedence clear.
** Conditional boolean (''if (expr)''), scalar (''if (val != 0)'') and pointer (''if (ptr != NULL)'') expressions should be written consistently.
** Formatting and indentation rules should not be followed slavishly.  If you're faced with either breaking the 80-chars-per-line rule or the parameter indentation rule or creating an obscure helper function, then the 80-chars-per-line rule might have to suffer. The overriding consideration is how the code reads.


* Names are well chosen.  I can't emphasise this issue enough - I hope you get the point.
I could go on, but I hope you get the idea. Just think about the poor reader when you're writing, and whether your code will convey its meaning naturally, quickly and accurately, without room for misinterpretation. 


* Intelligent use of assertions.  Assertions can be abused.  Obviously, when over-used they hurt performance.  And they can also make you think that the code author didn't know what she was doing and added them to help her ''learn'' the code.  But when assertions are used properly, they combine the roles of ''active comment'' and ''software fuse''.  As an ''active comment'' they tell you something about the program which you can trust more than a comment.  As a ''software fuse'', they provide fault isolation between subsystem by letting you know when and where invariant assumptions are violated.
I didn't mention ''clever'' as a feature of beautiful code because it's only one step from ''clever'' to ''tricky'' - consider...
 
I could go on, but I hope you get the idea.  Notice that I didn't mention ''clever'' as a desirable attribute.  It's only one step from
''clever'' to ''tricky'' - consider...


   t = a; a = b; b = t; /* dumb swap */
   t = a; a = b; b = t; /* dumb swap */
Line 31: Line 39:
   a ^= b; b ^= a; a ^= b;  /* clever swap */
   a ^= b; b ^= a; a ^= b;  /* clever swap */


...which is a very minor example.  It can almost be excused, because the "cleverness" is confined to a tiny part of the code.  But if the ''clever'' code had sub-structure, that sub-structure would be very hard to work on.  You can't be sure you're not screwing something up until you understand completely the environment in which you're working.
You could feel quite pleased that the clever swap avoids the need for a local temporary variable - but is that such a big deal compared with how quickly, easily and accurately the reader will read it?  This is a very minor example which can almost be excused because the "cleverness" is confined to a tiny part of the code.  But when ''clever'' code gets spread out, it becomes much harder to modify without adding defects.  You can only work on code without screwing up if you understand the code ''and'' the environment it works in completely.  Or to put it more succinctly...
 
:''Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.'' - [http://en.wikipedia.org/wiki/Brian_Kernighan  Brian W. Kernighan]
 
IMHO, beautiful code helps code quality because it improves communication between the code author and the code reader.  Since everyone maintaining and developing the code is a code reader as well as a code author, the quality of this communication can lead either to a virtuous circle of improving quality, or a vicious circle of degrading quality.  You, dear reader, will determine which.
 
----
 
== Style and Formatting Guidlelines ==
 
All of our rules for formatting, wrapping, parenthesis, brace placement, etc., are originally derived from the [http://www.kernel.org/doc/Documentation/CodingStyle Linux kernel rules], which are basically K&R style.
 
=== Whitespace ===
 
Whitespace gets its own section because unnecessary whitespace changes can cause spurious merge conflicts when code is landed and updated in a distributed development environment.  Please ensure that you comply with the guidelines in this section to avoid these issues. We've included default formatting rules for emacs and vim to help make it easier.
 
* Tabs should be used in all 2.3 and later lustre/, lnet/ and libcfs/ files. This matches the upstream Linux kernel coding style, and is the default method of code indentation.


''Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.'' - [http://en.wikipedia.org/wiki/Brian_Kernighan  Brian W. Kernighan]
* '''NOTE NOTE NOTE''' The use of tabs for indentation is a reversal from previous Lustre coding guidelines, since May 2012 and Lustre 2.3.  This is being done in order to facilitate code integration with the Linux kernel.  All new patches should be submitted using tabs for ALL modified lines in the patch.  If there are 6 or fewer lines using spaces for indentation between two lines changed by a patch, then all of the intervening lines should also have the indentation changed to use tabs. Similarly, if there are only a handful of lines at the start or end a modified function or test that are still using spaces for indentation, convert all of the lines in that function or test to use tabs. In this manner, we can migrate consistent chunks of code over to tabs without having a 250kLOC patch breaking the commit history of every line of code, and also avoid breaking code that is in existing branches/patches that still need to merge. In a year or so, lines that still use spaces (i.e. those that are not under active development) will be converted to using tabs for indentation as well.


IMHO, beautiful code helps code quality because it improves communication between the code author and the code reader.  Since code modifiers are also code readers, the quality of communication can lead either to a virtuous circle of improving quality, or a vicious circle of degrading quality.
* All lines should wrap at 80 characters. If it's getting too hard to wrap at 80 characters, you probably need to rearrange conditional order or break it up into more functions.
<pre>
right:


== Formatting Guidelines ==
void func_helper(...)
{
        do_sth2_1;


Here are some guidelines to formatting Lustre code.
        if (cond3)
        do_sth_which_needs_a_very_long_line_to_read_clearly;


* There should be no tabs in any Lustre or LNET files. The exceptions are libsysio (maintained by someone else), ldiskfs and kernel patches (also part of a non-Lustre Group project).
        do_sth2_2;
}


* Blocks should be indented 8 spaces.
void func (...)
{
        if (!cond1)
                return;


* Variables should be declared one per line, even if there are multiple variables of the same type.
        do_sth1_1;
 
        if (cond 2)
                func_helper(...)
 
        do_sth1_2;
}
 
wrong:
 
void func(...)
{
        if (cond1) {
                do_sth1_1;
                if (cond2) {
                        do_sth2_1;
                        if (cond3) {
                                do_sth_which_needs_a_very_long_line_to_read_clearly;
                        }
                        do_sth2_2;
                }
                do_sth1_2;
        }
}


* New files should contain the following along with the license boilerplate.  This will cause vim and emacs to use spaces instead of tabs for indenting.  If you use a different editor, it also needs to be set to use spaces for indening Lustre code.
<pre style="background:lightgrey;">
  /* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
  * vim:expandtab:shiftwidth=8:tabstop=8:
  */
</pre>
</pre>


* All lines should wrap at 80 characters. If it's getting too hard to wrap there, you probably need to break it up into more functions.  In some cases, it is acceptable to remove a few spaces between function arguments to avoid overflowing onto the next line.
* Do not include spaces or tabs on blank lines or at the end of lines.  Please ensure you remove all instances of these in any [[Submitting Patches|patches you submit to Bugzilla]].  You can find them with grep or in vim using the following regexps:
 
<pre>
* Do not have spaces or tabs on blank lines or at the end of lines.  Find these with some regexps in your patch (grep, or in vim) before attaching it to bugzilla:
<pre style="background:lightgrey;">
  /[ \t]$/
  /[ \t]$/
</pre>
</pre>


:Alternatively, if you use vim you can put this line in your vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files):
:Alternatively, if you use vim, you can put this line in your vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files):
<pre style="background:lightgrey;">
<pre>
let c_space_errors=1
let c_space_errors=1
</pre>
</pre>


:Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely):
:Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely):
<pre style="background:lightgrey;">
<pre>
set list listchars=tab:>\ ,trail:$
set list listchars=tab:>\ ,trail:$
</pre>
 
:In emacs, you can use (whitespace-mode) or (whitespace-visual-mode) depending on the version. You could also consider using (flyspell-prog-mode).
 
=== C Language Features ===
 
* Don't use ''inline'' unless you're doing something so performance critical that the function call overhead will make a difference -- in other words: almost never.  It makes debugging harder and overuse can actually hurt performance by causing instruction cache or stack overflow.
 
* Use ''typedef'' carefully...
** Do not create a new integer ''typedef'' without a good reason.
** Always postfix ''typedef'' names with ''_t'' so that they can be identified clearly in the code.
** ''Never'' ''typedef'' pointers.  The ''*'' makes C pointer declarations obvious. Hiding it inside a ''typedef'' just obfuscates the code.
 
* Do not embed assignments inside boolean expressions.  Although this can make the code more concise, it doesn't necessarily make it more elegant and you increase the risk of confusing "=" with "==" or getting operator precedence wrong if you skimp on brackets.  It's even easier to make mistakes when reading the code, so it's much safer simply to avoid it altogether.
<pre>
right:
        ptr = malloc(size);
        if (ptr != NULL) {
                ...
 
wrong:
        if ((ptr = malloc(size)) != NULL) {
                ...
</pre>
 
* Conditional expressions read more clearly if only boolean expressions are implicit (i.e., non-boolean and pointer expressions compare explicitly with ''0'' and ''NULL'' respectively.)
<pre>
right:
        if (!writing &&          /* not writing? */
            inode != NULL &&    /* valid inode? */
            ref_count == 0)      /* no more references? */
                do_this();
 
wrong:
        if (writing == 0 &&      /* not writing? */
            inode &&            /* valid inode? */
            !ref_count)          /* no more references? */
                do_this();
</pre>
 
* Use parentheses to help readability and reduce the chance of operator precedence errors, but not so heavily that it is difficult to determine which parentheses are a matched pair.
<pre>
right:
        if (a->a_field == 3 ||
            ((b->b_field & BITMASK1) && (c->c_field & BITMASK2)))
                do this();
 
wrong:
        if (a->a_field == 3 || b->b_field & BITMASK1 && c->c_field & BITMASK2)
                do this()
 
wrong:
        if (((a->a_field == 3) || ((b->b_field & (BITMASK1)) &&
            (c->c_field & (BITMASK2)))))
                do this()
</pre>
 
=== Lustre Guidelines ===
* The types and printf()/printk() formats used by Lustre code are:
 
<pre>
        __u64                LPU64/LPX64/LPD64 (unsigned, hex, signed)
        __u32/int            %u/%x/%d (unsigned, hex, signed)
        (unsigned)long long  %llu/%llx/%lld
        loff_t              %lld after a cast to long long (unfortunately)
</pre>
 
* Functions and files should be documented using the [http://en.wikipedia.org/wiki/Doxygen Doxygen] markup style:
 
<pre>
/**
* Implements cl_page_operations::cpo_make_ready() method for Linux.
*
* This is called to yank a page referred to by \a slice from the transfer
* cache and to send it out as a part of transfer. This function try-locks
* the page. If try-lock failed, page is owned by some concurrent IO, and
* should be skipped (this is bad, but hopefully rare situation, as it usually
* results in transfer being shorter than possible).
*
* \param  env    lu environment for large temporary stack variables
* \param  slice  per-layer page structure being prepared
* \retval 0      success, page can be placed into transfer
* \retval -EAGAIN page is either used by concurrent IO has been
*                truncated. Skip it.
*/
static int vvp_page_make_ready(const struct lu_env *env,
                              const struct cl_page_slice *slice)
{
</pre>
 
* Use ''list_for_each_entry()'' instead of ''list_for_each'' followed by ''list_entry''
 
* When using ''sizeof()'' it should be used on the variable itself, rather than specifying the type of the variable, so that if the variable changes type/size then ''sizeof()'' will be correct:
<pre>
right:
        int *array;
 
        OBD_ALLOC(array, 10 * sizeof(*array));
 
wrong:
        OBD_ALLOC(array, 10 * sizeof(int));    /* breaks if array becomes __u64 */
 
wrong:
        OBD_ALLOC(array, 10 * sizeof(array));  /* This is the pointer size */
   
</pre>
 
* When allocating/freeing a single struct, use OBD_ALLOC_PTR() for clarity:
<pre>
right:
        OBD_ALLOC_PTR(mds_body);
        OBD_FREE_PTR(mds_body);
 
wrong:
        OBD_ALLOC(mds_body, sizeof(*mds_body));
        OBD_FREE(mds_body, sizeof(*mds_body));
</pre>
 
* Do not embed operations inside assertions.  If assertions are disabled for performance reasons this code will not be executed.
<pre>
right:
        len = strcat(foo, bar);
        LASSERT(len > 0);
 
wrong:
        LASSERT(strcat(foo, bar) > 0);
</pre>
 
* Messages on the console (''CERROR'', ''CWARN'', ''LCONSOLE_*'') should print the OBD device name or filesystem name where the error is hit, since there are usually multiple targets running on a single server.  The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, etc).  Otherwise, there is little value in having printed the error, but then having to try and reproduce the problem to diagnose it:
<pre>
right:
        LCONSOLE_INFO("MDS %s: %s now active, resetting orphans\n",
                      obd->obd_name, obd_uuid2str(uuid));
        LCONSOLE_WARN("%s: new disk, initializing\n", obd->obd_name);
        CERROR("%s: unsupported incompat filesystem feature(s) %x\n", obd->obd_name,
 
wrong:
        CERROR("Cannot get thandle\n");
        CERROR("NULL bitmap!\n");
        CERROR("invalid event\n");
</pre>
 
* Error messages that print a numeric error value should print it at the end of the line in a consistent format:
<pre>
right:
        CERROR("%s: error invoking upcall %s %s %s: rc = %d",
        CERROR("%s: cannot open/create O: rc = %d\n", obd->obd_name,rc);
 
wrong:
        CERROR("err %d on param '%s'\n", rc, ptr);
        CERROR("Can't get index (%d)\n", rc);
</pre>
</pre>


:In emacs, you can use (whitespace-mode) or (whitespace-visual-mode) depending on version. You should also consider using (flyspell-prog-mode).
=== Layout ===
 
* Code can be much more readable if the simpler actions are taken first in a set of tests.  Re-ordering conditions like this also eliminates excessive nesting.
<pre>
right:
        list_for_each_entry(...) {


* Don't use "inline" unless you're doing something so performance critical that the function call overhead will make a difference -- in other words: almost never.  It makes debugging harder.
                if (!condition1) {
                        do_sth1;
                        continue;
                }


:'''''Note:''''' All of our wrapping, parenthesis, brace placement, etc. rules are basically Linux kernel rules, which are basically K&R. For those of you in need of a refresher, [[#More Details]] are provided below.
                do_sth2_1;
                do_sth2_2;
                ...
                do_sth2_N;


* For Autoconf macros, follow the [http://www.gnu.org/software/autoconf/manual/html_node/Coding-Style.html style suggested in the autoconf manual].
                if (!condition2)
                        break;


<pre style="background:lightgrey;">
                do_sth3_1;
    AC_CACHE_CHECK([for EMX OS/2 environment], [ac_cv_emxos2],
                do_sth3_2;
    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [return __EMX__;])],
                ...
                         [ac_cv_emxos2=yes],
                do_sth3_N;
                         [ac_cv_emxos2=no])])
        }
wrong:
        list_for_each_entry(...) {
                if (condition1) {
                        do_sth2_1;
                        do_sth2_2;
                        ...
                        do_sth2_N;
                        if (condition2) {
                                do_sth3_1;
                                do_sth3_2;
                                ...
                                do_sth3_N;
                                continue;
                        }             
                         break;
                } else {
                         do_sth1;
                }
        }
</pre>
</pre>
:or even
 
<pre style="background:lightgrey;">
* Variable should be declared one per line, type and name, even if there are multiple variables of the same type.  For maximum readability, the names should be aligned on the same column, preferably with longer declarations at the top.
    AC_CACHE_CHECK([for EMX OS/2 environment],
<pre>
                    [ac_cv_emxos2],
right:
                    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
        int          len;
                                                        [return __EMX__;])],
        int          count;
                                      [ac_cv_emxos2=yes],
        struct inode *inode;
                                      [ac_cv_emxos2=no])])
 
wrong:
        int len, count;
        struct inode *inode;
</pre>
</pre>


=== More Details ===
* Variable declarations should be kept to an internal scope, if practical and reasonable, to simplify understanding of where these variables are used:


* Each variable declaration should be on a separate line, with the names aligned to the same column:
<pre>
  <pre style="background:lightgrey;">
  right:
  int           len;
        int len;
  int          count;
       
  struct inode *inode;
        if (len > 0) {
</pre>
                int          count;
                struct inode *inode = iget(foo);
 
                count = inode->i_size;
                :
        }
</pre>


* Even for short conditionals, the operation should be on a separate line:
* Even for short conditionals, the operation should be on a separate line:
<pre style="background:lightgrey;">
<pre>
  if (foo)
right:
          bar();
        if (foo)
  </pre>
                bar();
  wrong:
        if (foo) bar();
</pre>
 
* When you wrap a line containing parenthesis, start the next line after the parenthesis so that the expression or argument is visually bracketed.
<pre>
right:
        variable = do_something_complicated(long_argument, longer_argument,
                                            longest_argument(sub_argument,
                                                            foo_argument),
                                            last_argument);


* When you wrap, the next line should start after the parenthesis:
        if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
<pre style="background:lightgrey;">
            another_long_condition(very_long_argument_name,
  right:
                                  another_long_argument_name) >
            second_long_value) {
                ...


  variable = do_something_complicated(long_argument, longer_argument,
wrong:
                                      longest_argument(sub_argument,
 
                                                      foo_argument),
        variable = do_something_complicated(long_argument, longer_argument,
                                      last_argument);
                longest_argument(sub_argument, foo_argument),
                last_argument);
  if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
 
      another_long_condition(very_long_argument_name,
        if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
                            another_long_argument_name) >
                another_long_condition(very_long_argument_name,
      second_long_value) {
                        another_long_argument_name) >
  }
                second_long_value) {
</pre>                        
                ...
  <pre style="background:lightgrey;">
</pre>
  wrong:
 
* If you're wrapping an expression, put the operator at the end of the line. If there are no parentheses to which to align the start of the next line, just indent 8 more spaces.
<pre>
        off = le32_to_cpu(fsd->fsd_client_start) +
                cl_idx * le16_to_cpu(fsd->fsd_client_size);
</pre>


  variable = do_something_complicated(long_argument, longer_argument,
                    longest_argument(sub_argument, foo_argument),
                    last_argument);
  if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
              another_long_condition(very_long_argument_name,
                    another_long_argument_name) >
                    second_long_value) {
  }
</pre>
* If you're wrapping put the operators at the end of the line, and if there are no parentheses indent 8 more:
<pre style="background:lightgrey;">
  off = le32_to_cpu(fsd->fsd_client_start) +
          cl_idx * le16_to_cpu(fsd->fsd_client_size);
</pre>
* Binary and ternary (but not unary) operators should be separated from their arguments by one space.
* Binary and ternary (but not unary) operators should be separated from their arguments by one space.
<pre style="background:lightgrey;">
<pre>
  a++;
right:
  b |= c;
        a++;
  d = f > g ? 0 : 1;
        b |= c;
</pre>
        d = (f > g) ? 0 : 1;
* Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space after commas:
<pre style="background:lightgrey;">
  right: do_foo(bar, baz);
  wrong: do_foo ( bar,baz );
</pre>
</pre>
* All ''if'', ''for'', ''while'', etc. expressions should be separated by a space from the parenthesis, one space after the semicolons:
 
<pre style="background:lightgrey;">
* Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space should appear after commas:
  for (a = 0; a < b; a++)
<pre>
  if (a < b || a == c)
  right:  
  while (1)
        do_foo(bar, baz);
</pre>
 
* Opening braces should be on the same line as the line that introduces the block, except for function calls.  Closing braces get their own line, except for "else".
wrong:
  <pre style="background:lightgrey;">
        do_foo ( bar,baz );
  int foo(void)
  {
          if (bar) {
                  this();
                  that();
          } else if (baz) {
                  ;
          } else {
                  ;
          }
          do {
                  cow();
          } while (0);
  }
</pre>
</pre>
* Put a space between ''if'', ''for'', ''while'' etc. and the following parenthesis.  Put a space after each semicolon in a ''for'' statement.
<pre>
right:
        for (a = 0; a < b; a++)
        if (a < b || a == c)
        while (1)
wrong:
        for( a=0; a<b; a++ )
        if( a<b || a==c )
        while( 1 )
</pre>
* Opening braces should be on the same line as the line that introduces the block, except for function calls.  Bare closing braces (i.e. not ''else'' or ''while'' in do/while) get their own line.
<pre>
int foo(void)
{
        if (bar) {
                this();
                that();
        } else if (baz) {
                stuff();
        } else {
                other_stuff();
        }
        do {
                cow();
        } while (condition);
}
</pre>
* If one part of a compound ''if'' block has braces, all should.
* If one part of a compound ''if'' block has braces, all should.
<pre style="background:lightgrey;">
<pre>
  right:
right:
        if (foo) {
                bar();
                baz();
        } else {
                salmon();
        }


  if (foo) {
wrong:
          bar();
        if (foo) {
          baz();
                bar();
  } else {
                baz();
          salmon();
        } else
  }
                moose();
</pre>
</pre>
<pre style="background:lightgrey;">  
 
  wrong:
* When you define a macro, protect callers by placing parentheses round every parameter reference in the body. Line up the backslashes of multi-line macros to help readability.  Use a do/while (0) block with ''no'' trailing semicolon to ensure multi-statement macros are syntactically equivalent to procedure calls.
<pre>
  if (foo) {
/* right */
          bar();
#define DO_STUFF(a)                             \
          baz();
do {                                             \
  } else
        int b = (a) + MAGIC;                     \
          moose();
        do_other_stuff(b);                       \
} while (0)
 
/* wrong */
#define DO_STUFF(a) \
do { \
        int b = a + MAGIC; \
        do_other_stuff(b); \
} while (0);
</pre>
</pre>
* When you make a macro, protect those who might call it by using do/while and parentheses; line up your backslashes:
 
<pre style="background:lightgrey;">
* If you write conditionally compiled code in a procedure body, make sure you do not create unbalanced braces, quotes, etc.  This really confuses editors that navigate expressions or use fonts to highlight language features.  It can often be much cleaner to put the conditionally compiled code in its own helper function which, by good choice of name, documents itself too.
  right:
<pre>
/* right */
  #define DO_STUFF(a)                             \
static inline int invalid_dentry(struct dentry *d)
  do {                                             \
{
          int b = (a) + MAGIC;                     \
#ifdef DCACHE_LUSTRE_INVALID
          do_other_stuff(b);                      \
        return d->d_flags & DCACHE_LUSTRE_INVALID;
  } while (0)
#else
</pre>
        return d_unhashed(d);
<pre style="background:lightgrey;">
#endif
  wrong:
}
 
  #define DO_STUFF(a) \
int do_stuff(struct dentry *parent)
  { \
{
          int b = a + MAGIC; \
        if (invalid_dentry(parent)) {
          do_other_stuff(b); \
                ...
  }
       
/* wrong */
int do_stuff(struct dentry *parent)
{
#ifdef DCACHE_LUSTRE_INVALID
        if (parent->d_flags & DCACHE_LUSTRE_INVALID) {
#else
        if (d_unhashed(parent)) {
#endif
                ...
</pre>
</pre>
* If you nest preprocessor commands, use spaces to visually delineate:
* If you nest preprocessor commands, use spaces to visually delineate:
<pre style="background:lightgrey;">
<pre>
  #ifdef __KERNEL__
#ifdef __KERNEL__
  # include <goose>
# include <goose>
  # define MOOSE steak
# define MOOSE steak
  #else
#else
  # include <mutton>
# include <mutton>
  # define MOOSE prancing
# define MOOSE prancing
  #endif
#endif
</pre>
</pre>
* For very long #ifdefs include the conditional with each #endif to make it readable:
 
<pre style="background:lightgrey;">
* For very long #ifdefs, include the conditional with each #endif to make it readable:
  #ifdef __KERNEL__
<pre>
  # if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
#ifdef __KERNEL__
  /* lots
# if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
    of
/* lots
    stuff */
  of
  # endif /* KERNEL_VERSION(2,5,0) */
  stuff */
  #else /* !__KERNEL__ */
# endif /* KERNEL_VERSION(2,5,0) */
  # if HAVE_FEATURE
#else /* !__KERNEL__ */
  /* more
# if HAVE_FEATURE
  * stuff */
/* more
  # endif
* stuff */
  #endif /* __KERNEL__ */  
# endif
#endif /* __KERNEL__ */
</pre>
</pre>
* Comments should have the leading '''/*''' on the same line as the comment, and the trailing '''*/''' at the end of the last comment line.  Intermediate lines should start with a '''*''' aligned with the first line's '''*''':
 
<pre style="background:lightgrey;">  
* Comments should have the leading '/*' on the same line as the comment and the trailing '*/' at the end of the last comment line.  Intermediate lines should start with a '*' aligned with the '*' on the first line:
  /* This is a short comment */
<pre>
        /* This is a short comment */
  /* This is a multi-line comment.  I wish the line would wrap already,
 
  * as I don't have much to write about. */
        /* This is a multi-line comment.  I wish the line would wrap already,
        * as I don't have much to write about. */
</pre>
</pre>
* Conditional expressions read more clearly if boolean expressions are implicit, but non-boolean and pointer expressions compare explicitly with 0 and NULL respectively - e.g...
<pre style="background:lightgrey;">
  right:


  if (inode != NULL &&    /* valid inode? */
* Function declarations absolutely should NOT go into .c files, unless they are forward declarations for static functions that can't otherwise be moved before the caller.  Instead, the declaration should go into the most "local" header available (preferably *_internal.h for a given piece of code).
      !writing &&          /* not writing? */
 
      ref_count == 0)      /* no more references? */
* Structure and constant declarations should not be declared in multiple places.  Put the struct into the most "local" header possible.  If it is something that is passed over the wire, it needs to go into lustre_idl.h and needs to be correctly swabbed when the RPC message is unpacked.
          do_this();
</pre>
<pre style="background:lightgrey;">
  wrong:


  if (inode &&            /* valid inode? */
* The types and printf/printk formats used by Lustre code are:
      writing == 0 &&      /* not writing? */
<pre>
      !ref_count)         /* no more references? */
        __u64                LPU64/LPX64/LPD64 (unsigned, hex, signed)
          do_this();
        size_t                LPSZ (or cast to int and use %u / %d)
        __u32/int            %u/%x/%d (unsigned, hex, signed)
        (unsigned) long long  %llu/%llx/%lld
        loff_t                %lld after a cast to long long (unfortunately)
</pre>
</pre>
* Parentheses and line breaks help make conditional expressions more readable.
<pre style="background:lightgrey;">
  right:


  if (a->a_field == 3 ||
* For Autoconf macros, follow the [http://www.gnu.org/software/autoconf/manual/html_node/Coding-Style.html style suggested in the autoconf manual].
      ((b->b_field & BITMASK1) && (c->c_field & BITMASK2)))
          do this();
</pre>
<pre style="background:lightgrey;">
  wrong:


  if (a->a_field == 3 || b->b_field & BITMASK1 && c->c_field & BITMASK2)
<pre>
            do this()
        AC_CACHE_CHECK([for EMX OS/2 environment], [ac_cv_emxos2],
        [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [return __EMX__;])],
                          [ac_cv_emxos2=yes],
                          [ac_cv_emxos2=no])])
</pre>
</pre>
* Function declarations absolutely should NOT go into .c files, unless they are forward declarations for static functions that can't otherwise be moved before the caller.  Instead, the declaration should go into the most "local" header available (preferrably *_internal.h for a given piece of code).
:or_even
* Structure and constant declarations should not be declared in multiple places.  Put the struct into the most "local" header possible.  If it is something that is passed over the wire it needs to go into lustre_idl.h, and needs to be correctly swabbed when the RPC message is unpacked.
<pre>
* The types and printf/printk formats used by Lustre code are:
      AC_CACHE_CHECK([for EMX OS/2 environment],
<pre style="background:lightgrey;">
                      [ac_cv_emxos2],
  __u64                LPU64/LPX64/LPD64 (unsigned, hex, signed)
                      [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
  size_t                LPSZ (or cast to int and use %u / %d)
                                                          [return __EMX__;])],
  __u32/int            %u/%x/%d (unsigned, hex, signed)
                                        [ac_cv_emxos2=yes],
  (unsigned) long long  %llu/%llx/%lld
                                        [ac_cv_emxos2=no])])
  loff_t                %lld after a cast to long long (unfortunately)
</pre>
</pre>
----
----
''

Latest revision as of 09:25, 4 September 2012

(Updated: Jan 2010)

Beautiful Code

A note from Eric Barton, a Lustre pioneer:

More important than the physical layout of code (which is covered in detail below) is the idea that the code should be beautiful to read.

What makes code beautiful to me? Fundamentally, it's readability and obviousness. The code must not have secrets but should flow easily, pleasurably and accurately off the page and into the mind of the reader.

How do I think beautiful code is written? Like this...

  • The author must be confident and knowledgeable and proud of her work. She must understand what the code should do, the environment it must work in, all the combinations of inputs, all the valid outputs, all the possible races and all the reachable states. She must grok it.
  • Names must be well chosen. The meaning a human reader attaches to a name can be orthogonal to what the compiler does with it, so it's just as easy to mislead as it is to inform. "Does exactly what it says on the tin" is a popular UK English expression describing something that does exactly what it tells you it's going to do, no more and no less. For example, if I open a tin labeled "soap", I expect the contents to help me wash and maybe even smell nice. If it's no good at removing dirt, I'll be disappointed. If it removes the dirt but burns off a layer of skin with it, I'll be positively upset. The name of a procedure, a variable or a structure member should tell you something informative about the entity without misleading - just "what it says on the tin".
  • Names must be well chosen. Local, temporary variables can almost always remain relatively short and anonymous, while names in global scope must be unique. In general, the wider the context you expect to use the name in, the more unique and informative the name should be. Don't be scared of long names if they help to make_the_code_clearer, but do_not_let_things_get_out_of_hand either - we don't write COBOL. Related names should be obvious, unambiguous and avoid naming conflicts with other unrelated names, e.g. by using a consistent prefix. This applies to all API procedures (if not all procedures period) within a given subsystem. Similarly, unique member names for global structures, using a prefix to identify the parent structure type, helps readability.
  • Names must be well chosen. Don't choose names that are easily confused - especially not if the compiler can't even tell the difference when you make a spelling mistake. i and j aren't the worst example - rq_reqmsg and rq_repmsg are much worse (and taken from our own code!!!).
  • Names must be well chosen. I can't emphasize this issue enough - I hope you get the point.
  • Assertions must be used intelligently. They combine the roles of active comment and software fuse. As an active comment they tell you something about the program that you can trust more than a comment. And as a software fuse, they provide fault isolation between subsystems by letting you know when and where invariant assumptions are violated. Overuse must be avoided - it hurts performance without helping readability - and any other use is just plain wrong. For example, assertions must never be used to validate data read from disk or the network. Network and disk hardware does fail and Lustre has to handle that - it can't just crash. The same goes for user input. Checking data copied in from userspace with assertions just opens the door for a denial of service attack.
  • Formatting and indentation rules should be followed intelligently. The visual layout of the code on the page should lend itself to being read easily and accurately - it just looks clean and good.
    • Separate "ideas" should be separated clearly in the code layout using blank lines that group related statements and separate unrelated statements.
    • Procedures should not ramble on. You must be able to take in the meaning of a procedure without scrolling past page after page of code or parsing deeply nested conditionals and loops. The 80-column rule is there for a reason.
    • Declarations are easier to refer to while scanning the code if placed in a block locally to, but visually separate from, the code that uses them. Readability is further enhanced by limiting declarations to one per line and aligning types and names vertically.
    • Parameters in multi-line procedure calls should be aligned so that they are visually contained by their brackets.
    • Brackets should be used in complex expressions to make operator precedence clear.
    • Conditional boolean (if (expr)), scalar (if (val != 0)) and pointer (if (ptr != NULL)) expressions should be written consistently.
    • Formatting and indentation rules should not be followed slavishly. If you're faced with either breaking the 80-chars-per-line rule or the parameter indentation rule or creating an obscure helper function, then the 80-chars-per-line rule might have to suffer. The overriding consideration is how the code reads.

I could go on, but I hope you get the idea. Just think about the poor reader when you're writing, and whether your code will convey its meaning naturally, quickly and accurately, without room for misinterpretation.

I didn't mention clever as a feature of beautiful code because it's only one step from clever to tricky - consider...

  t = a; a = b; b = t; /* dumb swap */
  a ^= b; b ^= a; a ^= b;  /* clever swap */

You could feel quite pleased that the clever swap avoids the need for a local temporary variable - but is that such a big deal compared with how quickly, easily and accurately the reader will read it? This is a very minor example which can almost be excused because the "cleverness" is confined to a tiny part of the code. But when clever code gets spread out, it becomes much harder to modify without adding defects. You can only work on code without screwing up if you understand the code and the environment it works in completely. Or to put it more succinctly...

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan

IMHO, beautiful code helps code quality because it improves communication between the code author and the code reader. Since everyone maintaining and developing the code is a code reader as well as a code author, the quality of this communication can lead either to a virtuous circle of improving quality, or a vicious circle of degrading quality. You, dear reader, will determine which.


Style and Formatting Guidlelines

All of our rules for formatting, wrapping, parenthesis, brace placement, etc., are originally derived from the Linux kernel rules, which are basically K&R style.

Whitespace

Whitespace gets its own section because unnecessary whitespace changes can cause spurious merge conflicts when code is landed and updated in a distributed development environment. Please ensure that you comply with the guidelines in this section to avoid these issues. We've included default formatting rules for emacs and vim to help make it easier.

  • Tabs should be used in all 2.3 and later lustre/, lnet/ and libcfs/ files. This matches the upstream Linux kernel coding style, and is the default method of code indentation.
  • NOTE NOTE NOTE The use of tabs for indentation is a reversal from previous Lustre coding guidelines, since May 2012 and Lustre 2.3. This is being done in order to facilitate code integration with the Linux kernel. All new patches should be submitted using tabs for ALL modified lines in the patch. If there are 6 or fewer lines using spaces for indentation between two lines changed by a patch, then all of the intervening lines should also have the indentation changed to use tabs. Similarly, if there are only a handful of lines at the start or end a modified function or test that are still using spaces for indentation, convert all of the lines in that function or test to use tabs. In this manner, we can migrate consistent chunks of code over to tabs without having a 250kLOC patch breaking the commit history of every line of code, and also avoid breaking code that is in existing branches/patches that still need to merge. In a year or so, lines that still use spaces (i.e. those that are not under active development) will be converted to using tabs for indentation as well.
  • All lines should wrap at 80 characters. If it's getting too hard to wrap at 80 characters, you probably need to rearrange conditional order or break it up into more functions.
right:

void func_helper(...)
{
        do_sth2_1;

        if (cond3)
	        do_sth_which_needs_a_very_long_line_to_read_clearly;

        do_sth2_2;
}

void func (...)
{
        if (!cond1)
                return;

        do_sth1_1;

        if (cond 2)
                func_helper(...)

        do_sth1_2;
}

wrong:

void func(...)
{
        if (cond1) {
                do_sth1_1;
                if (cond2) {
                        do_sth2_1;
                        if (cond3) {
                                do_sth_which_needs_a_very_long_line_to_read_clearly;
                        }
                        do_sth2_2;
                }
                do_sth1_2;
        }
}

  • Do not include spaces or tabs on blank lines or at the end of lines. Please ensure you remove all instances of these in any patches you submit to Bugzilla. You can find them with grep or in vim using the following regexps:
 /[ \t]$/
Alternatively, if you use vim, you can put this line in your vimrc file, which will highlight whitespace at the end of lines and spaces followed by tabs in indentation (only works for C/C++ files):
let c_space_errors=1
Or you can use this command, which will make tabs and whitespace at the end of lines visible for all files (but a bit more discretely):
set list listchars=tab:>\ ,trail:$
In emacs, you can use (whitespace-mode) or (whitespace-visual-mode) depending on the version. You could also consider using (flyspell-prog-mode).

C Language Features

  • Don't use inline unless you're doing something so performance critical that the function call overhead will make a difference -- in other words: almost never. It makes debugging harder and overuse can actually hurt performance by causing instruction cache or stack overflow.
  • Use typedef carefully...
    • Do not create a new integer typedef without a good reason.
    • Always postfix typedef names with _t so that they can be identified clearly in the code.
    • Never typedef pointers. The * makes C pointer declarations obvious. Hiding it inside a typedef just obfuscates the code.
  • Do not embed assignments inside boolean expressions. Although this can make the code more concise, it doesn't necessarily make it more elegant and you increase the risk of confusing "=" with "==" or getting operator precedence wrong if you skimp on brackets. It's even easier to make mistakes when reading the code, so it's much safer simply to avoid it altogether.
 right:
        ptr = malloc(size);
        if (ptr != NULL) {
                ...

 wrong:
        if ((ptr = malloc(size)) != NULL) {
                ...
  • Conditional expressions read more clearly if only boolean expressions are implicit (i.e., non-boolean and pointer expressions compare explicitly with 0 and NULL respectively.)
 right:
        if (!writing &&          /* not writing? */
            inode != NULL &&     /* valid inode? */
            ref_count == 0)      /* no more references? */
                do_this();

 wrong:
        if (writing == 0 &&      /* not writing? */
            inode &&             /* valid inode? */
            !ref_count)          /* no more references? */
                do_this();
  • Use parentheses to help readability and reduce the chance of operator precedence errors, but not so heavily that it is difficult to determine which parentheses are a matched pair.
 right:
        if (a->a_field == 3 ||
            ((b->b_field & BITMASK1) && (c->c_field & BITMASK2)))
                do this();

 wrong:
        if (a->a_field == 3 || b->b_field & BITMASK1 && c->c_field & BITMASK2)
                 do this()

 wrong:
        if (((a->a_field == 3) || ((b->b_field & (BITMASK1)) &&
            (c->c_field & (BITMASK2)))))
                 do this()

Lustre Guidelines

  • The types and printf()/printk() formats used by Lustre code are:
        __u64                LPU64/LPX64/LPD64 (unsigned, hex, signed)
        __u32/int            %u/%x/%d (unsigned, hex, signed)
        (unsigned)long long  %llu/%llx/%lld
        loff_t               %lld after a cast to long long (unfortunately)
  • Functions and files should be documented using the Doxygen markup style:
/**
 * Implements cl_page_operations::cpo_make_ready() method for Linux.
 *
 * This is called to yank a page referred to by \a slice from the transfer
 * cache and to send it out as a part of transfer. This function try-locks
 * the page. If try-lock failed, page is owned by some concurrent IO, and
 * should be skipped (this is bad, but hopefully rare situation, as it usually
 * results in transfer being shorter than possible).
 *
 * \param  env     lu environment for large temporary stack variables
 * \param  slice   per-layer page structure being prepared
 * \retval 0       success, page can be placed into transfer
 * \retval -EAGAIN page is either used by concurrent IO has been
 *                 truncated. Skip it.
 */
static int vvp_page_make_ready(const struct lu_env *env,
                               const struct cl_page_slice *slice)
{
  • Use list_for_each_entry() instead of list_for_each followed by list_entry
  • When using sizeof() it should be used on the variable itself, rather than specifying the type of the variable, so that if the variable changes type/size then sizeof() will be correct:
 right:
         int *array;

         OBD_ALLOC(array, 10 * sizeof(*array));

 wrong:
         OBD_ALLOC(array, 10 * sizeof(int));     /* breaks if array becomes __u64 */

 wrong:
         OBD_ALLOC(array, 10 * sizeof(array));   /* This is the pointer size */
    
  • When allocating/freeing a single struct, use OBD_ALLOC_PTR() for clarity:
 right:
         OBD_ALLOC_PTR(mds_body);
         OBD_FREE_PTR(mds_body);

 wrong:
         OBD_ALLOC(mds_body, sizeof(*mds_body));
         OBD_FREE(mds_body, sizeof(*mds_body));
  • Do not embed operations inside assertions. If assertions are disabled for performance reasons this code will not be executed.
 right:
        len = strcat(foo, bar);
        LASSERT(len > 0);

 wrong:
        LASSERT(strcat(foo, bar) > 0);
  • Messages on the console (CERROR, CWARN, LCONSOLE_*) should print the OBD device name or filesystem name where the error is hit, since there are usually multiple targets running on a single server. The error messages should also include enough information to make some useful diagnosis of the problem (e.g. FID and/or filename, client NID, etc). Otherwise, there is little value in having printed the error, but then having to try and reproduce the problem to diagnose it:
right:
        LCONSOLE_INFO("MDS %s: %s now active, resetting orphans\n",
                      obd->obd_name, obd_uuid2str(uuid));
        LCONSOLE_WARN("%s: new disk, initializing\n", obd->obd_name);
        CERROR("%s: unsupported incompat filesystem feature(s) %x\n", obd->obd_name,

wrong:
        CERROR("Cannot get thandle\n");
        CERROR("NULL bitmap!\n");
        CERROR("invalid event\n");
  • Error messages that print a numeric error value should print it at the end of the line in a consistent format:
right:
        CERROR("%s: error invoking upcall %s %s %s: rc = %d",
        CERROR("%s: cannot open/create O: rc = %d\n", obd->obd_name,rc);

wrong:
        CERROR("err %d on param '%s'\n", rc, ptr);
        CERROR("Can't get index (%d)\n", rc);

Layout

  • Code can be much more readable if the simpler actions are taken first in a set of tests. Re-ordering conditions like this also eliminates excessive nesting.
right:
        list_for_each_entry(...) {

                if (!condition1) {
                        do_sth1;
                        continue;
                }

                do_sth2_1;
                do_sth2_2;
                ...
                do_sth2_N;

                if (!condition2)
                        break;

                do_sth3_1;
                do_sth3_2;
                ...
                do_sth3_N;
        }
wrong:
        list_for_each_entry(...) {
                if (condition1) {
                        do_sth2_1;
                        do_sth2_2;
                        ...
                        do_sth2_N;
                        if (condition2) {
                                do_sth3_1;
                                do_sth3_2;
                                ...
                                do_sth3_N;
                                continue;
                        }               
                        break;
                } else {
                        do_sth1;
                }
        }
  • Variable should be declared one per line, type and name, even if there are multiple variables of the same type. For maximum readability, the names should be aligned on the same column, preferably with longer declarations at the top.
 right:
        int           len;
        int           count;
        struct inode *inode;

 wrong:
        int len, count;
        struct inode *inode;
  • Variable declarations should be kept to an internal scope, if practical and reasonable, to simplify understanding of where these variables are used:
 right:
        int len;
        
        if (len > 0) {
                int           count;
                struct inode *inode = iget(foo);

                count = inode->i_size;
                :
        }
  • Even for short conditionals, the operation should be on a separate line:
 right:
        if (foo)
                bar();
 wrong:
        if (foo) bar();
  • When you wrap a line containing parenthesis, start the next line after the parenthesis so that the expression or argument is visually bracketed.
 right:
        variable = do_something_complicated(long_argument, longer_argument,
                                            longest_argument(sub_argument,
                                                             foo_argument),
                                            last_argument);

        if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
            another_long_condition(very_long_argument_name,
                                   another_long_argument_name) >
            second_long_value) {
                ...

 wrong:

        variable = do_something_complicated(long_argument, longer_argument,
                longest_argument(sub_argument, foo_argument),
                last_argument);

        if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
                another_long_condition(very_long_argument_name,
                        another_long_argument_name) >
                second_long_value) {
                ...
  • If you're wrapping an expression, put the operator at the end of the line. If there are no parentheses to which to align the start of the next line, just indent 8 more spaces.
        off = le32_to_cpu(fsd->fsd_client_start) +
                cl_idx * le16_to_cpu(fsd->fsd_client_size);
  • Binary and ternary (but not unary) operators should be separated from their arguments by one space.
 right:
        a++;
        b |= c;
        d = (f > g) ? 0 : 1;
  • Function calls should be nestled against the parentheses, the parentheses should crowd the arguments, and one space should appear after commas:
 right: 
        do_foo(bar, baz);

 wrong:
        do_foo ( bar,baz );
  • Put a space between if, for, while etc. and the following parenthesis. Put a space after each semicolon in a for statement.
 right:
        for (a = 0; a < b; a++)
        if (a < b || a == c)
        while (1)
 wrong:
        for( a=0; a<b; a++ )
        if( a<b || a==c )
        while( 1 )
  • Opening braces should be on the same line as the line that introduces the block, except for function calls. Bare closing braces (i.e. not else or while in do/while) get their own line.
int foo(void)
{
        if (bar) {
                this();
                that();
        } else if (baz) {
                stuff();
        } else {
                other_stuff();
        }

        do {
                cow();
        } while (condition);
}
  • If one part of a compound if block has braces, all should.
 right:
        if (foo) {
                bar();
                baz();
        } else {
                salmon();
        }

 wrong:
        if (foo) {
                bar();
                baz();
        } else
                moose();
  • When you define a macro, protect callers by placing parentheses round every parameter reference in the body. Line up the backslashes of multi-line macros to help readability. Use a do/while (0) block with no trailing semicolon to ensure multi-statement macros are syntactically equivalent to procedure calls.
/* right */
#define DO_STUFF(a)                              \
do {                                             \
        int b = (a) + MAGIC;                     \
        do_other_stuff(b);                       \
} while (0)

/* wrong */
#define DO_STUFF(a) \
do { \
        int b = a + MAGIC; \
        do_other_stuff(b); \
} while (0);
  • If you write conditionally compiled code in a procedure body, make sure you do not create unbalanced braces, quotes, etc. This really confuses editors that navigate expressions or use fonts to highlight language features. It can often be much cleaner to put the conditionally compiled code in its own helper function which, by good choice of name, documents itself too.
/* right */
static inline int invalid_dentry(struct dentry *d)
{
#ifdef DCACHE_LUSTRE_INVALID
        return d->d_flags & DCACHE_LUSTRE_INVALID;
#else
        return d_unhashed(d);
#endif
}

int do_stuff(struct dentry *parent)
{
        if (invalid_dentry(parent)) {
                ...
        
/* wrong */
int do_stuff(struct dentry *parent)
{
#ifdef DCACHE_LUSTRE_INVALID
        if (parent->d_flags & DCACHE_LUSTRE_INVALID) {
#else
        if (d_unhashed(parent)) {
#endif
                ...
  • If you nest preprocessor commands, use spaces to visually delineate:
#ifdef __KERNEL__
# include <goose>
# define MOOSE steak
#else
# include <mutton>
# define MOOSE prancing
#endif
  • For very long #ifdefs, include the conditional with each #endif to make it readable:
#ifdef __KERNEL__
# if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
/* lots
   of
   stuff */
# endif /* KERNEL_VERSION(2,5,0) */
#else /* !__KERNEL__ */
# if HAVE_FEATURE
/* more
 * stuff */
# endif
#endif /* __KERNEL__ */
  • Comments should have the leading '/*' on the same line as the comment and the trailing '*/' at the end of the last comment line. Intermediate lines should start with a '*' aligned with the '*' on the first line:
        /* This is a short comment */

        /* This is a multi-line comment.  I wish the line would wrap already,
         * as I don't have much to write about. */
  • Function declarations absolutely should NOT go into .c files, unless they are forward declarations for static functions that can't otherwise be moved before the caller. Instead, the declaration should go into the most "local" header available (preferably *_internal.h for a given piece of code).
  • Structure and constant declarations should not be declared in multiple places. Put the struct into the most "local" header possible. If it is something that is passed over the wire, it needs to go into lustre_idl.h and needs to be correctly swabbed when the RPC message is unpacked.
  • The types and printf/printk formats used by Lustre code are:
        __u64                 LPU64/LPX64/LPD64 (unsigned, hex, signed)
        size_t                LPSZ (or cast to int and use %u / %d)
        __u32/int             %u/%x/%d (unsigned, hex, signed)
        (unsigned) long long  %llu/%llx/%lld
        loff_t                %lld after a cast to long long (unfortunately)
        AC_CACHE_CHECK([for EMX OS/2 environment], [ac_cv_emxos2],
        [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [return __EMX__;])],
                           [ac_cv_emxos2=yes],
                           [ac_cv_emxos2=no])])
or_even
       AC_CACHE_CHECK([for EMX OS/2 environment],
                      [ac_cv_emxos2],
                      [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
                                                          [return __EMX__;])],
                                         [ac_cv_emxos2=yes],
                                         [ac_cv_emxos2=no])])