WARNING: This is the _old_ Lustre wiki, and it is in the process of being retired. The information found here is all likely to be out of date. Please search the new wiki for more up to date information.

Coding Guidelines

From Obsolete Lustre Wiki
Revision as of 18:56, 22 February 2008 by Eeb (talk | contribs)

All Lustre developers should follow the guidelines on this page very strictly to avoid problems during code merges later on. Please change the default formatting rules in the editor you use so that it complies with the guidelines below.

Beautiful Code

More important than the physical layout of code (which is covered in detail below) is the thought that code is beautiful to read. What makes code "beautiful" to me? Fundamentally, I'm talking about readability and obviousness - the code must not have secrets.

  • Separate "ideas" are clearly separated in the program layout - e.g. blank lines group related statements and separate unrelated statements.
  • Declarations are easy to read. The declarations should always be separated from the functional code. Declarations should be made as close as possible to the code that uses them.
  • If it's a choice between breaking the 80-chars-per-line rule and code that doesn't look good (e.g. breaks indentation rules or doesn't visually contain procedure call parameters between the brackets or spreads over too many lines), then break the 80-chars-per-line rule. We _don't_ want stupidly long lines, but we all have enough screen resolution these days to not have to wrap lines at 80 chars.
  • Names are well chosen. "Does exactly what it says on the tin" is an English (UK) expression describing something that tells you what it's going to do and then does _exactly_ that. For example, when I open a box labelled "soap", I expect it to help me wash and maybe even smell nice. I'll get upset if it's no good at removing dirt, and really upset if it makes me break out in a rash. The name of a procedure or variable or structure member should tell you something about the entity, without giving you misleading information - just "what it says on the tin".
  • Names are well chosen. This time I mean that you don't choose names that easily become a different but valid (to the compiler) name if you make a spelling mistake. i and j aren't the worst example - req_portal and rep_portal are much worse (and taken from our own code!!!).
  • Names are well chosen. Don't be scared of long names, but_do_not_go_to_extremes_either. If long
  • Names are well chosen. I can't emphasise this issue enough - I hope you get the point.
  • Intelligent use of assertions. Assertions can be abused. Obviously, when over-used they hurt performance. And they can also make you think that the code author didn't know what she was doing and added them to help her learn the code. But when assertions are used properly, they combine the roles of active comment and software fuse. As an active comment they tell you something about the program which you can trust more than a comment. As a software fuse, they provide fault isolation between subsystems by letting you know when and where invariant assumptions are violated.
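
To make the "active comment" and "software fuse" roles concrete, here is a minimal sketch. It uses the standard C assert() as a stand-in for whatever assertion macro the code base provides, and the demo_ring structure and demo_ring_put() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4        /* hypothetical capacity for the example */

struct demo_ring {
        int    dr_items[DEMO_RING_SIZE];
        size_t dr_head;         /* index of the oldest item */
        size_t dr_count;        /* number of items currently stored */
};

/* Returns 0 on success, -1 if the ring is full. */
int demo_ring_put(struct demo_ring *ring, int item)
{
        size_t tail;

        /* Active comment: callers must never pass NULL. */
        assert(ring != NULL);
        /* Software fuse: trips here if other code corrupted the ring. */
        assert(ring->dr_head < DEMO_RING_SIZE);
        assert(ring->dr_count <= DEMO_RING_SIZE);

        /* A full ring is a normal condition, so it is not asserted. */
        if (ring->dr_count == DEMO_RING_SIZE)
                return -1;

        tail = (ring->dr_head + ring->dr_count) % DEMO_RING_SIZE;
        ring->dr_items[tail] = item;
        ring->dr_count++;
        return 0;
}
```

Note the split: conditions that can legitimately happen at run time (a full ring) get an error return, while conditions that can only mean a bug get an assertion.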

I could go on, but I hope you get the idea. Notice that I didn't mention clever as a desirable attribute. It's only one step from clever to tricky - consider...

  t = a; a = b; b = t; /* dumb swap */
  a ^= b; b ^= a; a ^= b;  /* clever swap */

...which is a very minor example. It can almost be excused, because the "cleverness" is confined to a tiny part of the code. But if the clever code had sub-structure, that sub-structure would be very hard to work on. You can't be sure you're not screwing something up until you understand completely the environment in which you're working.
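
Even this tiny example has a secret. The sketch below wraps both swaps in functions (the demo_ names are hypothetical, made up for illustration) to show what happens when both pointers refer to the same variable.

```c
/* Hypothetical helpers for illustration only. */
void demo_swap_dumb(int *a, int *b)
{
        int t = *a;

        *a = *b;
        *b = t;
}

void demo_swap_clever(int *a, int *b)
{
        /* Secret: if a == b, the first XOR zeroes the value for good. */
        *a ^= *b;
        *b ^= *a;
        *a ^= *b;
}
```

Calling demo_swap_dumb(&x, &x) leaves x unchanged, while demo_swap_clever(&x, &x) sets x to 0. Nothing in the clever version's appearance warns the reader about that.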

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Brian W. Kernighan

IMHO, beautiful code helps code quality because it improves communication between the code author and the code reader. Since code modifiers are also code readers, the quality of communication can lead either to a virtuous circle of improving quality, or a vicious circle of degrading quality.


Formatting Guidelines

1. There should be no tabs in any Lustre or LNET files. The exceptions are libsysio (maintained by someone else), ldiskfs and kernel patches (also part of a non-CFS project).

2. Blocks should be indented by 8 spaces.

3. New files should contain the following along with the license boilerplate. This will cause vim and emacs to use spaces instead of tabs for indenting. If you use a different editor, it also needs to be set to use spaces for indenting Lustre code.

  /* -*- mode: c; c-basic-offset: 8; indent-tabs-mode: nil; -*-
   * vim:expandtab:shiftwidth=8:tabstop=8:
   */

4. All lines should wrap at 80 characters. If it's getting too hard to wrap there, you probably need to break it up into more functions. In some cases, it is acceptable to remove a few spaces between function arguments to avoid overflowing onto the next line.

5. Don't have spaces or tabs on blank lines or at the end of lines. Find these with some regexps in your patch (grep, or in vim) before attaching it to bugzilla:

/[ \t]$/
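
If you would rather build the check into a tool than run a regexp by hand, the same test is easy to express in C. This is only a sketch; demo_has_trailing_ws() is a hypothetical helper, not part of any existing Lustre script.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the line (ignoring one trailing newline) ends in a space
 * or a tab, which is what the pattern above matches; 0 otherwise. */
int demo_has_trailing_ws(const char *line)
{
        size_t len = strlen(line);

        /* Ignore a single trailing newline, as grep does. */
        if (len > 0 && line[len - 1] == '\n')
                len--;

        if (len == 0)
                return 0;

        return line[len - 1] == ' ' || line[len - 1] == '\t';
}
```

A "blank" line that contains only a tab is reported too, which covers both halves of the rule.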

6. Don't use "inline" unless you're doing something so performance critical that the function call overhead will make a difference -- in other words: never. It makes debugging harder.

All of our wrapping, parenthesis, brace placement, etc. rules are basically Linux kernel rules, which are basically K&R. For those of you in need of a refresher, great detail is provided below.

7. For Autoconf macros, follow the style suggested in the autoconf manual.

     AC_CACHE_CHECK([for EMX OS/2 environment], [ac_cv_emxos2],
     [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [return __EMX__;])],
                        [ac_cv_emxos2=yes],
                        [ac_cv_emxos2=no])])

or even

     AC_CACHE_CHECK([for EMX OS/2 environment],
                    [ac_cv_emxos2],
                    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
                                                        [return __EMX__;])],
                                       [ac_cv_emxos2=yes],
                                       [ac_cv_emxos2=no])])

Great detail

a. When you wrap, the next line should start after the parenthesis:

  right:
 
  variable = do_something_complicated(long_argument, longer_argument,
                                      longest_argument(sub_argument,
                                                       foo_argument),
                                      last_argument);
 
  if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
      another_long_condition(very_long_argument_name,
                             another_long_argument_name) >
      second_long_value) {
  }
                             
 
  wrong:
 
  variable = do_something_complicated(long_argument, longer_argument,
                    longest_argument(sub_argument, foo_argument),
                    last_argument);
 
  if (some_long_condition(arg1, arg2, arg3) < some_long_value &&
              another_long_condition(very_long_argument_name,
                    another_long_argument_name) >
                    second_long_value) {
  }
 

b. If you're wrapping, put the operators at the end of the line, and if there are no parentheses, indent the continuation line by 8 more spaces:

  off = le32_to_cpu(fsd->fsd_client_start) +
          cl_idx * le16_to_cpu(fsd->fsd_client_size);
 

c. Binary and ternary (but not unary) operators should be separated from their arguments by one space.

  a++;
  b |= c;
  d = f > g ? 0 : 1;
 

d. In function calls, the function name should be nestled against the opening parenthesis, the parentheses should crowd the arguments, and there should be one space after each comma:

  right: do_foo(bar, baz);
  wrong: do_foo ( bar,baz );

e. All if, for, while, etc. keywords should be separated from the opening parenthesis by one space, with one space after each semicolon:

  for (a = 0; a < b; a++)
  if (a < b || a == c)
  while (1)
 

f. Opening braces should be on the same line as the line that introduces the block, except for function definitions. Closing braces get their own line, except for "else".

  int foo(void)
  {
          if (bar) {
                  this();
                  that();
          } else if (baz) {
                  ;
          } else {
                  ;
          }
          do {
                  cow();
          } while (0);
  }

g. If one part of a compound if block has braces, all should.

  right:
 
  if (foo) {
          bar();
          baz();
  } else {
          salmon();
  }
 
  wrong:
 
  if (foo) {
          bar();
          baz();
  } else
          moose();

h. When you make a macro, protect those who might call it by using do/while and parentheses; line up your backslashes:

  right:
 
  #define DO_STUFF(a)                              \
  do {                                             \
          int b = (a) + MAGIC;                     \
          do_other_stuff(b);                       \
  } while (0)
 
  wrong:
 
  #define DO_STUFF(a) \
  { \
          int b = a + MAGIC; \
          do_other_stuff(b); \
  }
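
Here is a sketch of what actually goes wrong with the unprotected form. All names (DEMO_MAGIC, demo_last, the DEMO_STUFF_ macros and the two functions) are hypothetical and exist only for this illustration.

```c
#define DEMO_MAGIC 10

int demo_last;          /* records the value the macros computed */

#define DEMO_STUFF_RIGHT(a)                      \
do {                                             \
        int b = (a) + DEMO_MAGIC;                \
        demo_last = b;                           \
} while (0)

#define DEMO_STUFF_WRONG(a)                      \
{                                                \
        int b = a + DEMO_MAGIC;                  \
        demo_last = b;                           \
}

int demo_right(int flag)
{
        /* Behaves like one statement, so it is safe before "else". */
        if (flag >= 0)
                DEMO_STUFF_RIGHT(flag ? 1 : 2); /* (flag ? 1 : 2) + 10 */
        else
                demo_last = -1;

        return demo_last;
}

int demo_wrong(int flag)
{
        /* Expands to flag ? 1 : 2 + 10, so the addition only applies to
         * the "2".  There is deliberately no ';' after the call: in an
         * if/else like the one above, "};" before "else" would not even
         * compile. */
        DEMO_STUFF_WRONG(flag ? 1 : 2)

        return demo_last;
}
```

With flag set to 1, demo_right() yields 11 but demo_wrong() yields 1. The parentheses around (a) fix the arithmetic, and the do/while wrapper lets callers write the macro followed by a semicolon anywhere a statement is allowed.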

i. If you nest preprocessor commands, use spaces to visually delineate:

  #ifdef __KERNEL__
  # include <goose>
  # define MOOSE steak
  #else
  # include <mutton>
  # define MOOSE prancing
  #endif

j. For very long #ifdefs include the conditional with each #endif to make it readable:

  #ifdef __KERNEL__
  # if LINUX_VERSION_CODE >= KERNEL_VERSION(2,5,0)
  /* lots
   * of
   * stuff */
  # endif /* KERNEL_VERSION(2,5,0) */
  #else /* !__KERNEL__ */
  # if HAVE_FEATURE
  /* more
   * stuff */
  # endif
  #endif /* __KERNEL__ */ 

k. Comments should have the leading /* on the same line as the comment, and the trailing */ at the end of the last comment line. Intermediate lines should start with a * aligned with the first line's *:

 
  /* This is a short comment */
 
  /* This is a multi-line comment.  I wish the line would wrap already,
   * as I don't have much to write about. */

l. Function declarations absolutely should NOT go into .c files, unless they are forward declarations for static functions that can't otherwise be moved before the caller. Instead, the declaration should go into the most "local" header available (preferably *_internal.h for a given piece of code).
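
The one permitted case looks something like the sketch below: two static functions that call each other, so neither can simply be moved above its caller. The demo_ names are hypothetical, and the recursion is only there to keep the example short.

```c
/* Forward declaration: demo_is_odd() is static and is needed by
 * demo_is_even() before it can be defined. */
static int demo_is_odd(unsigned int n);

static int demo_is_even(unsigned int n)
{
        return n == 0 ? 1 : demo_is_odd(n - 1);
}

static int demo_is_odd(unsigned int n)
{
        return n == 0 ? 0 : demo_is_even(n - 1);
}

/* Non-static entry point: its declaration would belong in a header,
 * not in this .c file.  Returns 0 for even n, 1 for odd n. */
int demo_parity(unsigned int n)
{
        return demo_is_even(n) ? 0 : 1;
}
```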

m. Structure and constant declarations should not be declared in multiple places. Put the struct into the most "local" header possible. If it is something that is passed over the wire it needs to go into lustre_idl.h, and needs to be correctly swabbed when the RPC message is unpacked.
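
For readers new to swabbing, the sketch below shows the general idea only. It is not the actual Lustre unpacking code: the demo_wire_msg structure, the DEMO_WIRE_MAGIC value and the use of a magic number to detect the sender's byte order are all assumptions made for this illustration.

```c
#include <stdint.h>

/* Hypothetical wire structure; real ones belong in lustre_idl.h. */
struct demo_wire_msg {
        uint32_t dwm_magic;
        uint32_t dwm_len;
};

#define DEMO_WIRE_MAGIC 0x12345678u     /* made-up value */

/* Reverses the byte order of a 32-bit value. */
static uint32_t demo_swab32(uint32_t v)
{
        return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
                ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/* Returns 0 if the message is usable (swabbing it in place when the
 * sender had the opposite byte order), or -1 if the magic is unknown. */
int demo_wire_msg_unpack(struct demo_wire_msg *msg)
{
        if (msg->dwm_magic == DEMO_WIRE_MAGIC)
                return 0;       /* same byte order, nothing to do */

        if (msg->dwm_magic != demo_swab32(DEMO_WIRE_MAGIC))
                return -1;

        /* Every multi-byte field must be swabbed, not just the magic. */
        msg->dwm_magic = demo_swab32(msg->dwm_magic);
        msg->dwm_len = demo_swab32(msg->dwm_len);
        return 0;
}
```

Keeping the structure in a single header makes it much harder to add a field and forget to swab it.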

n. The types and printf/printk formats used by Lustre code are:

   __u64                 LPU64/LPX64/LPD64 (unsigned, hex, signed)
   size_t                LPSZ (or cast to int and use %u / %d)
   __u32/int             %u/%x/%d (unsigned, hex, signed)
   (unsigned) long long  %llu/%llx/%lld
   loff_t                %lld after a cast to long long (unfortunately)
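
The rows that rely only on standard formats can be tried outside the tree with a sketch like the one below. The LPU64-style macros are Lustre definitions and are not reproduced here. demo_format() is a hypothetical helper, and off_t stands in for loff_t so that the example builds in user space, which is an assumption.

```c
#include <stdio.h>
#include <sys/types.h>

/* Formats one value of each kind into buf and returns snprintf()'s
 * result.  off_t is used in place of loff_t (an assumption). */
int demo_format(char *buf, size_t len, unsigned int u32,
                unsigned long long ull, size_t sz, off_t off)
{
        return snprintf(buf, len, "%u %x %llu %llx %u %lld",
                        u32, u32, ull, ull, (unsigned int)sz,
                        (long long)off);
}
```

The casts are the point: the width of size_t and of the file offset types varies between architectures, so each value is cast to a type whose width matches the format it is printed with.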