Architecture - Writing Architecture Documents

Lustre Architecture Process

Resources

 * SEI books - Documenting Software Architectures, Software Architecture in Practice
 * arch@ - the Architecture team mailing list where all architectural discussion should be taking place
 * Lustre architecture wiki site - arch.lustre.org
 * Lustre Architectural Features Google spreadsheet - http://spreadsheets.google.com/a/clusterfs.com/ccc?key=pMf3Udi56jKwFpCMJDq-Mbg&hl=en
 * Good example of a feature architecture - Simple Space Balance Migration
 * Task Planning Worksheet form - https://wikis.clusterfs.com/intra/index.php/Task_Planning_Work_Sheets

General Guidelines

 * Discuss everything on arch@; avoid private discussions until the choices are accepted
 * Arch pages should be short and sweet; if printed, most components would fit on 1-2 pages, major components on maybe 5-10 (say, for the LRE)
 * Take care with formatting and make the pages look nice; people read these all over
 * Use consistent formatting; the following pages show each style:
   * Summary QAS features - Simple Space Balance Migration
   * Detailed QAS - Adaptive Timeouts
   * Definition lists - Simple Space Balance Migration

Process Outline

 * 1) Get your architecture assignments and priorities from the Lustre Architectural Features Google spreadsheet
 * 2) * this page is maintained by the Chief Architect and VP engineering
 * 3) * the Responsible Architect(s) column
 * 4) ** identifies who needs to generate the architecture
 * 5) ** is populated initially based on the current engineering organization, but may change based on priorities and negotiation with arch@
 * 6) ** the first person in the list is "on the hook" to deliver; project managers will be coming after you to complete this work
 * 7) ** the other architects are responsible for contributing to the design
 * 8) * the "link" column identifies the page on the Lustre Architecture wiki where your work should be published
 * 9) Make a list of summary requirements, one requirement per row, with an indicator of what kind of use case each one is. There are templates and patterns to aid you in finding solutions.
 * 10) quality attribute scenarios; focus on the following six attributes:
 * 11) * performance
 * 12) * availability
 * 13) * testability
 * 14) * modifiability
 * 15) * usability
 * 16) * security
 * 17) features
 * 18) implementation constraints
 * 19) Examples:
 * 20) * features - e.g. offer posix interface for this or that, or replicate file systems, but not atime
 * 21) * qualities - e.g. provide performance of 90% of hardware, propagate changes within 1 sec, recovery without semantic changes to clients
 * 22) * implementation constraints - e.g. use POSIX DMU, or use existing llog code, or use replication code from cmd2
 * 23) document this on your assigned architecture wiki page. If there are many requirements, start a separate QAS page
 * 24) Review use cases with Chief Architect and architecture team
 * 25) * send an email to arch@, requesting an inspection for your high level use cases; specify your architecture wiki page for convenience
 * 26) * this is an important quality gating step; be sure to get approval and make required updates before proceeding
 * 27) Review use cases with key customers
 * 28) * your project manager will help broker this step for you
 * 29) * the key customers for each feature are identified in the Architectural Features Google spreadsheet
 * 30) * ask them to review and provide questions and feedback on the use cases; in particular, have we overlooked any key scenarios?
 * 31) * discuss relevant feedback with arch@ and revise use cases as required
 * 32) Decompose the problem into a more detailed architectural definition
 * 33) * almost all architecture starts by decomposing the problem
 * 34) * write down the subsystems and the functionality and interfaces offered by each
 * 35) * often a list of definitions is useful, like "agent = a daemon receiving commands to move data", "coordinator = a daemon running on ... " etc.
 * 36) * it is important that the decomposition leaves no doubt about what happens where
 * 37) Write detailed behavioral architecture
 * 38) Write detailed attribute scenarios
 * 39) * pick the hardest use cases from the table
 * 40) * describe in more detail what happens when they take place, for each of the components you have identified in the decomposition phase
 * 41) * Use existing template tables from the example pages
 * 42) Sequence and State Diagrams
 * 43) * Use UML2 notation - see the petascale white paper for some examples
 * 44) * for example, "what is the RPC pattern to handle a lock revocation"
 * 45) Write detailed command invocations
 * 46) * use cases can contain exact command invocations: type mkfs.lustre --mds ...... (all arguments) to achieve ......
 * 47) Review architecture with architecture team
 * 48) * send an email to arch@, requesting an inspection for your draft architecture document; specify your architecture wiki page for convenience
 * 49) * We follow the benevolent dictatorship model: we are done when Braam says we are done.
 * 50) * Typically several weeks are needed for things to settle
 * 51) Review architecture with key customers
 * 52) * your project manager will help broker this step for you
 * 53) * the key customers for each feature are identified in the Architectural Features Google spreadsheet
 * 54) * discuss relevant feedback with arch@ and revise the architecture as required
 * 55) Cross reference
 * 56) * Add a Bugzilla link to the architecture pages (in a Reference section at the bottom)
 * 57) * Add an architecture link to the planning worksheet
 * 58) Finish up by preparing the task planning worksheet (which has its own process)
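
The summary requirements list described in the outline (one requirement per row, tagged by kind) can be kept as structured data and rendered into wiki table markup. The following is only a minimal sketch, not a template prescribed by this process: the names `Kind`, `Requirement`, and `to_wiki_table`, the row labels (F1, Q1, C1), and the column layout are all assumptions made for illustration. The three kinds and the six quality attributes come from the outline above, and the sample rows reuse its examples.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """The three kinds of summary requirement named in the process outline."""
    QUALITY = "quality"
    FEATURE = "feature"
    CONSTRAINT = "implementation constraint"


# The six quality attributes the process asks architects to focus on.
QUALITY_ATTRIBUTES = (
    "performance", "availability", "testability",
    "modifiability", "usability", "security",
)


@dataclass(frozen=True)
class Requirement:
    ident: str                       # hypothetical row label, e.g. "Q1"
    kind: Kind
    summary: str
    attribute: Optional[str] = None  # set only for quality rows

    def __post_init__(self) -> None:
        # Quality rows must name one of the six attributes; other rows must not.
        if self.kind is Kind.QUALITY:
            if self.attribute not in QUALITY_ATTRIBUTES:
                raise ValueError(
                    f"{self.ident}: unknown quality attribute {self.attribute!r}"
                )
        elif self.attribute is not None:
            raise ValueError(f"{self.ident}: only quality rows carry an attribute")


def to_wiki_table(rows: List[Requirement]) -> str:
    """Render the rows as a MediaWiki table, one requirement per row."""
    lines = ['{| class="wikitable"', "! ID !! Kind !! Attribute !! Summary"]
    for row in rows:
        lines.append("|-")
        lines.append(
            f"| {row.ident} || {row.kind.value} || "
            f"{row.attribute or '-'} || {row.summary}"
        )
    lines.append("|}")
    return "\n".join(lines)


# Sample rows taken from the examples in the outline. Filing the
# "within 1 sec" requirement under performance is an assumption.
rows = [
    Requirement("F1", Kind.FEATURE, "replicate file systems, but not atime"),
    Requirement("Q1", Kind.QUALITY, "propagate changes within 1 sec",
                attribute="performance"),
    Requirement("C1", Kind.CONSTRAINT, "use existing llog code"),
]

print(to_wiki_table(rows))
```

Validating the kind and attribute when a row is created means a mislabeled requirement is caught before the table reaches arch@ for inspection.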

Examples
Example Architectures
 * http://arch.lustre.org/index.php?title=Simple_Space_Balance_Migration
 * http://arch.lustre.org/index.php?title=QAS_Client_Cleanup - for bigger projects sometimes one separates out a whole page of QAS (requirements)
 * http://arch.lustre.org/index.php?title=VS_LRE - a larger introduction for the LRE project; this has some elements we did not discuss here, and which are generally not necessary
 * http://arch.lustre.org/index.php?title=QAS_LRE - has many use cases and several detailed ones, done with customers
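
A detailed QAS like those on the pages above is commonly written in the six-part form used in the SEI books listed under Resources: source, stimulus, artifact, environment, response, and response measure. The sketch below assumes that form; the template tables on the example pages may carry additional fields. The class name `DetailedQAS`, the helper `missing_parts`, and every field value in the draft are hypothetical, with the response borrowed from the "recovery without semantic changes to clients" example in the process outline.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class DetailedQAS:
    """One detailed quality attribute scenario in six-part form."""
    source: str            # who or what generates the stimulus
    stimulus: str          # the event arriving at the system
    artifact: str          # the part of the system that is stimulated
    environment: str       # conditions under which the stimulus occurs
    response: str          # what the system does
    response_measure: str  # how the response is judged

    def missing_parts(self) -> List[str]:
        """Return the names of parts that are still blank, in field order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical draft; the response measure is deliberately left blank
# to show how an incomplete scenario is flagged before review.
draft = DetailedQAS(
    source="a server node",
    stimulus="fails and restarts",
    artifact="clients holding open files",
    environment="normal operation",
    response="recovery without semantic changes to clients",
    response_measure="",
)

print(draft.missing_parts())
```

A check like this is a cheap way to confirm that each hard use case has a measurable response before requesting an inspection on arch@.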