How to write good API and non-API documentation in ROSE.

This chapter is mainly for developers working on the ROSE library as opposed to users developing software that uses the library. It specifies how we would like to have the ROSE library source code documented. The style enumerated here does not necessarily need to be used for projects, tests, the tutorial, user-code, etc. Each item is also presented along with our motivation for doing it this way.

ROSE uses Doxygen for two broad categories of documentation:

For documenting the ROSE API. Doxygen is able to generate the structure of the documentation, and authors fill in the descriptions.
For documenting non-API things that are nonetheless tied to a particular version of ROSE. An example is this page itself, which might change over time as ROSE evolves and which must go through ROSE's continuous integration testing and/or release testing.

Quick start

Here's an example that documents a couple of closely-related class member functions. Things to note:

Use C-style block comments for documentation.
First line (up to punctuation) is a summary – the autobrief string that shows up in tables of contents.
Use @ref when referring to another class and @p when mentioning a parameter.
Use @{ and @} to give the same documentation to both member functions.
You can easily insert HTTP links into documentation.

/** Most basic use of the partitioner.
 *
 *  This method does everything from parsing the command-line to generating an abstract syntax tree. If all is
 *  successful, then an abstract syntax tree is returned. The return value is a @ref SgAsmBlock node that contains all
 *  the detected functions. If the specimen consisted of an ELF or PE container then the parent nodes of the returned
 *  AST will lead eventually to an @ref SgProject node.
 *
 *  The command-line can be provided as a typical @c argc and @c argv pair, or as a vector of arguments. In the
 *  latter case, the vector should not include <code>argv[0]</code> or <code>argv[argc]</code> (which is always a
 *  null pointer).
 *
 *  The command-line supports a "--help" (or "-h") switch to describe all other switches and arguments, essentially
 *  generating output much like a Unix man(1) page.
 *
 *  The @p purpose should be a single line string that will be shown in the title of the man page and should not start
 *  with an upper-case letter, a hyphen, white space, or the name of the command. E.g., a disassembler tool might
 *  specify the purpose as "disassembles a binary specimen".
 *
 *  The @p description is a full, multi-line description written in the [Sawyer](https://github.com/matzke1/sawyer)
 *  markup language where "@" characters have special meaning.
 *
 *  @{ */
SgAsmBlock* frontend(int argc, char *argv[],
                     const std::string &purpose, const std::string &description) /*final*/;
virtual SgAsmBlock* frontend(const std::vector<std::string> &args,
                             const std::string &purpose, const std::string &description);
/** @} */

General Doxygen style

Both categories of documentation (API and non-API) are written as comments in C source code and follow the same style conventions.

Comment style: Whether to use block- or line-style comments is up to the author. However, authors are encouraged to use block-style comments for Doxygen documentation and line-style comments for non-Doxygen documentation so that IDEs can easily highlight them differently. Furthermore, a vertical line of "*" down the left side of block comments has two benefits: it helps developers who don't use syntax-highlighting IDEs to see that lines are part of a comment, and it provides a hint that lines matched by grep-style searching are comments rather than code. Both become more important as the size of the block comment grows, especially if it contains lines that might look like code.
Use Javadoc style: Javadoc style uses the at-sign ("@") rather than the backslash ("\") to introduce Doxygen keywords. The backslash style comes from the much less common Qt documentation system. IDEs tend to have fewer problems recognizing the Javadoc style due to its popularity and the fact that "@" is relatively uncommon in C++ code. Similarly, using "!" to mark the start of a Doxygen comment is a Qt-ism, so avoid it for the same reasons.
Explicit references: Although Doxygen will automatically create cross references to any word that has strange capitalization or underscores, using an explicit @ref will cause Doxygen to emit an error if the referent's name changes and breaks the link. Our goal is to eventually fix all Doxygen warnings so that new warnings are easy to spot.
Capitalization: Use the Wikipedia style of capitalization for pages, sections, and subsections. Namely, the first word is capitalized and all other words except proper names and abbreviations are lower-case. Titles do not end with punctuation.
"ROSE": The name of this project is "ROSE", not "Rose" and not "rose". However, within the documentation itself it's seldom necessary to mention ROSE by name.

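The style points above can be seen together in a small sketch. The class LineCounter and its members are hypothetical names invented for illustration and are not part of ROSE. The sketch uses a block comment with a column of stars and Javadoc-style tags for the Doxygen text, explicit @ref for cross references, and line-style comments for notes that should stay out of the generated documentation.

```cpp
#include <cstddef>
#include <string>

/** Counts newline-terminated lines in text.
 *
 *  Feed text with @ref insert and read the running total with @ref nLines. A final line without a trailing
 *  line-feed is not counted. */
class LineCounter {
    std::size_t nLines_ = 0;                            // private, so a line comment and no Doxygen

public:
    /** Scans @p text and adds the number of line-feed characters to the total. */
    void insert(const std::string &text) {
        // Line-style comment: a note for maintainers that stays out of the generated documentation.
        for (char ch : text) {
            if (ch == '\n')
                ++nLines_;
        }
    }

    /** Number of lines seen so far. */
    std::size_t nLines() const { return nLines_; }
};
```

Because the two kinds of comment use different syntax, an IDE can highlight them differently, and a grep match on a line that begins with a star is recognizable as documentation rather than code.
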
Doxygen documentation for non-API entities

As mentioned, one of ROSE's uses of Doxygen is for documentation not related to any specific API element (such as this page itself). This section intends to show how to document such things.

Pages or modules? Non-API documentation is generally organized into Doxygen "Related pages" and/or "Modules". The main difference is that pages are relatively large, non-hierarchical, chapter-like things, while modules are usually smaller and hierarchical. The distinction is blurry, though, because both support sections and subsections. Use this table to help decide:

Use "Related pages"	Use "Modules"
Subject is important enough to be a chapter in a book?	Subject would be an appendix in a book?
Subject should be listed in the top-level table of contents?	Subject should be listed in some broader subject's page?
User would read the entire subject linearly?	User would jump around in the subject area?
Subject has two levels of nesting?	Subject has arbitrarily deep hierarchy?
Subject's sections should appear together in a single HTML page?	Subject's sections should each be on their own HTML page?

Pages are created with Doxygen's @page directive, which takes a unique global identifier and a title. The first sentence is the auto-brief content (regardless of whether @brief is present) that will show up in the "Related pages" list. The auto-brief sentence should fit on one line, end with a period, and should not be identical to the title; it should restate the title in different words or else the table of contents looks awkward:

* @page binary_tutorial Getting started with binary analysis

* @brief Overview showing how to write binary analysis tools.

Modules, on the other hand, are created with Doxygen's @defgroup directives and the hierarchy is formed by declaring one module to be in another with @ingroup. The group is defined with a unique global identifier followed by a title. The @ingroup takes the global identifier of some other @defgroup. The first sentence is the auto-brief content regardless of whether the @brief is used:

* @defgroup installation_dependencies_boost How to install Boost
* @ingroup installation_dependencies
* @brief Instructions for installing Boost, a ROSE software dependency.

Location of documentation source? Regardless of whether one chooses to write a page or a module, the documentation needs to be placed in a C++ source file. These files should have the extension ".dox" (".docs" is acceptable too, but avoid ".doc" and ".docx") and the documentation should be written as a block comment. IDEs can be told that these files are actually C++ code, so you'll get whatever fancy comment-handling features your IDE normally provides. For example, Emacs excels at formatting C++ block comments and can reflow paragraphs, add the vertical line of stars, spell check, highlight Doxygen tags, etc.

These ".dox" files can live anywhere in the ROSE source tree, but we prefer that they're somewhere under the top-level "docs" directory along with all the non-Doxygen documentation. Once you've added the new file, you should edit "docs/Rose/rose.cfg.in", find the INPUT variable, and add your new file to the list. For Doxygen "pages", the position in the list determines the order of that page relative to other pages. Doxygen might still find your file if you fail to list it in the INPUT variable, but it will be sorted more or less alphabetically.
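As a rough sketch of what such a file might contain, the following combines the @page example above with sections and a cross reference to a module. The file name and the section identifiers are hypothetical, and the body text is placeholder only. The page and module identifiers are the ones used in the examples above.

```cpp
// File: BinaryTutorial.dox (hypothetical name; the file holds nothing but this comment)

/** @page binary_tutorial Getting started with binary analysis
 *
 *  @brief Overview showing how to write binary analysis tools.
 *
 *  @section binary_tutorial_setup Setting up
 *
 *  Placeholder text. Dependencies are described in @ref installation_dependencies_boost.
 *
 *  @subsection binary_tutorial_setup_build Building a first tool
 *
 *  Placeholder text. */
```

Since the file is valid C++ that consists only of a comment, an editor configured to treat ".dox" files as C++ can reflow and spell check it like any other block comment.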

Doxygen documentation for API entities

The original purpose of Doxygen is to document the files, namespaces, classes, functions, and other types that compose an API. Doxygen automatically generates the document structure from C++ declarations, and the API author fills in what cannot be generated automatically, which is the majority of the text. The bullets below reference this declaration:

public: std::vector<std::string> splitString(const std::string &inputString, const std::string &separator);

Co-location: The Doxygen comment should be adjacent to the thing it documents. Some people claim this unnecessarily clutters the header file and that the comment should be in a separate file, but the counter argument is that by having documentation near the declaration it is more likely to be updated if the declaration changes. Also, the cluttering-up claim is made moot by any reasonably capable IDE, especially if we separate API and implementation documentation by using C-style block comments for one and C++ line comments for the other.
Auto brief: The Doxygen configuration is set up so that the first sentence of documentation gets used as the @brief value without having to specify @brief. The brief content should be concise! In particular, it should not start with "This function..." (since Doxygen context provides that), it should easily fit on one line, it should not repeat information obvious from the declaration, and it should end with a period. Example: "Splits a string into substrings according to separator strings."
Public versus private: Every public and protected part of the API must be documented. Documentation is as important as implementation; even so-called "self documenting" practices need additional human-written descriptions to make them useful to users that might not be familiar with a certain technique or algorithm. If some entity is not documented then it is not worthy of being a member of the API! Eventually we will disable the switches that allow Doxygen to generate stub documentation for non-documented parts of the API. The private things should not be documented with Doxygen because if someone's using these they need to be reading the source anyway.
Description: All API entities must have a clear description unless the auto-brief statement together with the declaration entirely captures all details of interest to a user, which is seldom the case. The description should describe any preconditions and postconditions, say what happens if an error is detected, and provide an example directly or indirectly if appropriate. Type information need not be repeated since it's already documented in the declaration. Example: "The @p inputString is scanned to find each non-overlapping occurrence of the @p separator string from left to right. The substrings between the identified separators are returned in the order they occur, including empty substrings. If no separator is found in the input string then only the input string is returned, even if it is empty."
Function parameters: Function parameters need to be documented when their type and/or name is not sufficient. They can be documented in list format or as part of the function's description. Use the @p formatting tag when referring to a parameter. It may work better to document related parameters in a descriptive paragraph than listing each one separately. If using a list, there's no need to include a parameter if the parameter is sufficiently documented in the main description or by the declaration; combine closely related parameters into a single item; attempt to minimize forward references by rewording or reordering.
To-do lists: If you need to mark documentation that should be fixed, use the @todo tag and include a description of what needs to be fixed. Also include your name (i.e., the person who thinks there's a problem) and the date.
Author name: Do not insert your own name as the author. There are a number of reasons: first, we all work on all parts of ROSE to some degree and most of us would not be willing to remove another author's name if we make edits to the documentation even if that author is no longer with the ROSE team, which leads to the name eventually becoming inaccurate and/or misleading. Second, if some names are inaccurate then none of the names can be trusted. Finally, the question of who wrote what is answered better by the revision control system than by annotations in the source code.
Proofread: Proofread your documentation in a web browser after Doxygen runs. One common error is for Doxygen to make a link to a capitalized word (like Function, at least as this is being written) that happens to also be an entity in the API. Prefix such words with a percent sign when the link is unintended. Likewise, authors should try to avoid class and namespace names that are also common words in order to prevent Doxygen from suddenly making those words links throughout all documentation. The documentation is generated by running make -C $ROSE_BLD/docs/Rose doxygen_docs or by invoking the "doxygen" command on the "$ROSE_BLD/docs/Rose/rose.cfg" file. In a web browser, open the resulting "$ROSE_BLD/docs/Rose/ROSE_WebPages/ROSE_HTML_Reference/index.html" file.
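Putting the bullets together, the declaration above might be documented roughly as follows. This is a sketch only: it is written as a free function rather than a class member so that it stands alone, the body is one possible implementation of the documented behavior, and the handling of an empty separator is an assumption made here, not something the text above specifies.

```cpp
#include <string>
#include <vector>

/** Splits a string into substrings according to separator strings.
 *
 *  The @p inputString is scanned to find each non-overlapping occurrence of the @p separator string from left to
 *  right. The substrings between the identified separators are returned in the order they occur, including empty
 *  substrings. If no separator is found in the input string then only the input string is returned, even if it is
 *  empty.
 *
 *  An empty @p separator never matches (an assumption of this sketch). */
std::vector<std::string> splitString(const std::string &inputString, const std::string &separator) {
    std::vector<std::string> parts;

    // Assumption: an empty separator is treated as "not found" so the scan below cannot loop forever.
    if (separator.empty()) {
        parts.push_back(inputString);
        return parts;
    }

    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type found = inputString.find(separator, start);
        if (found == std::string::npos)
            break;
        parts.push_back(inputString.substr(start, found - start));
        start = found + separator.size();               // resume after the separator, so matches never overlap
    }
    parts.push_back(inputString.substr(start));         // text after the last separator, possibly empty
    return parts;
}
```

The first sentence serves as the auto-brief, the description uses @p for both parameters instead of a separate parameter list, and the implementation notes use line comments so they stay out of the generated API documentation.
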

Doxygen documentation for AST nodes

AST nodes (e.g., SgNode) are a special case because they are generated by ROSETTA, a code generator, so we can't document them inside the code. Instead, we have a separate file for each documented AST node. (Many files are missing because many AST nodes still lack documentation.)

These files are in $ROSE_SRC/docs/testDoxygen and are named NodeType.docs, e.g., SgConstructorInitializer.docs.

The files follow normal Doxygen formatting, except that each comment must name the item being documented, e.g., start the comment for a class with "\class SgConstructorInitializer".

Doxygen searches for these files automatically, so new files can be added freely, although each file name should match the name of the AST node class being documented.
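A minimal sketch of such a file is shown below. The summary and description are placeholder text, not the real documentation of the node. Members would presumably be documented in the same file using Doxygen's analogous structural commands, such as \fn or \var.

```cpp
// File: SgConstructorInitializer.docs (the name matches the AST node class being documented)

/** \class SgConstructorInitializer
 *  \brief Placeholder summary sentence for the node.
 *
 *  Placeholder description. Because this comment is not adjacent to the generated class declaration, its first
 *  line is what tells Doxygen which entity is being documented. */
```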

Doxygen directives

Doxygen understands a subset of HTML, its own Javadoc-like directives, and Markdown. The most useful are:

Sections and subsections are introduced by @section or @subsection followed by a unique identifier and then the name of the section, capitalized as described above.
Function parameter names are indicated with @p followed by the parameter name. This typesets them in a consistent style.
References to other symbols, pages, and modules are indicated with @ref followed by the name or ID. The look-up for symbols is similar to how the C++ compiler would look up the name, so you might need to qualify it with some namespace or even place the comment inside a namespace. If you don't want the qualified name cluttering the final HTML, follow the qualified name with a shorter name in double quotes (e.g., @ref InstructionSemantics::BaseSemantics::RiscOperators "RiscOperators").
Code snippets are indicated by preceding the word with @c. If more than one word or if it contains special characters, use HTML <code> and </code> tags instead.
To include an entire example source file into the documentation, use the @includelineno directive, which takes the name of the file to include. No path is necessary, but the file must live in a directory known to Doxygen, such as the "tutorial" directory (see ROSE's Doxygen configuration file).
To include a few lines from an example source file, use the @snippet mechanism. This mechanism is easier to use than the older method of trying to match certain patterns of code. The @snippet directive takes two arguments: the name of an example source file, and the name of a snippet. The beginning and end of the snippet are marked in the source file using Doxygen comments of the form //! [snippet name goes in here]. Snippet names can include space characters.
To document a namespace, class, function, variable, etc., place the documentation immediately prior to the declaration and use a block style comment that starts with two asterisks: /**. Alternatively, for variables and enum constants it's sometimes more convenient to put the documenting comment after the declaration, in which case the comment should start with /**<.
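Several of these directives are shown together in the sketch below. All names (Verbosity, describeLevel, exampleUsage, and the file name directives_demo.cpp) are hypothetical and chosen only for illustration. For brevity the snippet markers sit in the same file as the documentation, whereas in practice the marked code would live in an example file in a directory known to Doxygen.

```cpp
#include <string>

/** Verbosity levels understood by @ref describeLevel. */
enum Verbosity {
    QUIET,                                              /**< Report errors only. */
    NORMAL,                                             /**< Report errors and warnings. */
    CHATTY                                              /**< Report everything, including progress. */
};

/** Short name for a verbosity level.
 *
 *  Returns a lower-case word describing @p level. For instance, <code>describeLevel(NORMAL)</code> returns the
 *  word @c normal. See @ref Verbosity "the list of levels" for all accepted values.
 *
 *  Example use, pulled from the marked lines of the example file:
 *
 *  @snippet directives_demo.cpp describe a level */
std::string describeLevel(Verbosity level) {
    switch (level) {
        case QUIET:  return "quiet";
        case NORMAL: return "normal";
        case CHATTY: return "chatty";
    }
    return "unknown";                                   // not reached for the enumerators listed above
}

// Example code; the two marker comments delimit the snippet named "describe a level".
std::string exampleUsage() {
    //! [describe a level]
    const std::string message = "verbosity is " + describeLevel(NORMAL);
    //! [describe a level]
    return message;
}
```

The enum constants use the trailing /**< form, the multi-token expression is wrapped in HTML code tags while the single word uses @c, and the quoted text after the @ref target replaces the symbol name in the generated HTML.
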

Build Doxygen docs

To generate ROSE's Doxygen documentation locally, run:

cd $ROSE_BUILD/docs/Rose
make doxygen_docs        # or, alternatively: doxygen rose.cfg

You can ignore warning messages about undocumented entities. When the build finishes, open the result in a web browser, for example by typing "firefox $ROSE_BUILD/docs/Rose/ROSE_WebPages/ROSE_HTML_Reference/index.html".

Next steps

Doxygen, as one would expect from a documentation generator, is well documented at its website. There are also a number of quick references available.
