# Doxygen Filter

## Overview

The Doxygen Filter is an Okapi component for extracting Doxygen-style comments from source code. For example, it processes input such as:

```cpp
/*! A test class */
class Test
{
  public:
    /** An enum type.
     *  The documentation block cannot be put after the enum!
     */
    enum EnumType
    {
      EVal1,     /**< enum value 1 */
      EVal2      /**< enum value 2 */
    };
    void member();   //!< a member function.
  protected:
    int value;       /*!< an integer value */
};
```

C++-style (`///`), Javadoc-style (`/**`), Qt-style (`/*!`), and Python-style (`'''` or `"""`) comment blocks are supported.
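To make the supported styles concrete, here is a rough Python sketch that locates comment blocks written in each of them, including the trailing `/**<`, `/*!<`, and `//!<` forms used in the example above. It is an illustration only, not the filter's own recognizer: the regular expression and the `doxygen_comments` helper are assumptions, and the sketch ignores complications such as comment markers inside string literals.

```python
import re

# Assumed, simplified patterns for the comment styles listed above.
DOXYGEN_COMMENT = re.compile(
    r"/\*[*!](?!/)<?(.*?)\*/"   # /** ... */ and /*! ... */, plus /**< and /*!<
    r"|//[/!]<?([^\n]*)"        # /// and //! line comments, plus ///< and //!<
    r'|"""(.*?)"""'             # Python-style """ ... """
    r"|'''(.*?)'''",            # Python-style ''' ... '''
    re.DOTALL,
)


def doxygen_comments(source: str) -> list[str]:
    """Return the stripped body of each Doxygen-style comment in `source`."""
    bodies = []
    for match in DOXYGEN_COMMENT.finditer(source):
        # Exactly one alternative matched, so exactly one group is not None.
        body = next(group for group in match.groups() if group is not None)
        bodies.append(body.strip())
    return bodies
```

Plain `//` and `/* ... */` comments do not match, because the third character must be `/`, `*`, or `!`; the `(?!/)` lookahead keeps an empty `/**/` comment from being read as the start of a Javadoc block.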

## Processing Details

### Input Encoding

The filter decides which encoding to use for the input document using the following logic:

- If the file has a Unicode byte-order mark (BOM), the corresponding encoding (e.g. UTF-8, UTF-16) is used.
- Otherwise, the default encoding specified when opening the document is used.
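The rule amounts to a check on the first bytes of the file. A minimal Python sketch of it follows; `detect_encoding` is a hypothetical helper, and the set of BOMs checked (UTF-8, UTF-16, and UTF-32 in both byte orders) is an assumption about what "the corresponding encoding" covers.

```python
import codecs

# Assumed set of recognized BOMs. UTF-32 LE must be tested before UTF-16 LE,
# because the UTF-32 LE BOM (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes, default_encoding: str) -> tuple[str, int]:
    """Return (encoding, BOM length): the BOM's encoding if present, else the default."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return default_encoding, 0
```

The returned BOM length lets a caller skip the mark before decoding, e.g. `data[bom_length:].decode(encoding)`.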

### Inline Codes

The full set of Doxygen special commands, HTML commands, and XML commands is recognized and interpreted. For instance, the comment

```cpp
/*! \class Test class.h "inc/class.h"
 *  \brief This is a test class.
 *
 *  Some details about the Test class
 */
```

is extracted into the following text units, where `<1/>` stands for the `\class` command and its parameters and `<2/>` for the `\brief` command:

1. `<1/><2/> This is a test class.`
2. `Some details about the Test class`
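The conversion of commands into numbered inline codes can be sketched in Python as below. This is a simplified illustration, not the filter's algorithm: the `PLACEHOLDER_COMMANDS` table and `extract_codes` are hypothetical, the input is assumed to have its comment delimiters and leading `*` already removed, and whitespace between two adjacent codes is folded into the first code so that they render as `<1/><2/>`, as in the example above.

```python
import re

# Hypothetical command table (an assumption, not the filter's configuration):
# command name -> number of untranslatable, whitespace-delimited parameters.
PLACEHOLDER_COMMANDS = {"class": 3, "brief": 0}

# Split text into alternating runs of non-whitespace and whitespace.
TOKEN = re.compile(r"\S+|\s+")


def extract_codes(paragraph: str) -> tuple[str, list[str]]:
    """Replace known commands (plus their parameters) with numbered <n/> codes.

    Returns the coded text and the original text behind each code.
    """
    tokens = TOKEN.findall(paragraph)
    segments: list[tuple[str, str]] = []  # ("code" | "text", original text)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        # Doxygen commands start with a backslash or an @ sign.
        name = token[1:] if token[:1] in ("\\", "@") else None
        if name in PLACEHOLDER_COMMANDS:
            # Command token, then one (whitespace, parameter) pair per parameter.
            end = min(i + 1 + 2 * PLACEHOLDER_COMMANDS[name], len(tokens))
            segments.append(("code", "".join(tokens[i:end])))
            i = end
        else:
            segments.append(("text", token))
            i += 1

    # Fold whitespace sitting between two codes into the first code, so the
    # codes render as "<1/><2/>" while the original text stays recoverable.
    merged: list[list[str]] = []
    for k, (kind, text) in enumerate(segments):
        between_codes = (
            kind == "text"
            and text.isspace()
            and 0 < k < len(segments) - 1
            and segments[k - 1][0] == "code"
            and segments[k + 1][0] == "code"
        )
        if between_codes:
            merged[-1][1] += text
        else:
            merged.append([kind, text])

    coded: list[str] = []
    codes: list[str] = []
    for kind, text in merged:
        if kind == "code":
            codes.append(text)
            coded.append(f"<{len(codes)}/>")
        else:
            coded.append(text)
    return "".join(coded), codes
```

The stored code text is what would be written back in place of each `<n/>` after translation. Quoted parameters containing spaces and `OPENING`/`CLOSING` pairs are outside the scope of this sketch.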

### Line Numbers

The filter preserves line numbers, so each line of the translated output keeps the line number it had in the source.

## Parameters

Supported commands are defined in the filter parameters under one of three categories:

- `custom_commands`
- `doxygen_commands`
- `html_commands`

You can customize the behavior of the filter by editing existing entries or adding new ones. A `doxygen_commands` entry has the following general form:

```yaml
doxygen_commands:
  COMMAND_NAME:
    type: TYPE
    inline: INLINE
    pair: PAIR_CMD_NAME
    translatable: CMD_TRANSLATABLE
    parameters:
      - name: PARAM_NAME
        length: LENGTH
        required: REQUIRED
        translatable: PARAM_TRANSLATABLE
      - ...
```

Replace the uppercase placeholders above with values as described in the following table.

| Item | Description | Example value |
| --- | --- | --- |
| `COMMAND_NAME` | The name of the command as it appears in the Doxygen comment, without any prefix or suffix (e.g. for `\code{.py}`, use `code`). Case-sensitive. | `code` |
| `TYPE` | The type of the command: one of `PLACEHOLDER`, `OPENING`, or `CLOSING`. | `PLACEHOLDER` |
| `INLINE` | Whether the command is treated as an inline item (`true`) or a block-level element (`false`). Default: `false`. | `true` |
| `PAIR_CMD_NAME` | For `OPENING`- and `CLOSING`-type commands, the name of the paired command. For example, `\code` is paired with `\endcode`, so the `code` entry has `pair: endcode`. Not required for `PLACEHOLDER` commands. | `endcode` |
| `CMD_TRANSLATABLE` | Whether the entire content of the command is translatable. Intended for block-level `OPENING` commands that delimit whole blocks, such as `\code`. Default: `true`. | `true` |
| `PARAM_NAME` | The name of a parameter. Used for organizational purposes only; the filter does not use it. | `name` |
| `LENGTH` | The extent of the parameter: one of `WORD`, `LINE`, `PHRASE`, or `PARAGRAPH`. These correspond to the designations described at the top of the Doxygen special commands page, except `PHRASE`, which denotes a string enclosed in double quotes, such as `"image caption"`. | `WORD` |
| `REQUIRED` | Whether the parameter is required (`true`) or optional (`false`). This affects how aggressively the filter tries to interpret the text following the command as a parameter. Default: `true`. | `true` |
| `PARAM_TRANSLATABLE` | Whether the parameter is translatable (`true`) or not (`false`). Each parameter can be set independently, but untranslatable parameters that follow translatable ones are recorded as separate inline codes. Default: `true`. | `true` |
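The defaults and allowed values in the table can be summarized in a short Python sketch that normalizes one `doxygen_commands` entry, given as the nested dictionary a YAML loader would produce. `normalize_command` is hypothetical; treating `type` as always required and `pair` as mandatory for `OPENING` and `CLOSING` commands are assumptions based on the table, not documented validation rules of the filter.

```python
VALID_TYPES = {"PLACEHOLDER", "OPENING", "CLOSING"}
VALID_LENGTHS = {"WORD", "LINE", "PHRASE", "PARAGRAPH"}


def normalize_command(name: str, entry: dict) -> dict:
    """Fill in documented defaults for one doxygen_commands entry and validate it."""
    command_type = entry["type"]  # assumption: type is always required
    if command_type not in VALID_TYPES:
        raise ValueError(f"{name}: unknown type {command_type!r}")
    pair = entry.get("pair")
    # Assumption: OPENING/CLOSING commands must name their partner command.
    if command_type != "PLACEHOLDER" and not pair:
        raise ValueError(f"{name}: {command_type} command needs a 'pair'")

    parameters = []
    for param in entry.get("parameters", []):  # the listing is optional
        length = param["length"]
        if length not in VALID_LENGTHS:
            raise ValueError(f"{name}: unknown parameter length {length!r}")
        parameters.append({
            "name": param.get("name"),  # informational only
            "length": length,
            "required": param.get("required", True),
            "translatable": param.get("translatable", True),
        })

    return {
        "type": command_type,
        "inline": entry.get("inline", False),
        "pair": pair,
        "translatable": entry.get("translatable", True),
        "parameters": parameters,  # kept in declaration order
    }
```

For example, `normalize_command("code", {"type": "OPENING", "pair": "endcode", "translatable": False})` yields an entry with `inline` defaulted to `False` and an empty parameter list.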

Notes:

- The `parameters` listing is optional.
- When present, parameters should be listed in the order in which they follow the command.
- Parameters with non-whitespace delimiters (e.g. `.py` in `\code{.py}`) are not currently supported.
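Because parameters are matched in the order they are listed, a parser can consume them left to right, using one pattern per `LENGTH` value. The Python sketch below illustrates this. The regular expressions in `LENGTH_PATTERNS` and the `read_parameters` helper are assumptions (in particular, `PARAGRAPH` is taken to end at a blank line); a missing required parameter raises an error, while a missing optional one is skipped.

```python
import re

# Assumed pattern per LENGTH value; leading spaces and tabs are skipped.
LENGTH_PATTERNS = {
    "WORD": re.compile(r"[ \t]*(\S+)"),
    "LINE": re.compile(r"[ \t]*([^\n]*)"),
    "PHRASE": re.compile(r'[ \t]*("[^"\n]*")'),
    "PARAGRAPH": re.compile(r"[ \t]*(.*?)(?=\n[ \t]*\n|\Z)", re.DOTALL),
}


def read_parameters(text: str, params: list[dict]) -> tuple[list[str], str]:
    """Consume parameters from `text` in declaration order.

    Returns the parameter values and the text left over after them.
    """
    values = []
    pos = 0
    for param in params:
        match = LENGTH_PATTERNS[param["length"]].match(text, pos)
        if match is None or not match.group(1):
            if param.get("required", True):
                raise ValueError(f"missing required {param['length']} parameter")
            continue  # optional and absent: try the next parameter at the same position
        values.append(match.group(1))
        pos = match.end()
    return values, text[pos:]
```

Listing the parameters in the wrong order would make, say, a `WORD` pattern swallow text meant for a `PHRASE`, which is why the declaration order matters.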

You can also define custom commands that are matched by a regular expression instead of a name. All of the options above except `COMMAND_NAME` are supported; a minimal entry looks like this:

```yaml
custom_commands:
  - pattern: "REGEX_PATTERN"
    type: TYPE
    ...
```

| Item | Description | Example value |
| --- | --- | --- |
| `REGEX_PATTERN` | Any valid regular expression that matches non-zero-width runs of text within the comment. Matches are turned into codes according to the options described above. | `###ACCESS_CHECKS###.*?;` |
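A custom pattern can be pictured as a search-and-replace in which each match becomes an inline code. In the hypothetical `apply_custom_pattern` below, zero-width matches, which the description above rules out, are skipped rather than turned into empty codes; how the filter itself treats such patterns is not covered here.

```python
import re


def apply_custom_pattern(text: str, pattern: str) -> tuple[str, list[str]]:
    """Replace each non-empty match of `pattern` with a numbered <n/> code."""
    pieces: list[str] = []
    codes: list[str] = []
    last = 0
    for match in re.finditer(pattern, text):
        if match.start() == match.end():
            continue  # zero-width match: nothing to turn into a code
        pieces.append(text[last:match.start()])
        codes.append(match.group(0))
        pieces.append(f"<{len(codes)}/>")
        last = match.end()
    pieces.append(text[last:])
    return "".join(pieces), codes
```

With the example pattern, the lazy `.*?;` stops at the first semicolon, so only `###ACCESS_CHECKS###read;` becomes a code in `"###ACCESS_CHECKS###read;write;"`. Since the pattern is written as a YAML double-quoted string, any backslash in it (e.g. `\s`) must be doubled in the parameters file, or the pattern must be put in a single-quoted string instead.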

### Whitespace

Prevent the filter from collapsing whitespace by setting `preserve_whitespace: true`.
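The effect of whitespace collapsing, as described here and under Limitations, can be approximated as follows: within a paragraph, runs of spaces, tabs, and single line breaks become one space, while blank lines still separate paragraphs. `collapse_whitespace` is a hypothetical helper, and treating a blank line as the paragraph boundary is an assumption.

```python
import re


def collapse_whitespace(text: str, preserve_whitespace: bool = False) -> str:
    """Approximate the filter's whitespace handling for extracted comment text."""
    if preserve_whitespace:
        return text  # mirrors preserve_whitespace: true
    # Assumption: a blank (or whitespace-only) line separates paragraphs.
    paragraphs = re.split(r"\n[ \t]*\n", text)
    # str.split() with no argument splits on any run of whitespace, line breaks included.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```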

## Limitations

- Single line breaks within a text run that are not part of a Doxygen command are collapsed. No maximum line width is enforced on output, so each translatable paragraph effectively ends up on a single, potentially very long, line.
- Command parameters with non-whitespace delimiters (e.g. `.py` in `\code{.py}`) are not currently supported.
- Non-translatable command parameters are not exposed for any special processing.
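If long output lines are a problem, one workaround is to re-wrap each translated paragraph in a post-processing step outside the filter. The sketch below uses Python's standard `textwrap.wrap`; `rewrap_comment`, the ` * ` prefix, and the 80-column default are illustrative choices. Note that re-wrapping changes the number of lines, so it gives up the line-number correspondence described under Line Numbers.

```python
import textwrap


def rewrap_comment(paragraph: str, prefix: str = " * ", width: int = 80) -> str:
    """Re-wrap one collapsed paragraph into comment lines of at most `width` columns."""
    lines = textwrap.wrap(
        paragraph,
        width=width,               # the width includes the prefix
        initial_indent=prefix,
        subsequent_indent=prefix,
        break_long_words=False,    # never split identifiers or URLs
        break_on_hyphens=False,
    )
    return "\n".join(lines)
```

For `///` comments, pass `prefix="/// "`. A single word longer than the available width is left intact, so such a line may still exceed `width`.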