Corpus Extensions

Use an extension to rewrite metadata across many symbols at once: backfill briefs from a naming convention, tag symbols by group, or mark generated code as "see below" in the output. Extensions run between extraction and rendering, so every generator sees the change.

Languages and addon locations

Extensions are user-supplied scripts dropped under <addons>/extensions/ with the .js or .lua extension. The addon roots come from the addons and addons-supplemental options, the same roots that hold the Handlebars templates. Mr.Docs aggregates scripts across every root and runs them in alphabetical order by full path, with the two languages interleaved.
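
To make the ordering concrete, here is a minimal sketch that sorts a handful of script paths the way the paragraph above describes. It assumes that "alphabetical" means a plain character-by-character comparison of the full path, which the text does not spell out. The paths and the byFullPath helper are invented for illustration.

```javascript
// Hypothetical script paths gathered from two addon roots.
var scripts = [
    "/project/addons/extensions/20_tag_groups.lua",
    "/project/addons/extensions/10_briefs.js",
    "/project/addons-extra/extensions/05_cleanup.lua",
    "/project/addons/extensions/30_links.js"
];

// Assumption: the order is a plain code-unit comparison of the whole path.
function byFullPath(a, b) {
    return a < b ? -1 : (a > b ? 1 : 0);
}

// Sort a copy so the original list stays untouched.
var runOrder = scripts.slice().sort(byFullPath);
for (var i = 0; i < runOrder.length; ++i) {
    console.log((i + 1) + ". " + runOrder[i]);
}
```

Under that assumption the language plays no part in the order, and the root's own path is compared before the filename, so a numeric filename prefix only controls the order among scripts in the same directory.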

Both scripting languages reach the same mrdocs API. The choice is a trade-off, not a ranking.

  • JavaScript is more familiar to most developers. The runtime Mr.Docs embeds is a small ES engine, so scripts have the language proper (closures, destructuring, regex, classes) but not a Node-style standard library. There is no fs, no path, no process. Scripts that only manipulate the corpus do not miss any of that.

  • Lua is a language designed to be embedded. Mr.Docs links it whole, so scripts have access to the entire Lua standard library (string, table, math, io, os) and can do filesystem work or text munging without leaving the script. The cost is that fewer people read Lua at a glance than read JavaScript. If you’re already familiar with Lua, it is the better-equipped choice for scripts that need more than the corpus.

Accessing the corpus

A script extends Mr.Docs by defining a function named transform_corpus(corpus). Mr.Docs calls it once per loaded script with a flat view of the corpus. A script that doesn’t define transform_corpus is silently ignored at this step.


addons/extensions/noop.js
function transform_corpus(corpus) {
    // walk corpus.symbols, assign to the fields you want to change
}
addons/extensions/noop.lua
function transform_corpus(corpus)
    -- walk corpus.symbols, assign to the fields you want to change
end

The corpus object provides functions that expose the symbol graph. The corpus.symbols field is a flat array containing every extracted symbol. Scripts that need queries like "all members of `X`" simply walk the array and filter.

For instance, the following scripts count the symbols of each kind and report the totals at the end of the run:


addons/extensions/count_by_kind.js
function transform_corpus(corpus) {
    var counts = {};
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var k = corpus.symbols[i].kind;
        counts[k] = (counts[k] || 0) + 1;
    }
    for (var k in counts) {
        console.log(k + ": " + counts[k]);
    }
}
addons/extensions/count_by_kind.lua
function transform_corpus(corpus)
    local counts = {}
    for _, sym in ipairs(corpus.symbols) do
        counts[sym.kind] = (counts[sym.kind] or 0) + 1
    end
    for k, v in pairs(counts) do
        print(k .. ": " .. v)
    end
end
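
The walk-and-filter pattern mentioned above can be wrapped in a small helper. This is a minimal sketch: selectSymbols is a hypothetical name, not part of the mrdocs API, and it relies only on the kind and name fields used in the examples on this page. The mock corpus at the bottom exists solely so the snippet runs outside Mr.Docs, and its kind strings other than "function" are placeholders.

```javascript
// Hypothetical helper: return every symbol for which `pred` returns true.
function selectSymbols(corpus, pred) {
    var out = [];
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (pred(sym)) { out.push(sym); }
    }
    return out;
}

function transform_corpus(corpus) {
    // Select the functions whose name starts with "get_".
    var getters = selectSymbols(corpus, function (sym) {
        return sym.kind === "function" && sym.name.indexOf("get_") === 0;
    });
    console.log(getters.length + " getter(s) found");
}

// Mock corpus for trying the sketch outside Mr.Docs.
var mockCorpus = { symbols: [
    { kind: "function",    name: "get_size" },
    { kind: "placeholder", name: "get_like_thing" },
    { kind: "function",    name: "set_size" }
] };
transform_corpus(mockCorpus);
```

Keeping the predicate separate from the walk means one helper serves every query the script needs, at the cost of one full pass over corpus.symbols per query.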

Each entry in corpus.symbols is a proxy for a live Mr.Docs symbol. The fields available on each proxy are listed in the DOM reference.

When a script knows a symbol’s id or name and needs to act on that one symbol:

  • corpus.get(id) returns the proxy for it, or null (nil in Lua) if the id is unknown.

  • corpus.lookup(name) does a global-namespace name lookup and returns the proxy, or null (nil in Lua) if nothing matches.

subclass-tree.cpp
/// The root of the shape hierarchy.
struct Shape {};

/// A shape with straight sides.
struct Polygon : Shape {};

/// A three-sided polygon.
struct Triangle : Polygon {};

/// A four-sided polygon.
struct Quadrilateral : Polygon {};

/// A quadrilateral with equal sides.
struct Square : Quadrilateral {};

/// A round shape.
struct Circle : Shape {};

addons/extensions/subclass_tree.js
// Print the inheritance subtree rooted at a named class.
//
// `corpus.lookup(name)` resolves the entry point once. From there the
// only way down the tree is by id: each record carries a `derived`
// list of base16 ids, and `corpus.get(id)` turns each id back into a
// live symbol proxy. The recursion walks the graph that single-pass
// iteration over `corpus.symbols` cannot reconstruct.

function listSubclasses(corpus, sym, indent) {
    for (var i = 0; i < sym.derived.length; ++i) {
        var child = corpus.get(sym.derived[i]);
        if (!child) { continue; }
        console.log(indent + child.name);
        listSubclasses(corpus, child, indent + "  ");
    }
}

function transform_corpus(corpus) {
    var base = corpus.lookup("Shape");
    if (!base) { return; }
    console.log(base.name);
    listSubclasses(corpus, base, "  ");
}
addons/extensions/subclass_tree.lua
-- Print the inheritance subtree rooted at a named class.
--
-- `corpus.lookup(name)` resolves the entry point once. From there the
-- only way down the tree is by id: each record carries a `derived`
-- list of base16 ids, and `corpus.get(id)` turns each id back into a
-- live symbol proxy. The recursion walks the graph that single-pass
-- iteration over `corpus.symbols` cannot reconstruct.

local function listSubclasses(corpus, sym, indent)
    for _, id in ipairs(sym.derived) do
        local child = corpus.get(id)
        if child then
            print(indent .. child.name)
            listSubclasses(corpus, child, indent .. "  ")
        end
    end
end

function transform_corpus(corpus)
    local base = corpus.lookup("Shape")
    if not base then return end
    print(base.name)
    listSubclasses(corpus, base, "  ")
end

Running either script against the fixture above prints:

Shape
  Circle
  Polygon
    Quadrilateral
      Square
    Triangle
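
When the subtree is needed as data rather than as printed text, the same recursion can collect the proxies into an array. The following is a minimal sketch that uses only corpus.get, corpus.lookup, and the derived field from the script above. collectDescendants is a hypothetical helper, and the mock corpus with made-up ids stands in for Mr.Docs so the snippet runs on its own.

```javascript
// Hypothetical helper: gather every direct and indirect subclass of `sym`.
function collectDescendants(corpus, sym, out) {
    out = out || [];
    for (var i = 0; i < sym.derived.length; ++i) {
        var child = corpus.get(sym.derived[i]);
        if (!child) { continue; }   // unknown id: skip it
        out.push(child);
        collectDescendants(corpus, child, out);
    }
    return out;
}

// Mock corpus with made-up ids, mirroring part of the fixture.
var byId = {
    "01": { id: "01", name: "Shape",    derived: ["02", "03"] },
    "02": { id: "02", name: "Circle",   derived: [] },
    "03": { id: "03", name: "Polygon",  derived: ["04"] },
    "04": { id: "04", name: "Triangle", derived: [] }
};
var mockCorpus = {
    get: function (id) { return byId[id] || null; },
    lookup: function (name) {
        for (var id in byId) {
            if (byId[id].name === name) { return byId[id]; }
        }
        return null;
    }
};

var root = mockCorpus.lookup("Shape");
var names = collectDescendants(mockCorpus, root).map(function (s) {
    return s.name;
});
console.log(names.join(", "));
```

The result is in depth-first order, each class followed by its own subclasses, which matches the order in which the printing version emits lines.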

Modifying the corpus

Scripts modify the corpus by assigning to fields on a symbol proxy. Each assignment lands directly in the underlying Mr.Docs symbol. The runtime validates each assignment and raises an exception on an invalid value. An uncaught error in an extension aborts the build, and the resulting diagnostic includes the script’s path and the error message.
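
If one bad value should not stop the whole build, the script can catch the exception itself. The following is a minimal sketch under the behaviour described above, namely that an invalid assignment throws. trySet is a hypothetical helper, and the mock object only imitates validation with a setter so the snippet runs outside Mr.Docs; which values the real runtime rejects is not assumed here.

```javascript
// Hypothetical helper: attempt an assignment and report a failure
// instead of letting the exception abort the run.
function trySet(target, field, value) {
    try {
        target[field] = value;
        return true;
    } catch (e) {
        console.log("skipped " + field + ": " + e.message);
        return false;
    }
}

// Mock that imitates a validating proxy: `brief` must be a string.
function makeMockDoc() {
    var brief = null;
    var doc = {};
    Object.defineProperty(doc, "brief", {
        get: function () { return brief; },
        set: function (v) {
            if (typeof v !== "string") {
                throw new TypeError("brief must be a string");
            }
            brief = v;
        }
    });
    return doc;
}

var doc = makeMockDoc();
var acceptedGood = trySet(doc, "brief", "Returns true if prime.");
var acceptedBad = trySet(doc, "brief", 42);
```

Catching is a trade-off: the build continues, but a skipped write can hide a real bug, so logging each failure is worth keeping.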

Most extensions read some attribute of a symbol and write back a string, an enumerator, or a small structured value. For instance, a codebase that follows the convention "any function named is_<name> is a predicate returning true if <name> holds" encodes information that the brief could repeat verbatim:

brief-from-name.cpp
bool is_prime(int n);
bool is_palindrome(char const* s);
bool is_empty(char const* s);

addons/extensions/brief_from_name.js
// Synthesize documentation for every `is_*` predicate from its name.
//
// The convention: "any function whose name starts with `is_` is a
// predicate that returns `true` if its name (in plain English) holds".
// Both the brief and the lone parameter follow that template, so the
// script writes them once and frees authors from typing the same
// sentence on every declaration. Anything an author already wrote is
// preserved: only missing fields are filled in.

function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (sym.kind !== "function") { continue; }
        if (sym.name.indexOf("is_") !== 0) { continue; }

        if (!sym.doc) { sym.doc = {}; }

        var subject = sym.name.slice(3).replace(/_/g, " ");

        if (!sym.doc.brief) {
            sym.doc.brief = "Returns true if " + subject + ".";
        }

        if (sym.params.length === 1
            && (!sym.doc.params || sym.doc.params.length === 0)) {
            sym.doc.params = [{
                name: sym.params[0].name,
                children: "The input examined for the " + subject + " property."
            }];
        }
    }
}
addons/extensions/brief_from_name.lua
-- Synthesize documentation for every `is_*` predicate from its name.
--
-- The convention: "any function whose name starts with `is_` is a
-- predicate that returns `true` if its name (in plain English) holds".
-- Both the brief and the lone parameter follow that template, so the
-- script writes them once and frees authors from typing the same
-- sentence on every declaration. Anything an author already wrote is
-- preserved: only missing fields are filled in.

function transform_corpus(corpus)
    for _, sym in ipairs(corpus.symbols) do
        if sym.kind == "function"
           and sym.name:sub(1, 3) == "is_" then
            if not sym.doc then sym.doc = {} end

            local subject = sym.name:sub(4):gsub("_", " ")

            if not sym.doc.brief then
                sym.doc.brief = "Returns true if " .. subject .. "."
            end

            if #sym.params == 1
               and (not sym.doc.params or #sym.doc.params == 0) then
                sym.doc.params = {
                    {
                        name = sym.params[1].name,
                        children = "The input examined for the "
                            .. subject .. " property."
                    }
                }
            end
        end
    end
end

Every is_foo_bar function then ships with "Returns true if foo bar." Authors only have to write a brief when the synthesized one is not the right one.
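
The name-to-sentence step is easy to check in isolation. The following minimal sketch factors it out of the script above. briefFromName is a hypothetical helper name, not part of the mrdocs API.

```javascript
// Hypothetical helper: derive the synthesized brief from a predicate name.
// Returns null for names that do not follow the `is_` convention.
function briefFromName(name) {
    if (name.indexOf("is_") !== 0) { return null; }
    // Drop the "is_" prefix and turn the remaining underscores into spaces.
    var subject = name.slice(3).replace(/_/g, " ");
    return "Returns true if " + subject + ".";
}

console.log(briefFromName("is_prime"));    // Returns true if prime.
console.log(briefFromName("is_foo_bar"));  // Returns true if foo bar.
console.log(briefFromName("size"));        // null
```

Keeping the string logic in a pure function makes it possible to try new naming rules without running a full documentation build.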

Preview
is_empty

Returns true if empty.

Synopsis

Declared in <brief‐from‐name.cpp>

bool
is_empty(char const* s);
Parameters

Name

Description

s

The input examined for the empty property.

is_palindrome

Returns true if palindrome.

Synopsis

Declared in <brief‐from‐name.cpp>

bool
is_palindrome(char const* s);
Parameters

Name

Description

s

The input examined for the palindrome property.

is_prime

Returns true if prime.

Synopsis

Declared in <brief‐from‐name.cpp>

bool
is_prime(int n);
Parameters

Name

Description

n

The input examined for the prime property.

Cross-linking symbols

When the value being written needs to reference another symbol, include that symbol’s id: the id is what makes the reference a clickable link in the rendered output rather than a plain string.

For instance, consider a project where the parse_X and format_X free functions are symmetric. A reader landing on one almost always wants to see the other. The extension walks the symbols once, derives each partner’s name, and resolves it with corpus.lookup to obtain the partner’s id:

parse-format-relates.cpp
/// An HTTP request as a structured value.
struct request;

/// Parse `text` into a request. Returns a valid request on success.
request parse_request(char const* text);

/// Format `r` as the wire-format text of an HTTP request.
char const* format_request(request const& r);

/// A user record.
struct user;

/// Parse `text` into a user.
user parse_user(char const* text);

/// Format `u` as the canonical wire-format text of a user.
char const* format_user(user const& u);

addons/extensions/parse_format_relates.js
// Cross-link symmetric IO helpers from a JavaScript extension.
//
// Every `parse_X` and `format_X` free function gets a `@see` entry
// pointing at its partner. The result is that each function's
// rendered page carries a "See Also" link to the other one, without
// anyone writing `@see` by hand.

function partnerName(name) {
    if (name.indexOf("parse_") === 0) {
        return "format_" + name.slice(6);
    }
    if (name.indexOf("format_") === 0) {
        return "parse_" + name.slice(7);
    }
    return null;
}

function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var s = corpus.symbols[i];
        if (s.kind !== "function") { continue; }
        var pname = partnerName(s.name);
        if (!pname) { continue; }
        var partner = corpus.lookup(pname);
        if (!partner) { continue; }
        s.doc = {
            sees: [{
                children: [{
                    kind: "reference",
                    literal: pname,
                    id: partner.id
                }]
            }]
        };
    }
}
addons/extensions/parse_format_relates.lua
-- Cross-link symmetric IO helpers from a Lua extension.
--
-- Every `parse_X` and `format_X` free function gets a `@see` entry
-- pointing at its partner. The result is that each function's
-- rendered page carries a "See Also" link to the other one, without
-- anyone writing `@see` by hand.

local function partnerName(name)
    if name:sub(1, 6) == "parse_" then
        return "format_" .. name:sub(7)
    end
    if name:sub(1, 7) == "format_" then
        return "parse_" .. name:sub(8)
    end
    return nil
end

function transform_corpus(corpus)
    for _, s in ipairs(corpus.symbols) do
        if s.kind == "function" then
            local pname = partnerName(s.name)
            if pname then
                local partner = corpus.lookup(pname)
                if partner then
                    s.doc = {
                        sees = {
                            {
                                children = {
                                    { kind = "reference",
                                      literal = pname,
                                      id = partner.id }
                                }
                            }
                        }
                    }
                end
            end
        end
    end
end
Preview
request

An HTTP request as a structured value.

Synopsis

Declared in <parse‐format‐relates.cpp>

struct request;
Non-Member Functions

Name

Description

format_request

Format r as the wire‐format text of an HTTP request.

parse_request

Parse text into a request. Returns a valid request on success.

user

A user record.

Synopsis

Declared in <parse‐format‐relates.cpp>

struct user;
Non-Member Functions

Name

Description

format_user

Format u as the canonical wire‐format text of a user.

parse_user

Parse text into a user.

format_request

Format r as the wire‐format text of an HTTP request.

Synopsis

Declared in <parse‐format‐relates.cpp>

char const*
format_request(request const& r);
Parameters

Name

Description

r

An HTTP request as a structured value.

See Also
format_user

Format u as the canonical wire‐format text of a user.

Synopsis

Declared in <parse‐format‐relates.cpp>

char const*
format_user(user const& u);
Parameters

Name

Description

u

A user record.

See Also
parse_request

Parse text into a request. Returns a valid request on success.

Synopsis

Declared in <parse‐format‐relates.cpp>

request
parse_request(char const* text);
Return Value

An HTTP request as a structured value.

See Also
parse_user

Parse text into a user.

Synopsis

Declared in <parse‐format‐relates.cpp>

user
parse_user(char const* text);
Return Value

A user record.

See Also

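The script above needs only one pass because corpus.lookup can resolve a partner the walk hasn’t reached yet. Where a global-namespace lookup is not enough, the two-pass shape (build a name → id index, then look up) is the idiom whenever a write needs to refer to a symbol the script hasn’t yet seen during the walk.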
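
The index-then-look-up shape can be sketched as follows. It is a minimal illustration that relies only on the kind, name, id, and doc fields used elsewhere on this page. buildIndex is a hypothetical helper, and the mock corpus with made-up ids is there only so the snippet runs outside Mr.Docs.

```javascript
function partnerName(name) {
    if (name.indexOf("parse_") === 0) { return "format_" + name.slice(6); }
    if (name.indexOf("format_") === 0) { return "parse_" + name.slice(7); }
    return null;
}

// Pass 1 (hypothetical helper): index every function by name.
function buildIndex(corpus) {
    var index = Object.create(null);   // no inherited keys to collide with
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var s = corpus.symbols[i];
        if (s.kind === "function") { index[s.name] = s.id; }
    }
    return index;
}

// Pass 2: look each partner up in the index and write the link.
function transform_corpus(corpus) {
    var index = buildIndex(corpus);
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var s = corpus.symbols[i];
        if (s.kind !== "function") { continue; }
        var pname = partnerName(s.name);
        if (!pname || index[pname] === undefined) { continue; }
        s.doc = {
            sees: [{
                children: [{ kind: "reference", literal: pname, id: index[pname] }]
            }]
        };
    }
}

// Mock corpus with made-up ids; parse_config has no partner.
var mockCorpus = { symbols: [
    { kind: "function", name: "parse_user",   id: "A1" },
    { kind: "function", name: "format_user",  id: "B2" },
    { kind: "function", name: "parse_config", id: "C3" }
] };
transform_corpus(mockCorpus);
```

Overloads share a name, so this index keeps only the last one seen. A script that must tell overloads apart would need a richer key than the bare name.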

Notice in this example that s.doc.sees receives a list of entries, and that each entry’s children field holds a list of polymorphic objects that together form a paragraph. Each polymorphic object is written as a plain object whose kind field selects the concrete derived class to construct.
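
Small builder functions keep the nested literal readable when a script creates many links. This is a minimal sketch: referenceTo and seeEntry are hypothetical helpers, and they use only the "reference" kind and the literal, id, and children fields shown in the scripts above. No other kind values are assumed, and the id is made up.

```javascript
// Hypothetical helper: one inline element that links to another symbol.
function referenceTo(name, id) {
    return { kind: "reference", literal: name, id: id };
}

// Hypothetical helper: one `sees` entry wrapping a list of inline elements.
function seeEntry(inlines) {
    return { children: inlines };
}

var see = seeEntry([referenceTo("format_user", "B2")]);
console.log(JSON.stringify(see));
```

Inside transform_corpus the result would be used as s.doc = { sees: [see] }, which is the same shape the scripts above write by hand.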