This page contains the details of a technical writing project accepted for Google Season of Docs.
Project summary
- Open source organization: Matplotlib
- Technical writer: brunobeltran
- Project name: Improving feature discoverability by standardizing documentation of “implicit” types
- Project length: Long running (5 months)
Project description
Motivation
Historically, matplotlib's API has relied heavily on string-as-enum
"implicit types". Besides mimicking MATLAB's API, these parameter strings allow
the user to pass semantically rich values as arguments to matplotlib functions
without having to explicitly import or verbosely prefix an actual enum value
just to pass basic plot options (e.g. plt.plot(x, y, linestyle='solid') is
easier to type and less redundant than something like
plt.plot(x, y, linestyle=mpl.LineStyle.solid)).
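The trade-off can be sketched with the standard library alone. In the minimal example below, LineStyle is a hypothetical enum and the two functions are placeholders rather than Matplotlib API; the only assumption is that both spellings should resolve to the same underlying value.

```python
from enum import Enum


class LineStyle(Enum):
    """Hypothetical enum, used only to illustrate the comparison."""
    solid = "solid"
    dashed = "dashed"
    dotted = "dotted"


def plot_with_string(linestyle="solid"):
    # String-as-enum: the caller needs no import, only a known string.
    # Enum lookup by value raises ValueError for unknown strings.
    return LineStyle(linestyle)


def plot_with_enum(linestyle=LineStyle.solid):
    # Explicit enum: the caller must import and prefix the value.
    return LineStyle(linestyle)


# Both spellings resolve to the same member.
same = plot_with_string("dashed") is plot_with_enum(LineStyle.dashed)
```

The string form is shorter at the call site, while the enum form gives the set of allowed values a single named home.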
Many of these string-as-enum implicit types have since evolved more
sophisticated features. For example, a linestyle can now be either a string
or a 2-tuple of the form (offset, on-off sequence), and a MarkerStyle can now
be either a string or a matplotlib.path.Path. While this is true of many
implicit types, MarkerStyle is the only one (to my knowledge) that has been
upgraded to a proper Python class.
Because these implicit types are not classes in their own right, Matplotlib has
historically had to roll its own solutions for centralizing documentation and
validation of these implicit types (e.g. the docstring.interpd.update docstring
interpolation pattern and the cbook._check_in_list validator pattern,
respectively) instead of using the standard toolchains provided by Python
classes (e.g. docstrings and the validate-at-__init__ pattern, respectively).
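The two validation styles can be contrasted in a small, self-contained sketch. Everything below is a hypothetical stand-in: check_in_list is only loosely modeled on the validator pattern named above and is not Matplotlib's implementation, and the CapStyle class is an illustration, not an existing public API.

```python
CAPSTYLES = ("butt", "projecting", "round")


def check_in_list(allowed, **kwargs):
    """Helper-style validator: raise if any keyword value is not allowed.

    Hypothetical stand-in, not Matplotlib's actual implementation.
    """
    for name, value in kwargs.items():
        if value not in allowed:
            raise ValueError(
                f"{value!r} is not a valid value for {name}; "
                f"supported values are {', '.join(map(repr, allowed))}"
            )


def set_capstyle(capstyle):
    # Pattern 1: every function taking a capstyle must remember to validate.
    check_in_list(CAPSTYLES, capstyle=capstyle)
    return capstyle


class CapStyle:
    """Pattern 2: the type validates itself at construction time.

    This docstring is the single place where the allowed values
    ('butt', 'projecting', 'round') are documented.
    """

    def __init__(self, value):
        if value not in CAPSTYLES:
            raise ValueError(f"{value!r} is not a valid CapStyle")
        self.value = value
```

With the class-based pattern, validation and documentation travel together with the type instead of being repeated at each call site.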
While these solutions have worked well for us, the lack of an explicit location
to document each implicit type means that the documentation is often difficult
to find, large tables of allowed values are repeated throughout the
documentation, and often an explicit statement of the scope of an implicit
type is completely missing from the docs. Take the plt.plot docs, for
example: in the "Notes", a description of the MATLAB-like format-string styling
method mentions linestyle, color, and markers options. There are
many more ways to pass these three values than are hinted at, but for many
users, this is their only source of understanding about what values are possible
for those options until they stumble on one of the relevant tutorials. A
table of Line2D attributes is included in an attempt to show the reader what
options they have for controlling their plot. However, while the linestyle
entry does a good job of linking to Line2D.set_linestyle (two clicks
required) where the possible inputs are described, the color and markers
entries do not. color simply links to Line2D.set_color, which fails to
offer any intuition for what kinds of inputs are even allowed.
It could be argued that this can be fixed by simply tidying up the individual docstrings that are causing problems, but the issue is unfortunately much more systemic than that. Without a centralized place to find the documentation, this will simply lead to more and more copies of increasingly verbose documentation repeated everywhere each of these implicit types is used, making it even more difficult for beginners to find the parameter that they need. However, the current system, which forces users to slowly piece together their mental model of each implicit type through wiki-diving style traversal of our documentation, or piecemeal from StackOverflow examples, is also not sustainable.
End Goal
Ideally, any mention of an implicit type should link to a single page that describes all the possible values that type can take, ordered from most simple and common to most advanced or esoteric. Instead of using valuable visual space in the top-level API documentation to piecemeal enumerate all the possible input types to a particular parameter, we can then use that same space to give a plain-word description of what plotting abstraction the parameter is meant to control.
To use the example of linestyle again, what we would want in the
LineCollection docs is just:
- A link to complete docs for allowable inputs (a combination of those found in Line2D.set_linestyle and the linestyle tutorial).
- A plain words description of what the parameter is meant to accomplish. To matplotlib power users, this is evident from the parameter's name, but for new users this need not be the case.
The way this would look in the actual LineCollection docs is just:
"""
linestyles: `LineStyle` or list thereof, default: :rc:`lines.linestyle` ('-')
    A description of whether the stroke used to draw each line in the collection
    is dashed, dotted or solid, or some combination thereof.
"""
where the LineStyle type reference would be resolved by Sphinx to point
towards a single, authoritative, and complete set of documentation for how
Matplotlib treats linestyles.
Benefits
Some powerful features of this approach include:
- Making the complete extent of what each function is capable of obvious in plain text (with zero clicks required).
- Making the default option visible (with zero clicks). Seeing the default option is often enough to jog the memory of returning users.
- Making a complete description of the "most common" and "easiest" options for a parameter easily available when browsing (with a single click).
- Making the process of discovering more powerful features and input methods as easy as "scroll down" to see more advanced options (with still only one click).
- Providing a centralized strategy for linking top-level "API" docs to the relevant "tutorials".
- Avoiding API-doc-explosion, where scanning through the many possible options to each parameter makes individual docstrings unwieldy.
Other benefits of this approach over the current docs are:
- Docs are less likely to become stale, due to centralization.
- Canonicalization of many of matplotlib's "implicit standards" (like what is a "bounds" versus an "extents") that currently have to be learned by reading the code.
- The process would highlight issues with API consistency in a way that can be more easily tracked via the GitHub issue tracker, helping with the process of improving our API.
- Faster doc build times, due to significant decreases in the amount of text needing to be parsed.
Implementation
The improvements described above will require two major efforts for which a dedicated technical writer will be invaluable. The first is to create one centralized "tutorial" page per implicit type. This will require working with the core developer team to identify a concrete list of implicit types whose documentation would be valuable to users (typically, because they contain powerful, hidden features of our library whose documentation is currently only found in difficult-to-stumble-across tutorials). For each implicit type, I will then synthesize the various relevant tutorials, API docs, and example pages into a single authoritative source of documentation that can be linked to anywhere that particular type is referenced.
Once the centralized documentation for a given implicit type is complete, the
second major effort begins: replacing existing API documentation with links to
the new documentation, with an eye towards making the experience of actually
using this new documentation as easy as possible, both for those using Python's
built-in help() utility and for those browsing our documentation online.
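As a rough illustration of the help() side of this goal, the sketch below attaches a docstring in the proposed style to a stand-in function and retrieves it with inspect.getdoc, which returns roughly the text that help() would display. The function draw_lines is a placeholder, not part of Matplotlib.

```python
import inspect


def draw_lines(linestyles="-"):
    """Draw a collection of lines (hypothetical stand-in function).

    Parameters
    ----------
    linestyles : `LineStyle` or list thereof, default: '-'
        A description of whether the stroke used to draw each line in the
        collection is dashed, dotted or solid, or some combination thereof.
    """
    return linestyles


# inspect.getdoc strips the common indentation, much as help() does.
plain_text = inspect.getdoc(draw_lines)
print(plain_text)
```

Because help() shows the docstring as plain text, markup such as the backticked `LineStyle` reference is not resolved into a link there, which is one reason the plain-words description of the parameter matters.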
While the exact format of the documentation proposed here is subject to change
as this project evolves, I have worked with the Matplotlib core team during
their weekly "dev calls" to establish a consensus that the strategy proposed
here is the most expedient, useful, and technically tractable approach to begin
documenting these "implicit types" (notes on these calls are available on
hackmd).
I will use the existing "tutorials" infrastructure for the initial stages of
creating the centralized documentation for each implicit type, allowing me to
easily reference these pages as follows, without having to create any new public
classes (again, using the LineCollection docs as an example):
"""
linestyles: LineStyle or list thereof, default: :rc:`lines.linestyle` ('-')
    A description of whether the stroke used to draw each line in the collection
    is dashed, dotted or solid, or some combination thereof. For a full
    description of possible LineStyle values, see :doc:`tutorials/types/linestyle`.
"""
Moving forward, we could then easily change how these references are spelled once the core developer team agrees on the best long-term strategy for incorporating our new "types" documentation into bona fide Python classes, for example as proposed by me in Matplotlib Enhancement Proposal 30.
Finally, the preliminary list of implicit types that I propose documenting during this Google Season of Docs is:
- capstyle
- joinstyle
- bounds
- extents
- linestyle
- colors/lists of colors
- colornorm/colormap
- tick formatters
A living version of this document can be found on our Discourse.