Kernel documentation with Sphinx, part 1: how we got here

July 6, 2016

This article was contributed by Jani Nikula

The last time LWN looked at formatted kernel documentation in January, it seemed like the merging of AsciiDoc support for the kernel's structured source-code documentation ("kernel-doc") comments was imminent. As Jonathan Corbet, in the capacity of the kernel documentation maintainer, wrote: "A good-enough solution that exists now should not be held up overly long in the hopes that vague ideas for something else might turn into real, working code." Sometimes, however, the threat that something not quite perfect might be merged is enough to motivate people to turn those vague ideas into something real.

In the end, Sphinx and reStructuredText are emerging as the future of Linux kernel documentation, with far more ambitious goals than the original AsciiDoc support patches ever had. With the bulk of the infrastructure work now merged to the docs-next branch headed for v4.8, it's a good time to reflect on how this came to happen and give an overview of the promising future of kernel documentation.

Background

The patches to support lightweight markup (initially using Markdown, later AsciiDoc) in kernel-doc comments were born out of a desire to write better documentation for the graphics subsystem. One of the goals was to enhance the in-source graphics subsystem internals documentation, for two main reasons. First, if the documentation is next to the code it describes, the documentation has a better chance of being updated along with the code. Second, if the documentation can be written in plain text rather than DocBook XML, it's more likely to be written in the first place.

However, plain text proves to be just a little too plain when you venture beyond documenting functions and types, or if you want to generate pretty HTML or PDF documents out of it. Adding support for lightweight markup in the kernel-doc comments was the natural thing to do. However, bolting this to the existing DocBook toolchain turned out to be problematic.

As part of the documentation build process, the scripts/kernel-doc script extracts the structured comments and emits them in DocBook format. The kernel-doc script supports some structure but fairly little formatting. To fit into this scheme, the lightweight markup support patches caused kernel-doc to invoke an external conversion tool (initially pandoc, later asciidoc) on each documentation comment block to convert them from lightweight markup to DocBook. This was painfully slow.
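To make the extraction step concrete, here is a minimal Python sketch of pulling structured comment blocks out of C source. It is only an illustration: the real scripts/kernel-doc is a separate script with far more parsing logic, and the function name `extract_kernel_doc_blocks`, the regular expression, and the `frob_widget()` sample are all assumptions made up for this example.

```python
import re

# Assumption for this sketch: a structured comment opens with "/**" on its
# own line and runs to the first "*/". Ordinary "/* ... */" comments are skipped.
KERNEL_DOC_BLOCK = re.compile(r"/\*\*\s*\n(.*?)\*/", re.DOTALL)


def extract_kernel_doc_blocks(source: str) -> list[str]:
    """Return the text of each /** ... */ comment without the ' * ' decoration."""
    blocks = []
    for match in KERNEL_DOC_BLOCK.finditer(source):
        lines = []
        for raw in match.group(1).splitlines():
            # Strip the leading whitespace, asterisk, and one optional space.
            lines.append(re.sub(r"^\s*\*\s?", "", raw).rstrip())
        blocks.append("\n".join(lines).strip())
    return blocks


# Hypothetical input, invented for illustration.
SAMPLE = """\
/**
 * frob_widget() - Frobnicate a widget
 * @w: the widget to frobnicate
 *
 * Use *carefully*.
 */
int frob_widget(struct widget *w);

/* ordinary comment, not kernel-doc */
"""

if __name__ == "__main__":
    for block in extract_kernel_doc_blocks(SAMPLE):
        print(block)
        print("-" * 20)
```

In the scheme described above, every block returned by such an extraction step meant one more launch of the external converter, so the number of process launches grew with the number of documented functions and types.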

Doing the conversion in kernel-doc kept the DocBook pipeline side of things mostly intact and oblivious to any markup, but it added another point of failure in the already long and fragile path from comments to HTML or PDF. Problems with markup and mismatches at each point of conversion made debugging challenging. The tools involved were not designed to work together and often disagreed about when and how markup should be applied.

It was clear that this was not the best solution, but at the time it worked and there was nothing else around.

AsciiDoc all-in, muddying the waters

Inspired by Jonathan's article and frustrated by the long documentation build times (we were testing the patches in the Intel graphics integration tree), I had the idea to make kernel-doc output AsciiDoc directly instead of DocBook. Converting the few structural features in the comments to AsciiDoc and just passing through the rest was trivial; kernel-doc already supported several output formats with reasonable abstractions. Like many ideas, this was the obvious thing to do, in retrospect. Suddenly, this opened the door to writing all of the high-level documents under Documentation/DocBook in AsciiDoc, embedding the documentation comments at that level, and getting rid of the DocBook template files altogether. This has massive benefits, and Jonathan soon followed up with a proof-of-concept that did just that.
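The idea of converting only the structural features and passing the rest through can be sketched in a few lines of Python. This is not how kernel-doc itself is implemented, and the output layout below (a section title for the function, a description-list entry per parameter) is a hypothetical choice for illustration rather than the format the actual patches emitted. The names `kernel_doc_to_asciidoc`, `HEADER_LINE`, and `PARAM_LINE` are likewise invented here.

```python
import re

# Structural features recognized by this sketch (an assumption, not the
# full kernel-doc grammar): a "name() - summary" first line and "@arg: text"
# parameter lines. Everything else is treated as already-valid markup.
HEADER_LINE = re.compile(r"^(\w+)\(\)\s*-\s*(.*)$")
PARAM_LINE = re.compile(r"^@(\w+):\s*(.*)$")


def kernel_doc_to_asciidoc(block: str) -> str:
    """Convert the structural lines of a comment block; pass the rest through."""
    lines = block.splitlines()
    out = []
    header = HEADER_LINE.match(lines[0]) if lines else None
    if header:
        out.append(f"=== {header.group(1)}()")  # AsciiDoc section title
        out.append("")
        out.append(header.group(2))             # one-line summary
        out.append("")
        body = lines[1:]
    else:
        body = lines
    for line in body:
        param = PARAM_LINE.match(line)
        if param:
            # AsciiDoc description list entry: term:: description
            out.append(f"`{param.group(1)}`:: {param.group(2)}")
        else:
            # Free-form text is emitted untouched, markup included.
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical comment text, invented for illustration.
    comment = (
        "frob_widget() - Frobnicate a widget\n"
        "@w: the widget to frobnicate\n"
        "\n"
        "Use *carefully*."
    )
    print(kernel_doc_to_asciidoc(comment))
```

Because the free-form text is never parsed, no external converter has to run for each comment block; the markup is interpreted once, later, when the whole document is processed.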

There was a little bit of excited buzz around this, with folks exploring, experimenting, and actually trying things out with document conversion. A number of conversations between interested developers at linux.conf.au seemed to further confirm that this was the path forward. But, just when it felt like people were settling on switching to doing everything in AsciiDoc, Jonathan muddied the waters by taking a hard look at Sphinx as an alternative to AsciiDoc.

Sphinx vs. AsciiDoc

Sphinx is a documentation generator that uses reStructuredText as its markup language, extending and using Docutils for parsing. Both Sphinx and Docutils were created in Python to document Python, but documenting C and C++ is also supported. Sphinx supports several output formats directly, such as HTML, LaTeX, and ePub, and supports PDF output via either LaTeX or the external rst2pdf tool.

The AsciiDoc format, on the other hand, is semantically equivalent to DocBook XML, with the DocBook constructs expressed in terms of lightweight markup. AsciiDoc is easier for humans to read and write than XML, but since it is designed to translate to DocBook, it fits nicely in front of an existing DocBook toolchain. The original Python AsciiDoc tool has been around for a long time, but has been superseded by a Ruby reimplementation called Asciidoctor in recent years. As far as the AsciiDoc markup goes, Asciidoctor was designed to be a drop-in replacement, but any extensions are implementation-specific due to the change in implementation language. Both tools support HTML and DocBook output natively; other output formats are generated from DocBook.

When comparing the markup formats for the purposes of kernel documentation, only the table support, which is much needed for the media subsystem documentation in particular, was clearly identified as being superior in AsciiDoc. Otherwise, the markup comparison was rather dispassionate; it really boiled down to the tools themselves and, to some extent, which languages the tools were written in. Indeed, the markups and tools were not independent choices. All the lightweight markups have their pros and cons.

Superficially, the implementation language of the tools shouldn't play any role in the decision. But it seemed that neither tool would work as-is, or at least we wouldn't be able to get their full potential without extending the tools ourselves. In the kernel tree, there are no tools written in Ruby, but there are plenty of tools written in Python. It was fairly easy to lean towards Sphinx in this regard.

If you are looking for flexibility, one great advantage of AsciiDoc is that it's so closely tied to DocBook. By switching to AsciiDoc, the kernel documentation could reuse the existing DocBook toolchain. The downside is that AsciiDoc would add another step in front of the already fragile DocBook toolchain. Dan Allen of Asciidoctor said: "One of the key goals of the Asciidoctor project is to be able to directly produce a wide variety of outputs from the same source (without DocBook)." However, this support isn't quite there yet.

The Asciidoctor project has a promising future. But Sphinx is stable, available now, and fits the needs of the kernel. Grant Likely summed it up this way: "Honestly, in the end I think we could make either tool do what is needed of it. However, my impression after trying to do a document that needs to have nice publishable output with both tools is that Sphinx is easier to work with, simpler to extend, better supported." In the end, Jonathan's verdict was to go with Sphinx. The patches have been merged, and the first Sphinx-based documentation will appear in the 4.8 kernel.

The second and final part of this series will look into how the kernel's new Sphinx-based toolchain works and how to write documentation using it.


Posted Jul 6, 2016 9:07 UTC (Wed) by gwhaley (guest, #99526) [Link]

Having been sat on the periphery of this long running process, and having some understanding of the tangle that had to be unwound and the intricate and many faceted issues that had to be solved - well done all involved! I think we can look forward to a new era of kernel documentation.

Posted Jul 7, 2016 13:07 UTC (Thu) by domo (guest, #14031) [Link]

Thanks Jani, that was a good read while pondering between asciidoc & rst (markdown is usually a no-go due to lack of features or standard (i.e. choose either))

Posted Jul 18, 2016 5:16 UTC (Mon) by sachingarg (guest, #38869) [Link] (5 responses)

So, this is the acceptance of Knuth's concept of literate programming, or are we still not there yet?

Posted Jul 18, 2016 6:58 UTC (Mon) by neilbrown (subscriber, #359) [Link] (4 responses)

long long way from Knuth's literate programming.

LP wasn't just about writing better comments. It also involved changing the order in which code was written so that ideas could be developed in an order that made sense to the human reader, often quite different to the order that the compiler wants.
This isn't just re-arranging function declarations. It might also mean writing a rough outline of a function with various "blanks", then filling in the blanks one by one after explaining them.

I think literate programming can work very well when the programmer fully understands the problem they are trying to solve and can then present it coherently as a lesson to the reader. A lesson which can be compiled and run to show that it works.
I don't think it works well at all for code which is being built by engineers who are coming to understand the problem as they go (most of us) and for whom the requirements change between the start and end of the project (though of course, that would never happen!).

I think that for code that is under development, having significant documentation in with the code is a mistake as it is very likely to become out of date quickly. Having documentation in with the code only makes sense (to me) once the code has stabilized. Then there is at least some chance that the documentation will be vaguely accurate for more than one day.

Certainly some people can make the effort to update documentation whenever they change the code. Both of the people who do that are worth their weight in gold and I respect them. But I doubt I could ever emulate them.

Posted Jul 18, 2016 9:07 UTC (Mon) by jezuch (subscriber, #52988) [Link] (3 responses)

Obviously, it also helps if the person writing it all is a good writer. Most of us suck at this :)

Posted Jul 18, 2016 18:26 UTC (Mon) by liw (subscriber, #6379) [Link]

It has, in fact, been my experience that software developers will avoid writing prose longer than a line on IRC. They will go to great lengths to avoid it, up to and including standing between decorative bushes of vegetation while wearing camouflage clothing.

It's sad, and not just because it makes those of us who like writing stand out.

Posted Jul 18, 2016 21:30 UTC (Mon) by neilbrown (subscriber, #359) [Link] (1 responses)

I feel compelled to quote some wise words from a favorite novel by Jane Austen:

"My fingers," said Elizabeth, "do not move over this instrument
in the masterly manner which I see so many women's do. They
have not the same force or rapidity, and do not produce the
same expression. But then I have always supposed it to be my
own fault--because I will not take the trouble of practising.

Posted Jul 19, 2016 16:10 UTC (Tue) by ortalo (guest, #4654) [Link]

Just a quick ref., cause... I knew the woman who knew the man who... and also cause it seems to me it's worth reading or listening (again).

Sorry for not finding a good link to the full (published) paper but the slides are here: http://fose.ethz.ch/slides/parnas.pdf

and you will find a video around here: https://youtu.be/dn8bVhfAv0c
