NEP 44 — Restructuring the NumPy documentation#
- Author:
Ralf Gommers
- Author:
Melissa Mendonça
- Author:
Mars Lee
- Status:
Accepted
- Type:
Process
- Created:
2020-02-11
- Resolution:
https://mail.python.org/pipermail/numpy-discussion/2020-March/080467.html
Abstract#
This document proposes a restructuring of the NumPy Documentation, both in form and content, with the goal of making it more organized and discoverable for beginners and experienced users.
Motivation and scope#
See here for the front page of the latest docs. The organization is quite confusing and illogical (e.g. user and developer docs are mixed). We propose the following:
Reorganizing the docs into the four categories mentioned in [1], namely Tutorials, How Tos, Reference Guide and Explanations (more about this below);
Creating dedicated sections for Tutorials and How-Tos, including orientation on how to create new content;
Adding an Explanations section for key concepts and techniques that require deeper descriptions, some of which will be rearranged from the Reference Guide.
Usage and impact#
The documentation is a fundamental part of any software project, especially open source projects. In the case of NumPy, many beginners might feel demotivated by the current structure of the documentation, since it is difficult to discover what to learn (unless the user has a clear view of what to look for in the Reference docs, which is not always the case).
Looking at the results of a “NumPy Tutorial” search on any search engine also gives an idea of the demand for this kind of content. Having official high-level documentation written using up-to-date content and techniques will certainly mean more users (and developers/contributors) are involved in the NumPy community.
Backward compatibility#
The restructuring will effectively demand a complete rewrite of links and some of the current content. Input from the community will be useful for identifying key links and pages that should not be broken.
Detailed description#
As discussed in the article [1], there are four categories of doc content:
Tutorials
How-to guides
Explanations
Reference guide
We propose to use those categories (for writing and reviewing) whenever we add a new documentation section.
The reasoning for this is that it is clearer, both for developers/documentation writers and for users, where each piece of information should go and what the scope and tone of each document should be. For example, if explanations are mixed with basic tutorials, beginners might be overwhelmed and alienated. On the other hand, if the reference guide contains basic how-tos, it might be difficult for experienced users to quickly find the information they need.
Currently, there are many blogs and tutorials on the internet about NumPy or using NumPy. One of the issues with this is that if users search for this information they may end up in an outdated (unofficial) tutorial before they find the current official documentation. This can be especially confusing for beginners. Having a better infrastructure for the documentation also aims to solve this problem by giving users high-level, up-to-date official documentation that can be easily updated.
Status and ideas of each type of doc content#
Reference guide#
NumPy has a quite complete reference guide. All functions are documented, most have examples, and most are cross-linked well with See Also sections. Further improving the reference guide is incremental work that can be done (and is being done) by many people. There are, however, many explanations in the reference guide. These can be moved to a more dedicated Explanations section on the docs.
How-to guides#
NumPy does not have many how-tos. The subclassing and array ducktyping section may be an example of a how-to. Others that could be added are:
Parallelization (controlling BLAS multithreading with threadpoolctl, using multiprocessing, random number generation, etc.)
Storing and loading data (.npy/.npz format, text formats, Zarr, HDF5, Bloscpack, etc.)
Performance (memory layout, profiling, use with Numba, Cython, or Pythran)
Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse, etc.
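To give a sense of the scope of such a how-to, here is a minimal sketch of what the "Storing and loading data" entry might cover for the .npy/.npz formats. The array names and file names are hypothetical and chosen only for illustration; the calls used are np.save, np.savez and np.load.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical example arrays; any arrays would do.
temperatures = np.linspace(10.0, 20.0, 5)
counts = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # .npy stores a single array, including its dtype and shape.
    np.save(tmp_path / "temperatures.npy", temperatures)
    loaded_temperatures = np.load(tmp_path / "temperatures.npy")

    # .npz stores several named arrays in one archive.
    np.savez(tmp_path / "measurements.npz",
             temperatures=temperatures, counts=counts)
    with np.load(tmp_path / "measurements.npz") as archive:
        loaded_counts = archive["counts"]
        archive_keys = sorted(archive.files)
```

A how-to in this style stays focused on one task and leaves the explanation of the file format itself to the Explanations or Reference sections.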
Explanations#
There is a reasonable amount of content on fundamental NumPy concepts such as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could be organized better and clarified to ensure it’s really about explaining the concepts and not mixed with tutorial or how-to-like content.
There are few explanations about anything other than those fundamental NumPy concepts.
Some examples of concepts that could be expanded:
Copies vs. Views;
BLAS and other linear algebra libraries;
Fancy indexing.
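As an illustration of the kind of behavior an explanation of copies, views and fancy indexing would need to cover, the following small sketch contrasts basic slicing with integer-array indexing. The variable names are hypothetical; np.shares_memory is used only to make the difference visible.

```python
import numpy as np

a = np.arange(10)

# Basic slicing returns a view: it shares memory with the original array.
view = a[2:5]
view[0] = 100          # also changes a[2]

# Fancy (integer-array) indexing returns a copy: it owns its own data.
copy = a[[2, 3, 4]]
copy[0] = -1           # leaves a unchanged

shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)
```

An Explanations page would go on to say why the two cases differ, rather than only showing that they do.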
In addition, there are many explanations in the Reference Guide, which should be moved to this new dedicated Explanations section.
Tutorials#
There’s a lot of scope for writing better tutorials. We have a new NumPy for absolute beginners tutorial [3] (GSoD project of Anne Bonner). In addition we need a number of tutorials addressing different levels of experience with Python and NumPy. This could be done using engaging data sets, ideas or stories. For example, curve fitting with polynomials and functions in numpy.linalg could be done with the Keeling curve (decades’ worth of CO2 concentration in air measurements) rather than with synthetic random data.
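The mechanics of such a curve-fitting tutorial could look roughly like the sketch below, which fits a quadratic with np.linalg.lstsq. The real Keeling curve measurements are not bundled here, so the series is a made-up, noise-free stand-in and all names and coefficient values are assumptions for illustration only; an actual tutorial would load the real data instead.

```python
import numpy as np

# Stand-in data, NOT real Keeling curve measurements:
# a noise-free quadratic trend with hypothetical coefficients.
years = np.arange(0.0, 10.0)                  # "years since start"
true_coeffs = np.array([300.0, 1.5, 0.02])    # c0 + c1*t + c2*t**2
co2 = (true_coeffs[0]
       + true_coeffs[1] * years
       + true_coeffs[2] * years**2)

# Design matrix with columns 1, t, t**2.
design = np.vander(years, N=3, increasing=True)

# Least-squares solution of design @ coeffs ~= co2.
coeffs, residuals, rank, singular_values = np.linalg.lstsq(
    design, co2, rcond=None
)
```

With real measurements the fit would not be exact, and the tutorial could then discuss residuals and the choice of polynomial degree.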
Ideas for tutorials (these capture the types of things that make sense, they’re not necessarily the exact topics we propose to implement):
Conway’s game of life with only NumPy (note: already in Nicolas Rougier’s book)
Using masked arrays to deal with missing data in time series measurements
Using Fourier transforms to analyze the Keeling curve data, and extrapolate it.
Geospatial data (e.g. lat/lon/time to create maps for every year via a stacked array, like gridMet data)
Using text data and dtypes (e.g. use speeches from different people, shape (n_speech, n_sentences, n_words))
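To make one of these ideas concrete, a tutorial on masked arrays and missing data might start from something like the following sketch. The readings are invented for illustration and NaN is assumed to be the marker for a missing measurement; np.ma.masked_invalid does the masking.

```python
import numpy as np

# Hypothetical daily readings; NaN marks days where the sensor failed.
readings = np.array([410.2, np.nan, 411.0, 411.4, np.nan, 412.0])

# Mask the invalid (NaN) entries so reductions skip them.
masked = np.ma.masked_invalid(readings)

n_missing = int(masked.mask.sum())
mean_valid = masked.mean()            # averages only the unmasked values
filled = masked.filled(mean_valid)    # plain ndarray with gaps replaced
```

A full tutorial would build on this with a real time series and a discussion of when filling gaps is appropriate.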
The Preparing to Teach document [2] from the Software Carpentry Instructor Training materials is a nice summary of how to write effective lesson plans (and tutorials would be very similar). In addition to adding new tutorials, we also propose a How to write a tutorial document, which would help users contribute new high-quality content to the documentation.
Data sets#
Using interesting data in the NumPy docs requires giving all users access to that data, either inside NumPy or in a separate package. The former is not the best idea, since it’s hard to do without increasing the size of NumPy significantly.
Whenever possible, documentation pages should use examples from the scipy.datasets package.
Related work#
Some examples of documentation organization in other projects:
These projects make the intended audience for each part of the documentation more explicit, as well as previewing some of the content in each section.
Implementation#
Currently, the documentation for NumPy can be confusing, especially for beginners. Our proposal is to reorganize the docs in the following structure:
- For users:
Absolute Beginners Tutorial
main Tutorials section
How Tos for common tasks with NumPy
Reference Guide (API Reference)
Explanations
F2Py Guide
Glossary
- For developers/contributors:
Contributor’s Guide
Under-the-hood docs
Building and extending the documentation
Benchmarking
NumPy Enhancement Proposals
- Meta information
Reporting bugs
Release Notes
About NumPy
License
Ideas for follow-up#
Besides rewriting the current documentation to some extent, it would be ideal to have a technical infrastructure that would allow more contributions from the community. For example, if Jupyter Notebooks could be submitted as-is as tutorials or How-Tos, this might create more contributors and broaden the NumPy community.
Similarly, if people could download some of the documentation in Notebook format, this would certainly mean people would use less outdated material for learning NumPy.
It would also be interesting if the new structure for the documentation makes translations easier.
Discussion#
Discussion around this NEP can be found on the NumPy mailing list:
References and footnotes#
[1] Diátaxis - A systematic framework for technical documentation authoring
[2] Preparing to Teach (from the Software Carpentry Instructor Training materials)
[3] NumPy for absolute beginners Tutorial by Anne Bonner
Copyright#
This document has been placed in the public domain.