Making Readthedocs for a Python package
A readthedocs page is handy when you want to auto-generate online documentation for a package, such as when releasing a package or submitting code for peer review. Setting up a readthedocs page is well-documented online, with many step-by-step walkthroughs and guides. However, sometimes it is not straightforward and troubleshooting problems can be time-consuming.
I'm going to outline the steps and focus on the less intuitive aspects of setting up readthedocs pages, not only for others to benefit from, but also for myself when I have to do this again in the future. I would suggest reading this guide alongside the official walkthrough on the readthedocs webpages.
Formatting script documentation
So the first thing is to make sure that your scripts, or package, are held in a repository on GitHub, and that all of the documentation in your scripts is in a format that can be imported into a readthedocs page.
The standard style for the documentation is reStructuredText. I had previously used reStructuredText to write the documentation for a package I released called PyTrx. Whilst it was reasonably straightforward, it looks pretty ugly in local scripts.
# Example from PyTrx, https://github.com/PennyHow/PyTrx
from osgeo import ogr
import numpy as np

def getOGRArea(pts):
    """Get real world OGR polygons (.shp) from xyz poly pts
    with real world points which are compatible with mapping
    software (e.g. ArcGIS).

    :param pts: UV/XYZ coordinates of a given area shape
    :type pts: arr
    :returns: List of OGR geometry polygons
    :rtype: list
    """
    # Construct the polygon ring from the given points
    ring = ogr.Geometry(ogr.wkbLinearRing)
    for p in pts:
        if not np.isnan(p[0]):
            if len(p) == 2:
                ring.AddPoint(int(p[0]), int(p[1]))
            else:
                ring.AddPoint(p[0], p[1], p[2])

    # Wrap the ring in an OGR polygon geometry
    poly = ogr.Geometry(ogr.wkbPolygon)
    poly.AddGeometry(ring)
    return poly
I have used this before for auto-generated documentation and I would not recommend it. There are alternatives that look much nicer, such as Numpy docstrings, which is what I used on my most recent project.
# Example from pyBiblyser
import requests

def fetchAltmetrics(doi):
    """Fetch altmetrics from DOI

    Parameters
    ----------
    doi : str
        DOI string to search with

    Returns
    -------
    result : dict
        Altmetrics result
    """
    # Query the Altmetric API and return the JSON result if the DOI is found
    api = 'https://api.altmetric.com/v1/doi/'
    response = requests.get(api + doi)
    if response.status_code == 200:
        result = response.json()
        return result
It looks a lot nicer in local scripts, and the set-up for auto-generating documentation from the Numpy docstring style is relatively simple (which we will look at later). There is also the option to use Google docstrings, which is another visually pleasing alternative (see an example here), although it does not have all of the functionality of Numpy docstrings, such as generating example scripts within the documentation.
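For comparison, a Google-style docstring for the same function might look something like this (a sketch for illustration only, not taken from pyBiblyser):
# A sketch of the same function with a Google-style docstring
import requests

def fetchAltmetrics(doi):
    """Fetch altmetrics from DOI.

    Args:
        doi (str): DOI string to search with.

    Returns:
        dict: Altmetrics result, or None if the DOI is not found.
    """
    api = 'https://api.altmetric.com/v1/doi/'
    response = requests.get(api + doi)
    if response.status_code == 200:
        return response.json()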
Initialising and populating the documentation pages
There are different packages that can be used to initialise the local build of the readthedocs files, namely Sphinx and MkDocs. I have generally gone with Sphinx, but MkDocs looks relatively similar and the workflow feels familiar.
If you are using Sphinx, it first needs to be installed with pip, after which you can initialise the build. It is best to make a new folder called 'docs' in the top directory where your scripts can be found, and run the quickstart from inside it.
pip install sphinx
mkdir docs
cd docs
sphinx-quickstart
There is a prompt for several set-up parameters when initialising the build. One that I couldn't find much information about was the option of having source and build as separate directories. I had not come across this a couple of years ago when using Sphinx, so I think this is a new feature - by opting out of separate directories, the source files are automatically placed in the top directory of the 'docs' folder.
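If you do keep source and build separate (which is what I ended up doing, as explained below), the quickstart should leave you with roughly this layout (the exact contents may vary slightly between Sphinx versions):
# Typical 'docs' layout after sphinx-quickstart with separate source and build
docs/
    Makefile
    make.bat
    build/
    source/
        conf.py
        index.rst
        _static/
        _templates/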
Now, I am sure placing the source files in the top directory is fine, but I wasn't sure whether it would cause problems for me later on when auto-generating documentation. So I kept the source and build as separate directories on this occasion. Once initialised, you can populate the source directory with pages (as .rst reStructuredText files). An index.rst file should be created, which will be the readthedocs front page of the package and is where links to all your pages can be outlined.
.. index.rst contents

pyBiblyser
==========

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   guide
   diversityindex
   modules
   acknowledgements

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
So in this case, the package is called pyBiblyser, and the contents page will link to a page on installation, a package guide, information about the diversity index, an outline of the package modules, and acknowledgements. Links to the auto-generated documentation index will appear below this, along with a search tool.
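The modules page is where the auto-documentation directives go. You can write it by hand with autodoc's automodule directive, or generate stub pages with sphinx-apidoc (e.g. sphinx-apidoc -o docs/source . run from the top of the repository). Here is a minimal hand-written sketch, where 'mymodule' is a placeholder rather than an actual pyBiblyser module:
.. modules.rst contents (a sketch; 'mymodule' is a placeholder)

Modules
=======

.. automodule:: mymodule
   :members:
   :undoc-members:
   :show-inheritance: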
Configuring the build
The configuration file is the main hub for your pages build. This is located at docs/source/conf.py (if source and build are in separate directories), or docs/conf.py (if source and build are together). Parameters can be added here to configure the readthedocs pages.
I would say the three main components I altered here were the paths that documentation is read from, the html_theme parameter, and the extensions parameter. Paths should be added for the scripts on which auto-documentation should be run - in my case, I wanted the scripts in the top directory of the git repository to be included, so I added this:
# paths from conf.py
import os
import sys

sys.path.insert(0, os.path.abspath('../../'))
sys.path.insert(0, os.path.abspath('.'))
The html_theme parameter is linked to the style of the readthedocs pages - there are a number of themes to choose from, with the sphinx_rtd_theme being the classic readthedocs style. Other Sphinx packages can be added with the extensions parameter in order to configure components such as how auto-documentation is generated.
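For instance, to use the classic readthedocs style (this assumes the sphinx_rtd_theme package is installed, e.g. with pip install sphinx_rtd_theme):
# Theme from conf.py
html_theme = 'sphinx_rtd_theme'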
# Extensions from conf.py
extensions = [
    'sphinx.ext.autodoc',    # To generate autodocs
    'sphinx.ext.mathjax',    # autodoc with maths
    'sphinx.ext.napoleon'    # For auto-doc configuration
]

napoleon_google_docstring = False    # Turn off googledoc strings
napoleon_numpy_docstring = True      # Turn on numpydoc strings
napoleon_use_ivar = True             # For maths symbology
There are many extensions to Sphinx (see here for the list), but the autodoc extension is essential for generating auto-documentation - do not forget it! In my case, I opted for auto-documentation to be generated from Numpy docstrings in my scripts.
Importing and building in readthedocs
To import from the GitHub repository to readthedocs, you need to connect the repository to your readthedocs account and allow readthedocs to pull from the repository. This is largely straightforward (see guide here), except if you are pulling from a private repository in a GitHub organisation (which is what I was doing). After reading around, it doesn't appear that this is possible at the moment, so I had to make the repository public. From here, I could generate a webhook for the repository, which allows the contents to be pulled to readthedocs.
Successfully building the readthedocs from the GitHub repository was probably the most fiddly and time-consuming part for me. The auto-documentation is generated by running the package on a virtual machine; therefore, you have to specify how to run the package with two files - a readthedocs setup file (.yaml or .yml) in the top directory, and a file containing requirements (.yaml, .yml or .txt).
The readthedocs.yaml file specifies the virtual machine build. I kept this quite simple, but I understand this can cause problems down the line - it is best to specify the build (for instance, the operating system) to match the environment that the package has been developed in, and there are plenty of options, as documented here.
# File .readthedocs.yaml
build:
  image: latest

python:
  version: 3.7

requirements_file: docs/requirements.txt
So in this instance, I have specified to build a Python 3.7 environment (as this is what I had coded the package in) with a set of requirements that I have provided the path to. The requirements file can either be a .txt list, or an environment .yml/.yaml file. These requirements should be the same as your package dependencies. I didn't list the package versions as I was having problems with the virtual environment build, but generally you should specify the version, or the minimum version, of each package.
# File docs/requirements.txt
bs4
numpy
pandas
pip
gender-guesser
habanero
pybliometrics
scholarly
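As a side note, the current (version 2) readthedocs configuration format lets you pin the operating system and Python version explicitly, which is closer to the advice above about specifying the build. A rough sketch (the operating system, Python version and paths here are assumptions - adjust them to your own set-up):
# File .readthedocs.yaml (version 2 format - a sketch)
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

sphinx:
  configuration: docs/source/conf.py

python:
  install:
    - requirements: docs/requirements.txt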
Troubleshooting
You can check the build status and troubleshoot problems by reading the build log, as described here. A common problem is errors in the imported dependencies - I had this problem and found that I needed to also specify pip as a requirement, otherwise pip install could not be used.
Another common problem is that the auto-documentation does not generate because it "cannot see" your documented scripts, in which case you need to check the specified paths in the conf.py file.
Continue to build the readthedocs pages until you see the auto-documentation populated in the index and module index. Waiting for each build, troubleshooting, editing, and trying again can be time-consuming - at least, I found it so. However, it is worth it.
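One way to shorten that feedback loop (a suggestion rather than a requirement) is to build the pages locally and fix any Sphinx warnings before pushing, for example:
# Build the pages locally before pushing (run from the 'docs' folder)
pip install sphinx_rtd_theme
make html    # output appears in build/html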
From here on, whatever you change in your package documentation will be reflected in your readthedocs. Enjoy!
Further reading
The readthedocs homepage, including a simple walkthrough
This post by Brendan Hasz nicely details the set-up
This Stack Overflow forum on troubleshooting when autodoc is not rendering on readthedocs