A readthedocs page is handy when you wish to auto-generate online documentation for a package, such as in package releases or submitting code for peer review. Setting up a readthedocs page is well-documented online, with many step-by-step walkthroughs and guides. However, sometimes it is not straightforward and troubleshooting problems can be time-consuming.

I'm going to outline the steps and focus on the less intuitive aspects of setting up readthedocs pages, not only for others to benefit from, but also for myself when I have to do this again in the future. I would suggest reading this guide alongside the official walkthrough on the readthedocs webpages.

Formatting script documentation

The first thing is to make sure that your scripts, or package, are held in a repository on Github, and that all of the documentation in your scripts is compatible for importing into a readthedocs page.

The standard style for the documentation is reStructuredText. I had previously used reStructuredText to write the documentation for a package I released called PyTrx. Whilst it was reasonably straightforward, it looks pretty ugly in local scripts.

# Example from PyTrx, https://github.com/PennyHow/PyTrx

from osgeo import ogr
import numpy as np

def getOGRArea(pts):
    """Get real world OGR polygons (.shp) from xyz poly pts 
    with real world points which are compatible with mapping 
    software (e.g. ArcGIS).

    :param pts: UV/XYZ coordinates of a given area shape
    :type pts: arr
    :returns: List of OGR geometry polygons
    :rtype: list
    """
    ring = ogr.Geometry(ogr.wkbLinearRing)
    for p in pts:
        if not np.isnan(p[0]):
            if len(p) == 2:
                ring.AddPoint(int(p[0]), int(p[1]))
            else:
                ring.AddPoint(p[0], p[1], p[2])
    poly = ogr.Geometry(ogr.wkbPolygon)
    poly.AddGeometry(ring)
    return poly

I have used this before for auto-generated documentation and I would not recommend it. There are alternatives that look much nicer, such as Numpy docstrings, which is what I used on my most recent project.

# Example from pyBiblyser

import requests

def fetchAltmetrics(doi):
    """Fetch altmetrics from DOI
    
    Parameters
    ----------
    doi : str
      DOI string to search with
    
    Returns
    -------
    result : dict
      Altmetrics result
    """
    api = 'https://api.altmetric.com/v1/doi/'
    response = requests.get(api + doi)
    result = None                       # Returned if the request fails
    if response.status_code == 200:
        result = response.json()
    return result

It looks a lot nicer in local scripts, and the set-up for auto-generating documentation from the Numpy docstring style is relatively simple (which we will look at later). There is also the option to use Google docstrings, another visually pleasing alternative (see an example here), although it does not have all of the functionality of Numpy docstrings, such as generating example scripts within the documentation.
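For comparison, a Google-style docstring looks like this. The function below is a hypothetical example written for illustration, not taken from pyBiblyser:

```python
def normalise_doi(doi):
    """Strip a DOI resolver URL prefix to leave the bare identifier.

    Args:
        doi (str): A DOI, optionally prefixed with a resolver URL.

    Returns:
        str: The bare DOI string.
    """
    prefix = 'https://doi.org/'
    if doi.startswith(prefix):
        return doi[len(prefix):]
    return doi
```

The section headers ("Args", "Returns") are similar to Numpy style, but use indentation rather than underlined headings.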

Initialising and populating the documentation pages

There are different packages that can be used to initialise the local build of the readthedocs files, namely Sphinx and MkDocs. I have generally gone with Sphinx, but MkDocs looks relatively similar and the workflow feels familiar.

In the case of using Sphinx, it first needs to be installed using pip, after which you can initialise the build. It is best to make a new folder called 'docs' in the directory where your scripts are held, and initialise the build from there.

pip install sphinx

mkdir docs
cd docs

sphinx-quickstart

There is a prompt for several set-up parameters when initialising the build. One that I couldn't find much information about was the option of having source and build as separate directories. I had not come across this a couple of years ago when using Sphinx, so I think it is a new feature - by opting out of separate directories, the source files are placed directly in the top level of the 'docs' folder.
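If you would rather skip the interactive prompts, sphinx-quickstart can also be run non-interactively with flags; the project name, author and release below are placeholders to swap for your own:

```shell
# Run inside the docs folder; --sep puts source and build in separate directories
sphinx-quickstart --quiet --sep -p pyBiblyser -a "Author Name" -r 1.0 -l en
```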

Now, I am sure this is fine, but I wasn't sure whether it would cause problems later on when auto-generating documentation, so I kept the source and build as separate directories on this occasion. Once initialised, you can populate the source directory with pages (as .rst reStructuredText files). An index.rst file should be created, which will be the readthedocs front page of the package and is where links to all your pages can be outlined.

.. index.rst contents

pyBiblyser
==========

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   
   installation
   guide
   diversityindex
   modules
   acknowledgements

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

So in this case, the package is called pyBiblyser, and the contents page will link to a page on installation, a package guide, information about the diversity index, an outline of the package modules, and acknowledgements. Links to the auto-generated documentation index will appear below this, along with a search tool.
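The 'modules' page listed above can be generated automatically with sphinx-apidoc, which writes an .rst stub for each module it finds. The paths here assume you run it from the docs folder, with the package code in the repository top directory and source/build kept separate:

```shell
# Write module .rst stubs into docs/source, scanning the repository root
sphinx-apidoc -o source ..
```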

Configuring the build

The configuration file is the main hub for your pages build. This is located at docs/source/conf.py (if source and build are in separate directories), or docs/conf.py (if source and build are together). Parameters can be added here to configure the readthedocs pages.

The three main components I altered here were the paths that documentation is read from, the html_theme parameter, and the extensions parameter. Paths should point to the scripts on which auto-documentation will be run - in my case, I wanted the scripts in the top directory of the git repository to be included, so added this.

# paths from conf.py
sys.path.insert(0, os.path.abspath('../../'))
sys.path.insert(0, os.path.abspath('.'))

The html_theme parameter is linked to the style of the readthedocs pages - there are a number of themes to choose from, with the sphinx_rtd_theme being the classic readthedocs style. Other Sphinx packages can be added with the extensions parameter in order to configure components such as how auto-documentation is generated.
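For example, selecting the classic readthedocs style is a one-line change in conf.py (the sphinx_rtd_theme package needs to be installed via pip for this to work):

```python
# Theme setting from conf.py
html_theme = 'sphinx_rtd_theme'
```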

# Extensions from conf.py
extensions = [
    'sphinx.ext.autodoc',     # To generate autodocs
    'sphinx.ext.mathjax',     # autodoc with maths
    'sphinx.ext.napoleon'     # For auto-doc configuration
]

napoleon_google_docstring = False   # Turn off googledoc strings
napoleon_numpy_docstring = True     # Turn on numpydoc strings
napoleon_use_ivar = True            # For maths symbology

There are many extensions to Sphinx (see here for the list), but the autodoc extension is essential for generating auto-documentation - do not forget it! In my case, I opted for auto-documentation to be generated from Numpy docstrings in my scripts.
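It is also worth building the documentation locally before pushing, so that autodoc and path errors surface immediately rather than on the readthedocs virtual machine. Assuming the separate source/build layout, from the docs folder:

```shell
# Build the html pages locally; output appears in build/html
sphinx-build -b html source build
```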

Importing and building in readthedocs

To import from the Github repository to readthedocs, you need to connect the repository to your readthedocs account and allow readthedocs to pull from the repository. This is largely straightforward (see guide here), unless you are pulling from a private repository in a Github organisation (which is what I was doing). After reading around, it doesn't appear that this is possible at the moment, so I had to make the repository public. From there, I could generate a webhook for the repository, which allows the contents to be pulled to readthedocs.

Successfully building the readthedocs pages from the Github repository was probably the most fiddly and time-consuming part for me. The auto-documentation is generated by running the package on a virtual machine; therefore, you have to specify how to run the package with two files - a readthedocs setup file (.yaml or .yml) in the top directory, and a file containing requirements (.yaml, .yml or .txt).

The readthedocs.yaml file specifies the virtual machine build. I kept this quite simple, but I understand this can cause problems in the future - it is best to specify the build (for instance, the operating system) to match, as closely as possible, the environment that the package has been run in; there are plenty of options, as documented here.

# File .readthedocs.yaml

build:
  image: latest

python:
  version: 3.7

requirements_file: docs/requirements.txt

So in this instance, I have specified a Python 3.7 environment (as this is what I had coded the package in) with a set of requirements that I have provided the path to. The requirements file can either be a .txt list, or an environment .yml/.yaml file. These requirements should be the same as your package dependencies. I didn't list the package versions because I was having problems with the virtual environment build, but generally you should specify the version, or the minimum version, of each package.
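Note that readthedocs has since moved to version 2 of the configuration format, so a newer build file would look something more like this - a minimal sketch, with the operating system and Python version given as examples to adapt:

```yaml
# File .readthedocs.yaml (version 2 format)
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

python:
  install:
    - requirements: docs/requirements.txt
```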

# File docs/requirements.txt

bs4
numpy
pandas
pip
gender-guesser
habanero
pybliometrics
scholarly

Troubleshooting

You can check the status and troubleshoot problems effectively by checking the build log, as described here. A common problem is errors in the imported dependencies - I had this problem and found that I needed to specify pip as a requirement, otherwise pip install could not be used.

Another common problem is that the auto-documentation does not generate because it "cannot see" your documented scripts, in which case you need to check the specified paths in the conf.py file.

Continue to rebuild the readthedocs pages until you see the auto-documentation populated in the index and module index. Waiting for each build, troubleshooting, editing, and trying again can be time-consuming - at least, I found it so. However, it is worth it.

From here on, whatever you change in your package documentation will be reflected in your readthedocs. Enjoy!


Further reading

The readthedocs homepage, including a simple walkthrough

This post by Brendan Hasz nicely details the set-up

Another detailed walkthrough

Numpydoc style guide

This Stack Overflow forum on troubleshooting when autodoc is not rendering on readthedocs

Specifications for the configuration file