I recently had a paper accepted which presents PyTrx, a new Python toolset for use in glacial photogrammetry. Over the course of getting this published, co-authors and reviewers alike suggested using a package manager for easy download and implementation of PyTrx. I therefore wanted to package the toolset up for distribution via PyPI ('pip'), thus making it easily accessible to other Python users with the simple command pip install pytrx. Whilst I found the tutorials online informative, there were some pitfalls which I found hard to solve with the given information. So here is an account of how I got my package on PyPI. The associated files for the PyTrx package are available on a branch of PyTrx's GitHub repository, if you want to see this walkthrough in action.

Defining the package files

First and foremost, the file structure of the toolset is crucial to it being packaged up correctly. The top directory should contain a folder containing your package, and several other files containing the necessary setup information:

 master_folder
   - PyTrx
   - LICENSE.txt
   - README.md
   - setup.py 

This was one of the first slip-ups I made: I put all my toolset scripts in the top directory rather than in a folder of their own. If the Python scripts that make up your package are not placed in their own folder, then they will not be found when it comes to compiling the package.

So let's go through each of these elements, beginning with the folder that contains the Python scripts we wish to turn into a PyPI package. An initialisation file needs to be created in this folder in order to import the directory as a package. This is simply an empty Python script called __init__.py, so our folder structure will look a bit like this now:

 master_folder
   - PyTrx
       - __init__.py
   - LICENSE.txt
   - README.md
   - setup.py 
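As a quick sanity check on the layout, here is a minimal sketch that looks for top-level folders containing an __init__.py, which is roughly the rule used to decide what counts as a package. The helper names find_package_dirs and make_demo_layout are made up for this example, and the check is a simplified stand-in rather than what setuptools actually runs.

```python
from pathlib import Path
import tempfile


def find_package_dirs(root):
    """Return names of top-level folders under root that contain __init__.py."""
    root = Path(root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and (child / "__init__.py").is_file()
    )


def make_demo_layout(root):
    """Create the layout from the walkthrough inside root, using empty files."""
    root = Path(root)
    (root / "PyTrx").mkdir()
    (root / "PyTrx" / "__init__.py").touch()
    for name in ("LICENSE.txt", "README.md", "setup.py"):
        (root / name).touch()


if __name__ == "__main__":
    # Build the layout in a throwaway directory and list the packages found.
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_layout(tmp)
        print(find_package_dirs(tmp))
```

Pointing find_package_dirs at your own master_folder should list the package folder; an empty result suggests the scripts are sitting in the top directory or the __init__.py is missing.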

Moving on to the LICENSE.txt file, it is important to define a license with any Python package that is publicly distributed in order to inform the user how your package can be used. This can simply be a text file containing a copied license. A straightforward and popular license for distributing code is the MIT license, which allows code to be used and adapted with appropriate credit, but there are great guides online for choosing a license appropriate for you (e.g. choosealicense.com). This file has to be called 'license' or 'licence' (uppercase or lowercase) so that it is recognised when it comes to compiling the package.

Similarly, the README.md file has to be called 'readme' specifically so that it is recognised when it comes to compiling the package. This file contains a long description of the Python package. You may already have a README.md file if you have hosted your scripts on GitHub, in which case you can simply adopt this as your readme. Just remember that it should be written in Markdown (matching the long_description_content_type set in setup.py), and that the readme file will form the main description of your package that people will read when they navigate to the package's PyPI webpage.

And finally the setup.py. The setup file is probably the trickiest file to define, but also the most important, as here we outline all of the metadata associated with our Python package, including the package's recognised pip name (i.e. the one used in the command pip install NAME), its version, author and contact details, keywords, short package description, and dependencies. Here is PyTrx's setup.py file to serve as an example:

import setuptools

# Use the README as the long description shown on the PyPI page
with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setuptools.setup(
    name="pytrx", 
    version="1.1.0",
    author="Penelope How",
    author_email="pennyruthhow@gmail.com",
    description="An object-oriented toolset for calculating velocities, surface areas and distances from oblique imagery of glacial environments",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/PennyHow/PyTrx",
    keywords="glaciology photogrammetry time-lapse",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Science/Research",
        "Natural Language :: English",
        "Operating System :: OS Independent",
    ],
    install_requires=['glob2', 'matplotlib', 'numpy', 'opencv-python>=3', 'pillow', 'scipy'],
    python_requires='>=3',
)

Most of the variables are straightforward to adapt for your own package's setup.py file. The ones to watch out for are the classifiers variable, where metadata flags are defined, and the install_requires variable, where the package's dependencies are listed. PyPI offers a good resource that lists all of the possible classifiers you can add to the classifiers variable.

Finding out how to define dependencies was a little trickier though, as the main PyPI tutorial does not address this. This page gave a brief outline of how to define them with the install_requires variable, but I found that I still had problems in the subsequent steps with package incompatibilities. My main problem was that I had largely worked with conda rather than pip for managing my Python packages, so there were a number of discrepancies between the two in configuring dependencies with PyTrx. My main challenge was finding a balance with OpenCV and GDAL, two packages for which it is notoriously difficult to find compatible versions. I had managed this with conda, finding two specific versions of these packages to configure a working environment. In pip, this proved much harder. The package versions used in conda were not the same as those available for pip, and there wasn't an official repository for OpenCV, only an unofficial repository called opencv-python. We'll learn more about testing dependency set-ups a bit later on, but for now, just make sure that each dependency is available on PyPI, and use >= or <= to define whether the package needs to be above or below a certain version. It is generally advised not to pin a dependency to a specific version (i.e. ==), I guess because it reduces the flexibility of the package installation for users.
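To make the point about pinning concrete, here is a small sketch that splits simple requirement strings into name, operator and version, and flags any that are pinned with ==. The helpers split_requirement and pinned are invented for this example and only handle the simple single-specifier form used in PyTrx's install_requires list; real requirement strings can be more complex.

```python
import re

# Matches "name", or "name<op>version", for one version specifier only.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|!=|<|>)?\s*([^\s,;]*)\s*$")


def split_requirement(req):
    """Split a simple requirement string into (name, operator, version)."""
    match = _REQ.match(req)
    if match is None:
        raise ValueError(f"unsupported requirement string: {req!r}")
    name, op, version = match.groups()
    return name, op, version or None


def pinned(requirements):
    """Return the names of requirements pinned to one exact version with ==."""
    return [
        name
        for name, op, _ in map(split_requirement, requirements)
        if op == "=="
    ]


if __name__ == "__main__":
    install_requires = ["glob2", "matplotlib", "numpy",
                        "opencv-python>=3", "pillow", "scipy"]
    print(pinned(install_requires))
```

Running pinned over your own install_requires list before building is a cheap way to spot an exact pin that slipped in while experimenting with versions.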

Generating the distribution files

Once we have all of our files, we can compile our package and generate the distribution files that will eventually be uploaded to TestPyPI and PyPI. It is advised to use TestPyPI to test your package distribution before doing the real deal on PyPI, and I found it incredibly useful as an apprehensive first-time uploader.

If you do decide to test your package on TestPyPI, it is good etiquette to change the name of your package (defined in setup.py) to something unique. There are many test packages on TestPyPI, and although they delete test packages on a regular basis, there are plenty of package names that yours could clash with. In the case of PyTrx, I defined the package name as pytrxhow (the package name with my surname), so that there was no chance of using a name that had already been taken. Additionally, you should take your dependencies out of the setup.py file, as the same packages often do not exist on TestPyPI and an install from there is therefore not an accurate reflection of how your package dependencies will behave on PyPI.
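One way to avoid hand-editing setup.py back and forth between the test and real uploads is to switch the name and dependencies with a flag. This is only a sketch of the idea and not how PyTrx's setup.py is written; the environment variable PYTRX_TEST_UPLOAD and the helper package_settings are hypothetical names invented for this example.

```python
import os

REAL_NAME = "pytrx"
TEST_NAME = "pytrxhow"
DEPENDENCIES = ["glob2", "matplotlib", "numpy",
                "opencv-python>=3", "pillow", "scipy"]


def package_settings(test_upload):
    """Return the name and install_requires to pass to setuptools.setup()."""
    if test_upload:
        # TestPyPI: unique name, and no dependencies to resolve.
        return {"name": TEST_NAME, "install_requires": []}
    return {"name": REAL_NAME, "install_requires": list(DEPENDENCIES)}


# PYTRX_TEST_UPLOAD is a hypothetical variable, made up for this sketch.
settings = package_settings(os.environ.get("PYTRX_TEST_UPLOAD") == "1")

if __name__ == "__main__":
    print(settings)
```

Inside setup.py, the resulting dictionary could be unpacked into the call as setuptools.setup(**settings, version=..., ...) in place of the hardcoded name and install_requires arguments.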

To generate the distribution files, two packages need to be installed into your Python environment: setuptools and wheel. I already had versions of these packages in my conda environment, but I updated them using the same command (in Anaconda Prompt) as if I wanted to install them:

conda install setuptools wheel

After these are installed, navigate to the directory where all of your files are (i.e. in master_folder) using the cd command, and run the following command to build your distribution files for TestPyPI:

python3 setup.py sdist bdist_wheel

This should generate two folders containing files that look something like this:

master_folder
   - PyTrx
       - __init__.py
   - LICENSE.txt
   - README.md
   - dist
       - pytrx-1.1.0-py3-none-any.whl
       - pytrx-1.1.0.tar.gz
   - pytrx.egg-info
       - PKG-INFO
       - SOURCES.txt
       - dependency_links.txt
       - requires.txt
       - top_level.txt
   - setup.py 

The dist and egg-info folders should contain all of the information entered in the setup.py file, so it's a good idea to check through these to see if the files are populated correctly. The SOURCES.txt file should contain a list of the paths to all of the relevant files for making your package. If you have taken out your dependencies, then the requires.txt file should be empty.
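The archives in dist can be checked in the same spirit without unpacking them by hand, since a wheel is a zip file and the source distribution is a gzipped tarball. Here is a minimal sketch using only the standard library; the helper names list_archive_members and summarise_dist are made up for this example.

```python
import tarfile
import zipfile
from pathlib import Path


def list_archive_members(path):
    """Return the sorted file names stored in a .whl or .tar.gz archive."""
    path = Path(path)
    if path.suffix == ".whl":
        # Wheels are ordinary zip archives.
        with zipfile.ZipFile(path) as archive:
            return sorted(archive.namelist())
    if path.name.endswith(".tar.gz"):
        with tarfile.open(path, "r:gz") as archive:
            return sorted(m.name for m in archive.getmembers() if m.isfile())
    raise ValueError(f"not a wheel or source distribution: {path.name}")


def summarise_dist(dist_dir="dist"):
    """Map each distribution file in dist_dir to the files it contains."""
    summary = {}
    for path in sorted(Path(dist_dir).iterdir()):
        if path.suffix == ".whl" or path.name.endswith(".tar.gz"):
            summary[path.name] = list_archive_members(path)
    return summary


if __name__ == "__main__":
    # Run from master_folder after building the distribution files.
    for archive_name, members in summarise_dist().items():
        print(archive_name)
        for member in members:
            print("   -", member)
```

If your package scripts do not appear in either listing, the likely culprit is the folder layout or a missing __init__.py.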

Testing the distribution

There are two ways to test that the distribution files work: 1. using TestPyPI to trial the distribution and the 'look' of the PyPI entry, and 2. using the setup.py file to test the package installation in your local environment (including dependency solving). Beginning with the test on TestPyPI, start by creating an account on TestPyPI and creating an API token, so you can securely upload the package (there is a great set of instructions for doing this here). Make sure to write down all of the information associated with the token as you will not be able to see it again.

Next, make sure that you have an up-to-date version of the twine package in your environment. Twine is a Python package primarily for uploading packages, which can easily be installed or upgraded in a conda environment with the following command:

conda install twine

Now, Twine can be used to upload your package to TestPyPI with this command (making sure that you are still in your master_folder directory):

python3 -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*

Once the command has run, there will be a link to your TestPyPI repository at the bottom which you can click on to take you to it. You can use this to test install your package with no dependencies. In the case of PyTrx (pytrxhow, my test version), this could be done with the following command (just change 'pytrxhow' to specify a different package):

pip install -i https://test.pypi.org/simple/ pytrxhow 

This is all well and good for testing how a package looks on PyPI and testing that it can install. However, I was more anxious about the package dependencies, knowing the issues I had previously had with OpenCV and GDAL in my conda environment. After checking your TestPyPI installation (and this may take a few tries, updating the version number every time), put your dependencies back into your setup.py file, run the distribution file generation again, and test the dependency configuration with the following command, which will attempt to install your package locally:

python setup.py develop

This may take some time to run, but should give you an idea as to whether the dependencies can be resolved. I cloned my base conda environment in order to do this, giving a (relatively) blank environment to run off, and tested the installation by attempting to import the newly installed package in Spyder.
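Besides importing the package in Spyder, the installed metadata can be queried from Python to confirm which version ended up in the environment and which dependencies it declares. Here is a minimal sketch using the standard library's importlib.metadata (Python 3.8 and above); the helper describe_install is a made-up name, and I am assuming the local install registers its metadata in the usual way.

```python
from importlib import metadata


def describe_install(dist_name):
    """Return version and declared requirements of an installed distribution.

    Returns None if no distribution with that name is installed.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # dist.requires is None when no dependencies are declared.
    return {"version": dist.version, "requires": dist.requires or []}


if __name__ == "__main__":
    # Use the pip name from setup.py, not the import name of the folder.
    print(describe_install("pytrx"))
```

A result of None in the cloned environment would suggest the installation did not complete, even if no error was obvious in the console output.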

I found that I could not solve the environment, no matter what I specified in setup.py, and therefore had to play around to find which package was causing the majority of the problems. GDAL turned out to be the main cause of PyTrx failing to install, so I took it out of my dependencies, instead opting to install it afterwards with conda. This seems to work much better, and although it may not be a perfect solution, it will create fewer problems for users.

Uploading the distribution to PyPI

So at this point you should feel confident in the look and feel of your package, and its installation in your environment. Before proceeding with the final steps, just run through the following checklist to make sure you have everything:

  • Check that all the information is correct in the setup.py, changing the name (e.g. 'pytrxhow' to 'pytrx') and dependencies if you have been uploading to TestPyPI previously
  • If you change anything in the setup.py file, then run the distribution file generation again
  • Check your TestPyPI page to make sure all the information uploaded is correct and nothing is missing
  • Check on PyPI that there is no other package with the same name as yours
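The last check in the list can also be done from Python rather than by searching the website. Here is a rough sketch that asks PyPI's JSON endpoint (https://pypi.org/pypi/NAME/json) whether a project page exists for a given name. The helper name_is_taken and its opener parameter are invented for this example, and a "not taken" answer is only a hint, since PyPI can still reject names that are too similar to existing ones.

```python
import urllib.error
import urllib.request


def name_is_taken(name, opener=urllib.request.urlopen):
    """Return True if PyPI already has a project registered under name.

    opener can be swapped out so the function can be tried offline.
    """
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with opener(url) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # No project page exists for this name.
            return False
        raise


if __name__ == "__main__":
    # Needs an internet connection.
    print(name_is_taken("pytrx"))
```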

A thorough check is needed at this stage because an upload to PyPI cannot be changed. Further package versions can be uploaded if there is a major problem, but versions that you have uploaded cannot be edited or altered. Therefore it is best to try and get it right the first time. No pressure!

For uploading to PyPI, you need to create an account on PyPI. This account creation is separate to TestPyPI, so another username and password unfortunately. Again, create an API token in the same manner as done previously with TestPyPI, making sure to write down all of the details associated with it. To upload your package to PyPI, we are using Twine again and the following command:

twine upload dist/*

Once run, there will be a link to click through to your PyPI page and voila, your package is online and easy for anyone to download with the old classic command (in the case of PyTrx):

pip install pytrx

In the case of PyTrx, our PyPI page is available to view here, and our GitHub repository contains all of PyTrx's scripts and the distribution files used for the PyPI upload, which might be useful for some. Hope this helps someone who has suffered any of the pitfalls of PyPI packages! 



Useful resources:

A broad overview and use of Test PyPI

Uploading to PyPI

More information about specifying dependencies and testing package installations

More information about PyPI classifiers

PyTrx’s PyPI page, GitHub repository, and publication