
An excerpt from Publishing Python Packages by Dane Hillard

This article discusses using GitHub Actions to build a continuous integration (CI) workflow, using a Python package as the example.

Read it if you want to learn more about CI and Python packages.


Take 35% off Publishing Python Packages by entering fcchillard2 into the discount code box at checkout at manning.com.


The continuous integration workflow

Imagine you’ve onboarded several new developers to your project to continue taking on new clients. Your team has spent the last several weeks getting ready for the next release of your package, and you finally shipped the new version earlier in the day. As your team celebrates, the incessant vibration of your phone gives you the sinking feeling that something is wrong. It turns out that the developer who worked on the final changes before release forgot to run the unit tests, and the last change broke a core piece of functionality.

You need a system in place that can run the valuable checks you’ve developed on each change automatically, in an environment where everyone working on the project can confirm their status. These continuous integration systems are another major stride in productivity and confidence in your project as it evolves.

DEFINITION: Continuous integration (CI) is the practice of incorporating changes as often as possible into the main stream of development for a project to minimize the possibility of behavior that diverges from the desired or expected behavior. CI is diametrically opposed to the early practices of large software projects where development might go on for months or years before being merged and released. CI encourages small, incremental changes with the aim to deliver value earlier and more frequently. For in-depth coverage of continuous integration, check out Grokking Continuous Delivery by Christie Wilson and Pipeline as Code by Mohamed Labouardy. 

Most continuous integration workflows consist of the same basic steps, as shown in figure 1. The automatic build and test steps are the gaps in your current process.


Figure 1. A basic continuous integration workflow gives developers an automated feedback loop about their changes.


Because the automated building and testing steps are performed in a shared location, you and your team can verify that a given change works as expected, regardless of any testing steps the author of the change performed locally. This is a key shift: local testing can now focus on writing new tests or updating existing tests in quick iteration, and running the full test suite becomes an optional convenience. Developers have options during their implementation based on their capacity at the moment, instead of being forced to do things one very specific way.

Now that you’re familiar with the basic flow of continuous integration, you can start working toward building one using freely available tools.

Continuous integration with GitHub Actions

Before merging any new code, you decide each change to the project should be verified, recorded, and published using a CI pipeline in a shared environment. This removes any variability due to someone’s local configuration, and prevents the scenario where someone publishes a package version from their computer that never gets incorporated into the code base. Because your team has been using GitHub to host the code base and collaborate on changes, you decide to give GitHub Actions a try.

Other continuous integration solutions

Although I’ve chosen to cover GitHub Actions in this article, it’s just one of a wide variety of options out there. Most continuous integration solutions have strong overlap in their concepts, so learning a different platform is often a matter of understanding its particular lingo.

Some widely-used, cloud-first CI solutions are:

It can be useful to choose one of these if it aligns with your existing choice of cloud provider for personal or organizational work. Jenkins is an open-source solution that typically requires a bit more effort on your part, but might be nice if you want full end-to-end control.

I strongly recommend staying away from Travis CI and I won’t link to it here. Although it used to be one of the most beloved platforms for open source projects, it has suffered from slow feature development, poor communication, security concerns, and a push toward paid plans since its acquisition in 2019.

A high-level GitHub Actions workflow

To work effectively with GitHub Actions, you need to understand the high-level workflow, the GitHub Actions-specific terminology, and the configuration format in the following sections.

In your new pipeline, any time you open a pull request or push new commits to GitHub, the CI pipeline checks out the code from your branch and performs the following in parallel:

  • Check code formatting using black and the format tox environment
  • Lint the code using flake8 and the lint tox environment
  • Type check the code using mypy and the typecheck tox environment
  • Unit test the code using pytest and the default tox environment
  • Build a source distribution using build
  • Build binary wheel distributions using build and cibuildwheel (more on this later in the chapter)

Whenever you tag a commit, the pipeline additionally publishes the distributions to PyPI. Figure 2 depicts this flow at a high level.


Figure 2. A continuous integration pipeline flow for Python packaging using GitHub Actions


You’re locking all the testing and code quality work you did into an automated pipeline. In the future if you change how one of your tox environments works or add a new kind of check, you can add them to your pipeline as well. This investment will pay dividends with each new process you create.
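As a rough preview, two of the parallel checks listed above might be sketched as independent jobs. Jobs in a GitHub Actions workflow run in parallel by default, so no extra configuration is needed for that. This is a minimal sketch and not the book's final configuration: the workflow name, the Python version, and the action versions are assumptions, and it presumes your project already defines the format and lint tox environments. The individual fields are explained later in this article.

```yaml
# Hypothetical sketch: two of the parallel checks as separate jobs.
name: Packaging  # assumed name

on:
  - pull_request

jobs:
  format:
    name: Check formatting
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2  # assumed action version
        with:
          python-version: "3.10"  # assumed Python version
      - name: Install tox
        run: python -m pip install tox
      - name: Run black through the format tox environment
        run: tox -e format

  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2  # assumed action version
        with:
          python-version: "3.10"  # assumed Python version
      - name: Install tox
        run: python -m pip install tox
      - name: Run flake8 through the lint tox environment
        run: tox -e lint
```

The type checking and unit testing jobs would follow the same shape, each with a different tox command in the final step.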

Understanding GitHub Actions terminology

You need to make use of the following GitHub Actions concepts to build your CI pipeline:

  • Workflow: The highest level of granularity for a CI pipeline. You can create multiple workflows that happen in response to different events.
  • Job: A high-level phase you define for a workflow, such as building or testing something.
  • Step: A specific task you define in a job, usually consisting of a single shell command. Steps can also reference other pre-defined GitHub actions, which is useful to build off of common tasks like checking out your code.
  • Trigger: An event or activity that causes a workflow to happen. Even when a workflow is triggered, you can skip jobs in that workflow conditionally with expressions.
  • Expression: One of a set of GitHub-specific conditions and values that you can check to control your CI pipeline.

For now, you need just one workflow consisting of several jobs, some of which run conditionally based on the triggering event. Each job has several similar steps to install dependencies and tools and finally run a task. The workflow is triggered by pull requests and tags you create. Figure 3 shows the same CI pipeline you saw earlier, this time pointing out how these different moving parts map to GitHub Actions concepts.


Figure 3. How different parts of a continuous integration pipeline map to GitHub Actions concepts
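To make the mapping concrete, here is one way triggers and expressions could combine so that a job runs only when a tag is pushed. Treat it as a sketch under assumptions: the workflow name, job keys, and placeholder echo commands are hypothetical, and the real test and publish steps are not shown.

```yaml
# Hypothetical sketch: one workflow, two triggers, one conditional job.
name: Packaging  # assumed name

on:
  pull_request:   # trigger: pull request activity
  push:
    tags:
      - "*"       # trigger: pushed tags matching this pattern

jobs:
  test:
    name: Run unit tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Placeholder for the real test command
        run: echo "Run the tests here"

  publish:
    name: Publish distributions
    runs-on: ubuntu-latest
    # Expression: skip this job unless the triggering ref is a tag
    if: startsWith(github.ref, 'refs/tags/')
    steps:
      - name: Placeholder for the real publish command
        run: echo "Publish to PyPI here"
```

With this shape, a pull request runs only the test job, while a pushed tag runs both jobs.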


Understanding GitHub Actions in depth

Teaching all of what GitHub Actions has to offer is outside the scope of this excerpt, but if you’d like to keep exploring more features you can follow GitHub’s learning materials.

With the terminology in hand, you’re ready to start building a GitHub Actions workflow for your package.

If you haven’t done so yet, now is a good time to bring your project under version control in a Git repository and push it to GitHub. If you aren’t familiar with Git or GitHub, pause here and take some time to familiarize yourself. Their documentation (https://docs.github.com/en/get-started/quickstart/create-a-repo) and Git in Practice by Mike McQuaid are good resources.  

Starting a GitHub Actions workflow configuration

You configure GitHub Actions workflows using YAML. For your workflow, you can use a single YAML file to specify the jobs and steps. Start by creating a new branch in your repository. Create a .github/ directory in the root directory of your project if it doesn’t already exist. Inside the .github/ directory, create a new directory called workflows/. GitHub automatically discovers files with a .yml or .yaml extension in the .github/workflows/ directory and expects them to be valid workflow definitions.

You can give your workflow configuration file most any name you like, but using the name main.yml is a common practice when a project has only one workflow configured. You can also use a name that indicates the purpose of the workflow, such as packaging.yml. Create an empty configuration file in the .github/workflows/ directory now.

Each GitHub Actions workflow must have at least a few fields:

  1. name: A human-friendly string to display in a few parts of the GitHub interface
  2. on: A list of one or more events that trigger the workflow
  3. jobs: A map of one or more jobs to perform

In turn, a job must have at least a few fields:

  1. Key: A machine-readable string by which to reference the job elsewhere in the pipeline. Often this is a version of the job name that uses only letters and hyphens.
  2. name: A human-friendly string to display in a few parts of the GitHub interface.
  3. runs-on: The type of GitHub Actions runner to use for the job. For your purposes, ubuntu-latest works well. You can see all the available runners in the runs-on documentation.
  4. steps: A list of one or more steps to perform.

Finally, a step may be one of two formats:

  • A reference to a pre-defined action provided by GitHub or a third party, such as the official checkout action. This format specifies a uses key whose value references the action’s GitHub repository and an optional version string separated by an @ character.
  • A human-friendly name string to display in a few parts of the GitHub interface, and a run field that specifies the command to run.

Listing 1 shows how these pieces fit together into a sample workflow configuration.

Listing 1. A sample YAML configuration for a GitHub Actions workflow that prints a short message

 
 name: My first workflow ❶
  
 on: ❷
   - push
  
 jobs: ❸
   say-hello: ❹
     name: Say Hello ❺
     runs-on: ubuntu-latest ❻
     steps: ❼
       - uses: actions/checkout@v2 ❽
  
       - name: Say Hello ❾
         run: echo "Hello"
  

❶ A human-friendly name for the workflow

❷ The workflow is triggered by pushed code and tags

❸ The jobs for the workflow

❹ A machine-readable key for the job

❺ A human-friendly name for the job

❻ The job uses the latest Ubuntu-based runner

❼ The steps for the job

❽ Use the official checkout action to check out the code

❾ Run a step with a custom name and command

 

When run, the workflow checks out the code of the branch or tag that triggered the workflow, then runs an echo command to say hello. If the push is to a branch that has an open pull request, GitHub Actions reports the pending status near the bottom of that pull request’s page (figure 4).


Figure 4. A pending GitHub Actions workflow displayed at the bottom of a pull request


After the workflow completes, GitHub Actions shows the completed status on the pull request page (figure 5).


Figure 5. A successfully completed GitHub Actions workflow on a pull request


You can click the Details link on a workflow job to see the output of individual steps (figure 6). You can also find all previous job runs on the Actions tab of your repository. GitHub Actions performs some steps of its own before and after the steps you define.


Figure 6. The detailed steps and output for a GitHub Actions workflow job. Some steps are user-defined, and some are built into GitHub Actions.


You can click on a step’s name to expand and view its output, which can be useful in better understanding actions provided by GitHub or a third party (figure 7).


Figure 7. The output from the official checkout action shows all the steps involved in checking out the code from the triggering branch or tag.


You can also use the output to confirm or debug steps you create yourself, such as ensuring that a logged value is what you expect (figure 8).


Figure 8. Commands specified in a workflow job step are displayed along with their output.
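One way to use step output for debugging is to log a few values from the github context and read them back in the job output. The following is only a sketch: the workflow name, job key, and environment variable names are hypothetical, while github.event_name and github.ref are values GitHub Actions provides.

```yaml
# Hypothetical sketch: log context values to inspect in the step output.
name: Debugging sketch  # assumed name

on:
  - push

jobs:
  show-context:
    name: Show context
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Log what triggered the workflow
        # Pass the values through environment variables rather than
        # interpolating them directly into the shell command.
        env:
          EVENT_NAME: ${{ github.event_name }}
          REF: ${{ github.ref }}
        run: |
          echo "Event: $EVENT_NAME"
          echo "Ref: $REF"
```

Expanding this step in the job details should show both echo commands along with the event name and the ref that triggered the run.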


Especially when a workflow fails, browsing these different levels of the GitHub Actions interface becomes important in discovering how to fix the failure. These areas are where you’ll see failing unit tests and messages about improperly formatted code or other code quality issues found by your tools.

Exercise

On a new branch in your repository:

  1. Create a .github/ directory in the root directory of your project if it doesn’t already exist.
  2. Inside the .github/ directory, create a new directory called workflows/.
  3. Create a YAML file for your workflow configuration in the .github/workflows/ directory. You can name your workflow configuration file what you like, but using the name main.yml is a common practice when a project has only one workflow configured. You can also use a name that indicates the purpose of the workflow, such as packaging.yml.
  4. In your workflow file, add the sample YAML from listing 1.
  5. Commit and push your changes to GitHub.
  6. Open a pull request.

After you complete these steps, you should see GitHub Actions trigger the workflow on your pull request. Confirm that the workflow succeeds and performs the steps you defined. Change the echo command to a new string and push a new commit. The workflow should trigger again and the output should reflect your updated string.

Now that you’ve created a working GitHub Actions workflow, you’re ready to add your real tasks to it.

If you want to learn more about the book, check it out on Manning’s liveBook platform.