Setting Up GitHub Actions for Python: What the Docs Don't Tell You

#githubactions #python #cicd #devops

Three months ago, our team's CI pipeline was a mess. We were running pytest for a five-person Python project on a self-hosted Jenkins server that one of the founding engineers had set up in 2019 and that nobody really understood anymore. Build times were hitting 12 minutes, the server would randomly fail to clone repos, and we had a Slack channel called #ci-on-fire that was getting more traffic than #general.

So I spent a weekend migrating everything to GitHub Actions. After two weeks of day-to-day use, plus a few Friday-afternoon incidents I'd rather forget, we're sitting at under three minutes per build, and it costs us exactly $0 on our open-source repos.

This is what I learned, including the parts that would have saved me several hours if I'd known them upfront.

Your First Real Workflow (Not the Hello World Version)

The official docs will show you a 10-line YAML file that runs pytest. That's fine for a toy project. For a real Python application, you need to think about a few more things upfront: which Python versions you support, how you manage dependencies, whether you're linting before running tests or letting broken code waste CI minutes.

Here's the workflow I actually use for the FastAPI project we've been building:

name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
      fail-fast: false

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache pip dependencies
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('**/requirements*.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-${{ matrix.python-version }}-
            ${{ runner.os }}-pip-

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          pip install -r requirements-dev.txt

      - name: Lint with ruff
        run: ruff check .

      - name: Run tests with coverage
        run: pytest --cov=. --cov-report=xml --cov-fail-under=80

      - name: Upload coverage to Codecov
        if: matrix.python-version == '3.11'
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

A few of these choices deserve an explanation. First, fail-fast: false on the matrix. Without it, if the Python 3.10 job fails, GitHub cancels the 3.11 and 3.12 jobs that are still queued or in progress. Sometimes that's what you want. More often, you want to see whether 3.12 passes cleanly while 3.10 has an unrelated issue, so I leave it false and see every result.
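To make the difference concrete, here is a toy Python model of how a matrix fans out into jobs and what fail-fast does to the rest. It is a simplification of my own, not how the runner is implemented: real matrix legs start in parallel and GitHub cancels whatever is still queued or running, while this loop runs them in order.

```python
from itertools import product
from typing import Callable


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Fan a matrix definition out into one dict per job, like strategy.matrix does."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def run_matrix(
    jobs: list[dict[str, str]],
    run_job: Callable[[dict[str, str]], bool],
    fail_fast: bool,
) -> dict[str, str]:
    """Sequential model: once a job fails and fail_fast is on, later jobs are cancelled."""
    results: dict[str, str] = {}
    any_failed = False
    for job in jobs:
        label = ", ".join(f"{k}={v}" for k, v in job.items())
        if any_failed and fail_fast:
            results[label] = "cancelled"
            continue
        ok = run_job(job)
        results[label] = "success" if ok else "failure"
        any_failed = any_failed or not ok
    return results
```

With a job that only breaks on 3.10, fail_fast=True leaves you with one failure and two cancellations, while fail_fast=False tells you 3.11 and 3.12 are fine.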

The ubuntu-22.04 pin is intentional. Leaving it as ubuntu-latest means your workflow can silently break when GitHub bumps the default to a newer Ubuntu version. This happened to us — a transitive dependency wasn't compatible with ubuntu-24.04 yet, and we spent half a morning convinced there was a code issue before someone noticed the runner change in the logs. Now I pin and update deliberately, on my schedule.
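If you want to enforce the pin rather than rely on memory, a small check in CI or a pre-commit hook can flag any `-latest` runner label. This is a sketch of my own that uses a plain regex instead of a YAML parser, so it only catches the simple `runs-on: label` form and silently skips expressions such as `${{ matrix.os }}`.

```python
import re

# Matches "runs-on: <label>" with an optional opening quote. A rough regex, not a YAML parser.
RUNS_ON = re.compile(r"^\s*runs-on:\s*['\"]?([\w.-]+)")


def unpinned_runners(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every runs-on label that ends in -latest."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = RUNS_ON.match(line)
        if match and match.group(1).endswith("-latest"):
            hits.append((lineno, match.group(1)))
    return hits
```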

Linting before tests is also intentional. On our codebase, ruff runs in under two seconds, so there's no reason to spend three minutes running pytest on code that ruff would have rejected for syntax errors, undefined names, or unused imports.
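The same ordering is worth copying locally. Below is a minimal pre-push sketch (my own helper, not something GitHub Actions provides) that runs commands in order and never starts pytest if the lint step fails. The `LOCAL_CHECKS` list mirrors the workflow above and assumes ruff, pytest, and pytest-cov are installed.

```python
import subprocess
import sys

# Mirrors the CI steps above; adjust paths and thresholds to your project.
LOCAL_CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "--cov=.", "--cov-fail-under=80"]),
]


def run_in_order(steps: list[tuple[str, list[str]]]) -> tuple[bool, list[str]]:
    """Run each command in turn and stop at the first non-zero exit code."""
    completed = []
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"{name} failed with exit code {result.returncode}; skipping the rest")
            return False, completed
        completed.append(name)
    return True, completed
```

Wiring it up is one line in a hook script: `sys.exit(0 if run_in_order(LOCAL_CHECKS)[0] else 1)`.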

Caching: The Part That Actually Matters for Build Speed

Here is the thing: most tutorials either skip caching entirely or get the cache key wrong, leaving you with either constant cache misses or stale dependencies causing confusing failures.

The pattern I use — runner.os + python-version + hash of requirements files — hits the right balance. If your requirements don't change, you get a cache hit. The restore-keys fallback means if you add a new package, you start from the previous cache instead of from scratch, shaving a minute or two off even a miss.
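Here is a rough Python model of that key scheme and of how a restore-key fallback picks a cache. It is an approximation written for illustration: `hash_files` follows the documented idea behind `hashFiles()` (a SHA-256 per file, then a SHA-256 over those) but takes file contents directly and is not guaranteed to produce GitHub's exact digest, and `restore_cache` ignores branch scoping and cache versioning.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def hash_files(files: Dict[str, bytes]) -> str:
    """Approximation of hashFiles(): SHA-256 of each file, then SHA-256 over those digests."""
    outer = hashlib.sha256()
    for name in sorted(files):
        outer.update(hashlib.sha256(files[name]).digest())
    return outer.hexdigest()


def pip_cache_key(runner_os: str, python_version: str, requirement_files: Dict[str, bytes]) -> str:
    """Same shape as the workflow key: <os>-pip-<python>-<hash of requirements files>."""
    return f"{runner_os}-pip-{python_version}-{hash_files(requirement_files)}"


def restore_cache(key: str, restore_keys: List[str], caches: List[Tuple[str, int]]) -> Optional[str]:
    """caches holds (key, created_at). Exact key first, then each prefix in order; newest match wins."""
    for cached_key, _created in caches:
        if cached_key == key:
            return cached_key
    for prefix in restore_keys:
        matches = [(created, cached_key) for cached_key, created in caches if cached_key.startswith(prefix)]
        if matches:
            return max(matches)[1]
    return None
```

Adding a package changes the hash, so the exact key misses, but the `Linux-pip-3.11-` prefix still finds the most recent 3.11 cache to start from.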

One thing I noticed when I first set this up: I was caching ~/.cache/pip but not seeing the improvement I expected. Turned out my requirements.txt wasn't pinned — it had things like fastapi>=0.100.0, so the hash was stable but pip was still reaching out to PyPI to check for newer versions on every run. After switching to pinned requirements generated with pip-compile from pip-tools, cache hits went from saving about 20 seconds to saving nearly three minutes per matrix leg. That compounds fast when you're running three Python versions on every PR.
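A quick way to catch the unpinned case before it costs you cache hits is to scan requirements files for anything that isn't an exact `==` pin. This is a rough, regex-based sketch of my own, not pip's requirement parser: it skips comments, option lines such as `-r`, and pip-compile's `--hash` continuation lines, and it will flag exotic forms like direct URL references as unpinned.

```python
import re

# Name, optional [extras], then exactly one "==" version; an environment marker may follow.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*==\s*[^\s;,]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned with a single ==."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, including pip-compile's "# via" lines
        line = line.rstrip("\\").strip()     # drop line-continuation backslashes
        if not line or line.startswith("-"):
            continue  # blank lines and options such as -r, -c, --hash
        # Reject anything without ==, and ranges like "pkg==1.0,<2" before the marker.
        if not PINNED.match(line) or "," in line.split(";")[0]:
            problems.append(line)
    return problems
```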

If you're on Poetry, the caching setup is different. You want to cache the Poetry virtualenv, and you need virtualenvs.in-project = true so the path is predictable:

- name: Cache Poetry virtualenv
  uses: actions/cache@v4
  with:
    # .venv is only used if `poetry config virtualenvs.in-project true` runs before `poetry install`
    path: .venv
    key: ${{ runner.os }}-venv-${{ matrix.python-version }}-${{ hashFiles('poetry.lock') }}

Using poetry.lock as your hash source is correct — it's the exact dependency snapshot, and it changes whenever anything in the tree changes, including transitive dependencies.

Secrets, Databases, and the Configuration That Actually Works

Managing secrets in GitHub Actions is well designed. You add them in Settings → Secrets and variables → Actions, reference them as ${{ secrets.MY_SECRET }}, and they're masked in logs. That part is fine.
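Masking works on values, not names: the runner looks for registered secret values in each log line and replaces them with `***`. The toy function below is my own illustration of that idea, not the runner's code. It also hints at the practical limit: a new sensitive value your job derives at runtime, such as a short-lived token fetched with a secret, isn't a registered secret, so a step has to register it by echoing `::add-mask::<value>` before printing anything that contains it.

```python
def mask(line: str, secrets: list[str]) -> str:
    """Replace every occurrence of each registered secret value with ***."""
    # Longest values first, so a secret that contains another one is masked whole.
    for value in sorted((s for s in secrets if s), key=len, reverse=True):
        line = line.replace(value, "***")
    return line
```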

Service dependencies are messier. If your test suite hits a real database (and it probably should for integration tests), you need to spin one up. GitHub Actions has first-class support for service containers on Linux runners; the services block sits under the job, next to runs-on and steps:

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: testdb
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

The health check options are critical. Without them, your test step starts before Postgres is actually ready to accept connections, and you get flaky tests you'll spend hours debugging — convinced it's a race condition in your own code. I learned this the slow way. Three days of "sometimes fails, can't reproduce locally" before I noticed the missing health check in the workflow file.
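The health check only gates the job on GitHub's side. When the same tests also run locally, I like an extra wait in the test setup. This is a sketch under my own assumptions: the `DATABASE_URL` variable name and its default are hypothetical, chosen to match the service block above, and a TCP connect is a weaker signal than `pg_isready`, since Postgres can accept connections on the port while still reporting that it is starting up.

```python
import os
import socket
import time
from typing import Optional
from urllib.parse import urlparse

# Hypothetical default matching the service block: user/password "postgres", database "testdb".
DEFAULT_DATABASE_URL = "postgresql://postgres:postgres@localhost:5432/testdb"


def database_url() -> str:
    # DATABASE_URL is an illustrative name; set it in the job's env if you use this.
    return os.environ.get("DATABASE_URL", DEFAULT_DATABASE_URL)


def wait_for_port(host: str, port: int, retries: int = 5, interval: float = 10.0, timeout: float = 5.0) -> bool:
    """Try a TCP connection up to `retries` times, sleeping `interval` seconds between attempts."""
    for attempt in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt < retries - 1:
                time.sleep(interval)
    return False


def wait_for_database(url: Optional[str] = None, retries: int = 5, interval: float = 10.0) -> bool:
    """Parse host and port out of the URL and wait until something is listening there."""
    parsed = urlparse(url or database_url())
    return wait_for_port(parsed.hostname or "localhost", parsed.port or 5432, retries=retries, interval=interval)
```

The retry and interval defaults mirror the `--health-retries 5` and `--health-interval 10s` options, so a local run gives Postgres roughly the same patience CI does.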

One gotcha with secrets: they're not passed to workflows triggered by pull requests from forks. This is a deliberate security decision; if fork PRs could access your secrets, a malicious contributor could exfiltrate them through a workflow change. For open-source projects where you want CI on fork PRs, you either have a maintainer push the reviewed branch to the base repository so the workflow runs there with secrets, or you structure your test suite so the basic tests don't need real credentials at all. We went with the latter: mock external services in unit tests, and use real services only in integration tests that run post-merge.
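That split needs nothing beyond the standard library's `unittest`. Unit tests talk to a `mock.Mock` instead of the real service, and integration tests skip themselves when the credential isn't in the environment, which is exactly the situation on a fork PR. The `charge` function and the `PAYMENTS_API_KEY` name are hypothetical stand-ins; with pytest, `pytest.mark.skipif` does the same job as `skipUnless`.

```python
import os
import unittest
from unittest import mock


def has_credentials(*names: str) -> bool:
    """True only if every named environment variable is set and non-empty."""
    return all(os.environ.get(name) for name in names)


def charge(gateway, amount_cents: int) -> str:
    """Hypothetical app code: ask a payment gateway client to charge an amount."""
    return gateway.charge(amount_cents)["status"]


class ChargeUnitTest(unittest.TestCase):
    def test_charge_with_mocked_gateway(self):
        # The external service is replaced by a Mock, so no secret is needed.
        gateway = mock.Mock()
        gateway.charge.return_value = {"status": "ok"}
        self.assertEqual(charge(gateway, 500), "ok")
        gateway.charge.assert_called_once_with(500)


class ChargeIntegrationTest(unittest.TestCase):
    # Evaluated at import time: on a fork PR the secret is absent, so this test is skipped.
    @unittest.skipUnless(has_credentials("PAYMENTS_API_KEY"), "needs real credentials")
    def test_live_charge(self):
        # A real gateway client built from PAYMENTS_API_KEY would be exercised here.
        pass
```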

The Mistake That Cost Me a Friday Afternoon

Right, so — I'm going to be honest about something embarrassing.

We had a deployment workflow set up. Push to main, tests pass, deploy to staging. I pushed at 4:30pm on a Friday because I was impatient to see the feature live. The tests passed, the deployment step started, and then it failed trying to verify the host key for our server. I'd set up strict host key checking (correct behavior), but two weeks earlier we'd migrated to a new server and forgotten to update the known_hosts secret.

Not catastrophic. Just: re-add the correct known_hosts value, commit, push, watch the workflow, wait another four minutes. By the time everything deployed it was past 6pm.
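What would have caught this earlier is comparing the key stored in the known_hosts secret against the new server's host key as part of the migration. The helper below (my own sketch) prints fingerprints in the `SHA256:...` form that `ssh-keygen -lf` uses, so you can compare its output by eye with `ssh-keygen -lf` run on the server's public host key. It handles plain and marker-prefixed lines but makes no attempt to decode hashed hostnames.

```python
import base64
import hashlib


def known_hosts_fingerprints(text: str) -> list[tuple[str, str, str]]:
    """Return (host, key type, SHA256 fingerprint) for each entry in a known_hosts blob."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0].startswith("@"):  # markers such as @cert-authority or @revoked
            fields = fields[1:]
        host, key_type, key_b64 = fields[:3]
        digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
        # OpenSSH shows SHA256 fingerprints as base64 with the trailing "=" padding removed.
        fingerprint = "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
        entries.append((host, key_type, fingerprint))
    return entries
```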

The lesson isn't "don't push on Fridays." It's that deployment workflows should be separate from test workflows, and you should have a workflow_dispatch trigger for manual runs:

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        # Empty on push runs; read it as ${{ inputs.environment || 'staging' }}
        description: 'Target environment'
        required: true
        default: 'staging'
        type: choice
        options: [staging, production]

workflow_dispatch is genuinely underused. It turns your workflow into something you can trigger manually from the GitHub UI, with optional inputs, against whichever branch you pick. That's useful for deployments, one-off data migrations, on-demand reports, anything where you want the CI machinery without waiting for a push.
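Manual runs aren't limited to the UI: `gh workflow run` and the REST API fire the same trigger, which helps when you want to script a deploy. Below is a standard-library sketch that builds the request for the REST endpoint that creates a workflow dispatch event. The owner, repo, and `deploy.yml` values are placeholders, and the token has to be allowed to trigger workflows on that repository.

```python
import json
import urllib.request


def dispatch_request(owner: str, repo: str, workflow_file: str, ref: str,
                     inputs: dict[str, str], token: str) -> urllib.request.Request:
    """Build (but don't send) a POST that starts a workflow_dispatch run on `ref`."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it; GitHub replies with 204 No Content when it accepts the dispatch.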

Reusable Workflows and Avoiding YAML Sprawl

Once you have more than two or three workflows, duplication becomes a real problem. The Python setup steps are the same, the caching logic is the same, the linting is the same. GitHub Actions has two mechanisms for this: composite actions and reusable workflows.

Composite actions conventionally live in your repo under .github/actions/, one directory per action with an action.yml inside, and a workflow references one with uses: ./.github/actions/<name> after the checkout step. A setup-python-env composite action that handles setup, caching, and dependency installation means your workflow files shrink from 50 lines to 15, and when you update the caching key strategy, you update it once. I'm not sure this scales cleanly beyond a team of ten working across multiple repos; at some point you probably want a dedicated shared-actions repository. For our five-person monorepo, keeping composite actions in the same repo works fine.

Reusable workflows are different: they let you call an entire .yml file as a job from another workflow. The called file needs an explicit workflow_call trigger, and reusable workflows are particularly useful when three different triggering workflows all need to run the same integration tests. You reference one with uses: at the job level rather than inside steps: uses: ./.github/workflows/integration-tests.yml for same-repo calls, or uses: org/repo/.github/workflows/file.yml@main for cross-repo.
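Those two `uses:` forms follow a fixed shape: local references start with `./` and carry no ref, while cross-repo references are `owner/repo/path@ref`, with the file sitting directly in `.github/workflows/`. A small parser makes the rules explicit; this is my own sketch, for example as the core of a script that audits which shared workflows each repo points at.

```python
from typing import NamedTuple, Optional


class WorkflowRef(NamedTuple):
    local: bool
    repo: Optional[str]  # "owner/repo" for cross-repo calls, None for local ones
    path: str            # path of the workflow file inside the repository
    ref: Optional[str]   # branch, tag, or SHA after "@"


def parse_uses(value: str) -> WorkflowRef:
    """Split a job-level `uses:` value for a reusable workflow into its parts."""
    if value.startswith("./"):
        return WorkflowRef(True, None, value[2:], None)
    target, sep, ref = value.partition("@")
    if not sep or not ref:
        raise ValueError(f"cross-repo reusable workflow needs an @ref: {value!r}")
    owner, repo, path = target.split("/", 2)
    if not path.startswith(".github/workflows/"):
        raise ValueError(f"reusable workflows live in .github/workflows/: {value!r}")
    return WorkflowRef(False, f"{owner}/{repo}", path, ref)
```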

Honestly, the specific pattern matters less than being consistent. Pick one approach before you have five slightly different linting configurations across five workflow files.
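Before picking an approach, it helps to see how much duplication you already have. A crude audit is to list step names that appear in more than one workflow file; repeated "Set up Python" or "Lint with ruff" steps are the first candidates for a composite action. This is my own regex-based sketch rather than a YAML parser, so treat its output as a hint.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Matches list items like "- name: Lint with ruff" and captures the name.
STEP_NAME = re.compile(r"^\s*-\s*name:\s*(.+?)\s*$")


def load_workflows(workflow_dir: Path = Path(".github/workflows")) -> Dict[str, str]:
    """Read every .yml/.yaml file in the workflows directory into {filename: text}."""
    return {p.name: p.read_text() for p in sorted(workflow_dir.glob("*.y*ml"))}


def shared_step_names(workflows: Dict[str, str], min_files: int = 2) -> Dict[str, List[str]]:
    """Map each step name to the files that use it, keeping names found in >= min_files files."""
    seen = defaultdict(set)
    for filename, text in workflows.items():
        for line in text.splitlines():
            match = STEP_NAME.match(line)
            if match:
                seen[match.group(1)].add(filename)
    return {name: sorted(files) for name, files in seen.items() if len(files) >= min_files}
```

Run from the repo root, `shared_step_names(load_workflows())` gives a quick picture of what to extract first.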

What I'd Actually Recommend

Starting from scratch? Use the matrix workflow above, pin your Ubuntu version, set up dependency caching with pinned requirements, and add service containers for whatever databases your tests need. That covers the majority of what you'll need day-to-day.

On dependency management: pip-compile with pinned requirements or Poetry with a lockfile. Don't mix strategies mid-project. The reproducibility and caching benefits compound.

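Deployments should be in a separate workflow from tests. Use workflow_run to trigger deployment after the test workflow completes, and check that it actually succeeded: workflow_run fires on completion regardless of outcome, so without a condition such as github.event.workflow_run.conclusion == 'success', a failed test run still kicks off a deploy. Use workflow_dispatch for any case where you want manual control over when something runs.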
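A deploy script can also gate itself by reading the JSON event payload from the file named by `GITHUB_EVENT_PATH` and checking the upstream run's `conclusion`. This is a sketch; restricting deploys to the `main` branch via `head_branch` is my own addition, not something the workflow above requires.

```python
import json
import os
from pathlib import Path


def load_event() -> dict:
    """Read the payload of the event that started this job."""
    return json.loads(Path(os.environ["GITHUB_EVENT_PATH"]).read_text())


def should_deploy(event: dict, branch: str = "main") -> bool:
    """Deploy only if the upstream workflow run succeeded on the expected branch."""
    run = event.get("workflow_run") or {}
    return run.get("conclusion") == "success" and run.get("head_branch") == branch
```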

Self-hosted runners vs GitHub-hosted: GitHub-hosted is good enough for almost all Python projects. Even the standard Linux runner for private repos (2 cores, 7 GB of RAM) handles pytest suites with thousands of tests without breaking a sweat, and public repos get larger machines. If you're doing GPU training or heavy native compilation, then yes, self-hosted makes sense. Otherwise the operational overhead isn't worth it: you'd spend more time maintaining runner infrastructure than you'd save in CI minutes. That's the same trap we were in with Jenkins, just with shinier branding.

The best thing about GitHub Actions, honestly, is that it's just YAML with predictable behavior. When something breaks, I can usually figure out why within five minutes of reading the logs. That's more than I could say for the Jenkins setup we came from, where debugging often meant SSHing into a server and praying.