Haji Rufai

Posted on May 26

Building an Intelligent CI/CD Pipeline Generator in Python

#github #cicd #devops #python

Every developer has been there: starting a new project and spending an hour configuring CI/CD. Copy-pasting YAML from Stack Overflow, tweaking caching strategies, setting up matrix testing, adding security scanning... it's tedious and error-prone.

What if a tool could analyze your codebase and generate production-ready pipeline configs automatically?

That's exactly what I built with PipeForge — an intelligent CI/CD pipeline generator that supports GitHub Actions, GitLab CI, and Docker.

🔗 GitHub Repository

The Problem

Setting up proper CI/CD involves dozens of decisions:

Which Python versions to test against?
How to cache dependencies efficiently?
Should you add security scanning?
What about multi-stage Docker builds?
How to configure database service containers?

Most developers either copy a basic config and miss best practices, or spend hours crafting the perfect pipeline. PipeForge automates this.

Architecture Overview

┌─────────────────┐     ┌──────────────────┐     ┌────────────────┐
│  Project Dir    │────▶│    Analyzer       │────▶│  Generators    │
│  (your code)    │     │  (detection)      │     │  (output)      │
└─────────────────┘     └──────────────────┘     └────────────────┘
                                                        │
                              ┌──────────────────────────┼───────────┐
                              │                          │           │
                        ┌─────▼─────┐  ┌─────────▼──┐  ┌▼─────────┐
                        │  GitHub   │  │  GitLab    │  │  Docker  │
                        │  Actions  │  │  CI        │  │          │
                        └───────────┘  └────────────┘  └──────────┘

The design follows an Analyzer-Generator pattern: analysis and generation are completely decoupled. You can add new generators (CircleCI, Jenkins, etc.) without touching the analyzer.
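To make the decoupling concrete, here is a minimal sketch of how a generator registry could look. The names `Analysis`, `GENERATORS`, `register`, and `generate` are hypothetical and are not taken from the PipeForge source; the point is only that a new platform is one more registered function and the analyzer never changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Analysis:
    """Hypothetical stand-in for whatever the analyzer returns."""
    project_name: str
    languages: List[str] = field(default_factory=list)


# A generator is any callable that maps an Analysis to {relative_path: content}.
Generator = Callable[[Analysis], Dict[str, str]]
GENERATORS: Dict[str, Generator] = {}


def register(platform: str) -> Callable[[Generator], Generator]:
    """Decorator that adds a generator to the registry under a platform name."""
    def decorator(func: Generator) -> Generator:
        GENERATORS[platform] = func
        return func
    return decorator


@register("github_actions")
def github_actions(analysis: Analysis) -> Dict[str, str]:
    return {".github/workflows/ci.yml": f"name: CI for {analysis.project_name}\n"}


@register("circleci")  # adding a platform does not touch the analyzer
def circleci(analysis: Analysis) -> Dict[str, str]:
    return {".circleci/config.yml": "version: 2.1\n"}


def generate(analysis: Analysis, platforms: List[str]) -> Dict[str, str]:
    """Run the requested generators and merge their output files."""
    files: Dict[str, str] = {}
    for platform in platforms:
        files.update(GENERATORS[platform](analysis))
    return files
```

Calling `generate(Analysis("demo"), ["github_actions", "circleci"])` would return one entry per platform, keyed by the file path each generator wants to write.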

Smart Project Analysis

The analyzer walks your project directory and detects:

Category	What's Detected
Languages	Python, JavaScript/TypeScript, Go, Rust, Java
Frameworks	FastAPI, Django, Flask, Express, Next.js, Gin, Spring, Actix
Package Managers	pip, Poetry, npm, Yarn, pnpm, Cargo, Go modules, Maven, Gradle
Test Runners	pytest, Jest, Vitest, Mocha, go test, cargo test, JUnit
Linters	Ruff, Black, ESLint, Prettier, golangci-lint, Clippy
Databases	PostgreSQL, MySQL, SQLite, MongoDB, Redis

Here's the core detection logic for languages:

import os
from collections import Counter
from pathlib import Path

# Language, LanguageInfo and ProjectAnalysis are PipeForge's own model types (not shown here).
EXTENSION_MAP = {
    ".py": Language.PYTHON,
    ".js": Language.JAVASCRIPT,
    ".ts": Language.TYPESCRIPT,
    ".go": Language.GO,
    ".rs": Language.RUST,
    ".java": Language.JAVA,
}

# Directories that only add noise to the counts (the full list may be longer)
SKIP_DIRS = {".git", "node_modules", "__pycache__"}

def analyze_project(project_path: str) -> ProjectAnalysis:
    root = Path(project_path).resolve()
    analysis = ProjectAnalysis(project_name=root.name, project_path=str(root))
    lang_counts = Counter()  # files seen per language

    # Walk directory, pruning skipped directories in place so os.walk never enters them
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fname in filenames:
            ext = Path(fname).suffix.lower()
            if ext in EXTENSION_MAP:
                lang_counts[EXTENSION_MAP[ext]] += 1

    # Primary language = most files
    for i, (lang, count) in enumerate(lang_counts.most_common()):
        analysis.languages.append(LanguageInfo(
            language=lang, file_count=count, is_primary=(i == 0)
        ))

    return analysis

Framework detection goes deeper — it reads file contents:

# Python: check actual imports
py_content = _read_sample_files(root, "*.py", max_files=20)
if any("from fastapi" in c for c in py_content):
    frameworks.append(Framework.FASTAPI)

# Node: check package.json dependencies (skipped when the file is absent)
pkg_path = root / "package.json"
if pkg_path.is_file():
    pkg = json.loads(pkg_path.read_text())
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    if "next" in deps:
        frameworks.append(Framework.NEXTJS)
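The snippet above relies on a helper that samples file contents. Its real implementation is not shown in this post, so the following is only a sketch of what such a helper might do, under the assumption that it returns a list of file contents and skips the same noise directories as the language walk. The function name and `SKIP_DIRS` values here are illustrative.

```python
from pathlib import Path
from typing import List

# Assumed skip list, mirroring the directories mentioned for the language walk.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def read_sample_files(root: Path, pattern: str, max_files: int = 20) -> List[str]:
    """Return the text of up to max_files files under root matching pattern."""
    contents: List[str] = []
    for path in sorted(root.rglob(pattern)):
        if len(contents) >= max_files:
            break
        # Ignore anything that lives inside a skipped directory.
        if not path.is_file() or SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            contents.append(path.read_text(encoding="utf-8", errors="ignore"))
        except OSError:
            continue  # unreadable file: skip it rather than fail the analysis
    return contents
```

Capping the sample keeps analysis fast on large repositories, at the cost of possibly missing a framework import that only appears in files beyond the cap.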

GitHub Actions Generator

The generator builds optimized workflows with best practices baked in. Here's what a Python project gets:

Dependency Caching:

- name: Cache pip
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: ${{ runner.os }}-pip-${{ matrix.python-version }}-

Matrix Testing across Python 3.11 and 3.12 by default.
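As a rough illustration of how the matrix and cache step could be assembled before serialization, the sketch below builds the job as a plain dictionary. `build_python_job` is a hypothetical name, and the `ubuntu-latest` runner and `actions/setup-python@v5` step are assumptions that do not appear in the post; only the cache step mirrors the YAML above.

```python
from typing import Any, Dict, List


def build_python_job(versions: List[str]) -> Dict[str, Any]:
    """Build a test job dict with a version matrix and a pip cache step."""
    key_prefix = "${{ runner.os }}-pip-${{ matrix.python-version }}-"
    return {
        "runs-on": "ubuntu-latest",  # assumed runner
        "strategy": {"matrix": {"python-version": versions}},
        "steps": [
            {"uses": "actions/checkout@v4"},
            {
                "uses": "actions/setup-python@v5",  # assumed setup step
                "with": {"python-version": "${{ matrix.python-version }}"},
            },
            {
                "name": "Cache pip",
                "uses": "actions/cache@v4",
                "with": {
                    "path": "~/.cache/pip",
                    # Exact key includes the requirements hash ...
                    "key": key_prefix + "${{ hashFiles('**/requirements*.txt') }}",
                    # ... and the prefix alone is the fallback for partial hits.
                    "restore-keys": key_prefix,
                },
            },
        ],
    }


job = build_python_job(["3.11", "3.12"])
```

Because `restore-keys` is a prefix of `key`, a changed requirements file still restores the most recent cache for the same OS and Python version instead of starting cold.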

Database Services — if PostgreSQL is detected, it automatically adds:

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432
    options: --health-cmd pg_isready --health-interval 10s
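The conditional nature of that block can be sketched as a small function that maps detected databases to service definitions. The function name and the `"postgresql"` label are assumptions for illustration; the values simply mirror the YAML above. The resulting dictionary could then be handed to a YAML serializer.

```python
from typing import Any, Dict, Iterable


def build_services(databases: Iterable[str]) -> Dict[str, Any]:
    """Return GitHub Actions service containers for the detected databases."""
    detected = set(databases)
    services: Dict[str, Any] = {}
    if "postgresql" in detected:
        services["postgres"] = {
            "image": "postgres:16",
            "env": {"POSTGRES_USER": "test", "POSTGRES_PASSWORD": "test"},
            "ports": ["5432:5432"],
            # Wait for the database to accept connections before tests start.
            "options": "--health-cmd pg_isready --health-interval 10s",
        }
    return services


def attach_services(job: Dict[str, Any], databases: Iterable[str]) -> Dict[str, Any]:
    """Add a services key only when at least one database was detected."""
    services = build_services(databases)
    if services:
        job["services"] = services
    return job
```

Omitting the `services` key entirely when nothing is detected keeps generated workflows for database-free projects free of empty sections.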

CodeQL Security Scanning is included by default. CodeQL is GitHub's static analysis engine; it is free for public repositories and flags many common vulnerability classes before they reach production.

Docker Generation with Best Practices

PipeForge generates Dockerfiles following production best practices:

Multi-stage builds — separate build and runtime stages to minimize image size
Non-root user — security best practice, runs as appuser
Health checks — built-in container health monitoring
Layer caching — copies dependency files first for better cache utilization

# Stage 1: Build
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Production
FROM python:3.12-slim AS production
WORKDIR /app
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --from=builder /install /usr/local
COPY . .
ENV PYTHONDONTWRITEBYTECODE=1
USER appuser
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

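For Go projects, PipeForge uses Google's distroless images for the final stage. The static variant ships no shell or package manager, so the runtime image contains little beyond the compiled binary (the builder stage is omitted here):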

FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Config Validation

PipeForge can also validate existing configs — great for catching issues before pushing:

$ pipeforge validate .github/workflows/ci.yml
GitHub Actions validation: ✅ VALID

$ pipeforge validate Dockerfile
Dockerfile validation: ✅ VALID
┌──────────┬──────┬───────────────────────────────────────────┐
│ Severity │ Line │ Message                                   │
├──────────┼──────┼───────────────────────────────────────────┤
│ INFO     │ -    │ No HEALTHCHECK — consider adding one      │
└──────────┴──────┴───────────────────────────────────────────┘

The validator catches:

Missing required fields (on, jobs, steps, runs-on)
Invalid stage references in GitLab CI
Unpinned action versions (using @main instead of @v4)
Missing FROM instructions, :latest tags, missing USER directives
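Two of these checks are simple enough to sketch with the standard library. The code below is not PipeForge's validator; the function names, the set of floating refs, and the message strings are assumptions, and the Dockerfile parsing is deliberately naive (it treats every non-comment line as an instruction and does not join continuation lines).

```python
import re
from typing import List

USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)", re.MULTILINE)
FLOATING_REFS = {"main", "master", "latest"}  # assumed list of moving refs


def find_unpinned_actions(workflow_text: str) -> List[str]:
    """Return `uses:` references that have no version or point at a branch."""
    problems: List[str] = []
    for raw in USES_RE.findall(workflow_text):
        ref = raw.strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images carry no @ref
        _, _, version = ref.partition("@")
        if not version or version in FLOATING_REFS:
            problems.append(ref)
    return problems


def check_dockerfile(text: str) -> List[str]:
    """Report a missing FROM, :latest base images, and a missing USER."""
    instructions = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    messages: List[str] = []
    froms = [parts for parts in instructions if parts[0].upper() == "FROM"]
    if not froms:
        messages.append("Missing FROM instruction")
    for parts in froms:
        # Skip flags such as --platform to find the image name.
        image = next((p for p in parts[1:] if not p.startswith("--")), "")
        if image.endswith(":latest"):
            messages.append(f"Image {image} uses the :latest tag")
    if not any(parts[0].upper() == "USER" for parts in instructions):
        messages.append("No USER directive")
    return messages
```

A production validator would parse the YAML and Dockerfile properly and attach line numbers and severities, as in the table output above; the regex approach is only meant to show the shape of the checks.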

The CLI

Built with Click and Rich, the CLI is intuitive:

# Analyze a project
pipeforge analyze /path/to/project

# Generate configs for all platforms
pipeforge generate . -p github_actions -p gitlab_ci -p docker

# Dry run (preview without writing)
pipeforge generate . --dry-run

# Include deployment
pipeforge generate . --deploy --deploy-provider vercel

# Get JSON output for scripting
pipeforge inspect .

Testing Strategy

116 tests across 6 test modules cover every detection and generation path:

tests/
├── test_analyzer.py        # 45 tests — language, framework, PM, linter, DB detection
├── test_github_actions.py  # 14 tests — workflow generation for all languages
├── test_gitlab_ci.py       #  9 tests — GitLab CI pipeline generation
├── test_docker.py          # 10 tests — Dockerfile, .dockerignore, compose
├── test_validator.py       # 22 tests — YAML, GitHub Actions, GitLab, Dockerfile
└── test_cli.py             # 16 tests — CLI commands and integration

The key insight: use tmp_path fixtures that create realistic project structures:

import pytest

@pytest.fixture
def python_project(tmp_path):
    (tmp_path / "requirements.txt").write_text("fastapi>=0.100\npytest>=7.0")
    (tmp_path / "main.py").write_text("from fastapi import FastAPI\napp = FastAPI()")
    # The tests/ directory must exist before a file can be written into it
    (tmp_path / "tests").mkdir()
    (tmp_path / "tests" / "test_app.py").write_text("def test_health(): assert True")
    return tmp_path

What I Learned

PyYAML parses a bare on key as boolean True: PyYAML implements YAML 1.1, where an unquoted on is a boolean, but GitHub Actions uses it as a key. After loading a workflow, you need to handle both "on" and True as keys.
Template pattern beats string concatenation: I started with f-strings but moved to a structured approach. For complex YAML generation, building dictionaries and serializing them is cleaner.
Detection is harder than generation: Reliably detecting frameworks requires reading actual file contents, not just checking file names. A requirements.txt that lists flask doesn't prove Flask is used, but from flask import Flask in code does.
Defaults matter more than features: The tool is most useful when its defaults are excellent. Every generated config should work out of the box without tweaking.
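The first lesson is easy to handle once you know about it. Below is a minimal sketch of a lookup that accepts either key; `get_triggers` is a hypothetical helper name, and the dictionaries stand in for what a YAML 1.1 loader would return for a workflow file.

```python
from typing import Any, Dict


def get_triggers(workflow: Dict[Any, Any]) -> Any:
    """Return the workflow's trigger section, whichever key the loader produced."""
    if "on" in workflow:      # key was quoted, or the loader follows YAML 1.2
        return workflow["on"]
    if True in workflow:      # bare `on:` loaded by a YAML 1.1 parser
        return workflow[True]
    return None               # no trigger section at all


# What a YAML 1.1 loader yields for an unquoted `on:` key:
loaded = {True: {"push": {"branches": ["main"]}}, "jobs": {}}
triggers = get_triggers(loaded)
```

Routing every access through one helper keeps the rest of the validator unaware of the quirk. When generating YAML, the reverse problem applies: the key has to be emitted so that it reads as `on` in the output rather than as a boolean.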

Tech Stack

Component	Technology
Language	Python 3.12
CLI	Click + Rich
Templates	Jinja2
Config	PyYAML
Testing	pytest (116 tests)
CI	GitHub Actions

Next Steps

Add CircleCI and Jenkins generators
Template customization via .pipeforge.yml config
GitHub Action that runs PipeForge as a PR check
Plugin system for custom generators

PipeForge is open source — check it out at github.com/hajirufai/pipeforge. Give it a ⭐ if it helps you skip the CI/CD setup tax!
