DEV Community

Haji Rufai

Building an Intelligent CI/CD Pipeline Generator in Python

Every developer has been there: starting a new project and spending an hour configuring CI/CD. Copy-pasting YAML from Stack Overflow, tweaking caching strategies, setting up matrix testing, adding security scanning... it's tedious and error-prone.

What if a tool could analyze your codebase and generate production-ready pipeline configs automatically?

That's exactly what I built with PipeForge, an intelligent CI/CD pipeline generator that supports GitHub Actions, GitLab CI, and Docker.

🔗 GitHub Repository

The Problem

Setting up proper CI/CD involves dozens of decisions:

  • Which Python versions to test against?
  • How to cache dependencies efficiently?
  • Should you add security scanning?
  • What about multi-stage Docker builds?
  • How to configure database service containers?

Most developers either copy a basic config and miss best practices, or spend hours crafting the perfect pipeline. PipeForge automates this.

Architecture Overview

┌─────────────────┐     ┌──────────────────┐     ┌────────────────┐
│  Project Dir    │────▶│    Analyzer      │────▶│  Generators    │
│  (your code)    │     │  (detection)     │     │  (output)      │
└─────────────────┘     └──────────────────┘     └───────┬────────┘
                                                         │
                                         ┌───────────────┼──────────────┐
                                         │               │              │
                                   ┌─────▼─────┐  ┌──────▼─────┐  ┌─────▼────┐
                                   │  GitHub   │  │  GitLab    │  │  Docker  │
                                   │  Actions  │  │  CI        │  │          │
                                   └───────────┘  └────────────┘  └──────────┘

The design follows an Analyzer-Generator pattern: analysis and generation are completely decoupled. You can add new generators (CircleCI, Jenkins, etc.) without touching the analyzer.
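
To make the decoupling concrete, here is a minimal sketch of how such a split could look. It is an illustration, not PipeForge's actual code: the `Generator` protocol, the `GENERATORS` registry, `generate_all`, and the simplified `ProjectAnalysis` stand-in are all assumed names.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ProjectAnalysis:
    """Simplified stand-in for the analyzer's result object (assumed shape)."""
    project_name: str
    languages: list[str] = field(default_factory=list)


class Generator(Protocol):
    """Anything that turns an analysis into {relative_path: file_content}."""

    def generate(self, analysis: ProjectAnalysis) -> dict[str, str]: ...


class CircleCIGenerator:
    """Hypothetical new generator; the analyzer needs no changes to support it."""

    def generate(self, analysis: ProjectAnalysis) -> dict[str, str]:
        body = f"# CircleCI config for {analysis.project_name}\nversion: 2.1\n"
        return {".circleci/config.yml": body}


# Registry keyed by platform name (the names are illustrative).
GENERATORS: dict[str, Generator] = {"circleci": CircleCIGenerator()}


def generate_all(analysis: ProjectAnalysis, platforms: list[str]) -> dict[str, str]:
    """Run each requested generator on the same analysis and merge the outputs."""
    files: dict[str, str] = {}
    for name in platforms:
        files.update(GENERATORS[name].generate(analysis))
    return files
```

In this sketch, adding a platform means writing one class and registering it; the generators only ever see the analysis object, never the file system walk that produced it.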

Smart Project Analysis

The analyzer walks your project directory and detects:

Category         | What's Detected
-----------------|--------------------------------------------------------------
Languages        | Python, JavaScript/TypeScript, Go, Rust, Java
Frameworks       | FastAPI, Django, Flask, Express, Next.js, Gin, Spring, Actix
Package Managers | pip, Poetry, npm, Yarn, pnpm, Cargo, Go modules, Maven, Gradle
Test Runners     | pytest, Jest, Vitest, Mocha, go test, cargo test, JUnit
Linters          | Ruff, Black, ESLint, Prettier, golangci-lint, Clippy
Databases        | PostgreSQL, MySQL, SQLite, MongoDB, Redis

Here's the core detection logic for languages:

import os
from collections import Counter
from pathlib import Path

# Language, LanguageInfo, ProjectAnalysis and SKIP_DIRS are defined elsewhere in PipeForge.
EXTENSION_MAP = {
    ".py": Language.PYTHON,
    ".js": Language.JAVASCRIPT,
    ".ts": Language.TYPESCRIPT,
    ".go": Language.GO,
    ".rs": Language.RUST,
    ".java": Language.JAVA,
}

def analyze_project(project_path: str) -> ProjectAnalysis:
    root = Path(project_path).resolve()
    analysis = ProjectAnalysis(project_name=root.name, project_path=str(root))
    lang_counts = Counter()  # number of source files seen per language

    # Walk the directory, pruning noise (.git, node_modules, __pycache__) in place
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for fname in filenames:
            ext = Path(fname).suffix.lower()
            if ext in EXTENSION_MAP:
                lang_counts[EXTENSION_MAP[ext]] += 1

    # Primary language = most files
    for i, (lang, count) in enumerate(lang_counts.most_common()):
        analysis.languages.append(LanguageInfo(
            language=lang, file_count=count, is_primary=(i == 0)
        ))

    return analysis

Framework detection goes deeper by reading file contents:

# Python: check actual imports
py_content = _read_sample_files(root, "*.py", max_files=20)
if any("from fastapi" in c for c in py_content):
    frameworks.append(Framework.FASTAPI)

# Node: check package.json dependencies (only when the file exists)
pkg_path = root / "package.json"
if pkg_path.exists():
    pkg = json.loads(pkg_path.read_text())
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    if "next" in deps:
        frameworks.append(Framework.NEXTJS)
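
The snippet above relies on a helper that samples file contents. Its body isn't shown in this post, so the following is only a guess at what such a helper might do: the function name `read_sample_files`, the contents of `SKIP_DIRS`, and the choice to ignore decoding errors are all assumptions.

```python
from pathlib import Path

# Directories assumed to be noise; the real SKIP_DIRS may differ.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def read_sample_files(root: Path, pattern: str, max_files: int = 20) -> list[str]:
    """Return the text of up to max_files files under root that match pattern."""
    contents: list[str] = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        # Ignore matches that live inside a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            contents.append(path.read_text(encoding="utf-8", errors="ignore"))
        except OSError:
            continue  # unreadable file: skip it rather than fail the analysis
        if len(contents) >= max_files:
            break
    return contents
```

Capping the sample keeps analysis fast on large repositories, at the cost of possibly missing an import that only appears in a file outside the sample.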

GitHub Actions Generator

The generator builds optimized workflows with best practices baked in. Here's what a Python project gets:

Dependency Caching:

- name: Cache pip
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ matrix.python-version }}-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: ${{ runner.os }}-pip-${{ matrix.python-version }}-

Matrix Testing: the test job runs on Python 3.11 and 3.12 by default.

Database Services: if PostgreSQL is detected, the generator automatically adds:

services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432
    options: --health-cmd pg_isready --health-interval 10s
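
One way to wire detection results into the workflow is to assemble the job as a plain dictionary and attach the service block only when the database was found. The sketch below assumes a `build_test_job` function and a `databases` set of lowercase names; neither is taken from PipeForge's source.

```python
from typing import Any

POSTGRES_SERVICE: dict[str, Any] = {
    "image": "postgres:16",
    "env": {"POSTGRES_USER": "test", "POSTGRES_PASSWORD": "test"},
    "ports": ["5432:5432"],
    "options": "--health-cmd pg_isready --health-interval 10s",
}


def build_test_job(databases: set[str], python_versions=("3.11", "3.12")) -> dict[str, Any]:
    """Assemble a test job as a dict, adding services only when detected."""
    job: dict[str, Any] = {
        "runs-on": "ubuntu-latest",
        "strategy": {"matrix": {"python-version": list(python_versions)}},
        "steps": [
            {"uses": "actions/checkout@v4"},
            {
                "uses": "actions/setup-python@v5",
                "with": {"python-version": "${{ matrix.python-version }}"},
            },
            {"run": "pip install -r requirements.txt"},
            {"run": "pytest"},
        ],
    }
    if "postgresql" in databases:
        # Copy so callers can modify the job without touching the shared constant.
        job["services"] = {"postgres": dict(POSTGRES_SERVICE)}
    return job
```

The resulting dictionary could then be serialized with a YAML library, for example PyYAML's `yaml.safe_dump(job, sort_keys=False)`, to produce a block like the one shown above.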

CodeQL Security Scanning is included by default. CodeQL is GitHub's static analysis tool, free for public repositories, and it catches security vulnerabilities before they reach production.

Docker Generation with Best Practices

PipeForge generates Dockerfiles following production best practices:

  1. Multi-stage builds: separate build and runtime stages to minimize image size
  2. Non-root user: a security best practice; the container runs as appuser
  3. Health checks: built-in container health monitoring
  4. Layer caching: copies dependency files first for better cache utilization

# Stage 1: Build
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Production
FROM python:3.12-slim AS production
WORKDIR /app
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --from=builder /install /usr/local
COPY . .
ENV PYTHONDONTWRITEBYTECODE=1
USER appuser
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

For Go projects, PipeForge uses Google's distroless images, which ship without a shell or package manager:

FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Config Validation

PipeForge can also validate existing configs, which is useful for catching issues before pushing:

$ pipeforge validate .github/workflows/ci.yml
GitHub Actions validation: ✅ VALID

$ pipeforge validate Dockerfile
Dockerfile validation: ✅ VALID
┌──────────┬──────┬───────────────────────────────────────────┐
│ Severity │ Line │ Message                                   │
├──────────┼──────┼───────────────────────────────────────────┤
│ INFO     │ -    │ No HEALTHCHECK - consider adding one      │
└──────────┴──────┴───────────────────────────────────────────┘
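
Findings like the INFO row above can come from simple line-based checks. The following is a rough sketch rather than PipeForge's validator: the `Finding` class, the `lint_dockerfile` function, and the severity assigned to each check are assumptions. It also ignores FROM flags such as `--platform` and does not treat line continuations specially.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str  # "ERROR", "WARNING" or "INFO" (levels assumed for this sketch)
    message: str


def lint_dockerfile(text: str) -> list[Finding]:
    """Run a few line-based checks on Dockerfile text."""
    # Keep non-empty, non-comment lines and split off the instruction keyword.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    instructions = [ln.split(maxsplit=1) for ln in lines]
    keywords = [parts[0].upper() for parts in instructions]

    findings: list[Finding] = []
    if "FROM" not in keywords:
        findings.append(Finding("ERROR", "Missing FROM instruction"))
    for parts in instructions:
        if parts[0].upper() == "FROM" and len(parts) > 1:
            image = parts[1].split()[0]
            if image.endswith(":latest"):
                findings.append(Finding("WARNING", f"{image} uses the :latest tag"))
    if "USER" not in keywords:
        findings.append(Finding("WARNING", "No USER directive; container may run as root"))
    if "HEALTHCHECK" not in keywords:
        findings.append(Finding("INFO", "No HEALTHCHECK; consider adding one"))
    return findings
```

Calling `lint_dockerfile("FROM python:latest\nCMD [\"python\"]\n")` yields two warnings and one INFO finding, while a file with a pinned tag, a USER, and a HEALTHCHECK yields none.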

The validator catches:

  • Missing required fields (on, jobs, steps, runs-on)
  • Invalid stage references in GitLab CI
  • Unpinned action versions (using @main instead of @v4)
  • Missing FROM instructions, :latest tags, missing USER directives
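
As an illustration of the workflow checks in this list, here is a small sketch that works on an already-parsed workflow mapping. The function name `check_workflow`, the message wording, and the rule for what counts as "pinned" (a `v`-prefixed version tag or a 40-character commit SHA) are assumptions, and the top-level `on` check is left out for brevity.

```python
import re

# A ref counts as pinned if it is a version tag (v4, v4.1.0) or a full commit SHA.
PINNED_REF = re.compile(r"^(v\d+(\.\d+)*|[0-9a-f]{40})$")


def check_workflow(workflow: dict) -> list[str]:
    """Return human-readable problems found in a parsed workflow mapping."""
    problems: list[str] = []
    jobs = workflow.get("jobs")
    if not isinstance(jobs, dict) or not jobs:
        return ["missing required field: jobs"]

    for job_id, job in jobs.items():
        for required in ("runs-on", "steps"):
            if required not in job:
                problems.append(f"{job_id}: missing required field: {required}")
        for step in job.get("steps", []):
            uses = step.get("uses", "")
            if not uses or uses.startswith("./") or uses.startswith("docker://"):
                continue  # run steps, local actions and Docker images are skipped
            _, _, ref = uses.partition("@")
            if not PINNED_REF.match(ref):
                problems.append(f"{job_id}: unpinned action: {uses}")
    return problems
```

A step such as `uses: actions/checkout@main` is reported, while `actions/setup-python@v5` passes. Jobs that call reusable workflows have no `steps` of their own, so a real validator would need to special-case them.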

The CLI

Built with Click and Rich, the CLI is intuitive:

# Analyze a project
pipeforge analyze /path/to/project

# Generate configs for all platforms
pipeforge generate . -p github_actions -p gitlab_ci -p docker

# Dry run (preview without writing)
pipeforge generate . --dry-run

# Include deployment
pipeforge generate . --deploy --deploy-provider vercel

# Get JSON output for scripting
pipeforge inspect .

Testing Strategy

116 tests across 6 test modules cover every detection and generation path:

tests/
├── test_analyzer.py        # 45 tests: language, framework, PM, linter, DB detection
├── test_github_actions.py  # 14 tests: workflow generation for all languages
├── test_gitlab_ci.py       #  9 tests: GitLab CI pipeline generation
├── test_docker.py          # 10 tests: Dockerfile, .dockerignore, compose
├── test_validator.py       # 22 tests: YAML, GitHub Actions, GitLab, Dockerfile
└── test_cli.py             # 16 tests: CLI commands and integration

The key insight: use tmp_path fixtures that create realistic project structures:

import pytest

@pytest.fixture
def python_project(tmp_path):
    (tmp_path / "requirements.txt").write_text("fastapi>=0.100\npytest>=7.0")
    (tmp_path / "main.py").write_text("from fastapi import FastAPI\napp = FastAPI()")
    tests_dir = tmp_path / "tests"
    tests_dir.mkdir()  # write_text() does not create missing parent directories
    (tests_dir / "test_app.py").write_text("def test_health(): assert True")
    return tmp_path

What I Learned

  1. PyYAML parses on: as boolean True. PyYAML follows YAML 1.1, where a bare on is a boolean, but GitHub Actions uses it as a key. You need to handle both "on" and True as keys.

  2. Template pattern beats string concatenation. I started with f-strings but moved to a structured approach. For complex YAML generation, building dictionaries and serializing them is cleaner.

  3. Detection is harder than generation. Reliably detecting frameworks requires reading actual file contents, not just checking file names. A requirements.txt with flask doesn't mean Flask is used, but from flask import Flask in code does.

  4. Defaults matter more than features. The tool is most useful when its defaults are excellent. Every generated config should work out-of-the-box without tweaking.
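
The first lesson is easy to trip over, so here is a small sketch of a lookup that tolerates both key forms. The helper name `get_triggers` is made up for this example, and the two dictionaries are written by hand to mirror what a YAML 1.1 loader such as PyYAML returns; no YAML library is imported here.

```python
from typing import Any


def get_triggers(workflow: dict[Any, Any]) -> Any:
    """Return the trigger section whether its key parsed as "on" or as True."""
    for key in ("on", True):
        if key in workflow:
            return workflow[key]
    return None


# Shape a YAML 1.1 loader produces for a workflow that starts with `on: push`
parsed_bare_key = {True: "push", "jobs": {}}
# Shape produced when the key is quoted in the file: `"on": push`
parsed_quoted_key = {"on": "push", "jobs": {}}
```

Both `get_triggers(parsed_bare_key)` and `get_triggers(parsed_quoted_key)` return `"push"`. One caveat of this approach: Python treats `1` and `True` as the same dictionary key, so a mapping with an integer key `1` would also match.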

Tech Stack

Component | Technology
----------|-------------------
Language  | Python 3.12
CLI       | Click + Rich
Templates | Jinja2
Config    | PyYAML
Testing   | pytest (116 tests)
CI        | GitHub Actions

Next Steps

  • Add CircleCI and Jenkins generators
  • Template customization via .pipeforge.yml config
  • GitHub Action that runs PipeForge as a PR check
  • Plugin system for custom generators

PipeForge is open source. Check it out at github.com/hajirufai/pipeforge. Give it a ⭐ if it helps you skip the CI/CD setup tax!
