Sami Chibani

CI/CD in the Era of AI and Platform Engineering: A Deep Dive into Dagger CI (Part 4)

Part 4: The AI-Native CI/CD Stack: Agents, Modules, and Spec-Driven Development

Fixed pipelines for speed and reliability. AI agents to write them and fix them when they break.

In Part 1 we built pipelines as real code. In Part 2 we decoupled them from infrastructure. In Part 3 we built AcmeCorp's private module library (acme-backend, acme-frontend, and acme-deploy) that wraps public daggerverse modules with organization-specific compliance, naming, and security.

Now let's talk about where AI actually belongs in CI/CD, and where it doesn't.

The thesis is simple: AI doesn't replace the pipeline. It writes the pipeline and fixes it when it breaks. The pipeline itself stays fixed, deterministic, and fast.


What Is a Dagger Agent?

Just as container primitives allow us to build CI pipelines, Dagger introduces an LLM() primitive that lets you create agents the same way you'd call any other pipeline function. Under the hood, dag.llm() connects to any supported model (Claude, GPT, Gemini) and gives you a composable builder to layer on system prompts, environment bindings, and tool access.

What makes this powerful is the tool story. Any Dagger module, including the same private modules we built in Part 3, can be exposed as MCP tools that the agent calls at runtime. Your acme-deploy module becomes a cloud_run tool. Your acme-backend module becomes build and test tools. You can also attach any local MCP server (a language linter, a CLI wrapper, a documentation server) alongside those module tools, giving the agent both your custom CI abstractions and third-party capabilities in a single environment.

The result is a tight synergy between modules and agents: modules are the typed, testable building blocks; agents are the orchestration layer that composes them through natural language. You don't choose between writing pipelines and using AI. You write modules once, and agents compose them for you.

LLM provider required. The dag.llm() primitive needs access to a language model. Dagger detects your provider from environment variables. Set one of:

| Provider | API Key Variable | Model Variable |
| --- | --- | --- |
| Anthropic (Claude) | ANTHROPIC_API_KEY | ANTHROPIC_MODEL |
| OpenAI (GPT) | OPENAI_API_KEY | OPENAI_MODEL |
| Google (Gemini) | GEMINI_API_KEY | GEMINI_MODEL |
| Local (Ollama) | OLLAMA_HOST (defaults to http://localhost:11434) | OLLAMA_MODEL |

In CI, add the key as a repository secret and pass it via env:. Locally, export the variable in your shell before running dagger call. If no provider is detected, agent functions will fail at runtime with a clear error message.
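As a rough illustration of that detection step, the lookup amounts to checking each provider's key variable in turn. The variable names below come from the table above; the precedence order and the helper itself are assumptions for illustration, not Dagger's documented behavior:

```python
# Hypothetical sketch of provider detection from environment variables.
# Variable names match the table above; the precedence order is an
# assumption for illustration, not Dagger's actual detection logic.
PROVIDERS = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("google", "GEMINI_API_KEY"),
    ("ollama", "OLLAMA_HOST"),
]

def detect_provider(env: dict) -> "str | None":
    """Return the first provider whose key variable is set, else None."""
    for name, var in PROVIDERS:
        if env.get(var):
            return name
    return None
```

If no variable is set, the result is None, which corresponds to the "agent functions fail at runtime with a clear error" case described above.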


The Problem With Setting Up CI

We solved the YAML problem in Part 1. Pipelines are real code now. And in Part 3, we went further: toolchains let you install AcmeCorp's private modules as zero-code CI, with dagger check running all your checks from a single dagger.json. No SDK, no .dagger/ directory, no pipeline code.

But there's still a bottleneck: configuring that setup requires familiarity with the module library. AcmeCorp's platform team maintains a growing set of private modules (acme-backend, acme-frontend, acme-deploy), each with its own @check functions, parameters, and DefaultPath conventions. A developer still needs to know which modules to install as toolchains, which customizations a monorepo layout needs, and how to wire the deployment step in GitHub Actions.

What if you could point an AI agent at your private modules and your source code, and have it generate the complete toolchain setup and CI workflow for you?


Generating the Setup With Daggie

Daggie is a Dagger CI specialist agent. It reads module source code, understands each module's API, and generates the right toolchain configuration for your project. You give it your source directory and the Git URL of your module repository; Daggie discovers all available modules inside it and picks the ones relevant to the assignment.

Let's pick up from where we left off in Part 3. We're in the dagger-ci-demo monorepo (FastAPI backend + Angular frontend), and AcmeCorp's private modules live at github.com/telchak/acme-dagger-modules. AcmeCorp's coding agents (Monty, Angie, Daggie) live at github.com/telchak/daggerverse.

If you still have local changes from Part 3, you can stash them with git stash -u or simply delete the repo and clone it fresh — we want a clean starting point with no existing Dagger configuration.

First, initialize Dagger in the project and write the assignment file:

dagger init

cat > daggie-assignment.md << 'EOF'
Set up this monorepo with toolchains from the module library.
The backend (backend/) is FastAPI/Python, the frontend (frontend/) is Angular.
Install acme-backend, acme-frontend, and acme-deploy as toolchains
with the right source path customizations for this monorepo layout.
Also generate a .github/workflows/ci.yml with dagger check for PRs,
and a deploy step for Cloud Run (backend) and Firebase (frontend) on main.
On check failure, call Monty on the backend and Angie on the frontend
to post inline fix suggestions on the PR.
EOF

Then point Daggie at both repositories — the module library and the daggerverse (so it can discover Monty and Angie's real URLs and versions):

dagger call -m github.com/telchak/daggerverse/daggie@v0.3.0 \
  --module-urls="https://github.com/telchak/acme-dagger-modules.git" \
  --module-urls="https://github.com/telchak/daggerverse.git" \
  assist \
    --assignment-file=./daggie-assignment.md \
    --source=. \
  export --path=.

Daggie clones both repositories and auto-discovers every Dagger module inside them by finding dagger.json files. It reads each module's source code and @check-decorated functions (test and lint on acme-backend; test, lint, and audit on acme-frontend; scan on acme-deploy), detects the monorepo layout, and finds the coding agents (Monty, Angie) with their version tags. It also fetches the latest dagger/dagger-for-github action version automatically. The export --path=. writes the generated dagger.json and .github/workflows/ci.yml to your project root, ready to review, test with dagger check, and commit.

Meet the Agents

Before we look at what Daggie generates, let's introduce the three agents that work together in this setup. They're all Dagger modules, and you call them the same way you call any other module:

  • Daggie: the CI specialist. It reads your source code and available modules, then generates the toolchain configuration and CI workflow. You've just seen it in action. Daggie writes the setup; it doesn't run in the pipeline.

  • Monty: the Python coding agent. When a check fails on Python code (a test failure, a lint error, a broken import), Monty reads the error output and the source code, analyzes the root cause, and posts an inline code fix suggestion directly on the pull request.

  • Angie: the Angular/TypeScript coding agent. Same role as Monty, but for the frontend stack. When an Angular build or test fails, Angie diagnoses the issue and suggests the fix.

The key design: Daggie generates the toolchain setup once. Monty and Angie are called from the CI workflow only when something fails. The happy path (dagger check: lint, test, audit, scan) is pure deterministic module execution with no LLM involved. AI only enters the picture when a human needs help.


The Generated Setup

Here's what Daggie generates. No .dagger/ directory, no SDK, no Python pipeline code. Just a dagger.json with toolchains and a CI workflow. The code blocks below are what Daggie consistently produced as output after 10+ runs with gemini-2.5-pro:

{
  "name": "acme-monorepo",
  "engineVersion": "v0.20.3",
  "toolchains": [
    {
      "name": "backend",
      "source": "github.com/telchak/acme-dagger-modules/acme-backend@v1.0.0",
      "customizations": [
        {
          "function": ["test"],
          "argument": "source",
          "defaultPath": "/backend"
        },
        {
          "function": ["lint"],
          "argument": "source",
          "defaultPath": "/backend"
        }
      ]
    },
    {
      "name": "frontend",
      "source": "github.com/telchak/acme-dagger-modules/acme-frontend@v1.0.0",
      "customizations": [
        {
          "function": ["test"],
          "argument": "source",
          "defaultPath": "/frontend"
        },
        {
          "function": ["lint"],
          "argument": "source",
          "defaultPath": "/frontend"
        },
        {
          "function": ["audit"],
          "argument": "source",
          "defaultPath": "/frontend"
        }
      ]
    },
    {
      "name": "deploy",
      "source": "github.com/telchak/acme-dagger-modules/acme-deploy@v1.0.0",
      "customizations": [
        {
          "function": ["scan"],
          "argument": "source",
          "defaultPath": "/backend"
        }
      ]
    }
  ]
}
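To make the customization mechanics concrete, here's a small sketch of how a customization entry routes a check's source argument to a subdirectory. Only the JSON shape comes from the file above; the lookup helper is a hypothetical illustration, not Dagger's implementation:

```python
import json

# A trimmed copy of the toolchain config above. The default_path helper is
# a hypothetical illustration of how a customization routes a check's
# `source` argument to a subdirectory of the monorepo.
config = json.loads("""{
  "toolchains": [{
    "name": "backend",
    "customizations": [
      {"function": ["test"], "argument": "source", "defaultPath": "/backend"},
      {"function": ["lint"], "argument": "source", "defaultPath": "/backend"}
    ]
  }]
}""")

def default_path(cfg, toolchain, function):
    """Return the defaultPath a customization assigns to a function's source arg."""
    for tc in cfg["toolchains"]:
        if tc["name"] != toolchain:
            continue
        for c in tc.get("customizations", []):
            if function in c["function"] and c["argument"] == "source":
                return c["defaultPath"]
    return None
```

A function with no matching customization simply falls back to the module's own default, which is why only the monorepo-specific routing needs to appear in dagger.json.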

Notice what Daggie understood from the project structure and the module library:

  • Monorepo layout detected. Daggie saw backend/ and frontend/ and added customizations to route each check's source argument to the right subdirectory.
  • All @check functions discovered. It read the module source, found every function decorated with @check (test, lint on backend; test, lint, audit on frontend; scan on deploy), and installed the modules as toolchains so dagger check picks them all up.
  • No pipeline code. No dag.container(), no base image selection, no pip install. The private modules encapsulate all of that. The project gets AcmeCorp-compliant CI from a single JSON file.
  • Deployment stays explicit. cloud_run and firebase are regular functions, not checks. They don't run via dagger check — they're called explicitly from the CI workflow's deploy step, because deployments should be intentional.

Running the Checks

No GCP credentials needed for the checks — they run entirely in containers:

dagger check

Output:

✔ acme-backend:lint    (12.9s)  OK
✔ acme-backend:test    (15.2s)  OK
✔ acme-deploy:scan     (18.4s)  OK
✔ acme-frontend:lint   (58.0s)  OK
✔ acme-frontend:test   (62.0s)  OK
✔ acme-frontend:audit  (25.6s)  OK

Six checks, three toolchains, zero lines of code. All six run in parallel. No tokens consumed. The private modules handled base images, cache volumes, coverage thresholds, and vulnerability scanning — all invisible to the project.


When a Check Fails

Let's say a developer pushes a PR and the backend tests fail:

$ dagger check
✔ acme-backend:lint    (3.1s)   OK
✘ acme-backend:test    (6.4s)   ERROR
┇ .test(
  ┆ source: context ./backend
  ) ›
✘ withExec pytest -v --tb=short --cov=src ...  (4.2s)  ERROR
FAILED tests/test_auth.py::test_validate_token
AssertionError: Expected 401, got 200
auth.py:47 — missing token expiry check

✔ acme-deploy:scan     (18.4s)  OK
✔ acme-frontend:lint   (2.9s)   OK
✔ acme-frontend:test   (9.1s)   OK
✔ acme-frontend:audit  (25.6s)  OK

The CI workflow's failure step kicks in. Monty reads the error output and the source code, analyzes the root cause, and posts an inline code suggestion directly on the PR:

🐍 Monty suggested a fix for backend/auth.py:

def validate_token(token: str) -> bool:
    payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    if payload["exp"] < time.time():
        raise HTTPException(status_code=401, detail="Token expired")
    return True

The test expects a 401 when the token is expired, but validate_token doesn't check the exp claim. This adds the expiry check before returning.

The developer gets actionable fix suggestions, with code they can accept in one click, instead of a wall of logs to interpret.


Integrating Into CI/CD

Daggie also generates the GitHub Actions workflow. Here's what it produces: dagger check for PRs, deployment on main, and a failure handler that calls Monty or Angie directly:

# .github/workflows/ci.yml (generated by Daggie)
name: CI

on:
  push:
    branches:
      - main
  pull_request:

# Grant permissions for OIDC and for agents to post comments/suggestions
permissions:
  contents: read
  pull-requests: write
  id-token: write

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Run all Dagger checks
        id: checks
        uses: dagger/dagger-for-github@v8.4.1
        with:
          version: "0.20.3" # Must match engineVersion in dagger.json
          verb: check
        # Allow the job to continue on failure so the fix suggestion steps can run
        continue-on-error: true

      - name: Call Monty to suggest fixes for the backend
        if: failure() && steps.checks.outcome == 'failure' && github.event_name == 'pull_request'
        uses: dagger/dagger-for-github@v8.4.1
        with:
          version: "0.20.3"
          verb: call
          args: >-
            -m github.com/telchak/daggerverse/monty@v0.2.0
            suggest-github-fix
            --source=./backend
            --github-token=env:GITHUB_TOKEN
            --pr-number=${{ github.event.pull_request.number }}
            --repo=${{ github.repository }}
            --commit-sha=${{ github.event.pull_request.head.sha }}
            --error-output="${{ steps.checks.outputs.stderr }}"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          # ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY,
          # depending on the agent's LLM provider
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Call Angie to suggest fixes for the frontend
        if: failure() && steps.checks.outcome == 'failure' && github.event_name == 'pull_request'
        uses: dagger/dagger-for-github@v8.4.1
        with:
          version: "0.20.3"
          verb: call
          args: >-
            -m github.com/telchak/daggerverse/angie@v0.2.0
            suggest-github-fix
            --source=./frontend
            --github-token=env:GITHUB_TOKEN
            --pr-number=${{ github.event.pull_request.number }}
            --repo=${{ github.repository }}
            --commit-sha=${{ github.event.pull_request.head.sha }}
            --error-output="${{ steps.checks.outputs.stderr }}"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

      - name: Fail the job if checks failed
        if: steps.checks.outcome == 'failure'
        run: exit 1

  deploy:
    runs-on: ubuntu-latest
    needs: check
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - uses: actions/checkout@v6

      - name: Deploy Backend to Cloud Run
        uses: dagger/dagger-for-github@v8.4.1
        with:
          version: "0.20.3"
          verb: call
          args: >-
            deploy cloud-run
            --source=./backend
            --service-name=acme-backend-api
            --team=api-team
            --project-id=${{ vars.GCP_PROJECT_ID }}
            --region=${{ vars.GCP_REGION }}
            --environment=production
            --oidc-request-token=env:ACTIONS_ID_TOKEN_REQUEST_TOKEN
            --oidc-request-url=env:ACTIONS_ID_TOKEN_REQUEST_URL
        env:
          ACTIONS_ID_TOKEN_REQUEST_TOKEN: ${{ secrets.ACTIONS_ID_TOKEN_REQUEST_TOKEN }}
          ACTIONS_ID_TOKEN_REQUEST_URL: ${{ secrets.ACTIONS_ID_TOKEN_REQUEST_URL }}

      - name: Deploy Frontend to Firebase
        uses: dagger/dagger-for-github@v8.4.1
        with:
          version: "0.20.3"
          verb: call
          args: >-
            deploy firebase
            --source=./frontend
            --project-id=${{ vars.GCP_PROJECT_ID }}
            --oidc-request-token=env:ACTIONS_ID_TOKEN_REQUEST_TOKEN
            --oidc-request-url=env:ACTIONS_ID_TOKEN_REQUEST_URL
        env:
          ACTIONS_ID_TOKEN_REQUEST_TOKEN: ${{ secrets.ACTIONS_ID_TOKEN_REQUEST_TOKEN }}
          ACTIONS_ID_TOKEN_REQUEST_URL: ${{ secrets.ACTIONS_ID_TOKEN_REQUEST_URL }}
Enter fullscreen mode Exit fullscreen mode

The check job uses zero LLM tokens. It's pure dagger check — six deterministic checks from three toolchains. The suggest-fix steps only run on failure, calling Monty and Angie directly as Dagger modules (not pipeline functions). The deploy job calls acme-deploy's functions via dagger call on the installed toolchain. You get deterministic, fast CI with intelligent failure handling.


The Developer Experience Shift

Before: Pipeline Specialists

Developer → writes app code
DevOps    → writes pipeline code per stack
DevOps    → maintains deployment scripts per framework
DevOps    → debugs pipeline failures
Everyone  → waits for DevOps

After: Agents Configure and Fix CI

Platform team → builds and maintains private modules (acme-backend, acme-frontend, acme-deploy)
Platform team → builds and maintains agents (Daggie, Monty, Angie)
Developer     → asks Daggie to set up toolchains from the private module library
dagger check  → runs deterministically — no LLM, no tokens
On failure    → coding agents analyze errors and suggest fixes on the PR

The platform team builds the modules and agents. Daggie configures toolchains and generates the CI workflow. dagger check runs fast and deterministic. When things break, coding agents step in with targeted fixes.


From Developer Platform to Agent Factory

So far we've seen how Dagger improves CI performance, maintainability, and developer experience. But there's a larger shift happening.

As coding agents become more capable, the developer's core role is evolving from pure coder to agent orchestrator. You still need to understand the code, review the output, and make architectural decisions. But more and more of the mechanical work (implementing a well-specified feature, writing tests for existing code, fixing a lint error) can be delegated to agents that understand your codebase.

Follow this evolution to its conclusion, and an Internal Developer Platform starts looking like an Internal Agent Factory: a system that manages not just infrastructure and deployments but also how coding agents are built, composed, and deployed. Which agents run on which tasks, with what models, under what constraints, producing what artifacts.

The building blocks are already here. We have:

  • Typed, testable modules that encapsulate domain expertise (Part 3)
  • Coding agents that read source code and produce changes (Monty, Angie)
  • A CI specialist that generates pipelines from module libraries (Daggie)

What's missing is the orchestration layer, something that takes a feature request, breaks it into agent-assignable tasks, and dispatches them through CI. That's Speck.


Spec-Driven Development With Speck

Speck is a Dagger agent that implements spec-driven development, inspired by GitHub's spec-kit methodology. The idea is simple: specifications first, code second.

Given a feature request (either a prompt or a GitHub issue), Speck runs a three-step pipeline:

  1. Specify: generate a structured specification with user stories, acceptance criteria, and requirements
  2. Plan: produce a technical implementation plan grounded in the actual codebase
  3. Decompose: break the plan into ordered, dependency-aware tasks with agent and model assignments

The output is a structured JSON object designed to be consumed by GitHub Actions via fromJson() and a matrix strategy. Each task includes a suggested_agent (which Dagger agent should execute it), a suggested_model (which LLM complexity tier it needs), and an order field that defines the execution sequence.
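As a sketch, the decomposition JSON might look like the following. The field names (phase, order, suggested_agent, suggested_model, execution_plan) are the ones named in this post; the exact schema and the sample values are illustrative assumptions:

```python
# Hypothetical example of Speck's decomposition output, limited to the
# fields named in the text. The exact schema is Speck's own.
speck_output = {
    "total_tasks": 2,
    "execution_plan": {
        "total_phases": 1,
        "phases": [{"phase": 1, "name": "API Endpoint", "pr_branch": "speck/phase-1"}],
    },
    "tasks": [
        {"id": "T002", "phase": 1, "order": 2, "title": "Write integration tests",
         "suggested_agent": {"name": "monty", "entrypoint": "write_tests"},
         "suggested_model": "claude-opus-4-6"},
        {"id": "T001", "phase": 1, "order": 1, "title": "Implement endpoint",
         "suggested_agent": {"name": "monty", "entrypoint": "assist"},
         "suggested_model": "claude-sonnet-4-6"},
    ],
}

def tasks_for_phase(output: dict, phase: int) -> list:
    """Select one phase's tasks in execution order, like the jq filter in the workflow."""
    return sorted((t for t in output["tasks"] if t["phase"] == phase),
                  key=lambda t: t["order"])
```

The matrix strategy fans out over execution_plan.phases, and each runner selects and orders its own tasks this way.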

The Pipeline Pattern

When --include-tests and --include-review are enabled, Speck organizes tasks into phases that follow an implement → test → review pipeline:

Phase 1: T001 (implement, haiku) → T002 (test, sonnet) → T003 (review, sonnet)
Phase 2: T004 (implement, sonnet) → T005 (implement, sonnet) → T006 (test, opus) → T007 (review, sonnet)

Phases run in parallel (each on its own CI runner). Tasks within a phase use the prompt chaining pattern, a workflow where the output of one agent becomes the input of the next, forming a sequential pipeline. Concretely, each agent receives a source Directory, modifies it, and exports the result back to the workspace. The next agent in the chain picks up that modified workspace as its input. This is different from running agents independently: the test agent sees the code the implementation agent wrote, and the review agent sees both the implementation and the tests. One PR is created per phase from the accumulated changes.
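The chaining idea reduces to a few lines of plain Python. The stage functions below are stand-ins for the real Dagger agents (which take and return a Directory); only the pattern itself comes from the text:

```python
# Minimal sketch of the prompt-chaining pattern: each stage receives the
# workspace the previous stage produced. The "agents" here are stand-in
# functions, not the real Monty entrypoints.
def implement(workspace: dict) -> dict:
    return {**workspace, "src/feature.py": "def feature(): ..."}

def write_tests(workspace: dict) -> dict:
    # The test agent sees the code the implementation agent wrote.
    assert "src/feature.py" in workspace
    return {**workspace, "tests/test_feature.py": "def test_feature(): ..."}

def review(workspace: dict) -> str:
    # The review agent sees both implementation and tests; it returns
    # text rather than files, so nothing is exported after it runs.
    return f"Reviewed {len(workspace)} files"

workspace: dict = {}
for stage in (implement, write_tests):
    workspace = stage(workspace)  # output of one stage is input to the next
summary = review(workspace)
```

The accumulated workspace at the end of the chain is what becomes the phase's single PR.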

The model assignment is automatic: Speck maps task complexity to concrete model IDs based on the chosen provider family. Simple config changes get Haiku. Standard feature implementations get Sonnet. Cross-cutting architectural changes get Opus. Test tasks get one tier above their implementation task's complexity, since understanding the implementation requires more context.
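A sketch of that mapping, assuming three tiers and the "tests get one tier above" rule described above. The model IDs are the ones this post uses elsewhere; the function itself is illustrative, not Speck's actual code:

```python
# Illustrative complexity-to-model mapping. Tier order and the "test tasks
# bump one tier" rule come from the text; the function is a hypothetical sketch.
TIERS = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-6"]

def model_for(complexity: int, is_test_task: bool = False) -> str:
    """complexity: 0=simple, 1=standard, 2=complex; test tasks bump one tier."""
    tier = min(complexity + (1 if is_test_task else 0), len(TIERS) - 1)
    return TIERS[tier]
```

So a standard implementation task gets Sonnet, while the tests written against it get Opus, matching the phase examples above.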

Setting Up the Workflow

Let's see this in action. We'll fork a real-world application, the FastAPI RealWorld Example App (a production-like REST API with authentication, articles, comments, and favorites), and turn GitHub Actions into a spec-driven development platform.

Step 1: Fork the repository

gh repo fork nsidnev/fastapi-realworld-example-app --clone
cd fastapi-realworld-example-app

Step 2: Add the Speck workflow

Create .github/workflows/speck.yml:

# Spec-Driven Development with Speck
# Triggers on "speck" label → decompose → execute phases → create PRs

name: Spec-Driven Development

on:
  issues:
    types: [labeled]

permissions:
  contents: write
  pull-requests: write
  issues: write

jobs:
  decompose:
    if: github.event.label.name == 'speck'
    runs-on: ubuntu-latest
    outputs:
      result: ${{ steps.run.outputs.result }}
    steps:
      - uses: actions/checkout@v6
      - uses: dagger/dagger-for-github@v8.4.1
        with:
          verb: version

      - name: Decompose issue into phases
        id: run
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          ANTHROPIC_MODEL: claude-opus-4-6
        run: |
          dagger call -m github.com/telchak/daggerverse/speck@feat/speck \
            --allow-llm=all \
            --source=. \
            decompose \
            --issue-id=${{ github.event.issue.number }} \
            --repository="https://github.com/${{ github.repository }}" \
            --github-token=env:GITHUB_TOKEN \
            --create-pr \
            --include-tests \
            --include-review \
            --agents='[{"name":"monty","source":"github.com/telchak/daggerverse/monty@feat/speck","specialization":"Python backend development","capabilities":["assist","review","write_tests","build","upgrade"]}]' \
            --tech-stack="Python, FastAPI, PostgreSQL" \
            > /tmp/speck.json

          echo "result=$(jq -c '.' /tmp/speck.json)" >> $GITHUB_OUTPUT

      - name: Post summary to issue
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          RESULT: ${{ steps.run.outputs.result }}
        run: |
          TASKS=$(echo "$RESULT" | jq '.total_tasks')
          PHASES=$(echo "$RESULT" | jq '.execution_plan.total_phases')
          PRETTY=$(echo "$RESULT" | jq '.')

          gh issue comment "${{ github.event.issue.number }}" \
            --body "## Speck Decomposition Complete
          **Tasks**: ${TASKS} | **Phases**: ${PHASES}
          <details><summary>Full JSON</summary>

          \`\`\`json
          ${PRETTY}
          \`\`\`
          </details>"

  execute-phase:
    needs: decompose
    runs-on: ubuntu-latest
    strategy:
      matrix:
        phase: ${{ fromJson(needs.decompose.outputs.result).execution_plan.phases }}
      max-parallel: 3
      fail-fast: false
    steps:
      - uses: actions/checkout@v6
      - uses: dagger/dagger-for-github@v8.4.1
        with:
          verb: version

      - name: Execute phase ${{ matrix.phase.phase }} and create PR
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          SPECK_JSON: ${{ needs.decompose.outputs.result }}
          PHASE_NUM: ${{ matrix.phase.phase }}
        run: |
          TASKS=$(echo "$SPECK_JSON" | jq -c \
            "[.tasks[] | select(.phase == ($PHASE_NUM | tonumber))] | sort_by(.order)")

          for i in $(seq 0 $(( $(echo "$TASKS" | jq 'length') - 1 ))); do
            TASK=$(echo "$TASKS" | jq -c ".[$i]")
            AGENT=$(echo "$TASK" | jq -r '.suggested_agent.source // empty')
            ENTRY=$(echo "$TASK" | jq -r '.suggested_agent.entrypoint // "assist"' | tr '_' '-')
            DESC=$(echo "$TASK" | jq -r '.description')
            MODEL=$(echo "$TASK" | jq -r '.suggested_model')

            echo "--- [$(echo "$TASK" | jq -r '.id')] $(echo "$TASK" | jq -r '.title') (model=$MODEL) ---"
            [ -z "$AGENT" ] && echo "Skipping: no agent" && continue

            if [ "$ENTRY" = "review" ]; then
              ANTHROPIC_MODEL="$MODEL" dagger call -m "$AGENT" --allow-llm=all \
                --source=. "$ENTRY" --assignment="$DESC"
            else
              ANTHROPIC_MODEL="$MODEL" dagger call -m "$AGENT" --allow-llm=all \
                --source=. "$ENTRY" --assignment="$DESC" export --path=.
            fi
          done

          # Create PR from accumulated changes
          dagger call -m github.com/kpenfound/dag/github-issue \
            --token=env:GITHUB_TOKEN \
            create-pull-request \
            --repo="https://github.com/${{ github.repository }}" \
            --source=. \
            --branch="${{ matrix.phase.pr_branch }}" \
            --base=master \
            --title="Phase ${{ matrix.phase.phase }}: ${{ matrix.phase.name }}" \
            --body="Implements **${{ matrix.phase.name }}**. Closes #${{ github.event.issue.number }}. Generated by Speck." \
            url

A few things to note in this workflow:

  • Two jobs, one workflow. The decompose job uses Opus for planning, since it needs the most reasoning power to analyze the codebase and produce a good decomposition. The execute-phase job uses the suggested_model from each task: Haiku for simple changes, Sonnet for standard work, Opus for complex logic.
  • --allow-llm=all is required in CI. In interactive mode, Dagger prompts for LLM API access approval. In GitHub Actions there's no TTY, so we bypass the prompt.
  • PR creation uses a Dagger module. Instead of raw git commands, we use the github-issue module's create-pull-request function, which takes a --source Directory and handles branch creation, commit, push, and PR creation internally.
  • Review tasks skip export. The review entrypoint returns a string (the review text), not a Directory. Other entrypoints return a Directory that gets exported back to the workspace for the next task in the chain.

Step 3: Configure secrets

The workflow needs an LLM API key. Add it as a repository secret:

gh secret set ANTHROPIC_API_KEY --repo your-org/fastapi-realworld-example-app

The GITHUB_TOKEN is provided automatically by GitHub Actions with the permissions declared in the workflow.

Step 4: Commit, push, and create a test issue

git add .github/workflows/speck.yml
git commit -m "Add Speck spec-driven development workflow"
git push origin master

Now create a GitHub issue with a feature request (see issue #1):

Title: Add article bookmarking/favorites list endpoint

Body:

Add the ability for authenticated users to retrieve their list of favorited articles with pagination and optional filtering.

Requirements:

  • New GET /api/articles/favorites endpoint that returns articles the current user has favorited
  • Support pagination via limit and offset query parameters (defaults: limit=20, offset=0)
  • Support optional tag filter to narrow favorites by tag
  • Response format must match the existing GET /api/articles response shape (articles array + articlesCount)
  • Only accessible to authenticated users (return 401 if unauthenticated)

Running the Workflow

Add the speck label to the issue. This triggers the workflow.

Step 1, Decomposition (Opus):

Speck reads the issue, explores the FastAPI codebase (models, routes, repositories, existing test patterns), and produces a structured decomposition. It posts the result as a comment on the issue:

In this case, Speck decomposed the feature into 3 phases with 9 tasks:

  • Phase 1, Repository and Query Layer (T001-T004): add repository method to fetch favorited articles (assist, sonnet), add dependency function and filter schema (assist, haiku), write tests for repository and schema (write-tests, sonnet), review (review, sonnet)
  • Phase 2, API Endpoint (T005-T007): implement GET /api/articles/favorites endpoint (assist, sonnet), write integration tests (write-tests, opus), review endpoint and tests (review, sonnet)
  • Phase 3, Error Handling and Strings (T008-T009): add error handling strings and edge case coverage (assist, haiku), final integration review (review, sonnet)

Each task has a suggested_model based on complexity: simple schema additions get claude-haiku-4-5, standard implementations get claude-sonnet-4-6, and comprehensive test writing gets claude-opus-4-6 (since tests need to understand the full implementation context).

Step 2, Execution (parallel phases, sequential tasks):

GitHub Actions fans out one matrix job per phase, so each phase runs on its own runner. Within each phase, tasks are chained sequentially: Monty implements the feature, then writes tests on top of the implementation, then reviews the accumulated changes:

Phase 1 runner (Repository and Query Layer):
  T001 (assist, sonnet)        → Monty adds repository method           → exports to .
  T002 (assist, haiku)         → Monty adds filter schema + dependency  → exports to .
  T003 (write-tests, sonnet)   → Monty writes tests                     → exports to .
  T004 (review, sonnet)        → Monty reviews all changes              → logs review
  → github-issue creates PR #3 from accumulated changes

Phase 2 runner (API Endpoint):
  T005 (assist, sonnet)        → Monty implements the endpoint           → exports to .
  T006 (write-tests, opus)     → Monty writes integration tests          → exports to .
  T007 (review, sonnet)        → Monty reviews endpoint + tests          → logs review
  → github-issue creates PR #4 from accumulated changes

Phase 3 runner (Error Handling and Strings):
  T008 (assist, haiku)         → Monty adds error handling + edge cases  → exports to .
  T009 (review, sonnet)        → Monty does final integration review     → logs review
  → github-issue creates PR #5 from accumulated changes

Step 3, Pull Requests:

Each phase produced one PR with all accumulated changes, linked to the original issue:

Each PR includes implementation, tests, and a review pass, all generated by Monty working sequentially on the same codebase within the phase.

What Just Happened

A developer wrote a feature request with acceptance criteria. The system:

  1. Opus analyzed the codebase and decomposed the request into 9 tasks across 3 phases
  2. Haiku, Sonnet, and Opus implemented the feature, wrote tests, and reviewed the code, each task using the right model for its complexity
  3. Three PRs were created automatically, linked to the issue, ready for human review — 752 lines of code across 13 files

No pipeline code was written. No agent was invoked manually. The developer's job is now to review the PRs: read the code, check the tests, verify the approach. The mechanical work of translating a spec into code, tests, and PRs happened automatically.

This is the shift from Internal Developer Platform to Internal Agent Factory: the platform doesn't just run your CI. It runs your agents, manages their model costs, chains their outputs, and produces reviewable artifacts from natural language specifications.


Image generated with Google's Gemini "Nano Banana Pro"


Agents That Learn: Self-Improvement Across Runs

There's one more capability worth covering. Every agent, from Monty and Angie to Daggie and Goose (a GCP deployment orchestrator), reads per-repo context files to understand project conventions. But until now, that context was static: the developer wrote it once and maintained it by hand.

With --self-improve, the agents can update those files themselves:

dagger call -m github.com/telchak/daggerverse/monty@v0.2.0 \
  --self-improve=write \
  assist \
    --source=. \
    --assignment="Add input validation to all API endpoints"

As Monty works through the codebase (reading models, tracing routes, checking existing patterns), it discovers things: "This project uses Pydantic v2 field validators, not v1-style @validator." "Tests use httpx.AsyncClient, not the sync test client." "Custom exceptions live in app/errors.py."

Instead of those discoveries dying with the session, Monty records them in two files:

MONTY.md, Python-specific knowledge:

## Learned Context

- Pydantic v2 with field validators (`field_validator`), not v1 `@validator`
- All route handlers are async; tests use `httpx.AsyncClient` with `pytest-asyncio`
- Input validation pattern: Pydantic model as request body, raises `ValidationError` → 422

AGENTS.md, general project knowledge shared across all agents:

## Learned Context

- Custom exception hierarchy in `app/errors.py`, handlers in `app/middleware.py`
- Project uses src layout with `app/` as the main package
- CI runs pytest with coverage; minimum threshold is 80%

The next time any agent runs on this repo, whether it's Monty, Angie, or a different developer, it reads both the agent-specific file and the shared AGENTS.md, starting with better knowledge. Python patterns stay in MONTY.md where only Monty reads them; project-wide conventions go in AGENTS.md where every agent benefits. No one had to write documentation. The agents documented the project by working on it.
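The mechanics of recording a discovery can be as simple as appending deduplicated bullets under a known heading. A minimal sketch of the idea, not Monty's actual implementation:

```python
from pathlib import Path

HEADING = "## Learned Context"

def record_discoveries(context_file: Path, discoveries: list) -> None:
    """Append new discoveries as bullets under the Learned Context
    heading, skipping any bullet that is already present."""
    text = context_file.read_text() if context_file.exists() else HEADING + "\n"
    if HEADING not in text:
        text = text.rstrip("\n") + "\n\n" + HEADING + "\n"
    existing = {line.strip() for line in text.splitlines()}
    new_bullets = [f"- {d}" for d in discoveries if f"- {d}" not in existing]
    if new_bullets:
        text = text.rstrip("\n") + "\n" + "\n".join(new_bullets) + "\n"
    context_file.write_text(text)
```

Running it twice with the same discovery is a no-op, which is what keeps the context files from growing without bound across runs.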

Three modes

| Mode | What happens |
| --- | --- |
| `off` (default) | Current behavior: context files are read-only |
| `write` | Agent updates context files in the returned workspace directory |
| `commit` | Agent updates context files and creates a git commit |

The commit mode is useful for automation. When combined with develop-github-issue, the context file updates get included in the PR:

dagger call -m github.com/telchak/daggerverse/monty@v0.2.0 \
  --self-improve=commit \
  develop-github-issue \
    --github-token=env:GITHUB_TOKEN \
    --issue-id=42 \
    --repository="https://github.com/owner/my-python-api" \
    --source=.

The PR includes both the code changes and a commit like:

chore(monty): update context files with learned discoveries

Over time, the context files become living documents, a compressed summary of the project's architecture, conventions, and gotchas, maintained by the agents that work on it.


Would I Recommend Dagger CI in Production Right Now?

I've been following the Dagger project for several years now. And I can say with confidence: it has never been closer to production-ready than it is today.

The core primitives (typed functions, composable modules, containerized execution, deterministic caching) are solid. The dagger call experience is genuinely portable across local development and CI. The module ecosystem is growing. And as we've seen throughout this series, the LLM integration through the dag.llm() primitive opens up a category of workflows that simply didn't exist before.

That said, there are areas where the platform still needs to mature. Here's what I'd like to see, and what's already on the roadmap.

What the Agent Layer Needs

The current LLM primitive is functional but minimal. To build truly capable agents in Dagger, a few key features would make a significant difference:

  • External memory: Right now, agents forget everything between runs. Connecting the LLM to persistent memory stores (a vector database, a knowledge graph, or even a simple key-value store) would let agents accumulate project knowledge beyond what --self-improve and context files can provide.
  • RAG integration: Being able to connect the LLM to external retrieval-augmented generation engines would allow agents to reason over large documentation sets, internal wikis, or historical CI logs without stuffing everything into the context window.
  • Remote MCP servers: Dagger currently supports stdio-based MCP servers. Adding support for remote HTTP MCP servers would unlock integration with hosted tool services (SonarQube, Jira, Slack, observability platforms) without needing to bundle everything as a local process.
  • Broader BYOK compatibility: Dagger natively recognizes only four provider configurations: ANTHROPIC_API_KEY, GEMINI_API_KEY, OPENAI_API_KEY, and Ollama. To use any other provider (Mistral, Qwen, DeepSeek), you have to route through the OPENAI_BASE_URL compatibility layer, which works but feels like a workaround. Native support for more provider environment variables (MISTRAL_API_KEY, DEEPSEEK_API_KEY, etc.) would make BYOK a first-class experience rather than an OpenAI-compat hack.
  • Native multi-agent patterns: Right now, orchestrating multiple agents (like Speck decomposing work across Monty and Angie) requires external coordination (a GitHub Actions matrix, a shell loop, or a custom orchestrator). A native Graph type in Dagger's core, similar to what LangGraph provides, would let you define agent workflows as typed, cacheable DAGs. Imagine declaring "run these three agents in parallel, merge their outputs, then run a review agent" as a first-class Dagger construct.
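None of these exist in Dagger today, but external memory in particular is easy to prototype around the engine. A toy sketch of the read-on-start, write-on-exit shape, assuming you persist a JSON file in the workspace or a mounted cache volume (the class and file layout are entirely my own, not a Dagger API):

```python
import json
from pathlib import Path

class AgentMemory:
    """Toy persistent key-value memory for an agent between runs.

    A real implementation would likely use a vector store or knowledge
    graph; this only shows the load-on-start / flush-on-exit loop.
    """
    def __init__(self, path: Path):
        self.path = path
        self.data = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.data[key] = value

    def recall(self, key: str, default: str = "") -> str:
        return self.data.get(key, default)

    def flush(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2))
```

The interesting design question is where that file lives: in the repo it becomes reviewable (like AGENTS.md), in a cache volume it becomes invisible but survives across branches.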

What's Coming, and Why It Matters

Some of the most exciting changes are already in active development:

Cloud Engines: Fully managed Dagger execution environments with auto-scaling and distributed caching built in. Run dagger --cloud and your pipeline executes on managed infrastructure, with secrets and local context securely streamed to the cloud. No more managing Kubernetes daemonsets or custom cache layers.

Cloud Checks: This is the big one. Cloud Checks connects directly to your Git provider and triggers dagger check on every change, running on Cloud Engines. No YAML. No vendor syntax. No orchestration layer. Just your Dagger modules.

These two features matter because the more complex our Dagger workflows get, the more forcing them into GitHub Actions or GitLab CI feels like fitting a square peg into a round hole. Our Speck-driven development workflow is a perfect example: a decompose job that outputs dynamic JSON, a matrix strategy that fans out phases, shell scripts converting snake_case to kebab-case, environment variables carrying JSON between steps, conditional export commands based on return types... All of this ceremony exists because GitHub Actions was designed for static, declarative workflows, not for the dynamic, graph-shaped execution that Dagger naturally produces. Cloud Checks would eliminate that entire translation layer: your Dagger module is the CI platform. Add a native Graph core type on top, and you could run a full multi-agent workflow completely independent of GitHub Actions or any other CI engine. Dagger CI would go from a "CI development toolkit" to a fully operational CI/CD platform.
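To make that ceremony concrete: Dagger exposes snake_case function names as kebab-case on the CLI, so any script that maps a task plan to `dagger call` invocations needs a conversion step. A sketch, where the task fields `function` and `args` are my own illustration rather than Speck's schema:

```python
def task_to_command(module: str, task: dict) -> list:
    """Translate one task into a `dagger call` argv.

    Dagger exposes snake_case function and argument names as
    kebab-case on the CLI, hence the replace() calls.
    """
    argv = ["dagger", "call", "-m", module,
            task["function"].replace("_", "-")]
    for key, value in task.get("args", {}).items():
        argv.append(f"--{key.replace('_', '-')}={value}")
    return argv

cmd = task_to_command(
    "github.com/telchak/daggerverse/monty@v0.2.0",
    {"function": "develop_github_issue",
     "args": {"issue_id": "42", "source": "."}},
)
```

It is ten lines of glue, but it is glue that has to live in YAML-adjacent shell today and simply disappears once the Dagger module is the CI entry point.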

Modules V2: A fundamental redesign of how modules interact with projects. Today, modules can't see your project structure unless you thread it through manually with --source flags, custom boilerplate, and static path patterns. Modules V2 introduces a typed Workspace API that lets modules parse configuration files, traverse directory trees, and adapt to any project layout, all through executable code rather than rigid pragmas. A new .dagger/config.toml file declares which modules a project uses in a human-editable format, and a lockfile ensures reproducible resolution across teams. This shifts complexity from users to module authors, which is exactly where it belongs.

These three features together (managed compute, native CI triggering, and smarter module integration) would close the gap between "Dagger as a portable pipeline SDK" and "Dagger as a complete CI platform." And from everything I've seen in the project's trajectory, that gap is closing fast.


Conclusion

This is where the whole series comes together.

  1. Part 1: Pipelines as real code — typed, testable, portable. No more YAML guesswork.
  2. Part 2: Decoupled from infrastructure. Same pipeline on any runner, any cloud.
  3. Part 3: A module library. Public daggerverse modules for generic operations, private modules for organizational compliance. Domain expertise encoded as deterministic, versioned functions.
  4. Part 4: AI agents at the edges. Daggie generates the toolchain setup from the module library. dagger check runs fast and deterministic — no LLM, no tokens. When checks fail, Monty and Angie post fix suggestions on the PR. Speck decomposes feature requests into phased, agent-executable task plans. And with --self-improve, every agent interaction leaves the project better documented.

The key insight: CI checks need to be fast, reliable, and deterministic. AI belongs at the edges — generating the configuration, diagnosing failures, decomposing specs into tasks, and learning from every run. Never in the hot path.

The example apps and Dagger module are at github.com/telchak/dagger-ci-demo. The AcmeCorp private modules from Part 3 are at github.com/telchak/acme-dagger-modules.

Useful links:

  • github.com/dagger/dagger: the main Dagger repository
  • dagger.io/changelog: follow what features are being actively developed (Cloud Engines, Cloud Checks, Modules V2, and more)
  • daggerverse.dev: the public module registry, where you can discover and reuse community modules
  • github.com/telchak/daggerverse: all the modules I've been personally creating throughout this series (Daggie, Monty, Angie, Goose, Speck, and the GCP infrastructure modules), and that I'll continue to support to bring my little piece to this open-source project

Key Takeaways

  1. Deterministic checks, not LLM-routed ones: CI needs speed and reliability; keep AI out of the hot path
  2. Daggie configures toolchains: point it at your module library and your project, and it generates the right dagger.json and CI workflow
  3. Agents fix failures: Monty and Angie analyze errors and post code suggestions on PRs
  4. Agents learn: --self-improve lets agents update context files (agent-specific + shared AGENTS.md) with project discoveries, getting smarter across runs
  5. Spec-driven development: Speck decomposes high-level specs into structured task lists, dispatching work across specialized agents with model-appropriate assignments
  6. Modules are the foundation: public modules for generic operations, private modules for org compliance, both composable by humans and agents
  7. AI at the edges: configure the setup (before), fix the failures (after), learn from the work. Never in the hot path

This concludes the 4-part series. Thanks for reading.

Full Series:

Tags: #cicd #dagger #ai-agents #platform-engineering #mcp #cloudrun #firebase #spec-driven-development
