DEV Community

Juan Antonio Osorio for Stacklok


Bringing AI Agents to CI/CD: Using ToolHive and Buildkite to Bring Intelligence to Vulnerability Scanning

Continuous Integration and Continuous Deployment (CI/CD) pipelines have traditionally relied on deterministic scripts and predefined workflows, offering predictable results. What if your CI/CD pipeline could think, analyze, and make intelligent decisions? What if it could adapt to complex scenarios, understand context, and provide insights beyond simple pass/fail results?

This is where agentic workflows come in. With the new ToolHive Buildkite Plugin, you can now seamlessly integrate AI agents into your CI/CD pipelines using the Model Context Protocol (MCP).

Use Case: Agentic Vulnerability Scanning

Traditional CI/CD treats all issues equally. A medium-severity CVE gets the same treatment whether it's in a critical path or an unused dependency. Agents change this by understanding:

  • Context: Where and how a vulnerability can be exploited in your specific architecture.
  • Impact: The actual risk to your application, not just a generic score.
  • Remediation: Specific steps that work for your codebase, not generic advice.

Instead of a binary pass/fail, you get nuanced analysis with actionable recommendations. The agent doesn't just tell you there's a problem – it explains why it matters and how to fix it.

In the context of CI/CD, this means your pipeline can become more than a series of tests. It becomes an intelligent system that understands your codebase, security posture, and deployment requirements. The technology stack behind this approach consists of MCP, ToolHive, and Buildkite.

ToolHive: The MCP Engine

ToolHive is your starting point for running MCP servers in production. It handles:

  • Server Lifecycle: Starting, stopping, and managing MCP server instances.
  • Transport Methods: Supporting multiple communication protocols (stdio, SSE, streamable-http).
  • Security: Managing secrets, permissions, and isolation.
  • Discovery: Providing a registry of available MCP servers.

Buildkite: The CI/CD Platform

Buildkite provides a flexible, scalable CI/CD platform that's perfect for running agentic workflows because of its:

  • Plugin Architecture: Extensible system for adding functionality.
  • Container Support: Native Docker/Podman integration.
  • Parallel Execution: Ability to run multiple agents simultaneously.
  • Artifact Management: Built-in support for storing and sharing results.
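Buildkite runs command steps in parallel unless they are separated by a `wait` step or linked with `depends_on`, and it accepts pipeline definitions generated at build time (dynamic pipelines, piped into `buildkite-agent pipeline upload`). As a sketch of how these features combine, the script below emits one independent analysis step per package manifest, each with its own MCP server. The manifest paths, the `analyze-packages` command, and its `--mcp-url` flag are hypothetical; the plugin keys mirror the plugin configuration used elsewhere in this post.

```python
import json

# Hypothetical manifests to analyze; one parallel step is generated per file.
MANIFESTS = ["services/api/packages.json", "services/web/packages.json"]


def build_pipeline(manifests: list[str], base_port: int = 8080) -> dict:
    """Return a Buildkite pipeline with one independent (parallel) step per manifest."""
    steps = []
    for offset, manifest in enumerate(manifests):
        # Distinct proxy ports avoid clashes if two steps land on the same host.
        port = base_port + offset
        steps.append(
            {
                "label": f"🔍 Analyze {manifest}",
                # Hypothetical agent entry point and flags; replace with your own.
                "command": f"analyze-packages --packages-file {manifest} "
                f"--mcp-url http://localhost:{port}/sse",
                "plugins": [
                    {
                        "StacklokLabs/toolhive#v0.0.2": {
                            "server": "osv",
                            "transport": "sse",
                            "proxy-port": port,
                        }
                    }
                ],
            }
        )
    # No "wait" step between entries, so Buildkite may run them concurrently.
    return {"steps": steps}


if __name__ == "__main__":
    # Pipe the output into: buildkite-agent pipeline upload
    print(json.dumps(build_pipeline(MANIFESTS), indent=2))
```

Generating steps in code keeps the plugin configuration in one place while letting the number of parallel agents follow the repository's layout.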

How the ToolHive Buildkite Plugin Works

To bring agentic vulnerability scanning to life, we created the ToolHive Buildkite Plugin. It bridges these technologies and lets you spawn MCP servers directly in your CI/CD pipeline. Here's how it works:

  1. Plugin Configuration

In your Buildkite pipeline, you simply add the plugin to any step:

steps:
  - label: "🔍 Security Analysis"
    command: "run-security-scan"
    plugins:
      - StacklokLabs/toolhive#v0.0.2:
          server: "osv"  # OSV vulnerability database server
          transport: "sse"
          proxy-port: 8080

  2. Automatic MCP Server Provisioning

When the pipeline runs, the plugin:

  • Downloads ToolHive if not already available.
  • Spawns the MCP server in a containerized environment.
  • Configures networking to make the server accessible to your agent.
  • Manages the server lifecycle, ensuring proper cleanup after execution.
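If the agent in your step might start before the containerized server's proxy is accepting connections, a short readiness check avoids flaky first requests. The helper below is a hypothetical sketch, not part of the plugin: it polls a TCP port until a connection succeeds or a timeout expires.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (connection refused or timed out); retry shortly.
            time.sleep(interval)
    return False
```

For the configuration above, you'd call `wait_for_port("localhost", 8080)` before starting the agent and exit non-zero if it returns `False`. A TCP check only shows that something is listening; it doesn't confirm that the MCP server behind the proxy is healthy.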

  3. Agent Connection

Your AI agent can then connect to the MCP server and use its tools. Here's a snippet from a simple Python agent using the PydanticAI framework:

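import asyncio

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerSSE

# Connect to the MCP server spawned by the plugin (SSE transport, proxy-port 8080)
osv_server = MCPServerSSE(url="http://localhost:8080/sse")

# Example model; any model supported by PydanticAI works here
model = "anthropic:claude-3-5-sonnet-latest"

# Create an agent with access to OSV tools
# (newer PydanticAI releases use toolsets=[...] and `async with agent:` instead)
agent = Agent(
    model=model,
    mcp_servers=[osv_server],
    system_prompt="You are a security analyst...",
)


async def main() -> None:
    # Keep the MCP connection open for the duration of the run
    async with agent.run_mcp_servers():
        # The agent can now use OSV tools to analyze vulnerabilities
        result = await agent.run("Analyze security vulnerabilities...")
    print(result.output)


asyncio.run(main())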

Real-World Example: Intelligent Vulnerability Scanning

Let's look at a concrete example from our demo repository that compares a traditional approach to vulnerability scanning with an agentic one.

Traditional CI/CD Security Scanning:

# Run a vulnerability scanner over the repository and save JSON output
osv-scanner --format json -r . > results.json

# Check if any high severity vulnerabilities exist
if grep -q '"severity": "HIGH"' results.json; then
  echo "High severity vulnerabilities found!"
  exit 1
fi

This approach is limited in several ways: it makes a binary pass/fail decision and offers no context or explanation, no intelligent categorization, and no actionable recommendations. In short, it’s functional, but not very useful.

Agentic Security Scanning:

steps:
  - label: "🔍 Intelligent Vulnerability Analysis"
    command: |
      uv run buildkite-demo-agent \
        --packages-file examples/packages.json \
        --fail-on-vulnerabilities \
        --severity-threshold high
    plugins:
      - StacklokLabs/toolhive#v0.0.2:
          server: "osv"
          transport: "sse"

This approach provides much more value. The agent uses Claude to understand the context of each vulnerability, classifies CVE severity based on actual impact rather than CVSS scores alone, and offers detailed explanations along with specific remediation steps.
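The `--fail-on-vulnerabilities` and `--severity-threshold high` flags show that the agent's judgment still feeds a conventional build gate. The demo's implementation of those flags isn't shown here, so the following is a hypothetical reconstruction of the gating logic only: it compares the severities the agent assigned against a threshold and returns a process exit code, which Buildkite treats as failure when non-zero. The `Finding` type and the four-level severity scale are assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Assumed ordered severity scale; higher values are more severe."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    """Hypothetical agent finding: an identifier plus the severity the agent assigned."""
    vuln_id: str
    severity: Severity


def exit_code(findings: list[Finding], threshold: str, fail_on_vulnerabilities: bool) -> int:
    """Return 1 (fail the step) if gating is on and any finding meets the threshold."""
    if not fail_on_vulnerabilities:
        return 0
    limit = Severity[threshold.upper()]  # e.g. "high" -> Severity.HIGH
    return 1 if any(f.severity >= limit for f in findings) else 0


findings = [Finding("EXAMPLE-0001", Severity.MEDIUM), Finding("EXAMPLE-0002", Severity.HIGH)]
print(exit_code(findings, "high", fail_on_vulnerabilities=True))  # → 1
```

The gate itself is as simple as the grep-based one; the difference is upstream, because the severities come from the agent's contextual classification rather than raw scanner labels.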

The Agent's Intelligence in Action

Here's what happens when the agent analyzes a vulnerability:

  1. Query OSV Database: The agent uses MCP tools to query vulnerability data.
  2. Contextual Analysis: Claude analyzes the vulnerability considering:
    • The specific package and version
    • The type of vulnerability (RCE, DoS, data exposure)
    • The ecosystem and common usage patterns
  3. Intelligent Categorization: Instead of relying solely on CVSS scores, the agent considers:
    • Exploitability in your environment
    • Actual impact on your application
    • Availability of patches or workarounds
  4. Structured Output: The agent returns actionable information, such as explanations and specific remediation steps.
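In the agent, steps 2 and 3 happen inside the model, so there is no fixed formula. To make the inputs of step 3 concrete, here is a deliberately simple, deterministic stand-in: it starts from the standard CVSS v3 severity bands, downgrades a finding when context shows the vulnerable code is unreachable or not exposed to untrusted input, and emits a structured record. The `VulnContext` fields, the adjustment rules, and the output keys are all hypothetical; this is not the demo agent's logic or output format.

```python
import json
from dataclasses import dataclass


@dataclass
class VulnContext:
    """Hypothetical context signals for one vulnerability (not the demo's schema)."""
    vuln_id: str
    cvss_score: float                 # generic base score from the advisory
    reachable: bool                   # does the application call the vulnerable code?
    exposed_to_untrusted_input: bool  # can an attacker reach that code path?
    fix_available: bool               # is a patched version available?


LEVELS = {1: "low", 2: "medium", 3: "high", 4: "critical"}


def cvss_band(score: float) -> int:
    """Map a CVSS v3 base score to a level (0.0 is treated as low for simplicity)."""
    if score >= 9.0:
        return 4
    if score >= 7.0:
        return 3
    if score >= 4.0:
        return 2
    return 1


def categorize(ctx: VulnContext) -> dict:
    """Toy rule-based stand-in for the contextual categorization an LLM agent performs."""
    level = cvss_band(ctx.cvss_score)
    if not ctx.reachable:
        level -= 2  # unused code path: large downgrade
    elif not ctx.exposed_to_untrusted_input:
        level -= 1  # reachable but not attacker-controlled: smaller downgrade
    level = max(level, 1)
    remediation = (
        "Upgrade to a patched version."
        if ctx.fix_available
        else "No fix published yet; consider a workaround or mitigating control."
    )
    return {
        "id": ctx.vuln_id,
        "cvss_score": ctx.cvss_score,
        "contextual_severity": LEVELS[level],
        "remediation": remediation,
    }


example = VulnContext("EXAMPLE-0001", 7.5, reachable=False,
                      exposed_to_untrusted_input=True, fix_available=True)
print(json.dumps(categorize(example), indent=2))  # contextual_severity is "low"
```

Fixed rules like these are exactly what the agent is meant to go beyond; the sketch only shows which signals step 3 weighs and what a structured result could look like.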

Conclusion

There’s a lot of energy around MCP, and we need to channel it into putting the protocol to work in production environments. Agentic vulnerability scanning is a compelling use case, and it hints at the potential of agentic workflows in CI/CD pipelines. If you want to give it a try, the ToolHive Buildkite Plugin is designed to be simple and secure. We also encourage you to check out ToolHive for other MCP use cases: explore our GitHub repo or reach out directly via Discord.
