
Sergio D. Rodríguez Inclán

Originally published at blog.walsen.website

An Incredible Operations Platform - Rundeck

Introduction

There's a moment every operations engineer knows well: it's 2 AM, something's broken, and you're frantically SSH-ing into servers trying to remember the exact sequence of commands to fix it. You've done this before, but was it systemctl restart first or the config update? And which servers exactly?

This is the problem Rundeck solves. It's an open-source runbook automation platform that lets you define, schedule, and execute operational procedures across your entire infrastructure—with proper access control, audit trails, and the peace of mind that comes from knowing the procedure will run exactly the same way every time.

I've been using Rundeck for years, and recently I decided to contribute back to the ecosystem by creating three tools that solve specific pain points I encountered: a production-ready Docker image, a plugin for copying files between nodes, and a GitHub Action for seamless CI/CD integration. Let me walk you through all of them.

What Makes Rundeck Special

Rundeck occupies a unique space in the DevOps toolchain. It's not a configuration management tool like Ansible or Puppet—though it integrates beautifully with them. It's not a CI/CD platform like Jenkins—though it can trigger and be triggered by pipelines. Rundeck is specifically designed for operational workflows.

Core Capabilities

  • Job Definitions: Create multi-step workflows with conditionals, error handling, and node targeting
  • Node Management: Maintain a centralized inventory of your infrastructure with custom attributes
  • Access Control: Fine-grained RBAC so developers can restart their services without full SSH access
  • Key Storage: Secure credential management for SSH keys, passwords, and API tokens
  • Execution History: Complete audit trail of who ran what, when, and with what results
  • Scheduling: Cron-like scheduling for recurring maintenance tasks
  • REST API: Full API for integration with CI/CD pipelines, monitoring systems, and AI agents

Where It Shines

In my experience, Rundeck excels at:

  1. Incident Response: Pre-built runbooks that on-call engineers can execute confidently
  2. Self-Service Operations: Let developers restart services or clear caches without ops tickets
  3. Coordinated Procedures: Multi-node operations that need to happen in a specific order
  4. Compliance Tasks: Scheduled security scans, backup verifications, audit reports
  5. Deployment Orchestration: Coordinate deployments across environments with approval gates

REST API: Automation Beyond the UI

One of Rundeck's most powerful features is its comprehensive REST API. Nearly every action you can perform in the web interface is also available programmatically, which makes Rundeck a strong backend for automated operations.

API Capabilities

The API supports:

  • Job execution: Trigger jobs with custom parameters
  • Execution monitoring: Check status, stream logs, abort running jobs
  • Job management: Create, update, delete, import/export job definitions
  • Node inventory: Query and manage node sources
  • Key storage: Programmatic credential management
  • System info: Health checks, metrics, cluster status

Authentication

Rundeck supports multiple authentication methods for API access:

# Using API Token (recommended)
curl -H "X-Rundeck-Auth-Token: YOUR_TOKEN" \
  https://rundeck.example.com/api/41/projects

# Using session cookie
curl -c cookies.txt -b cookies.txt \
  -d "j_username=admin&j_password=admin" \
  https://rundeck.example.com/j_security_check
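The token header from the first curl call can be sketched in Python as well. This is a minimal illustration using only the standard library; `build_request` and `API_VERSION` are hypothetical names introduced here, and the `Accept` header is an assumption added so that the server replies with JSON.

```python
import json
import urllib.request
from typing import Optional

API_VERSION = 41  # same API version as the curl examples above


def build_request(base_url: str, token: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a token-authenticated Rundeck API request."""
    url = f"{base_url.rstrip('/')}/api/{API_VERSION}/{path.lstrip('/')}"
    headers = {
        "X-Rundeck-Auth-Token": token,
        "Accept": "application/json",  # assumption: ask for JSON explicitly
    }
    data = None
    if payload is not None:
        # A payload turns the call into a JSON POST, as in the job-run example
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical usage; nothing is sent until urlopen() is called:
# req = build_request("https://rundeck.example.com", "YOUR_TOKEN", "projects")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     projects = json.load(resp)
```

Keeping request construction separate from sending makes the authentication logic easy to unit test without a live Rundeck server.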

Triggering Jobs Programmatically

# Run a job by ID
curl -X POST \
  -H "X-Rundeck-Auth-Token: YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"options": {"environment": "production", "version": "1.2.3"}}' \
  https://rundeck.example.com/api/41/job/JOB_ID/run

# Response includes execution ID for monitoring
{
  "id": 12345,
  "href": "https://rundeck.example.com/api/41/execution/12345",
  "status": "running"
}
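Since the response carries an execution ID, a caller will usually poll that execution until it finishes. The sketch below shows one way to do that; `wait_for_execution` and the `fetch_status` callable are hypothetical, and the set of in-progress statuses is an assumption that should be checked against the API documentation for your Rundeck version.

```python
import time
from typing import Callable

# Assumption: statuses that mean "not finished yet"; verify for your Rundeck version.
IN_PROGRESS = {"running", "scheduled"}


def wait_for_execution(fetch_status: Callable[[], str],
                       timeout: float = 600.0,
                       interval: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until it returns a final status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status  # e.g. "succeeded", "failed", "aborted"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {status!r} after {timeout} seconds")
        sleep(interval)
```

In practice `fetch_status` would wrap a GET on the execution's `href` and return the `status` field; passing it in as a callable keeps the loop independent of any HTTP library.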

Integration Scenarios

CI/CD Pipelines

Trigger deployment jobs from Jenkins, GitHub Actions, or GitLab CI. I created a dedicated GitHub Action to make this integration even easier: rundeck-github-action.

# Using the Rundeck GitHub Action
- name: Deploy via Rundeck
  uses: Walsen/rundeck-github-action@v1
  with:
    rundeck_url: https://rundeck.example.com
    rundeck_token: ${{ secrets.RUNDECK_TOKEN }}
    action: run_job
    job_id: ${{ vars.DEPLOY_JOB_ID }}
    job_options: '{"version": "${{ github.sha }}", "environment": "production"}'
    wait_for_completion: true
    timeout: 600

The action supports multiple operations:

| Action | Description |
| --- | --- |
| run_job | Execute a Rundeck job with options |
| get_job_info | Get job details |
| list_jobs | List jobs in a project |
| get_execution | Get execution details |
| list_executions | List executions for a job or project |
| abort_execution | Abort a running execution |

You can also wait for job completion and get the execution status:

- name: Run deployment and wait
  id: deploy
  uses: Walsen/rundeck-github-action@v1
  with:
    rundeck_url: https://rundeck.example.com
    rundeck_token: ${{ secrets.RUNDECK_TOKEN }}
    action: run_job
    job_id: ${{ vars.DEPLOY_JOB_ID }}
    wait_for_completion: true

- name: Check result
  run: |
    echo "Status: ${{ steps.deploy.outputs.execution_status }}"
    echo "URL: ${{ steps.deploy.outputs.execution_url }}"

Monitoring Integration

Have your monitoring system trigger remediation jobs automatically:

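# PagerDuty webhook handler example
import os
import requests

# Assumed configuration: these environment variable names are placeholders
RUNDECK_URL = os.environ["RUNDECK_URL"]
RUNDECK_TOKEN = os.environ["RUNDECK_TOKEN"]
CLEAR_CACHE_JOB = os.environ["CLEAR_CACHE_JOB"]  # ID of the remediation job

def handle_alert(alert):
    if alert['type'] == 'high_memory':
        requests.post(
            f"{RUNDECK_URL}/api/41/job/{CLEAR_CACHE_JOB}/run",
            headers={"X-Rundeck-Auth-Token": RUNDECK_TOKEN},
            json={"options": {"server": alert['hostname']}},
            timeout=10,  # avoid hanging the webhook handler
        )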

AI Agents and LLM Integration

This is where things get interesting. Rundeck's API makes it an ideal execution backend for AI-powered operations. Using the Strands Agents SDK from AWS, you can build intelligent agents that leverage Rundeck as their operational backbone.


First, install the dependencies:

pip install strands-agents strands-agents-tools requests

Create a custom Rundeck tool for your agent:

from strands import Agent, tool
import requests

RUNDECK_URL = "https://rundeck.example.com"
RUNDECK_TOKEN = "your-api-token"

@tool
def list_rundeck_jobs(project: str) -> dict:
    """List available runbooks in a Rundeck project.

    Args:
        project: The Rundeck project name

    Returns:
        List of available jobs with their IDs and descriptions
    """
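    response = requests.get(
        f"{RUNDECK_URL}/api/41/project/{project}/jobs",
        headers={
            "X-Rundeck-Auth-Token": RUNDECK_TOKEN,
            "Accept": "application/json",  # ask for a JSON reply so .json() can parse it
        },
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors to the agent
    return response.json()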

@tool
def run_rundeck_job(job_id: str, options: dict = None) -> dict:
    """Execute a Rundeck job/runbook.

    Args:
        job_id: The UUID of the job to execute
        options: Optional parameters for the job

    Returns:
        Execution details including status and execution ID
    """
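    response = requests.post(
        f"{RUNDECK_URL}/api/41/job/{job_id}/run",
        headers={
            "X-Rundeck-Auth-Token": RUNDECK_TOKEN,
            "Content-Type": "application/json",
            "Accept": "application/json",  # ask for a JSON reply so .json() can parse it
        },
        json={"options": options or {}},
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors to the agent
    return response.json()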

@tool
def get_execution_status(execution_id: int) -> dict:
    """Check the status of a Rundeck job execution.

    Args:
        execution_id: The execution ID to check

    Returns:
        Execution status and details
    """
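    response = requests.get(
        f"{RUNDECK_URL}/api/41/execution/{execution_id}",
        headers={
            "X-Rundeck-Auth-Token": RUNDECK_TOKEN,
            "Accept": "application/json",  # ask for a JSON reply so .json() can parse it
        },
        timeout=30,
    )
    response.raise_for_status()  # surface HTTP errors to the agent
    return response.json()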

# Create the operations agent
ops_agent = Agent(
    system_prompt="""You are an operations assistant with access to Rundeck runbooks.
    When asked to perform operational tasks:
    1. List available jobs to find the appropriate runbook
    2. Execute the job with the correct parameters
    3. Monitor the execution and report the results
    Always confirm before executing destructive operations.""",
    tools=[list_rundeck_jobs, run_rundeck_job, get_execution_status]
)

# Example interaction
response = ops_agent(
    "The app servers are running low on disk space. "
    "Can you clear the log files on the production cluster?"
)
print(response)

This pattern keeps humans in control—the AI can only execute pre-approved runbooks with proper access controls and audit trails. It's autonomous operations with guardrails. The agent can reason about which runbook to use, execute it, and report back the results, all while respecting Rundeck's RBAC policies.
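Rundeck's ACLs remain the real enforcement point, but the same "pre-approved runbooks only" idea can also be applied on the agent side before any API call is made. The following is a minimal sketch of such an allowlist wrapper; `guarded_runner` and `RunbookNotAllowed` are hypothetical names and are not part of Rundeck or the Strands SDK.

```python
from typing import Callable, Iterable, Optional


class RunbookNotAllowed(Exception):
    """Raised when a job ID is not on the pre-approved list."""


def guarded_runner(allowed_job_ids: Iterable[str],
                   run_job: Callable[[str, dict], dict]) -> Callable[..., dict]:
    """Wrap run_job so that only allowlisted job IDs are ever executed."""
    allowed = frozenset(allowed_job_ids)

    def run(job_id: str, options: Optional[dict] = None) -> dict:
        if job_id not in allowed:
            # Refuse before any request reaches Rundeck
            raise RunbookNotAllowed(f"job {job_id!r} is not an approved runbook")
        return run_job(job_id, options or {})

    return run
```

A wrapper like this could sit inside the body of a tool such as `run_rundeck_job`, so a mistaken or manipulated agent request fails fast on the client side and again at Rundeck's ACL layer.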

Low-Code Integration with n8n

If you prefer a visual approach to automation, n8n offers native Rundeck integration. You can build workflows that connect GitHub events to Rundeck job executions without writing code.


The n8n Rundeck node supports:

  • Execute: Trigger any Rundeck job with parameters
  • Get Metadata: Retrieve job definitions and status

Combined with n8n's 400+ integrations, you can create powerful automation chains—for example, triggering a deployment job when a GitHub release is published, then notifying your team on Slack when it completes.

Licensing: Open Source vs Enterprise

Rundeck follows an open-core model. The community edition is fully open source under the Apache 2.0 license, while PagerDuty (which acquired Rundeck) offers commercial versions with additional features.

Rundeck Community (Open Source)

Free forever, includes:

  • Workflow execution and job definitions
  • Node management and key storage
  • Access control (ACL-based)
  • Scheduling and job activity logs
  • Community plugins
  • REST API

This is what my Docker image uses—perfect for small to medium teams and learning environments.

PagerDuty Runbook Automation (Commercial)

The enterprise offerings add:

| Feature | Description |
| --- | --- |
| High Availability | Clustered deployments with auto-takeover |
| SSO Integration | SAML, LDAP, OAuth support |
| Enterprise Plugins | ServiceNow, PagerDuty, Datadog, VMware integrations |
| Advanced Scheduling | Blackout calendars, schedule forecasting |
| Failed Job Resume | Resume from the failed step instead of restarting |
| Enterprise Support | SLA-backed support and account management |

PagerDuty offers two commercial options:

  1. Runbook Automation Self-Hosted: You manage the infrastructure, they provide the enterprise features
  2. Runbook Automation (Cloud): Fully managed SaaS with 99.9% SLA

Which Should You Choose?

For most use cases, start with the open source version. It's production-ready and covers the core functionality. Consider upgrading when you need:

  • High availability for mission-critical operations
  • SSO integration with your identity provider
  • Enterprise integrations (ServiceNow tickets, PagerDuty incidents)
  • Professional support with SLAs

The open source version isn't a "lite" version—it's a complete operations platform that many organizations run successfully in production.

The Gap: Node-to-Node File Transfers

While building a configuration distribution workflow, I hit a limitation. Rundeck's built-in file copier moves files from the Rundeck server to target nodes. But I needed to copy files from one node to multiple other nodes—specifically, distributing generated configs from a central server to application nodes.

The workaround was clunky: download to Rundeck, then upload to each destination. For large files or many destinations, this becomes a bottleneck.

So I built a plugin.

Rundeck Node-to-Node Plugin

The rundeck-node-to-node plugin adds a workflow step that copies files and directories between nodes using SSH/SFTP, fully integrated with Rundeck's node definitions and key storage.

Features

  • Two Transfer Modes: Route through Rundeck (reliable) or direct node-to-node (fast)
  • Parallel Transfers: Copy to multiple destinations simultaneously
  • Directory Support: Recursive copy with preserved permissions and timestamps
  • Key Storage Integration: Uses Rundeck's secure credential management
  • Error Handling: Option to continue on partial failures

Transfer Modes Explained

Via-Rundeck (Default)

Files download to Rundeck once, then upload to all destinations. Works in any network topology.


Direct Mode

Source pushes directly to destinations via SCP. Faster, but requires source-to-destination SSH access.
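The difference between the two modes comes down to which hops each file takes. The sketch below only models that routing for illustration; `plan_transfers` is a hypothetical helper and does not reflect the plugin's actual implementation.

```python
from typing import List, Sequence, Tuple

RUNDECK = "rundeck"  # placeholder name for the Rundeck server itself


def plan_transfers(source: str, destinations: Sequence[str],
                   mode: str = "via-rundeck") -> List[Tuple[str, str]]:
    """Return the (from, to) hops a copy would need in the given mode."""
    if mode == "via-rundeck":
        # One download to Rundeck, then one upload per destination
        return [(source, RUNDECK)] + [(RUNDECK, dest) for dest in destinations]
    if mode == "direct":
        # Source pushes straight to each destination; Rundeck carries no data
        return [(source, dest) for dest in destinations]
    raise ValueError(f"unknown transfer mode: {mode!r}")
```

For N destinations, via-rundeck needs N + 1 hops that all touch the Rundeck server, while direct needs N hops that require SSH access from the source node to every destination.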


Node Configuration

The plugin uses Rundeck's standard node attributes:

config-server:
  hostname: 10.0.1.10
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: config

app-server-01:
  hostname: 10.0.1.20
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: app,production

app-server-02:
  hostname: 10.0.1.21
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: app,production

Plugin Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| Source Node | Yes | - | Node name where files originate |
| Source Path | Yes | - | File or directory path on source |
| Destination Nodes | Yes | - | Comma-separated destination node names |
| Destination Path | Yes | - | Target path on destinations |
| Recursive Copy | No | true | Copy directories recursively |
| Preserve Attributes | No | true | Keep timestamps and permissions |
| Transfer Mode | No | via-rundeck | via-rundeck or direct |
| Parallel Transfers | No | true | Transfer to multiple nodes in parallel |
| Continue on Error | No | false | Don't fail if some destinations fail |
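To make the defaults concrete, the options can be modeled as a small data structure. This is only an illustrative sketch: the `CopyOptions` class and `parse_destinations` helper are hypothetical and mirror the documented options and defaults, not the plugin's internal code.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_destinations(raw: str) -> Tuple[str, ...]:
    """Split a comma-separated node list, ignoring blanks and stray spaces."""
    nodes = tuple(name.strip() for name in raw.split(",") if name.strip())
    if not nodes:
        raise ValueError("at least one destination node is required")
    return nodes


@dataclass(frozen=True)
class CopyOptions:
    # Required options (no default)
    source_node: str
    source_path: str
    destination_nodes: Tuple[str, ...]
    destination_path: str
    # Optional settings, with the defaults listed for the plugin
    recursive: bool = True
    preserve_attributes: bool = True
    transfer_mode: str = "via-rundeck"
    parallel: bool = True
    continue_on_error: bool = False


opts = CopyOptions(
    source_node="config-server",
    source_path="/etc/myapp/config.yml",
    destination_nodes=parse_destinations("app-server-01, app-server-02"),
    destination_path="/etc/myapp/config.yml",
)
```

Only the four required fields have to be supplied; everything else falls back to the defaults.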

Installation

  1. Download the JAR from Releases
  2. Copy to Rundeck's libext directory
  3. Restart Rundeck (or wait for auto-reload)

The plugin appears as a new workflow step type: "Node to Node File Copy".

Production-Ready Docker Image

Installing Rundeck traditionally means dealing with Java, databases, reverse proxies, and process management. To streamline this, I created a Docker image that bundles everything for production use.

The rundeck-image provides:

  • Rundeck 5.18.0 (configurable version)
  • Nginx reverse proxy for proper HTTP handling
  • Supervisor for process management
  • PostgreSQL support for production databases
  • Node-to-Node plugin pre-installed

Architecture

[Architecture diagram]

Quick Start

docker run -d \
  -p 8080:8080 \
  -e RUNDECK_GRAILS_URL=http://localhost:8080 \
  ghcr.io/walsen/rundeck-image:latest

Access Rundeck at http://localhost:8080.

Production Setup with Docker Compose

version: '3.8'

services:
  rundeck:
    image: ghcr.io/walsen/rundeck-image:latest
    ports:
      - "8080:8080"
    environment:
      RUNDECK_GRAILS_URL: https://rundeck.example.com
      RUNDECK_DATABASE_URL: jdbc:postgresql://db:5432/rundeck
      RUNDECK_DATABASE_USERNAME: rundeck
      RUNDECK_DATABASE_PASSWORD: ${DB_PASSWORD}
    volumes:
      - ./config/realm.properties:/etc/rundeck/realm.properties
      - rundeck-data:/var/rundeck
    depends_on:
      - db

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: rundeck
      POSTGRES_USER: rundeck
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  rundeck-data:
  postgres-data:

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| RUNDECK_GRAILS_URL | http://localhost:8080 | External URL (must match your setup) |
| RUNDECK_DATABASE_URL | - | PostgreSQL JDBC connection string |
| RUNDECK_DATABASE_USERNAME | - | Database user |
| RUNDECK_DATABASE_PASSWORD | - | Database password |

User Authentication

Mount a realm.properties file for basic authentication:

# username:password,role1,role2
admin:admin,user,admin
operator:operator123,user
readonly:viewer456,user
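The line format shown in the file comment (username, then password, then roles) is simple enough to check with a few lines of Python before mounting the file. This is a rough sketch for illustration; `parse_realm` is a hypothetical helper, and it handles only the plain `username:password,role1,role2` form used in this example.

```python
from typing import Dict


def parse_realm(text: str) -> Dict[str, dict]:
    """Parse 'username:password,role1,role2' lines into a dict keyed by username."""
    users = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        username, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {line!r}")
        password, *roles = [part.strip() for part in rest.split(",")]
        users[username.strip()] = {"password": password, "roles": roles}
    return users


sample = """\
# username:password,role1,role2
admin:admin,user,admin
operator:operator123,user
readonly:viewer456,user
"""
realm = parse_realm(sample)
```

A check like this makes it easy to spot accounts with no roles, or default credentials such as `admin:admin`, before the file reaches a production container.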

For production, integrate with LDAP or SSO—Rundeck supports both.

Custom Port Mapping

The image works with any external port. Just make sure the port in RUNDECK_GRAILS_URL matches the host port you publish:

# Running on port 4440
docker run -d \
  -p 4440:8080 \
  -e RUNDECK_GRAILS_URL=http://localhost:4440 \
  ghcr.io/walsen/rundeck-image:latest
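A mismatch between the published port and RUNDECK_GRAILS_URL is easy to catch up front. The helper below is a small sketch of that check; `grails_url_matches_port` is a hypothetical function, and it only applies when clients reach the container directly rather than through a separate load balancer or TLS proxy.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def grails_url_matches_port(grails_url: str, published_port: int) -> bool:
    """True if the port implied by grails_url equals the published host port."""
    parts = urlsplit(grails_url)
    # Fall back to the scheme's default port when the URL does not name one
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return port == published_port
```

For the example above, `grails_url_matches_port("http://localhost:4440", 4440)` holds, while keeping the default `http://localhost:8080` alongside `-p 4440:8080` would not.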

Practical Example: Config Distribution Workflow

Let me show how these tools work together in a real scenario.

The Setup:

  • 1 config server generates environment-specific configuration files
  • 5 application servers need these configs
  • Updates should happen without manual intervention

The Workflow:

[Workflow diagram]

Job Configuration:

| Step | Node(s) | Action |
| --- | --- | --- |
| 1 | config-server | /opt/scripts/generate-config.sh |
| 2 | config-server → app-01..05 | Node-to-Node copy /etc/myapp/config.yml |
| 3 | app-01..05 (sequential) | systemctl reload myapp |

The Result:

  • One-click config updates
  • Parallel distribution to all servers
  • Rolling reload to avoid downtime
  • Complete audit trail
  • Anyone with permissions can run it
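The ordering of the three steps (generate once, copy in parallel, reload one node at a time) can be sketched in plain Python. In Rundeck the job definition handles this; the code below only illustrates the control flow, and `distribute_config` plus its three callables are hypothetical stand-ins for the real steps.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def distribute_config(generate: Callable[[], None],
                      copy_to: Callable[[str], None],
                      reload: Callable[[str], None],
                      app_nodes: Sequence[str]) -> None:
    """Run generate, then copy to all nodes in parallel, then reload sequentially."""
    generate()  # step 1: build the config on the config server

    # Step 2: parallel distribution; list() re-raises the first copy error, if any
    with ThreadPoolExecutor(max_workers=max(1, len(app_nodes))) as pool:
        list(pool.map(copy_to, app_nodes))

    # Step 3: rolling reload, one node at a time, only after every copy succeeded
    for node in app_nodes:
        reload(node)
```

Because the reload loop starts only after the executor block exits, no node is reloaded before every node has received the new file.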

Testing the Plugin

The plugin repository includes a Docker-based test environment:

cd test/
docker-compose up -d

This spins up:

  • A Rundeck instance with the plugin
  • Multiple test nodes for source/destination testing
  • Pre-configured SSH keys

See test/README.md for detailed testing instructions.

CI/CD Integration

Both repositories include GitHub Actions workflows:

rundeck-image:

  • Builds on every push to main
  • Publishes to GitHub Container Registry
  • Tags releases with version numbers

rundeck-node-to-node:

  • Builds and tests the plugin
  • Creates release artifacts
  • Publishes JAR files to GitHub Releases

Deploying Rundeck on AWS

If you're running on AWS, you have several options for deploying Rundeck. Here's my recommendation based on cost and operational effort:

Best Option: Amazon ECS on Fargate

For most teams, ECS Fargate hits the sweet spot between cost and operational simplicity:


| Factor | ECS Fargate |
| --- | --- |
| Operational Effort | Low - no EC2 instances to manage |
| Cost | ~$30-50/month for small workloads |
| Scaling | Easy horizontal scaling |
| Integration | Native ALB, Secrets Manager, CloudWatch |
| Persistence | EFS for Rundeck data, RDS for database |

Budget Option: ECS with PostgreSQL Sidecar

For cost-conscious deployments, you can run PostgreSQL as a sidecar container instead of using RDS. This cuts costs significantly while maintaining a containerized approach:


| Factor | With RDS | With PostgreSQL Sidecar |
| --- | --- | --- |
| Monthly Cost | $50-80 | $20-35 |
| Database Backups | Automatic | Manual (EFS snapshots) |
| High Availability | RDS Multi-AZ option | Single container |
| Complexity | Two services | Single task |

For most Rundeck deployments, the sidecar approach works great—Rundeck isn't a high-transaction workload, and EFS provides durability for your data.

Comparison of AWS Options

| Option | Monthly Cost | Ops Effort | Best For |
| --- | --- | --- | --- |
| EC2 (t3.small) | $15-20 | Medium | Budget-conscious, 24/7 workloads |
| EC2 Spot | $5-10 | Medium | Dev/test, can tolerate interruptions |
| Lightsail | $5-20 | Low | Learning, simple setups |
| ECS Fargate | $30-50 | Low | Production, low maintenance |
| EKS | $100+ | High | Already running Kubernetes |

Why Not the Others?

  • EC2: More operational overhead (patching, monitoring), but cheaper for 24/7 workloads with Reserved Instances
  • EKS: Overkill unless you already have a Kubernetes cluster—adds complexity and ~$70/month just for the control plane
  • App Runner: Simpler but limited networking control, harder to reach private infrastructure
  • Lightsail: Cheapest option but limited VPC integration for node access
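The monthly figures quoted here can be sanity-checked with simple arithmetic. The sketch below assumes on-demand rates for us-east-1 (Linux/x86) as they have been published at one point; the rate constants are assumptions and should be replaced with the current values from the AWS pricing pages for your region.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# Assumed hourly rates in USD; check current AWS pricing before relying on them.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
EKS_CONTROL_PLANE_HOUR = 0.10


def fargate_monthly(vcpu: float, memory_gb: float,
                    hours: float = HOURS_PER_MONTH) -> float:
    """Compute-only cost of one always-on Fargate task."""
    return (vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR) * hours


def eks_control_plane_monthly(hours: float = HOURS_PER_MONTH) -> float:
    """Cost of the EKS control plane alone, before any worker capacity."""
    return EKS_CONTROL_PLANE_HOUR * hours


print(f"Fargate, 1 vCPU / 2 GB: ${fargate_monthly(1, 2):.2f} per month")
print(f"EKS control plane:      ${eks_control_plane_monthly():.2f} per month")
```

Under these assumed rates, a 1 vCPU / 2 GB task comes to roughly $36 per month and the EKS control plane to about $73, in line with the ranges above. Storage, load balancer, and data transfer charges are not included.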

My Recommendations

  • Small team/learning: EC2 t3.small or Lightsail (~$10-20/month)
  • Production with low ops effort: ECS Fargate + RDS (~$50-80/month)
  • Enterprise/HA: ECS Fargate multi-task + RDS Multi-AZ (~$150+/month)

ECS Task Definition Example

{
  "family": "rundeck",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "rundeck",
      "image": "ghcr.io/walsen/rundeck-image:latest",
      "portMappings": [
        {"containerPort": 8080, "protocol": "tcp"}
      ],
      "environment": [
        {"name": "RUNDECK_GRAILS_URL", "value": "https://rundeck.example.com"}
      ],
      "secrets": [
        {"name": "RUNDECK_DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:..."},
        {"name": "RUNDECK_DATABASE_PASSWORD", "valueFrom": "arn:aws:secretsmanager:..."}
      ],
      "mountPoints": [
        {"sourceVolume": "rundeck-data", "containerPath": "/var/rundeck"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/rundeck",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "volumes": [
    {
      "name": "rundeck-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-xxxxxxxx",
        "transitEncryption": "ENABLED"
      }
    }
  ]
}
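Before registering a task definition like this, a couple of common mistakes can be caught with a quick local check. The sketch below is an assumption-laden illustration: `check_task_definition` is a hypothetical helper, and the size table covers only a subset of the CPU/memory pairs Fargate accepts (up to 4 vCPU).

```python
from typing import List

# Subset of valid Fargate CPU units -> memory (MiB) combinations
FARGATE_MEMORY_MIB = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
}


def check_task_definition(task_def: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    cpu, memory = int(task_def["cpu"]), int(task_def["memory"])
    if memory not in FARGATE_MEMORY_MIB.get(cpu, ()):
        problems.append(f"cpu={cpu} with memory={memory} is not in the known size table")

    containers = task_def.get("containerDefinitions", [])
    needs_role = any(
        c.get("secrets")
        or c.get("logConfiguration", {}).get("logDriver") == "awslogs"
        for c in containers
    )
    if needs_role and not task_def.get("executionRoleArn"):
        # Secrets injection and the awslogs driver both rely on the execution role
        problems.append("executionRoleArn is required for secrets and awslogs")
    return problems
```

Run against the example above, this check would flag the missing `executionRoleArn`, since the definition uses both `secrets` and the `awslogs` driver.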

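This setup gives you a production-ready Rundeck deployment with persistent storage, secrets management, and centralized logging, all with minimal operational overhead. Note that the secrets and awslogs settings only work if the task definition also sets an executionRoleArn whose role can read the referenced secrets and write to CloudWatch Logs; that field is omitted from the example above.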

Conclusions

Rundeck transforms operational chaos into repeatable, auditable procedures. It's the tool I reach for when I need to bridge the gap between "we should automate this" and "we have time to build proper automation."

The Node-to-Node plugin fills a specific gap—efficient file distribution between nodes without routing everything through Rundeck. And the Docker image removes the friction of getting a production-ready Rundeck instance running.

If you're drowning in operational toil, give Rundeck a try. And if you need these specific capabilities, the tools are ready for you.
