Sergio D. Rodríguez Inclán

Posted on Feb 1 • Originally published at blog.walsen.website

An Incredible Operations Platform - Rundeck

#automation #devops #opensource #tooling

Introduction

There's a moment every operations engineer knows well: it's 2 AM, something's broken, and you're frantically SSH-ing into servers trying to remember the exact sequence of commands to fix it. You've done this before, but was it systemctl restart first or the config update? And which servers exactly?

This is the problem Rundeck solves. It's an open-source runbook automation platform that lets you define, schedule, and execute operational procedures across your entire infrastructure—with proper access control, audit trails, and the peace of mind that comes from knowing the procedure will run exactly the same way every time.

I've been using Rundeck for years, and recently I decided to contribute back to the ecosystem by creating three tools that solve specific pain points I encountered: a production-ready Docker image, a plugin for copying files between nodes, and a GitHub Action for seamless CI/CD integration. Let me walk you through all of them.

What Makes Rundeck Special

Rundeck occupies a unique space in the DevOps toolchain. It's not a configuration management tool like Ansible or Puppet—though it integrates beautifully with them. It's not a CI/CD platform like Jenkins—though it can trigger and be triggered by pipelines. Rundeck is specifically designed for operational workflows.

Core Capabilities

Job Definitions: Create multi-step workflows with conditionals, error handling, and node targeting
Node Management: Maintain a centralized inventory of your infrastructure with custom attributes
Access Control: Fine-grained RBAC so developers can restart their services without full SSH access
Key Storage: Secure credential management for SSH keys, passwords, and API tokens
Execution History: Complete audit trail of who ran what, when, and with what results
Scheduling: Cron-like scheduling for recurring maintenance tasks
REST API: Full API for integration with CI/CD pipelines, monitoring systems, and AI agents

Where It Shines

In my experience, Rundeck excels at:

Incident Response: Pre-built runbooks that on-call engineers can execute confidently
Self-Service Operations: Let developers restart services or clear caches without ops tickets
Coordinated Procedures: Multi-node operations that need to happen in a specific order
Compliance Tasks: Scheduled security scans, backup verifications, audit reports
Deployment Orchestration: Coordinate deployments across environments with approval gates

REST API: Automation Beyond the UI

One of Rundeck's most powerful features is its comprehensive REST API. Nearly every action you can perform in the web interface is also available programmatically, which makes Rundeck a strong backend for automated operations.

API Capabilities

The API supports:

Job execution: Trigger jobs with custom parameters
Execution monitoring: Check status, stream logs, abort running jobs
Job management: Create, update, delete, import/export job definitions
Node inventory: Query and manage node sources
Key storage: Programmatic credential management
System info: Health checks, metrics, cluster status

Authentication

Rundeck supports multiple authentication methods for API access:

# Using API Token (recommended)
curl -H "X-Rundeck-Auth-Token: YOUR_TOKEN" \
  https://rundeck.example.com/api/41/projects

# Using session cookie
curl -c cookies.txt -b cookies.txt \
  -d "j_username=admin&j_password=admin" \
  https://rundeck.example.com/j_security_check
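
The token-based call can be wrapped in a small helper so that every request carries the same header. This is a minimal sketch using only the standard library; the function name `build_request` and the `RUNDECK_TOKEN` environment variable are assumptions for illustration, not part of Rundeck.

```python
import os
import urllib.request


def build_request(path, base_url="https://rundeck.example.com", api_version=41, token=None):
    """Return a urllib Request for an API path, authenticated with a token header."""
    # RUNDECK_TOKEN is a hypothetical variable name; use whatever your secret store provides.
    token = token or os.environ["RUNDECK_TOKEN"]
    url = f"{base_url.rstrip('/')}/api/{api_version}/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"X-Rundeck-Auth-Token": token, "Accept": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen(build_request("/projects"))` would perform the same call as the first curl example.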

Triggering Jobs Programmatically

# Run a job by ID
curl -X POST \
  -H "X-Rundeck-Auth-Token: YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"options": {"environment": "production", "version": "1.2.3"}}' \
  https://rundeck.example.com/api/41/job/JOB_ID/run

# Abridged response; the execution ID is what you use for monitoring
{
  "id": 12345,
  "href": "https://rundeck.example.com/api/41/execution/12345",
  "status": "running"
}
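
The execution ID from that response can then be polled until the job finishes. The sketch below assumes the execution endpoint returns a JSON body with a `status` field, as in the response above; the set of terminal states is an assumption to verify against the API documentation for your Rundeck version, and the function names are illustrative.

```python
import json
import time
import urllib.request

RUNDECK_URL = "https://rundeck.example.com"  # placeholder host used throughout this post
# Assumed terminal states; anything else is treated as "still in progress".
TERMINAL_STATES = {"succeeded", "failed", "aborted", "timedout"}


def fetch_execution(execution_id, token):
    """GET /api/41/execution/{id} and return the decoded JSON body."""
    req = urllib.request.Request(
        f"{RUNDECK_URL}/api/41/execution/{execution_id}",
        headers={"X-Rundeck-Auth-Token": token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_execution(execution_id, fetch, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch(execution_id) until a terminal state is seen, then return that state."""
    for _ in range(max_polls):
        status = fetch(execution_id)["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"execution {execution_id} not finished after {max_polls} polls")
```

A call such as `wait_for_execution(12345, lambda eid: fetch_execution(eid, token))` would block until the job ends. Keeping `fetch` injectable makes the loop easy to test without a live server.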

Integration Scenarios

CI/CD Pipelines

Trigger deployment jobs from Jenkins, GitHub Actions, or GitLab CI. I created a dedicated GitHub Action to make this integration even easier: rundeck-github-action.

# Using the Rundeck GitHub Action
- name: Deploy via Rundeck
  uses: Walsen/rundeck-github-action@v1
  with:
    rundeck_url: https://rundeck.example.com
    rundeck_token: ${{ secrets.RUNDECK_TOKEN }}
    action: run_job
    job_id: ${{ vars.DEPLOY_JOB_ID }}
    job_options: '{"version": "${{ github.sha }}", "environment": "production"}'
    wait_for_completion: true
    timeout: 600

The action supports multiple operations:

Action	Description
`run_job`	Execute a Rundeck job with options
`get_job_info`	Get job details
`list_jobs`	List jobs in a project
`get_execution`	Get execution details
`list_executions`	List executions for a job or project
`abort_execution`	Abort a running execution

You can also wait for job completion and get the execution status:

- name: Run deployment and wait
  id: deploy
  uses: Walsen/rundeck-github-action@v1
  with:
    rundeck_url: https://rundeck.example.com
    rundeck_token: ${{ secrets.RUNDECK_TOKEN }}
    action: run_job
    job_id: ${{ vars.DEPLOY_JOB_ID }}
    wait_for_completion: true

- name: Check result
  run: |
    echo "Status: ${{ steps.deploy.outputs.execution_status }}"
    echo "URL: ${{ steps.deploy.outputs.execution_url }}"

Monitoring Integration

Have your monitoring system trigger remediation jobs automatically:

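# PagerDuty webhook handler example
import os
import requests

RUNDECK_URL = os.environ["RUNDECK_URL"]          # hypothetical variable names
RUNDECK_TOKEN = os.environ["RUNDECK_TOKEN"]
CLEAR_CACHE_JOB = os.environ["CLEAR_CACHE_JOB"]  # UUID of the remediation job

def handle_alert(alert):
    if alert['type'] == 'high_memory':
        requests.post(
            f"{RUNDECK_URL}/api/41/job/{CLEAR_CACHE_JOB}/run",
            headers={"X-Rundeck-Auth-Token": RUNDECK_TOKEN},
            json={"options": {"server": alert['hostname']}},
            timeout=30,
        ).raise_for_status()  # surface failed triggers instead of ignoring them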
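
Once more than one alert type is handled, a lookup table tends to be easier to audit than a chain of `if` statements. The following sketch is one way to do that; the alert types, alert field names, and job IDs are all hypothetical placeholders.

```python
# Hypothetical mapping from alert type to a Rundeck job ID and to the
# alert fields that the job expects as options (option name -> alert field).
REMEDIATIONS = {
    "high_memory": {"job_id": "clear-cache-job-uuid", "options": {"server": "hostname"}},
    "disk_full": {"job_id": "rotate-logs-job-uuid", "options": {"server": "hostname", "path": "mount"}},
}


def plan_remediation(alert):
    """Return (job_id, options) for a known alert type, or None to leave it to a human."""
    rule = REMEDIATIONS.get(alert.get("type"))
    if rule is None:
        return None
    missing = [field for field in rule["options"].values() if field not in alert]
    if missing:
        raise ValueError(f"alert is missing fields: {missing}")
    options = {opt: alert[field] for opt, field in rule["options"].items()}
    return rule["job_id"], options
```

The handler would then post `options` to the returned job's run endpoint, and unknown alert types fall through to the on-call engineer instead of triggering anything.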

AI Agents and LLM Integration

This is where things get interesting. Rundeck's API makes it an ideal execution backend for AI-powered operations. Using the Strands Agents SDK from AWS, you can build intelligent agents that leverage Rundeck as their operational backbone.

First, install the dependencies:

pip install strands-agents strands-agents-tools requests

Create a custom Rundeck tool for your agent:

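import os

import requests
from strands import Agent, tool

RUNDECK_URL = "https://rundeck.example.com"
# Read the token from the environment (variable name is an assumption) instead of hard-coding it
RUNDECK_TOKEN = os.environ["RUNDECK_TOKEN"]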

@tool
def list_rundeck_jobs(project: str) -> dict:
    """List available runbooks in a Rundeck project.

    Args:
        project: The Rundeck project name

    Returns:
        List of available jobs with their IDs and descriptions
    """
    response = requests.get(
        f"{RUNDECK_URL}/api/41/project/{project}/jobs",
        headers={"X-Rundeck-Auth-Token": RUNDECK_TOKEN}
    )
    return response.json()

@tool
def run_rundeck_job(job_id: str, options: dict | None = None) -> dict:
    """Execute a Rundeck job/runbook.

    Args:
        job_id: The UUID of the job to execute
        options: Optional parameters for the job

    Returns:
        Execution details including status and execution ID
    """
    response = requests.post(
        f"{RUNDECK_URL}/api/41/job/{job_id}/run",
        headers={
            "X-Rundeck-Auth-Token": RUNDECK_TOKEN,
            "Content-Type": "application/json"
        },
        json={"options": options or {}}
    )
    return response.json()

@tool
def get_execution_status(execution_id: int) -> dict:
    """Check the status of a Rundeck job execution.

    Args:
        execution_id: The execution ID to check

    Returns:
        Execution status and details
    """
    response = requests.get(
        f"{RUNDECK_URL}/api/41/execution/{execution_id}",
        headers={"X-Rundeck-Auth-Token": RUNDECK_TOKEN}
    )
    return response.json()

# Create the operations agent
ops_agent = Agent(
    system_prompt="""You are an operations assistant with access to Rundeck runbooks.
    When asked to perform operational tasks:
    1. List available jobs to find the appropriate runbook
    2. Execute the job with the correct parameters
    3. Monitor the execution and report the results
    Always confirm before executing destructive operations.""",
    tools=[list_rundeck_jobs, run_rundeck_job, get_execution_status]
)

# Example interaction
response = ops_agent(
    "The app servers are running low on disk space. "
    "Can you clear the log files on the production cluster?"
)
print(response)

This pattern keeps humans in control—the AI can only execute pre-approved runbooks with proper access controls and audit trails. It's autonomous operations with guardrails. The agent can reason about which runbook to use, execute it, and report back the results, all while respecting Rundeck's RBAC policies.
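
The system prompt asks the model to confirm destructive operations, but a prompt is not an enforcement mechanism. One possible complement is a check in code before any job is triggered. This is a sketch under assumptions: the job IDs are placeholders, and `run_job` stands for any callable that triggers a job, such as the undecorated body of `run_rundeck_job`.

```python
from typing import Callable, Optional

# Hypothetical job IDs; in practice these would be the UUIDs of reviewed runbooks.
ALLOWED_JOBS = {"clear-logs-uuid", "restart-app-uuid"}
NEEDS_CONFIRMATION = {"restart-app-uuid"}


def guarded_run(job_id: str, options: Optional[dict],
                run_job: Callable[[str, dict], dict],
                confirmed: bool = False) -> dict:
    """Run a job only if it is allowlisted and, where required, confirmed by a human."""
    if job_id not in ALLOWED_JOBS:
        raise PermissionError(f"job {job_id} is not on the agent allowlist")
    if job_id in NEEDS_CONFIRMATION and not confirmed:
        # Nothing is executed; the caller must ask a person and retry with confirmed=True.
        return {"status": "needs_confirmation", "job_id": job_id}
    return run_job(job_id, options or {})
```

Rundeck's own ACLs on the API token remain the real boundary; a wrapper like this only narrows what the agent attempts in the first place.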

Low-Code Integration with n8n

If you prefer a visual approach to automation, n8n offers native Rundeck integration. You can build workflows that connect GitHub events to Rundeck job executions without writing code.

The n8n Rundeck node supports:

Execute: Trigger any Rundeck job with parameters
Get Metadata: Retrieve job definitions and status

Combined with n8n's 400+ integrations, you can create powerful automation chains—for example, triggering a deployment job when a GitHub release is published, then notifying your team on Slack when it completes.

Licensing: Open Source vs Enterprise

Rundeck follows an open-core model. The community edition is fully open source under the Apache 2.0 license, while PagerDuty (which acquired Rundeck) offers commercial versions with additional features.

Rundeck Community (Open Source)

Free forever, includes:

Workflow execution and job definitions
Node management and key storage
Access control (ACL-based)
Scheduling and job activity logs
Community plugins
REST API

This is what my Docker image uses—perfect for small to medium teams and learning environments.

PagerDuty Runbook Automation (Commercial)

The enterprise offerings add:

Feature	Description
High Availability	Clustered deployments with auto-takeover
SSO Integration	SAML, LDAP, OAuth support
Enterprise Plugins	ServiceNow, PagerDuty, Datadog, VMware integrations
Advanced Scheduling	Blackout calendars, schedule forecasting
Failed Job Resume	Resume from the failed step instead of restarting
Enterprise Support	SLA-backed support and account management

PagerDuty offers two commercial options:

Runbook Automation Self-Hosted: You manage the infrastructure, they provide the enterprise features
Runbook Automation (Cloud): Fully managed SaaS with 99.9% SLA

Which Should You Choose?

For most use cases, start with the open source version. It's production-ready and covers the core functionality. Consider upgrading when you need:

High availability for mission-critical operations
SSO integration with your identity provider
Enterprise integrations (ServiceNow tickets, PagerDuty incidents)
Professional support with SLAs

The open source version isn't a "lite" version—it's a complete operations platform that many organizations run successfully in production.

The Gap: Node-to-Node File Transfers

While building a configuration distribution workflow, I hit a limitation. Rundeck's built-in file copier moves files from the Rundeck server to target nodes. But I needed to copy files from one node to multiple other nodes—specifically, distributing generated configs from a central server to application nodes.

The workaround was clunky: download to Rundeck, then upload to each destination. For large files or many destinations, this becomes a bottleneck.

So I built a plugin.

Rundeck Node-to-Node Plugin

The rundeck-node-to-node plugin adds a workflow step that copies files and directories between nodes using SSH/SFTP, fully integrated with Rundeck's node definitions and key storage.

Features

Two Transfer Modes: Route through Rundeck (reliable) or direct node-to-node (fast)
Parallel Transfers: Copy to multiple destinations simultaneously
Directory Support: Recursive copy with preserved permissions and timestamps
Key Storage Integration: Uses Rundeck's secure credential management
Error Handling: Option to continue on partial failures

Transfer Modes Explained

Via-Rundeck (Default)

Files are downloaded to Rundeck once, then uploaded to all destinations. This works in any network topology.

Direct Mode

The source node pushes directly to the destinations via SCP. This is faster, but requires SSH access from the source to each destination.
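
To see why the mode matters for large files, it helps to count how much data crosses the Rundeck server in each case. The function below is a rough model based only on the descriptions above; it ignores SSH control traffic and protocol overhead.

```python
def bytes_through_rundeck(file_size: int, destinations: int, mode: str) -> int:
    """Approximate bytes crossing the Rundeck server's network for one copy step."""
    if mode == "via-rundeck":
        # One download from the source plus one upload per destination.
        return file_size * (1 + destinations)
    if mode == "direct":
        # File contents travel from source to destination; Rundeck only drives the sessions.
        return 0
    raise ValueError(f"unknown transfer mode: {mode}")
```

For a 100 MB file and 5 destinations, the via-rundeck mode moves about 600 MB through the Rundeck server, while direct mode moves none of the file contents through it.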

Node Configuration

The plugin uses Rundeck's standard node attributes:

config-server:
  hostname: 10.0.1.10
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: config

app-server-01:
  hostname: 10.0.1.20
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: app,production

app-server-02:
  hostname: 10.0.1.21
  username: deploy
  ssh-key-storage-path: keys/project/deploy-key
  tags: app,production
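
Tags are what make it practical to pick destinations without listing node names by hand. As an illustration of the idea (this is plain Python, not Rundeck's node filter syntax), the sketch below selects nodes that carry every requested tag from a dictionary mirroring the definitions above.

```python
# The node definitions above, reduced to the attributes needed here.
NODES = {
    "config-server": {"hostname": "10.0.1.10", "tags": "config"},
    "app-server-01": {"hostname": "10.0.1.20", "tags": "app,production"},
    "app-server-02": {"hostname": "10.0.1.21", "tags": "app,production"},
}


def nodes_with_tags(nodes, *required):
    """Return the names of nodes that carry every required tag, sorted by name."""
    wanted = set(required)
    return sorted(
        name
        for name, attrs in nodes.items()
        if wanted <= {t.strip() for t in attrs.get("tags", "").split(",") if t.strip()}
    )
```

Joining the result with commas gives a value in the shape the plugin's Destination Nodes option expects.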

Plugin Options

Option	Required	Default	Description
Source Node	Yes	-	Node name where files originate
Source Path	Yes	-	File or directory path on source
Destination Nodes	Yes	-	Comma-separated destination node names
Destination Path	Yes	-	Target path on destinations
Recursive Copy	No	true	Copy directories recursively
Preserve Attributes	No	true	Keep timestamps and permissions
Transfer Mode	No	via-rundeck	`via-rundeck` or `direct`
Parallel Transfers	No	true	Transfer to multiple nodes in parallel
Continue on Error	No	false	Don't fail if some destinations fail
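
The interaction between Parallel Transfers and Continue on Error is easiest to see in code. The following is a conceptual Python sketch of those semantics, not the plugin's actual implementation; `copy_one` is a hypothetical callable that copies to a single node and raises on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_to_all(destinations, copy_one, parallel=True, continue_on_error=False):
    """Call copy_one(node) for each destination and return {node: error or None}."""
    results = {}

    def attempt(node):
        try:
            copy_one(node)
            return node, None
        except Exception as exc:  # record the failure instead of losing it
            return node, exc

    if parallel:
        with ThreadPoolExecutor(max_workers=max(1, len(destinations))) as pool:
            results = dict(pool.map(attempt, destinations))
    else:
        for node in destinations:
            name, error = attempt(node)
            results[name] = error
            if error and not continue_on_error:
                break  # sequential mode stops at the first failure

    failed = [n for n, e in results.items() if e is not None]
    if failed and not continue_on_error:
        raise RuntimeError(f"copy failed for: {', '.join(failed)}")
    return results
```

With `continue_on_error=True` the caller receives the per-node results and decides what to do about partial failures; with the default, any failure fails the whole step.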

Installation

Download the JAR from Releases
Copy to Rundeck's libext directory
Restart Rundeck (or wait for auto-reload)
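
If you install plugins on several servers, the copy step is worth scripting. Here is a minimal sketch; the function name is illustrative, and the libext location depends on how Rundeck was installed (it lives under the Rundeck base directory, often `/var/lib/rundeck/libext` on package installs).

```python
import shutil
from pathlib import Path


def install_plugin(jar_path, libext_dir):
    """Copy a plugin JAR into Rundeck's libext directory and return the new path."""
    jar = Path(jar_path)
    target_dir = Path(libext_dir)
    if jar.suffix != ".jar":
        raise ValueError(f"expected a .jar file, got {jar.name}")
    if not target_dir.is_dir():
        raise FileNotFoundError(f"libext directory not found: {target_dir}")
    # copy2 keeps the file's timestamps, which helps when comparing versions later.
    return Path(shutil.copy2(jar, target_dir / jar.name))
```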

The plugin appears as a new workflow step type: "Node to Node File Copy".

Production-Ready Docker Image

Installing Rundeck traditionally means dealing with Java, databases, reverse proxies, and process management. To streamline this, I created a Docker image that bundles everything for production use.

The rundeck-image provides:

Rundeck 5.18.0 (configurable version)
Nginx reverse proxy for proper HTTP handling
Supervisor for process management
PostgreSQL support for production databases
Node-to-Node plugin pre-installed

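Architecture

Inside the container, Supervisor manages the Rundeck and Nginx processes, and Nginx acts as the reverse proxy in front of Rundeck.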

Quick Start

docker run -d \
  -p 8080:8080 \
  -e RUNDECK_GRAILS_URL=http://localhost:8080 \
  ghcr.io/walsen/rundeck-image:latest

Access Rundeck at http://localhost:8080.

Production Setup with Docker Compose

version: '3.8'

services:
  rundeck:
    image: ghcr.io/walsen/rundeck-image:latest
    ports:
      - "8080:8080"
    environment:
      RUNDECK_GRAILS_URL: https://rundeck.example.com
      RUNDECK_DATABASE_URL: jdbc:postgresql://db:5432/rundeck
      RUNDECK_DATABASE_USERNAME: rundeck
      RUNDECK_DATABASE_PASSWORD: ${DB_PASSWORD}
    volumes:
      - ./config/realm.properties:/etc/rundeck/realm.properties
      - rundeck-data:/var/rundeck
    depends_on:
      - db

  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: rundeck
      POSTGRES_USER: rundeck
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data

volumes:
  rundeck-data:
  postgres-data:

Environment Variables

Variable	Default	Description
`RUNDECK_GRAILS_URL`	`http://localhost:8080`	External URL (must match your setup)
`RUNDECK_DATABASE_URL`	-	PostgreSQL JDBC connection string
`RUNDECK_DATABASE_USERNAME`	-	Database user
`RUNDECK_DATABASE_PASSWORD`	-	Database password
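
Because the three database variables only make sense together, a small pre-flight check can catch a half-configured deployment before the container starts. This sketch assumes that leaving all three unset is valid (as in the Quick Start command) and that the URL uses the PostgreSQL JDBC prefix shown in the Compose file.

```python
import os

DATABASE_VARS = (
    "RUNDECK_DATABASE_URL",
    "RUNDECK_DATABASE_USERNAME",
    "RUNDECK_DATABASE_PASSWORD",
)


def check_database_env(env=None):
    """Return a list of problems with the database settings; empty means OK."""
    env = os.environ if env is None else env
    if not any(env.get(name) for name in DATABASE_VARS):
        return []  # assumption: nothing set means the image's default database is used
    problems = [f"{name} is not set" for name in DATABASE_VARS if not env.get(name)]
    url = env.get("RUNDECK_DATABASE_URL", "")
    if url and not url.startswith("jdbc:postgresql://"):
        problems.append("RUNDECK_DATABASE_URL should start with jdbc:postgresql://")
    return problems
```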

User Authentication

Mount a realm.properties file for basic authentication:

# username:password,role1,role2
admin:admin,user,admin
operator:operator123,user
readonly:viewer456,user
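
Example credentials like these have a habit of reaching production unchanged. The sketch below parses the `username:password,role1,role2` format shown above and flags weak entries; it assumes plain-text passwords and an arbitrary minimum length of 12, both of which are illustrative choices.

```python
def parse_realm(text):
    """Parse realm.properties content into {username: {"password": ..., "roles": [...]}}."""
    users = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        username, _, rest = line.partition(":")
        password, *roles = [part.strip() for part in rest.split(",")]
        users[username.strip()] = {"password": password, "roles": roles}
    return users


def users_with_weak_passwords(users, min_length=12):
    """Return usernames whose password is short or identical to the username."""
    return sorted(
        name
        for name, info in users.items()
        if len(info["password"]) < min_length or info["password"] == name
    )
```

Run against the sample file above, every account is flagged, which is a reasonable reminder to replace it before exposing the instance.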

For production, replace this file with LDAP authentication or, on the commercial editions, SSO.

Custom Port Mapping

The image works with any external port, as long as RUNDECK_GRAILS_URL points at that same port:

# Running on port 4440
docker run -d \
  -p 4440:8080 \
  -e RUNDECK_GRAILS_URL=http://localhost:4440 \
  ghcr.io/walsen/rundeck-image:latest
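
A mismatch between the published port and RUNDECK_GRAILS_URL is an easy mistake to make, so it can be worth checking in a deploy script. This is a minimal sketch for the simple case where users connect straight to the published port; behind a load balancer that terminates TLS, compare against the load balancer's port instead.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def grails_url_matches(grails_url, published_port):
    """True if the port in RUNDECK_GRAILS_URL equals the port users connect to."""
    parts = urlsplit(grails_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return port == published_port
```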

Practical Example: Config Distribution Workflow

Let me show how these tools work together in a real scenario.

The Setup:

1 config server generates environment-specific configuration files
5 application servers need these configs
Updates should happen without manual intervention

The Workflow (job configuration):

Step	Node(s)	Action
1	config-server	`/opt/scripts/generate-config.sh`
2	config-server → app-01..05	Node-to-Node copy `/etc/myapp/config.yml`
3	app-01..05 (sequential)	`systemctl reload myapp`

The Result:

One-click config updates
Parallel distribution to all servers
Rolling reload to avoid downtime
Complete audit trail
Anyone with permissions can run it
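
The ordering guarantees are the important part of this job: generate first, distribute second, and reload one node at a time so a bad config stops the rollout early. Expressed as a plain Python sketch, with the three callables standing in for the Rundeck steps (all names here are hypothetical):

```python
def rollout(generate, copy_config, reload_node, app_nodes):
    """Run the three workflow steps in order and return the nodes that were reloaded."""
    generate()                 # step 1: build the config on the config server
    copy_config(app_nodes)     # step 2: distribute it to every app node
    reloaded = []
    for node in app_nodes:     # step 3: reload one node at a time
        reload_node(node)      # an exception here halts the rollout
        reloaded.append(node)
    return reloaded
```

If the reload fails on the second node, the remaining nodes keep running with their previous in-memory configuration, which is the point of doing this step sequentially.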

Testing the Plugin

The plugin repository includes a Docker-based test environment:

cd test/
docker-compose up -d

This spins up:

A Rundeck instance with the plugin
Multiple test nodes for source/destination testing
Pre-configured SSH keys

See test/README.md for detailed testing instructions.

CI/CD Integration

Both repositories include GitHub Actions workflows:

rundeck-image:

Builds on every push to main
Publishes to GitHub Container Registry
Tags releases with version numbers

rundeck-node-to-node:

Builds and tests the plugin
Creates release artifacts
Publishes JAR files to GitHub Releases

Deploying Rundeck on AWS

If you're running on AWS, you have several options for deploying Rundeck. Here's my recommendation based on cost and operational effort:

Best Option: Amazon ECS on Fargate

For most teams, ECS Fargate hits the sweet spot between cost and operational simplicity:

Factor	ECS Fargate
Operational Effort	Low - no EC2 instances to manage
Cost	~$30-50/month for small workloads
Scaling	Easy horizontal scaling
Integration	Native ALB, Secrets Manager, CloudWatch
Persistence	EFS for Rundeck data, RDS for database
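
The cost figure can be sanity-checked with a little arithmetic. The per-hour prices below are assumptions (on-demand Fargate, Linux/x86, us-east-1) and may be out of date, so check the current AWS pricing page before relying on them. The estimate covers compute only and excludes the load balancer, EFS, and the database.

```python
# Assumed on-demand prices; verify against the AWS pricing page for your region.
VCPU_HOUR_USD = 0.04048
GB_HOUR_USD = 0.004445
HOURS_PER_MONTH = 730


def fargate_monthly_cost(vcpu, memory_gb, hours=HOURS_PER_MONTH):
    """Compute-only monthly cost in USD of one always-on Fargate task."""
    return round((vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD) * hours, 2)
```

Under those assumptions, a single task with 1 vCPU and 2 GB of memory comes to roughly $36 per month, which sits inside the range quoted in the table.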

Budget Option: ECS with PostgreSQL Sidecar

For cost-conscious deployments, you can run PostgreSQL as a sidecar container instead of using RDS. This cuts costs significantly while maintaining a containerized approach:

Factor	With RDS	With PostgreSQL Sidecar
Monthly Cost	$50-80	$20-35
Database Backups	Automatic	Manual (EFS snapshots)
High Availability	RDS Multi-AZ option	Single container
Complexity	Two services	Single task

For most Rundeck deployments, the sidecar approach works great—Rundeck isn't a high-transaction workload, and EFS provides durability for your data.

Comparison of AWS Options

Option	Monthly Cost	Ops Effort	Best For
EC2 (t3.small)	$15-20	Medium	Budget-conscious, 24/7 workloads
EC2 Spot	$5-10	Medium	Dev/test, can tolerate interruptions
Lightsail	$5-20	Low	Learning, simple setups
ECS Fargate	$30-50	Low	Production, low maintenance
EKS	$100+	High	Already running Kubernetes

Why Not the Others?

EC2: More operational overhead (patching, monitoring), but cheaper for 24/7 workloads with Reserved Instances
EKS: Overkill unless you already have a Kubernetes cluster—adds complexity and ~$70/month just for the control plane
App Runner: Simpler but limited networking control, harder to reach private infrastructure
Lightsail: Cheap and simple, but limited VPC integration makes it harder to reach your nodes

My Recommendations

Small team/learning: EC2 t3.small or Lightsail (~$10-20/month)
Production with low ops effort: ECS Fargate + RDS (~$50-80/month)
Enterprise/HA: ECS Fargate multi-task + RDS Multi-AZ (~$150+/month)

ECS Task Definition Example

{
  "family": "rundeck",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "rundeck",
      "image": "ghcr.io/walsen/rundeck-image:latest",
      "portMappings": [
        {"containerPort": 8080, "protocol": "tcp"}
      ],
      "environment": [
        {"name": "RUNDECK_GRAILS_URL", "value": "https://rundeck.example.com"}
      ],
      "secrets": [
        {"name": "RUNDECK_DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:..."},
        {"name": "RUNDECK_DATABASE_PASSWORD", "valueFrom": "arn:aws:secretsmanager:..."}
      ],
      "mountPoints": [
        {"sourceVolume": "rundeck-data", "containerPath": "/var/rundeck"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/rundeck",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "volumes": [
    {
      "name": "rundeck-data",
      "efsVolumeConfiguration": {
        "fileSystemId": "fs-xxxxxxxx",
        "transitEncryption": "ENABLED"
      }
    }
  ]
}

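This setup gives you a production-ready Rundeck deployment with persistent storage, secrets management, and centralized logging, all with minimal operational overhead. Note that the example omits the task execution role (executionRoleArn), which Fargate needs in order to read the Secrets Manager values and write to CloudWatch Logs, and the database username, which would be passed the same way as the other database settings.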
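
Fargate only accepts certain cpu and memory pairings, and an invalid pair is rejected when the task definition is registered. The sketch below checks a task definition dictionary against the combinations for the smaller task sizes; larger sizes exist and are deliberately left out of this table, so treat it as a starting point.

```python
# Valid memory values (MiB) per cpu setting for the smaller Fargate task sizes.
FARGATE_MEMORY = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def check_task_size(task_def):
    """Return None if cpu/memory form a known valid pair, else an error message."""
    cpu, memory = int(task_def["cpu"]), int(task_def["memory"])
    allowed = FARGATE_MEMORY.get(cpu)
    if allowed is None:
        return f"cpu value not covered by this table: {cpu}"
    if memory not in allowed:
        return f"memory {memory} is not valid for cpu {cpu}"
    return None
```

The task definition above, with cpu 1024 and memory 2048, passes this check.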

Conclusions

Rundeck transforms operational chaos into repeatable, auditable procedures. It's the tool I reach for when I need to bridge the gap between "we should automate this" and "we have time to build proper automation."

The Node-to-Node plugin fills a specific gap: file distribution between nodes as a single workflow step, with a direct mode that avoids routing the data through Rundeck. And the Docker image removes the friction of getting a production-ready Rundeck instance running.

If you're drowning in operational toil, give Rundeck a try. And if you need these specific capabilities, the tools are ready for you.