Shashi Kanth

The Missing Piece for AI-Assisted Infrastructure Management

I have been managing my homelab for years now, and it handles a lot: two Kubernetes clusters, a mix of physical machines and VMs, a few components running in the cloud, reverse proxies managing traffic across all of it, databases, and caches. It's the usual sprawl that happens when you actually use your infrastructure for real workloads.

Deploying something new is never just one step. It's a choreographed sequence: update the config on the reverse proxy, deploy to the right Kubernetes cluster, make sure the database migration ran, verify the cache invalidated properly, check that the monitoring picked it up. Miss a step, and something breaks in a way that takes an hour to debug.

When Claude and ChatGPT started getting genuinely good at understanding infrastructure, I had a thought: what if I could just describe what I want deployed, and have an AI coordinate across all these systems?

The problem? I'm not handing my SSH keys or kubeconfig files to anyone. Or any AI.

So I built something to solve this. It's called SSH MCP Bridge, and I'm open-sourcing it today.

The Real Problem: Coordination Across Heterogeneous Infrastructure

If you have a single server, infrastructure management is straightforward. SSH in, run commands, done.

But real infrastructure, even a homelab, is rarely a single server. It's a collection of machines with different purposes, different access patterns, and different failure modes. Deploying a new service might touch five different systems. Troubleshooting a problem means correlating logs and metrics across multiple hosts.

This is where AI assistance could genuinely help. Not by being smarter than me at any individual task, but by handling the coordination overhead. The AI can SSH into the reverse proxy, check the config, hop over to the app server, verify the deployment, query the database to confirm the migration ran, and report back, all while I describe what I'm trying to accomplish in plain English.

But current AI integrations with infrastructure are either too locked down to be useful, or they require you to paste credentials into places that make security folks nervous. I wanted something different. I wanted to tell Claude "deploy the new version to production" and have it actually coordinate across my systems without ever seeing a single IP address, password, or private key.

That's not paranoia. That's just good architecture.

The Solution: SSH MCP Bridge

MCP (Model Context Protocol) is how modern AI assistants like Claude, ChatGPT, and VS Code Copilot connect to external tools. Instead of copy-pasting command outputs back and forth, you expose tools that the AI can call directly. The ecosystem is still young, but it's maturing fast.

SSH MCP Bridge is an MCP server that sits between your AI assistant and your infrastructure:

AI Assistant (Claude/ChatGPT/VS Code)
           |
           v
    SSH MCP Bridge
           |
           v
Your Servers (web, db, cache, etc.)

The AI talks to the bridge using MCP. The bridge holds your SSH credentials and maintains connections to your servers. When the AI wants to run a command, it asks the bridge. The bridge executes it and returns the results.

What the AI sees:

  • A list of friendly server names ("web-server", "database", "redis-cache")
  • Descriptions of what each server does
  • Tools to execute commands and manage sessions

What the AI never sees:

  • IP addresses
  • SSH private keys
  • Passwords
  • Network topology

This isn't just about security (though that's the main point). It also makes the AI's job easier. Instead of reasoning about "192.168.1.47", it thinks about "the production database server." That's closer to how we think about infrastructure anyway.
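To make the split concrete, here is a minimal sketch of how a host registry could separate what the AI sees from what stays inside the bridge. This is not the bridge's actual code: the HostEntry and HostRegistry names are hypothetical, and only the field names (name, description, host, username, private_key_path) mirror the config file shown later in this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostEntry:
    """One configured host. Field names mirror the YAML config keys."""
    name: str
    description: str
    host: str
    username: str
    private_key_path: str


class HostRegistry:
    """Hypothetical registry: clients get aliases, the bridge keeps the rest."""

    def __init__(self, entries: list[HostEntry]) -> None:
        self._by_name = {entry.name: entry for entry in entries}

    def public_view(self) -> list[dict[str, str]]:
        # Only the friendly name and description are ever returned to a client.
        return [
            {"name": e.name, "description": e.description}
            for e in self._by_name.values()
        ]

    def resolve(self, name: str) -> HostEntry:
        # Internal use only: look up connection details when opening a session.
        try:
            return self._by_name[name]
        except KeyError:
            raise KeyError(f"unknown host alias: {name}") from None
```

The point of the design is that the address and key path exist only on the bridge side of resolve(); anything serialized back to the AI goes through public_view().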

Two Ways to Deploy

I designed this for two different use cases, because my needs are different depending on context.

STDIO Mode is for local deployments. If you're running Claude Desktop on your laptop and your laptop can already SSH into your servers, this is the simplest path. The bridge runs as a subprocess that Claude talks to directly. No network exposure, no authentication complexity.

HTTP Mode is for remote deployments. Deploy the bridge on a server in your network (or in a container), and connect to it over HTTP/SSE. This is what you need for ChatGPT integration, or if you want a centralized MCP server that multiple clients can connect to. It supports API key auth for simple setups, and full OAuth 2.0/OIDC for enterprise environments.

What You Can Actually Do With This

Let me give you some real examples from my own usage.

Troubleshooting: "Check disk usage and memory on all servers, and tell me if anything looks concerning." The AI queries each host, aggregates the results, and gives you a summary. No more opening four terminal tabs.

Deployments: "Pull the latest code on the app server, run migrations on the database, restart the application, and verify it's responding." That's one sentence that coordinates multiple servers in the right order.

Configuration changes: "Add a new upstream server to the nginx config and reload." The AI can read the current config, make the edit, validate it, and apply it.

Investigation: "Show me the last 50 lines of the application log, and check if there are any related errors in the nginx access log." Cross-referencing logs across servers becomes conversational.

The key insight here is that the AI can maintain context across multiple commands and multiple servers. It remembers what it just checked, notices patterns, and can reason about the overall state of your system.

Session Management

SSH connections are relatively expensive to establish: each new one pays for a TCP handshake, a key exchange, and authentication before any command runs. You don't want to open a new connection for every command.

The bridge maintains a session pool. Once a connection to a host is established, it stays open and gets reused. Sessions automatically close after a configurable idle timeout (default is 30 minutes). There's also a cap on the number of concurrent sessions per host, to prevent resource exhaustion.
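The pooling behavior described above can be sketched roughly like this. It is an illustration under assumptions, not the bridge's implementation: the SessionPool and Session names, the max_per_host default of 3, and the injectable clock are all hypothetical, and the Session object stands in for a real SSH connection.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    """Stand-in for a live SSH connection to one host."""
    host: str
    last_used: float
    in_use: bool = False


class SessionPool:
    def __init__(
        self,
        idle_timeout_s: float = 30 * 60,  # 30 minutes, the default mentioned above
        max_per_host: int = 3,            # hypothetical cap
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.max_per_host = max_per_host
        self._clock = clock
        self._sessions: dict[str, list[Session]] = {}

    def evict_idle(self) -> None:
        # Drop sessions that are not in use and have been idle past the timeout.
        now = self._clock()
        for host, sessions in self._sessions.items():
            self._sessions[host] = [
                s for s in sessions
                if s.in_use or now - s.last_used <= self.idle_timeout_s
            ]

    def acquire(self, host: str) -> Session:
        self.evict_idle()
        sessions = self._sessions.setdefault(host, [])
        for session in sessions:
            if not session.in_use:  # reuse an open, free session
                session.in_use = True
                session.last_used = self._clock()
                return session
        if len(sessions) >= self.max_per_host:
            raise RuntimeError(f"session limit reached for host {host!r}")
        session = Session(host=host, last_used=self._clock(), in_use=True)
        sessions.append(session)
        return session

    def release(self, session: Session) -> None:
        session.in_use = False
        session.last_used = self._clock()
```

Passing the clock in as a parameter keeps the timeout logic testable without actually waiting 30 minutes.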

For shell mode sessions (where you want working directory and environment to persist between commands), the bridge keeps a persistent shell channel open. For exec mode sessions (stateless, isolated commands), each command runs independently.
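The difference between the two modes is easy to see with a local stand-in. The sketch below uses a local /bin/sh process in place of an SSH channel, so it assumes a POSIX system; the run_exec and PersistentShell names are hypothetical and this is not how the bridge itself is written.

```python
import subprocess
import uuid


def run_exec(command: str) -> str:
    """Exec mode: every command gets a fresh shell, so nothing carries over."""
    result = subprocess.run(
        ["/bin/sh", "-c", command], capture_output=True, text=True
    )
    return result.stdout.strip()


class PersistentShell:
    """Shell mode: one long-lived shell, so cwd and environment persist."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            ["/bin/sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

    def run(self, command: str) -> str:
        # A unique marker tells us where this command's output ends.
        marker = f"__done_{uuid.uuid4().hex}__"
        self._proc.stdin.write(f"{command}\necho {marker}\n")
        self._proc.stdin.flush()
        lines = []
        while True:
            line = self._proc.stdout.readline()
            if not line:  # the shell exited, so there is nothing more to read
                break
            text = line.rstrip("\n")
            if text.endswith(marker):
                if text != marker:  # output that lacked a trailing newline
                    lines.append(text[: -len(marker)])
                break
            lines.append(text)
        return "\n".join(lines)

    def close(self) -> None:
        self._proc.stdin.close()
        self._proc.stdout.close()
        self._proc.wait()
```

With this sketch, an `export` or `cd` issued through PersistentShell.run() is still in effect for the next run() call, while the same command passed to run_exec() is gone by the time the next one starts.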

Security Considerations

I'm going to be direct about security, because infrastructure access is serious.

Credential isolation is the core principle. The bridge holds credentials; clients don't. Period.

Command-level control gives you multiple layers of restriction. First, the SSH username you configure determines what's possible at the OS level: if you use a non-root user, root commands will fail even if the AI generates them. The operating system enforces this, not the bridge. On top of that, you can configure allowed or disallowed command patterns in the bridge itself. Want to block any command containing "rm -rf" or "sudo"? Add it to the deny list. Want to restrict execution to only a specific set of commands? Use an allow list. The AI never gets to run something you haven't permitted.
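Here is a rough sketch of how allow and deny patterns might be evaluated. The CommandPolicy class and its field names are hypothetical, and treating patterns as regular expressions is my assumption for the example; check the repo documentation for the bridge's real config keys and matching rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class CommandPolicy:
    """Hypothetical policy: deny patterns win, then the allow list (if any)."""
    deny_patterns: list[str] = field(default_factory=list)
    allow_patterns: list[str] = field(default_factory=list)

    def is_allowed(self, command: str) -> bool:
        # Any deny pattern found anywhere in the command blocks it.
        if any(re.search(p, command) for p in self.deny_patterns):
            return False
        # With an allow list, the whole command must match one of its patterns.
        if self.allow_patterns:
            stripped = command.strip()
            return any(re.fullmatch(p, stripped) for p in self.allow_patterns)
        # No allow list configured: everything not denied is permitted.
        return True
```

Matching the allow list against the whole command (rather than a substring) is deliberate here: it keeps "uptime; something-else" from slipping through on the strength of "uptime". Pattern checks like this are a second layer; the non-root SSH user remains the one the operating system enforces.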

Authentication for HTTP mode uses either API keys (for simpler setups) or OAuth 2.0/OIDC (for enterprise). The OAuth integration works with Auth0, Azure AD, Okta, Keycloak, or anything else that speaks standard OIDC.
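For the API key path, one detail worth getting right in any implementation is comparing keys in constant time. A minimal sketch, assuming a single shared key (the api_key_valid function name is hypothetical and not taken from the bridge):

```python
import hmac


def api_key_valid(presented: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking matches via timing."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` comparison can return sooner the earlier the first mismatching character appears, which is the timing signal hmac.compare_digest is designed to remove.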

Audit logging captures every command executed, with timestamp, user identity (from JWT tokens in OAuth mode), target host, and result. If you need to answer "who did what, when" for compliance or incident investigation, it's all there.
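As an illustration of what such an entry could contain, the sketch below builds one JSON line per command with the fields listed above. The audit_record function, the JSON layout, and the "anonymous" fallback are assumptions made for this example, not the bridge's actual log format.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(
    user: Optional[str], host_alias: str, command: str, exit_code: int
) -> str:
    """Build one JSON audit line: who ran what, where, when, and the result."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # In OAuth mode the identity would come from the JWT; otherwise unknown.
        "user": user or "anonymous",
        "host": host_alias,  # the friendly alias, not the real address
        "command": command,
        "exit_code": exit_code,
    }
    return json.dumps(entry, sort_keys=True)
```

One JSON object per line keeps the log easy to grep and easy to ship to whatever log aggregation you already run.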

Container security: the Docker image runs as a non-root user. Mount your config and SSH keys as read-only volumes. Set resource limits. Standard practices, but important to mention.

Network isolation: in HTTP mode, put the bridge behind a reverse proxy with TLS. Restrict access at the firewall level. Consider deploying it on an internal network accessible only via VPN.

What you should NOT do: expose this to the public internet with only API key auth. That's asking for trouble.

Getting Started

If you want to try it out:

git clone https://github.com/shashikanth-gs/mcp-ssh-bridge.git
cd mcp-ssh-bridge
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Create a config file:

server:
  enable_stdio: true
  log_level: "INFO"

hosts:
  - name: my-server
    description: "Development server"
    host: "your-server.com"
    username: "your-user"
    private_key_path: "~/.ssh/id_rsa"
    execution_mode: "shell"

session:
  idle_timeout: 30  # minutes

For Claude Desktop, add to your config:

{
  "mcpServers": {
    "ssh-bridge": {
      "command": "/path/to/venv/bin/python",
      "args": ["-m", "ssh_mcp_bridge", "/path/to/config.yaml"]
    }
  }
}

Restart Claude Desktop, and ask it to list your SSH hosts. If everything's configured correctly, you should see your server listed.

Docker deployment is also available if you prefer containers. Check the repo for docker-compose examples.

Why Open Source This?

I've been using this for my own infrastructure for a while. It started as a weekend project to scratch an itch, then grew as I added OAuth support, then got more polished as I realized other people might find it useful.

The MCP ecosystem needs more tools. Right now, most examples are simple file readers, web scrapers, basic API wrappers. Infrastructure management is a harder problem, but it's also where AI assistance can provide real leverage.

I'm also hoping to get feedback and contributions. There are features I want but haven't built yet: SCP/SFTP file transfers, bastion host (jump host) support, MCP resources for exposing server state. If any of those interest you, PRs are welcome.

Wrapping Up

AI-assisted infrastructure management is coming whether we like it or not. The question is whether we do it in a way that's secure and auditable, or in a way that we'll regret later.

SSH MCP Bridge is my attempt at the former. It's not the only approach, and it might not be right for everyone. But if you've been looking for a way to let AI help with server management without compromising your security posture, give it a try.

The repo is at github.com/shashikanth-gs/mcp-ssh-bridge. Docker images are on Docker Hub. Documentation covers everything from quick start to OAuth setup to security hardening.

Questions, feedback, or war stories about AI and infrastructure? I'm interested in hearing them.

