8 Python Automation Techniques Every DevOps Engineer Needs to Master

As a best-selling author, I invite you to explore my books on Amazon. Don't forget to follow me on Medium and show your support. Thank you! Your support means the world!

I remember the first time I had to update a configuration file on thirty servers by hand. I logged into each one, opened the file in vim, changed a single IP address, saved, and restarted the service. It took three hours. Then I did it again the next week. That’s when I decided there had to be a better way. Python gave me that way. Over the years I collected eight techniques that turned my manual drudgery into automated scripts. If you are new to DevOps, these methods will save you time and prevent mistakes. I’ll explain each one slowly, with simple words and plenty of code you can copy and modify.

Running commands on remote servers

The first technique is about executing commands on many servers at once without opening SSH sessions yourself. I use the fabric library for this. It handles connections, runs commands, and collects output. Think of it as a remote control for your servers.

Here is a simple example. Suppose you have a list of web servers and you want to check disk space on all of them simultaneously.

from fabric import Connection
from concurrent.futures import ThreadPoolExecutor

servers = ['web1.example.com', 'web2.example.com', 'web3.example.com']
connections = [Connection(host=s, user='ubuntu') for s in servers]

def run_df(conn):
    # hide=True keeps Fabric from echoing the command and its output
    result = conn.run('df -h /', hide=True)
    return f"{conn.host}: {result.stdout.strip()}"

with ThreadPoolExecutor(max_workers=len(connections)) as executor:
    futures = [executor.submit(run_df, conn) for conn in connections]
    for future in futures:
        print(future.result())

I wrote this script after a late‑night incident where I forgot to check disk usage and one server ran out of space. Now I run this before any deployment. It prints the disk usage of every server in parallel. You can replace the command with anything – restarting a service, checking memory, or pulling new code. The library also supports uploading files, so you can deploy your application by copying a tarball and extracting it. That is what my deployment script does: it puts the new build onto each server, unpacks it, and restarts the service. This is faster than doing it manually and less error‑prone.
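
Here is a minimal sketch of that deployment flow. The tarball name, paths, and service name are illustrative assumptions, not values from a real setup:

def deploy(conn, tarball='app-build.tar.gz'):
    # Upload the new build, unpack it in place, and restart the service
    conn.put(tarball, f'/tmp/{tarball}')
    conn.run(f'tar -xzf /tmp/{tarball} -C /opt/app')
    conn.sudo('systemctl restart myapp')  # 'myapp' is a placeholder unit name

for conn in connections:
    deploy(conn)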

Creating cloud resources with code

The second technique is provisioning cloud infrastructure directly from Python. Instead of clicking around the AWS console, I use boto3. It lets me create EC2 instances, security groups, and load balancers with a few lines of code.

For example, to launch a new web server:

import boto3

ec2 = boto3.client('ec2', region_name='us-west-2')
response = ec2.run_instances(
    ImageId='ami-0c55b159cbfafe1f0',
    InstanceType='t3.micro',
    KeyName='my-key',
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        'ResourceType': 'instance',
        'Tags': [{'Key': 'Name', 'Value': 'automated-server'}]
    }]
)
instance_id = response['Instances'][0]['InstanceId']
print(f"Created instance {instance_id}")

I often need to wait until the instance is fully running before I can SSH into it. A waiter helps:

waiter = ec2.get_waiter('instance_running')
waiter.wait(InstanceIds=[instance_id])
print("Instance is now running.")

When the instance is no longer needed, I delete it automatically:

ec2.terminate_instances(InstanceIds=[instance_id])

I use this technique in my CI/CD pipeline. After a successful test, I provision a staging environment from scratch, run integration tests, and then tear it down. It saves money because we only pay for the time we use. The same approach works for other cloud providers; the libcloud library gives a unified interface.
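
A sketch of that provision-test-teardown pattern, reusing the ec2 client and the same illustrative AMI and key from above. The test call is a stand-in for your own suite:

def test_on_fresh_instance():
    response = ec2.run_instances(
        ImageId='ami-0c55b159cbfafe1f0', InstanceType='t3.micro',
        KeyName='my-key', MinCount=1, MaxCount=1
    )
    instance_id = response['Instances'][0]['InstanceId']
    try:
        ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])
        print(f"Running integration tests against {instance_id}")  # your tests here
    finally:
        # Tear down even when tests fail, so staging never lingers
        ec2.terminate_instances(InstanceIds=[instance_id])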

Generating configuration files dynamically

The third technique deals with configuration files. Nginx, Prometheus, and PostgreSQL each need different files. Instead of maintaining copies for each environment, I use Jinja2 templates.

A template for an Nginx site might look like this (stored in nginx.conf.j2):

server {
    listen 80;
    server_name {{ domain }};
    location / {
        proxy_pass {{ upstream }};
    }
}

Then with Python I render it for each environment:

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader('./templates'))
template = env.get_template('nginx.conf.j2')
output = template.render(domain='api.example.com', upstream='http://backend:8080')
with open('/etc/nginx/sites-available/api', 'w') as f:
    f.write(output)

I now have a script that can generate configurations for production, staging, and development just by passing different variables. No more “oops, I forgot to change the domain name”. This is idempotent – running it again produces the same file unless I change the variables.
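
A sketch of what that looks like in practice, reusing the template object above; the per-environment values and the output directory are illustrative assumptions:

environments = {
    'production': {'domain': 'api.example.com', 'upstream': 'http://backend:8080'},
    'staging': {'domain': 'staging.example.com', 'upstream': 'http://backend-staging:8080'},
    'development': {'domain': 'dev.example.com', 'upstream': 'http://localhost:8080'},
}
for name, variables in environments.items():
    # One rendered file per environment, all from the same template
    # (assumes the ./rendered directory already exists)
    with open(f'./rendered/nginx-{name}.conf', 'w') as f:
        f.write(template.render(**variables))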

Managing containers directly

Docker is everywhere. The fourth technique uses the official Docker SDK for Python to build, run, and monitor containers without shelling out to the command line.

Building an image and pushing it to a registry:

import docker
client = docker.from_env()
image, logs = client.images.build(path='./app', tag='myapp:v1.0')
for line in logs:
    # Build output arrives as a stream of dicts; 'stream' holds readable text
    if 'stream' in line:
        print(line['stream'].strip())
client.images.push('myapp:v1.0')

Running a container with port mapping, environment variables, and volume mounts:

container = client.containers.run(
    'myapp:v1.0',
    name='web-instance',
    ports={'8000/tcp': 8000},
    environment={'DATABASE_URL': 'postgresql://db:5432/app'},
    volumes={'/data': {'bind': '/app/data', 'mode': 'rw'}},
    detach=True
)
print(f"Started container {container.id[:12]}")

I use this to spin up ephemeral testing containers. When the test finishes, I call container.remove(force=True). No leftover containers cluttering the machine.
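
A sketch of that ephemeral pattern, reusing the client and image tag from above; the try/finally guarantees cleanup even when the test run raises:

container = client.containers.run('myapp:v1.0', detach=True)
try:
    status = container.wait()  # block until the container exits
    print(f"Tests exited with code {status['StatusCode']}")
    print(container.logs().decode())
finally:
    container.remove(force=True)  # nothing left behind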

Orchestrating CI/CD pipelines

The fifth technique is triggering and monitoring builds from Python. Many teams use Jenkins, GitLab CI, or GitHub Actions. I prefer to script the pipeline decisions in Python because I can add complex logic.

Here’s how I trigger a parameterized Jenkins job and wait for the result:

import time
import requests

jenkins_url = 'https://ci.example.com'
job_name = 'deploy-app'
auth = ('admin', 'api_token')
params = {'BRANCH': 'main', 'ENV': 'staging'}

# Triggering returns a queue item URL in the Location header, not a build number
response = requests.post(f"{jenkins_url}/job/{job_name}/buildWithParameters",
                         auth=auth, data=params)
queue_url = response.headers['Location']

# Wait for the queue item to be assigned an actual build number
while True:
    item = requests.get(f"{queue_url}api/json", auth=auth).json()
    if 'executable' in item:
        build_number = item['executable']['number']
        break
    time.sleep(2)
print(f"Triggered build #{build_number}")

# Poll until finished
while True:
    resp = requests.get(f"{jenkins_url}/job/{job_name}/{build_number}/api/json", auth=auth)
    data = resp.json()
    if not data['building']:
        print(f"Build {build_number} result: {data['result']}")
        break
    time.sleep(10)

I wrap this in a function that returns success or failure, then use it in my deployment script. If the build fails, I abort and send a notification. No more running to the Jenkins dashboard.
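
Here is a minimal sketch of that wrapper, folding the trigger-and-poll logic above into a single function. The notification is left as a plain print; swap in whatever your team uses for alerts:

import sys
import time
import requests

def run_jenkins_job(base_url, job, params, auth):
    """Trigger a parameterized job and return Jenkins's result string."""
    r = requests.post(f"{base_url}/job/{job}/buildWithParameters",
                      auth=auth, data=params)
    queue_url = r.headers['Location']
    while True:  # wait for the queue item to become a real build
        item = requests.get(f"{queue_url}api/json", auth=auth).json()
        if 'executable' in item:
            build = item['executable']['number']
            break
        time.sleep(2)
    while True:  # poll until the build finishes
        data = requests.get(f"{base_url}/job/{job}/{build}/api/json",
                            auth=auth).json()
        if not data['building']:
            return data['result']
        time.sleep(10)

result = run_jenkins_job('https://ci.example.com', 'deploy-app',
                         {'BRANCH': 'main', 'ENV': 'staging'}, ('admin', 'api_token'))
if result != 'SUCCESS':
    print(f"Build finished with {result}; aborting deployment")  # notify your team here
    sys.exit(1)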

Centralizing logs with a custom handler

The sixth technique is about collecting logs from multiple machines into a single place – Elasticsearch, for example. Python’s built‑in logging module can send logs wherever you want. I created a handler that pushes each log line to an Elasticsearch index.

import logging
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

class ElasticsearchHandler(logging.Handler):
    def __init__(self, hosts=None, index='app-logs'):
        super().__init__()
        self.es = Elasticsearch(hosts or ['http://localhost:9200'])
        self.index = index

    def emit(self, record):
        doc = {
            # The record's creation time, not the formatted message
            'timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            'logger': record.name,
            'level': record.levelname,
            'message': record.getMessage(),
        }
        self.es.index(index=self.index, document=doc)

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
handler = ElasticsearchHandler()
logger.addHandler(handler)

logger.info("Deployment started")

Now all my application logs, including my deployment scripts, end up in Elasticsearch. I can search them with Kibana. I also add a field for the hostname so I know which server sent the log. This simple addition saved me hours of grepping through individual log files.
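
A sketch of that hostname field, extending the handler above; the field name is my own choice:

import socket
from datetime import datetime, timezone

class HostAwareElasticsearchHandler(ElasticsearchHandler):
    def emit(self, record):
        doc = {
            'timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            'hostname': socket.gethostname(),  # which server sent this log
            'logger': record.name,
            'level': record.levelname,
            'message': record.getMessage(),
        }
        self.es.index(index=self.index, document=doc)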

Getting secrets without hardcoding them

The seventh technique is retrieving credentials securely. I never put passwords, API keys, or database URLs in my code. Instead, I read them from a vault. I use HashiCorp Vault with the hvac library.

First, I authenticate with an approle:

import hvac
client = hvac.Client(url='https://vault.internal')
client.auth.approle.login(role_id='my-role', secret_id='my-secret')

Then I fetch the current credentials for a static role, whose password Vault rotates on a schedule:

creds = client.secrets.database.get_static_credentials(
    mount_point='database', name='postgres-app'
)
username = creds['data']['username']
password = creds['data']['password']
db_url = f"postgresql://{username}:{password}@db-host:5432/app"

This password rotates automatically, and my code never stores it on disk. I also refresh the Vault token before it expires. This technique works with AWS Secrets Manager too – just use boto3’s get_secret_value. Now I can commit my code without fear of leaking secrets.
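
For reference, here is the equivalent lookup with Secrets Manager. The secret name and region are illustrative assumptions, and the secret is assumed to be a JSON blob with username and password keys:

import json
import boto3

secrets = boto3.client('secretsmanager', region_name='us-west-2')
response = secrets.get_secret_value(SecretId='prod/app/db')
secret = json.loads(response['SecretString'])
db_url = f"postgresql://{secret['username']}:{secret['password']}@db-host:5432/app"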

Idempotent server setup with Ansible

The eighth technique is using Ansible from Python in a way that is safe to run over and over. Unlike shell scripts, which can fail when something already exists, Ansible modules are idempotent. The ansible-runner library lets me call Ansible from inside my Python scripts.

import ansible_runner

runner = ansible_runner.run(
    playbook='setup-webserver.yml',
    inventory='production',
    extravars={'app_version': '2.1'}
)
if runner.status == 'successful':
    print("Server setup completed")
else:
    print(f"Playbook failed: {runner.stats}")

I use this to bootstrap new servers. The playbook installs packages, configures services, and deploys the application. If I run it again on the same server, it does nothing because the state is already correct. That is idempotence – a core principle of infrastructure automation.
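
One way to check that in practice is to run the playbook twice and confirm the second pass changes nothing; runner.stats exposes per-host counters:

first = ansible_runner.run(playbook='setup-webserver.yml', inventory='production')
second = ansible_runner.run(playbook='setup-webserver.yml', inventory='production')
# stats['changed'] maps each host to how many tasks changed it
changed = sum((second.stats or {}).get('changed', {}).values())
print(f"Second run changed {changed} tasks (0 means the playbook is idempotent)")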

Self‑healing health checks

As a bonus, I added a health monitor that restarts services automatically when they fail. I wrote a simple daemon that checks HTTP endpoints or process existence. If a service fails three times in a row, it runs systemctl restart.

import time
import subprocess
import requests

services = {
    'webserver': {'type': 'http', 'url': 'http://localhost:80/health'},
    'database': {'type': 'process', 'pattern': 'postgres'}
}
failures = {name: 0 for name in services}

def check(service):
    cfg = services[service]
    if cfg['type'] == 'http':
        try:
            return requests.get(cfg['url'], timeout=5).status_code == 200
        except requests.RequestException:
            return False
    # Process check: pgrep exits 0 when a matching process exists
    result = subprocess.run(['pgrep', '-f', cfg['pattern']], capture_output=True)
    return result.returncode == 0

while True:
    for name in services:
        if check(name):
            failures[name] = 0
        else:
            failures[name] += 1
            print(f"Warning: {name} is down ({failures[name]} consecutive failures)")
            if failures[name] >= 3:
                # Assumes the dict key matches the systemd unit name
                subprocess.run(['systemctl', 'restart', name])
                failures[name] = 0
    time.sleep(30)

I put this inside a systemd service so it runs forever. When the web server crashes due to a memory leak, the monitor restarts it within a minute. I get a log entry, but the users experience only a brief hiccup.
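
For completeness, a minimal unit-file sketch for that monitor; the unit name and paths are assumptions:

[Unit]
Description=Self-healing service health monitor
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/scripts/health_monitor.py
Restart=always

[Install]
WantedBy=multi-user.target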

These eight techniques cover the most common automation tasks in DevOps. Each one replaces a manual, error‑prone process with a script you can run, test, and share. Start with remote execution and configuration templates – they give you the most immediate relief. Then add provisioning and container management as your needs grow. The code examples above are ready to use; you only need to adjust hostnames, paths, and credentials. When I look back at the hours I spent logging into servers one by one, I wish I had learned these techniques sooner. Now, with Python, I can manage a hundred servers as easily as one. The real magic is that once you automate something, you never have to do it manually again – and that frees you to work on the interesting problems.

📘 Check out my latest ebook for free on my channel!

Be sure to like, share, comment, and subscribe to the channel!


101 Books

101 Books is an AI-driven publishing company co-founded by author Aarav Joshi. By leveraging advanced AI technology, we keep our publishing costs incredibly low—some books are priced as low as $4—making quality knowledge accessible to everyone.

Check out our book Golang Clean Code available on Amazon.

Stay tuned for updates and exciting news. When shopping for books, search for Aarav Joshi to find more of our titles. Use the provided link to enjoy special discounts!

Our Creations

Be sure to check out our creations:

Investor Central | Investor Central Spanish | Investor Central German | Smart Living | Epochs & Echoes | Puzzling Mysteries | Hindutva | Elite Dev | Java Elite Dev | Golang Elite Dev | Python Elite Dev | JS Elite Dev | JS Schools


We are on Medium

Tech Koala Insights | Epochs & Echoes World | Investor Central Medium | Puzzling Mysteries Medium | Science & Epochs Medium | Modern Hindutva
