The MASTERCLAW Architecture: Running 12 Autonomous Python Bots on One VPS
My Hetzner VPS costs €6.18 a month. It has 2 vCPUs, 2GB of RAM, and 40GB of storage. Right now, as I write this, it's running 12 independent, autonomous Python bots 24/7. They're scraping data, managing social media accounts, publishing content, and monitoring financial markets.
This isn't a toy project; it's a production system that has been running with 99%+ uptime for months. It's the engine behind my Content Publishing Bot and the core infrastructure for my Multi-Lane Autonomous Income System.
The common way to do this is a mess. You ssh in, run nohup python bot1.py &, then nohup python bot2.py &, and so on. Your process list becomes a nightmare, logs are scattered, and when a bot inevitably crashes in the middle of the night, it stays dead until you manually intervene. There's a better way.
I call it the MASTERCLAW architecture. It's a four-layer system designed for maximum resilience and manageability on a single, low-cost server. Let's break it down.
The Problem: Python Processes are Fragile
Long-running Python scripts are inherently fragile. They can crash for a thousand reasons: a third-party API returns a 503 error, a web scrape target changes its HTML structure, a database connection flakes out, or you just have a plain old unhandled exception.
When you're running one script, it's manageable. When you're running a dozen, the probability of one of them being dead at any given time approaches 1. The challenge isn't just running the scripts; it's ensuring they keep running, no matter what.
This leads to a few key requirements for any serious multi-bot system:
- Process Supervision: Something needs to be watching the bots.
- Automatic Restarts: If a bot dies, it must be brought back to life immediately.
- Isolation: A crash in one bot should not affect any of the others.
- Manageability: I need a simple way to view logs, see what's running, and manually start/stop/restart individual bots without bringing down the whole system.
- Boot Persistence: The whole system must automatically start up if the server reboots.
The MASTERCLAW architecture solves all five.
The Four Layers of Resilience
The architecture is a layered approach, starting from a manual, interactive layer and building up to a fully automated, OS-integrated service.
Layer 1: tmux - The Interactive Cockpit
First, forget nohup and backgrounding with &. For development and manual debugging, tmux is your best friend. It's a terminal multiplexer that lets you create persistent sessions with multiple windows and panes.
My entire bot system runs inside a single tmux session named bots.
```bash
# To create the session for the first time
tmux new -s bots

# To detach from the session (it keeps running in the background)
# Press Ctrl+b, then d

# To re-attach to the session later from anywhere
tmux attach -t bots
```
Why is this better? When I attach, I have a direct, interactive view of my entire system. I can see the live stdout of my main controller script, kill it with Ctrl+c, edit the code with vim, and restart it, all within one persistent SSH connection. It provides an "air traffic control" view that is indispensable for debugging.
But tmux is just for manual control. It doesn't solve automatic restarts or boot persistence. It's the cockpit, not the autopilot.
Layer 2: The Nanobot Gateway - One Script to Rule Them All
The core of the architecture is a single Python script I call masterclaw.py. This script has one job: to launch and manage all the other bots. I call the individual bots "nanobots" because they follow the single-responsibility principle. Each one is a small, simple script that does one thing well (e.g., twitter_bot.py, devto_publisher.py, price_scraper.py).
The masterclaw.py script uses Python's subprocess module to spawn each nanobot as an independent child process. This provides crucial isolation. If twitter_bot.py has a memory leak and crashes, it doesn't touch the masterclaw.py process or any of its siblings.
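That isolation is easy to verify for yourself: a child spawned with `subprocess.Popen` can die from an unhandled exception without disturbing its parent. A minimal demonstration (the simulated crash is illustrative, not one of my actual bots):

```python
import subprocess
import sys

# Spawn a child that crashes immediately with an unhandled exception.
p = subprocess.Popen(
    [sys.executable, "-c", "raise RuntimeError('simulated bot crash')"],
    stderr=subprocess.DEVNULL,  # keep the child's traceback out of our output
)
p.wait()

# The parent is unaffected; it just observes the child's exit code.
print("parent still alive; child exit code:", p.returncode)
# → parent still alive; child exit code: 1
```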
Here's a simplified look at the configuration and spawning logic:
```python
import os
import subprocess
import time

# Configuration of all bots to be managed.
# The key is a friendly name, the value is the command to run.
BOTS = {
    "content_publisher": ["python", "bots/content_publisher.py"],
    "social_media_manager": ["python", "bots/social_media_manager.py"],
    "data_scraper_A": ["python", "bots/data_scraper_a.py"],
    # ... add all 12+ bots here
}

# A dictionary to hold the running subprocess objects
running_bots = {}

def start_all_bots():
    print("--- MASTERCLAW: Starting all nanobots ---")
    os.makedirs("logs", exist_ok=True)
    for bot_name, bot_command in BOTS.items():
        print(f"Starting bot: {bot_name}...")
        # Redirect stdout and stderr to a per-bot log file
        log_file = open(f"logs/{bot_name}.log", "a")
        process = subprocess.Popen(
            bot_command,
            stdout=log_file,
            stderr=log_file,
        )
        running_bots[bot_name] = (process, bot_command, log_file)
    print("--- MASTERCLAW: All nanobots started ---")

# --- Main execution ---
if __name__ == "__main__":
    start_all_bots()
    # In the next step, we'll add the watchdog loop here
    while True:
        time.sleep(60)  # Keep the main script alive
```
This gateway pattern centralizes the management of all bots into one place. Adding a new bot is as simple as adding a line to the BOTS dictionary.
Layer 3: The Watchdog - Automated Self-Healing
The gateway script is running, and it has spawned all the nanobots. But what happens when one crashes? This is where the self-healing watchdog loop comes in.
The main masterclaw.py script enters an infinite loop after starting the bots. In this loop, it iterates through all the subprocesses it's managing and calls the .poll() method on each.
- If `.poll()` returns `None`, the process is still running happily.
- If `.poll()` returns an integer (the exit code), the process has terminated.
When the watchdog finds a terminated process, it logs the event and immediately restarts that specific bot using the same command it used initially. This is the automatic self-healing mechanism.
This simple loop turns a collection of fragile scripts into a resilient, self-healing system.
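Stripped to its essentials, one pass of that watchdog looks like this (a sketch: `check_once`, `running_bots`, and the `restart` callback stand in for the real gateway's state and restart logic):

```python
import subprocess
import sys

def check_once(running_bots, restart):
    """One watchdog pass over {name: Popen}: restart anything that exited."""
    for name, process in list(running_bots.items()):
        code = process.poll()      # None while the process is still running
        if code is not None:       # an integer means the process terminated
            print(f"WATCHDOG: '{name}' exited with code {code}; restarting")
            del running_bots[name]
            restart(name)

# Demo: a short-lived child is detected and handed to the restart callback.
p = subprocess.Popen([sys.executable, "-c", "pass"])
p.wait()
bots = {"demo": p}
check_once(bots, restart=lambda name: print(f"(would restart {name} here)"))
```

The real loop simply repeats this pass forever, e.g. `while True: time.sleep(15); check_once(...)`.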
Layer 4: systemd - The Unkillable Supervisor
We have a self-healing system, but it's all running inside a tmux session. If I accidentally kill that session, or more likely, if the server reboots for maintenance, the entire system goes down.
The final layer of resilience is to hand over the management of the masterclaw.py script itself to the operating system's own process supervisor: systemd.
systemd is the standard init system for most modern Linux distributions. We can write a simple service file that tells systemd how to start, stop, and manage our gateway script.
Here's my actual masterclaw.service file, located at /etc/systemd/system/masterclaw.service:
```ini
[Unit]
Description=Masterclaw Bot Management Service
After=network.target

[Service]
# IMPORTANT: run as a non-root user
User=your_user
Group=your_group
WorkingDirectory=/home/your_user/masterclaw_project
ExecStart=/usr/bin/python3 /home/your_user/masterclaw_project/masterclaw.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

(Note: systemd unit files don't allow trailing comments on the same line as a directive, so the reminder about running as a non-root user lives on its own line.)
Let's break down the critical lines:
- `WorkingDirectory`: Sets the CWD so all relative paths in your script (like `logs/` or `bots/`) work correctly.
- `ExecStart`: The full path to the Python interpreter and your gateway script.
- `Restart=always`: The magic directive. If the `masterclaw.py` script itself ever crashes for any reason, `systemd` will automatically restart it.
- `RestartSec=10`: Wait 10 seconds before attempting a restart.
Once this file is in place, you enable and start it with a few commands:
```bash
# Reload systemd to recognize the new service file
sudo systemctl daemon-reload

# Enable the service to start on boot
sudo systemctl enable masterclaw.service

# Start the service immediately
sudo systemctl start masterclaw.service

# Check its status
sudo systemctl status masterclaw.service
```
Now, the masterclaw.py gateway is a true system service. It will start on boot and will be restarted if it ever dies. And since it is responsible for running the nanobots, the entire system is now fully persistent and self-healing.
The Full masterclaw.py Code
Here is a more complete, runnable version of the gateway and watchdog script. It combines the spawner and the self-healing loop.
```python
import logging
import os
import subprocess
import time

# --- Configuration ---
LOG_FILE = "logs/masterclaw.log"
BOT_LOG_DIR = "logs/bots"

# Define all bots to be managed
BOTS = {
    "publisher": ["python", "bots/publisher.py"],
    "scraper_A": ["python", "bots/scraper_a.py"],
    "scraper_B": ["python", "bots/scraper_b.py"],
    "social_poster": ["python", "bots/social.py"],
    # ... add as many as you need
}

# --- Logging Setup ---
os.makedirs(BOT_LOG_DIR, exist_ok=True)  # also creates logs/ for LOG_FILE
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - [%(levelname)s] - %(message)s",
    handlers=[
        logging.FileHandler(LOG_FILE),
        logging.StreamHandler(),  # Also print to console
    ],
)

# --- Core Logic ---
class MasterClaw:
    def __init__(self):
        # { bot_name: (process, command, log_file_handle) }
        self.running_bots = {}

    def start_bot(self, name):
        """Starts a single, specified bot."""
        if name in self.running_bots:
            logging.warning(f"Bot '{name}' is already running. Cannot start.")
            return
        if name not in BOTS:
            logging.error(f"Bot '{name}' not found in configuration.")
            return

        command = BOTS[name]
        try:
            log_path = f"{BOT_LOG_DIR}/{name}.log"
            log_file = open(log_path, "a")
            process = subprocess.Popen(
                command,
                stdout=log_file,
                stderr=subprocess.STDOUT,
            )
            self.running_bots[name] = (process, command, log_file)
            logging.info(f"Successfully started bot '{name}' with PID {process.pid}.")
        except Exception as e:
            logging.error(f"Failed to start bot '{name}': {e}")

    def start_all(self):
        """Initial start of all configured bots."""
        logging.info("--- MASTERCLAW Initializing ---")
        for bot_name in BOTS:
            self.start_bot(bot_name)
        logging.info("--- All bots have been launched ---")

    def watchdog_loop(self):
        """The main self-healing loop."""
        logging.info("--- Watchdog is now active ---")
        while True:
            time.sleep(15)  # Check every 15 seconds
            for name in list(self.running_bots.keys()):
                process, command, log_file = self.running_bots[name]
                return_code = process.poll()
                if return_code is not None:  # Process has terminated
                    logging.warning(
                        f"WATCHDOG: Bot '{name}' has terminated with code {return_code}."
                    )
                    # Clean up old resources
                    log_file.close()
                    del self.running_bots[name]
                    # Restart the bot
                    logging.info(f"WATCHDOG: Attempting to restart bot '{name}'...")
                    self.start_bot(name)

    def shutdown(self):
        """Gracefully shut down all bot processes."""
        logging.info("--- MASTERCLAW Shutting Down ---")
        for name, (process, _, log_file) in self.running_bots.items():
            logging.info(f"Terminating bot '{name}' (PID: {process.pid})")
            process.terminate()  # Send SIGTERM
            try:
                process.wait(timeout=10)  # Wait up to 10 seconds
            except subprocess.TimeoutExpired:
                logging.warning(
                    f"Bot '{name}' did not terminate gracefully. Sending SIGKILL."
                )
                process.kill()
            log_file.close()
        logging.info("--- All bots have been shut down ---")


if __name__ == "__main__":
    claw = MasterClaw()
    try:
        claw.start_all()
        claw.watchdog_loop()
    except KeyboardInterrupt:
        print("\nKeyboard interrupt received.")
    finally:
        claw.shutdown()
```
This script is robust. It logs everything, handles bot crashes, and can be shut down gracefully with Ctrl+c, which is important for closing file handles and database connections properly.
Performance and Memory Optimization
"12 Python scripts? That must use a ton of RAM!"
Not really. Here's a current snapshot from htop on my €6 VPS:
```text
  PID USER    PRI NI  VIRT   RES   SHR  S CPU% MEM%  TIME+    COMMAND
 1234 myuser   20  0 180.2M 85.1M 15.2M S  1.3  4.3  1h25:12  python3 masterclaw.py
 5678 myuser   20  0 110.5M 60.3M 12.1M S  0.7  3.0  0:45.33  python3 bots/publisher.py
 5680 myuser   20  0  95.7M 45.8M 10.9M S  0.0  2.3  0:22.11  python3 bots/scraper_a.py
 5682 myuser   20  0  98.2M 42.1M 11.5M S  0.0  2.1  0:18.45  python3 bots/social.py
 5684 myuser   20  0  89.9M 38.5M  9.8M S  0.0  1.9  0:15.78  python3 bots/scraper_b.py
 ... (8 more similar processes) ...
```
The key to low resource usage is the "nanobot" philosophy. Each bot is a small, focused script that imports only the libraries it needs. A bot that just scrapes a website with requests and BeautifulSoup might only use 30-40MB of RAM. A more complex one using selenium might use 100MB.
Because they are separate processes, the Python interpreter is loaded into memory for each one, which has some overhead. But this is a small price to pay for the incredible resilience and isolation it provides. On a 2GB RAM server, I could comfortably run 20-25 of these typical bots before memory becomes a concern.
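If you want the gateway to watch memory as well as liveness, resident set size per PID can be read straight from `/proc` (Linux-only; the helper name and the idea of a per-bot threshold are my own additions, not part of the script above):

```python
import os

def rss_mb(pid):
    """Resident set size of a process in MB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    return None  # no VmRSS line (e.g. kernel thread)

# e.g. a watchdog pass could log or restart any nanobot above a threshold:
print(f"my own footprint: {rss_mb(os.getpid()):.1f} MB")
```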
This architecture works because it embraces the Unix philosophy: build small, single-purpose tools and compose them into a larger system. The MASTERCLAW gateway is the composer, the nanobots are the instruments, and systemd is the concert hall that ensures the show always goes on.
Want This Built for Your Business?
I build custom Python automation systems, trading bots, and AI-powered tools that run 24/7 in production.
Currently available for consulting and contract work:
- Hire me on Upwork — Python automation, API integrations, trading systems
- Check my Fiverr gigs — Bot development, web scraping, data pipelines
DM me on dev.to or reach out on either platform. I respond within 24 hours.
Need automation built? I build Python bots, Telegram systems, and trading automation.
View my Fiverr gigs → — Starting at $75. Delivered in 24 hours.