From Zero to Python Automation Engineer: My 30-Day Build Log
The system crashed at 3:17 AM. I woke up to 47 failed transaction alerts and a dashboard hemorrhaging red. My "autonomous income" bot had autonomously lost $127.43 in six hours. This wasn't the elegant, self-healing AI orchestra I'd envisioned on day one. This was a blunt lesson in production reality. That crash, and the thirty days of relentless building that led to it, transformed me from a Python hobbyist into an automation engineer. Here's the unvarnished build log.
The Problem: Manual Labor Doesn't Scale
I was trading prediction markets manually, staring at charts, placing maybe 20-30 trades a day. My ceiling was obvious: my own attention span and sleep schedule. The markets moved 24/7; I did not. I needed a system that could operate while I was coding, eating, or sleeping—a system that could execute on opportunities I'd miss, manage its own risk, and ideally, generate consistent profit. The goal wasn't to get rich overnight but to build a technical asset that could produce a baseline income, a "digital employee."
My stack had to be Python. Its ecosystem for data (pandas, numpy), automation (selenium, requests), and orchestration (celery, airflow) was unmatched. I gave myself 30 days to go from zero to a live, multi-lane, self-managing system.
Phase 1: Days 1-5 – The First Bot and Immediate Humiliation
I started with a single lane: scraping a specific data source and placing trades on Kalshi, a prediction market platform. The plan was simple: fetch data, apply a basic logic filter, execute via API.
My first version was a 150-line script, bot_v1.py. It failed spectacularly within minutes.
```python
# bot_v1.py - THE FAILED PROTOTYPE
import requests

class KalshiBot:
    def __init__(self, email, password):
        self.session = requests.Session()
        self.base_url = "https://api.kalshi.com/trade-api/v2"
        # Naive login - no error handling, no 2FA flow
        login_payload = {"email": email, "password": password}
        resp = self.session.post(f"{self.base_url}/login", json=login_payload)
        resp.raise_for_status()  # Crashed here if credentials were off

    def get_markets(self):
        # No pagination, assumed 100 markets was all
        resp = self.session.get(f"{self.base_url}/markets?limit=100")
        return resp.json()['markets']

    def place_order(self, market_id, yes_no, count, price):
        order_payload = {
            "market_id": market_id,
            "yes_no": yes_no,
            "count": count,
            "price": price,
            "side": "buy"
        }
        # No validation, no check for balance, no unique client_order_id
        resp = self.session.post(f"{self.base_url}/orders", json=order_payload)
        print(resp.json())
        return resp.json()

# Instant crash course in production:
# 1. Credentials hardcoded (later moved to environment variables).
# 2. No exponential backoff on API calls.
# 3. No logging, only print statements.
# 4. A single network timeout would kill the entire process.
```
The error log was brutal: `requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.kalshi.com', port=443): Max retries exceeded`. The script would run for a random period, then die. It had no memory, no state, no resilience.
Key Lesson 1: A script is not a system. Reliability requires state management, logging, and robust error handling from line one.
I rebuilt it with a SQLite database to track orders, added structured logging with the logging module, and implemented retry logic with tenacity. By day 5, I had a single bot that could run for 24 hours without intervention. It made 3 trades. Net profit: $1.26. The ROI was negative if I valued my time, but the proof of concept was alive.
Phase 2: Days 6-15 – Multi-Lane System and the Concurrency Nightmare
One lane was fragile. If its data source dried up, income went to zero. I needed parallel, independent "lanes" of automation—like the multi-lane system I later detailed in Building a Multi-Lane Autonomous Income System with Python and Claude AI.
I designed three lanes:
- Lane A: Data-scraping -> Kalshi trades (the original bot).
- Lane B: Twitter sentiment monitor -> Derivative market moves.
- Lane C: A simple arbitrage scanner between two platforms.
I used multiprocessing to run them concurrently. It was a disaster. Shared memory conflicts, zombie processes, and one lane crashing would often take the others down with it. The logs became indecipherable, a tangled mess from three processes writing to the same file.
I switched to a message queue model using Redis and Celery. Each lane became a set of independent Celery tasks, orchestrated by a central scheduler. This was the architectural shift that changed everything.
```python
# lane_a_tasks.py - A PRODUCTION-READY CELERY TASK
import os
import logging

from celery import Celery

from models import DatabaseManager  # Custom SQLite wrapper

app = Celery('lanes', broker=os.environ.get('REDIS_URL', 'redis://localhost:6379/0'))

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.task(bind=True, max_retries=3)
def fetch_and_analyze_markets(self):
    """Celery task for Lane A: Market analysis."""
    db = DatabaseManager()
    try:
        logger.info("Lane A: Starting market scan cycle.")
        # This function now purely focuses on its business logic.
        # State (found opportunities) is written to the database.
        opportunities = scan_kalshi_markets()  # Refactored function
        for opp in opportunities:
            # Deduplication check
            if not db.opportunity_exists(opp['market_id'], opp['type']):
                db.log_opportunity(opp)
                # Trigger a separate order placement task for each opp
                place_order_task.delay(opp)
        logger.info(f"Lane A: Cycle complete. Found {len(opportunities)} opportunities.")
        return {"status": "success", "opportunities_found": len(opportunities)}
    except Exception as e:
        logger.error(f"Lane A: Critical failure in fetch_and_analyze_markets: {e}")
        # Retry via Celery with exponential backoff: 4s, 8s, 16s.
        # (An earlier version also wrapped this in tenacity's @retry,
        # which double-retried every failure - one retry mechanism is enough.)
        raise self.retry(exc=e, countdown=min(60, 4 * 2 ** self.request.retries))

@app.task(bind=True, max_retries=2)
def place_order_task(self, opportunity):
    """Dedicated task for placing an order. Idempotent."""
    # ... (order placement logic with idempotency checks)
```
This decoupling was transformative. Lanes could fail and restart independently. The system's state was in the database, not in process memory. I could monitor queue depths. By day 15, the three-lane system was running, generating ~$5-10/day with noticeable consistency. It was also scanning far more data, akin to the scale I achieved in my Kalshi market scanner project.
Phase 3: Days 16-25 – AI Orchestration and Over-Engineering
With the core system stable, I got ambitious. I wanted the system to adapt—to adjust trading parameters based on performance, to write its own simple scraping scripts for new sites, to summarize its daily performance in plain English.
I integrated the Claude API to act as a "captain." Every six hours, a Celery beat task would:
- Query the database for the last period's performance (P&L, win rate, number of trades).
- Feed this plus the current market context to Claude with a prompt: "Analyze this performance and suggest one parameter adjustment. Only respond with a JSON object like {"param": "yes_price_threshold", "action": "increase", "value": 65}."
- Parse the JSON response and apply the change to the configuration.
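Getting a machine-readable suggestion out of a model reply takes defensive parsing, since models sometimes wrap the JSON in prose. A rough sketch of the kind of parser this step needs (the allow-lists mirror the prompt above, but the helper itself is illustrative):

```python
import json

# Only parameters the captain is allowed to touch (illustrative names).
ALLOWED_PARAMS = {"yes_price_threshold", "max_trade_size", "scan_interval"}
ALLOWED_ACTIONS = {"increase", "decrease"}

def parse_suggestion(raw_text):
    """Parse the captain's JSON reply, rejecting anything malformed."""
    # Isolate the first {...} span in case the model added prose around it.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        data = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject unknown parameters, actions, or non-numeric values outright.
    if data.get("param") not in ALLOWED_PARAMS:
        return None
    if data.get("action") not in ALLOWED_ACTIONS:
        return None
    if not isinstance(data.get("value"), (int, float)):
        return None
    return data
```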
It worked. For two days. Then on day 18, Claude's response was: {"param": "max_trade_size", "action": "increase", "value": 500}. My system faithfully tried to place a $500 trade on a low-liquidity market, which failed (thankfully), but it also maxed out my daily trade count limit, freezing the bot for 24 hours.
Key Lesson 2: AI is a powerful, unpredictable tool. It must be sandboxed with absolute parameter boundaries and a human-in-the-loop approval step for any non-trivial change. I added a validation layer: any AI-suggested change outside a 10% deviation from the norm required manual email approval.
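The validation layer amounts to a bounds check before any AI suggestion touches live config; anything outside the band is parked for a human. A simplified sketch of the idea (the config keys and approval hook are illustrative):

```python
MAX_DEVIATION = 0.10  # AI may move a parameter at most 10% per cycle

def apply_ai_suggestion(suggestion, config, request_approval=None):
    """Apply a suggested change only if it stays within bounds.

    Out-of-bounds changes are routed to request_approval (e.g. an
    email hook) instead of being applied automatically.
    """
    param = suggestion["param"]
    if param not in config:
        return False
    current = config[param]
    proposed = suggestion["value"]
    deviation = abs(proposed - current) / current
    if deviation > MAX_DEVIATION:
        # Human-in-the-loop: park the change for manual approval.
        if request_approval is not None:
            request_approval(param, current, proposed)
        return False
    config[param] = proposed
    return True
```

With this in place, the day-18 suggestion (max_trade_size 50 to 500, a 900% jump) would have landed in my inbox instead of in a live order.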
Phase 4: Days 26-30 – Self-Healing, Scaling, and The 3:17 AM Crash
The final push was about robustness. I implemented a "watchdog" process—a separate Celery worker that did nothing but monitor the health of the other tasks, check API key balances, and verify database connectivity.
I also built an automated rollback system. Every configuration change was versioned and logged. If the watchdog detected a streak of 5 consecutive trade failures, it would automatically revert the last parameter change and alert me.
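Under the hood, rollback is just versioned config rows plus a failure-streak counter. A simplified sketch of that mechanism (the table layout and threshold are illustrative, not the production schema):

```python
import sqlite3

FAILURE_STREAK_LIMIT = 5

def init_db(conn):
    # Every config change appends a row; the newest row is "current".
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config_versions "
        "(version INTEGER PRIMARY KEY AUTOINCREMENT, param TEXT, value REAL)"
    )

def record_change(conn, param, value):
    conn.execute("INSERT INTO config_versions (param, value) VALUES (?, ?)",
                 (param, value))
    conn.commit()

def maybe_rollback(conn, recent_results):
    """Revert the latest config change after a streak of failed trades.

    recent_results: list of booleans, oldest first (True = trade succeeded).
    Returns (param, restored_value) if a rollback happened, else None.
    """
    streak = 0
    for ok in reversed(recent_results):  # walk newest-first
        if ok:
            break
        streak += 1
    if streak < FAILURE_STREAK_LIMIT:
        return None
    # Drop the newest version; the previous row becomes current again.
    row = conn.execute(
        "SELECT version, param FROM config_versions "
        "ORDER BY version DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM config_versions WHERE version = ?", (row[0],))
    conn.commit()
    return conn.execute(
        "SELECT param, value FROM config_versions WHERE param = ? "
        "ORDER BY version DESC LIMIT 1", (row[1],)
    ).fetchone()
```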
This is what saved me after the 3:17 AM crash. The crash itself was caused by an external API deprecating an endpoint without notice. Lane A failed completely. The watchdog saw the 100% failure rate, triggered the rollback (which didn't help—the issue was external), and then performed its escalation procedure: it shut down Lane A entirely, redistributed compute resources to Lanes B and C, and sent me a high-priority alert with the full error traceback.
I woke up to the alert: [CRITICAL] Lane A terminated. External API failure. Error: 404 Client Error: Not Found for url:.... The system had performed a graceful degradation. It lost money in one lane but preserved the others and, most importantly, preserved its own operational integrity. I spent day 30 building a more sophisticated API failure detection that could switch to backup data sources.
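That day-30 failover boils down to trying an ordered list of data sources and remembering which ones are dead. A rough sketch of the pattern (the class and source names are illustrative, not the actual implementation):

```python
import logging

logger = logging.getLogger("failover")

class DataSourceFailover:
    """Try sources in priority order; skip any marked dead until reset."""

    def __init__(self, sources):
        # sources: list of (name, fetch_callable) in priority order
        self.sources = sources
        self.dead = set()

    def fetch(self):
        for name, fetch_fn in self.sources:
            if name in self.dead:
                continue
            try:
                return name, fetch_fn()
            except Exception as e:
                # Mark the source dead so later cycles skip it instantly.
                logger.warning("Source %s failed (%s); trying backup.", name, e)
                self.dead.add(name)
        raise RuntimeError("All data sources exhausted.")

    def reset(self):
        # Periodically re-probe sources that were marked dead.
        self.dead.clear()
```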
The Results and The Code Pattern That Matters
After 30 days:
- System Uptime: 99.2% (after day 20).
- Lanes: 3 parallel automation lanes.
- Trades Executed: 1,847.
- Net Profit: $342.18.
- Codebase: 4,200 lines of Python across 47 files.
- Major Failures: 4, all recovered automatically or within 30 minutes of manual intervention.
The single most important technical pattern I learned wasn't about AI or concurrency. It was about idempotent, state-driven task design.
Every critical function in the system had to be idempotent (runnable multiple times without side effects) and its output had to be driven by, and recorded to, a persistent state (the database). This is what made retries, monitoring, and self-healing possible.
```python
# THE CORE PATTERN: Idempotent, State-Driven Function
import logging

def place_order_if_needed(market_id, action, max_price):
    """
    Idempotent order placement. Checks state before acting.
    Returns order details if placed, None if already exists.
    """
    db = DatabaseManager()

    # 1. CHECK STATE: Have we already placed this order?
    existing = db.get_order(market_id=market_id, action=action, status='open')
    if existing:
        logging.info(f"Order already exists: {existing['order_id']}. Skipping.")
        return None

    # 2. VALIDATE: Is the opportunity still valid?
    current_price = get_market_price(market_id)
    if current_price > max_price:
        logging.warning(f"Price {current_price} > max {max_price}. Aborting.")
        return None

    # 3. EXECUTE: Place the order.
    try:
        order_resp = kalshi_api.place_order(...)  # Your API call
        order_id = order_resp['order_id']
    except APIError as e:
        logging.error(f"API failed for {market_id}: {e}")
        raise  # Let the retry mechanism handle it

    # 4. RECORD STATE: Log the order to the database.
    db.create_order({
        'order_id': order_id,
        'market_id': market_id,
        'action': action,
        'price': current_price,
        'status': 'open',
    })
    logging.info(f"Order {order_id} placed successfully.")
    return order_resp
```
This pattern—check state, validate, execute, record state—is the bedrock of reliable automation. It turns fragile scripts into resilient systems.
The Journey Never Ends
This 30-day sprint didn't create a finished product. It created a robust, extensible platform. New lanes can be added as Celery modules. The AI captain can be given new directives. The watchdog's health checks can expand.
The transition from writing scripts to engineering systems is a shift in mindset. You stop asking "Does this code work?" and start asking "How will this code fail, and how will the system recover?" That is the essence of the automation engineer's role.
The system now runs, earning its keep and evolving. It's a digital employee I built from scratch, one brutal lesson at a time.
Want This Built for Your Business?
I build custom Python automation systems, trading bots, and AI-powered tools that run 24/7 in production.
Currently available for consulting and contract work:
- Hire me on Upwork — Python automation, API integrations, trading systems
- Check my Fiverr gigs — Bot development, web scraping, data pipelines
- Get the MASTERCLAW bot pack — the same autonomous stack that powers this system
DM me on dev.to or reach out on either platform. I respond within 24 hours.