Building a 24/7 AI Agent with OpenClaw - Architecture Deep Dive
By Siraj Raval
Most AI agent tutorials show you toy examples: chatbots that answer questions, scripts that summarize text, demos that work perfectly on cherry-picked data.
This isn't one of those.
I'm going to show you the architecture behind OpenClaw - a production AI agent that's been running 24/7 for 6 months, handling my email, calendar, finances, and daily operations with minimal downtime.
This is what actually shipping AI agents looks like.
System Overview
Core Requirements
Before jumping into code, let's define what a production AI agent actually needs:
- Persistent state - Memory that survives crashes and restarts
- Multi-modal tools - Ability to interact with APIs, databases, browsers, and file systems
- Proactive execution - Run tasks without waiting for user commands
- Safe boundaries - Guardrails to prevent catastrophic mistakes
- Observable behavior - Comprehensive logging for debugging and trust
Most AI agent frameworks focus on #2 (tools) while ignoring the others. That's why they fail in production.
High-Level Architecture
┌─────────────────────────────────────────────────┐
│                   User Inputs                   │
│    (WhatsApp, Terminal, Heartbeat Scheduler)    │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│              OpenClaw Core Engine               │
│  ┌───────────────────────────────────────────┐  │
│  │    LLM Reasoning Layer (Claude Opus 4)    │  │
│  └───────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────┐  │
│  │  Memory System                            │  │
│  │  - Daily logs (memory/YYYY-MM-DD.md)      │  │
│  │  - Long-term memory (MEMORY.md)           │  │
│  │  - Semantic search (ChromaDB)             │  │
│  └───────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────┐  │
│  │  Tool Execution Layer                     │  │
│  │  - Gmail/Calendar API                     │  │
│  │  - Browser automation                     │  │
│  │  - Voice calls (VAPI)                     │  │
│  │  - File system operations                 │  │
│  │  - Shell commands                         │  │
│  └───────────────────────────────────────────┘  │
└────────────────────────┬────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│             External Integrations               │
│  Gmail • Calendar • WhatsApp • Airbnb • Amazon  │
│    Bank APIs • Travel APIs • Voice Services     │
└─────────────────────────────────────────────────┘
Tech Stack
Runtime:
- Node.js 22 (persistent process)
- TypeScript for type safety (optional but recommended)
- PM2 for process management and auto-restart
LLM:
- Primary: Anthropic Claude Opus 4 (200K context, strong reasoning)
- Fallback: OpenAI GPT-4o (faster, cheaper for simple tasks)
Storage:
- Memory: Markdown files + ChromaDB (vector embeddings)
- Structured data: PostgreSQL (optional, for analytics)
- Credentials: Encrypted vault (JSON with restricted permissions)
Infrastructure:
- AWS EC2 t3.medium (4GB RAM, 2 vCPU)
- Ubuntu 22.04 LTS
- Nginx reverse proxy (for web interfaces)
Cost: ~$150/month (compute + API calls)
Core Components
1. Memory System
This is the most critical piece. Without reliable memory, your agent is just an expensive chatbot.
Daily Logs
Every session writes to memory/YYYY-MM-DD.md. Format:
# Memory Log - 2026-03-12
## 08:30 - Morning Brief
- Checked email: 3 sponsor inquiries, 1 urgent payment reminder
- Calendar: Meeting with team at 14:00 CET
- Action: Drafted responses to sponsors
## 10:15 - Sponsor Follow-up
- John from TechCorp responded, accepting $3.5K deal
- Generated invoice, sent via email
- Added to sponsor pipeline spreadsheet
- Set reminder for payment (NET30, due April 11)
## 15:45 - Travel Booking
- Siraj requested flights to Lisbon May 15-22
- Searched 3 options: KLM €245, TAP €198, Ryanair €89
- Recommended TAP (balance of cost + convenience)
- Awaiting confirmation to book
Why plain markdown?
- Human-readable (can grep/edit manually)
- Version-controllable (git tracks changes)
- LLM-friendly (easy to inject into context)
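Writing these entries can be a tiny helper rather than ad-hoc string handling. Here's a minimal sketch (the `appendMemoryEntry` name and signature are my own illustration, not OpenClaw's actual API) that appends a timestamped section to today's log, creating the file with its header on first write:

```javascript
// Append a timestamped "## HH:MM - Title" section to today's daily log.
// Creates memory/YYYY-MM-DD.md with its header if it doesn't exist yet.
import fs from 'node:fs';
import path from 'node:path';

function appendMemoryEntry(title, bullets, { dir = 'memory', now = new Date() } = {}) {
  const date = now.toISOString().slice(0, 10);   // YYYY-MM-DD
  const time = now.toISOString().slice(11, 16);  // HH:MM (UTC here; use local time in practice)
  const file = path.join(dir, `${date}.md`);

  fs.mkdirSync(dir, { recursive: true });
  if (!fs.existsSync(file)) {
    fs.writeFileSync(file, `# Memory Log - ${date}\n`);
  }
  const entry = `\n## ${time} - ${title}\n${bullets.map((b) => `- ${b}`).join('\n')}\n`;
  fs.appendFileSync(file, entry);
  return file;
}
```

Because entries are append-only, concurrent sessions never rewrite each other's history, and git diffs stay readable.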
Semantic Search
Daily logs grow large. Searching linearly is slow. ChromaDB provides fast semantic lookup.
Setup:
# memory_store.py - embedding + retrieval
import os
import re
import sys
from glob import glob

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"
)
collection = client.get_or_create_collection(
    name="memory",
    embedding_function=ef
)

def split_into_chunks(text, max_chars=1000):
    # Naive fixed-size chunking; swap in paragraph-aware splitting if you need it
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extract_date(path):
    # Pull YYYY-MM-DD out of memory/YYYY-MM-DD.md
    match = re.search(r"\d{4}-\d{2}-\d{2}", path)
    return match.group(0) if match else ""

# Index memory files
def index_memories():
    for file in glob("memory/*.md"):
        with open(file) as f:
            content = f.read()
        chunks = split_into_chunks(content, max_chars=1000)
        collection.add(
            documents=chunks,
            ids=[f"{file}_{i}" for i in range(len(chunks))],
            metadatas=[{"source": file, "date": extract_date(file)}] * len(chunks)
        )

# Search
def search_memory(query, n=5):
    results = collection.query(query_texts=[query], n_results=n)
    return results["documents"][0]

if __name__ == "__main__" and len(sys.argv) > 2 and sys.argv[1] == "search":
    for chunk in search_memory(" ".join(sys.argv[2:])):
        print(chunk)
Usage:
$ python3 memory_store.py search "sponsor deal TechCorp"
# Returns relevant chunks about TechCorp negotiations
This runs in <200ms and scales to thousands of files.
2. Tool System
AI agents are only as useful as the tools they can use. OpenClaw has 20+ integrated tools.
Tool Definition Pattern
Each tool follows a standard interface:
// tools/gmail.js
export const gmailTool = {
  name: "gmail_search",
  description: "Search Gmail messages with optional filters",
  parameters: {
    query: { type: "string", required: true },
    maxResults: { type: "number", default: 10 },
    includeBody: { type: "boolean", default: false }
  },
  async execute({ query, maxResults, includeBody }) {
    const auth = await getGoogleAuth();
    const gmail = google.gmail({ version: 'v1', auth });

    const response = await gmail.users.messages.list({
      userId: 'me',
      q: query,
      maxResults
    });

    // messages is undefined when the search has no hits
    const hits = response.data.messages ?? [];
    const messages = await Promise.all(
      hits.map(msg =>
        gmail.users.messages.get({ userId: 'me', id: msg.id })
      )
    );
    // formatMessage (defined elsewhere) uses includeBody to decide
    // whether to return the full body or just headers/snippet
    return messages.map(m => formatMessage(m, { includeBody }));
  }
};
Why this pattern?
- LLM can understand tool from description alone
- Type-safe execution (prevents malformed calls)
- Easy to add new tools without touching core logic
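To make the registry side of that pattern concrete, here's a minimal sketch of a dispatcher that validates a call against the tool's declared parameters (required flags, defaults, types) before executing. The `registerTool`/`callTool` names are my own illustration, not OpenClaw's actual core API:

```javascript
// Minimal tool registry + validating dispatcher (illustrative names)
const registry = new Map();

function registerTool(tool) {
  registry.set(tool.name, tool);
}

async function callTool(name, params = {}) {
  const tool = registry.get(name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);

  // Check the call against the tool's declared parameter schema
  const args = {};
  for (const [key, spec] of Object.entries(tool.parameters)) {
    if (params[key] === undefined) {
      if (spec.required) throw new Error(`Missing required parameter: ${key}`);
      args[key] = spec.default;   // fall back to the declared default
    } else if (typeof params[key] !== spec.type) {
      throw new Error(`Parameter ${key} must be a ${spec.type}`);
    } else {
      args[key] = params[key];
    }
  }
  return tool.execute(args);
}
```

Rejecting malformed calls here, before any side effect, is what lets you trust LLM-generated tool invocations: a hallucinated parameter fails loudly instead of reaching Gmail.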
Browser Automation
For sites without APIs (Airbnb, Amazon, Marktplaats), browser automation is essential.
Local Playwright (for simple scraping):
// tools/browser.js
import { chromium } from 'playwright';

export async function browserTask(taskDescription) {
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext({
      userAgent: 'Mozilla/5.0...',
      viewport: { width: 1920, height: 1080 }
    });
    const page = await context.newPage();
    // Task-specific logic here (navigate, click, extract, return data)
  } finally {
    // Always close, even if the task throws halfway through
    await browser.close();
  }
}
Browser-Use Cloud (preferred for hostile sites):
// tools/browser-use.js
import axios from 'axios';

export async function browserUseTask({ url, instructions }) {
  const headers = { 'X-Browser-Use-API-Key': process.env.BROWSER_USE_KEY };

  const session = await axios.post('https://api.browser-use.com/api/v2/sessions', {
    proxyCountryCode: 'nl',
    keepAlive: true
  }, { headers });

  const result = await axios.post(
    `https://api.browser-use.com/api/v2/sessions/${session.data.id}/tasks`,
    { type: 'navigate_and_extract', url, instructions },
    { headers }
  );
  return result.data;
}
When to use which:
- API exists? → Use API (fastest, cheapest)
- Simple scraping? → Playwright (no external cost)
- Login/CAPTCHA/bot detection? → Browser-Use Cloud
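Those three rules are simple enough to encode as a plain function, so tool selection never burns an LLM call. A sketch (the function name and flags are my own illustration):

```javascript
// Pick the cheapest access method that will actually work for a site
function chooseAccessStrategy({ hasApi = false, needsLogin = false, botDetection = false } = {}) {
  if (hasApi) return 'api';                                    // fastest, cheapest
  if (needsLogin || botDetection) return 'browser-use-cloud';  // hostile sites
  return 'playwright';                                         // simple local scraping
}
```

In practice you'd keep these flags in a small per-site config so the agent doesn't rediscover bot detection on every run.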
3. Proactive Heartbeat System
Most AI agents are reactive. OpenClaw is proactive.
Implementation:
// heartbeat.js
import cron from 'node-cron';

// Run every 30 minutes
cron.schedule('*/30 * * * *', async () => {
  const context = await loadContext();

  // Skip during quiet hours (11 PM - 8 AM Amsterdam)
  if (isQuietHours(context.timezone)) {
    return;
  }

  const prompt = `
You are running a proactive heartbeat check.

Tasks to consider:
1. Check for urgent emails (unread from last 30 min)
2. Review calendar for upcoming events (<2h away)
3. Monitor sponsor deal pipeline for overdue payments
4. Check for new opportunities (Twitter mentions, etc.)

Current time: ${new Date().toISOString()}
Last heartbeat: ${context.lastHeartbeat}

If something needs attention, notify Siraj via WhatsApp.
Otherwise, reply with HEARTBEAT_OK and continue background work.
`;

  const response = await callLLM(prompt, context);
  if (response.trim() !== 'HEARTBEAT_OK') {
    await sendWhatsApp(response);
  }
  await updateLastHeartbeat();
});
Key features:
- Respects quiet hours
- Batches multiple checks (efficiency)
- Only notifies when necessary
- Runs indefinitely without supervision
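The isQuietHours helper referenced above can be done with the built-in Intl API, so it tracks the owner's timezone (including DST changes) without extra dependencies. A minimal sketch, assuming the 11 PM - 8 AM window:

```javascript
// True between 23:00 and 08:00 in the given IANA timezone (e.g. 'Europe/Amsterdam')
function isQuietHours(timezone, now = new Date()) {
  const hour = Number(new Intl.DateTimeFormat('en-US', {
    hour: 'numeric',
    hourCycle: 'h23',   // avoids the "24" quirk that hour12: false can produce at midnight
    timeZone: timezone
  }).format(now));
  return hour >= 23 || hour < 8;
}
```

Using the timezone from context rather than the server's clock matters: the EC2 box runs in UTC, and a hardcoded offset would drift twice a year with DST.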
4. Safety & Boundaries
An agent with access to email, calendar, and finances can do a lot of damage. Safety isn't optional.
Permission Levels
const PERMISSION_LEVELS = {
  READ: 'read',        // Safe, always allowed
  DRAFT: 'draft',      // Generate content, don't send
  EXECUTE: 'execute',  // Take action automatically
  ASK: 'ask'           // Require explicit approval
};

const toolPermissions = {
  'gmail_search': PERMISSION_LEVELS.READ,
  'gmail_send': PERMISSION_LEVELS.DRAFT,     // Drafts only
  'calendar_add': PERMISSION_LEVELS.EXECUTE,
  'bank_transfer': PERMISSION_LEVELS.ASK,    // Always ask
};
Validation Layer
async function executeTool(toolName, params) {
  const permission = toolPermissions[toolName];

  if (permission === PERMISSION_LEVELS.ASK) {
    // Blocks until approval arrives; throws if denied
    await requestApproval(toolName, params);
    return await tools[toolName].execute(params);
  }

  if (permission === PERMISSION_LEVELS.READ || permission === PERMISSION_LEVELS.EXECUTE) {
    return await tools[toolName].execute(params);
  }

  // DRAFT mode: generate content but never send/execute
  return { status: 'draft', content: await tools[toolName].preview(params) };
}
This prevents catastrophic mistakes like:
- Sending unreviewed emails to sponsors
- Transferring money without approval
- Deleting important files
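The requestApproval call used in the validation layer can be a small promise-based gate: the tool call blocks until a yes/no reply comes back over WhatsApp. A sketch with a pluggable notify callback (all names here are my own illustration, not OpenClaw's actual internals):

```javascript
// Pending approvals, keyed by id; each entry holds the promise's resolve/reject
const pendingApprovals = new Map();
let nextApprovalId = 1;

function requestApproval(toolName, params, notify) {
  const id = nextApprovalId++;
  return new Promise((resolve, reject) => {
    pendingApprovals.set(id, { resolve, reject });
    // In production this message goes out via WhatsApp
    notify(`Approval #${id}: run ${toolName}(${JSON.stringify(params)})? Reply yes/no`);
  });
}

// Called by the inbound-message handler when the owner's reply arrives
function resolveApproval(id, approved) {
  const entry = pendingApprovals.get(id);
  if (!entry) return false;          // unknown or already-handled id
  pendingApprovals.delete(id);
  if (approved) entry.resolve(true);
  else entry.reject(new Error(`Approval #${id} denied`));
  return true;
}
```

A real version would also add a timeout that auto-denies stale requests, so a missed WhatsApp message can't leave a tool call hanging forever.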
5. Error Handling & Resilience
Production systems fail. Plan for it.
Retry Logic
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function robustAPICall(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429) {
        // Rate limited: exponential backoff (1s, 2s, 4s, ...)
        await sleep(Math.pow(2, i) * 1000);
      } else if (error.status >= 500) {
        // Server error: retry after a short pause
        await sleep(1000);
      } else {
        // Client error (4xx): retrying won't help
        throw error;
      }
    }
  }
  throw new Error(`Failed after ${maxRetries} retries`);
}
Graceful Degradation
async function sendEmail(to, subject, body) {
  try {
    await gmail.send({ to, subject, body });
  } catch (error) {
    try {
      // Gmail API failed, try SMTP fallback
      await smtpSend({ to, subject, body });
    } catch (fallbackError) {
      // Both failed: save to drafts and flag for manual review
      await saveDraft({ to, subject, body });
      await notify('Email send failed, saved to drafts');
    }
  }
}
Comprehensive Logging
// logger.js
import winston from 'winston';

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  transports: [
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' }),
    new winston.transports.Console({ format: winston.format.simple() })
  ]
});

// Log every tool execution
function logToolExecution(toolName, params, result, duration) {
  logger.info({
    event: 'tool_execution',
    tool: toolName,
    params,
    result: result.status,
    durationMs: duration,
    timestamp: new Date().toISOString()
  });
}
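Rather than calling logToolExecution by hand at every call site, each tool can be wrapped once so timing and outcome are captured automatically, including on failure. A sketch (`withLogging` is my name for it; it takes the log function as a parameter, which also keeps it testable):

```javascript
// Wrap a tool so every execute() call is timed and logged, success or failure
function withLogging(tool, log) {
  return {
    ...tool,
    async execute(params) {
      const start = Date.now();
      try {
        const result = await tool.execute(params);
        log(tool.name, params, { status: 'ok' }, Date.now() - start);
        return result;
      } catch (error) {
        log(tool.name, params, { status: 'error', message: error.message }, Date.now() - start);
        throw error;  // logging must never swallow the failure
      }
    }
  };
}
```

Apply it once at registration time and every tool in the system gets uniform telemetry for free.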
Logs have saved me countless times when debugging "why did it do that?" moments.
Deployment & Operations
Infrastructure Setup
# EC2 instance (t3.medium, Ubuntu 22.04)
$ ssh user@your-ec2-ip
# Install dependencies
$ curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
$ sudo apt-get install -y nodejs git
$ npm install -g pm2
# Clone OpenClaw
$ git clone https://github.com/yourusername/openclaw.git
$ cd openclaw
$ npm install
# Set up credentials
$ cp .env.example .env
$ nano .env # Add API keys
# Start with PM2
$ pm2 start openclaw.js --name "openclaw-agent"
$ pm2 save
$ pm2 startup # Auto-restart on reboot
Monitoring
# View logs
$ pm2 logs openclaw-agent
# Monitor resource usage
$ pm2 monit
# Check status
$ pm2 status
Backup Strategy
#!/bin/bash
# Daily backup script (invoked from cron; the shebang must be line 1)
tar -czf backup-$(date +%Y%m%d).tar.gz memory/ MEMORY.md config/
aws s3 cp backup-$(date +%Y%m%d).tar.gz s3://openclaw-backups/
Performance Optimization
LLM Call Reduction
The biggest cost is LLM API calls. Optimize aggressively.
Before optimization:
- Every heartbeat: 1 LLM call
- Every email check: 1 call per email
- Cost: ~$300/month
After optimization:
- Batch email processing (1 call for 10 emails)
- Cache routine decisions (no LLM for spam detection)
- Use fast models for simple tasks
- Cost: ~$80/month
Code example:
// Batch processing
async function processEmails(emails) {
  // Instead of N LLM calls, make 1 batched call
  const prompt = `
Review these ${emails.length} emails and categorize them:
${emails.map((e, i) => `${i + 1}. From: ${e.from}, Subject: ${e.subject}`).join('\n')}

Reply with ONLY a JSON array, one object per email:
[{"id": 1, "category": "urgent|normal|spam", "action": "reply|archive|flag"}]
`;
  const result = await callLLM(prompt);
  // Models sometimes wrap the JSON in extra prose or fences; parse just the array
  const jsonText = result.slice(result.indexOf('['), result.lastIndexOf(']') + 1);
  return JSON.parse(jsonText);
}
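The "fast models for simple tasks" rule can likewise be a plain function, so no tokens are wasted deciding which model to call. A sketch (the task kinds and token threshold are illustrative assumptions, not OpenClaw's actual routing table):

```javascript
// Route cheap, mechanical tasks to the fast fallback model; keep
// open-ended reasoning on the primary model. Thresholds are illustrative.
function pickModel(task) {
  const mechanical = ['categorize', 'summarize', 'extract'];
  if (mechanical.includes(task.kind) && (task.inputTokens ?? 0) < 4000) {
    return 'gpt-4o';        // fallback: faster, cheaper
  }
  return 'claude-opus-4';   // primary: strongest reasoning
}
```

Keeping the routing deterministic also makes costs predictable: you can estimate the monthly bill from task counts alone.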
Caching Strategies
import NodeCache from 'node-cache';

const cache = new NodeCache({ stdTTL: 3600 }); // 1 hour TTL

async function getCachedGoogleAuth() {
  const cached = cache.get('google_auth');
  if (cached) return cached;

  const auth = await freshGoogleAuth();
  cache.set('google_auth', auth);
  return auth;
}
Lessons Learned
After 6 months of running OpenClaw in production, here's what I've learned:
1. Start with Read-Only
Don't give your agent write permissions on day 1. Start with:
- Read emails
- Summarize calendar
- Generate drafts (don't send)
Build trust over weeks before enabling autonomous actions.
2. Logging Saves Everything
When something goes wrong (and it will), comprehensive logs are the only way to debug. Log:
- Every tool call
- Every LLM prompt + response
- Every decision point
- Execution time for performance analysis
3. Memory Architecture Matters More Than Model Choice
I've swapped between GPT-4, Claude, and others. The model matters less than:
- Quality of memory retrieval
- Context provided in prompts
- Clear tool descriptions
A mediocre model with great memory > amazing model with no memory.
4. Edge Cases Are Infinite
You will never anticipate all failure modes. Plan for:
- Malformed API responses
- Unexpected user inputs
- Network failures mid-operation
- Rate limits at 3 AM
Defensive coding isn't optional.
What's Next
OpenClaw is evolving rapidly. Upcoming features:
- Multi-agent orchestration - Spawn specialized sub-agents for complex tasks
- Voice interface - Talk to OpenClaw naturally via phone
- Mobile app - Native iOS/Android apps
- Community marketplace - Share and install tool plugins
- Self-improvement - Agent analyzes its own failures and updates code
The vision: every developer should be able to deploy their own AI agent in <1 hour.
Get Involved
OpenClaw is open-source. Want to contribute?
- GitHub: github.com/llsourcell (search for OpenClaw repo)
- Issues: Bug reports, feature requests
- PRs: New tools, optimizations, docs
Follow me for updates:
- YouTube: youtube.com/@SirajRaval (775K+ subscribers)
- GitHub: github.com/llsourcell
- LinkedIn: linkedin.com/in/sirajraval
This is what production AI looks like. Not perfect, but real. Ship it.
Tags: #AI #MachineLearning #OpenClaw #AIAgents #ProductionAI #NodeJS #Claude #Automation #DevOps #SirajRaval