DEV Community

Caper B
I Built an AI Agent That Makes Money 24/7 on a Mac Mini — Here's the Full Stack

What if your computer made money while you slept? I built an AI agent on a Mac Mini M4 Pro that runs 24/7 — creating content, generating images, managing outreach — and I'm going to show you exactly how.

The Hardware: Why a Mac Mini M4 Pro

I'm running a Mac Mini M4 Pro with 24GB unified memory and a 1TB SSD. Total cost: ~$1,600.

Why this specific machine?

  • Unified memory means GPU and CPU share the same RAM. You can run 8B-parameter LLMs locally without a discrete GPU.
  • Power draw is ~15W idle. That's about $15/year in electricity. Compare that to an RTX 4090 rig at 350W.
  • macOS launchd is a rock-solid process supervisor that restarts crashed services automatically. No Docker, no systemd, no cloud bills.
  • It's silent. This thing sits on my desk and I forget it's there.

Cloud GPU instances would cost $200-500/month for equivalent capability. This box pays for itself in a few months.
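The electricity claim is easy to sanity-check. Here's the back-of-envelope arithmetic, assuming a US-average rate of about $0.12/kWh (your rate will vary):

```python
# Back-of-envelope check on the idle power claim.
IDLE_WATTS = 15
RTX_RIG_WATTS = 350
RATE_PER_KWH = 0.12  # assumed US-average electricity rate

kwh_per_year = IDLE_WATTS / 1000 * 24 * 365          # ~131 kWh/year
annual_cost = kwh_per_year * RATE_PER_KWH            # ~$16/year

# Same math for a 350W GPU rig running at full draw
rtx_annual_cost = RTX_RIG_WATTS / 1000 * 24 * 365 * RATE_PER_KWH
```

At full draw the GPU rig lands near $370/year, roughly 23x the Mac Mini's idle cost.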

The Architecture

┌──────────────────────────────────────┐
│      Agent Framework (OpenClaw)      │
│  (orchestration, memory, decisions)  │
├──────────┬──────────┬────────────────┤
│  Ollama  │ ComfyUI  │ External APIs  │
│ (local   │ (image   │ (ElevenLabs,   │
│  LLMs)   │  gen)    │  Serper, etc)  │
├──────────┴──────────┴────────────────┤
│            launchd (macOS)           │
│      (process supervision, 24/7)     │
├──────────────────────────────────────┤
│     Mac Mini M4 Pro / 24GB / 1TB     │
└──────────────────────────────────────┘

Four layers: hardware, process supervision, AI engines, orchestration brain. Each layer is independently replaceable.

The Brain: Agent Orchestration

The orchestration layer manages everything. It decides what to work on, routes tasks to the right model, and maintains persistent memory so it doesn't forget what it learned yesterday.

Key design choices:

  • Claude Sonnet as the primary reasoning model for complex tasks — writing, analysis, strategy
  • Local Ollama models (Llama 3.1 8B, Dolphin-Llama3) as fallbacks for cheaper tasks like formatting and summarization
  • Persistent memory using queryable markdown so the agent remembers past decisions, client preferences, and project state
  • Up to 8 concurrent sub-agents for parallel task execution

The agent config that wires these choices together:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5-20250929",
        "fallbacks": [
          "ollama/llama3.1:8b",
          "ollama/dolphin-llama3:8b"
        ]
      },
      "maxConcurrent": 4,
      "subagents": { "maxConcurrent": 8 }
    }
  }
}

The primary model handles complex reasoning. Local fallbacks handle boilerplate. This keeps API costs under $10/month.
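The routing logic behind that config is a simple try-in-order chain. Here's a minimal sketch of the pattern — the `call_model` helper and error handling are illustrative, not the framework's actual API:

```python
# Primary-with-fallbacks routing: try each model in order,
# return the first successful completion.
def route(task: str, models: list[str], call_model) -> str:
    last_err = None
    for model in models:
        try:
            return call_model(model, task)
        except Exception as err:  # e.g. rate limit, network error, model offline
            last_err = err
    raise RuntimeError(f"all models failed: {last_err}")

# Model list mirroring the config above
MODELS = [
    "anthropic/claude-sonnet-4-5-20250929",
    "ollama/llama3.1:8b",
    "ollama/dolphin-llama3:8b",
]
```

If the Claude API is rate-limited or unreachable, the task degrades gracefully to a local model instead of failing outright.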

Ollama: Local LLMs at $0/Month

Ollama runs open-source models natively on Apple Silicon:

ollama pull llama3.1:8b
ollama pull dolphin-llama3:8b

The 8B models run at ~40 tokens/sec on the M4 Pro. Not GPT-4 fast, but fast enough for background tasks. And the cost is literally zero.

I use local models for:

  • Text summarization and formatting
  • Initial content drafts that get refined by Claude
  • Data extraction and parsing
  • Anything that doesn't need frontier-level reasoning
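
Ollama exposes a local REST API on port 11434, so these background tasks are one HTTP call. A minimal non-streaming sketch (the summarization prompt is just an example):

```python
import json, urllib.request

OLLAMA = "http://127.0.0.1:11434"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def summarize(text: str, model="llama3.1:8b") -> str:
    """Send the prompt to the local model and return its completion."""
    req = build_request(model, f"Summarize in two sentences:\n\n{text}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `"stream": False`, Ollama returns a single JSON object whose `response` field holds the full completion.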

ComfyUI: Image Generation Engine

ComfyUI runs Stable Diffusion and FLUX models locally with a node-based workflow system and — critically — a REST API you can hit programmatically:

import json, time, urllib.request

COMFYUI = "http://127.0.0.1:8188"

def generate_image(workflow: dict, output_dir="/tmp", timeout=300) -> str:
    """Queue a ComfyUI workflow and return the path of the first output image."""
    payload = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        f"{COMFYUI}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"}
    )
    prompt_id = json.loads(
        urllib.request.urlopen(req).read()
    )["prompt_id"]

    # Poll for completion, with a deadline so a failed job can't hang the agent
    deadline = time.time() + timeout
    while time.time() < deadline:
        resp = urllib.request.urlopen(f"{COMFYUI}/history/{prompt_id}")
        history = json.loads(resp.read())
        if prompt_id in history:
            outputs = history[prompt_id]["outputs"]
            for node_id in outputs:
                for img in outputs[node_id].get("images", []):
                    url = (f"{COMFYUI}/view?filename={img['filename']}"
                           f"&subfolder={img.get('subfolder','')}"
                           f"&type={img['type']}")
                    data = urllib.request.urlopen(url).read()
                    path = f"{output_dir}/{img['filename']}"
                    with open(path, "wb") as f:
                        f.write(data)
                    return path
        time.sleep(2)
    raise TimeoutError(f"ComfyUI job {prompt_id} produced no image within {timeout}s")

This is production code. The agent calls generate_image() with a FLUX workflow, waits for the result, and uses the output for product thumbnails, social posts, or client deliverables.

Making It Run 24/7 with launchd

A script that only runs when you remember to start it is not a business. You need launchd.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.ollama</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/ollama.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama.error.log</string>
</dict>
</plist>

Drop it in ~/Library/LaunchAgents/, then:

launchctl load ~/Library/LaunchAgents/com.local.ollama.plist

KeepAlive: true means if the process crashes, macOS restarts it automatically. No cron jobs. No screen sessions. I run the same pattern for ComfyUI, the agent gateway, and a face animation server. Four services, zero babysitting.

Pro tip: If your scripts reference external drives, create a launcher wrapper on the boot volume that checks for drive availability first. External drives aren't mounted when launchd fires at boot.
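
A hypothetical launcher wrapper for that pro tip might look like this — launchd runs the wrapper from the boot volume, the wrapper waits for the mount, then execs the real service. The mount path and binary are illustrative:

```python
import os, sys, time

def wait_for_mount(path: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Block until `path` is a mount point, or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.ismount(path):
            return True
        time.sleep(interval)
    return False

def launch(binary: str, args: list[str], required_mount: str) -> None:
    """Wait for the drive, then replace this process with the real service."""
    if not wait_for_mount(required_mount):
        sys.exit(f"{required_mount} never mounted; aborting launch")
    os.execv(binary, args)

# launchd's ProgramArguments would point here, and the script would call e.g.:
# launch("/usr/local/bin/ollama", ["ollama", "serve"], "/Volumes/AgentData")
```

Because `os.execv` replaces the wrapper process, launchd still supervises the actual service, and `KeepAlive` restarts the whole chain on a crash.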

The Money Streams

The agent runs multiple strategies in parallel:

1. Content Pipeline

The agent researches trending topics via web search (Serper API), generates original articles and social posts, and publishes them on schedule:

import os, requests

def search_trends(query: str) -> list:
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={
            "X-API-Key": os.environ["SERPER_API_KEY"],
            "Content-Type": "application/json"
        },
        json={"q": query, "num": 10}
    )
    return resp.json().get("organic", [])

2. Voice Content with ElevenLabs

Text-to-speech turns written content into audio products — voiceovers, narration, podcast intros:

import os, requests

ELEVENLABS = "https://api.elevenlabs.io/v1/text-to-speech"

def speak(text: str, output: str, voice_id: str) -> str:
    resp = requests.post(
        f"{ELEVENLABS}/{voice_id}",
        headers={
            "xi-api-key": os.environ["ELEVENLABS_KEY"],
            "Content-Type": "application/json"
        },
        json={
            "text": text,
            "model_id": "eleven_v3",
            "voice_settings": {
                "stability": 0.5,
                "similarity_boost": 0.75
            }
        }
    )
    with open(output, "wb") as f:
        f.write(resp.content)
    return output

3. Digital Products

The agent generates prompt libraries, automation scripts, ComfyUI workflow bundles, and lists them on Gumroad. Well-documented FLUX workflows sell for $10-30 each.

4. Gig Hunting

Using web search, the agent scans freelance platforms, scores opportunities against a rubric (budget, skill match, timeline), and drafts personalized proposals for gigs above a threshold.
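
A scoring rubric like that reduces to a small weighted function. This sketch is illustrative — the weights, saturation points, and threshold are assumptions, not the agent's exact numbers:

```python
# Weighted gig-triage score: skill match dominates, budget and timeline refine.
def score_gig(budget: float, skill_match: float, days_until_deadline: int) -> float:
    """Return a 0-1 score; skill_match is already a 0-1 value."""
    budget_score = min(budget / 1000, 1.0)                # saturates at $1,000
    timeline_score = min(days_until_deadline / 14, 1.0)   # saturates at 2 weeks
    return 0.5 * skill_match + 0.3 * budget_score + 0.2 * timeline_score

THRESHOLD = 0.6  # only gigs above this get a drafted proposal
```

For example, a $500 gig with strong skill match (0.8) and a week of runway scores 0.65, just clearing the threshold.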

Results After 1 Week

I'm going to be honest here. Hype without data is just noise.

What's working:

  • All services running 24/7 via launchd with zero crashes
  • Content pipeline producing 2-3 polished pieces per day
  • Image generation averaging 15 product images per day
  • Voice generation reliable for short-form audio
  • Gig scanner finding 5-10 relevant opportunities daily

Revenue: Modest. A few digital product sales. First client outreach responses are just coming back. The honest truth: week one is infrastructure week. The compounding happens in weeks 3-8 when you have a content library, a reputation, and warm leads.

What surprised me:

  • Local LLM quality is good enough for 70% of tasks. I hit the Claude API way less than expected.
  • Memory pressure stays under 60% even with Ollama + ComfyUI + the agent running simultaneously.
  • launchd is the unsung hero. Zero manual restarts since setup.

Build Your Own: Quick Start

Step 1: Hardware

Mac Mini M4 Pro, 24GB minimum. Base M4 works but limits you to 7B models.

Step 2: Core Stack

# Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Ollama
brew install ollama
ollama pull llama3.1:8b

# ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt

# Tools
brew install yt-dlp jq
pip install requests websocket-client

Step 3: launchd Services

Create plist files for Ollama and ComfyUI using the templates above. Load them with launchctl load.

Step 4: API Keys

  • ElevenLabs — text-to-speech ($5/mo starter)
  • Serper — web search ($50 for 50K queries)
  • Anthropic — Claude API (pay-as-you-go, ~$5-10/mo for this use case)

Step 5: The Agent Loop

import schedule, time

def cycle():
    # Each call below is one of the pipelines described above (stubs here)
    scan_freelance_platforms()
    create_daily_content()
    generate_product_assets()
    send_outreach()
    update_memory()

schedule.every(4).hours.do(cycle)

while True:
    schedule.run_pending()
    time.sleep(60)

The Bigger Picture

This isn't about replacing yourself with a robot. It's about leverage. The agent handles the repetitive grind — research, asset generation, outreach drafting — so you can focus on closing deals, building relationships, and improving the system.

A Mac Mini running local AI is one of the highest-ROI investments a developer can make right now. The cost of intelligence is dropping to zero. The cost of applied intelligence — knowing what to build, how to package it, and who to sell it to — is where all the value sits.

The agent doesn't replace that judgment. It just means your judgment gets applied 24 hours a day instead of 8.


Questions? Drop them in the comments. I'm actively building this and sharing what works (and what doesn't).
