What if your computer made money while you slept? I built an AI agent on a Mac Mini M4 Pro that runs 24/7 — creating content, generating images, managing outreach — and I'm going to show you exactly how.
The Hardware: Why a Mac Mini M4 Pro
I'm running a Mac Mini M4 Pro with 24GB unified memory and a 1TB SSD. Total cost: ~$1,600.
Why this specific machine?
- Unified memory means GPU and CPU share the same RAM. You can run 8B-parameter LLMs locally without a discrete GPU.
- Power draw is ~15W idle. That's about $15/year in electricity. Compare that to an RTX 4090 rig at 350W.
- macOS launchd is a rock-solid process supervisor that restarts crashed services automatically. No Docker, no systemd, no cloud bills.
- It's silent. This thing sits on my desk and I forget it's there.
Cloud GPU instances would cost $200-500/month for equivalent capability. This box pays for itself in a few months.
The Architecture
┌─────────────────────────────────────┐
│ Agent Framework (OpenClaw) │
│ (orchestration, memory, decisions) │
├──────────┬──────────┬───────────────┤
│ Ollama │ ComfyUI │ External APIs │
│ (local │ (image │ (ElevenLabs, │
│ LLMs) │ gen) │ Serper, etc) │
├──────────┴──────────┴───────────────┤
│ launchd (macOS) │
│ (process supervision, 24/7) │
├─────────────────────────────────────┤
│ Mac Mini M4 Pro / 24GB / 1TB │
└─────────────────────────────────────┘
Four layers: hardware, process supervision, AI engines, orchestration brain. Each layer is independently replaceable.
The Brain: Agent Orchestration
The orchestration layer manages everything. It decides what to work on, routes tasks to the right model, and maintains persistent memory so it doesn't forget what it learned yesterday.
Key design choices:
- Claude Sonnet as the primary reasoning model for complex tasks — writing, analysis, strategy
- Local Ollama models (Llama 3.1 8B, Dolphin-Llama3) as fallbacks for cheaper tasks like formatting and summarization
- Persistent memory using queryable markdown so the agent remembers past decisions, client preferences, and project state
- Up to 8 concurrent sub-agents for parallel task execution
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5-20250929",
        "fallbacks": [
          "ollama/llama3.1:8b",
          "ollama/dolphin-llama3:8b"
        ]
      },
      "maxConcurrent": 4,
      "subagents": { "maxConcurrent": 8 }
    }
  }
}
The primary model handles complex reasoning. Local fallbacks handle boilerplate. This keeps API costs under $10/month.
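That routing decision is simple to sketch. This is an illustrative snippet, not part of any real framework — the names route_task and COMPLEX_KINDS are mine:

```python
# Illustrative model router: complex reasoning goes to the frontier API,
# boilerplate goes to a local Ollama model. Names here are hypothetical.
PRIMARY = "anthropic/claude-sonnet-4-5-20250929"
FALLBACKS = ["ollama/llama3.1:8b", "ollama/dolphin-llama3:8b"]

COMPLEX_KINDS = {"writing", "analysis", "strategy"}

def route_task(kind: str, local_available: bool = True) -> str:
    """Pick a model for a task: frontier API for complex reasoning,
    local model for cheap boilerplate, API again if Ollama is down."""
    if kind in COMPLEX_KINDS:
        return PRIMARY
    if local_available:
        return FALLBACKS[0]
    return PRIMARY
```

The point of the explicit fallback list is that either side can fail independently: if the API is rate-limited you still get local output, and if Ollama is down the API covers the gap.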
Ollama: Local LLMs at $0/Month
Ollama runs open-source models natively on Apple Silicon:
ollama pull llama3.1:8b
ollama pull dolphin-llama3:8b
The 8B models run at ~40 tokens/sec on the M4 Pro. Not GPT-4 fast, but fast enough for background tasks. And the cost is literally zero.
I use local models for:
- Text summarization and formatting
- Initial content drafts that get refined by Claude
- Data extraction and parsing
- Anything that doesn't need frontier-level reasoning
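Handing one of these tasks to a local model is a plain HTTP call against Ollama's API on its default port (11434). A minimal sketch using only the standard library — the prompt wording is just an example:

```python
import json
import urllib.request

OLLAMA = "http://127.0.0.1:11434"

def build_request(text: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming request to Ollama's /api/generate endpoint."""
    payload = json.dumps({
        "model": model,
        "prompt": f"Summarize in one paragraph:\n\n{text}",
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def summarize(text: str, model: str = "llama3.1:8b") -> str:
    """Send the request and return the model's completion text."""
    with urllib.request.urlopen(build_request(text, model)) as resp:
        return json.loads(resp.read())["response"]
```

With "stream": False you get one JSON object back instead of newline-delimited chunks, which keeps background-task code simple.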
ComfyUI: Image Generation Engine
ComfyUI runs Stable Diffusion and FLUX models locally with a node-based workflow system and — critically — a REST API you can hit programmatically:
import json, urllib.request, time

COMFYUI = "http://127.0.0.1:8188"

def generate_image(workflow: dict, output_dir="/tmp") -> str:
    """Queue a ComfyUI workflow and return the output path."""
    payload = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(
        f"{COMFYUI}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"}
    )
    prompt_id = json.loads(
        urllib.request.urlopen(req).read()
    )["prompt_id"]

    # Poll for completion
    while True:
        resp = urllib.request.urlopen(f"{COMFYUI}/history/{prompt_id}")
        history = json.loads(resp.read())
        if prompt_id in history:
            outputs = history[prompt_id]["outputs"]
            for node_id in outputs:
                for img in outputs[node_id].get("images", []):
                    url = (f"{COMFYUI}/view?filename={img['filename']}"
                           f"&subfolder={img.get('subfolder','')}"
                           f"&type={img['type']}")
                    data = urllib.request.urlopen(url).read()
                    path = f"{output_dir}/{img['filename']}"
                    with open(path, "wb") as f:
                        f.write(data)
                    return path
        time.sleep(2)
This is production code. The agent calls generate_image() with a FLUX workflow, waits for the result, and uses the output for product thumbnails, social posts, or client deliverables.
Making It Run 24/7 with launchd
A script that only runs when you remember to start it is not a business. You need launchd.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
    "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.local.ollama</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ollama</string>
        <string>serve</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/ollama.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama.error.log</string>
</dict>
</plist>
Drop it in ~/Library/LaunchAgents/, then:
launchctl load ~/Library/LaunchAgents/com.local.ollama.plist
KeepAlive: true means if the process crashes, macOS restarts it automatically. No cron jobs. No screen sessions. I run the same pattern for ComfyUI, the agent gateway, and a face animation server. Four services, zero babysitting.
Pro tip: If your scripts reference external drives, create a launcher wrapper on the boot volume that checks for drive availability first. External drives aren't mounted when launchd fires at boot.
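One way to write that wrapper is a small Python launcher that polls for the mount point before handing off to the real binary. This is a sketch under assumptions — the volume name and service path are placeholders for your own setup:

```python
import os
import sys
import time

VOLUME = "/Volumes/AgentData"              # hypothetical external drive
SERVICE = ["/usr/local/bin/ollama", "serve"]  # the real service to start

def wait_for_mount(path: str, timeout: int = 120) -> bool:
    """Poll until the volume is mounted or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.ismount(path):
            return True
        time.sleep(5)
    return False

def launch():
    """Block until the drive appears, then replace this process
    with the real service so launchd supervises it directly."""
    if not wait_for_mount(VOLUME):
        sys.exit(f"{VOLUME} never mounted; aborting")
    os.execv(SERVICE[0], SERVICE)
```

Point the plist's ProgramArguments at this script instead of the service binary; os.execv means launchd still sees a single process to supervise rather than a shell parent.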
The Money Streams
The agent runs multiple strategies in parallel:
1. Content Pipeline
The agent researches trending topics via web search (Serper API), generates original articles and social posts, and publishes them on schedule:
import os, requests

def search_trends(query: str) -> list:
    resp = requests.post(
        "https://google.serper.dev/search",
        headers={
            "X-API-Key": os.environ["SERPER_API_KEY"],
            "Content-Type": "application/json"
        },
        json={"q": query, "num": 10}
    )
    return resp.json().get("organic", [])
2. Voice Content with ElevenLabs
Text-to-speech turns written content into audio products — voiceovers, narration, podcast intros:
import os, requests

ELEVENLABS = "https://api.elevenlabs.io/v1/text-to-speech"

def speak(text: str, output: str, voice_id: str) -> str:
    resp = requests.post(
        f"{ELEVENLABS}/{voice_id}",
        headers={
            "xi-api-key": os.environ["ELEVENLABS_KEY"],
            "Content-Type": "application/json"
        },
        json={
            "text": text,
            "model_id": "eleven_v3",
            "voice_settings": {
                "stability": 0.5,
                "similarity_boost": 0.75
            }
        }
    )
    with open(output, "wb") as f:
        f.write(resp.content)
    return output
3. Digital Products
The agent generates prompt libraries, automation scripts, ComfyUI workflow bundles, and lists them on Gumroad. Well-documented FLUX workflows sell for $10-30 each.
4. Gig Hunting
Using web search, the agent scans freelance platforms, scores opportunities against a rubric (budget, skill match, timeline), and drafts personalized proposals for gigs above a threshold.
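The rubric itself can be a simple weighted score. The weights, caps, and threshold below are illustrative assumptions, not the exact numbers the agent uses:

```python
# Illustrative gig-scoring rubric. Weights (0.4/0.4/0.2), the $1k budget
# cap, the two-week timeline cap, and the 0.6 threshold are assumptions.
def score_gig(budget: float, skill_match: float, timeline_days: int) -> float:
    """Score a gig 0-1: higher budget, better skill fit, longer runway."""
    budget_score = min(budget / 1000, 1.0)         # cap at $1,000
    timeline_score = min(timeline_days / 14, 1.0)  # cap at two weeks
    return 0.4 * budget_score + 0.4 * skill_match + 0.2 * timeline_score

THRESHOLD = 0.6

def worth_proposing(budget: float, skill_match: float, timeline_days: int) -> bool:
    """Only draft a proposal when the score clears the threshold."""
    return score_gig(budget, skill_match, timeline_days) >= THRESHOLD
```

Keeping the rubric as plain arithmetic, rather than asking an LLM to "rate this gig," makes the scanner's decisions cheap, deterministic, and easy to tune.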
Results After 1 Week
I'm going to be honest here. Hype without data is just noise.
What's working:
- All services running 24/7 via launchd with zero crashes
- Content pipeline producing 2-3 polished pieces per day
- Image generation averaging 15 product images per day
- Voice generation reliable for short-form audio
- Gig scanner finding 5-10 relevant opportunities daily
Revenue: Modest. A few digital product sales. First client outreach responses are just coming back. The honest truth: week one is infrastructure week. The compounding happens in weeks 3-8 when you have a content library, a reputation, and warm leads.
What surprised me:
- Local LLM quality is good enough for 70% of tasks. I hit the Claude API way less than expected.
- Memory pressure stays under 60% even with Ollama + ComfyUI + the agent running simultaneously.
- launchd is the unsung hero. Zero manual restarts since setup.
Build Your Own: Quick Start
Step 1: Hardware
Mac Mini M4 Pro, 24GB minimum. Base M4 works but limits you to 7B models.
Step 2: Core Stack
# Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Ollama
brew install ollama
ollama pull llama3.1:8b
# ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt
# Tools
brew install yt-dlp jq
pip install requests websocket-client
Step 3: launchd Services
Create plist files for Ollama and ComfyUI using the templates above. Load them with launchctl load.
Step 4: API Keys
- ElevenLabs — text-to-speech ($5/mo starter)
- Serper — web search ($50 for 50K queries)
- Anthropic — Claude API (pay-as-you-go, ~$5-10/mo for this use case)
Step 5: The Agent Loop
import schedule, time

def cycle():
    scan_freelance_platforms()
    create_daily_content()
    generate_product_assets()
    send_outreach()
    update_memory()

schedule.every(4).hours.do(cycle)

while True:
    schedule.run_pending()
    time.sleep(60)
The Bigger Picture
This isn't about replacing yourself with a robot. It's about leverage. The agent handles the repetitive grind — research, asset generation, outreach drafting — so you can focus on closing deals, building relationships, and improving the system.
A Mac Mini running local AI is one of the highest-ROI investments a developer can make right now. The cost of intelligence is dropping to zero. The cost of applied intelligence — knowing what to build, how to package it, and who to sell it to — is where all the value sits.
The agent doesn't replace that judgment. It just means your judgment gets applied 24 hours a day instead of 8.
Questions? Drop them in the comments. I'm actively building this and sharing what works (and what doesn't).