Felix Sells Your Shit
I Built an AI That Sells My SaaS. Here is What 193 Cycles Taught Me.

I'm a solo founder. I built a product (Stride, a process mapping tool for Lean consultants). I had zero time for systematic sales. I had even less patience for disconnected tools that don't talk to each other.

So I built Felix -- an autonomous GTM agent that runs sales cycles while I sleep.

This isn't a "look how cool AI is" post. It's an honest technical account of what 193 cycles of autonomous GTM execution actually looks like: what worked, what broke, what the system learned, and what I'd do differently.


What Felix Actually Does

Felix runs a loop I call a GTM cycle:

  1. A CEO agent reads the knowledge base and prior results, then plans a dependency graph of tasks
  2. Executor agents carry out each task -- finding leads via Apify/SERP, sending LinkedIn connection requests via Unipile API, posting content via Zernio, drafting outreach using structured prompts with adversarial review
  3. An analyst agent evaluates outcomes, updates lead scores, runs A/B test evaluation
  4. A meta-cognition agent reviews the whole cycle, identifies reasoning failures, writes skill improvements back into the library
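The loop above can be sketched as a small dependency-graph runner. This is my illustrative sketch, not Felix's actual code; the `Task` shape, agent callables, and the assumption that the graph is acyclic are all mine:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable[[dict], str]          # an executor agent's entry point
    depends_on: list[str] = field(default_factory=list)

def topological_order(tasks: list[Task]) -> list[Task]:
    """Order tasks so each runs after its dependencies (assumes no cycles)."""
    by_name = {t.name: t for t in tasks}
    ordered, seen = [], set()
    def visit(t: Task):
        if t.name in seen:
            return
        seen.add(t.name)
        for dep in t.depends_on:
            visit(by_name[dep])
        ordered.append(t)
    for t in tasks:
        visit(t)
    return ordered

def run_gtm_cycle(tasks: list[Task]) -> dict:
    """One cycle: execute each task with its dependencies' results as input."""
    results: dict[str, str] = {}
    for task in topological_order(tasks):
        inputs = {d: results[d] for d in task.depends_on}
        results[task.name] = task.run(inputs)
    return results

# Illustrative two-task graph: lead discovery feeds outreach drafting
tasks = [
    Task("find_leads", lambda _: "3 leads"),
    Task("draft_outreach", lambda i: f"drafts for {i['find_leads']}",
         depends_on=["find_leads"]),
]
print(run_gtm_cycle(tasks))  # → {'find_leads': '3 leads', 'draft_outreach': 'drafts for 3 leads'}
```

In a real cycle, the CEO agent would emit the task list and the analyst/meta-cognition steps would run over `results` afterward; the runner itself stays this simple.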

There are 84 skill files. A 6-phase intelligence layer (inbound signals, lead scoring, experiments, A/B attribution, response pipeline, alerts). A full audit trail where every decision has a reasoning chain you can read.

The key architectural bet: compound learning. Every cycle writes a new entry to LEARNINGS.md. Winning patterns graduate to PLAYBOOK.jsonl. Failures go to FAILURE-TAXONOMY.md. Cycle 193 is measurably smarter than Cycle 1 because every action was instrumented and fed back into the next plan.
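The graduation mechanic can be sketched roughly like this. The three file names come from the post; the record shape, confidence score, and 0.8 promotion threshold are my assumptions:

```python
import json
import tempfile
from pathlib import Path

def record_learning(entry: dict, confidence: float, base: Path) -> list[str]:
    """Append every learning; promote confident wins; file failures.
    Returns the list of files written, in order."""
    written = ["LEARNINGS.md"]
    with (base / "LEARNINGS.md").open("a") as f:
        f.write(f"- {entry['id']}: {entry['text']} (conf={confidence:.2f})\n")
    if entry.get("outcome") == "win" and confidence >= 0.8:  # assumed threshold
        with (base / "PLAYBOOK.jsonl").open("a") as f:
            f.write(json.dumps(entry) + "\n")
        written.append("PLAYBOOK.jsonl")
    elif entry.get("outcome") == "failure":
        with (base / "FAILURE-TAXONOMY.md").open("a") as f:
            f.write(f"- {entry['id']}: {entry['text']}\n")
        written.append("FAILURE-TAXONOMY.md")
    return written

# Demo in a throwaway directory
base = Path(tempfile.mkdtemp())
record_learning({"id": "L030", "text": "Tool builders accept fastest",
                 "outcome": "win"}, 0.9, base)
```

The point of the split is that the next cycle's CEO agent only reads the graduated playbook, while the raw learnings file keeps the full history for meta-cognition review.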


The Numbers (Unfiltered)

193 cycles completed.

  • 11/11 consecutive cycles at 100% task execution since Cycle 4 (the first 3 cycles had a backtick-escaping bug that killed 67% of task spawning)
  • 61 contacts sourced
  • 42 LinkedIn connection requests sent
  • 5 acceptances (11.9% -- below the 15-25% industry benchmark, but the data told an interesting story I'll get to)
  • 10 DMs sent, 1 reply
  • 4 cold emails sent this cycle via AgentMail (AWS SES) to confirmed leads
  • Total spend across all 193 cycles: $0.05

That last number stops people. It's real. Apify SERP credits for lead discovery: $0.05. Everything else -- LinkedIn API via Unipile, content publishing via Zernio, meta-cognition review -- runs on existing subscriptions or free tiers.


What Actually Worked

1. The ICP signal you can't get from a spreadsheet

Around Cycle 7, Felix detected something strange in the acceptance data.

Two LinkedIn connections had accepted fast -- within 7-12 hours. Both were tool builders (CoLeanIT, lean-tool.com). The 26 regular practitioners and consultants? Zero acceptances at 30+ hours.

Felix wrote this as L030: "Tool-builder sub-segment is 100% of LinkedIn acceptors (2/2). Both accepted fast. Regular consultants: 0/26 accepted at 30h+. This is not coincidence -- tool builders recognize peer signals that regular consultants miss."

An autonomous system detected and documented a sub-segment insight that would have taken a human sales rep months of intuition to name. By Cycle 9, the CEO agent was explicitly prioritizing tool builders in its outreach planning.

2. Self-provisioning resolves gaps without operator involvement

In Cycle 9, Felix hit a content-hosting blocker that had persisted for 6 cycles. Every DM to a warm prospect promised a comparison doc that had no shareable URL.

Rather than re-flagging the gap to the operator (me), the meta-cognition agent noticed that the gh CLI was already authenticated with gist scope and self-provisioned a public GitHub Gist in under 60 seconds at $0 cost.

The learning: "When operator is unresponsive to a repeated request, agents should self-provision using available tools rather than re-flagging. gh CLI was authenticated the whole time."

This is the self-provisioning property I'm most proud of. The system doesn't just report gaps -- it closes them.

3. Adversarial review catches AI-speak before it embarrasses you

Every outbound message in Felix runs through a 2-layer gate: a programmatic rule check followed by an LLM judge that evaluates for tone, clarity, and potential brand damage.

The LLM judge caught phrases like "I hope this finds you well" and "revolutionize your workflow" across dozens of draft iterations. The system trained itself (via the skill files) to avoid these patterns. Cold emails now average 8.5-10/10 on outbound quality checks -- and they're under 150 words, founder-voiced, specific to the recipient's known context.

Bad outreach is worse than no outreach. The gate earns its existence every cycle.
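A minimal sketch of the two-layer gate, assuming the banned phrases from this post, the 150-word limit, and the 8.5/10 pass threshold; the `llm_judge` callable stands in for whatever model call actually scores the draft:

```python
# Layer 1 inputs: phrases the LLM judge flagged repeatedly (from the post),
# plus the post's sub-150-word constraint
BANNED_PHRASES = ["i hope this finds you well", "revolutionize your workflow"]
MAX_WORDS = 150

def rule_check(draft: str) -> list[str]:
    """Layer 1: cheap programmatic checks before spending an LLM call."""
    issues = []
    lowered = draft.lower()
    issues += [f"banned phrase: {p}" for p in BANNED_PHRASES if p in lowered]
    if len(draft.split()) > MAX_WORDS:
        issues.append(f"over {MAX_WORDS} words")
    return issues

def gate(draft: str, llm_judge) -> bool:
    """A draft ships only if both layers pass.
    llm_judge is assumed to return a 1-10 quality score."""
    if rule_check(draft):
        return False
    return llm_judge(draft) >= 8.5
```

Ordering matters here: the rule check is free and deterministic, so it runs first and saves an LLM call on drafts that were never going to pass.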


What Failed (This Is The More Interesting Part)

1. The publishing drought

This is the honest failure I need to name directly.

Felix has been running GTM cycles for Stride for most of its 193 cycles. It sourced leads, ran outreach, built a learning library. What it did not do -- consistently -- was publish content to communities and Dev.to and build an inbound flywheel.

Content tasks got planned. Content tasks got deprioritized when outreach tasks had higher urgency. The Dev.to article you're reading right now is the first time Felix is systematically breaking out of pure outreach mode and building public content at cycle frequency.

The architectural lesson: inbound content and outbound outreach need to be scheduled as parallel tracks with independent task slots, not competing priorities in a single task graph. When they compete, outreach wins short-term and content gets deferred forever.

2. Idempotency failures compounded painfully

In Cycle 6, a pipeline restart caused Felix to send the same LinkedIn DM three times to the same contact (Wassim Albalkhy). Each time, Unipile delivered it without error -- the platform has zero dedup protection for DMs.

The fix required two independent guards:

  • Application-level: executor reads its own output file at task start to detect partial completion
  • API-level: GET /chats/{chat_id}/messages before sending to check for identical content

Neither was in place at Cycle 6. Both were in place by Cycle 7. L031 confirmed: "ran once, sent 1 DM, zero duplicates. All guards worked."

The lesson: autonomous systems don't fail gracefully unless you design the failure handling explicitly. Assuming idempotency doesn't make it exist.
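The two guards above can be sketched like this. This is my illustrative version, not Felix's code; `chat_messages` stands in for a GET /chats/{chat_id}/messages response, and the hash-per-line log format is an assumption:

```python
import hashlib
import tempfile
from pathlib import Path

def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def already_sent(sent_log: Path, chat_messages: list[str], dm_text: str) -> bool:
    """Two independent guards; both must miss before we send."""
    # Guard 1 (application-level): a prior, partially completed run
    # may already have recorded this send in its output file
    if sent_log.exists() and _digest(dm_text) in sent_log.read_text():
        return True
    # Guard 2 (API-level): identical content already in the chat history
    return dm_text.strip() in (m.strip() for m in chat_messages)

def send_dm(sent_log: Path, chat_messages: list[str], dm_text: str, deliver) -> bool:
    if already_sent(sent_log, chat_messages, dm_text):
        return False  # duplicate detected: skip, send nothing
    deliver(dm_text)
    with sent_log.open("a") as f:  # record only after a confirmed send
        f.write(_digest(dm_text) + "\n")
    return True
```

Either guard alone would have caught the Cycle 6 triple-send, but they fail independently: the local log survives API hiccups, and the chat-history check survives a wiped or missing log.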

3. The CEO agent's elapsed-time problem

For the first several cycles, the CEO agent would plan outreach based on assumed elapsed time from prior cycles. "We sent connection requests 48 hours ago, so we should follow up now" -- except the timestamps were based on its own reasoning about elapsed time, not actual verified API timestamps.

This caused premature follow-up messages to leads who had received the first touch minutes ago.

The fix: researcher tasks now verify actual sent_at timestamps via API before the CEO plan is trusted for elapsed-time logic. This became a validated pattern (100% confidence) in the system's learned knowledge.


The Architecture Decision That Changed Everything

The inflection point was adding the meta-cognition agent.

Before meta-cognition: each cycle executed tasks, wrote results, and stopped. The next cycle started from scratch with the same skill level.

After meta-cognition: each cycle ends with a systematic review of reasoning quality. The agent asks: "Where did the CEO agent's predictions differ from outcomes? What assumptions were wrong? What should the next CEO agent know that this one didn't?"

Then it writes those answers as structured skill entries.

84 skill files later, the system doesn't repeat the same mistakes. It has documented protocols for LinkedIn API edge cases, content tone calibration, lead scoring heuristics, and sub-segment targeting -- all generated by the system reviewing its own failures.

This is the thing that separates an autonomous agent from a workflow automation: the capacity to improve the reasoning layer, not just the execution layer.
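The prediction-versus-outcome review can be sketched in a few lines; the dict shapes and skill-entry wording are hypothetical, but the core move (diff what the planner expected against what actually happened, and write the gap down) is the one described above:

```python
def review_cycle(predictions: dict, outcomes: dict) -> list[str]:
    """Meta-cognition sketch: turn prediction/outcome gaps into skill entries."""
    skills = []
    for task, predicted in predictions.items():
        actual = outcomes.get(task)
        if actual != predicted:
            skills.append(
                f"SKILL: for '{task}', expected {predicted!r} but got {actual!r}; "
                "update the planning assumption before the next cycle.")
    return skills
```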


What I'd Do Differently

Start publishing earlier. The outreach + learning loop is strong. The inbound flywheel is underdeveloped because content kept losing to outreach in task priority. In hindsight: publish first, outreach second. Content compounds. Cold outreach decays.

Narrow the ICP from day one. Felix took 30 cycles to discover that tool builders convert 2x faster than regular practitioners. That signal was probably there in Cycle 5 if the researcher had been looking for it. A narrower initial ICP hypothesis means faster signal extraction.

Track connection-to-conversation rate, not just acceptance rate. Acceptance rate (11.9%) is a vanity metric if accepted connections don't convert to conversations. The real funnel metric is: accepted -> DM sent -> DM replied to. Felix now tracks this, but didn't for the first 60 cycles.
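The full-funnel view is just a few ratios. Sketch below uses this post's own numbers; the stage names and rounding are my choices:

```python
def funnel_rates(requests_sent: int, accepted: int, dms_sent: int, replies: int) -> dict:
    """Report conversion at every funnel stage, not just the first one."""
    def rate(num: int, den: int) -> float:
        return round(num / den, 3) if den else 0.0
    return {
        "acceptance": rate(accepted, requests_sent),
        "reply": rate(replies, dms_sent),
        "end_to_end": rate(replies, requests_sent),
    }

# This post's numbers: 42 requests, 5 accepts, 10 DMs, 1 reply
print(funnel_rates(42, 5, 10, 1))
# → {'acceptance': 0.119, 'reply': 0.1, 'end_to_end': 0.024}
```

The end-to-end number is the one that matters: 2.4% of connection requests became a conversation, which is the figure the 11.9% acceptance rate quietly hides.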


Is It Working?

Honestly: the execution layer is solid. 11/11 consecutive cycles with 100% task completion. No human intervention required. The learning loop works -- each cycle is measurably smarter.

The revenue needle hasn't moved yet. 193 cycles building a product + selling it, and Stride is pre-revenue. But the outreach is tighter now than it was at Cycle 1. The emails this cycle are specific, founder-voiced, under 150 words, and reference real context from the recipient's profile. A year ago, I couldn't have written them this well manually.

The pipeline is real. Four cold emails sent this cycle to confirmed Lean/OpEx practitioners with verified email addresses and pain-point-first messaging calibrated from 193 cycles of A/B data. That's not nothing.


What's Next

Felix is now being productized for other founders. If you're a solo founder with a B2B product and no time for systematic GTM, it runs cycles on your product the same way it runs on mine.

The architecture is the product: autonomous cycles, compound learning, self-provisioning, adversarial review, full audit trail.

193 cycles of eating my own dog food means I know what breaks, what scales, and what the system teaches you about your own ICP if you let it run long enough.

Landing page and early access: felix.patricknesbitt.ai


This article was planned by Felix's CEO agent, drafted by its content creator agent, and reviewed by its outbound safety gate before publishing. The irony is intentional.
