The Problem
We wanted to produce a 60-page, 30,000-word book in Portuguese about four Brazilian fintech founders -- Augusto Lins (Stone), Andre Street (Stone/Teya), David Velez (Nubank), and Guilherme Benchimol (XP) -- told through their own reconstructed voices, narrated by Ram Charan. The book needed to feel like four real humans speaking, not like a chatbot paraphrasing Wikipedia.
A single LLM call cannot do this. You get voice blending (everyone sounds the same by chapter three), factual hallucinations in biographical data, and zero structural coherence across 30k words. We needed an orchestration layer.
The result: "5 Fundadores, 5 Segundos, 1 Futuro" -- 30,329 words, 4 distinguishable voices, 8 chapters, 7 analytical notes, fact-checked against primary sources, published at alexandrecaramaschi.com/founders.
Here is what the pipeline looked like, what broke, and what we learned.
Architecture: The 6-Engine Model
The core insight: use each model for what it does best, not one model for everything.
+-------------------+----------------------------------------+
| ENGINE | ROLE |
+-------------------+----------------------------------------+
| Claude Opus | Orchestrator + narrative writing |
| | Voice personas, assembly, QA |
| Perplexity | Real-time web research |
| (Sonar Pro) | Fact-checking with verifiable sources |
| Gemini 2.5 Pro | Full-manuscript coherence analysis |
| | (1M+ context window) |
| ChatGPT GPT-4o | Creative variations: openings, |
| | titles, dialogue scenes |
| Groq/Llama 3.3 | Fast rough drafts, PT-BR accent fix, |
| | rapid iteration |
| Claude Sonnet | HTML/PDF formatting, React component, |
| | Schema.org, deploy pipeline |
+-------------------+----------------------------------------+
Why not just Claude for everything? Three reasons:
- Perplexity's web search returns sources you can verify. LLMs trained on static data fabricate citations -- Perplexity anchors facts to real URLs.
- Gemini's 1M+ context window can read the entire manuscript in one pass and detect cross-chapter redundancies that no other model can see.
- Groq's speed (thousands of tokens/second) makes iteration cheap. Rough drafts that take Opus 90 seconds take Groq 3 seconds.
The Pipeline: 10 Phases, 43 Agent Calls
PHASE 0: BOOTSTRAP (Orchestrator)
| Generate 5 system prompts (1 per persona)
| Generate 8 chapter briefs
| Generate global style guide
v
PHASE 1: DEEP RESEARCH (7 agents in PARALLEL)
| 6x Perplexity: one dossier per founder + Charan + 2026 context
| 1x Gemini: cross-analysis of all 6 dossiers -> convergence map
v
PHASE 2: WRITING WAVE 1 -- Chapters 1-4 (9 agents in PARALLEL)
| 4x Opus: each writes ONE founder's voice for chapters 1-4
| 1x Opus: Charan writes Preface + Prologue + Notes #1-2
| 1x GPT-4o: 12 alternative openings + 4 epigraphs
| 2x Groq: fast rough drafts as raw material
| 1x Gemini: real-time coherence monitor
v
PHASE 3: WRITING WAVE 2 -- Chapters 5-8 (9 agents in PARALLEL)
| Same structure as Phase 2
| + Charan assembles chapters 1-4 (interleaving 4 voices)
v
PHASE 4: MANUSCRIPT ASSEMBLY (1 Opus agent -- Charan)
| Interleave voices, write transitions, write Epilogue
| -> manuscrito_v1.md (~48,000 words raw)
v
PHASE 5: CROSS-MODEL REVIEW (7 agents in PARALLEL)
| 4x Opus: each founder-persona reads FULL manuscript
| "Does this sound like me? Any data wrong?"
| 1x Perplexity: fact-check every number against live web
| 1x Gemini: structural analysis (pacing, arcs, redundancy)
| 1x Groq: fast PT-BR accent/grammar sweep
v
PHASE 6: INTEGRATED REWRITE (1 Opus agent)
| Incorporate all 7 review reports
| Fix 19 factual errors, remove fabricated citations
| Resolve redundancies, equalize founder presence
| -> manuscrito_v2.md
v
PHASE 7: MULTI-SPECIALIST POLISH (4 agents in PARALLEL)
| Opus: narrative flow + chapter hooks
| Groq: PT-BR final accent check
| Sonnet: Markdown formatting + metadata
| GPT-4o: final title selection + back-cover copy
v
PHASE 8: FINAL QA (1 Opus agent)
| Full read-through simulating first-time reader
| 13-point checklist (voices, hooks, Charan, accents, entities)
| -> manuscrito_final.md (30,329 words)
v
PHASE 9: PUBLISH (3 Sonnet agents in PARALLEL)
| HTML + PDF generation
| React/Next.js component for /founders
| SEO: Schema.org Book markup, OG tags, sitemap
v
PHASE 10: DEPLOY
| Vercel deploy + IndexNow
| Health check: /founders returns 200
| DONE
Total: 43 agent calls across 6 APIs, with up to 9 agents running simultaneously.
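The "N agents in PARALLEL" phases above can be sketched with plain `asyncio`. This is a minimal illustration, not the actual orchestrator: `call_agent` is a hypothetical stand-in for the per-provider API clients, and the task list is placeholder data.

```python
import asyncio

# Hypothetical agent call: in the real pipeline each of these would hit a
# different provider API (Opus, Perplexity, Groq, ...).
async def call_agent(name: str, prompt: str) -> str:
    await asyncio.sleep(0)  # stand-in for the network round trip
    return f"[{name}] draft for: {prompt}"

async def run_wave(tasks: list[tuple[str, str]]) -> list[str]:
    # Fan out one coroutine per agent and wait for all of them,
    # mirroring the "9 agents in PARALLEL" phases.
    return await asyncio.gather(*(call_agent(n, p) for n, p in tasks))

wave = [("opus-augusto", "ch1-4"), ("opus-andre", "ch1-4"),
        ("opus-david", "ch1-4"), ("opus-guilherme", "ch1-4"),
        ("gpt4o-openings", "12 variants")]
results = asyncio.run(run_wave(wave))
# results come back in task order, one draft per agent
```

`asyncio.gather` preserves input order, which matters downstream: the assembly phase needs to know which draft belongs to which persona without parsing the text.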
Quality Gates Between Phases
Not every phase transition was automatic. We implemented quality gates -- checkpoints where the orchestrator evaluates whether output meets minimum criteria before proceeding.
GATE 1 (after Phase 1 -> Phase 2):
CHECK: Each dossier has >= 15 verified citations with sources
CHECK: Convergence map identifies >= 5 shared patterns
CHECK: No founder dossier is < 3,000 words
FAIL ACTION: Re-run Perplexity with expanded queries
GATE 2 (after Phase 2 -> Phase 3):
CHECK: Voice distinctiveness score (Gemini evaluates)
CHECK: No two founders share > 30% identical phrasing
CHECK: Each founder section is within 20% of target word count
FAIL ACTION: Re-prompt specific founder agents with
reinforced persona instructions
GATE 3 (after Phase 5 -> Phase 6):
CHECK: Zero critical factual errors remaining
CHECK: Fabricated citation count = 0
CHECK: Redundancy score below threshold
FAIL ACTION: Return to Phase 5 with targeted re-checks
The gates prevented cascading errors. Without them, a weak dossier in Phase 1 would produce a weak chapter in Phase 2, which would produce a weak review in Phase 5. By catching problems early, we avoided expensive rewrites downstream.
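A gate is just a pure function over the previous phase's artifacts. Here is a sketch of Gate 1 under an assumed dossier shape (a dict with a `citations` list and a `word_count`); the real pipeline's data structures are not shown in this article.

```python
# Illustrative gate check, not the production code. The dossier structure
# (citations list, word count) is an assumption for the sketch.
def gate_1(dossiers: dict[str, dict], convergence_patterns: list[str]) -> list[str]:
    failures = []
    for founder, d in dossiers.items():
        if len(d["citations"]) < 15:
            failures.append(f"{founder}: only {len(d['citations'])} citations")
        if d["word_count"] < 3000:
            failures.append(f"{founder}: dossier under 3,000 words")
    if len(convergence_patterns) < 5:
        failures.append("convergence map has fewer than 5 shared patterns")
    return failures  # empty list means the gate passes

ok = gate_1({"velez": {"citations": ["url"] * 17, "word_count": 4200}},
            ["pattern"] * 6)
# ok == [] -> proceed to Phase 2; otherwise trigger the FAIL ACTION
```

Returning a list of failure strings (rather than a boolean) is deliberate: the FAIL ACTION needs to know *which* check failed so it can re-run only the relevant Perplexity queries.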
The System Prompt Architecture
Each persona's system prompt was not a simple instruction -- it was a layered document with five components:
LAYER 1: IDENTITY
Who you are, your archetype, your emotional core
LAYER 2: VOICE RULES
Sentence length distribution, vocabulary whitelist,
vocabulary blacklist, rhetorical patterns
LAYER 3: ANTI-CONTAMINATION
"You are NOT [other founder]. If you find yourself
using [specific phrases], stop and rewrite."
LAYER 4: CHAPTER BRIEF
What this specific chapter is about, what angle
this founder brings, what tension to explore
LAYER 5: CONTEXT INJECTION
Research dossier, convergence map, previous chapters
(for Wave 2), coherence report
The anti-contamination layer (Layer 3) was crucial. Without it, Augusto and Guilherme's voices converged within three chapters. With it, convergence was reduced but not eliminated -- which is why we still needed the cross-voice review in Phase 5.
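Assembling the five layers is mechanical; a sketch follows. The layer texts here are tiny placeholders, and the section-header convention (`## NAME`) is an assumption, not the pipeline's actual format.

```python
# Sketch of assembling the five layers into one system prompt.
def build_system_prompt(identity: str, voice_rules: str,
                        anti_contamination: str, chapter_brief: str,
                        context: str) -> str:
    layers = [
        ("IDENTITY", identity),
        ("VOICE RULES", voice_rules),
        ("ANTI-CONTAMINATION", anti_contamination),
        ("CHAPTER BRIEF", chapter_brief),
        ("CONTEXT", context),
    ]
    # One labeled section per layer, in fixed order, so the model always
    # sees identity before rules and rules before per-chapter material.
    return "\n\n".join(f"## {name}\n{body}" for name, body in layers)

prompt = build_system_prompt(
    "You are Augusto Lins...", "Longer sentences, measured register...",
    "You are NOT Andre Street...", "Chapter 3 covers the Angels...",
    "Dossier: ...")
```

Keeping the layers as separate arguments (instead of one pre-concatenated blob) is what makes Layer 4 and Layer 5 swappable per chapter while Layers 1-3 stay frozen across the whole book.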
Voice Persona Engineering
Each founder got a dedicated system prompt with:
PERSONA: Augusto Lins
ARCHETYPE: The Engineer Who Became a Humanist
VOICE: Measured, deep, quiet authority. Longer sentences.
VOCABULARY: "five seconds", "loyalty moat", "the Angels",
"the most complex component is the human being"
THEMES: Obsessive service, late-career leap, NPS as compass
TENSION: The engineer who discovered the differentiator is not technology
FORBIDDEN: Never sound aggressive. Never use war metaphors.
That is Andre's register, not yours.
MODEL: Claude Opus
CONTEXT: Full research dossier + ebook "5 Seconds for the Future"
Four personas, four distinct registers:
| Founder | Voice Signature | Key Markers |
|---|---|---|
| Augusto Lins | Measured, reflective | Engineering metaphors, domestic imagery, NPS |
| Andre Street | Aggressive, percussive | Short sentences, war language, "fire your ego" |
| David Velez | Analytical, contained | VC vocabulary, "infinite game", strategic distance |
| Guilherme Benchimol | Vulnerable, confessional | Marathon metaphors, admission of pain/shame |
The QA report confirmed all four voices were distinguishable without reading the founder's name -- which was our acceptance criterion.
The Fact-Checking Pipeline
This was the most sobering part of the project.
What Perplexity found
The fact-checker verified 87 items across the manuscript and found 19 errors:
- 7 critical (wrong data that would embarrass the author)
- 8 moderate (imprecise data that could mislead)
- 4 minor (missing context, not wrong)
Separately, it flagged 5 fabricated citations.
The most dangerous failure mode: LLMs fabricate convincing quotes and attribute them to real people.
FABRICATED CITATION #1:
Text: "Give me thirty days. If you're not satisfied,
I'll come here personally to pick up the machine."
Attribution: Augusto Lins (at a bakery in Copacabana)
Status: NOT VERIFIED. The bakery scene does not appear
in any research dossier. Likely LLM fabrication.
FABRICATED CITATION #2:
Text: "These people aren't asking for a credit card.
They're asking to be treated like human beings."
Attribution: Cristina Junqueira (Nubank co-founder)
Status: NOT VERIFIED. Not in any dossier. Probably
fabricated as "narrative reconstruction."
FABRICATED CITATION #5:
Entire scene: "shopkeeper in rural Minas Gerais"
(sick wife, 20 minutes on the line, microcredit)
Status: NOT IN ANY DOSSIER. Fabricated anecdote.
The pattern: LLMs generate "too perfect" anecdotes that fit the narrative thesis exactly. They feel real because they are structurally plausible -- but they have no source.
Lesson: every quote attributed to a real person must be cross-referenced against primary sources. LLMs cannot be trusted with attribution.
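The simplest mechanical baseline for that cross-referencing is an exact-substring check against the dossiers: a quote survives only if it appears somewhere in the research material. This is a sketch with placeholder dossier text; real matching would need fuzzy or normalized comparison to survive paraphrase.

```python
# Minimal cross-referencing pass: a quote is kept only if it appears
# verbatim in at least one research dossier. Whitespace and case are
# normalized; anything stronger (fuzzy matching) is left out of the sketch.
def verify_quote(quote: str, dossiers: list[str]) -> bool:
    needle = " ".join(quote.lower().split())
    return any(needle in " ".join(d.lower().split()) for d in dossiers)

# Placeholder dossier content for illustration only.
dossiers = ["interview transcript: 'give me thirty days', he said (2019)"]
verify_quote("Give me thirty days", dossiers)       # found -> keep
verify_quote("I'll pick up the machine", dossiers)  # not found -> flag
```

An exact-match pass like this would have caught all five fabrications above, since by definition they appeared in no dossier at all; its weakness is false alarms on legitimate paraphrases, which is why the pipeline still routed flagged quotes to Perplexity for a web-grounded check.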
The David Velez education error
One critical factual error: the manuscript stated Velez graduated from "Universidad de los Andes" in Colombia. The research dossier shows his undergraduate degree was from Stanford (Management Science and Engineering, class of 2005). This is the kind of error that destroys credibility -- and it passed through multiple writing agents before the fact-checker caught it.
The Redundancy Problem
This was the hardest engineering challenge -- harder than voice distinction, harder than fact-checking.
What happens when 4 agents write independently
Four Opus instances, each writing as a different founder about the same themes, produce remarkably similar strong points. The structural analysis (run by Gemini on the full manuscript) found:
REDUNDANCY REPORT (selected):
"Fire your ego every morning" (Andre Street)
-> Appears in: Ch.3, Ch.4, Ch.6, Ch.8
-> Verdict: EXCESSIVE -- 4 occurrences
"Educate before you sell" (Guilherme Benchimol)
-> Appears in: Ch.2, Ch.3, Ch.5, Ch.8
-> Verdict: EXCESSIVE -- 4 occurrences
Angel traveling 50km at night to deliver a card machine:
-> Appears in: Ch.3 AND Ch.5 with nearly identical details
-> Verdict: DUPLICATE -- keep in Ch.3 only
Medellin kidnapping + shopping mall bomb (David Velez):
-> Appears in: Prologue, Ch.1, Ch.6
-> Verdict: 3 occurrences -- reduce to 2
Why this happens
Each agent receives the same chapter brief and dossier. The strongest anecdotes -- the ones with the most narrative power -- get selected by every agent independently. The redundancy is not a bug in any single agent; it is an emergent property of parallel writing.
The fix
We implemented a redundancy budget: each catchphrase gets a maximum of 2 appearances in the book (first occurrence as revelation, second as deliberate callback). The third and fourth occurrences were cut or paraphrased during Phase 6.
The broader principle: multi-agent writing requires a deduplication pass that no single agent can do alone. Gemini's 1M+ context window was essential here -- it could read the entire manuscript and identify cross-chapter repetitions that individual agents, writing in isolation, could never see.
The Voice Confusion Problem
Chapters where two founders became indistinguishable
The structural analysis flagged Chapters 3 and 5 as problem zones. In these chapters, Augusto Lins and Guilherme Benchimol's voices converged -- both reflective, both talking about customer service, both using similar vocabulary.
VOICE ANALYSIS:
Augusto: Partially distinguishable
Markers: engineer vocabulary, domestic imagery, longer sentences
PROBLEM: In Ch.3 and Ch.5, sounds too much like Guilherme
Guilherme: Partially distinguishable
Markers: marathon metaphors, confession of shame, financial refs
PROBLEM: In Ch.3 and Ch.5, sounds too much like Augusto
Andre: Clearly distinguishable (always)
David: Clearly distinguishable (always)
The fix: intensify each persona's unique markers. Augusto gets more engineering language and NPS references. Guilherme gets more marathon/running metaphors and admissions of vulnerability. The rewrite in Phase 6 sharpened these distinctions.
Lesson: voice persona prompts are necessary but not sufficient. You need a cross-voice review pass where each persona reads the other three and flags convergence.
The Accent Pipeline Bug
One assembly agent (responsible for merging four voices into interleaved chapters) dropped all Portuguese diacritical marks from the output: "producao" instead of "produção", across the board. The entire Part 1 manuscript came out accent-free.
The fix was trivial (run fix_accents.py), but the root cause was interesting: the assembly agent was processing so much text that its output quality degraded on surface-level features (accents, em-dashes) even as the narrative content remained good.
Lesson: always run a dedicated accent/encoding check as a separate pipeline step, not as part of the writing agent's responsibilities.
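One way to implement that check is a stripped-form lookup: if a token matches the accent-stripped spelling of a known PT-BR word but not the word itself, it was probably de-accented. This is a heuristic sketch; the `KNOWN` vocabulary here is a tiny illustrative sample, not the word list `fix_accents.py` actually used (which this article does not show).

```python
import re
import unicodedata

# Heuristic accent sweep: flag tokens that look like accent-stripped
# versions of known Portuguese words. Illustrative vocabulary only.
KNOWN = {"produção", "operação", "coração", "então"}
STRIPPED = {unicodedata.normalize("NFKD", w).encode("ascii", "ignore")
            .decode(): w for w in KNOWN}

def missing_accents(text: str) -> list[tuple[str, str]]:
    tokens = re.findall(r"\w+", text.lower())
    return [(t, STRIPPED[t]) for t in tokens
            if t in STRIPPED and t not in KNOWN]

missing_accents("A producao cresceu, então a operação mudou.")
# -> [("producao", "produção")]
```

The NFKD trick (decompose, then drop combining marks) is what makes the lookup table cheap to build: "produção" decomposes to "producao" plus two combining characters, which the ASCII encode discards.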
The final QA report confirmed: zero words without proper PT-BR accents in the published manuscript.
Chapter 7: The "Everyone Agrees" Problem
The structural analysis flagged Chapter 7 (about AI) as lacking narrative tension:
Chapter 7 (AI): MEDIUM intensity
Content relevant, but tone more essayistic than narrative.
PROBLEM: All four founders say essentially the same thing:
"AI is a tool, not a replacement." No tension, no disagreement,
no risk. The chapter needs a moment of doubt or real failure.
When four agents are told "write what this founder thinks about AI," and all four founders are publicly optimistic about AI, you get four versions of the same optimistic take. The emergent pattern: multi-agent systems amplify consensus and suppress dissent.
The fix: we manually introduced a moment of doubt -- a concrete failure anecdote -- to create the tension the agents could not generate on their own.
The "Street Always Delivers First" Pattern
An unexpected observation from the pipeline: Andre Street's persona consistently produced output faster and with more energy than the other three. His system prompt specified "aggressive, percussive, short sentences, urgency" -- and the writing agent internalized this as raw speed.
The agents writing Augusto (measured, reflective) and David (analytical, strategic) produced longer, more deliberate text. Guilherme's agent produced the most emotionally charged text but took the longest to reach the word count.
The persona's urgency mapped to the agent's behavior. We did not design this. The writing model (Opus) treated the persona's emotional register as an instruction about pacing. This has implications for agent design: persona engineering affects not just output quality but output characteristics like length, density, and generation speed.
Results
| Metric | Value |
|---|---|
| Final word count | 30,329 |
| Total agent calls | 43 |
| APIs used | 6 (Claude Opus, Claude Sonnet, Perplexity, Gemini, GPT-4o, Groq/Llama) |
| Max parallel agents | 9 |
| Pipeline phases | 10 |
| Factual errors caught | 19 (7 critical, 8 moderate, 4 minor) |
| Fabricated citations caught | 5 |
| Duplicate anecdotes removed | 4 |
| Voice confusion zones fixed | 2 chapters |
| Accent bug: words without diacriticals | 0 (after fix) |
| Total API cost | Under $10 |
| Published at | alexandrecaramaschi.com/founders |
The estimated cost from the orchestration plan was $110-165 for the full 48,000-word target. The actual book came in at 30,329 words (we cut aggressively for quality), and the actual API spend was under $10.
Lessons Learned
1. Redundancy is the primary failure mode of parallel multi-agent writing
Not hallucination, not voice confusion -- redundancy. When N agents write about the same topic independently, they converge on the same strong points. You need a deduplication pass with a model that can see the entire manuscript at once.
2. Fact-checking must be a separate agent with web access
LLMs hallucinate citations with high confidence. Perplexity's web-grounded search was the only reliable way to verify quotes and data points. 5 fabricated citations in 30,000 words is a 0.016% rate -- small in percentage, catastrophic in credibility.
3. Voice personas need cross-validation, not just prompts
System prompts create initial voice distinction. But over 30,000 words, voices drift toward the mean. The fix is a review pass where each persona reads the full manuscript and flags where it sounds like another founder.
4. Use each model for its strength
Opus for narrative depth. Perplexity for verified facts. Gemini for manuscript-level coherence. Groq for speed. GPT-4o for creative variations. Sonnet for code and formatting. No single model excels at all of these.
5. Multi-agent systems amplify consensus
If all sources agree, all agents will agree, and the output will lack tension. Editorial judgment -- the decision to introduce conflict where the data shows none -- remains a human responsibility.
6. Persona urgency maps to agent behavior
An aggressive, urgent persona prompt produces faster, shorter output. A reflective, measured persona prompt produces slower, longer output. This is not documented anywhere -- it is emergent behavior worth designing for.
7. Surface-level quality degrades under load
An agent handling complex narrative assembly may drop accents, formatting, or em-dashes. Always run dedicated quality passes for surface features as separate pipeline steps.
8. The cost is negligible; the architecture is everything
Under $10 in API calls for a 30,000-word, fact-checked, multi-voice book. The engineering cost is in the orchestration design, not the API spend.
The FinOps Perspective
The original orchestration plan estimated $110-165 for the full 48,000-word target across 43 agent calls. Here is the breakdown by API:
API Calls Est. Tokens Est. Cost
----------------------------------------------------
Claude Opus 19 ~1,500,000 $80-120
Perplexity Sonar Pro 7 ~350,000 $8-12
Gemini 2.5 Pro 4 ~800,000 $10-15
ChatGPT GPT-4o 3 ~200,000 $3-5
Groq Llama 3.3 70B 6 ~600,000 $1-2
Claude Sonnet 4 ~400,000 $8-10
----------------------------------------------------
TOTAL 43 ~3,850,000 $110-165
The actual spend came in under $10. Why the 10x difference?
- Aggressive editing cut 18,000 words. The manuscript went from a 48,000-word target to 30,329 published words. Less text = fewer generation tokens.
- Groq is nearly free. At $0.59/M input tokens, the 6 Groq calls cost pennies.
- Gemini's free tier covered our usage. The 4 Gemini calls fit within Google's generous free allocation.
- We reused outputs aggressively. Dossiers from Phase 1 were passed to every subsequent phase without regeneration.
The cost per word of the final manuscript: approximately $0.0003. For context, a human ghostwriter charges $0.50-$2.00 per word for this type of work.
What We Would Do Differently
- Anti-redundancy briefs: give each agent a list of anecdotes already claimed by other agents, updated in real-time as they write.
- Adversarial voice testing: before the full pipeline, run a blind test where a reviewer tries to identify which founder is speaking from unmarked excerpts.
- Tension injection: explicitly assign one agent the role of "dissenter" -- someone whose job is to find disagreements and introduce doubt.
- Streaming coherence monitor: instead of checking coherence after each wave, stream outputs to Gemini in real-time and get incremental feedback.
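The anti-redundancy brief in the first bullet could be backed by a shared registry that agents consult before using an anecdote. This is a hypothetical design sketch for the proposed improvement, not something the pipeline implemented; a real-time version with concurrent agents would also need a lock around `claim`.

```python
# Hypothetical anti-redundancy registry: the first agent to claim an
# anecdote owns it; every other agent sees it as already taken and must
# pick different material.
class AnecdoteRegistry:
    def __init__(self) -> None:
        self._claims: dict[str, str] = {}

    def claim(self, anecdote: str, agent: str) -> bool:
        if anecdote in self._claims:
            return False  # already owned by another agent
        self._claims[anecdote] = agent
        return True

    def claimed_by_others(self, agent: str) -> list[str]:
        # The "already claimed" list that would be injected into
        # each agent's chapter brief.
        return [a for a, owner in self._claims.items() if owner != agent]

reg = AnecdoteRegistry()
reg.claim("Angel drives 50km at night", "opus-augusto")  # True: first claim
reg.claim("Angel drives 50km at night", "opus-andre")    # False: taken
```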
Stack Reference
- Orchestrator: geo-orchestrator (custom multi-model pipeline)
- Primary writing: Claude Opus 4.6 (Anthropic)
- Research + fact-check: Perplexity Sonar Pro
- Coherence analysis: Gemini 2.5 Pro (Google)
- Creative variations: ChatGPT GPT-4o (OpenAI)
- Fast iteration: Groq (Llama 3.3 70B)
- Formatting + deploy: Claude Sonnet (Anthropic)
- Frontend: Next.js 16 + React 19 + Tailwind 4
- Hosting: Vercel
- Published: alexandrecaramaschi.com/founders
Alexandre Caramaschi is CEO of Brasil GEO, former CMO of Semantix (Nasdaq), and co-founder of AI Brasil. This article documents the technical pipeline behind "5 Fundadores, 5 Segundos, 1 Futuro," a multi-agent editorial production experiment.