Ayush kumar

How I Built and Evaluated an AI Book-Writing System with ACP and Promptfoo

Introduction

Have you ever wondered if AI could write an entire book — from idea to polished chapters — without human help?
What if multiple AI agents could collaborate, like a team of ghostwriters, editors, and publishers?

That’s exactly what I explored in this project:
✅ ACP (Agent Communication Protocol) to build a multi-agent system
✅ OpenAI GPT-4o to generate and edit text
✅ Promptfoo to evaluate the agents’ outputs automatically

In this post, I’ll share how I built acp-booksmith, an AI-powered book creation pipeline, how it works, and how I used Promptfoo to test it like a pro.

What is ACP (Agent Communication Protocol)?

ACP, developed by IBM, is an open standard that enables AI agents, apps, and humans to communicate smoothly, regardless of their underlying technology stacks.

Think of it as a universal language for agents.
With ACP, I could easily connect multiple agents like:

  • outline agent → drafts book structure
  • chapter agent → writes full chapters
  • editor agent → polishes text
  • compiler agent → stitches the final book

They all run on a local server (http://localhost:8000) and talk to each other through standardized ACP calls.
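
To make this concrete, here's a minimal sketch of what a standardized ACP call looks like from the client side (the same acp-sdk pattern the orchestrator in Step 4 uses, assuming the agent server from Step 3 is running):

import asyncio

from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

async def demo():
    # Connect to the local ACP server and call the 'outline' agent
    async with Client(base_url="http://localhost:8000") as client:
        run = await client.run_sync(
            agent="outline",
            input=[Message(parts=[MessagePart(content="My Book Title", content_type="text/plain")])]
        )
        print(run.output[0].parts[0].content)

asyncio.run(demo())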

What is Promptfoo?

Promptfoo is a powerful open-source framework for evaluating and stress-testing LLM systems, agents, and prompt chains.

Think of it as your AI quality assurance toolkit — it helps you:

  • Define structured test cases (via YAML or CLI)
  • Compare model or agent outputs across providers
  • Run automated checks (e.g., “is the output non-empty?”, “does it follow the format?”)
  • Visualize results in an interactive web viewer
  • Launch red teaming campaigns to probe for safety, bias, and robustness issues

In this project, I used Promptfoo not just to test individual OpenAI model outputs, but to evaluate the full ACP-booksmith system, covering how all the agents work together to deliver a polished, end-to-end book-writing pipeline.

By combining ACP + Promptfoo, I got both system-level validation and security-level insights — all in one workflow.

Resources

ACP: https://github.com/i-am-bee/acp
Promptfoo: https://github.com/promptfoo/promptfoo

Step-by-Step Process to Build and Evaluate an AI Book-Writing System with ACP and Promptfoo

Step 1: Set Up the Project Environment

Before diving in, make sure your system is ready:

python --version   # >= 3.11
node --version     # >= 20.x
npm --version      # >= 10.x


Then, initialize the project:

uv init --python '>=3.11' my_acp_project
cd my_acp_project
uv add acp-sdk


Step 2: Install all required libraries & set OpenAI API key

Install Python libraries

Run this one command to install all needed dependencies:

pip install \
acp-sdk==1.0.0 \
fastapi==0.115.0 \
uvicorn==0.29.0 \
openai==1.30.1 \
gradio==4.28.3 \
reportlab==4.1.0 \
requests==2.32.3


This will install:

✅ acp-sdk → for the multi-agent protocol
✅ fastapi + uvicorn → for the server
✅ openai → for GPT calls
✅ gradio → for the web interface
✅ reportlab → for PDF generation
✅ requests → for HTTP calls

To install Promptfoo, run the following command:

npm install -g promptfoo


Export your OpenAI API key

Before running anything (main.py, agent.py, or Gradio app), set your API key:

export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"


(Replace with your real key from your OpenAI account.)
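
A quick sanity check that the key is actually visible to Python before you start the server (a tiny optional snippet, not part of the project):

import os

# Fail fast if the key was not exported in this shell
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
print("OPENAI_API_KEY detected")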

Step 3: Write the Agents (agent.py)

I built four key agents:

  • outline agent → Generates a detailed book outline
  • chapter agent → Writes a full chapter from a summary
  • editor agent → Edits the chapter for style and clarity
  • compiler agent → Combines all content into a single book

These agents use openai.AsyncOpenAI under the hood and communicate via ACP.

import asyncio
import os
from collections.abc import AsyncGenerator

import openai
from acp_sdk.models import Message
from acp_sdk.server import Context, RunYield, RunYieldResume, Server

# Initialize OpenAI async client using environment variable API key
client = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Create ACP server instance to register agents
server = Server()

# Helper function to call OpenAI API with given prompt and token limit
async def call_openai(prompt, max_tokens=1000):
    try:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content  # Return generated text
    except Exception as e:
        print(f"[OpenAI API error]: {type(e).__name__}: {e}")
        return "[Error: Failed to generate content]"

# Agent: Generates book outline based on title
@server.agent()
async def outline(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    title = input[0].parts[0].content  # Extract title from input
    prompt = f"Create a detailed book outline with chapters and sections for the book titled '{title}'."
    outline_text = await call_openai(prompt)  # Get outline from OpenAI
    yield Message(parts=[{"content": outline_text, "content_type": "text/plain"}])

# Agent: Generates full chapter text (~3000 words) from chapter summary
@server.agent()
async def chapter(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    chapter_summary = input[0].parts[0].content  # Extract chapter summary
    prompt = f"Write a full book chapter (~3000 words) based on this summary:\n{chapter_summary}"
    chapter_text = await call_openai(prompt, max_tokens=3000)  # Get chapter draft
    yield Message(parts=[{"content": chapter_text, "content_type": "text/plain"}])

# Agent: Edits chapter text for clarity, style, and coherence
@server.agent()
async def editor(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    raw_text = input[0].parts[0].content  # Extract raw chapter text
    prompt = f"Please edit and polish the following chapter for clarity, style, and coherence:\n\n{raw_text}"
    edited_text = await call_openai(prompt, max_tokens=3000)  # Get edited version
    yield Message(parts=[{"content": edited_text, "content_type": "text/plain"}])

# Agent: Compiles all parts (outline + chapters) into one full text
@server.agent()
async def compiler(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
    compiled = "\n\n".join(msg.parts[0].content for msg in input)  # Concatenate all inputs
    yield Message(parts=[{"content": compiled, "content_type": "text/plain"}])

# Run the ACP server to start serving agent endpoints
server.run()

Run them with:

uv run agent.py


Check they're live (the response should list the four registered agents):

curl http://localhost:8000/agents


Step 4: Create the Orchestrator (orchestrator.py)

This script:

  • Calls each agent in order
  • Collects outlines, chapters, edited content
  • Writes output to final_book.txt and final_book.pdf using reportlab

The magic here? It acts like a project manager, coordinating the AI team.

import asyncio

from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart
from reportlab.pdfgen import canvas  # Library to generate PDF files


# Helper function to call a specific agent with input text
# (the model argument is accepted for future use; the agents in agent.py currently hardcode gpt-4o)
async def call_agent(client, agent_name, input_text, model):
    # Sends request to ACP agent and returns the content of the response
    run = await client.run_sync(
        agent=agent_name,
        input=[Message(parts=[MessagePart(content=input_text, content_type="text/plain")])]
    )
    return run.output[0].parts[0].content

# Main orchestrator function to run full book creation pipeline
async def main(title="The Quantum Cat's Journey", model="gpt-4o", progress_callback=None):
    async with Client(base_url="http://localhost:8000") as client:
        if progress_callback:
            progress_callback(0.05)  # Update progress bar if using UI (like Gradio)

        # Step 1: Generate book outline
        outline = await call_agent(client, "outline", title, model)
        if progress_callback:
            progress_callback(0.2)

        chapters = []
        # Step 2: Generate 3 chapters (can increase this later if desired)
        for i in range(1, 4):
            chapter_prompt = f"{outline} - Chapter {i}"  # Prepare chapter input
            chapter_content = await call_agent(client, "chapter", chapter_prompt, model)
            if progress_callback:
                progress_callback(0.2 + i * 0.15)

            # Step 3: Edit chapter using editor agent
            edited_chapter = await client.run_sync(
                agent="editor",
                input=[Message(parts=[MessagePart(content=chapter_content, content_type="text/plain")])]
            )
            chapters.append(edited_chapter.output[0].parts[0].content)

        # Step 4: Combine outline + chapters into full book text
        full_book = f"{outline}\n\n" + "\n\n".join(chapters)
        with open("final_book.txt", "w") as f:
            f.write(full_book)
        if progress_callback:
            progress_callback(0.85)

        # Step 5: Export final book to PDF format
        pdf = canvas.Canvas("final_book.pdf")
        pdf.setFont("Helvetica", 12)
        y = 800  # Set initial vertical position on PDF page
        for line in full_book.split("\n"):
            pdf.drawString(50, y, line[:100])  # Draw text line, truncate if too long
            y -= 15  # Move down by 15 points
            if y < 50:  # If near bottom, start new page
                pdf.showPage()
                pdf.setFont("Helvetica", 12)
                y = 800
        pdf.save()  # Save the PDF file

        if progress_callback:
            progress_callback(1.0)  # Mark as complete in UI if applicable
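Before wiring up the CLI in the next step, you can smoke-test the orchestrator on its own (assuming the ACP server from Step 3 is running on localhost:8000):

import asyncio

from orchestrator import main

# Runs the full pipeline and writes final_book.txt and final_book.pdf
asyncio.run(main(title="The Quantum Cat's Journey"))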

Step 5: Build a CLI (main.py)

To make it user-friendly, I added:

  • A CLI menu to run the full book generation pipeline
  • Option to extend later with more commands or features

import asyncio
import sys

from orchestrator import (
    main as orchestrator_main,  # Import the orchestrator main function
)


# Function to display a simple text menu in the terminal
def print_menu():
    print("\nWelcome to acp-booksmith!")
    print("Select an option:")
    print("1. Run book generation workflow")
    print("2. Exit")

# Main loop function for CLI (Command Line Interface)
def main():
    while True:
        print_menu()  # Show the menu options
        choice = input("Enter choice [1-2]: ")  # Get user input
        if choice == "1":
            asyncio.run(orchestrator_main())  # Run orchestrator async function to generate book
            print("\n✅ Book generation completed! Check final_book.txt and final_book.pdf.\n")
        elif choice == "2":
            print("Goodbye!")  # Exit message
            sys.exit()  # Exit the program
        else:
            print("Invalid choice. Please enter 1 or 2.")  # Handle invalid input

# Entry point when running script directly
if __name__ == "__main__":
    main()

Now you can just run:

python3 main.py


And it’ll walk you through the process.

After setting up the agent.py, orchestrator.py, and main.py files, we run our book system in the terminal to check if everything works end-to-end. We start the ACP server with uv run agent.py and then open another terminal to send test prompts (usually three to four), like generating an outline, drafting chapters, or editing content using curl commands. This allows us to confirm that the agents communicate correctly, OpenAI API calls succeed, and we receive polished outputs in both text and PDF formats — all orchestrated smoothly by the system.

Prompt 1 — Generate Outline

curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "outline", "input": [{"role": "user", "parts": [{"content": "The Quantum Cat'\''s Journey", "content_type": "text/plain"}]}]}'


Prompt 2 - Chapter Agent

curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "chapter", "input": [{"role": "user", "parts": [{"content": "Chapter 1: The Cat Enters the Quantum Realm", "content_type": "text/plain"}]}]}'


Prompt 3 - Editor Agent

curl -X POST http://localhost:8000/runs \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "editor",
    "input": [
      {
        "role": "user",
        "parts": [
          { "content": "This is a raw chapter draft that needs editing for clarity and flow.", "content_type": "text/plain" }
        ]
      }
    ]
  }'


Prompt 4 - Compiler Agent

curl -X POST http://localhost:8000/runs \
  -H "Content-Type: application/json" \
  -d '{
    "agent_name": "compiler",
    "input": [
      {
        "role": "user",
        "parts": [
          { "content": "Outline content here", "content_type": "text/plain" }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "content": "Chapter 1 content here", "content_type": "text/plain" }
        ]
      },
      {
        "role": "user",
        "parts": [
          { "content": "Chapter 2 content here", "content_type": "text/plain" }
        ]
      }
    ]
  }'


Step 6: Add a Browser UI with Gradio (gradio_app.py)

Not everyone loves the terminal, so I added a Gradio app!

import asyncio
import os
import shutil

import gradio as gr
from orchestrator import main  # Import orchestrator to run agent pipeline


# Async function to generate the book using orchestrator and update Gradio progress bar
async def generate_book_async(title, model, progress=gr.Progress()):
    # Clear old book files if they exist
    for file in ["final_book.txt", "final_book.pdf"]:
        if os.path.exists(file):
            os.remove(file)

    # Run the orchestrator with given title + model, passing in progress callback
    await main(title, model=model, progress_callback=progress)

    # Read final book text from generated TXT file
    with open("final_book.txt", "r") as f:
        book_text = f.read()

    # Return book text + file paths for download components
    return book_text, "final_book.txt", "final_book.pdf"

# Wrapper to run async function inside sync Gradio button click
def generate_book(title, model):
    return asyncio.run(generate_book_async(title, model))

# Build Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# 🐱 Quantum Cat Book Generator")  # App title
    gr.Markdown("Enter a book title, pick a model, and generate a complete polished book with TXT and PDF downloads.")

    with gr.Row():
        title_input = gr.Textbox(label="Title", placeholder="Enter book title...")  # Input box for title
        model_selector = gr.Dropdown(choices=["gpt-4o", "gpt-3.5-turbo"], value="gpt-4o", label="Model")  # Model dropdown

    output_text = gr.Textbox(label="Generated Book", lines=20)  # Output textbox to display book
    txt_download = gr.File(label="Download TXT")  # Download button for .txt
    pdf_download = gr.File(label="Download PDF")  # Download button for .pdf

    generate_btn = gr.Button("🚀 Generate Book")  # Main action button

    # Link button click to generate_book function with inputs and outputs
    generate_btn.click(
        fn=generate_book,
        inputs=[title_input, model_selector],
        outputs=[output_text, txt_download, pdf_download]
    )

# Launch the Gradio app on localhost:7860 (share=True also creates a temporary public link)
demo.launch(share=True)

This lets you:

  • Enter a book title
  • Choose the OpenAI model (gpt-4o or gpt-3.5-turbo)
  • Click “Generate” and get the full book in the browser, with TXT and PDF download buttons

Launch it with:

python3 gradio_app.py


Open in your browser at:

http://localhost:7860


Step 7: Launch Promptfoo Interactive CLI

Once Promptfoo is installed (you can verify with promptfoo --version), run the following command to start the interactive setup:

promptfoo init


You'll see a terminal-based interface prompting:

"What would you like to do?"

Use your arrow keys to navigate and select your intention. You can choose from:

  • Not sure yet (explore options)
  • Improve prompt and model performance
  • Improve RAG performance
  • Improve agent/chain of thought performance
  • Run a red team evaluation

Step 8: Choose Your First Model Provider (We’re Only Using OpenAI Here)

After choosing your evaluation goal, Promptfoo will ask:

"Which model providers would you like to use?"

In this guide, we're using OpenAI as the model provider.

  • Use the arrow keys to select OpenAI
  • Hit space to check the box
  • Then press Enter to continue

Step 9: Initialize Promptfoo Evaluation

Once you've selected the model provider (in this case, we’re starting with OpenAI), Promptfoo will automatically generate the necessary setup files:

  • README.md
  • promptfooconfig.yaml

Step 10: Write Promptfoo Configuration

promptfooconfig.yaml

  • Defines test prompts, agents, and JS-based assertions

description: 'ACP Agent Evaluation' # Description of this evaluation suite

prompts:
  - '{{book_title}}' # Dynamic prompt variable used in each test case

providers:
  - id: file://./provider.py # Connects to local provider script
    label: ACP Outline Agent # Label shown in Promptfoo UI
    config:
      agent_name: outline # Tell provider.py to call the 'outline' agent

  - id: file://./provider.py
    label: ACP Chapter Agent
    config:
      agent_name: chapter # Tell provider.py to call the 'chapter' agent

  - id: file://./provider.py
    label: ACP Editor Agent
    config:
      agent_name: editor # Tell provider.py to call the 'editor' agent

defaultTest:
  assert:
    # ✅ Check the output is a string (using JS in Promptfoo)
    - type: javascript
      value: typeof output === 'string'

    # ✅ Check the output is not an empty string
    - type: javascript
      value: output.trim().length > 0

tests:
  - description: 'Generate outline for book' # Test outline agent
    vars:
      book_title: "The Quantum Cat's Journey"

  - description: 'Generate chapter draft' # Test chapter agent
    vars:
      book_title: "The Quantum Cat's Journey - Chapter 1"

  - description: 'Edit draft content' # Test editor agent
    vars:
      book_title: "Refine The Quantum Cat's Journey draft"

provider.py

  • Sends HTTP POST to localhost:8000/runs for each agent
  • Extracts clean text outputs
  • Returns result to Promptfoo

import requests  # Import HTTP requests library


def call_api(prompt, config=None, context=None):
    agent_name = (config or {}).get("agent_name", "outline")  # Get agent name from config (None-safe), default to 'outline'
    url = "http://localhost:8000/runs"  # ACP server endpoint

    payload = {
        "input": [{
            "text": prompt,  # Original prompt text
            "parts": [{
                "type": "text",  # Content type (text)
                "content": prompt  # Content body
            }]
        }],
        "agent_name": agent_name  # Target agent to call (outline, chapter, editor)
    }

    headers = {"Content-Type": "application/json"}  # Set JSON header

    try:
        response = requests.post(url, json=payload, headers=headers)  # Make POST request to ACP server
        response.raise_for_status()  # Raise error if HTTP response is not 200

        result = response.json()  # Parse response JSON

        # Check if ACP server returned an error
        if result.get('error'):
            return {"output": f"[ERROR] {result['error'].get('message', 'Unknown error')}"}

        # Extract and return the first content part from output
        outputs = result.get('output', [])
        if outputs:
            first_output = outputs[0]
            if 'parts' in first_output and first_output['parts']:
                first_part = first_output['parts'][0]
                if 'content' in first_part:
                    return {"output": str(first_part['content']).strip()}

        return {"output": "[ERROR] No valid content found."}  # Fallback if no valid output

    except Exception as e:
        return {"output": f"[ERROR] Exception during call: {e}"}  # Catch and report exceptions
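Because call_api is a plain function, you can also sanity-check provider.py outside Promptfoo (with the ACP server running):

from provider import call_api

# Calls the 'outline' agent through the same code path Promptfoo uses
result = call_api("The Quantum Cat's Journey", config={"agent_name": "outline"})
print(result["output"][:500])  # Preview the first 500 characters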

Step 11: Run Evaluation

Now that everything is configured, it's time to run your first evaluation!

In the terminal, run the following command:

promptfoo eval


What Are We Testing? ACP Agents or OpenAI Models?

When running acp-booksmith with Promptfoo, it's important to understand what part of the system we are evaluating.

System architecture overview

We built a multi-agent system using IBM’s ACP (Agent Communication Protocol).

The ACP agents are:

  • outline → generates a book outline.
  • chapter → writes detailed chapters.
  • editor → polishes the text.
  • compiler → stitches everything together.

Each agent runs inside a Python server (agent.py) on:

http://localhost:8000


Inside the agents, we use:

openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


to call GPT-4o models.

How Promptfoo fits into the system

  • Promptfoo does NOT connect directly to OpenAI models.
  • Instead, Promptfoo runs test cases defined in:
promptfooconfig.yaml


It sends these prompts to:

http://localhost:8000/runs


using provider.py, which talks to ACP agents.

The ACP agents receive the request, process it, and, inside their own logic, call OpenAI’s API to generate the response.

Why test through ACP agents?

We want to test how well our entire system works, not just the raw OpenAI output.

We care about:

  • Are the agents responding correctly?
  • Is the outline agent producing structured outlines?
  • Does the editor agent polish the text properly?
  • Can we stitch the book end-to-end?

This gives us a real-world evaluation of:

  • agent design,
  • orchestration,
  • and LLM usage, all together.
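
For example, on top of the generic non-empty assertions in promptfooconfig.yaml, a domain-specific check could verify that the outline agent's output actually looks like an outline. A hypothetical sketch (not part of the project):

def outline_is_structured(text: str) -> bool:
    # Hypothetical quality check: a usable outline should be non-empty
    # and mention at least one chapter
    return bool(text.strip()) and "chapter" in text.lower()

print(outline_is_structured("Chapter 1: The Cat Enters the Quantum Realm"))  # True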

Step 12: Visualize and Analyze Agent Outputs with Promptfoo Web Viewer

Once you’ve completed your promptfoo eval and run:

promptfoo view


You will see:

A local server starts at:

http://localhost:15500


You can open it in your browser (just press y when asked).

What you'll see on the web viewer

For each test case, you'll get:

  • Description:
    What was tested (e.g., generate outline, draft chapter, edit content)

  • Variables:
    Input values, like book_title: The Quantum Cat's Journey.

  • Outputs:
    What the agent produced, e.g.,
      • Book title
      • Chapter summaries
      • Edited sections

  • Assertions (PASS/FAIL):
    Green ✅ if the output is a string and not empty (based on your promptfooconfig.yaml).

  • Comparison view:
    See side-by-side results from
    ACP Outline Agent, ACP Chapter Agent, ACP Editor Agent.

  • Model and agent details:
    Which agent was called, and what label it was given.

Step 13: Run Red Teaming to Stress-Test ACP Agents

After you've run a basic promptfoo eval and visually confirmed that your ACP agents (outline, chapter, editor) pass normal test cases, it's time to turn up the heat and dive into red teaming.

Promptfoo has a built-in Red Team mode designed for:

  • Surfacing weaknesses
  • Triggering edge-case behavior
  • Probing for harmful, biased, or rule-breaking outputs

🛠️ How to Activate It:
In the Promptfoo dashboard, look at the top bar.

Click on the dropdown next to “Evals”.

Choose Red team from the menu.

Boom — you’re now in red teaming mode. The view will shift, highlighting pass/fail bars, vulnerability spread, and prompt score deltas even more clearly.

Step 14: Set Up Your Red Team Target Configuration

Target Type
➡ Select: Custom (or similar, since you’re targeting your own ACP agent)

Target Name

acp-booksmith-outline-agent


(Use something meaningful. For example, if you test the chapter agent later, name it acp-booksmith-chapter-agent.)

Target ID

file://./provider.py


This matches your local provider setup. It connects the Red Team to your provider script.

Custom Configuration (JSON)

{
  "agent_name": "outline",
  "temperature": 0.5
}


(Adjust "agent_name" to "chapter" or "editor" when testing other agents.)

Additional Configuration
Delay → leave empty or set e.g., 1 (1 second between requests)

Extension Hook → leave empty (unless you write custom Python hooks)

Prompts
Add your test prompts. Example:

Generate an outline for the book titled "The Quantum Cat's Journey"


(You can later set a prompts list or dataset if you want to run many tests automatically.)

Step 15: Specify Your Usage Type – Are You Testing a Model or an App?

Now that you’ve configured your red team target, it’s time to define how you want to evaluate it — is this a model or a full application?

What You're Seeing:
Promptfoo gives you two options here:

Option 1: I'm testing a model
Choose this only if you're red teaming a raw model directly, with no application logic around it. In that case you'd see a note confirming: "You don't need to provide application details. You can proceed to configure the model and test scenarios in the next steps."

Option 2: I'm testing an application
This is what we want. ACP-booksmith is a full AI-powered product: multiple agents behind an API, an orchestrator, and a Gradio UI.

What to Do:
Select "I'm testing an application" to define the red teaming context for the full ACP-booksmith system. Then:

  • Under Main Purpose, describe that the system generates complete books via multi-agent collaboration using ACP.
  • Under Key Features, list outline generation, chapter drafting, editing, compilation, export, the Gradio interface, and the API endpoints.
  • Under Industry/Domain, fill in publishing, creative writing, education, AI tools, and content automation.
  • Under Specific Constraints, explain that it only handles book-related prompts, uses OpenAI models via ACP, and ignores unrelated or malicious prompts.

Step 16: Plugin Configuration

  • Go to Plugin Configuration in Promptfoo Red Team setup.
  • Review all available plugin presets (like Recommended, Minimal Test, RAG, Foundation, Guardrails Evaluation, etc.).
  • For broad, balanced coverage, select Recommended — this runs a general set of tests across safety, robustness, and compliance.
  • If you want more specialized security or risk testing, optionally choose presets like OWASP LLM Top 10, Guardrails Evaluation, or MITRE.
  • Click Next after selection to apply these plugins to your red teaming run.


Step 17: Strategy Configuration

  • Go to the Strategy Configuration section in Promptfoo.
  • Select Custom mode to fine-tune your attack strategy selection.
  • Enable Single-shot Optimization (recommended, agent-based) — it optimizes one-turn attacks to bypass controls.
  • Enable Composite Jailbreaks (recommended) — it chains multiple attack methods for stronger testing.
  • Skip Basic or advanced multi-turn agents unless you want deeper experiments — focus on efficient, high-impact tests.

Step 18: Review and Finalize Your Configuration

This is the final checkpoint before Promptfoo launches the red team evaluation on your ACP-booksmith system.

Here’s what to review:

Plugins (39):
You’ve selected a broad and powerful set including:

  • Bias detection (e.g., bias:age, bias:race, bias:gender, bias:disability)
  • Privacy and sensitive data (e.g., pii:direct, pii:session, pii:api-db, harmful:privacy)
  • Safety and harmful content (e.g., harmful:self-harm, harmful:misinformation-disinformation, harmful:violent-crime, harmful:specialized-advice)
  • Injection and hacking risks (e.g., hijacking, harmful:cybercrime, harmful:cybercrime:malicious-code)

Strategies (2):
You’ve configured high-impact testing strategies:

  • Single-shot Optimization (Agent-based, single-turn attack optimization)
  • Composite Jailbreaks (Chains multiple attack vectors for enhanced effectiveness)

Final check:

  • Configuration description
  • All plugin categories cover your security, safety, and fairness concerns
  • Strategies are aligned with your goals

Step 19: Run Your Configuration (CLI or Browser)

You now have two options depending on your use case:

Option 1: Save and Run via CLI
Best for: Large-scale testing, automation, deeper debugging.

Click “Save YAML” – this downloads your configuration as a .yaml file.

On your terminal or VM where Promptfoo is installed, run:

promptfoo redteam run


This command picks up your saved config and starts the red teaming process.

Why CLI?

  • Supports headless runs
  • Better logging and error tracing
  • CI/CD and repo integration

Option 2: Run Directly in the Browser
Best for: Simpler tests, quick feedback, small scans.

Click the blue “Run Now” button.

Promptfoo will start executing the configured tests in the web UI.

You’ll see model outputs and vulnerabilities flagged inline.

Since we are using Option 2, Promptfoo is:

Actively running your full configuration against the ACP-booksmith multi-agent system (powered by OpenAI models under ACP orchestration).

Using your selected plugins (39 types), including:

  • Bias detection (age, race, gender, disability)
  • Privacy & PII (e.g., pii:direct, pii:session, harmful:privacy)
  • Security & injection risks (e.g., hijacking, cybercrime, malicious code)
  • Harmful & unsafe content filters (e.g., self-harm, misinformation, violence)

Applying your chosen strategies:

  • Single-shot Optimization (agent-driven, one-turn attacks)
  • Composite Jailbreaks (multi-vector, chained attack paths)

Testing 6,240 probes — a large, high-coverage scan that simulates real-world attacks on AI-driven book generation systems!

Step 20: Review Results and Generate Vulnerability Report

After the tests finish running, Promptfoo shows you a detailed breakdown of model performance across various security domains and plugins.

Conclusion

Building acp-booksmith was more than just stringing together a few API calls. It was about designing a collaborative system where AI agents play distinct roles — from outlining and drafting to editing and compiling — and making sure they communicate, coordinate, and deliver like a true creative team.

But here's the key insight: even the most elegant multi-agent system is only as good as its weakest link. That's where Promptfoo came in — it helped me uncover blind spots, test the agents under pressure, and surface edge cases I would never have thought to check manually.

By pairing ACP's agent orchestration with Promptfoo's evaluation and red teaming, I not only automated book creation — I made sure the system was robust, reliable, and responsible.

If you're working on your own AI pipelines or agent frameworks, I highly recommend adding Promptfoo to your stack — because in the world of AI, trust isn't built on magic, it's built on testing.
