Introduction
Have you ever wondered if AI could write an entire book — from idea to polished chapters — without human help?
What if multiple AI agents could collaborate, like a team of ghostwriters, editors, and publishers?
That’s exactly what I explored in this project:
✅ ACP (Agent Communication Protocol) to build a multi-agent system
✅ OpenAI GPT-4o to generate and edit text
✅ Promptfoo to evaluate the agents’ outputs automatically
In this post, I’ll share how I built acp-booksmith, an AI-powered book creation pipeline, how it works, and how I used Promptfoo to test it like a pro.
What is ACP (Agent Communication Protocol)?
ACP, developed by IBM, is an open standard that enables AI agents, apps, and humans to communicate smoothly, regardless of their underlying backend technology stack.
Think of it as a universal language for agents.
With ACP, I could easily connect multiple agents like:
- outline agent → drafts book structure
- chapter agent → writes full chapters
- editor agent → polishes text
- compiler agent → stitches the final book
They all run on a local server (http://localhost:8000) and talk to each other through standardized ACP calls.
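For a concrete feel, here's a minimal sketch of calling one of these agents from Python, using the same acp-sdk client API that appears later in orchestrator.py (it assumes the server from Step 3 is already running on localhost:8000):
import asyncio
from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart

async def demo():
    # Connect to the local ACP server and ask the outline agent for a book outline
    async with Client(base_url="http://localhost:8000") as client:
        run = await client.run_sync(
            agent="outline",
            input=[Message(parts=[MessagePart(content="The Quantum Cat's Journey", content_type="text/plain")])]
        )
        print(run.output[0].parts[0].content)

asyncio.run(demo())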
What is Promptfoo?
Promptfoo is a powerful open-source framework for evaluating and stress-testing LLM systems, agents, and prompt chains.
Think of it as your AI quality assurance toolkit — it helps you:
- Define structured test cases (via YAML or CLI)
- Compare model or agent outputs across providers
- Run automated checks (e.g., “is the output non-empty?”, “does it follow the format?”)
- Visualize results in an interactive web viewer
- Launch red teaming campaigns to probe for safety, bias, and robustness issues
In this project, I used Promptfoo not just to test individual OpenAI model outputs, but to evaluate the full ACP-booksmith system, covering how all the agents work together to deliver a polished, end-to-end book-writing pipeline.
By combining ACP + Promptfoo, I got both system-level validation and security-level insights — all in one workflow.
Resources
Link: https://github.com/i-am-bee/acp
Link: https://github.com/promptfoo/promptfoo
Step-by-Step Process to Build and Evaluate an AI Book-Writing System with ACP and Promptfoo
Step 1: Set Up the Project Environment
Before diving in, make sure your system is ready:
python --version # >= 3.11
node --version # >= 20.x
npm --version # >= 10.x
Then, initialize the project:
uv init --python '>=3.11' my_acp_project
cd my_acp_project
uv add acp-sdk
Step 2: Install all required libraries & set OpenAI API key
Install Python libraries
Run this one command to install all needed dependencies:
pip install \
acp-sdk==1.0.0 \
fastapi==0.115.0 \
uvicorn==0.29.0 \
openai==1.30.1 \
gradio==4.28.3 \
reportlab==4.1.0 \
requests==2.32.3
This will install:
✅ acp-sdk → for the multi-agent protocol
✅ fastapi + uvicorn → for the server
✅ openai → for GPT calls
✅ gradio → for the web interface
✅ reportlab → for PDF generation
✅ requests → for HTTP calls
To install Promptfoo, run the following command:
npm install -g promptfoo
Export your OpenAI API key
Before running anything (main.py, agent.py, or Gradio app), set your API key:
export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
(Replace with your real key from your OpenAI account.)
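To double-check that the key is actually visible to Python before starting anything, a quick throwaway check like this works (the file name is just illustrative):
# check_key.py - confirm the API key is exported in the current shell
import os

if os.getenv("OPENAI_API_KEY"):
    print("OPENAI_API_KEY is set - you're good to go.")
else:
    print("OPENAI_API_KEY is missing - run the export command above first.")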
Step 3: Write the Agents (agent.py)
I built four key agents:
- outline agent → Generates a detailed book outline
- chapter agent → Writes a full chapter from a summary
- editor agent → Edits the chapter for style and clarity
- compiler agent → Combines all content into a single book
These agents use openai.AsyncOpenAI under the hood and communicate via ACP.
import asyncio
import os
from collections.abc import AsyncGenerator
import openai
from acp_sdk.models import Message
from acp_sdk.server import Context, RunYield, RunYieldResume, Server
# Initialize OpenAI async client using environment variable API key
client = openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Create ACP server instance to register agents
server = Server()
# Helper function to call OpenAI API with given prompt and token limit
async def call_openai(prompt, max_tokens=1000):
try:
response = await client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}],
temperature=0.7,
max_tokens=max_tokens
)
return response.choices[0].message.content # Return generated text
except Exception as e:
print(f"[OpenAI API error]: {type(e).__name__}: {e}")
return "[Error: Failed to generate content]"
# Agent: Generates book outline based on title
@server.agent()
async def outline(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
title = input[0].parts[0].content # Extract title from input
prompt = f"Create a detailed book outline with chapters and sections for the book titled '{title}'."
outline_text = await call_openai(prompt) # Get outline from OpenAI
yield Message(parts=[{"content": outline_text, "content_type": "text/plain"}])
# Agent: Generates full chapter text (~3000 words) from chapter summary
@server.agent()
async def chapter(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
chapter_summary = input[0].parts[0].content # Extract chapter summary
prompt = f"Write a full book chapter (~3000 words) based on this summary:\n{chapter_summary}"
chapter_text = await call_openai(prompt, max_tokens=3000) # Get chapter draft
yield Message(parts=[{"content": chapter_text, "content_type": "text/plain"}])
# Agent: Edits chapter text for clarity, style, and coherence
@server.agent()
async def editor(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
raw_text = input[0].parts[0].content # Extract raw chapter text
prompt = f"Please edit and polish the following chapter for clarity, style, and coherence:\n\n{raw_text}"
edited_text = await call_openai(prompt, max_tokens=3000) # Get edited version
yield Message(parts=[{"content": edited_text, "content_type": "text/plain"}])
# Agent: Compiles all parts (outline + chapters) into one full text
@server.agent()
async def compiler(input: list[Message], context: Context) -> AsyncGenerator[RunYield, RunYieldResume]:
compiled = "\n\n".join(msg.parts[0].content for msg in input) # Concatenate all inputs
yield Message(parts=[{"content": compiled, "content_type": "text/plain"}])
# Run the ACP server to start serving agent endpoints
server.run()
Run them with:
uv run agent.py
Check they’re live:
curl http://localhost:8000/agents
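If you prefer Python over curl, the same check can be scripted with requests (the exact JSON shape of the /agents response depends on your acp-sdk version, so this just prints the raw body):
import requests

# List the agents registered on the local ACP server;
# you should see outline, chapter, editor, and compiler mentioned in the response.
resp = requests.get("http://localhost:8000/agents", timeout=10)
resp.raise_for_status()
print(resp.json())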
Step 4: Create the Orchestrator (orchestrator.py)
This script:
- Calls each agent in order
- Collects outlines, chapters, edited content
- Writes output to final_book.txt and final_book.pdf using reportlab
The magic here? It acts like a project manager, coordinating the AI team.
import asyncio
from acp_sdk.client import Client
from acp_sdk.models import Message, MessagePart
from reportlab.pdfgen import canvas # Library to generate PDF files
# Helper function to call a specific agent with input text
async def call_agent(client, agent_name, input_text, model):
# Sends request to ACP agent and returns the content of the response
run = await client.run_sync(
agent=agent_name,
input=[Message(parts=[MessagePart(content=input_text, content_type="text/plain")])]
)
return run.output[0].parts[0].content
# Main orchestrator function to run full book creation pipeline
async def main(title="The Quantum Cat's Journey", model="gpt-4o", progress_callback=None):
async with Client(base_url="http://localhost:8000") as client:
if progress_callback:
progress_callback(0.05) # Update progress bar if using UI (like Gradio)
# Step 1: Generate book outline
outline = await call_agent(client, "outline", title, model)
if progress_callback:
progress_callback(0.2)
chapters = []
# Step 2: Generate 3 chapters (can increase this later if desired)
for i in range(1, 4):
chapter_prompt = f"{outline} - Chapter {i}" # Prepare chapter input
chapter_content = await call_agent(client, "chapter", chapter_prompt, model)
if progress_callback:
progress_callback(0.2 + i * 0.15)
# Step 3: Edit chapter using editor agent
edited_chapter = await client.run_sync(
agent="editor",
input=[Message(parts=[MessagePart(content=chapter_content, content_type="text/plain")])]
)
chapters.append(edited_chapter.output[0].parts[0].content)
# Step 4: Combine outline + chapters into full book text
full_book = f"{outline}\n\n" + "\n\n".join(chapters)
with open("final_book.txt", "w") as f:
f.write(full_book)
if progress_callback:
progress_callback(0.85)
# Step 5: Export final book to PDF format
pdf = canvas.Canvas("final_book.pdf")
pdf.setFont("Helvetica", 12)
y = 800 # Set initial vertical position on PDF page
for line in full_book.split("\n"):
pdf.drawString(50, y, line[:100]) # Draw text line, truncate if too long
y -= 15 # Move down by 15 points
if y < 50: # If near bottom, start new page
pdf.showPage()
pdf.setFont("Helvetica", 12)
y = 800
pdf.save() # Save the PDF file
if progress_callback:
progress_callback(1.0) # Mark as complete in UI if applicable
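Before wiring up the CLI in the next step, you can exercise the orchestrator on its own from a small script (assuming agent.py is already serving on localhost:8000):
import asyncio
from orchestrator import main as orchestrator_main

# Run the full pipeline for one title; this writes final_book.txt and final_book.pdf
asyncio.run(orchestrator_main(title="The Quantum Cat's Journey"))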
Step 5: Build a CLI (main.py)
To make it user-friendly, I added:
- A CLI menu to run the full book generation pipeline
- Option to extend later with more commands or features
import asyncio
import sys
from orchestrator import (
main as orchestrator_main, # Import the orchestrator main function
)
# Function to display a simple text menu in the terminal
def print_menu():
print("\nWelcome to acp-booksmith!")
print("Select an option:")
print("1. Run book generation workflow")
print("2. Exit")
# Main loop function for CLI (Command Line Interface)
def main():
while True:
print_menu() # Show the menu options
choice = input("Enter choice [1-2]: ") # Get user input
if choice == "1":
asyncio.run(orchestrator_main()) # Run orchestrator async function to generate book
print("\n✅ Book generation completed! Check final_book.txt and final_book.pdf.\n")
elif choice == "2":
print("Goodbye!") # Exit message
sys.exit() # Exit the program
else:
print("Invalid choice. Please enter 1 or 2.") # Handle invalid input
# Entry point when running script directly
if __name__ == "__main__":
main()
Now you can just run:
python3 main.py
And it’ll walk you through the process.
After setting up the agent.py, orchestrator.py, and main.py files, we run our book system in the terminal to check if everything works end-to-end. We start the ACP server with uv run agent.py and then open another terminal to send test prompts (usually three to four), like generating an outline, drafting chapters, or editing content using curl commands. This allows us to confirm that the agents communicate correctly, OpenAI API calls succeed, and we receive polished outputs in both text and PDF formats — all orchestrated smoothly by the system.
Prompt 1 — Generate Outline
curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "outline", "input": [{"role": "user", "parts": [{"content": "The Quantum Cat'\''s Journey", "content_type": "text/plain"}]}]}'
Prompt 2 - Chapter Agent
curl -X POST http://localhost:8000/runs -H "Content-Type: application/json" -d '{"agent_name": "chapter", "input": [{"role": "user", "parts": [{"content": "Chapter 1: The Cat Enters the Quantum Realm", "content_type": "text/plain"}]}]}'
Prompt 3 - Editor Agent
curl -X POST http://localhost:8000/runs \
-H "Content-Type: application/json" \
-d '{
"agent_name": "editor",
"input": [
{
"role": "user",
"parts": [
{ "content": "This is a raw chapter draft that needs editing for clarity and flow.", "content_type": "text/plain" }
]
}
]
}'
Prompt 4 - Compiler Agent
curl -X POST http://localhost:8000/runs \
-H "Content-Type: application/json" \
-d '{
"agent_name": "compiler",
"input": [
{
"role": "user",
"parts": [
{ "content": "Outline content here", "content_type": "text/plain" }
]
},
{
"role": "user",
"parts": [
{ "content": "Chapter 1 content here", "content_type": "text/plain" }
]
},
{
"role": "user",
"parts": [
{ "content": "Chapter 2 content here", "content_type": "text/plain" }
]
}
]
}'
Step 6: Add a Browser UI with Gradio (gradio_app.py)
Not everyone loves the terminal, so I added a Gradio app!
import asyncio
import os
import shutil
import gradio as gr
from orchestrator import main # Import orchestrator to run agent pipeline
# Async function to generate the book using orchestrator and update Gradio progress bar
async def generate_book_async(title, model, progress=gr.Progress()):
# Clear old book files if they exist
for file in ["final_book.txt", "final_book.pdf"]:
if os.path.exists(file):
os.remove(file)
# Run the orchestrator with given title + model, passing in progress callback
await main(title, model=model, progress_callback=progress)
# Read final book text from generated TXT file
with open("final_book.txt", "r") as f:
book_text = f.read()
# Return book text + file paths for download components
return book_text, "final_book.txt", "final_book.pdf"
# Wrapper to run async function inside sync Gradio button click
def generate_book(title, model):
return asyncio.run(generate_book_async(title, model))
# Build Gradio interface
with gr.Blocks() as demo:
gr.Markdown("# 🐱 Quantum Cat Book Generator") # App title
gr.Markdown("Enter a book title, pick a model, and generate a complete polished book with TXT and PDF downloads.")
with gr.Row():
title_input = gr.Textbox(label="Title", placeholder="Enter book title...") # Input box for title
model_selector = gr.Dropdown(choices=["gpt-4o", "gpt-3.5-turbo"], value="gpt-4o", label="Model") # Model dropdown
output_text = gr.Textbox(label="Generated Book", lines=20) # Output textbox to display book
txt_download = gr.File(label="Download TXT") # Download button for .txt
pdf_download = gr.File(label="Download PDF") # Download button for .pdf
generate_btn = gr.Button("🚀 Generate Book") # Main action button
# Link button click to generate_book function with inputs and outputs
generate_btn.click(
fn=generate_book,
inputs=[title_input, model_selector],
outputs=[output_text, txt_download, pdf_download]
)
# Launch the Gradio app (serves on localhost:7860; share=True also creates a temporary public link)
demo.launch(share=True)
This lets you:
- Enter a book title
- Choose the OpenAI model (gpt-4o or gpt-3.5-turbo)
- Click “Generate” and get the full book in the browser, with TXT and PDF download buttons
Launch it with:
python3 gradio_app.py
Open in your browser at:
http://localhost:7860
Step 7: Launch Promptfoo Interactive CLI
Once Promptfoo is installed and the version is verified, run the following command to open the interactive CLI:
promptfoo init
You'll see a terminal-based interface prompting:
"What would you like to do?"
Use your arrow keys to navigate and select your intention. You can choose from:
- Not sure yet (explore options)
- Improve prompt and model performance
- Improve RAG performance
- Improve agent/chain of thought performance
- Run a red team evaluation
Step 8: Choose Your First Model Provider (We’re Only Using OpenAI Here)
After choosing your evaluation goal, Promptfoo will ask:
"Which model providers would you like to use?"
In this guide, we're using OpenAI as the model provider.
- Use the arrow keys to select OpenAI
- Hit space to check the box
- Then press Enter to continue
Step 9: Initialize Promptfoo Evaluation
Once you've selected the model provider (in this case, we’re starting with OpenAI), Promptfoo will automatically generate the necessary setup files:
- README.md
- promptfooconfig.yaml
Step 10: Write Promptfoo Configuration
promptfooconfig.yaml
- Defines test prompts, agents, and JS-based assertions
description: 'ACP Agent Evaluation' # Description of this evaluation suite
prompts:
- '{{book_title}}' # Dynamic prompt variable used in each test case
providers:
- id: file://./provider.py # Connects to local provider script
label: ACP Outline Agent # Label shown in Promptfoo UI
config:
agent_name: outline # Tell provider.py to call the 'outline' agent
- id: file://./provider.py
label: ACP Chapter Agent
config:
agent_name: chapter # Tell provider.py to call the 'chapter' agent
- id: file://./provider.py
label: ACP Editor Agent
config:
agent_name: editor # Tell provider.py to call the 'editor' agent
defaultTest:
assert:
# ✅ Check the output is a string (using JS in Promptfoo)
- type: javascript
value: typeof output === 'string'
# ✅ Check the output is not an empty string
- type: javascript
value: output.trim().length > 0
tests:
- description: 'Generate outline for book' # Test outline agent
vars:
book_title: "The Quantum Cat's Journey"
- description: 'Generate chapter draft' # Test chapter agent
vars:
book_title: "The Quantum Cat's Journey - Chapter 1"
- description: 'Edit draft content' # Test editor agent
vars:
book_title: "Refine The Quantum Cat's Journey draft"
provider.py
- Sends HTTP POST to localhost:8000/runs for each agent
- Extracts clean text outputs
- Returns result to Promptfoo
import requests # Import HTTP requests library
def call_api(prompt, config=None, context=None):
agent_name = (config or {}).get("agent_name", "outline") # Get agent name from config, default to 'outline' (guards against config=None)
url = "http://localhost:8000/runs" # ACP server endpoint
payload = {
"input": [{
"text": prompt, # Original prompt text
"parts": [{
"type": "text", # Content type (text)
"content": prompt # Content body
}]
}],
"agent_name": agent_name # Target agent to call (outline, chapter, editor)
}
headers = {"Content-Type": "application/json"} # Set JSON header
try:
response = requests.post(url, json=payload, headers=headers) # Make POST request to ACP server
response.raise_for_status() # Raise error if HTTP response is not 200
result = response.json() # Parse response JSON
# Check if ACP server returned an error
if result.get('error'):
return {"output": f"[ERROR] {result['error'].get('message', 'Unknown error')}"}
# Extract and return the first content part from output
outputs = result.get('output', [])
if outputs:
first_output = outputs[0]
if 'parts' in first_output and first_output['parts']:
first_part = first_output['parts'][0]
if 'content' in first_part:
return {"output": str(first_part['content']).strip()}
return {"output": "[ERROR] No valid content found."} # Fallback if no valid output
except Exception as e:
return {"output": f"[ERROR] Exception during call: {e}"} # Catch and report exceptions
Step 11: Run Evaluation
Now that everything is configured, it's time to run your first evaluation!
In the terminal, run the following command:
promptfoo eval
What Are We Testing? ACP Agents or OpenAI Models?
When running acp-booksmith with Promptfoo, it’s important to understand what part of the system we are evaluating.
System architecture overview
We built a multi-agent system using IBM’s ACP (Agent Communication Protocol).
The ACP agents are:
- outline → generates a book outline.
- chapter → writes detailed chapters.
- editor → polishes the text.
- compiler → stitches everything together.
Each agent runs inside a Python server (agent.py) on:
http://localhost:8000
Inside the agents, we use:
openai.AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
to call GPT-4o models.
How Promptfoo fits into the system
- Promptfoo does NOT connect directly to OpenAI models.
- Instead, Promptfoo runs test cases defined in:
promptfooconfig.yaml
It sends these prompts to:
http://localhost:8000/runs
using provider.py, which talks to ACP agents.
The ACP agents receive the request, process it, and, inside their own logic, call OpenAI’s API to generate the response.
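To make that flow concrete, you can invoke the provider yourself, exactly as Promptfoo does, and confirm the whole path (provider.py → ACP server → OpenAI) works before running an eval:
# Manual smoke test of the Promptfoo provider, run from the project directory
# while agent.py is serving on localhost:8000
from provider import call_api

result = call_api("The Quantum Cat's Journey", config={"agent_name": "outline"})
print(result["output"][:500])  # first 500 characters of the generated outline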
Why test through ACP agents?
We want to test how well our entire system works, not just the raw OpenAI output.
We care about:
- Are the agents responding correctly?
- Is the outline agent producing structured outlines?
- Does the editor agent polish the text properly?
- Can we stitch the book end-to-end?
This gives us a real-world evaluation of:
- agent design,
- orchestration,
- and LLM usage, all together.
Step 12: Visualize and Analyze Agent Outputs with Promptfoo Web Viewer
Once you’ve completed your promptfoo eval and run:
promptfoo view
You will see a local server start at:
http://localhost:15500
Open it in your browser (just press y when asked).
What you’ll see on the web viewer
For each test case, you’ll get:
- Description: what was tested (e.g., generate outline, draft chapter, edit content)
- Variables: input values, like book_title: The Quantum Cat's Journey
- Outputs: what the agent produced, e.g., book title, chapter summaries, edited sections
- Assertions (PASS/FAIL): green ✅ if the output is a string and not empty (based on your promptfooconfig.yaml)
- Comparison view: side-by-side results from ACP Outline Agent, ACP Chapter Agent, and ACP Editor Agent
- Model and agent details: which agent was called, and what label it was given
Step 13: Run Red Teaming to Stress-Test ACP Agents
After you’ve run a basic promptfoo eval and visually confirmed that your ACP agents (outline, chapter, editor) pass normal test cases, it’s time to turn up the heat and dive into red teaming.
Promptfoo has a built-in Red Team mode designed for:
- Surfacing weaknesses
- Triggering edge-case behavior
- Probing for harmful, biased, or rule-breaking outputs
🛠️ How to Activate It:
In the Promptfoo dashboard, look at the top bar.
Click on the dropdown next to “Evals”.
Choose Red team from the menu.
Boom — you’re now in red teaming mode. The view will shift, highlighting pass/fail bars, vulnerability spread, and prompt score deltas even more clearly.
Step 14: Set Up Your Red Team Target Configuration
Target Type
➡ Select: Custom (or similar, since you’re targeting your own ACP agent)
Target Name
acp-booksmith-outline-agent
(Use something meaningful. For example, if you test the chapter agent later, name it acp-booksmith-chapter-agent.)
Target ID
file://./provider.py
This matches your local provider setup. It connects the Red Team to your provider script.
Custom Configuration (JSON)
{
"agent_name": "outline",
"temperature": 0.5
}
(Adjust "agent_name" to "chapter" or "editor" when testing other agents.)
Additional Configuration
Delay → leave empty or set e.g., 1 (1 second between requests)
Extension Hook → leave empty (unless you write custom Python hooks)
Prompts
Add your test prompts. Example:
Generate an outline for the book titled "The Quantum Cat's Journey"
(You can later set a prompts list or dataset if you want to run many tests automatically.)
Step 15: Specify Your Usage Type – Are You Testing a Model or an App?
Now that you’ve configured your red team target, it’s time to define how you want to evaluate it — is this a model or a full application?
What You’re Seeing:
Promptfoo gives you two options here:
Option 1: I'm testing a model
Choose this when you're red teaming a raw model endpoint directly, with no surrounding application logic. It takes you straight into prompt injection, safety probing, and reasoning stress tests against the model itself.
Option 2: I'm testing an application
This is what we want here. ACP-booksmith isn't a bare model; it's a full AI-powered product with multiple agents, orchestration, a Gradio UI, and API endpoints, so the red team should exercise the whole system rather than just the underlying GPT-4o calls.
What to Do:
- Select “I’m testing an application” to define the red teaming context for the full ACP-booksmith system.
- Under Main Purpose, describe that the system generates complete books via multi-agent collaboration using ACP.
- Under Key Features, list outline generation, chapter drafting, editing, compilation, export, Gradio interface, and API endpoints.
- Under Industry/Domain, fill in publishing, creative writing, education, AI tools, and content automation.
- Under Specific Constraints, explain it only handles book-related prompts, uses OpenAI models via ACP, and ignores unrelated or malicious prompts.
Step 16: Plugin Configuration
- Go to Plugin Configuration in Promptfoo Red Team setup.
- Review all available plugin presets (like Recommended, Minimal Test, RAG, Foundation, Guardrails Evaluation, etc.).
- For broad, balanced coverage, select Recommended — this runs a general set of tests across safety, robustness, and compliance.
- If you want more specialized security or risk testing, optionally choose presets like OWASP LLM Top 10, Guardrails Evaluation, or MITRE.
- Click Next after selection to apply these plugins to your red teaming run.
Select the Recommended preset — it’s designed for broad, balanced testing across safety, robustness, and compliance.
Step 17: Strategy Configuration
- Go to the Strategy Configuration section in Promptfoo.
- Select Custom mode to fine-tune your attack strategy selection.
- Enable Single-shot Optimization (recommended, agent-based) — it optimizes one-turn attacks to bypass controls.
- Enable Composite Jailbreaks (recommended) — it chains multiple attack methods for stronger testing.
- Skip Basic or advanced multi-turn agents unless you want deeper experiments — focus on efficient, high-impact tests.
Step 18: Review and Finalize Your Configuration
This is the final checkpoint before Promptfoo launches the red team evaluation on your ACP-booksmith system.
Here’s what to review:
Plugins (39):
You’ve selected a broad and powerful set including:
- Bias detection (e.g., bias:age, bias:race, bias:gender, bias:disability)
- Privacy and sensitive data (e.g., pii:direct, pii:session, pii:api-db, harmful:privacy)
- Safety and harmful content (e.g., harmful:self-harm, harmful:misinformation-disinformation, harmful:violent-crime, harmful:specialized-advice)
- Injection and hacking risks (e.g., hijacking, harmful:cybercrime, harmful:cybercrime:malicious-code)
Strategies (2):
You’ve configured high-impact testing strategies:
- Single-shot Optimization (Agent-based, single-turn attack optimization)
- Composite Jailbreaks (Chains multiple attack vectors for enhanced effectiveness)
Final check:
- Configuration description
- All plugin categories cover your security, safety, and fairness concerns
- Strategies are aligned with your goals
Step 19: Run Your Configuration (CLI or Browser)
You now have two options depending on your use case:
Option 1: Save and Run via CLI
Best for: Large-scale testing, automation, deeper debugging.
Click “Save YAML” – this downloads your configuration as a .yaml file.
On your terminal or VM where Promptfoo is installed, run:
promptfoo redteam run
This command picks up your saved config and starts the red teaming process.
Why CLI?
- Supports headless runs
- Better logging and error tracing
- CI/CD and repo integration
Option 2: Run Directly in the Browser
Best for: Simpler tests, quick feedback, small scans.
Click the blue “Run Now” button.
Promptfoo will start executing the configured tests in the web UI.
You’ll see model outputs and vulnerabilities flagged inline.
Since we are using Option 2, Promptfoo is:
Actively running your full configuration against the ACP-booksmith multi-agent system (powered by OpenAI models under ACP orchestration).
Using your selected plugins (39 types), including:
- Bias detection (age, race, gender, disability)
- Privacy & PII (e.g., pii:direct, pii:session, harmful:privacy)
- Security & injection risks (e.g., hijacking, cybercrime, malicious code)
- Harmful & unsafe content filters (e.g., self-harm, misinformation, violence)
Applying your chosen strategies:
- Single-shot Optimization (agent-driven, one-turn attacks)
- Composite Jailbreaks (multi-vector, chained attack paths)
Testing 6,240 probes — a large, high-coverage scan that simulates real-world attacks on AI-driven book generation systems!
Step 20: Review Results and Generate Vulnerability Report
After the tests finish running, Promptfoo shows you a detailed breakdown of model performance across various security domains and plugins.
Conclusion
Building acp-booksmith was more than just stringing together a few API calls. It was about designing a collaborative system where AI agents play distinct roles — from outlining and drafting to editing and compiling — and making sure they communicate, coordinate, and deliver like a true creative team.
But here’s the key insight: even the most elegant multi-agent system is only as good as its weakest link. That’s where Promptfoo came in — it helped me uncover blind spots, test the agents under pressure, and surface edge cases I would have never thought to check manually.
By pairing ACP’s agent orchestration with Promptfoo’s evaluation and red teaming, I not only automated book creation — I made sure the system was robust, reliable, and responsible.
If you’re working on your own AI pipelines or agent frameworks, I highly recommend adding Promptfoo to your stack — because in the world of AI, trust isn’t built on magic, it’s built on testing.