How to Let an AI Agent Generate Real PDFs (with an MCP server)

#ai #mcp #claude #tutorial

AI agents are great at producing text. But the moment you need an actual document — an invoice, a report, a certificate — they fall apart. You get markdown you have to format yourself, or (with Code Interpreter) a rough PDF from a Python lib with generic fonts and tables that never quite look right. There's no clean way to hand an agent the job of "produce a polished PDF."

So I gave my agents one tool that does exactly that, over MCP (Model Context Protocol). The agent describes the document, and gets back a link to a finished, editable PDF. Here's how to wire it up in about 5 lines of config.

Disclosure: I work on PDFMakerAPI — but the pattern here (give an agent a single, well-scoped tool that returns a reviewable artifact) applies to any MCP server. The config below just happens to use mine.

30-second MCP refresher

MCP is the open standard for giving an AI agent tools. You point your client (Claude Desktop, Cursor, Windsurf, Cline, VS Code, ChatGPT…) at an MCP server, and the agent can now call whatever functions that server exposes. No glue code, no custom integration per client — that's the whole appeal.

The server we're adding exposes exactly one tool: create_document. (One tool on purpose — a small, predictable surface area is easier for an agent to use correctly than a grab-bag of twenty.)

Step 1 — Add the server

Drop this into your client's MCP config (Claude Desktop: Settings → Developer → Edit Config; Cursor: ~/.cursor/mcp.json; same idea elsewhere):

{
  "mcpServers": {
    "pdfmakerapi": {
      "command": "npx",
      "args": ["-y", "@pdfmakerapi/mcp"]
    }
  }
}

Restart the client. That's the entire setup — no API key, no account.

Using a web client like Claude.ai or ChatGPT that can't run npx? Add the hosted endpoint as a custom connector instead:

https://api.pdfmakerapi.com/mcp

Step 2 — Ask for a document

Now just ask the agent in plain English:

"Make a professional invoice for Acme Corp — 3 line items, net 30."

The agent calls create_document, and you get back a link like:

https://app.pdfmakerapi.com/d/019ee2fe-c503-71c0-aaf6-68cf33ca096c

Open it and you'll see the finished invoice — and you can edit any field before downloading the PDF. Here's a real one to click:
👉 Live invoice example

It works for invoices, receipts, certificates, reports, resumes, letters — if the agent can describe it, it can lay it out with headings, tables, and your data.

Why the "editable link" matters for agents

This is the part I actually care about, and it's why I didn't make the tool just spit out a final PDF.

If you've spent any time running agents in production, you know the failure mode isn't "bad answer" — it's a wrong action that's already done. An agent that emails the wrong customer or files the wrong number is worse than one that just says something dumb.

Returning an editable document instead of a finalized file builds a natural human-in-the-loop checkpoint into the workflow:

The agent drafts the document from the request.
A human opens the link, checks it, fixes anything, and downloads.

So the agent does the tedious 90% (layout, structure, pulling the data together), but a person stays in control of what actually ships. For anything that leaves the building — an invoice to a client, a certificate with someone's name on it — that checkpoint is worth a lot more than full autonomy.

What it does not do (so you scope it right)

Being straight about the edges:

It generates documents — it does not read or edit a PDF you upload. Different problem.
It returns an editable web document + a downloadable PDF, not a blank fillable form to hand out.

If your agent needs to produce structured documents from data or a prompt, this fits. If it needs to parse existing PDFs, you want a different tool.

Under the hood (for the curious)

create_document takes a document model — a small JSON tree of nodes (text, containers, tables) with optional {{variables}} — and stores it, returning the share link. The agent can build that structure from a prompt, or you can POST it yourself to the same REST API if you'd rather skip the agent entirely. It's open source (MIT) if you want to read the tool definition: github.com/GerardoBarrera/pdfmakerapi-mcp.

Wrap-up

Giving an agent the ability to produce a real, reviewable document turned out to be a small change with a big payoff:

Add one MCP server (~5 lines).
Ask for the document in plain English.
Get an editable link a human can sign off on before it ships.

No markdown-to-PDF wrangling, no headless browser, and a built-in review step instead of blind autonomy.

If you're building agents that produce deliverables (not just text): do you let the agent finalize, or always hand off to a human to review first? Genuinely curious how others are drawing that line — drop it in the comments.