DEV Community

SurfaceDocs

Posted on • Originally published at app.surfacedocs.dev

SurfaceDocs + LangChain: Give Your Agents an Output Layer

Your LangChain agent just spent 30 seconds orchestrating five tools, synthesizing three sources, and producing a solid competitive analysis. It printed to stdout. Now it's gone.

If you've built anything serious with LangChain agents, you've hit this wall. The generation is the easy part. The "where does the output live?" question is what eats your weekend.

The Output Problem

Every LangChain project eventually faces the same fork:

  • Console output — Fine for debugging. Useless for sharing with stakeholders.
  • Write to files — Now you need a file server, naming conventions, and a way to find things later.
  • Custom React app — Three days of work for a document viewer nobody asked for.
  • Dump to a database — Great, now build a frontend to read it.

None of these are the agent's job. The agent's job is generating output. But without a destination, you end up building bespoke infrastructure for every project. A research agent gets a different viewer than a report generator, which gets a different viewer than a summarization pipeline.

What you actually want is a publish() call that gives you back a URL. That's it.

The Simplest Integration

pip install langchain langchain-openai surfacedocs
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from surfacedocs import SurfaceDocs, DOCUMENT_SCHEMA, SYSTEM_PROMPT

llm = ChatOpenAI(model="gpt-4o")
docs = SurfaceDocs()  # reads SURFACEDOCS_API_KEY from env

prompt = ChatPromptTemplate.from_messages([
    ("system", SYSTEM_PROMPT),
    ("human", "Write a technical overview of WebSocket connection pooling strategies."),
])

chain = prompt | llm
response = chain.invoke({})

result = docs.save(response.content)
print(result.url)  # https://app.surfacedocs.dev/d/abc123

That's the whole thing. SYSTEM_PROMPT tells the LLM to structure its output in SurfaceDocs' block format. docs.save() parses the response and publishes it. You get a URL back.

The key insight: SurfaceDocs ships a SYSTEM_PROMPT and DOCUMENT_SCHEMA that teach the LLM how to format output as structured blocks (headings, paragraphs, code, tables, lists). The LLM does the formatting. The SDK does the publishing. You write neither.
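To make the block format concrete, here's a sketch of what a structured document might look like once the LLM has formatted it. The field names are an assumption modeled on the save_raw() examples later in this post; DOCUMENT_SCHEMA in the SDK is the source of truth.

```python
# Illustrative only: a document expressed as structured blocks.
# The field names are assumptions modeled on the block types the
# SDK describes; consult DOCUMENT_SCHEMA for the authoritative shape.
document = {
    "title": "WebSocket Connection Pooling Strategies",
    "blocks": [
        {"type": "heading", "content": "Overview", "metadata": {"level": 1}},
        {"type": "paragraph", "content": "Pooling amortizes handshake cost across requests."},
        {"type": "code", "content": "pool = Pool(max_size=10)", "metadata": {"language": "python"}},
    ],
}

# Every block carries a type and content; headings and code add metadata.
assert all("type" in b and "content" in b for b in document["blocks"])
```

Because the format is just typed blocks, anything that can emit this shape — an LLM following SYSTEM_PROMPT, or your own code — can be published the same way.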

Building a Research Agent

A real agent does more than call an LLM once. Here's a research agent that searches the web, synthesizes findings, and publishes a shareable report:

from langchain_openai import ChatOpenAI
from langchain_community.tools import TavilySearchResults
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.prebuilt import create_react_agent
from surfacedocs import SurfaceDocs, SYSTEM_PROMPT

# Tools
search = TavilySearchResults(max_results=5)

# Agent
llm = ChatOpenAI(model="gpt-4o", temperature=0.2)
agent = create_react_agent(llm, [search])

# Run the research task
research_prompt = """Research the current state of WebAssembly adoption in server-side 
applications. Cover: major runtimes, production use cases, performance characteristics, 
and limitations. Be thorough — this will be published as a report."""

result = agent.invoke({
    "messages": [
        SystemMessage(content=f"""You are a senior technical researcher. 
When producing your final report, format it using these instructions:

{SYSTEM_PROMPT}

Use heading blocks for sections, code blocks for examples, 
and paragraph blocks for analysis."""),
        HumanMessage(content=research_prompt),
    ]
})

# Extract the final message content
final_message = result["messages"][-1].content

# Publish
docs = SurfaceDocs()
doc = docs.save(final_message)
print(f"Published: {doc.url}")

The agent searches the web, reads results, reasons about them, and produces a structured report. The last two lines publish it. Your stakeholder gets a link, not a terminal screenshot.

For richer metadata tracking (model, agent name, run ID), use save_raw() instead — it accepts a metadata dict alongside explicit title and blocks. Useful when you're running multiple agents and need to trace which one produced what.
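A sketch of what that tracing could look like, assuming only the title/blocks/metadata signature of save_raw() used elsewhere in this post. The metadata keys themselves (model, agent, run_id) are arbitrary choices for illustration, not a required schema.

```python
import uuid


def publish_with_run_metadata(docs, title, blocks, *, model, agent_name):
    """Publish via save_raw(), tagging the document so a specific
    agent run can be traced later. `docs` is a SurfaceDocs client;
    the metadata keys here are illustrative, not a required schema."""
    metadata = {
        "model": model,
        "agent": agent_name,
        "run_id": str(uuid.uuid4()),  # unique per run, for tracing
        "source": "langchain-agent",
    }
    return docs.save_raw(title=title, blocks=blocks, metadata=metadata)
```

Passing the client in as a parameter keeps the helper testable and lets several agents share one set of tagging conventions.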

The Custom Tool Approach

The examples above publish after the agent finishes. But sometimes you want the agent itself to decide when to publish — maybe it produces intermediate artifacts, or it needs to publish different documents for different sub-tasks.

Make SurfaceDocs a LangChain tool:

from langchain_core.tools import tool
from surfacedocs import SurfaceDocs

docs = SurfaceDocs()

@tool
def publish_document(
    title: str,
    content: str,
    doc_type: str = "report",
) -> str:
    """Publish a document to SurfaceDocs and return the shareable URL.

    Use this when you've completed a piece of analysis, report, or research
    that should be saved as a permanent, shareable document.

    Args:
        title: Document title.
        content: Full document content in markdown.
        doc_type: Type of document (report, analysis, summary, research).

    Returns:
        The shareable URL for the published document.
    """
    result = docs.save_raw(
        title=title,
        blocks=[
            {"type": "heading", "content": title, "metadata": {"level": 1}},
            {"type": "paragraph", "content": content},
        ],
        metadata={"type": doc_type, "source": "langchain-agent"},
    )
    return f"Published: {result.url}"

For richer documents, parse the markdown into proper blocks:

@tool
def publish_structured_document(
    title: str,
    sections: list[dict],
    doc_type: str = "report",
) -> str:
    """Publish a structured document with multiple sections to SurfaceDocs.

    Args:
        title: Document title.
        sections: List of dicts with 'heading' and 'content' keys.
        doc_type: Type of document.

    Returns:
        The shareable URL for the published document.
    """
    blocks = [{"type": "heading", "content": title, "metadata": {"level": 1}}]

    for section in sections:
        blocks.append({"type": "heading", "content": section["heading"], "metadata": {"level": 2}})
        blocks.append({"type": "paragraph", "content": section["content"]})

    result = docs.save_raw(
        title=title,
        blocks=blocks,
        metadata={"type": doc_type, "source": "langchain-agent"},
    )
    return f"Published: {result.url}"
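If the agent hands back plain markdown rather than pre-split sections, a small parser can do the conversion. This is a minimal sketch, assuming the block shape used in the examples above; it handles only headings, fenced code, and paragraphs, leaving everything else as plain paragraph text.

```python
FENCE = "`" * 3  # a literal markdown code fence


def markdown_to_blocks(md: str) -> list[dict]:
    """Convert markdown into SurfaceDocs-style blocks (sketch).
    Handles # headings, fenced code, and paragraphs; lists, tables,
    and inline formatting fall through as plain paragraphs."""
    blocks, paragraph, code_lines = [], [], []
    in_code, lang = False, ""

    def flush_paragraph():
        if paragraph:
            blocks.append({"type": "paragraph", "content": " ".join(paragraph)})
            paragraph.clear()

    for line in md.splitlines():
        if line.startswith(FENCE):
            if in_code:  # closing fence: emit the accumulated code block
                blocks.append({"type": "code", "content": "\n".join(code_lines),
                               "metadata": {"language": lang}})
                code_lines, in_code = [], False
            else:        # opening fence: remember the language tag
                flush_paragraph()
                in_code, lang = True, line[len(FENCE):].strip()
        elif in_code:
            code_lines.append(line)
        elif line.startswith("#"):
            flush_paragraph()
            level = len(line) - len(line.lstrip("#"))
            blocks.append({"type": "heading", "content": line[level:].strip(),
                           "metadata": {"level": level}})
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush_paragraph()  # blank line ends the current paragraph

    flush_paragraph()
    return blocks
```

Feed its output straight to save_raw() as the blocks argument instead of wrapping the whole document in a single paragraph.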

Now wire it into an agent:

from langchain_openai import ChatOpenAI
from langchain_community.tools import TavilySearchResults
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o")
tools = [TavilySearchResults(max_results=5), publish_structured_document]
agent = create_react_agent(llm, tools)

result = agent.invoke({
    "messages": [
        HumanMessage(content="""Research the top 3 Python web frameworks in 2025. 
        For each, analyze performance, ecosystem, and developer experience. 
        Publish your findings as a structured report.""")
    ]
})

The agent researches, then calls publish_structured_document on its own when it's ready. The tool description guides when it gets used — write it clearly and the agent makes good decisions about publishing timing.

This pattern scales well. You can have agents that publish multiple documents per run, organize them into folders, and build up a library of artifacts over time.

What You Get

Once output is published, you're working with hosted documents instead of terminal output:

  • Shareable URLs — Every document gets a permanent link. Send it in Slack, embed it in a ticket, share it with a client. No "let me copy-paste this for you."
  • Folder organization — Pass a folder_id to save() or save_raw() to group related documents. Research reports in one folder, weekly summaries in another.
  • Metadata tracking — Tag documents with model, agent, pipeline, run ID, whatever you need. Useful when you're debugging why Tuesday's report looks different from Monday's.
  • Structured blocks — Documents aren't just markdown blobs. They're structured: headings, code blocks with syntax highlighting, tables, Mermaid diagrams. The rendering is handled for you.
  • Zero infrastructure — No S3 buckets, no static site generators, no React apps. The output layer is managed.

The block format also means your documents render consistently. Code blocks get syntax highlighting. Tables are actual tables. Mermaid diagrams render as diagrams. You're not fighting with markdown rendering quirks in whatever viewer you duct-taped together.
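The folder and metadata options above can be combined into a small routing helper. A sketch, assuming only that save() accepts a folder_id as described; the folder IDs are placeholders for ones you'd create in the SurfaceDocs app.

```python
# Placeholder folder IDs; real ones come from the SurfaceDocs app.
FOLDERS = {
    "research": "fld_research",
    "summary": "fld_summaries",
}


def publish_to_folder(docs, content, doc_type):
    """Route a document into a folder based on its type.
    `docs` is a SurfaceDocs client; unknown types go unfiled."""
    folder_id = FOLDERS.get(doc_type)
    if folder_id is None:
        return docs.save(content)
    return docs.save(content, folder_id=folder_id)
```

With a helper like this, every agent in a pipeline files its output consistently without each one knowing about folder IDs.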

Wrapping Up

The pattern is simple: LangChain agents generate, SurfaceDocs publishes. Whether you publish after the agent runs or give the agent a tool to publish itself depends on your use case — both work, and the code is minimal either way.

Check out the SurfaceDocs docs to get an API key and start publishing. The Python SDK is pip install surfacedocs and you're about three lines of code from your first published agent output.
