
SurfaceDocs

SurfaceDocs + LlamaIndex: From RAG Pipeline to Shareable Report


Your RAG pipeline answers questions beautifully. It retrieves the right chunks, synthesizes a coherent response, even cites its sources. Then the output prints to stdout and dies. What if every answer your pipeline produced was instantly a shareable, hosted document?

The last mile problem

RAG pipelines are the backbone of most production LLM applications. You invest real engineering effort into chunking strategies, embedding models, retrieval tuning, prompt design. The analysis is the hard part, and we've gotten good at it.

But sharing the result? That's where things fall apart. You copy-paste into a Google Doc. You screenshot a notebook cell. You build a bespoke Flask app to render responses. You email a JSON blob and hope someone reads it. Every team reinvents this output layer, and it's never the part anyone wants to work on.

The pipeline deserves a better destination than print(response).

The minimal version: query and publish

Let's start with something concrete. A LlamaIndex RAG pipeline that queries a document collection and saves the response to SurfaceDocs. No agent framework, no complexity — just the core loop.

from datetime import datetime, timezone
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from surfacedocs import SurfaceDocs

# Build the index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Query
query = "What are the key architectural decisions and their tradeoffs?"
response = query_engine.query(query)

# Publish
sd = SurfaceDocs()  # uses SURFACEDOCS_API_KEY env var
result = sd.save_raw(
    title=f"Research: {query[:80]}",
    blocks=[
        {"type": "heading", "content": "Query", "metadata": {"level": 2}},
        {"type": "paragraph", "content": query},
        {"type": "heading", "content": "Analysis", "metadata": {"level": 2}},
        {"type": "paragraph", "content": str(response)},
        {"type": "heading", "content": "Sources", "metadata": {"level": 2}},
        {"type": "list", "content": "\n".join(
            f"- {node.metadata.get('file_name', 'unknown')} (score: {node.score:.3f})"
            for node in response.source_nodes
        ), "metadata": {"listType": "bullet"}},
    ],
    metadata={
        "query": query,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_count": len(response.source_nodes),
    }
)

print(f"Published: {result.url}")

That's it. Your RAG response now has a URL. Send it in Slack, drop it in a ticket, bookmark it for later. The response, its sources, and the original query are all preserved together as a formatted document.

The source_nodes part matters. RAG without provenance is just vibes. By extracting the source filenames and relevance scores into the published doc, anyone reading it can evaluate the answer's grounding — not just take it on faith.
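One caveat worth handling in real pipelines: LlamaIndex's `NodeWithScore.score` can be `None` for some retrievers, and a bare `:.3f` format spec raises a `TypeError` in that case. A minimal sketch of a defensive formatter (`format_source` is a hypothetical helper, not part of either SDK):

```python
def format_source(file_name: str, score) -> str:
    """Render one source line, tolerating a missing relevance score.

    Some LlamaIndex retrievers return nodes without a score, so we
    guard against None before applying a float format spec.
    """
    if score is None:
        return f"- {file_name} (score: n/a)"
    return f"- {file_name} (score: {score:.3f})"
```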

Building a research agent

The basic version works, but it's a script. Let's make the pipeline smarter. LlamaIndex's FunctionAgent lets you build agents that use tools — including a tool that publishes findings as it works.

import asyncio
from datetime import datetime, timezone
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from surfacedocs import SurfaceDocs

# Setup
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

sd = SurfaceDocs()

# Tools
async def search_docs(query: str) -> str:
    """Search the document index for relevant information."""
    response = query_engine.query(query)
    sources = [n.metadata.get("file_name", "unknown") for n in response.source_nodes]
    return f"{response}\n\nSources: {', '.join(sources)}"

async def publish_report(title: str, summary: str, findings: str, recommendations: str) -> str:
    """Publish a research report to SurfaceDocs. Call this when your analysis is complete."""
    result = sd.save_raw(
        title=title,
        blocks=[
            {"type": "heading", "content": "Executive Summary", "metadata": {"level": 1}},
            {"type": "paragraph", "content": summary},
            {"type": "divider", "content": ""},
            {"type": "heading", "content": "Findings", "metadata": {"level": 2}},
            {"type": "paragraph", "content": findings},
            {"type": "heading", "content": "Recommendations", "metadata": {"level": 2}},
            {"type": "paragraph", "content": recommendations},
        ],
        metadata={
            "generated_by": "research-agent",
            "model": "gpt-4o",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
    )
    return f"Report published: {result.url}"

# Agent
agent = FunctionAgent(
    tools=[search_docs, publish_report],
    llm=OpenAI(model="gpt-4o"),
    system_prompt="""You are a research analyst. When given a research question:
1. Search the documents thoroughly (multiple queries if needed)
2. Synthesize findings into a structured analysis
3. Publish the final report using the publish tool
Always publish your work — don't just print it.""",
)

async def main():
    response = await agent.run(
        user_msg="Analyze the system's authentication architecture. "
                 "What are the security implications and what would you change?"
    )
    print(response)

asyncio.run(main())

The key design choice here: the agent decides when to publish. The publish_report tool is just another tool in its toolkit. You give the agent a research question, it searches the index (potentially multiple times from different angles), synthesizes its findings, and saves the result — all autonomously.

The tool's docstring does real work here. "Call this when your analysis is complete" guides the agent to publish as a final step rather than dumping intermediate results. Tool descriptions are part of your agent architecture; treat them like API documentation.
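You can see why the docstring matters by looking at what LlamaIndex derives from the function itself: the tool's name, parameters, and description all come from the function's name, signature, and docstring. This stdlib-only sketch approximates that derived description (the exact schema LlamaIndex sends to the model may differ):

```python
import inspect

async def publish_report(title: str, summary: str, findings: str, recommendations: str) -> str:
    """Publish a research report to SurfaceDocs. Call this when your analysis is complete."""
    return ""

# LlamaIndex builds the tool schema from the function name, signature,
# and docstring -- this string approximates what the LLM sees when
# deciding whether (and when) to call the tool.
tool_description = (
    f"{publish_report.__name__}{inspect.signature(publish_report)}\n"
    f"{inspect.getdoc(publish_report)}"
)
```

If you rename a parameter or reword the docstring, you've changed the agent's behavior, which is exactly why these deserve the same review rigor as API docs.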

Multi-step pipeline with structured output

For production use, you usually want more control over the output format. Here's a pipeline that runs multiple queries, assembles a structured report, and publishes with full metadata.

import asyncio
from datetime import datetime, timezone
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from surfacedocs import SurfaceDocs

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=5)
sd = SurfaceDocs()

def build_report_blocks(sections: list[dict]) -> list[dict]:
    """Convert analysis sections into SurfaceDocs blocks."""
    blocks = []
    for section in sections:
        blocks.append({
            "type": "heading",
            "content": section["title"],
            "metadata": {"level": section.get("level", 2)}
        })

        if section.get("quote"):
            blocks.append({
                "type": "quote",
                "content": section["quote"]
            })

        blocks.append({
            "type": "paragraph",
            "content": section["content"]
        })

        if section.get("code"):
            blocks.append({
                "type": "code",
                "content": section["code"],
                "metadata": {"language": section.get("language", "text")}
            })

        if section.get("data"):
            blocks.append({
                "type": "table",
                "content": section["data"]
            })

        blocks.append({"type": "divider", "content": ""})

    return blocks

async def analyze_and_publish(topic: str, queries: list[str]) -> str:
    """Run multiple queries against the index and publish a combined report."""
    sections = []

    for query in queries:
        response = query_engine.query(query)

        source_table = "| Source | Relevance |\n|--------|----------|\n"
        for node in response.source_nodes:
            name = node.metadata.get("file_name", "unknown")
            source_table += f"| {name} | {node.score:.2f} |\n"

        sections.append({
            "title": query,
            "content": str(response),
            "data": source_table,
            "level": 2,
        })

    blocks = [
        {"type": "heading", "content": topic, "metadata": {"level": 1}},
        {"type": "paragraph", "content": (
            f"Automated analysis covering {len(queries)} research questions. "
            f"Generated {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')}."
        )},
        {"type": "divider", "content": ""},
    ] + build_report_blocks(sections)

    result = sd.save_raw(
        title=f"Analysis: {topic}",
        blocks=blocks,
        metadata={
            "topic": topic,
            "query_count": len(queries),
            "model": "gpt-4o",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tags": ["automated-analysis", "rag-pipeline"],
        }
    )
    return result.url

async def main():
    url = await analyze_and_publish(
        topic="System Security Audit",
        queries=[
            "What authentication mechanisms are used and how are credentials stored?",
            "How is authorization handled across services?",
            "What are the known security vulnerabilities or risks?",
            "What logging and monitoring exists for security events?",
        ]
    )
    print(f"Report: {url}")

asyncio.run(main())

A few things worth noting about this approach:

Source tables per section. Each query's response includes a relevance-scored source table. When someone reads the report, they can see exactly which documents backed each section. This is table stakes for auditable RAG output.

Metadata as first-class data. The metadata dict on save_raw() isn't decoration — it's structured data you can use for organization and filtering. Tag your reports by pipeline, model version, topic. When you've generated 500 reports, you'll want to filter.
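Even before you lean on any hosted filtering, it's worth keeping a local index of what you've published. A minimal sketch, assuming nothing about the SurfaceDocs API beyond the URL and metadata you already have in hand (`log_report` and `find_reports` are hypothetical helpers):

```python
import json
from pathlib import Path

def log_report(index_path: Path, url: str, metadata: dict) -> None:
    """Append one published report's URL and metadata to a local JSONL index."""
    with index_path.open("a") as f:
        f.write(json.dumps({"url": url, **metadata}) + "\n")

def find_reports(index_path: Path, **filters) -> list[dict]:
    """Return logged reports whose metadata matches every given key/value."""
    records = [json.loads(line) for line in index_path.read_text().splitlines() if line]
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]
```

Call `log_report` right after `save_raw()` returns, and six months of pipeline runs stay greppable.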

build_report_blocks is your formatting layer. It maps a simple dict structure to SurfaceDocs blocks. This separation matters because your analysis logic shouldn't know about rendering. If you swap output destinations later, you change one function.
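To make the separation concrete, here's a sketch of a second destination that consumes the exact same block dicts: a plain-Markdown renderer. This is a hypothetical illustration, not a SurfaceDocs feature; the point is that `build_report_blocks` output is just data, so swapping renderers touches nothing upstream.

```python
def blocks_to_markdown(blocks: list[dict]) -> str:
    """Render the same block dicts as a Markdown string -- an alternate
    destination reusing the pipeline's formatting layer unchanged."""
    fence = "`" * 3  # built dynamically to avoid literal fences in this example
    parts = []
    for block in blocks:
        kind, content = block["type"], block["content"]
        meta = block.get("metadata", {})
        if kind == "heading":
            parts.append("#" * meta.get("level", 2) + " " + content)
        elif kind == "code":
            parts.append(f"{fence}{meta.get('language', '')}\n{content}\n{fence}")
        elif kind == "divider":
            parts.append("---")
        else:  # paragraph, quote, list, and table content is already text
            parts.append(content)
    return "\n\n".join(parts)
```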

Why this matters

The pattern here is simple: RAG pipeline → structured blocks → hosted document. A few lines of SurfaceDocs code on top of a pipeline you've already built.

SurfaceDocs handles the hosting, rendering, and sharing. You don't build a viewer, deploy a static site, or set up auth. Your pipeline's output just has a URL now.

The engineering value isn't in any single tool — it's in closing the loop. A RAG pipeline that produces a shareable artifact is a product. One that prints to a terminal is a demo.
