Yaohua Chen for ImagineX

From Prompts to Real Files: A Developer's Guide to AI File Generation

Ask ChatGPT to "create a sales report PDF with a revenue chart." A year ago, it would paste some markdown and wish you luck. Today, it spins up a sandboxed Python environment, runs reportlab and matplotlib, and hands you a real, downloadable PDF file.

This is the shift from text generation to artifact generation -- and every major LLM vendor now supports it through their API. Claude, OpenAI, and Gemini each give developers a way to prompt an LLM and get back actual files: PDFs, spreadsheets, charts, slide decks, whatever you can create with Python.

This post walks through the universal pattern behind file generation, then shows you exactly how to do it with each vendor -- working code included.


The Universal Pattern

Despite different APIs, all three vendors follow the same three-step architecture. Every vendor-specific implementation is a variation on this flow; the details change, but three concepts repeat everywhere:

  1. Tool declaration -- you opt in to code execution by including a specific tool in your API request. It's never on by default.
  2. Sandboxed execution -- the LLM's code runs in an isolated container with no internet access. Common libraries (pandas, matplotlib, reportlab) come pre-installed.
  3. File retrieval -- each vendor has a different mechanism to get the bytes out. Some give you a file ID to download; others return bytes inline.

Once you internalize this pattern, learning any vendor's API is just a matter of mapping it to these three steps.
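In rough pseudocode, the pattern looks like this (the `generate`, `file_references`, and `download` names are illustrative, not any real SDK):

```python
# Pseudocode -- each vendor section below is a concrete version of this flow
def prompt_to_files(client, prompt):
    # 1. Tool declaration: opt in to code execution in the request
    response = client.generate(prompt, tools=["code_execution"])
    # 2. Sandboxed execution happens server-side during this call
    # 3. File retrieval: turn vendor-specific references into local bytes
    for ref in response.file_references:
        with open(ref.filename, "wb") as f:
            f.write(client.download(ref))
```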


Claude: Code Execution + Files API

Claude's file generation is the most full-featured option for document creation. It provides a persistent container with full bash access, a rich set of pre-installed document libraries, and a clean Files API for uploads and downloads.

Generating a PDF from a Prompt

Enable the code_execution_20250825 tool, send your prompt, then extract file IDs from the response and download them through the Files API.

import anthropic

client = anthropic.Anthropic()

# Step 1: Request with code execution enabled
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
    messages=[{
        "role": "user",
        "content": "Create a one-page PDF sales report with a revenue chart for Q1 2026."
    }]
)

# Step 2: Extract file IDs from the response
file_ids = []
for block in response.content:
    if block.type == "bash_code_execution_tool_result":
        result = block.content
        if result.type == "bash_code_execution_result":
            for item in result.content:
                if hasattr(item, "file_id"):
                    file_ids.append(item.file_id)

# Step 3: Download each generated file
for file_id in file_ids:
    content = client.beta.files.download(file_id)
    metadata = client.beta.files.retrieve_metadata(file_id)
    content.write_to_file(metadata.filename)
    print(f"Saved: {metadata.filename}")

The response content blocks have a nested structure: you're looking for bash_code_execution_tool_result blocks, which contain bash_code_execution_result objects, which contain items with file_id attributes. The files.download() call gives you the raw bytes; retrieve_metadata() gives you the original filename.

Why bash_code_execution? When you include the code_execution_20250825 tool, Claude actually gets two sub-tools: bash_code_execution (run shell commands) and text_editor_code_execution (create and edit files). To generate a file, Claude typically writes a Python script with the text editor sub-tool, then runs it via bash. The result block is named after whichever sub-tool produced the output -- and since it's the bash execution that creates the final file, that's the block type you parse. This is also why Claude has full bash access unlike the other vendors: it's not running Python in a restricted interpreter, it's executing real shell commands. The _20250825 tool version introduced this bash/text-editor split, replacing the earlier _20250522 version that was Python-only.

Uploading a CSV, Getting Back a Chart + PDF

To process your own data, upload via the Files API first, then attach the file to your prompt alongside the code execution tool.

import anthropic

client = anthropic.Anthropic()

# Upload your input file
uploaded = client.beta.files.upload(file=open("sales_data.csv", "rb"))

# Send the file + prompt with code execution
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    betas=["files-api-2025-04-14"],
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this sales CSV. Create a bar chart of revenue by region "
                        "and save it as 'revenue_chart.png'. Also generate a one-page PDF "
                        "summary report of the key findings."
            },
            {"type": "container_upload", "file_id": uploaded.id},
        ],
    }],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)

# Download all generated files
for block in response.content:
    if block.type == "bash_code_execution_tool_result":
        result = block.content
        if result.type == "bash_code_execution_result":
            for item in result.content:
                if hasattr(item, "file_id"):
                    content = client.beta.files.download(item.file_id)
                    metadata = client.beta.files.retrieve_metadata(item.file_id)
                    content.write_to_file(metadata.filename)
                    print(f"Downloaded: {metadata.filename}")

A single prompt can produce multiple files. In this case, you'll get both the PNG chart and the PDF report. Always iterate the full response -- never assume a single file.

Container Reuse: The Key to Iteration Workflows

Claude containers persist for 30 days. When your first request creates a container, the response includes a container.id. Pass it to subsequent calls and Claude picks up right where it left off -- all files from the previous request are still on disk.

# First call creates the container
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Generate a sales report PDF."}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)
container_id = response1.container.id

# Subsequent calls reuse the same container
response2 = client.messages.create(
    container=container_id,
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Update the chart on page 2 to use a pie chart instead."}],
    tools=[{"type": "code_execution_20250825", "name": "code_execution"}],
)

This enables "conversational file editing" -- users can iterate on documents without re-uploading data or starting from scratch.

Pre-installed Libraries

Claude's sandbox comes with the document generation essentials: reportlab (PDFs), python-docx (Word), python-pptx (PowerPoint), openpyxl (Excel), pandas, matplotlib, pillow, pypdf, pdfplumber, seaborn, scipy, and scikit-learn. Since Claude has full bash access, you can also pip install anything else you need during the session.


OpenAI: Responses API + Code Interpreter

OpenAI's Responses API (the successor to the deprecated Assistants API) uses the Code Interpreter tool for file generation. The pattern is similar to Claude, but the response structure and file retrieval mechanism differ.

Generating a CSV with Code Interpreter

Enable the code_interpreter tool, then parse container_file_citation annotations from the response to find generated files.

from openai import OpenAI

client = OpenAI()

# Step 1: Request with code interpreter enabled
response = client.responses.create(
    model="gpt-5.2",
    tools=[{
        "type": "code_interpreter",
        "container": {"type": "auto"}
    }],
    input="Generate a CSV file named 'q1_report.csv' with 10 rows of financial data."
)

# Step 2: Extract file references from annotations
# The response structure nests deep: output → message → content → output_text → annotations
for item in response.output:
    if item.type == "message":
        for content_block in item.content:
            if content_block.type == "output_text":
                for annotation in content_block.annotations:
                    if annotation.type == "container_file_citation":
                        # Step 3: Download from the container endpoint
                        file_data = client.containers.files.content.retrieve(
                            file_id=annotation.file_id,
                            container_id=annotation.container_id
                        )
                        with open(annotation.filename, "wb") as f:
                            f.write(file_data.read())
                        print(f"Downloaded: {annotation.filename}")

The annotation traversal is the trickiest part. Don't try to shortcut it with response.output_text -- that gives you a plain string with citation markers, not the actual file references.

Uploading a File, Transforming It

Upload via the standard Files API, then pass the file ID in the container config.

from openai import OpenAI

client = OpenAI()

# Upload the file
uploaded = client.files.create(
    file=open("sales_data.csv", "rb"),
    purpose="user_data"
)

# Pass it to code interpreter via container config
response = client.responses.create(
    model="gpt-5.2",
    tools=[{
        "type": "code_interpreter",
        "container": {
            "type": "auto",
            "file_ids": [uploaded.id]
        }
    }],
    input="Analyze this sales CSV. Create a bar chart of revenue by region and save it as a PNG."
)

# Download generated files from annotations
for item in response.output:
    if item.type == "message":
        for content_block in item.content:
            if content_block.type == "output_text":
                for annotation in content_block.annotations:
                    if annotation.type == "container_file_citation":
                        file_data = client.containers.files.content.retrieve(
                            file_id=annotation.file_id,
                            container_id=annotation.container_id
                        )
                        with open(annotation.filename, "wb") as f:
                            f.write(file_data.read())
                        print(f"Downloaded: {annotation.filename}")

You can also request higher memory tiers -- 1g (default), 4g, 16g, or 64g -- by setting "memory_limit" in the container config. Useful when processing large datasets.
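For reference, here's a sketch of what that container config looks like with a larger tier (assuming the `memory_limit` key described above; the values are the documented tiers):

```python
tools = [{
    "type": "code_interpreter",
    "container": {
        "type": "auto",
        "memory_limit": "16g",  # 1g (default), 4g, 16g, or 64g
    },
}]
```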

OpenAI Gotchas

The cfile_ 404 trap. Generated files have IDs prefixed with cfile_. If you try to download them using the standard client.files.content() endpoint, you'll get a 404. You must use client.containers.files.content.retrieve() instead. This has tripped up every developer at least once.

20-minute container expiry. OpenAI containers are ephemeral -- they expire after 20 minutes of inactivity. Download your files immediately after generation. There is no 30-day persistence like Claude.

Missing annotations fallback. There's a known edge case where container_file_citation annotations don't appear in the response. When this happens, check response.output for items of type code_interpreter_call and inspect their outputs for file references:

if not file_refs:
    for item in response.output:
        if item.type == "code_interpreter_call":
            for output_item in getattr(item, "outputs", []):
                if hasattr(output_item, "file_id"):
                    file_data = client.containers.files.content.retrieve(
                        file_id=output_item.file_id,
                        container_id=output_item.container_id,
                    )
                    # No filename on this path, so fall back to the file ID
                    with open(f"{output_item.file_id}.bin", "wb") as f:
                        f.write(file_data.read())

Gemini: Inline Results + Structured Output

Gemini takes a fundamentally different approach. It doesn't return downloadable file artifacts with file IDs. Instead, code execution results come back inline -- matplotlib charts as raw image bytes, everything else as text or JSON.

This isn't a technical limitation -- Google has the infrastructure to build containers and file artifact systems. The gap is strategic. Google's file generation story lives in Google Workspace, not in the developer API:

  • Gemini in Docs generates full first drafts from prompts, matching writing styles and pulling data from Gmail, Drive, and the web.
  • Gemini in Sheets builds entire spreadsheets from natural language and auto-populates cells with live data.
  • Gemini in Slides generates themed slides, with full presentation generation from a single prompt on the roadmap.

This makes business sense for Google. Anthropic and OpenAI are API-first companies -- their revenue comes from developers using their APIs, so building sandboxes and file download endpoints directly serves their customers. Google's revenue comes from Workspace subscriptions. When Gemini generates a spreadsheet in Workspace, it creates a Google Sheet (not an .xlsx), keeping users in the Google ecosystem. An API that produces vendor-neutral files would undermine that.

The practical implication: Gemini's API-level file generation gap is unlikely to close anytime soon. The structured output and inline image patterns below are the right long-term approaches, not temporary workarounds.

For developers, this means Gemini is best suited for quick charts and data transforms, while complex document creation belongs with Claude or OpenAI.

Generating a Chart (Inline Image)

Enable the code_execution tool, then extract image bytes directly from the response parts.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    ),
    contents="Generate a bar chart of quarterly revenue: Q1=$2.1M, Q2=$2.8M, Q3=$3.2M, Q4=$3.9M."
)

# Gemini returns results inline -- no separate download step
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("Code ran:", part.executable_code.code[:80], "...")
    if part.code_execution_result:
        print("Output:", part.code_execution_result.output)
    image = part.as_image()
    if image is not None:
        with open("revenue_chart.png", "wb") as f:
            f.write(image.image_bytes)
        print("Chart saved as revenue_chart.png")

No file IDs, no download endpoints. The image bytes are right there in the response. For text/data output, it shows up in code_execution_result.output.

Structured Output for CSV Generation

Gemini's strongest file generation pattern is actually indirect: get structured JSON data back, then format it locally with whatever library you prefer.

import json
import pandas as pd
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Ask for structured JSON output
response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=types.GenerateContentConfig(response_mime_type="application/json"),
    contents="Return a JSON array of 10 tech companies with fields: name, ticker, market_cap, sector."
)

# Convert to CSV locally -- you control the formatting
data = json.loads(response.text)
df = pd.DataFrame(data)
df.to_csv("tech_companies.csv", index=False)
print(f"Saved {len(df)} rows to tech_companies.csv")

This "structured output" approach gives you 100% control over formatting and is the most reliable way to produce files from Gemini. Let the model do what it's good at (data generation), and handle the file formatting yourself.

30-Second Execution Timeout

Gemini's code execution sandbox has a hard 30-second timeout. This makes it ideal for quick chart generation and data transforms, but rules it out for heavy document creation tasks like multi-page PDF reports or complex PowerPoint decks. For those, use Claude or OpenAI.


Which API for What?

| Feature | Claude | OpenAI | Gemini |
| --- | --- | --- | --- |
| Sandbox Type | Reusable container (30-day expiry) | Ephemeral container (20-min idle timeout) | Stateless sandbox (30s timeout) |
| Resources | 5 GiB disk, 5 GiB RAM, 1 CPU | Up to 64 GB RAM (tiered) | Token-limited (inline output) |
| Shell Access | Full bash | Python only | Python only |
| File Download | Files API (files.download()) | Container endpoint (containers.files.content.retrieve()) | Inline in response (no download step) |
| Best Use Case | Complex documents (PDF, DOCX, PPTX) | Heavy data processing + file gen | Quick charts and data transforms |
| pip install | Yes (bash access) | No (isolated sandbox) | No (isolated sandbox) |

The short version:

  • Complex documents (PDF reports, slide decks, Word docs with formatting): Claude. The pre-installed document libraries and 30-day container persistence make it the best fit.
  • Large dataset processing (crunching big CSVs, Excel transformations): OpenAI. The ability to request up to 64 GB of RAM is unmatched.
  • Quick visualizations (charts, graphs, simple data summaries): Gemini. Inline image return means fewer API calls and faster turnaround.
  • Maximum formatting control: Any model's Structured Output mode. Get JSON data back, render locally with your own libraries.

The Self-Hosted Alternative: Run Your Own Sandbox

The three vendor APIs above all run code in their infrastructure. You send a prompt, they spin up a container, and they hand you back the file. This is convenient, but it means your data leaves your network, you're bound by each vendor's sandbox limits (30-second timeouts, no internet, fixed library sets), and you pay per-execution fees.

There's a fourth option: run the sandbox yourself. In this pattern, you call any LLM API to generate code (without enabling the vendor's code execution tool), then execute that code locally in an isolated environment on your own machines. You get the same prompt-to-file workflow, but you control the execution environment.

Why Self-Host?

  • Data residency. In regulated industries (healthcare, finance, government), sending code and data to a third-party sandbox may violate compliance requirements. A local sandbox keeps everything on your infrastructure.
  • No vendor sandbox limits. You choose the timeout, the RAM, the disk, the installed libraries. Need 10 minutes of execution time? A GPU? Network access to internal services? Your sandbox, your rules.
  • Cost at scale. Vendor sandbox pricing is per-session or per-hour. At high volume, running your own execution infrastructure can be significantly cheaper.
  • Model flexibility. Since you're decoupling "generate the code" from "run the code," you can use any LLM -- including open-source models, fine-tuned models, or your own -- to produce the Python script. The sandbox doesn't care where the code came from.

Tools for Building It

Two open-source projects have emerged as the leading options for sandboxed code execution:

E2B uses Firecracker microVMs (the same technology behind AWS Lambda) to isolate each execution in its own lightweight VM with a dedicated kernel -- stronger isolation than Docker containers. E2B offers a managed cloud service, but you can also self-host on your own GCP or Linux infrastructure using their Terraform-based deployment. The Python and JavaScript SDKs make it straightforward to spin up a sandbox, run code, and retrieve files programmatically.

exec-sandbox takes the fully-local approach. It runs untrusted code in ephemeral QEMU microVMs with hardware acceleration (KVM on Linux, HVF on macOS). No cloud dependency -- code never leaves your machine. Warm-pool latency is 1-2ms, and it supports Python, JavaScript, and shell execution. It's designed for air-gapped environments where sending code to any external service is a non-starter.

The Architecture Shift

The key difference is that self-hosting decouples code generation from code execution. With vendor APIs, the LLM both writes and runs the code in a single API call. With a self-hosted sandbox, you split these into two steps:

  1. Call the LLM API for text/code generation (no code execution tool needed).
  2. Extract the generated Python script from the response.
  3. Execute it in your local sandbox (E2B, exec-sandbox, or even a locked-down Docker container).
  4. Retrieve the output files from the sandbox filesystem.

Here's a concrete example using E2B as the sandbox and Anthropic as the LLM. Notice there's no code execution tool in the API call -- we just ask Claude to write a script, then run it ourselves:

import re
from anthropic import Anthropic
from e2b_code_interpreter import Sandbox

# Step 1: Ask the LLM to generate a Python script (no code execution tool)
client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": "Write a Python script that uses matplotlib to create a bar chart "
                   "of quarterly revenue (Q1=$2.1M, Q2=$2.8M, Q3=$3.2M, Q4=$3.9M) "
                   "and saves it as 'revenue_chart.png'. Return only the script, "
                   "no explanation."
    }]
)

# Step 2: Extract the Python code from the response
code = response.content[0].text
match = re.search(r"```python\n(.*?)```", code, re.DOTALL)
if match:
    code = match.group(1)

# Step 3: Execute it in an E2B sandbox
with Sandbox.create() as sbx:
    execution = sbx.run_code(code)

    if execution.error:
        print(f"Error: {execution.error.value}")
    else:
        # Step 4: Download the generated file from the sandbox
        file_content = sbx.files.read("/home/user/revenue_chart.png", format="bytes")
        with open("revenue_chart.png", "wb") as f:
            f.write(file_content)
        print("Saved: revenue_chart.png")

You can swap Anthropic for OpenAI, genai.Client, or any other LLM client -- the sandbox doesn't care where the code came from. You can also upload input files to the sandbox before execution using sbx.files.write(), mirroring the upload-then-process pattern from the vendor APIs.
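Because the generating model is interchangeable, it pays to make the code-extraction step robust. Here's a small helper along those lines (the function name is ours, not from any SDK) that falls back to the raw reply when no fence is present:

```python
import re

def extract_python(text: str) -> str:
    """Pull the first fenced Python block out of an LLM reply,
    falling back to the whole reply if no fence is present."""
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

reply = "Here you go:\n```python\nprint('hi')\n```\nEnjoy!"
print(extract_python(reply))  # → print('hi')
```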

E2B's default code-interpreter template comes with matplotlib, pandas, numpy, scikit-learn, pillow, openpyxl, python-docx, seaborn, and dozens of other common libraries pre-installed -- similar to the vendor sandboxes. If you need additional packages, you can either install them at runtime with sbx.commands.run("pip install <package>"), or build a custom template with your dependencies baked in so every sandbox starts ready to go.

This is more work to build, but it gives you full control over execution, security, and cost. It also means you can use Gemini or any other model that doesn't offer file artifacts -- you just need the model to write good Python, and your sandbox handles the rest.


Production Tips

If you're building file generation into a real product, a few hard-won lessons:

1. Sanitize filenames. The LLM chooses the filename based on the prompt. A creative user (or an adversarial one) can craft prompts that produce filenames with path traversal characters. Always strip or validate filenames before writing to disk. os.path.basename() is your friend.
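A minimal sanitizer along those lines (the helper name is ours; tighten the character allowlist to taste):

```python
import os
import re

def safe_filename(name: str) -> str:
    """Reduce an LLM-chosen filename to a safe basename before writing."""
    base = os.path.basename(name)          # drops any "../" path traversal
    base = re.sub(r"[^\w.\-]", "_", base)  # conservative character allowlist
    return base or "unnamed_file"

print(safe_filename("../../etc/passwd"))  # → passwd
```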

2. Handle multi-file responses. A single prompt like "make a PDF report and an Excel spreadsheet of the raw data" can produce two or more files. Always iterate the full response -- never assume exactly one file comes back.

3. Persist container IDs for edit workflows. Claude's 30-day containers enable a powerful pattern: users can say "update the chart on page 2" in a follow-up message, and the LLM picks up the original file from the persistent container. Store the container_id alongside the conversation thread in your database.

4. Set timeouts generously. Code execution is significantly slower than text generation. Simple files might take 30-60 seconds; complex multi-file generation (especially PPTX with embedded charts) can take 5-15 minutes. Don't use your standard API timeout.
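As a sketch, both the Anthropic and OpenAI Python SDKs accept a per-client timeout override (the 900-second value here is illustrative):

```python
import anthropic
from openai import OpenAI

# Allow up to 15 minutes for long-running file generation requests
claude_client = anthropic.Anthropic(api_key="YOUR_API_KEY", timeout=900.0)
openai_client = OpenAI(api_key="YOUR_API_KEY", timeout=900.0)
```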

5. All sandboxes are offline. None of the three vendors allow network access from within the sandbox. All data must be uploaded or included in the prompt. You can't pip install on OpenAI or Gemini (Claude is the exception -- it has bash access). You can't fetch URLs. Plan accordingly.


Conclusion

File generation via LLM APIs follows a universal pattern across all three major vendors:

  • Claude excels at complex document creation with its 30-day persistent containers, full bash access, and pre-installed document libraries.
  • OpenAI offers the most compute headroom with up to 64 GB of RAM, making it ideal for heavy data processing tasks.
  • Gemini is the fastest path to charts and visualizations, returning inline image bytes with no separate download step.

Try it yourself: Build a CLI tool that takes a prompt and a desired output format, routes to the best vendor based on file type (PDFs to Claude, big data to OpenAI, charts to Gemini), and saves the result locally. You'll touch all three APIs and internalize the patterns in a single afternoon.
