The ecological risk report is the last mile of TerraShield's pipeline. Everything before it — the image upload, the EXIF extraction, the satellite NDVI lookup, the agent memory recall — exists so that a field ecologist or a government inspector can open a PDF and make a decision. If that PDF is broken, none of the upstream work matters.
I'm Varma, and I owned the PDF generation pipeline and the Gemini API integration for TerraShield, an invasive species early warning system. This is the story of what it took to get structured AI output reliably into a professionally formatted document — and what broke twice along the way.
## How the Report Pipeline Works
When a user requests a full ecological risk report, the flow is: fetch the stored analysis record from Supabase, pull the satellite NDVI data for that GPS coordinate, call Gemini with a structured prompt, parse the six-section report text it returns, and render all of it into a PDF using pdf-lib.
The Gemini call lives in `lib/gemini.ts`. The prompt is explicit about structure:
```typescript
function buildPrompt(data: EcologicalReportInput): string {
  return `Generate a structured report with EXACTLY these six numbered section headings:

1. Species Overview
2. Ecological Threat Assessment
3. Satellite Vegetation Analysis Interpretation
4. Outbreak Likelihood Evaluation
5. Recommended Monitoring or Containment Actions
6. Final Risk Conclusion

Rules:
- Do not use markdown formatting (no **, no ##). Use plain text only.
- Use the exact numbered heading format shown above so sections can be parsed programmatically.`;
}
```
The "plain text only" and "exact numbered heading format" instructions are load-bearing. If Gemini decides to respond with **Species Overview** instead of 1. Species Overview, the section parser breaks silently and the PDF renders empty sections. This happened on the first integration test. The model followed the format roughly 80% of the time without explicit instructions — not good enough for a document meant for official use.
After adding the explicit formatting rules, reliability jumped to consistent structured output. The lesson: when you need machine-parseable output from a language model, the format specification needs to be part of the prompt contract, not an assumption.
## Parsing Six Sections From Free Text
The section parser uses regex index positions rather than splitting on headings. This turned out to be more robust — if the model adds a blank line or extra whitespace before a heading, index-based extraction still works:
```typescript
function parseSections(text: string): ReportSections {
  const headingPatterns = [
    /1\.\s*Species Overview/i,
    /2\.\s*Ecological Threat Assessment/i,
    /3\.\s*Satellite Vegetation Analysis Interpretation/i,
    /4\.\s*Outbreak Likelihood Evaluation/i,
    /5\.\s*Recommended Monitoring or Containment Actions/i,
    /6\.\s*Final Risk Conclusion/i,
  ];
  const indices = headingPatterns.map(re => {
    const match = re.exec(text);
    return match ? match.index : -1;
  });

  // Extract from after each heading to the start of the next
  function extractSection(start: number, end: number): string {
    if (start === -1) return "";
    const newlinePos = text.indexOf("\n", start);
    const contentStart = newlinePos === -1 ? start : newlinePos + 1;
    return text.slice(contentStart, end === -1 ? undefined : end).trim();
  }
  // ...
}
```
This approach also handles CRLF line endings: `indexOf("\n")` matches the `\n` of a `\r\n` pair, so extraction skips past the heading line regardless of whether the model used Unix or Windows line endings in its output.
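A minimal, self-contained version of the index-based approach (two headings only; the helper name and sample text are illustrative, not from the real parser) shows why extra blank lines before a heading don't break extraction:

```typescript
// Minimal two-heading sketch of index-based section extraction.
function extractBetween(text: string, patterns: RegExp[]): string[] {
  const indices = patterns.map(re => {
    const m = re.exec(text);
    return m ? m.index : -1;
  });
  return indices.map((start, i) => {
    if (start === -1) return "";
    // End at the next heading that was actually found, or end of text.
    const next = indices.slice(i + 1).find(n => n !== -1);
    // Skip past the heading line itself.
    const newlinePos = text.indexOf("\n", start);
    const contentStart = newlinePos === -1 ? start : newlinePos + 1;
    return text.slice(contentStart, next === undefined ? undefined : next).trim();
  });
}

// CRLF endings and stray blank lines before a heading don't matter:
const sample =
  "1. Species Overview\r\nKudzu is fast-spreading.\r\n\r\n\r\n2. Ecological Threat Assessment\r\nHigh.";
const [overview, threat] = extractBetween(sample, [
  /1\.\s*Species Overview/i,
  /2\.\s*Ecological Threat Assessment/i,
]);
```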
## The PDF Layout
The PDF is built with pdf-lib entirely server-side. No external rendering service, no headless browser. The generator in `lib/pdf.ts` manages a cursor (`y` position) that decrements as content is drawn, with an `ensureSpace()` guard that triggers a new page when fewer than 60 points remain:
```typescript
function ensureSpace(space: number) {
  if (y - space < 60) newPage();
}
```
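The cursor pattern can be sketched in isolation, without the pdf-lib drawing calls. The class name, page height, and line height below are illustrative values, not TerraShield's actual constants:

```typescript
// Minimal cursor/page-break sketch. US Letter is 792 points tall;
// the 60-point bottom margin matches the guard described above.
const PAGE_HEIGHT = 792;
const BOTTOM_MARGIN = 60;

class PdfCursor {
  y = PAGE_HEIGHT - 60; // start below an illustrative top margin
  pages = 1;

  ensureSpace(space: number) {
    if (this.y - space < BOTTOM_MARGIN) this.newPage();
  }
  newPage() {
    this.pages += 1;
    this.y = PAGE_HEIGHT - 60; // reset cursor to top of the new page
  }
  drawLine(height = 14) {
    this.ensureSpace(height);
    this.y -= height; // in the real generator, a pdf-lib drawText call goes here
  }
}
```

Every draw call goes through `ensureSpace()` first, so content can never run off the bottom of a page.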
The risk classification badge at the bottom of every report is color-coded based on the final section text — if the model's conclusion includes "red", "high", or "critical", the badge renders in red (rgb(0.87, 0.20, 0.20)); "amber" or "moderate" gets amber; anything else gets green. This is intentionally simple — we're reading the model's own language rather than maintaining a separate risk threshold table.
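A sketch of that keyword mapping — the keywords are the ones listed above, but the function name and word-boundary matching are my assumptions:

```typescript
// Map the model's own conclusion language to a badge color label.
type BadgeColor = "red" | "amber" | "green";

function riskBadgeColor(conclusion: string): BadgeColor {
  const text = conclusion.toLowerCase();
  if (/\b(red|high|critical)\b/.test(text)) return "red"; // rendered as rgb(0.87, 0.20, 0.20)
  if (/\b(amber|moderate)\b/.test(text)) return "amber";
  return "green";
}
```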
The satellite metrics table renders NDVI values with color coding: anomaly scores above 0.66 render in red, 0.33–0.66 in amber. It gives the inspector immediate visual context before they read a single word of the narrative.
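The threshold logic is simple enough to sketch directly. Note the green default for low scores is my assumption — the bands stated above only cover red and amber:

```typescript
// Color an NDVI anomaly score: > 0.66 red, 0.33–0.66 amber,
// and (assumed) green below 0.33.
function anomalyColor(score: number): "red" | "amber" | "green" {
  if (score > 0.66) return "red";
  if (score >= 0.33) return "amber";
  return "green";
}
```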
## What Broke (Twice)
First break: PDF download returning corrupted bytes. The initial implementation streamed the `Uint8Array` from pdf-lib directly into the Next.js response. The client received the bytes but the PDF wouldn't open — Adobe Reader reported it as corrupted. The issue was response headers. The `Content-Type` was set to `application/octet-stream` instead of `application/pdf`, and the `Content-Disposition` header was missing the `attachment` directive. Browsers were interpreting the binary stream incorrectly.
Fix:
```typescript
return new NextResponse(pdfBytes, {
  headers: {
    'Content-Type': 'application/pdf',
    'Content-Disposition': `attachment; filename="terrashield-report-${recordId}.pdf"`,
    'Content-Length': pdfBytes.length.toString(),
  },
});
```
Second break: Gemini API 429 errors crashing the report route. Under load, the Gemini API returned `RESOURCE_EXHAUSTED` errors. The original implementation had no retry logic — it threw immediately and the user got a 500 error with no explanation. I added a `withRetry` wrapper with exponential backoff that parses the `retryDelay` from the Gemini error message when available:
```typescript
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Gemini surfaces quota errors as 429 / RESOURCE_EXHAUSTED
      const message = err instanceof Error ? err.message : String(err);
      const is429 = /429|RESOURCE_EXHAUSTED/.test(message);
      if (!is429 || attempt === maxAttempts) throw err;
      const delay = parseRetryDelay(err, 60 * attempt); // seconds
      await new Promise(r => setTimeout(r, delay * 1000));
    }
  }
  throw new Error("unreachable"); // satisfies TypeScript's return-path analysis
}
```
The `parseRetryDelay` function reads the `retryDelay: Xs` pattern from the Gemini error message and caps at 120 seconds. If the pattern isn't present, it falls back to `60 * attempt` seconds. Three attempts covers the vast majority of transient quota errors without blocking the request thread for too long.
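A sketch of `parseRetryDelay` matching that description — the exact signature and regex are my assumptions:

```typescript
// Read "retryDelay: Xs" from a Gemini error message, cap at 120 seconds,
// and fall back to the caller-supplied default when the hint is absent.
function parseRetryDelay(err: unknown, fallbackSeconds: number): number {
  const message = err instanceof Error ? err.message : String(err);
  const match = /retryDelay:\s*(\d+)s/.exec(message);
  if (!match) return fallbackSeconds;
  return Math.min(parseInt(match[1], 10), 120);
}
```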
## The Report Page
The frontend report page (`frontend/pages`) handles two flows: generating a new full Gemini report with all six sections, and a "quick report" that uses the stored classification data without calling Gemini again. The quick path is for users who need the PDF immediately after upload, without waiting for a full report generation.
Both paths converge on the same PDF generation function — the difference is in which `EcologicalReportInput` fields are populated versus estimated from stored values. The quick report uses the stored `ai_risk_score` to infer the risk level string rather than the full satellite context.
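A hypothetical sketch of that inference step — the 0–100 scale and the cut-offs here are assumptions for illustration, not TerraShield's actual thresholds:

```typescript
// Derive a risk level string from the stored ai_risk_score for the
// quick-report path. Scale and thresholds are ASSUMED, not confirmed.
function inferRiskLevel(aiRiskScore: number): "low" | "moderate" | "high" {
  if (aiRiskScore >= 70) return "high";
  if (aiRiskScore >= 40) return "moderate";
  return "low";
}
```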
## Takeaways
Prompt formatting contracts are non-negotiable for parseable output. If you need structured text from a language model, specify the format exactly — section numbers, no markdown, plain text. Don't rely on the model defaulting to your preferred structure.
Parse by index, not by split. When extracting sections from model output, find heading positions and slice between them. It's more tolerant of the model's whitespace habits than `split()` on heading text.
PDF download bugs are header bugs. If a PDF arrives corrupted at the client, check `Content-Type` and `Content-Disposition` before touching the generation code. The bytes are almost certainly fine.
Retry logic for LLM API calls is not optional. Any production path that calls Gemini or any other language model API needs a retry wrapper with rate limit handling from day one. 429 errors are not exceptional — they're expected under normal load.
The full stack is at github.com/Nitish-k-s/TerraShield. The agent memory context that flows into each report — recalled from nearby past sightings using Hindsight — is what makes the ecological analysis more than boilerplate text for each GPS coordinate. You can read more about that pattern at hindsight.vectorize.io.