NeuroLink AI

Posted on • Originally published at blog.neurolink.ink

Processing PDFs and CSVs with AI: A Complete TypeScript Guide

Your business runs on documents. PDFs, CSVs, images, text files. Now AI can understand them all through a single TypeScript API.

The challenge? Each format requires different parsing. PDFs need visual analysis. CSVs need tabular understanding. Different APIs. Different libraries. Different headaches.

NeuroLink solves this with a unified multimodal API. One interface handles every format. Auto-detection identifies file types. Smart routing selects the right provider. You write one code path for all documents.

TL;DR

  • Process PDFs natively (no OCR needed)
  • Analyze CSV data with AI insights
  • One API call for any document type
  • Full TypeScript type safety
  • Production-ready pipeline patterns

Why Unified Document Processing Matters

Traditional document AI requires juggling multiple tools:

| Document Type | Traditional Approach | Problems |
| --- | --- | --- |
| PDF | pdf-parse + OCR + separate API | Loses visual context, slow |
| CSV | csv-parser + custom formatting | No semantic understanding |
| Images | sharp + vision API | Separate pipeline |
| Text | fs.readFile + custom parsing | No intelligence |

NeuroLink replaces this complexity with one API call:

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Process ANY document type
const result = await ai.generate({
  input: {
    text: "Analyze this document and extract key insights",
    files: ["report.pdf", "data.csv"]
  }
});

The SDK handles everything:

  • Format Detection: Magic bytes identify file type accurately
  • Provider Selection: Routes PDFs to vision-capable providers
  • Text Optimization: Formats tabular data for LLM consumption
  • Error Handling: Graceful fallbacks for edge cases
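
To make the format-detection step concrete, magic-byte sniffing can be sketched roughly as follows. This is a minimal illustration and not NeuroLink's actual implementation: the `detectFormat` function and `DetectedFormat` type are hypothetical names, and the CSV branch is a simple heuristic because plain-text formats carry no signature.

```typescript
type DetectedFormat = "pdf" | "png" | "jpeg" | "csv" | "text";

// Well-known file signatures (magic bytes) for a few binary formats.
const SIGNATURES: Array<{ format: DetectedFormat; bytes: number[] }> = [
  { format: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { format: "png", bytes: [0x89, 0x50, 0x4e, 0x47] },       // "\x89PNG"
  { format: "jpeg", bytes: [0xff, 0xd8, 0xff] },
];

function detectFormat(data: Uint8Array): DetectedFormat {
  for (const { format, bytes } of SIGNATURES) {
    if (bytes.every((b, i) => data[i] === b)) return format;
  }
  // Plain-text formats have no signature, so fall back to a heuristic:
  // treat the file as CSV when its first line contains a comma.
  const head = new TextDecoder().decode(data.slice(0, 1024));
  const firstLine = head.split(/\r?\n/, 1)[0] ?? "";
  return firstLine.includes(",") ? "csv" : "text";
}
```

Checking content rather than the file extension means a PDF saved as `report.txt` would still be classified as a PDF in this sketch.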

PDF Processing

PDFs are the workhorse of business documents. NeuroLink processes them natively, preserving visual context that OCR-based approaches lose.

Why Native PDF Matters

Traditional PDF processing converts the document to plain text, which discards much of the information it carries:

| Approach | Charts | Tables | Images | Layout |
| --- | --- | --- | --- | --- |
| OCR-based | Lost | Partial | Lost | Lost |
| Text extraction | Lost | Lost | Lost | Lost |
| Native (NeuroLink) | Preserved | Preserved | Analyzed | Understood |

Native processing sends the PDF directly to vision-capable models. The AI sees exactly what humans see.

Basic PDF Analysis

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

const result = await ai.generate({
  input: {
    text: "What is the total revenue in this financial report?",
    pdfFiles: ["quarterly-report.pdf"]
  },
  provider: "vertex",
  maxTokens: 1000
});

console.log(result.content);
// "The Q3 2025 report shows total revenue of $42.3M..."

Structured Data Extraction with Schema

Extract structured JSON from unstructured PDFs using schema enforcement:

const invoice = await ai.generate({
  input: {
    text: "Extract invoice details in JSON format",
    pdfFiles: ["invoice.pdf"]
  },
  provider: "anthropic",
  schema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      invoiceNumber: { type: "string" },
      date: { type: "string" },
      lineItems: {
        type: "array",
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: "number" },
            unitPrice: { type: "number" },
            total: { type: "number" }
          }
        }
      },
      subtotal: { type: "number" },
      tax: { type: "number" },
      total: { type: "number" }
    }
  },
  output: { format: "json" }
});

console.log(JSON.parse(invoice.content));
// { vendor: "Acme Corp", invoiceNumber: "INV-2025-001", ... }

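Schema enforcement constrains the output to the declared structure, so you no longer have to parse inconsistent free-form responses. The schema above declares no required fields, though, so check that the properties you depend on are present before using the parsed object.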
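
Even with a schema in place, it can be worth validating the parsed JSON before it reaches the rest of your application. The sketch below is one way to do that with plain type guards. The `Invoice` and `LineItem` interfaces mirror the schema above, while `parseInvoice`, `isLineItem`, and `isRecord` are hypothetical helper names introduced here for illustration.

```typescript
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

interface Invoice {
  vendor: string;
  invoiceNumber: string;
  date: string;
  lineItems: LineItem[];
  subtotal: number;
  tax: number;
  total: number;
}

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function isLineItem(v: unknown): v is LineItem {
  return (
    isRecord(v) &&
    typeof v.description === "string" &&
    typeof v.quantity === "number" &&
    typeof v.unitPrice === "number" &&
    typeof v.total === "number"
  );
}

// Parses the model's JSON text and returns a typed invoice,
// or throws if any field is missing or has the wrong type.
function parseInvoice(json: string): Invoice {
  const v: unknown = JSON.parse(json);
  if (
    isRecord(v) &&
    typeof v.vendor === "string" &&
    typeof v.invoiceNumber === "string" &&
    typeof v.date === "string" &&
    Array.isArray(v.lineItems) &&
    v.lineItems.every(isLineItem) &&
    typeof v.subtotal === "number" &&
    typeof v.tax === "number" &&
    typeof v.total === "number"
  ) {
    return v as unknown as Invoice;
  }
  throw new Error("Response does not match the expected invoice shape");
}
```

With this in place, `parseInvoice(invoice.content)` would replace the bare `JSON.parse` call and give you a typed object or a clear error.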

Multi-PDF Comparison

Compare multiple documents in a single request:

const comparison = await ai.generate({
  input: {
    text: "Compare Q1 and Q2 reports. What changed in revenue and expenses?",
    pdfFiles: ["q1-report.pdf", "q2-report.pdf"]
  },
  provider: "vertex",
  maxTokens: 2000
});

CLI PDF Commands

Process PDFs directly from the command line:

# Basic PDF analysis
npx @juspay/neurolink generate "Summarize this contract" \
  --pdf contract.pdf \
  --provider vertex

# Multiple PDFs
npx @juspay/neurolink generate "Compare these invoices" \
  --pdf invoice1.pdf \
  --pdf invoice2.pdf \
  --provider anthropic

CSV Data Analysis

CSV files contain the data that drives decisions. NeuroLink transforms raw data into actionable insights.

Basic CSV Analysis

const insights = await ai.generate({
  input: {
    text: "What are the key trends in this sales data? Identify top performers.",
    csvFiles: ["sales-2024.csv"]
  }
});

console.log(insights.content);
// "Key trends from your sales data:
//  1. Q4 showed strongest growth at 23% MoM
//  2. Top performer: Sarah Chen ($2.3M total)..."

Advanced CSV Options

Control how CSV data is processed:

const analysis = await ai.generate({
  input: {
    text: "Identify the top 10 customers by total revenue",
    csvFiles: ["customers.csv"]
  },
  csvOptions: {
    maxRows: 1000,
    formatStyle: "markdown",
    includeHeaders: true
  }
});

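For large files, maxRows caps how many rows are sent to the model, which prevents token overflow. If the cap keeps only the leading rows, as the "top rows" CLI example below suggests, a sorted file will not yield a representative sample, so shuffle or pre-aggregate the data first when that matters.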
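
To give a feel for what options like `maxRows`, `includeHeaders`, and a markdown format style might do to the data before it reaches the model, here is a rough sketch. It is not the SDK's implementation: `csvToMarkdown` and `CsvFormatOptions` are hypothetical names, the parser is deliberately naive (it splits on commas and does not handle quoted fields), and it assumes the cap keeps the first rows.

```typescript
interface CsvFormatOptions {
  maxRows: number;         // cap on data rows (the header is not counted)
  includeHeaders: boolean; // emit the header row and its separator
}

// Naive CSV parsing: splits on commas and does not handle quoted fields.
function csvToMarkdown(csv: string, opts: CsvFormatOptions): string {
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return "";

  const toRow = (line: string): string =>
    "| " + line.split(",").map((cell) => cell.trim()).join(" | ") + " |";

  const header = lines[0] ?? "";
  const rows = lines.slice(1);
  const out: string[] = [];
  if (opts.includeHeaders) {
    const columns = header.split(",").length;
    out.push(toRow(header));
    out.push("|" + " --- |".repeat(columns));
  }
  // Keep only the first maxRows data rows to bound the prompt size.
  for (const row of rows.slice(0, opts.maxRows)) out.push(toRow(row));
  return out.join("\n");
}
```

Because this sketch keeps the leading rows, the order of the input file determines which rows the model sees.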

CLI CSV Commands

# Analyze CSV data
npx @juspay/neurolink generate "Find trends" --csv sales.csv

# Multiple CSVs
npx @juspay/neurolink generate "Compare datasets" --csv q1.csv --csv q2.csv

# With options
npx @juspay/neurolink generate "Summarize top rows" \
  --csv large-data.csv \
  --csv-max-rows 500

Cross-Format Analysis

One of NeuroLink's most powerful features is cross-referencing data across formats:

const verification = await ai.generate({
  input: {
    text: "Does the transaction data in the CSV match the totals in the PDF report?",
    files: [
      "transactions.csv",
      "monthly-report.pdf"
    ]
  },
  provider: "vertex"
});

NeuroLink's auto-detection handles mixed formats seamlessly. Pass any combination of supported files and the SDK routes each to the appropriate processing pipeline.

Production Patterns

Real-world document processing requires robust patterns for scale and reliability.

Document Processing Pipeline

import { NeuroLink } from "@juspay/neurolink";
import fs from "fs";
import path from "path";

interface ProcessingResult {
  file: string;
  summary: string;
  success: boolean;
  error?: string;
}

class DocumentPipeline {
  private ai: NeuroLink;

  constructor() {
    this.ai = new NeuroLink({
      conversationMemory: { enabled: true }
    });
  }

  async processDirectory(dirPath: string): Promise<ProcessingResult[]> {
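    // Skip subdirectories; only regular files are sent to the model
    const files = fs
      .readdirSync(dirPath, { withFileTypes: true })
      .filter((entry) => entry.isFile())
      .map((entry) => entry.name);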
    const results: ProcessingResult[] = [];

    for (const file of files) {
      const filePath = path.join(dirPath, file);
      const ext = path.extname(file).toLowerCase();
      const provider = this.selectProvider(ext);

      try {
        const result = await this.ai.generate({
          input: {
            text: "Extract key information and create a summary",
            files: [filePath]
          },
          provider
        });

        results.push({ file, summary: result.content, success: true });
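      } catch (error: unknown) {
        // Record the failure and keep going so one bad file does not stop the batch
        const message = error instanceof Error ? error.message : String(error);
        results.push({ file, summary: "", success: false, error: message });
      }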
    }

    return results;
  }

  private selectProvider(ext: string): string {
    if (ext === ".pdf") return "vertex";
    return "openai";
  }
}
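
The pipeline above handles one file at a time. For larger directories you may want a few requests in flight at once without flooding the provider. The helper below is one possible sketch of that idea. `mapWithConcurrency` is a hypothetical name, not part of NeuroLink, and it assumes the worker catches its own errors, as the `try`/`catch` in `processDirectory` does.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// preserving input order in the returned array.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so lanes never collide
      results[index] = await worker(items[index] as T);
    }
  }

  // Start at least one lane, and never more lanes than there are items.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

A limit of 2 to 4 is a cautious starting point; the right value depends on your provider's rate limits.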

Streaming for Long Documents

When processing lengthy documents, streaming provides real-time feedback:

async function analyzeWithStreaming(filePath: string) {
  const ai = new NeuroLink();

  const result = await ai.stream({
    input: {
      text: "Provide a comprehensive analysis of this document",
      files: [filePath]
    },
    provider: "vertex",
    maxTokens: 4000
  });

  for await (const chunk of result.stream) {
    if (chunk.type === "text") {
      process.stdout.write(chunk.content);
    }
  }
}

Error Handling

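// Note: processLargeDocument, processWithFallback, and retryWithBackoff are
// application-level helpers that are not defined in this snippet.
async function processDocumentSafely(filePath: string, query: string) {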
  const ai = new NeuroLink();

  try {
    const result = await ai.generate({
      input: { text: query, files: [filePath] },
      provider: "vertex"
    });
    return { success: true, content: result.content };
  } catch (error: any) {
    switch (error.code) {
      case "FILE_TOO_LARGE":
        console.log("Document exceeds page limit, splitting...");
        return await processLargeDocument(filePath, query);
      case "PROVIDER_NOT_CAPABLE":
        console.log("Falling back to vision-capable provider");
        return await processWithFallback(filePath, query);
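      case "RATE_LIMIT_EXCEEDED":
        console.log("Rate limited, retrying with backoff");
        // Retry the raw call rather than this wrapper, so the helper
        // sees each failure and can cap the number of attempts
        return await retryWithBackoff(async () => {
          const retried = await ai.generate({
            input: { text: query, files: [filePath] },
            provider: "vertex"
          });
          return { success: true, content: retried.content };
        });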
      default:
        throw error;
    }
  }
}
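
The error handler above calls a `retryWithBackoff` helper without showing it. One plausible shape for such a helper is sketched below, using exponential backoff with jitter. The option names and default values are assumptions chosen for illustration, not values taken from NeuroLink.

```typescript
interface BackoffOptions {
  maxAttempts?: number; // total tries, including the first
  baseDelayMs?: number; // upper bound of the wait before the first retry
  maxDelayMs?: number;  // upper bound on any single wait
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn` until it resolves or attempts run out,
// doubling the maximum wait after each failure.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  { maxAttempts = 4, baseDelayMs = 500, maxDelayMs = 8000 }: BackoffOptions = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts - 1) break; // no point sleeping after the last try
      const delay = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      // Random jitter spreads out retries from concurrent callers.
      await sleep(Math.random() * delay);
    }
  }
  throw lastError;
}
```

This sketch retries on any error; in practice you would likely rethrow immediately unless the error code is `RATE_LIMIT_EXCEEDED`. The helper can only count attempts if `fn` rejects on failure, so it works best wrapped around the raw `generate` call.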

Provider Comparison

The providers below all support native PDF and CSV input with the same 100-page limit; they differ mainly in what each is best suited for:

| Provider | Native PDF | Max Pages | CSV | Best For |
| --- | --- | --- | --- | --- |
| Vertex AI | Yes | 100 | Yes | Cost-effective PDF processing |
| Anthropic | Yes | 100 | Yes | Best reasoning quality |
| Google AI | Yes | 100 | Yes | Large file support |
| OpenAI | Yes | 100 | Yes | General purpose |
| Bedrock | Yes | 100 | Yes | AWS integration |

Recommendation: Use Vertex AI for cost-effective PDF-heavy workloads. Fall back to Anthropic for complex documents requiring strong reasoning.
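
The "try Vertex AI first, fall back to Anthropic" recommendation can be expressed as a small ordered-fallback helper. The sketch below is one way to write it. `withProviderFallback` is a hypothetical name rather than a NeuroLink API, and the caller supplies the actual request through the `run` callback.

```typescript
// Tries each provider in order and returns the first successful result,
// along with the name of the provider that produced it.
async function withProviderFallback<T>(
  providers: readonly string[],
  run: (provider: string) => Promise<T>
): Promise<{ provider: string; result: T }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      return { provider, result: await run(provider) };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${provider}: ${message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}

// Possible usage, assuming `ai`, `query`, and `filePath` from the earlier examples:
// const { provider, result } = await withProviderFallback(
//   ["vertex", "anthropic"],
//   (p) => ai.generate({ input: { text: query, files: [filePath] }, provider: p })
// );
```

Returning the provider name alongside the result makes it easy to log how often the fallback is actually used.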

Get Started

Install NeuroLink and start processing documents:

npm install @juspay/neurolink

Full documentation: docs.neurolink.ink

