NeuroLink AI

Posted on Feb 18 • Originally published at blog.neurolink.ink

Processing PDFs and CSVs with AI: A Complete TypeScript Guide

#ai #typescript #tutorial #pdf

Your business runs on documents. PDFs, CSVs, images, text files. Now AI can understand them all through a single TypeScript API.

The challenge? Each format requires different parsing. PDFs need visual analysis. CSVs need tabular understanding. Different APIs. Different libraries. Different headaches.

NeuroLink solves this with a unified multimodal API. One interface handles every format. Auto-detection identifies file types. Smart routing selects the right provider. You write one code path for all documents.

TL;DR

Process PDFs natively (no OCR needed)
Analyze CSV data with AI insights
One API call for any document type
Full TypeScript type safety
Production-ready pipeline patterns

Why Unified Document Processing Matters
PDF Processing
CSV Data Analysis
Cross-Format Analysis
Production Patterns
Provider Comparison

Why Unified Document Processing Matters

Traditional document AI requires juggling multiple tools:

Document Type	Traditional Approach	Problems
PDF	pdf-parse + OCR + separate API	Loses visual context, slow
CSV	csv-parser + custom formatting	No semantic understanding
Images	sharp + vision API	Separate pipeline
Text	fs.readFile + custom parsing	No intelligence

NeuroLink replaces this complexity with one API call:

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Process ANY document type
const result = await ai.generate({
  input: {
    text: "Analyze this document and extract key insights",
    files: ["report.pdf", "data.csv"]
  }
});

The SDK handles everything:

Format Detection: Magic bytes identify file type accurately
Provider Selection: Routes PDFs to vision-capable providers
Text Optimization: Formats tabular data for LLM consumption
Error Handling: Graceful fallbacks for edge cases
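
To make the format-detection step concrete, here is a minimal sketch of magic-byte sniffing. It is not NeuroLink's actual implementation: `detectFormat` and `DetectedFormat` are hypothetical names, and the CSV check is a crude heuristic, since CSV has no signature of its own.

```typescript
// Hypothetical helper, not part of the NeuroLink API.
type DetectedFormat = "pdf" | "png" | "jpeg" | "csv" | "text";

// Well-known file signatures (magic bytes) found at offset 0.
const SIGNATURES: Array<{ format: DetectedFormat; bytes: number[] }> = [
  { format: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  { format: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { format: "jpeg", bytes: [0xff, 0xd8, 0xff] },
];

function detectFormat(data: Uint8Array): DetectedFormat {
  for (const { format, bytes } of SIGNATURES) {
    if (bytes.every((b, i) => data[i] === b)) return format;
  }
  // No binary signature matched. Assumption: a comma in the first
  // line means CSV; anything else is treated as plain text.
  const head = new TextDecoder().decode(data.slice(0, 1024));
  const firstLine = head.split(/\r?\n/, 1)[0] ?? "";
  return firstLine.includes(",") ? "csv" : "text";
}
```

Checking content rather than the file extension means a PDF saved as `report.txt` is still treated as a PDF.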

PDF Processing

PDFs are the workhorse of business documents. NeuroLink processes them natively, preserving visual context that OCR-based approaches lose.

Why Native PDF Matters

Traditional PDF processing converts to text, destroying valuable information:

Approach	Charts	Tables	Images	Layout
OCR-based	Lost	Partial	Lost	Lost
Text extraction	Lost	Lost	Lost	Lost
Native (NeuroLink)	Preserved	Preserved	Analyzed	Understood

Native processing sends the PDF directly to vision-capable models, so the model works from the rendered pages, including charts and layout, rather than from extracted text alone.

Basic PDF Analysis

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

const result = await ai.generate({
  input: {
    text: "What is the total revenue in this financial report?",
    pdfFiles: ["quarterly-report.pdf"]
  },
  provider: "vertex",
  maxTokens: 1000
});

console.log(result.content);
// "The Q3 2025 report shows total revenue of $42.3M..."

Structured Data Extraction with Schema

Extract structured JSON from unstructured PDFs using schema enforcement:

const invoice = await ai.generate({
  input: {
    text: "Extract invoice details in JSON format",
    pdfFiles: ["invoice.pdf"]
  },
  provider: "anthropic",
  schema: {
    type: "object",
    properties: {
      vendor: { type: "string" },
      invoiceNumber: { type: "string" },
      date: { type: "string" },
      lineItems: {
        type: "array",
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: "number" },
            unitPrice: { type: "number" },
            total: { type: "number" }
          }
        }
      },
      subtotal: { type: "number" },
      tax: { type: "number" },
      total: { type: "number" }
    }
  },
  output: { format: "json" }
});

console.log(JSON.parse(invoice.content));
// { vendor: "Acme Corp", invoiceNumber: "INV-2025-001", ... }

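Schema enforcement constrains the output structure, which cuts down on parsing inconsistent responses. Treat it as a strong constraint rather than a hard guarantee: JSON.parse still throws if the output is malformed.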
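
A defensive way to consume that output is to validate the parsed object at runtime before trusting it. The sketch below mirrors the invoice schema from the example. `parseInvoice`, `isInvoice`, and the interfaces are hypothetical helpers, not part of NeuroLink. It also assumes every field is required, which is stricter than the schema as written (it declares no `required` list).

```typescript
// Hypothetical types mirroring the invoice schema shown above.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
}

interface Invoice {
  vendor: string;
  invoiceNumber: string;
  date: string;
  lineItems: LineItem[];
  subtotal: number;
  tax: number;
  total: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isLineItem(value: unknown): value is LineItem {
  return (
    isRecord(value) &&
    typeof value["description"] === "string" &&
    typeof value["quantity"] === "number" &&
    typeof value["unitPrice"] === "number" &&
    typeof value["total"] === "number"
  );
}

// Assumption: all fields must be present, although the schema does not say so.
function isInvoice(value: unknown): value is Invoice {
  if (!isRecord(value)) return false;
  const items = value["lineItems"];
  return (
    typeof value["vendor"] === "string" &&
    typeof value["invoiceNumber"] === "string" &&
    typeof value["date"] === "string" &&
    Array.isArray(items) &&
    items.every(isLineItem) &&
    typeof value["subtotal"] === "number" &&
    typeof value["tax"] === "number" &&
    typeof value["total"] === "number"
  );
}

// Returns null instead of throwing on malformed JSON or a wrong shape.
function parseInvoice(raw: string): Invoice | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isInvoice(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

In the example above, `parseInvoice(invoice.content)` would replace the bare `JSON.parse` call, and a `null` result signals that the document should be retried or sent for manual review.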

Multi-PDF Comparison

Compare multiple documents in a single request:

const comparison = await ai.generate({
  input: {
    text: "Compare Q1 and Q2 reports. What changed in revenue and expenses?",
    pdfFiles: ["q1-report.pdf", "q2-report.pdf"]
  },
  provider: "vertex",
  maxTokens: 2000
});

CLI PDF Commands

Process PDFs directly from the command line:

# Basic PDF analysis
npx @juspay/neurolink generate "Summarize this contract" \
  --pdf contract.pdf \
  --provider vertex

# Multiple PDFs
npx @juspay/neurolink generate "Compare these invoices" \
  --pdf invoice1.pdf \
  --pdf invoice2.pdf \
  --provider anthropic

CSV Data Analysis

CSV files contain the data that drives decisions. NeuroLink transforms raw data into actionable insights.

Basic CSV Analysis

const insights = await ai.generate({
  input: {
    text: "What are the key trends in this sales data? Identify top performers.",
    csvFiles: ["sales-2024.csv"]
  }
});

console.log(insights.content);
// "Key trends from your sales data:
//  1. Q4 showed strongest growth at 23% QoQ
//  2. Top performer: Sarah Chen ($2.3M total)..."

Advanced CSV Options

Control how CSV data is processed:

const analysis = await ai.generate({
  input: {
    text: "Identify the top 10 customers by total revenue",
    csvFiles: ["customers.csv"]
  },
  csvOptions: {
    maxRows: 1000,
    formatStyle: "markdown",
    includeHeaders: true
  }
});

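For large files, maxRows caps the number of rows sent to the model, which keeps the prompt within the token limit. If the cap truncates the file, the model only sees the rows that were included, so sort or sample the data first when the leading rows are not representative.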
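
To show what a row cap combined with markdown formatting amounts to, here is a rough sketch. It is an illustration rather than the SDK's internals: `csvToMarkdown` is a hypothetical function, and it assumes simple CSV with no quoted fields or embedded newlines.

```typescript
// Hypothetical helper: caps the data rows and renders a markdown table.
// Assumes plain comma-separated values without quoting.
function csvToMarkdown(csv: string, maxRows: number): string {
  const table = csv
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => line.split(",").map((cell) => cell.trim()));

  const header = table[0];
  if (!header) return "";

  const rows = table.slice(1);
  const kept = rows.slice(0, maxRows);
  const toRow = (cells: string[]): string => `| ${cells.join(" | ")} |`;

  const out = [toRow(header), toRow(header.map(() => "---")), ...kept.map(toRow)];

  // Tell the model that data was dropped so it does not treat the sample as complete.
  if (rows.length > kept.length) {
    out.push(`(rows omitted: ${rows.length - kept.length})`);
  }
  return out.join("\n");
}
```

Noting the omitted row count in the prompt is a small design choice that helps the model qualify its answer, for example when asked for "top 10 customers" from a truncated file.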

CLI CSV Commands

# Analyze CSV data
npx @juspay/neurolink generate "Find trends" --csv sales.csv

# Multiple CSVs
npx @juspay/neurolink generate "Compare datasets" --csv q1.csv --csv q2.csv

# With options
npx @juspay/neurolink generate "Summarize top rows" \
  --csv large-data.csv \
  --csv-max-rows 500

Cross-Format Analysis

One of NeuroLink's most powerful features is cross-referencing data across formats:

const verification = await ai.generate({
  input: {
    text: "Does the transaction data in the CSV match the totals in the PDF report?",
    files: [
      "transactions.csv",
      "monthly-report.pdf"
    ]
  },
  provider: "vertex"
});

NeuroLink's auto-detection handles mixed formats seamlessly. Pass any combination of supported files and the SDK routes each to the appropriate processing pipeline.

Production Patterns

Real-world document processing requires robust patterns for scale and reliability.

Document Processing Pipeline

import { NeuroLink } from "@juspay/neurolink";
import fs from "fs";
import path from "path";

interface ProcessingResult {
  file: string;
  summary: string;
  success: boolean;
  error?: string;
}

class DocumentPipeline {
  private ai: NeuroLink;

  constructor() {
    this.ai = new NeuroLink({
      conversationMemory: { enabled: true }
    });
  }

  async processDirectory(dirPath: string): Promise<ProcessingResult[]> {
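    // Skip subdirectories; only regular files are sent for processing
    const files = fs
      .readdirSync(dirPath, { withFileTypes: true })
      .filter((entry) => entry.isFile())
      .map((entry) => entry.name);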
    const results: ProcessingResult[] = [];

    for (const file of files) {
      const filePath = path.join(dirPath, file);
      const ext = path.extname(file).toLowerCase();
      const provider = this.selectProvider(ext);

      try {
        const result = await this.ai.generate({
          input: {
            text: "Extract key information and create a summary",
            files: [filePath]
          },
          provider
        });

        results.push({ file, summary: result.content, success: true });
      } catch (error: any) {
        results.push({ file, summary: "", success: false, error: error.message });
      }
    }

    return results;
  }

  private selectProvider(ext: string): string {
    if (ext === ".pdf") return "vertex";
    return "openai";
  }
}
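
The pipeline above processes files one at a time. For larger directories, one option is to run a few requests in parallel with a fixed cap. The sketch below shows a generic bounded-concurrency helper. `mapWithConcurrency` is a hypothetical name, and the sketch assumes the worker handles its own errors, as the try/catch in `processDirectory` does.

```typescript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight, and returns results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

A low limit such as 2 or 3 is a reasonable starting point, since higher values make provider rate limits more likely.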

Streaming for Long Documents

When processing lengthy documents, streaming provides real-time feedback:

async function analyzeWithStreaming(filePath: string) {
  const ai = new NeuroLink();

  const result = await ai.stream({
    input: {
      text: "Provide a comprehensive analysis of this document",
      files: [filePath]
    },
    provider: "vertex",
    maxTokens: 4000
  });

  for await (const chunk of result.stream) {
    if (chunk.type === "text") {
      process.stdout.write(chunk.content);
    }
  }
}

Error Handling

// processLargeDocument, processWithFallback, and retryWithBackoff are
// application-defined helpers that are not shown here.
async function processDocumentSafely(filePath: string, query: string) {
  const ai = new NeuroLink();
  const request = () =>
    ai.generate({
      input: { text: query, files: [filePath] },
      provider: "vertex"
    });

  try {
    const result = await request();
    return { success: true, content: result.content };
  } catch (error: any) {
    switch (error.code) {
      case "FILE_TOO_LARGE":
        console.log("Document exceeds page limit, splitting...");
        return await processLargeDocument(filePath, query);
      case "PROVIDER_NOT_CAPABLE":
        console.log("Falling back to vision-capable provider");
        return await processWithFallback(filePath, query);
      case "RATE_LIMIT_EXCEEDED": {
        console.log("Rate limited, retrying with backoff");
        // Retry the raw request, not this function, so retries cannot nest without bound
        const retried = await retryWithBackoff(request);
        return { success: true, content: retried.content };
      }
      default:
        throw error;
    }
  }
}
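
The example above calls `retryWithBackoff` without defining it. One possible shape for that helper is sketched below, using exponential backoff with a cap and a bounded number of attempts. The option names and defaults are assumptions chosen for illustration, not values from the NeuroLink documentation.

```typescript
// Hypothetical options; the defaults are illustrative only.
interface BackoffOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

// Delay before the retry that follows failed attempt `attempt` (1-based):
// base * 2^(attempt - 1), capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  options: BackoffOptions = {}
): Promise<T> {
  const { maxAttempts = 4, baseDelayMs = 500, maxDelayMs = 8000 } = options;
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break; // out of attempts, give up
      const delay = backoffDelay(attempt, baseDelayMs, maxDelayMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Bounding the attempts matters here: without a limit, a persistent rate limit would keep the call retrying forever. Adding random jitter to the delay is a common refinement when many workers retry at once.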

Provider Comparison

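Not all providers suit the same workloads. In the table below, the listed providers share native PDF support, a 100-page limit, and CSV support; they differ in what each is best suited for: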

Provider	Native PDF	Max Pages	CSV	Best For
Vertex AI	Yes	100	Yes	Cost-effective PDF processing
Anthropic	Yes	100	Yes	Best reasoning quality
Google AI	Yes	100	Yes	Large file support
OpenAI	Yes	100	Yes	General purpose
Bedrock	Yes	100	Yes	AWS integration

Recommendation: Use Vertex AI for cost-effective PDF-heavy workloads. Fall back to Anthropic for complex documents requiring strong reasoning.
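
That recommendation can be expressed as an ordered fallback chain. The sketch below is one way to do it, assuming the provider call is passed in as a function. `withProviderFallback` is a hypothetical helper, not something the SDK exports.

```typescript
// Hypothetical wrapper: tries each provider in order and returns the
// first successful result, collecting error messages along the way.
async function withProviderFallback<T>(
  providers: string[],
  call: (provider: string) => Promise<T>
): Promise<T> {
  const errors: string[] = [];

  for (const provider of providers) {
    try {
      return await call(provider);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push(`${provider}: ${message}`);
    }
  }

  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

With the API used in this guide, the call would look like `withProviderFallback(["vertex", "anthropic"], (provider) => ai.generate({ input, provider }))`, so Anthropic is only tried when the Vertex request fails.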

Get Started

Install NeuroLink and start processing documents:

npm install @juspay/neurolink

Full documentation: docs.neurolink.ink
