Prosper Otemuyiwa for Valyu AI

Posted on Mar 2

Deep Research API for AI Agents: The Complete Guide (2026)

#deepresearch #ai #python #typescript

I spent three days testing every deep research API I could find. OpenAI's. Perplexity's. Exa's. Parallel's. Gemini's.

They all have the same blind spot: they only search the web.

If your AI agent needs to cross-reference a drug trial with a bioRxiv preprint, or analyze a company's 10-K risk factors alongside FRED economic data, or map a patent landscape against recent academic research, web-only search might not get you there. The data you need most likely isn't indexable by Google or crawlable by search bots.

This guide covers what a deep research API actually is, how the major options compare, and how to build AI agents that can reach proprietary data sources such as SEC filings, PubMed, clinical trials, and patents in a single API call.

Note: All examples are shown in both Python and TypeScript. Non-developers, stay tuned: there's something for you too! 😉

What Is a Deep Research API?

A deep research API is a programmatic interface that performs multi-step, autonomous research on behalf of an AI agent. Unlike a standard search API that returns a list of results, a deep research API:

Plans a research strategy from a query
Executes multiple searches across sources
Reads, synthesizes, and cross-references retrieved content
Returns a structured report with citations
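To make those four steps concrete, here is a toy sketch of the plan, search, read, cite loop. It is not how Valyu, OpenAI, or anyone else actually implements deep research: the in-memory `CORPUS`, the keyword-splitting `plan()`, and the substring-matching `search()` are hypothetical stand-ins for an LLM planner and real search backends.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "corpus" (source id -> text) standing in for real search backends.
CORPUS = {
    "sec:nvda-10k": "Nvidia 10-K lists supply chain concentration as a risk factor.",
    "sec:amd-10k": "AMD 10-K lists reliance on third-party foundries as a risk factor.",
    "news:chips": "Analysts discuss chip export controls.",
}

@dataclass
class Report:
    query: str
    findings: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)

def plan(query: str) -> list[str]:
    # Step 1: break the query into sub-queries (naively, on " and ").
    return [part.strip() for part in query.split(" and ")]

def search(sub_query: str) -> list[str]:
    # Step 2: return ids of documents that contain every word of the sub-query.
    words = sub_query.lower().split()
    return [doc_id for doc_id, text in CORPUS.items()
            if all(w in text.lower() for w in words)]

def deep_research(query: str) -> Report:
    report = Report(query)
    for sub_query in plan(query):
        for doc_id in search(sub_query):
            # Step 3: "read" the document and keep the finding with its source id.
            report.findings.append(f"{CORPUS[doc_id]} [{doc_id}]")
            if doc_id not in report.citations:
                report.citations.append(doc_id)
    # Step 4: structured output with de-duplicated citations.
    return report
```

Calling `deep_research("nvidia risk and amd risk")` plans two sub-queries and returns one finding per filing, citing `sec:nvda-10k` and `sec:amd-10k`. A real system replaces each step with model calls and many sources, but the shape of the loop is the same.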

The term entered mainstream usage after OpenAI launched its "deep research" feature in February 2025. Since then, every major AI company has shipped a version. For developers building AI agents, the question isn't whether to use one; it's which one reaches the data you actually need.

Key factors when evaluating a deep research API:

Data source coverage (web-only vs. proprietary databases)
Output formats (markdown, PDF, structured JSON)
Deliverables (generated files such as CSV, Excel, PowerPoint, Word, or PDF alongside the report)
Async handling (how long tasks run, webhook support)
Pricing model (per task vs. per retrieval)
Benchmark accuracy on domain-specific queries

The Deep Research API Landscape in 2026

Here's what's actually ranking when you search "deep research API" and what each tool can and can't reach:

API	Data Sources	Output Formats	Best For	Pricing
OpenAI Deep Research	Web (Bing)	Text	General research, broad questions	o3: ~$10-30/call ($10/M in, $40/M out); o4-mini: ~$1-3/call ($2/M in, $8/M out)
Perplexity Deep Research	Web	Text	Quick cited answers	Free tier + Pro $20/mo
Parallel.ai	Web	Markdown, JSON	Developer agentic workflows	Per task, usage-based
Gemini Deep Research	Web + Google Search	Text, structured	Google ecosystem integration	Gemini Advanced $20/mo
Valyu DeepResearch	Web + 36+ proprietary (SEC, PubMed, patents, clinical trials, financial data)	Markdown, PDF, structured JSON	AI agents needing authoritative/paywalled data	$0.10-$15.00 per task

On pricing transparency

OpenAI's deep research costs are token-based and can spike quickly. One independent analysis ran 10 test queries and spent $100 on o3-deep-research and $9.18 on o4-mini-deep-research. Both are usage-variable: a research task that cites 50 sources will cost substantially more than one that cites 5.

Valyu's pricing is flat per task regardless of how many internal searches and retrievals the agent performs. For production workloads where cost predictability matters, that's a meaningful difference.
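The difference is easy to see with a little arithmetic. This sketch uses the o3 deep research rates from the table above ($10/M input tokens, $40/M output tokens) and Valyu's standard-mode price of $0.50 per task from the modes table later in this guide. The token counts are made up for illustration, and the calculation ignores any extra per-search or tool-call fees a provider may add.

```python
def token_cost(input_tokens: int, output_tokens: int,
               usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost of one token-billed call, given per-million-token rates in USD."""
    return (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000

# o3 deep research rates quoted in the comparison table.
O3_IN, O3_OUT = 10.0, 40.0

# Hypothetical token counts: a light task vs. one that reads far more sources.
light = token_cost(200_000, 20_000, O3_IN, O3_OUT)
heavy = token_cost(2_000_000, 60_000, O3_IN, O3_OUT)

# Flat per-task pricing stays the same however much the agent reads.
FLAT_STANDARD = 0.50

print(f"light: ${light:.2f}, heavy: ${heavy:.2f}, flat: ${FLAT_STANDARD:.2f}")
# light: $2.80, heavy: $22.40, flat: $0.50
```

Ten times the input tokens produced eight times the bill here, while the flat price did not move.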

The critical column: Data Sources

Four of the five options above are web-only. That means:

No SEC 10-K/10-Q filings (EDGAR isn't fully indexable by web crawlers)
No paywalled academic papers (Elsevier, Springer, Wiley)
No real-time financial data (stock prices, earnings, balance sheets)
No clinical trial data (ClinicalTrials.gov full text)
No patent claims (USPTO full text)

If your use case lives entirely in open web content, the OpenAI or Perplexity options are fine. If it doesn't, you need an API with proprietary source access.

When Web Search Is Not Enough

Three use cases where web-only deep research fails:

1. Financial research agents

You ask: "What are the key risk factors disclosed by Nvidia in their latest 10-K, and how do they compare to AMD's?"

A web-only API returns news articles about these filings, not the filings themselves. The actual MD&A sections, risk disclosures, and financial statements live in EDGAR. Some web-only APIs do return details about the filings, but only at a surface level.

2. Biomedical research agents

You ask: "What bioactive compounds in ChEMBL target the KRAS G12C mutation, and how do they relate to current clinical trials?"

ChEMBL has 2.5 million bioactive molecule records. ClinicalTrials.gov is partially indexed, but its structured data (phase, endpoints, eligibility criteria) isn't extractable via web search.

3. Patent landscape analysis

You ask: "Which companies hold patents in transformer-based neural architecture search filed after 2022?"

Few search APIs return USPTO full-text patent data well. Some return partial results, with a caveat that the data may be incomplete or out of date.

Deep Research API for Researchers & Non-Developers

Valyu offers a "Deep Research" UI mode for non-developers, researchers, and anyone else who wants the full power of the Deep Research API just by describing what they need in a prompt.

DeepResearch dashboard for the non-coders

You can see the Deliverables feature there as well. Deliverables allow you to extract structured data or create formatted documents (CSV, Excel, PowerPoint, Word, PDF) from the research alongside the report.

Building with Valyu's Deep Research API (for Developers)

Valyu's DeepResearch is an async API. You submit a task, it runs in the background, and you either poll for completion or receive a webhook. This is the right architecture for research that takes anywhere from about 30 seconds to 30 minutes, depending on complexity.
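The SDKs ship a `wait()` helper for the polling side (shown under Waiting for Results), but the pattern itself is simple. Here is a generic sketch of polling with exponential backoff. `get_status` is a hypothetical callable you would wire to whatever status call your client exposes, and the terminal status names other than `"completed"` are assumptions, not confirmed API values.

```python
import time
from typing import Callable

# ASSUMPTION: only "completed" appears in this guide; the other names are guesses.
TERMINAL = {"completed", "failed", "cancelled"}

def poll_until_done(get_status: Callable[[], str],
                    initial_interval: float = 5.0,
                    max_interval: float = 60.0,
                    max_wait: float = 1800.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() with exponential backoff until a terminal status or timeout."""
    waited, interval = 0.0, initial_interval
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if waited >= max_wait:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Double the gap between polls (capped) so long tasks cost fewer API calls.
        interval = min(interval * 2, max_interval)
```

Injecting `sleep` keeps the loop testable: pass a fake that records intervals instead of blocking. Backoff suits this API because a `max` task can run for many minutes, and polling every 5 seconds for that long mostly wastes requests.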

Installation

Python

pip install valyu
export VALYU_API_KEY=your_key_here

TypeScript

pnpm add @valyu/valyu-js
export VALYU_API_KEY=your_key_here

Quick Start

Python

from valyu import Valyu

valyu = Valyu()

# Create a research task
task = valyu.deepresearch.create(
    query="What are the key risk factors disclosed by Nvidia in their 2024 10-K?",
    mode="standard",
    search={
        "search_type": "proprietary",
        "included_sources": ["finance"]  # SEC filings, earnings, market data
    }
)

print(f"Task ID: {task.deepresearch_id}")
print(f"Status: {task.status}")  # 'running' or 'queued'

TypeScript

import { Valyu } from "@valyu/valyu-js";

const valyu = new Valyu(process.env.VALYU_API_KEY!);

// Create a research task
const task = await valyu.deepresearch.create({
  query: "What are the key risk factors disclosed by Nvidia in their 2024 10-K?",
  mode: "standard",
  search: {
    search_type: "proprietary",
    included_sources: ["finance"]  // SEC filings, earnings, market data
  }
});

console.log(`Task ID: ${task.deepresearch_id}`);
console.log(`Status: ${task.status}`);  // 'running' or 'queued'

Waiting for Results

Python

# Wait for completion with progress tracking
def on_progress(status):
    if status.progress:
        print(f"Step {status.progress.current_step}/{status.progress.total_steps}")

result = valyu.deepresearch.wait(
    task.deepresearch_id,
    poll_interval=5,
    max_wait_time=1800,
    on_progress=on_progress
)

if result.status == "completed":
    print(result.output)      # Full markdown report
    print(f"Cost: ${result.cost}")

    for source in result.sources:
        print(f"- {source.title}: {source.url}")

TypeScript

const result = await valyu.deepresearch.wait(task.deepresearch_id, {
  pollInterval: 5000,
  maxWaitTime: 1800000,
  onProgress: (status) => {
    if (status.progress) {
      console.log(`Step ${status.progress.currentStep}/${status.progress.totalSteps}`);
    }
  }
});

if (result.status === "completed") {
  console.log(result.output);      // Full markdown report
  console.log(`Cost: $${result.cost}`);

  result.sources.forEach(source => {
    console.log(`- ${source.title}: ${source.url}`);
  });
}

Research Modes

Valyu has four modes, optimized for different depth/cost tradeoffs:

Mode	Price	Max Steps	Best For
`fast`	$0.10	10	Quick lookups, batch processing
`standard`	$0.50	15	Balanced research
`heavy`	$2.50	15	Complex topics, fact verification
`max`	$15.00	25	Exhaustive multi-source analysis

For most agentic workflows, standard mode hits the right tradeoff. Use fast for high-volume batch tasks. Use heavy or max when you need the agent to cross-verify claims across sources.
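Because pricing is flat per task, budgeting a batch is plain arithmetic. This sketch uses the per-task prices from the table above. `batch_cost` and `best_mode_within` are hypothetical helpers, not SDK functions, and "pick the most expensive mode that fits the budget" is just one possible policy.

```python
# Per-task prices (USD) from the modes table.
MODE_PRICES = {"fast": 0.10, "standard": 0.50, "heavy": 2.50, "max": 15.00}

def batch_cost(task_counts: dict[str, int]) -> float:
    """Total flat cost for a batch, e.g. {"fast": 1000, "heavy": 20}."""
    unknown = set(task_counts) - set(MODE_PRICES)
    if unknown:
        raise ValueError(f"unknown modes: {sorted(unknown)}")
    # Sum in integer cents to avoid floating-point drift, then convert to dollars.
    cents = sum(round(MODE_PRICES[m] * 100) * n for m, n in task_counts.items())
    return cents / 100

def best_mode_within(budget_per_task: float) -> str:
    """Most thorough (most expensive) mode whose per-task price fits the budget."""
    affordable = [m for m, p in MODE_PRICES.items() if p <= budget_per_task]
    if not affordable:
        raise ValueError("budget is below the cheapest mode")
    return max(affordable, key=MODE_PRICES.__getitem__)

print(batch_cost({"fast": 1000, "standard": 40, "heavy": 4}))  # 130.0
print(best_mode_within(1.00))  # standard
```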

Proprietary Source Selection

The search parameter controls which data sources the agent searches.

Python

# Academic + biomedical research
task = valyu.deepresearch.create(
    query="Recent advances in CRISPR base editing for sickle cell disease",
    mode="heavy",
    search={
        "search_type": "proprietary",
        "included_sources": ["academic"]  # PubMed, arXiv, bioRxiv, medRxiv, clinical trials
    }
)

# Financial + economic analysis
task = valyu.deepresearch.create(
    query="How do current FRED interest rate indicators compare to 2008 pre-crisis levels?",
    mode="standard",
    search={
        "search_type": "proprietary",
        "included_sources": ["finance"]  # SEC filings, FRED, BLS, stocks, earnings
    }
)

# Patent landscape
task = valyu.deepresearch.create(
    query="Which companies have filed transformer architecture patents since 2022?",
    mode="heavy",
    search={
        "search_type": "proprietary",
        "included_sources": ["patents"]  # USPTO patent database
    }
)

# Cross-domain: web + proprietary combined
task = valyu.deepresearch.create(
    query="Analyze competitor drug pipeline for NASH treatment",
    mode="max",
    search={
        "search_type": "all",
        "included_sources": ["academic", "finance"]
    }
)

TypeScript

// Academic + biomedical research
const academicTask = await valyu.deepresearch.create({
  query: "Recent advances in CRISPR base editing for sickle cell disease",
  mode: "heavy",
  search: {
    search_type: "proprietary",
    included_sources: ["academic"]  // PubMed, arXiv, bioRxiv, medRxiv, clinical trials
  }
});

// Financial + economic analysis
const financeTask = await valyu.deepresearch.create({
  query: "How do current FRED interest rate indicators compare to 2008 pre-crisis levels?",
  mode: "standard",
  search: {
    search_type: "proprietary",
    included_sources: ["finance"]  // SEC filings, FRED, BLS, stocks, earnings
  }
});

// Patent landscape
const patentTask = await valyu.deepresearch.create({
  query: "Which companies have filed transformer architecture patents since 2022?",
  mode: "heavy",
  search: {
    search_type: "proprietary",
    included_sources: ["patents"]  // USPTO patent database
  }
});

// Cross-domain: web + proprietary combined
const crossTask = await valyu.deepresearch.create({
  query: "Analyze competitor drug pipeline for NASH treatment",
  mode: "max",
  search: {
    search_type: "all",
    included_sources: ["academic", "finance"]
  }
});

Available proprietary source categories: academic, finance, patent, legal, transportation, politics.

Structured JSON Output

Instead of markdown, you can define a JSON Schema for structured output. This is useful when your agent needs to pipe results into a database or downstream process.

Python

competitor_schema = {
    "type": "object",
    "properties": {
        "companies": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "ticker": {"type": "string"},
                    "key_risk_factors": {
                        "type": "array",
                        "items": {"type": "string"}
                    },
                    "revenue_guidance": {"type": "string"}
                },
                "required": ["name", "key_risk_factors"]
            }
        },
        "summary": {"type": "string"}
    },
    "required": ["companies", "summary"]
}

task = valyu.deepresearch.create(
    query="Extract risk factors and revenue guidance from Q4 2024 10-Ks for major cloud providers",
    mode="heavy",
    output_formats=[competitor_schema],
    search={
        "search_type": "proprietary",
        "included_sources": ["finance"]
    }
)

TypeScript

const competitorSchema = {
  type: "object",
  properties: {
    companies: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          ticker: { type: "string" },
          key_risk_factors: {
            type: "array",
            items: { type: "string" }
          },
          revenue_guidance: { type: "string" }
        },
        required: ["name", "key_risk_factors"]
      }
    },
    summary: { type: "string" }
  },
  required: ["companies", "summary"]
} as const;

const task = await valyu.deepresearch.create({
  query: "Extract risk factors and revenue guidance from Q4 2024 10-Ks for major cloud providers",
  mode: "heavy",
  output_formats: [competitorSchema],
  search: {
    search_type: "proprietary",
    included_sources: ["finance"]
  }
});

The API returns JSON that conforms to your schema, ready to deserialize directly.
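Before writing results into a database, it can still be worth a cheap sanity check on your side. For full JSON Schema validation, the `jsonschema` package's `validate(instance, schema)` is the usual choice. If you'd rather avoid the dependency, here is a minimal sketch that only covers the subset used in the schema above (`type`, `properties`, `required`, `items`). The `check` helper and the sample payload are made up for illustration.

```python
import json

TYPE_MAP = {"object": dict, "array": list, "string": str}

def check(value, schema, path="$"):
    """Check value against a small JSON Schema subset; return a list of problems."""
    expected = schema.get("type")
    if expected in TYPE_MAP and not isinstance(value, TYPE_MAP[expected]):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}: missing required key {key!r}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems += check(value[key], sub, f"{path}.{key}")
    elif expected == "array":
        for i, item in enumerate(value):
            problems += check(item, schema.get("items", {}), f"{path}[{i}]")
    return problems

# Trimmed version of the competitor schema above.
schema = {
    "type": "object",
    "properties": {
        "companies": {"type": "array", "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "key_risk_factors": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["name", "key_risk_factors"],
        }},
        "summary": {"type": "string"},
    },
    "required": ["companies", "summary"],
}

# Made-up payload shaped like a conforming response.
raw = '{"companies": [{"name": "ExampleCo", "key_risk_factors": ["supply chain"]}], "summary": "..."}'
print(check(json.loads(raw), schema))  # []
```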

Webhooks for Production

Don't poll in production. Use webhooks.

Python

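import hmac
import hashlib
import json
from flask import Flask, request, jsonify
from valyu import Valyu

app = Flask(__name__)
valyu = Valyu()
WEBHOOK_SECRET = "your-stored-secret"  # returned once on task creation; load it from your own secret store

@app.route("/webhooks/deepresearch", methods=["POST"])
def handle_research_complete():
    # Default to "" so a missing header fails verification instead of raising
    signature = request.headers.get("X-Webhook-Signature", "")
    timestamp = request.headers.get("X-Webhook-Timestamp", "")
    payload = request.get_data(as_text=True)

    # Recompute HMAC-SHA256 over "<timestamp>.<raw body>" with the stored secret
    signed = f"{timestamp}.{payload}"
    expected = "sha256=" + hmac.new(
        WEBHOOK_SECRET.encode(),
        signed.encode(),
        hashlib.sha256
    ).hexdigest()

    # Constant-time comparison on bytes (compare_digest rejects non-ASCII str)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return jsonify({"error": "Invalid signature"}), 401

    data = json.loads(payload)  # parse the exact body that was verified
    if data["status"] == "completed":
        process_research_result(data["deepresearch_id"], data["output"])  # your own handler

    return jsonify({"received": True}), 200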


# Task creation with webhook
task = valyu.deepresearch.create(
    query="Comprehensive competitive analysis: SEC filings + market data",
    mode="max",
    webhook_url="https://your-app.com/webhooks/deepresearch"
)
# CRITICAL: store task.webhook_secret immediately; it is only returned once
store_secret(task.deepresearch_id, task.webhook_secret)  # store_secret is your own persistence helper

TypeScript

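import crypto from "crypto";
import express, { Request, Response } from "express";
import { Valyu } from "@valyu/valyu-js";

const app = express();
// Keep the raw body: the signature is computed over the exact bytes sent
app.use(express.raw({ type: "application/json" }));

const valyu = new Valyu(process.env.VALYU_API_KEY!);
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET!;

app.post("/webhooks/deepresearch", (req: Request, res: Response) => {
  const signature = req.headers["x-webhook-signature"];
  const timestamp = req.headers["x-webhook-timestamp"];
  if (typeof signature !== "string" || typeof timestamp !== "string") {
    return res.status(401).json({ error: "Missing signature headers" });
  }
  const payload = req.body.toString();

  const signed = `${timestamp}.${payload}`;
  const expected = "sha256=" + crypto
    .createHmac("sha256", WEBHOOK_SECRET)
    .update(signed)
    .digest("hex");

  // timingSafeEqual throws on a length mismatch, so compare lengths first
  const expectedBuf = Buffer.from(expected);
  const signatureBuf = Buffer.from(signature);
  if (expectedBuf.length !== signatureBuf.length || !crypto.timingSafeEqual(expectedBuf, signatureBuf)) {
    return res.status(401).json({ error: "Invalid signature" });
  }

  const data = JSON.parse(payload);
  if (data.status === "completed") {
    processResearchResult(data.deepresearch_id, data.output);  // your own handler
  }

  res.json({ received: true });
});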


// Task creation with webhook
const task = await valyu.deepresearch.create({
  query: "Comprehensive competitive analysis: SEC filings + market data",
  mode: "max",
  webhook_url: "https://your-app.com/webhooks/deepresearch"
});
// CRITICAL: store task.webhook_secret immediately; it is only returned once (storeSecret is your own helper)
await storeSecret(task.deepresearch_id, task.webhook_secret);
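The handlers above check the signature but accept a validly signed request no matter how old it is. Because the timestamp is part of the signed string, you can also reject stale deliveries to blunt replay attacks. This is a minimal sketch, assuming `X-Webhook-Timestamp` carries Unix epoch seconds (this guide doesn't say) and using an arbitrary 5-minute tolerance. `verify_webhook` is a hypothetical helper, not part of the SDK. Check Valyu's docs on whether retries are re-signed with a fresh timestamp before you choose a window, or a tight tolerance could reject legitimate retries.

```python
import hashlib
import hmac
import time
from typing import Optional

def verify_webhook(secret: str, timestamp: str, payload: str, signature: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Verify the 'sha256=' HMAC over '<timestamp>.<payload>' and reject stale deliveries."""
    try:
        sent_at = int(timestamp)  # ASSUMPTION: timestamp is Unix epoch seconds
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False  # too old (or too far in the future): treat as a possible replay
    expected = "sha256=" + hmac.new(
        secret.encode(), f"{timestamp}.{payload}".encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison on bytes; a missing signature simply fails
    return hmac.compare_digest(expected.encode(), (signature or "").encode())
```

In the Flask handler, this would replace the inline HMAC block: `if not verify_webhook(WEBHOOK_SECRET, timestamp, payload, signature): return 401`.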

Date Filtering

For time-sensitive research tasks, you can pin the search window.

Python

task = valyu.deepresearch.create(
    query="Clinical trial results for GLP-1 receptor agonists in NASH",
    mode="standard",
    search={
        "search_type": "proprietary",
        "included_sources": ["academic"],
        "start_date": "2023-01-01",
        "end_date": "2025-12-31"
    }
)

TypeScript

const task = await valyu.deepresearch.create({
  query: "Clinical trial results for GLP-1 receptor agonists in NASH",
  mode: "standard",
  search: {
    search_type: "proprietary",
    included_sources: ["academic"],
    start_date: "2023-01-01",
    end_date: "2025-12-31"
  }
});

This keeps the agent from pulling in outdated studies, which matters most in medical or financial research where recency is critical.
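If your agent builds these windows dynamically, a small helper that validates the dates before submitting can save a wasted task. The key names below mirror the examples above; `search_window` and `last_n_days` are hypothetical helpers, and the YYYY-MM-DD format is inferred from those examples.

```python
from datetime import date, timedelta
from typing import Optional

def search_window(sources: list[str], start: str, end: str,
                  search_type: str = "proprietary") -> dict:
    """Build a `search` dict like the ones above, validating ISO (YYYY-MM-DD) dates."""
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if start_d > end_d:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return {
        "search_type": search_type,
        "included_sources": sources,
        "start_date": start_d.isoformat(),
        "end_date": end_d.isoformat(),
    }

def last_n_days(n: int, today: Optional[date] = None) -> tuple[str, str]:
    """(start, end) ISO strings covering the n days up to today."""
    end = today or date.today()
    return (end - timedelta(days=n)).isoformat(), end.isoformat()

# Example: a trailing 30-day window over academic sources.
start, end = last_n_days(30, date(2025, 3, 31))
print(search_window(["academic"], start, end)["start_date"])  # 2025-03-01
```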

Attach Documents for Analysis & Inclusion

You can feed existing documents (PDFs, images, and other files) into a research task. This is useful for combining internal documents with external research.

Python

import base64

with open("internal_q4_report.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

task = valyu.deepresearch.create(
    query="Compare the Q4 projections in this internal report against actual SEC filing data for our competitors",
    mode="heavy",
    files=[{
        "data": f"data:application/pdf;base64,{pdf_b64}",
        "filename": "q4_report.pdf",
        "mediaType": "application/pdf",
        "context": "Internal Q4 2024 financial projections"
    }],
    search={
        "search_type": "proprietary",
        "included_sources": ["finance"]
    }
)

TypeScript

import fs from "fs";

const pdfBuffer = fs.readFileSync("internal_q4_report.pdf");
const pdfB64 = pdfBuffer.toString("base64");

const task = await valyu.deepresearch.create({
  query: "Compare the Q4 projections in this internal report against actual SEC filing data for our competitors",
  mode: "heavy",
  files: [{
    data: `data:application/pdf;base64,${pdfB64}`,
    filename: "q4_report.pdf",
    mediaType: "application/pdf",
    context: "Internal Q4 2024 financial projections"
  }],
  search: {
    search_type: "proprietary",
    included_sources: ["finance"]
  }
});
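When attaching several files, it helps to build each entry from the file path instead of hand-writing the data URI and media type. This sketch produces dicts with the same keys as the Python example above (including the camelCase `mediaType`). `file_attachment` is a hypothetical helper, and guessing the media type from the file extension is an assumption that works for common types such as PDF, PNG, and JPEG.

```python
import base64
import mimetypes
from pathlib import Path

def file_attachment(path: str, context: str) -> dict:
    """Build one entry for the `files` list from a local file."""
    p = Path(path)
    media_type, _ = mimetypes.guess_type(p.name)  # inferred from the extension
    if media_type is None:
        raise ValueError(f"cannot infer media type for {p.name}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return {
        "data": f"data:{media_type};base64,{encoded}",
        "filename": p.name,
        "mediaType": media_type,
        "context": context,
    }
```

Usage would look like `files=[file_attachment("internal_q4_report.pdf", "Internal Q4 2024 financial projections")]` in the `create()` call.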

Deep Research API in Production

"Talk is cheap, show me the code, and show it to me in production" - Odogwu Machalla

Consult Ralph is an AI-powered deep research app for consultants. It's in production, used daily by hundreds of consultants, and built heavily on the Valyu DeepResearch API.

It's also open source. You can fork, clone, or star it, and browse the code for reference on how to use the Valyu DeepResearch API in your own codebase.

Benchmarks

Valyu's DeepResearch scores 53.1 on DeepResearch-Bench, the best published score for any commercial deep research API.

On domain-specific benchmarks where proprietary data access matters:

Benchmark	Valyu	Parallel	Exa	Google
Finance (120 questions)	73%	67%	63%	55%
Economics (100 questions)	73%	52%	45%	43%
MedAgent (562 medical queries)	48%	42%	44%	45%
FreshQA (600 time-sensitive)	79%	52%	24%	39%

The finance and economics gaps are almost entirely explained by proprietary data access. Valyu queries FRED, BLS, and SEC directly. Web-only APIs rely on articles that reference this data, which is noisier and often outdated.

Frequently Asked Questions

What is a deep research API?

A deep research API is a programmatic interface that performs multi-step autonomous research. It accepts a query, plans a research strategy, searches multiple sources, synthesizes results, and returns a structured report with citations. Unlike a standard search API, deep research APIs run asynchronously: tasks can take seconds to minutes depending on depth.

How does Valyu's DeepResearch API differ from OpenAI's?

OpenAI's deep research (o3-deep-research, o4-mini-deep-research) only searches the web via Bing. Valyu's DeepResearch searches both the web and 36+ proprietary data sources including SEC filings, PubMed, arXiv, clinical trials, USPTO patents, FRED economic data, and ChEMBL bioactive compounds. For AI agents that need authoritative domain data rather than general web content, this is the meaningful difference.

What does the Valyu DeepResearch API cost?

Four modes: fast ($0.10/task), standard ($0.50/task), heavy ($2.50/task), and max ($15.00/task). Pricing is per task, regardless of how many searches or retrievals the agent performs internally. New accounts get $10 in free credits, no card required.

Does Valyu's deep research API support webhooks?

Yes. Pass a webhook_url on task creation. The API returns a webhook_secret for signature verification. Webhooks fire on task completion or failure with the full output payload. Retries use exponential backoff (up to 5 attempts) on server errors.
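Retries mean the same event can reach your endpoint more than once, for example when your handler did the work but the response never made it back. A common answer is to make processing idempotent. This is a minimal in-memory sketch keyed on `deepresearch_id` plus `status` (the payload fields used in the webhook handlers above). `DeliveryDeduper` and `handle` are hypothetical names, and in production you would back this with a database unique constraint rather than process memory.

```python
import threading

class DeliveryDeduper:
    """Remember which (task id, status) events were already processed so that
    retried webhook deliveries don't trigger duplicate work."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()
        self._lock = threading.Lock()  # webhook servers often handle requests concurrently

    def first_time(self, task_id: str, status: str) -> bool:
        key = (task_id, status)
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

def handle(event: dict, deduper: DeliveryDeduper, processed: list) -> None:
    # Acknowledge every delivery, but only act on the first one per task/status.
    if deduper.first_time(event["deepresearch_id"], event["status"]):
        processed.append(event["deepresearch_id"])
```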

Can I get structured JSON output from a deep research API?

Valyu supports custom JSON Schema for structured output. Pass a schema object in output_formats. The API returns output that conforms to your schema, ready to deserialize directly into your application's data structures.

How long does a deep research task take?

Fast mode typically completes in under 60 seconds. Standard mode takes 2-5 minutes. Heavy and max modes can take up to 30 minutes for complex multi-source analysis. Use the wait() method with progress callbacks for synchronous use cases, or webhooks for event-driven production workflows.

Can I attach documents to a deep research task?

Yes. Pass files as base64-encoded data with a media type. The API supports PDFs, images (PNG, JPEG, WebP), and other documents. This is useful for combining internal documents with external research, for example analyzing your internal financial projections against publicly filed SEC data.

Is there a TypeScript / JavaScript SDK?

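Yes. Install it with pnpm add @valyu/valyu-js. The API surface mirrors the Python SDK, with camelCase option names in places (for example pollInterval instead of poll_interval), and every example in this guide is shown in both languages.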

Summary

If you're building AI agents that only research general web content, OpenAI, Perplexity, and Valyu are all solid choices. If your agents need to touch SEC filings, paywalled academic papers, clinical trial databases, USPTO patents, or real financial data, you need an API with proprietary source access.

Valyu's DeepResearch API:

Four modes from $0.10 (fast) to $15.00 (max)
36+ proprietary data sources across finance, biomedical, academic, patent, and legal domains
Markdown, PDF, and structured JSON output
Webhook and polling support
53.1 on DeepResearch-Bench (best published commercial score)

Docs at docs.valyu.ai/guides/deepresearch.

$10 free credits on signup at platform.valyu.ai.
