ResearchSwarm: AI-Powered Research Discovery Engine

Thaywo — Sun, 02 Nov 2025 14:42:10 +0000

🎯 What I Built

ResearchSwarm is a system that relies on MiniMax AI via OpenRouter to reimagine how academic research is discovered and connected. It uses Agentic Postgres to let several specialized processes work in parallel, each on its own isolated database fork, to uncover hidden relationships between research papers up to four times faster than running the same analyses sequentially.

The Inspiration

As a researcher, I’ve often struggled with manually tracking citations, finding cross-disciplinary links, and identifying trends buried in thousands of papers. Most research tools only handle basic keyword searches or limited semantic matching. I wanted to build something that could explore data from different angles at once, bringing together multiple analyses to reveal insights that would otherwise take weeks to find.

Core Concept

At the heart of ResearchSwarm is how it uses “Agentic Postgres.”
Each process runs independently on its own database fork, avoiding conflicts and boosting speed:

Citation Analyzer – Maps and explores citation relationships

Topic Discovery Unit – Groups related papers into themes

Connection Finder – Detects links between different domains

Trend Tracker – Monitors how topics evolve over time

Together, these components complete an analysis in roughly 3 to 5 seconds, compared with 12 to 15 seconds or more when the same steps run sequentially.
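The difference between the two execution modes can be sketched in a few lines of Node.js. This is a minimal illustration rather than the project's orchestrator: `fakeAgent`, `runInParallel`, and `runSequentially` are hypothetical names, and the agents only wait on a timer instead of querying a fork.

```javascript
// Hypothetical stand-in for an agent: waits `ms` milliseconds, then returns a result.
// `log` records start/end events so the interleaving can be inspected.
function fakeAgent(name, ms, log) {
  return async () => {
    log.push(`start:${name}`);
    await new Promise((resolve) => setTimeout(resolve, ms));
    log.push(`end:${name}`);
    return { agent: name };
  };
}

// Start every agent immediately and wait for all of them to settle.
// A failure in one agent does not cancel the others.
async function runInParallel(agents) {
  const started = Date.now();
  const settled = await Promise.allSettled(agents.map((run) => run()));
  return { durationMs: Date.now() - started, settled };
}

// Run agents one after another, for comparison.
async function runSequentially(agents) {
  const started = Date.now();
  const results = [];
  for (const run of agents) {
    results.push(await run());
  }
  return { durationMs: Date.now() - started, results };
}
```

In the parallel version the wall-clock time approaches that of the slowest agent, while the sequential version pays the sum of all agent times.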

Repository

GitHub:
https://github.com/thaywo/agentic_challenge_backend
https://github.com/thaywo/frontend_researchswarm

Key Features

Hybrid Search

Combines BM25 keyword ranking with semantic vector search.

POST /api/search/hybrid
{
  "query": "quantum machine learning",
  "keywordWeight": 0.5,
  "vectorWeight": 0.5
}

Parallel Discovery

Runs all four processes simultaneously.

POST /api/agents/discover

Response:

{
  "total_duration": 2845,
  "agents": 4,
  "successful": 4,
  "results": [...]
}
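A response of this shape could be assembled from the output of `Promise.allSettled`. The sketch below is an assumption about how that might look, not the endpoint's actual code: `summarize`, the `names` argument, and the per-result fields (`agent`, `status`, `data`, `error`) are hypothetical, and `total_duration` is taken to be in milliseconds.

```javascript
// Turn Promise.allSettled output into a summary with the fields shown in the response.
// names[i] is the label of the agent that produced settled[i].
function summarize(names, settled, startedAt, finishedAt) {
  const results = settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? { agent: names[i], status: 'success', data: outcome.value }
      : {
          agent: names[i],
          status: 'failed',
          error: String(outcome.reason?.message ?? outcome.reason),
        }
  );

  return {
    total_duration: finishedAt - startedAt, // assumed to be milliseconds
    agents: results.length,
    successful: results.filter((r) => r.status === 'success').length,
    results,
  };
}
```

Reporting failures per agent instead of rejecting the whole request lets the caller use partial results when one analysis fails.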

Citation Network

Displays an interactive graph of how papers reference one another.

GET /api/papers/:id/network?depth=2
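The `depth` parameter suggests a depth-limited traversal of the citation graph. As a rough sketch of that idea (the real endpoint presumably does this in SQL), the hypothetical `citationNetwork` function below runs a breadth-first search over in-memory edges. Following citations in both directions is an assumption.

```javascript
// edges: array of [citingId, citedId] pairs.
// Returns the nodes and edges within `depth` hops of `rootId`.
function citationNetwork(edges, rootId, depth = 2) {
  // Build an undirected adjacency map so both "cites" and "cited by" are followed.
  const neighbors = new Map();
  const link = (a, b) => {
    if (!neighbors.has(a)) neighbors.set(a, new Set());
    neighbors.get(a).add(b);
  };
  for (const [citing, cited] of edges) {
    link(citing, cited);
    link(cited, citing);
  }

  // Breadth-first search, one level per iteration, stopping at `depth`.
  const distance = new Map([[rootId, 0]]);
  let frontier = [rootId];
  for (let d = 1; d <= depth; d++) {
    const next = [];
    for (const id of frontier) {
      for (const other of neighbors.get(id) ?? []) {
        if (!distance.has(other)) {
          distance.set(other, d);
          next.push(other);
        }
      }
    }
    frontier = next;
  }

  return {
    nodes: [...distance].map(([id, dist]) => ({ id, distance: dist })),
    edges: edges.filter(([a, b]) => distance.has(a) && distance.has(b)),
  };
}
```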

Cross-Domain Connections

Reveals research that bridges different academic areas.

GET /api/analytics/connections/cross-domain
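One simple way to read "bridging" is a citation whose two papers belong to different domains. The sketch below counts such citations per domain pair. It is an illustration under that assumption, not the project's method, and the function name and input shapes (`papers` with a `domain` field, `citations` with `from` and `to`) are hypothetical.

```javascript
// papers: [{ id, domain }], citations: [{ from, to }].
// Counts citations whose two endpoints belong to different domains.
function crossDomainConnections(papers, citations) {
  const domainOf = new Map(papers.map((p) => [p.id, p.domain]));
  const counts = new Map();

  for (const { from, to } of citations) {
    const a = domainOf.get(from);
    const b = domainOf.get(to);
    // Skip unknown papers and citations that stay inside one domain.
    if (a === undefined || b === undefined || a === b) continue;
    const key = [a, b].sort().join(' <-> '); // direction-independent pair key
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Most strongly connected domain pairs first.
  return [...counts]
    .map(([domains, links]) => ({ domains, links }))
    .sort((x, y) => y.links - x.links);
}
```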

Screenshots

Hybrid Search Interface

Combines keyword and semantic results for better discovery.

Parallel Execution Dashboard

All four processes running at once on separate database forks.

Citation Network Visualization

Interactive graph showing citation relationships.

Cross-Domain Links

Papers connecting quantum computing and machine learning.

🚀 How Agentic Postgres Was Used

Fast Database Forks (Zero-Copy)

Challenge: Running multiple tasks at once without conflicts.
Solution: Each process creates its own zero-copy database fork using Tiger Cloud.

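const { exec } = require('node:child_process');
const { promisify } = require('node:util');

const execAsync = promisify(exec);

// Create a fork with a unique, timestamped name and return its connection details.
async function createFork(name) {
  const forkName = `${name}-${Date.now()}`;
  const command = `tiger service fork create --name ${forkName}`;
  const { stdout } = await execAsync(command);
  // Assumes the CLI prints a JSON object on stdout.
  const info = JSON.parse(stdout);

  return {
    forkId: info.service_id,
    connectionString: info.connection_string
  };
}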

Impact:

Forks created in under 500 ms

No extra storage until data diverges

Automatic cleanup after completion

4× faster execution
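The automatic cleanup mentioned above can be expressed as a small wrapper that deletes the fork whether the task succeeds or fails. This is a sketch under assumptions: `withFork` is a hypothetical helper, and `createFork` and `deleteFork` are injected functions, so nothing here depends on a specific CLI command for removing a fork.

```javascript
// Run `task` against a fresh fork and always delete the fork afterwards.
// `createFork(name)` must resolve to an object with a `forkId`;
// `deleteFork(forkId)` is a hypothetical cleanup function supplied by the caller.
async function withFork(name, { createFork, deleteFork }, task) {
  const fork = await createFork(name);
  try {
    return await task(fork);
  } finally {
    // Runs on success and on failure, so forks are not left behind.
    await deleteFork(fork.forkId);
  }
}
```

A caller would pass the agent's work as `task`, for example `withFork('citation-analyzer', deps, (fork) => analyze(fork.connectionString))`, where `analyze` stands in for the agent's own logic.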

Hybrid Search (pg_textsearch + pgvector)

Challenge: Academic search needs both keyword precision and contextual meaning.
Solution: Combine BM25 and vector similarity in a single query.

CREATE FUNCTION hybrid_search(...) RETURNS TABLE (...) AS $$
BEGIN
  -- RETURN QUERY is required for a PL/pgSQL function to emit rows.
  RETURN QUERY
  WITH keyword_search AS (...),
       vector_search AS (...)
  SELECT COALESCE(k.id, v.id),
         -- COALESCE keeps rows matched by only one search from scoring NULL.
         (COALESCE(k.keyword_score, 0) * keyword_weight
          + COALESCE(v.vector_score, 0) * vector_weight) AS combined_score
  FROM keyword_search k
  FULL OUTER JOIN vector_search v ON k.id = v.id
  ORDER BY combined_score DESC;
END;
$$ LANGUAGE plpgsql;

Impact:

Balances precision and context

Handles synonyms and related concepts

Responds in under 200 ms for 10K+ papers
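The weighting logic is easier to see outside SQL. The sketch below mirrors the full outer join in plain JavaScript: a paper found by only one search contributes zero for the other score. `mergeHybrid` is a hypothetical helper, and it assumes both score lists are already normalized to the same range, which the article does not specify.

```javascript
// keywordHits / vectorHits: [{ id, score }], scores assumed normalized to [0, 1].
// Returns every paper found by either search, ranked by the weighted sum.
function mergeHybrid(keywordHits, vectorHits, keywordWeight = 0.5, vectorWeight = 0.5) {
  const combined = new Map();

  for (const { id, score } of keywordHits) {
    combined.set(id, (combined.get(id) ?? 0) + score * keywordWeight);
  }
  for (const { id, score } of vectorHits) {
    combined.set(id, (combined.get(id) ?? 0) + score * vectorWeight);
  }

  return [...combined]
    .map(([id, combinedScore]) => ({ id, combinedScore }))
    .sort((a, b) => b.combinedScore - a.combinedScore);
}
```

Raising `keywordWeight` favors exact-term matches, while raising `vectorWeight` favors semantically related papers.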

Tiger MCP Integration

Challenge: Each component needed awareness of the database schema.
Solution: Used Tiger MCP (Model Context Protocol) to provide schema context.

const mcpContext = await tigerMCP.getContext({
  schema: 'research_discovery',
  tables: ['papers', 'citations', 'topics'],
  include_docs: true
});

Impact:

Automatically generates correct SQL queries

Reduces setup time by half

Ensures consistent structure across components

Tiger CLI for DevOps

Challenge: Simplifying database infrastructure management.
Solution: Used Tiger CLI for automation.

tiger service create --name research-swarm --addons time-series,ai
tiger service fork create --name citation-analyzer-123
tiger db connection-string --service-id ywwb0507h1

Impact:

Easy setup and monitoring

Ready for CI/CD pipelines

Clean and intuitive workflow

TimescaleDB Hypertables

Challenge: Efficient trend tracking over time.
Solution: Used TimescaleDB hypertables for time-series optimization.

CREATE TABLE trends (
  topic_id INTEGER,
  time_period DATE,
  paper_count INTEGER,
  citation_velocity FLOAT,
  growth_rate FLOAT,
  is_emerging BOOLEAN
);

SELECT create_hypertable('trends', 'time_period', chunk_time_interval => INTERVAL '1 month');

Impact:

10× faster time-based queries

Automatic compression

Smooth long-term data analysis
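The article does not say how `growth_rate` and `is_emerging` are computed. One plausible reading, shown here purely as an assumption, is period-over-period growth in `paper_count` with a cutoff for "emerging". The `trendRows` name and the 0.5 threshold are hypothetical.

```javascript
// counts: paper counts per period, oldest first.
// threshold: assumed cutoff; growth at or above it marks a topic as emerging.
function trendRows(counts, threshold = 0.5) {
  return counts.map((paperCount, i) => {
    const previous = i > 0 ? counts[i - 1] : null;
    // Growth is undefined for the first period and when the previous count is zero.
    const growthRate = previous ? (paperCount - previous) / previous : null;
    return {
      paper_count: paperCount,
      growth_rate: growthRate,
      is_emerging: growthRate !== null && growthRate >= threshold,
    };
  });
}
```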

💡 Experience Summary

What Worked Well

Instant Forks: Creating large database forks in seconds.

Hybrid Search: Strong combination of text and vector search.

Developer Tools: Excellent CLI and dashboard experience.

Challenges and Fixes

Fork Management: Solved with automatic cleanup in the orchestrator.

Search Weighting: Added configurable keyword/vector balance.

Result Merging: Used JSONB columns for flexible data storage.

Key Takeaways

Agentic Postgres changes the way databases are used.
It’s not just a place to store data; it’s an active partner in data analysis:

Databases can fork and parallelize their own workloads

Searches understand meaning, not just text

Infrastructure adjusts automatically based on need

This marks a major shift in how large-scale data exploration can be done.

What’s Next

Live arXiv Integration – Automatically fetch new papers daily.

More Specialized Components – For summarization and collaboration suggestions.

Advanced Visuals – 3D citation graphs, timeline views, and heatmaps.

Suggestions for the Tiger Team

Fork Management UI: Add a visual fork tree and analytics.

pg_textsearch Upgrades: Include built-in hybrid functions and multilingual support.

MCP Documentation: Provide more examples and integration tips.

📊 Performance Metrics

Metric | Value
--- | ---
Parallel Execution | 2.8 s (vs 12 s sequential)
Fork Creation Time | < 500 ms
Hybrid Search Latency | < 200 ms (10K papers)
Citation Network Query | < 300 ms (depth = 2)
Database Size | ~50 MB
Fork Storage Overhead | 0 bytes (until divergence)

🛠️ Tech Stack

Backend: Node.js, Express

Database: PostgreSQL 16 (Tiger Cloud)

Extensions: pgvector, timescaledb, pg_textsearch

CLI: Tiger CLI v0.15.0

Hosting: Tiger Cloud

Deployment: Tiger Cloud Service (ywwb0507h1)

🎓 Why It Stands Out

True Parallel Architecture – Independent forks for full isolation

Production-Ready Implementation – Structured APIs, logging, and monitoring

Proven Results – Real, measurable performance gains

Meaningful Use Case – Solves an everyday research challenge

🙏 Acknowledgments

Special thanks to:

Tiger Data Team – For creating Agentic Postgres

TimescaleDB – For the solid foundation

DEV Community – For hosting this challenge

Researchers worldwide – For inspiring this project

🔗 Final Links

Live Demo: https://agentic-researchswarm.vercel.app/

Built with passion for the Agentic Postgres Challenge.

ResearchSwarm – where isolated database forks work together to uncover what traditional systems overlook.

DEV Community: Thaywo
