Raphaël Pinson

Building MCP Servers for Genealogy: AI-Powered Historical Research

For years now, I’ve been writing a book tracing four family branches across Europe, the Middle East, and South Africa. One thread follows Louis Rau, my 3rd great-uncle, who was president of Compagnie Continentale Edison (CCE) in the early 1900s. He was an Edison Pioneer, part of the inner circle that brought Edison's electrical systems to Europe.

Last year, I found that Thomas Edison's papers had been digitized at Rutgers University. So I navigated to edisondigital.rutgers.edu, typed "Louis Rau" into the search box, and hit enter. 847 results came back.

Somewhere in those 847 documents was the correspondence that would explain Louis Rau's business relationship with Élie Moïse Léon, co-founder of CCE. Somewhere were the letters that traced his movements between Paris and Geneva. Somewhere were the details of CCE's electrical installations across Europe.

But I'd have to click through them one by one, read the snippets, open promising documents, cross-reference dates, take notes, come back later and forget which ones I'd already checked…

A few weeks ago, I started feeding genealogy documents to Claude AI, but that was still pretty tedious, and I kept hitting image upload limits in conversations. And then it clicked: why not build an MCP server, so Claude could perform the search directly?

That question became three MCP servers, a transformed research workflow, and a fundamentally different relationship with historical archives.

First Win: The Edison Papers MCP

The Edison Papers has an API. I didn't know that initially — I just knew they had a website with a search box. But a quick look at the network tab showed clean REST endpoints returning JSON.

I opened Claude Code and asked it to build an MCP server that wrapped the Edison Papers API. A few hours of iteration later, I had:

  • edison_search: Query with field-level precision (creator:"Rau, Louis", recipient:"Léon, Élie")
  • edison_get_document: Retrieve full metadata and transcriptions
  • edison_browse_series: Navigate document collections systematically
  • edison_get_images: Access high-resolution scans
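To give a feel for the field-level precision mentioned above, here is a hypothetical sketch of how a fielded query string for `edison_search` might be assembled. The field names (`creator`, `recipient`) come from the examples in this post; the exact query syntax the Edison Papers API accepts is an assumption.

```typescript
// Hypothetical sketch: assembling a fielded query string for edison_search.
// The date-range syntax is an assumption, not the documented API grammar.
type SearchFields = {
  keyword?: string;
  creator?: string;
  recipient?: string;
  yearFrom?: number;
  yearTo?: number;
};

function buildEdisonQuery(fields: SearchFields): string {
  const parts: string[] = [];
  if (fields.keyword) parts.push(fields.keyword);
  if (fields.creator) parts.push(`creator:"${fields.creator}"`);
  if (fields.recipient) parts.push(`recipient:"${fields.recipient}"`);
  if (fields.yearFrom && fields.yearTo)
    parts.push(`date:[${fields.yearFrom} TO ${fields.yearTo}]`);
  return parts.join(" ");
}

console.log(buildEdisonQuery({ creator: "Rau, Louis", yearFrom: 1892, yearTo: 1895 }));
// → creator:"Rau, Louis" date:[1892 TO 1895]
```

The point of a tool like this is that Claude fills in the structured fields itself, so a natural-language request turns into a precise archive query.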

GitHub: raphink/edison-archive-mcp (MCP server for the Edison Archive)

Edison Papers MCP Server

An MCP server for querying the Thomas A. Edison Papers (Rutgers University) — ~150,000 documents, public domain (CC0).

Tools

| Tool | Description |
| --- | --- |
| `edison_search` | Full-text search by keyword, author, or recipient |
| `edison_get_document` | Fetch full metadata and transcription for a document by call number |
| `edison_browse_series` | List all documents in an archive series |

Use with Claude.ai (hosted)

Deploy the server online so Claude.ai can connect to it via HTTP.

1. Deploy to Railway (free)

  1. Push this repo to GitHub
  2. Go to railway.app → New Project → Deploy from GitHub repo
  3. Select your repo, then add this environment variable:
    MCP_TRANSPORT = http
    
    (PORT is set automatically by Railway)
  4. Click Deploy (~2 minutes)
  5. Go to Settings → Networking → Generate Domain to get your public URL

2. Connect to Claude.ai

Go to Claude.ai → Settings → Integrations → Add custom integration and enter:

https://your-app.up.railway.app/mcp

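For local use, Claude Desktop registers MCP servers in its `claude_desktop_config.json`. A minimal sketch — the command and install path here are assumptions about your local checkout, not the repo's documented setup:

```json
{
  "mcpServers": {
    "edison-papers": {
      "command": "node",
      "args": ["/path/to/edison-archive-mcp/dist/index.js"]
    }
  }
}
```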

Now instead of clicking through 847 results, I could ask Claude:

"Find correspondence where Louis Rau is the creator, dated 1892-1895, mentioning electrical installations or Paris operations."

And Claude would orchestrate the full research pipeline:

  1. Search: Call Edison Papers MCP → retrieve all matching results
  2. Triage: Read all abstracts, decide which documents warrant full analysis
  3. Prioritize: Rank documents by relevance
  4. Deep read: For priority documents, get high-resolution images and use OCR for full context
  5. Summary: Provide a summary of all findings

What would have taken hours of manual clicking, note-taking, and cross-referencing now happens in one conversation.

This was immediately useful. But it surfaced a new problem: where do all these findings go?

The Organization Problem: Enter Notion MCP

I was already using Notion to organize my research: person profiles, document summaries, research questions. And Claude already had an MCP for Notion.

So now when I asked:

"Search Edison Papers for Louis Rau correspondence from 1892-1895, create a Notion page summarizing the findings, and link it to Louis Rau's profile."

Claude would:

  1. Search: Call Edison Papers MCP → retrieve all matching results
  2. Triage: Read all abstracts, decide which documents warrant full analysis
  3. Track: Create a Notion database entry for each document with analysis status
  4. Prioritize: Rank documents by relevance
  5. Deep read: For priority documents, get high-resolution images and use OCR for full context
  6. Document: Update Notion pages with findings
  7. Connect: Update profile pages for people mentioned (Louis Rau, Élie Léon, etc.)

This was amazing. Structured knowledge, automatically organized, all in one conversation.

But then Claude started hallucinating.

The Hallucination Problem: Claude Needs Ground Truth

Claude would find documents mentioning, say, Samuel Léon and Élie Léon, and confidently conclude that Samuel was Élie's nephew, completely making it up.

Or it would claim someone was born in 1847 when they were actually born in 1867. Dates off by decades. Family relationships invented wholesale.

The problem: Claude had access to documents (via Edison Papers MCP) and research notes (via Notion MCP), but not the actual genealogy data. It was inferring family structure from fragmentary mentions in letters and my incomplete notes.

I needed to give Claude access to the tree itself, the actual source of truth about who's related to whom and when they lived.

Attempt 1: GEDCOM MCP (Local)

My family tree lives in Geni, a collaborative genealogy platform building a single world family tree. Geni has an API, but OAuth kept failing when I tried it, and I wanted something that worked now.

So I took a shortcut. From time to time, I export my data from Geni to GEDCOM (the standard genealogy exchange format); my export contains about 25,000 individuals. I used airy10's GEDCOM MCP to make it queryable locally.
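GEDCOM is a plain-text, line-oriented format: each line carries a level number, an optional cross-reference ID, a tag, and a value. A minimal illustrative record — the IDs, dates, and names here are made up for the example, not real entries from my export:

```
0 @I1@ INDI
1 NAME Louis /Rau/
1 BIRT
2 DATE 1849
2 PLAC Paris, France
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
```

Because the structure is this regular, an MCP server can index individuals and family links once and answer relationship queries without re-parsing the file.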

GitHub: airy10/GedcomMCP (MCP server to create or query GEDCOM files)

GEDCOM MCP Server

Genealogy for AI Agents, by AI Agents

A robust MCP server for creating, editing, and querying genealogical data from GEDCOM files. Works great with qwen-cli and gemini-cli.

This project provides a comprehensive set of tools for AI agents to work with family history data, enabling complex genealogical research, data analysis, and automated documentation generation.

The server has been recently improved with fixes for critical bugs, enhanced error handling, and better code quality while maintaining full backward compatibility.

Some sample complex prompts:

   Load gedcom "myfamily.ged"
   Make a complete, detailed biography of <name of some people from the GEDCOM> and his family. Use as much as you can from this genealogy, including any notes from him or his relatives.
   You can try to find some info on the Internet to complete the document, add some historical or geographic context, etc. Be as complete as possible to tell us a nice […]

This worked! Now Claude could:

  • Search for individuals by name
  • Verify relationships ("Is X related to Y?")
  • Check birth/death dates
  • Trace lineage paths

No more hallucinated family connections. The GEDCOM became a hypothesis database, and claims in documents could be verified against known family structure.
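The verification step is conceptually just a lookup against structured records. A simplified sketch — the record shape and the data are illustrative, not the GEDCOM MCP's actual API, and the years echo the hallucination example above (1847 claimed, 1867 actual):

```typescript
// Illustrative tree data — not real records from my export.
interface Person {
  id: string;
  name: string;
  birthYear?: number;
}

const tree = new Map<string, Person>([
  ["I1", { id: "I1", name: "Élie Moïse Léon" }],
  ["I2", { id: "I2", name: "Samuel Léon", birthYear: 1867 }],
]);

// Does a claim extracted from a document (e.g. "born in 1847") match the tree?
function checkBirthYear(id: string, claimed: number): "match" | "mismatch" | "unknown" {
  const p = tree.get(id);
  if (!p || p.birthYear === undefined) return "unknown";
  return p.birthYear === claimed ? "match" : "mismatch";
}

console.log(checkBirthYear("I2", 1847)); // → mismatch
```

The useful part is the third state: "unknown" tells Claude the tree has no data, which is different from a contradiction, and stops it from papering over gaps with invented facts.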

Why Geni as my main database?

I use Geni instead of maintaining a private tree because genealogy is collaborative research. Multiple people contribute information, sources get peer-reviewed, duplicates get merged. A tree on Geni is a shared knowledge base, not siloed private data that might be duplicated (and wrong) across dozens of individual researchers' files.

But the GEDCOM approach had limitations:

  • It only works in Claude Desktop (local MCP)
  • It requires manually re-exporting the GEDCOM whenever the tree updates
  • No access in claude.ai web sessions (or phone)

I needed the real API.

Back to Geni: Tackling OAuth

So I went back to the Geni API. A few more hours of iteration with Claude Code, and I had:

  • Full OAuth implementation (access tokens, refresh flow)
  • 13 tools: profile CRUD, relationship pathfinding, merge candidate detection, family traversal
  • Search by name, verify relationships, trace lineage paths programmatically
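The first OAuth step (`get_authorization_url`) is just URL construction. A sketch — the endpoint path and parameter names are assumptions from memory of Geni's platform docs, so verify them against the developer documentation:

```typescript
// Sketch of building the Geni OAuth authorization URL.
// Endpoint path and parameter names are assumptions — check Geni's docs.
function buildAuthorizationUrl(clientId: string, redirectUri: string): string {
  const url = new URL("https://www.geni.com/platform/oauth/authorize");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("response_type", "code");
  return url.toString();
}
```

The user opens that URL, authorizes the app, and comes back with a code that `exchange_code` trades for an access token and a refresh token; the refresh flow is what keeps long research sessions from dying mid-conversation.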

GitHub: raphink/geni-mcp (an MCP server for Geni)

geni-mcp

An MCP (Model Context Protocol) server that gives Claude access to Geni — the collaborative genealogy platform. Use Claude to browse, search, correct, and extend your family tree.

Features

| Tool | Description |
| --- | --- |
| `get_authorization_url` | Start the OAuth flow — get the URL to authorize Claude |
| `exchange_code` | Complete OAuth — exchange the code for tokens |
| `get_my_profile` | Get your own Geni profile |
| `get_profile` | Look up any profile by ID |
| `update_profile` | Correct names, dates, locations, biography |
| `create_profile` | Add a new person to Geni |
| `get_immediate_family` | Get parents, siblings, spouses, children |
| `get_relationship_path` | Find relationship path between two profiles |
| `get_union` | Get a family unit (couple + children) |
| `add_relation` | Add a parent, child, sibling, or spouse |
| `search_profiles` | Search by name with optional birth/death filters |
| `get_merge_candidates` | Find potential duplicate profiles |
| `merge_profiles` | Merge a duplicate into a base profile |

Prerequisites

  1. A Geni account at geni.com
  2. A registered Geni app — create one at geni.com/platform/developer/apps
  3. Node.js 20+


Now I could ask mid-conversation: "Is Samuel Léon related to Élie Moïse Léon?" and get the relationship path instantly, whether I was in Claude Desktop or claude.ai.

The tree became queryable context accessible anywhere, not just on my local machine with an up-to-date GEDCOM file.

Third Server: Newspapers MCP

With Edison Papers and Geni working, I could trace business connections and verify family relationships. But I was still missing contemporary context: how did the public see these people? What did newspapers say about CCE's operations? Were there announcements, obituaries, social mentions?

Historical newspapers are digitized across dozens of national archives. Each has its own interface. Searching them all manually meant opening multiple websites, running the same query in different systems, downloading results individually.

So I built a newspapers MCP that:

  • Aggregates multiple national newspaper archives
  • Searches across collections simultaneously
  • Returns snippets as base64-encoded images (because OCR quality varies)
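Returning snippets as images rather than OCR text is simple in Node: the raw bytes are base64-encoded so they travel inside the JSON-RPC response, and Claude reads the image directly. A sketch — the response shape follows MCP's image content convention, but treat the details as assumptions:

```typescript
// Sketch: wrapping fetched newspaper-snippet bytes as MCP image content.
// Base64 is used because MCP responses are JSON and can't carry raw bytes.
function toImageContent(bytes: Uint8Array, mimeType = "image/jpeg") {
  return {
    type: "image" as const,
    data: Buffer.from(bytes).toString("base64"),
    mimeType,
  };
}
```

Sending the image sidesteps the varying OCR quality of the underlying archives: Claude's own vision handles Fraktur or degraded print better than some archives' stored OCR text.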

Newspapers MCP Server

An MCP (Model Context Protocol) server for searching online newspaper archives across multiple countries and regions. This server provides unified access to newspaper collections from around the world through a single, standardized interface.

Supported Archives

| Archive | Region | Source key | API key |
| --- | --- | --- | --- |
| Europeana Collections | Europe (multi-country) | `europeana` | Optional (get key) |
| Gallica (BnF) | France | `gallica` | None |
| Deutsche Digitale Bibliothek | Germany | `ddb` | None |
| digiPress (BSB) | Germany / Bavaria | `digipress` | None |
| ANNO (Austrian NL) | Austria / Austro-Hungarian Empire | `anno` | None |
| Delpher (KB) | Netherlands | `delpher` | None |
| Chronicling America (LoC) | United States | `chronicling_america` | None |
| eLuxemburgensia (BnL) | Luxembourg | `eluxemburgensia` | None |
| Trove (NLA) | Australia | `trove` | Required (free — get key) |
| Norwegian NL (nb.no) | Norway | `norwegian` | None |

Here’s a real example:

I asked Claude to search for "Joseph Dreyfus grain Paris 1895" (a grain merchant in the family who had a financial collapse). The MCP found the concordataire (court-supervised settlement) liquidation announcement in French commercial journals. That single search led to discovering a 90-page Archives de Paris dossier (D14U³/89) I'm still analyzing.

One search. Ten minutes. What would have been days of archive website navigation.

How They Work Together: Finding Solomon Rau in Munich

Here's a recent example showing how the MCPs orchestrate together:

I asked Claude to search for Solomon Rau's activity in Munich newspapers. The newspapers MCP returned various results, including this advertisement:

[Image: DDSG announcement]

This ad showed Solomon Rau advertising the reimbursement of DDSG (Danube Steam Shipping Company) stock — a discovery that:

  • Revealed his business activity (financial/stock trading)
  • Connected him to DDSG, a major shipping company
  • Provided a concrete date and location (Munich)
  • Led to further discoveries about other family members' activities

Claude then cross-referenced this against the Geni tree to verify Solomon's identity and relationships, and documented the finding in Notion with the newspaper snippet as a source.

It then correlated this with the DDSG stock listed in the post-mortem inventory of Adolphe Grünberg, Solomon's son-in-law, the following year (1878), and added another note there.

Have you built AI integration for research yourself? What were your best findings?
