Tommaso Bertocchi

Posted on Jul 1 • Originally published at openosint.tech

I Built an Open-Source, Agentic OSINT Platform You Run With Your Own API Key

#ai #security #python #opensource

Before publishing — delete this block. GIFs on dev.to have no Giphy liquid tag: embed them as a plain markdown image pointing at the direct https://media.giphy.com/media/<ID>/giphy.gif URL (limit: 200 megapixels per frame). Every ![alt](REPLACE_WITH_GIPHY_URL) below is a placeholder with the alt text written and a search term in the comment — open Giphy, hit Copy link → GIF Link, paste it in. Cover image: 1000×420 px, keep any text within the left ~840px (social cards crop the rest).

TL;DR

What: an open-source, agentic OSINT platform — 16 tools, one natural-language interface.
How it's different: BYOK (bring your own LLM key) + a client-side agent loop + a stateless backend. Your keys and your queries stay with you.
Where: github.com/OpenOSINT/OpenOSINT · live demo at demo.openosint.tech
Run it: git clone, uv sync, uv run openosint repl. ~2 minutes.

OSINT tooling forces a bad trade-off. On one side: a pile of disconnected CLI scripts you glue together by hand. On the other: a polished SaaS that wants your investigation data on their servers and a subscription on your card.

I wanted the power of an orchestrated platform with the privacy of a local script — so I built OpenOSINT: the LLM does the orchestration, you bring your own API key, and nothing sensitive touches a server you don't control.

What it actually is
Orchestration vs. privacy
The architecture that matters: BYOK
One core, four front-ends
The Entity Correlation Graph
Quick start (2 minutes)
Drop it into Claude (or any MCP client)
Why BYOK is the whole point
Where it's going

What it actually is

OpenOSINT is a toolbox plus a brain.

The toolbox: 16 OSINT tools — username lookups, email/domain intelligence, IP geolocation, breach checks, metadata extraction, and more. Each is a clean, typed function with one job.

See the tool categories

Identity — username enumeration across platforms, email validation & reputation
Network — IP geolocation, ASN/WHOIS, reverse DNS
Domain — DNS records, subdomain discovery, certificate data
Exposure — breach/leak checks, paste monitoring
Artifacts — file & image metadata extraction

(Exact tool list lives in the repo README — it grows over time.)

The brain: an agent loop. You ask in plain language — "what can you find on this domain?" — the model picks the tools, chains them, and hands back a synthesized answer instead of 12 raw JSON blobs.

The twist: the brain runs on your key, and the loop runs client-side. The backend is a stateless dispatcher.

Orchestration vs. privacy

Every setup makes you choose. OpenOSINT tries not to:

	Loose scripts	Hosted SaaS	OpenOSINT
Orchestration	❌ manual	✅	✅
Data stays local	✅	❌	✅
Keys stay yours	✅	❌	✅
One NL interface	❌	✅	✅
Open source	sometimes	rarely	✅ MIT

The architecture that matters: BYOK

One rule drives everything: the server should know as little as possible.

┌─────────────────────────────────────┐
│  Your machine / browser              │
│   ┌──────────────┐                   │
│   │  Agent loop  │  ← your LLM key   │
│   │ (client-side)│                   │
│   └──────┬───────┘                   │
│          │ tool call (no secrets)    │
└──────────┼──────────────────────────┘
           ▼
┌─────────────────────────────────────┐
│  Stateless FastAPI backend           │
│  - rate limiting                     │
│  - real client-IP detection          │
│  - dispatches the 16 tools           │
│  - holds NO investigation state      │
└─────────────────────────────────────┘

Three deliberate choices:

BYOK. LLM inference is configured client-side, with adapters for multiple providers. Your key talks to your model — the backend never sees it.
Client-side agent loop. The reasoning ("call this, then that") happens next to you, not on a shared server.
Stateless backend. No stored query history. Restart it, scale it, throw it away — there's nothing sitting on a box you don't own.

One core, four front-ends

The same 16 tools, four ways in — use OpenOSINT however you already work:

CLI — scripting and one-off lookups
REPL — interactive, conversational investigation in your terminal
MCP server — plug it into Claude or any MCP client; the tools become available to your assistant directly
Web UI — browser front-end with the visual graph

# CLI: one-shot
openosint lookup username johndoe

# REPL: interactive
openosint repl

The Entity Correlation Graph

Raw results are noise. What turns lookups into an investigation is seeing how entities connect: this username links to that email, which resolves to that domain, hosted on that IP.

OpenOSINT renders this as an interactive Entity Correlation Graph (Cytoscape.js): nodes are entities, edges are discovered relationships. You drag, zoom, and watch the picture assemble as the agent works.

This is quietly evolving toward a proper ontology + entity-resolution layer — the idea behind platforms like Palantir Gotham, but open and self-hosted.

Quick start (2 minutes)

With Python and uv:

git clone https://github.com/OpenOSINT/OpenOSINT.git
cd openosint
uv sync
uv run openosint repl

Bring your own key:

export OPENOSINT_LLM_PROVIDER=anthropic   # or openai, etc.
export OPENOSINT_API_KEY=sk-...

Then just ask:

> what can you find about the domain example.com?

The agent picks the tools, runs them, correlates the output, and answers.

Prefer to look before installing? Live demo: demo.openosint.tech.

Drop it into Claude (or any MCP client)

Because OpenOSINT ships an MCP server, an MCP-aware assistant can call the 16 tools as part of its own reasoning — no copy-pasting between windows:

{
  "mcpServers": {
    "openosint": {
      "command": "uvx",
      "args": ["openosint", "mcp"]
    }
  }
}

Why BYOK is the whole point

"Bring your own key" reads like a cost feature. It isn't — it's a trust feature.

In OSINT, the query itself is sensitive. The username you're profiling, the domain you're digging into — that's signal about what you're working on. A hosted tool that proxies your LLM calls sees all of it.

With BYOK + a client-side loop:

Your prompts go straight from your machine to your model provider
Your API key is never transmitted to the backend
Your investigation state lives with you

Orchestration without handing over the thing you're trying to keep private.

Where it's going

A real ontology layer — typed entities and relationships, not labeled blobs
Stronger entity resolution (deduping "the same person across three handles")
More tools, same client-side, key-safe model

The MIT core stays free and open. The direction is "self-hostable Gotham," not "another locked SaaS."

One question for you: which OSINT source would you wire in first? Drop it in the comments — that's genuinely how the roadmap gets prioritized. 👇

If this is useful, a ⭐ helps it reach more people:

OpenOSINT / OpenOSINT

AI-powered OSINT agent with interactive REPL, MCP server, and CLI. 16 tools. Works with Claude, GPT-4, or local models. For authorized security research only.

mcp-name: io.github.OpenOSINT/openosint

OpenOSINT

OSINT agent for security researchers and analysts: 18 investigation tools behind a natural-language interface.

Use it as a REPL, CLI, MCP server, or browser Web UI.

The AI issues hard-stop tool calls; your code executes the real binary — hallucinated findings are structurally impossible.

Run a real OSINT investigation in your browser — bring your own Anthropic / OpenRouter / Ollama key, no signup.

OpenOSINT Web UI — live entity correlation graph demo: investigating openosint.tech

Try the live demo →

pip install openosint

Quick Start

# Interactive AI REPL (default)
openosint

# Web interface
openosint web

# Direct tool (no AI)
openosint email target@example.com

Usage

Start the REPL and investigate any target — the agent decides which tools to run and chains them on findings:

openosint > investigate target@example.com
  -> generate_dorks('target@example.com')
  -> search_email('target@example.com')
  Found: Spotify, WordPress, Gravatar, Office365

  -> search_breach('target@example.com')
  Found in 2 breaches: LinkedIn (2016), Adobe (2013)

  -> search_username('johndoe99')   <- pivoted from email findings
  Found: GitHub, Reddit,

…

View on GitHub

Built solo, in the open. Issues and PRs welcome.

DEV Community