Originally posted on the CNPJ Aberto blog (PT-BR). This post is the English version, kept in sync.
If you've ever asked Claude to look up a Brazilian company by its CNPJ (the federal tax ID equivalent to a US EIN), you know how it goes:
- It guesses, badly.
- It says "I can't access external data."
- It fires up web search and lands on a captcha-walled aggregator that returns half a page.
There's a fourth option now. You can give Claude, or any Model Context Protocol (MCP) client like Cursor, Cline, or Continue, a real tool. Plug in a free API key, and the LLM can pull a structured JSON record for any of ~67 million Brazilian companies (and ~70 million establishments), sourced from the public data dump published by Brazil's federal revenue service (Receita Federal do Brasil, RFB).
I shipped two packages for this last week: cnpjaberto on PyPI and cnpjaberto on npm. Both are dual-purpose: each is a thin HTTP SDK and an MCP server bundled in the same package, so you pick whichever stack you're already in.
What it looks like in practice
A real prompt I sent to Claude Desktop after wiring it up:
"Look up CNPJ 18.236.120/0001-58. When was it founded, what's the main CNAE, and who are the partners?"
Claude calls lookup_cnpj under the hood and answers with a clean paragraph: Nu Pagamentos S.A., founded May 2013, registered as a financial holding (CNAE 6435-2/01), partners include David Vélez, Edward Wible, Cristina Junqueira… No hallucination, no captcha, no stale Wikipedia snippet.
Or this one, which would be ~impossible without structured data:
"Find every active company where 'Maria Silva' appears as a partner, group by state, and tell me which industries dominate."
Claude chains companies_by_owner → cnae_stats. No SQL written by hand. The whole point of MCP is that the model decides when to call which tool — you just expose them.
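For the curious, "calls a tool" is less magic than it sounds. In MCP, the client sends the server a JSON-RPC 2.0 request with the method tools/call. Here is a rough sketch of what that message might look like for lookup_cnpj. The argument name cnpj is my assumption for illustration; the real input schema is whatever the server advertises when the client lists its tools.

```python
import json


def tool_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# The argument name "cnpj" is assumed here, not taken from the package's schema.
message = tool_call_message(1, "lookup_cnpj", {"cnpj": "18.236.120/0001-58"})
print(message)
```

You never write this by hand: the MCP client builds it whenever the model decides a tool is worth calling.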
Setup in 60 seconds
Two paths. Same result.
Option A — Node (zero-install via npx)
If you have Node 18+, you don't need to install anything. Drop this in ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "cnpjaberto": {
      "command": "npx",
      "args": ["-y", "cnpjaberto"],
      "env": {
        "CNPJABERTO_API_KEY": "your_key_here"
      }
    }
  }
}
npx -y cnpjaberto pulls the package on demand. Nothing global. Restart Claude and you're done.
Option B — Python
pip install "cnpjaberto[mcp]"
Then register the installed cnpjaberto-mcp command in the same config file:
{
  "mcpServers": {
    "cnpjaberto": {
      "command": "cnpjaberto-mcp",
      "env": {
        "CNPJABERTO_API_KEY": "your_key_here"
      }
    }
  }
}
The API key is free — sign up at cnpjaberto.com.br/planos, copy the key, paste it above. The free tier is 1,000 requests per day and covers every endpoint.
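One gotcha: if your claude_desktop_config.json already lists other MCP servers, pasting either snippet over the file drops them. A minimal sketch of merging the entry instead, using only the standard library. The helper names (add_mcp_server, update_config_file) are mine, not part of either package.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(current, name, entry)
    path.write_text(json.dumps(merged, indent=2), encoding="utf-8")


# Same entry as Option A; swap in the Option B command if you installed via pip.
NODE_ENTRY = {
    "command": "npx",
    "args": ["-y", "cnpjaberto"],
    "env": {"CNPJABERTO_API_KEY": "your_key_here"},
}

# In-memory demo with a made-up pre-existing server, so nothing on disk is touched.
existing = {"mcpServers": {"some-other-server": {"command": "example"}}}
print(json.dumps(add_mcp_server(existing, "cnpjaberto", NODE_ENTRY), indent=2))
```

To apply it for real, call update_config_file with the path for your OS from Option A, then restart Claude.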
What tools the LLM gets
Each tool is a thin wrapper over a documented REST endpoint. Names and contracts are the same on both Python and Node packages, so MCP clients see identical schemas regardless of which one you run.
| Tool | What it returns |
|---|---|
| lookup_cnpj | Full registry: legal name, capital, CNAE, partners, all establishments (HQ + branches), addresses, phones |
| list_filiais | Branches of a parent company, paginated, optional state filter |
| search_companies | Search by legal name, brand name, or CNPJ digits (3+ chars) |
| companies_by_owner | Companies where a person appears as partner; partial CPF disambiguates homonyms |
| companies_at_same_address | Other companies registered at a given address (postal code, street, number) |
| companies_by_contact | Companies sharing the same email or phone (DDD + number) |
| cnae_stats | Aggregated stats for a CNAE code: total, top states, top municipalities, mortality |
| panorama_overview | National view: top states, top CNAEs, capital tiers, age tiers, 10-year history |
| panorama_year | Year-by-year cut: openings, closings, monthly series, MEI share |
The whole design is "give the model boring, deterministic tools and let it compose them." Two tool calls and Claude can answer questions that previously required either a paid SaaS or a custom SQL pipeline against the 4 GB monthly RFB dump.
Use it as a plain SDK too
If you're not on the LLM hype train (or you're building a backend that uses an LLM but doesn't embed MCP), the same package is a normal HTTP client. No MCP runtime in the call path.
Python:
from cnpjaberto import Client

with Client() as cnpj:  # reads CNPJABERTO_API_KEY from env
    company = cnpj.lookup("18.236.120/0001-58")
    print(company["razao_social"])

    snap = cnpj.panorama_year(2024)
    print(f"{snap['abertas']:,} new companies in 2024")
TypeScript / JavaScript:
import { Client } from "cnpjaberto";
const cnpj = new Client(); // reads CNPJABERTO_API_KEY from env
const company = await cnpj.lookup("18.236.120/0001-58");
console.log(company.razao_social);
const snap = await cnpj.panoramaYear(2024);
console.log(`${snap.abertas} new companies in 2024`);
The two packages are mirror images (only casing differs: snake_case in Python, camelCase in TS). Same HTTP client the MCP server uses internally.
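If CNPJs come from user input, it can be worth checking them locally before spending a request. I haven't described here whether the SDK validates client-side, so treat this as an independent pre-check: a minimal sketch of the standard modulo-11 check-digit rule for the 14-digit numeric CNPJ. The helper names are hypothetical and not part of cnpjaberto.

```python
import re


def normalize_cnpj(value: str) -> str:
    """Strip punctuation, leaving only the ASCII digits."""
    return re.sub(r"[^0-9]", "", value)


def _check_digit(digits: str, weights: list[int]) -> int:
    """Weighted sum modulo 11, mapped to a single check digit."""
    total = sum(int(d) * w for d, w in zip(digits, weights))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(value: str) -> bool:
    """Validate a numeric CNPJ against its two check digits."""
    digits = normalize_cnpj(value)
    if len(digits) != 14 or len(set(digits)) == 1:
        return False  # wrong length, or a repeated-digit placeholder like 000...0
    first_weights = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
    second_weights = [6] + first_weights
    first = _check_digit(digits[:12], first_weights)
    second = _check_digit(digits[:12] + str(first), second_weights)
    return digits[12:] == f"{first}{second}"


print(is_valid_cnpj("18.236.120/0001-58"))  # True
print(is_valid_cnpj("18.236.120/0001-59"))  # False
```

A check like this only catches typos. Whether the company actually exists is still a question for lookup.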
Typed errors, because life
Python:
from cnpjaberto import Client, NotFoundError, RateLimitError, AuthError

with Client() as cnpj:
    try:
        cnpj.lookup("00000000000000")
    except NotFoundError:
        ...  # no company with that CNPJ
    except RateLimitError as e:
        print("Daily quota:", e.payload)
    except AuthError:
        ...  # missing or invalid API key
TypeScript / JavaScript:
import { Client, NotFoundError, RateLimitError, AuthError } from "cnpjaberto";

const cnpj = new Client(); // reads CNPJABERTO_API_KEY from env
try {
  await cnpj.lookup("00000000000000");
} catch (e) {
  if (e instanceof NotFoundError) { /* ... */ }
  if (e instanceof RateLimitError) console.log("daily quota:", e.payload);
  if (e instanceof AuthError) { /* ... */ }
}
Three exception types is the whole hierarchy. No custom retry logic to learn — just normal try/except (or try/catch).
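Since the free tier is capped per day, the cheapest way to stay clear of RateLimitError is to not ask the same question twice. A minimal sketch of a session-level cache that wraps any lookup callable, for example cnpj.lookup. The LookupCache class is hypothetical, not something the package ships, and the demo uses a stand-in fetch function with a made-up record so it runs without a key.

```python
from typing import Any, Callable


class LookupCache:
    """Memoize lookups by CNPJ digits so repeated questions cost one request."""

    def __init__(self, fetch: Callable[[str], Any]) -> None:
        self._fetch = fetch
        self._store: dict[str, Any] = {}
        self.fetch_calls = 0  # how many times the wrapped callable was invoked

    def get(self, cnpj: str) -> Any:
        # Key on digits only, so formatted and bare CNPJs share one entry.
        key = "".join(ch for ch in cnpj if ch.isdigit())
        if key not in self._store:
            self.fetch_calls += 1
            self._store[key] = self._fetch(key)
        return self._store[key]


def fake_fetch(digits: str) -> dict:
    # Stand-in for a real call such as cnpj.lookup; returns a made-up record.
    return {"cnpj": digits}


cache = LookupCache(fake_fetch)
cache.get("18.236.120/0001-58")
cache.get("18236120000158")
print(cache.fetch_calls)  # 1
```

The registry changes as new dumps land, so a cache like this is best kept to a single session or given an expiry.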
Why bother building this
Two reasons.
One, MCP makes "AI agents that touch real data" actually feasible. Before it, every team rolled their own function-calling glue, and every glue handled errors and tokens differently. After it, you ship one package and any compliant client picks it up: Claude Desktop today, Cursor and Cline already, OpenAI's compat layer reportedly soon, plus a long tail of open-source agent frameworks. You pay the cost once.
Two, the Brazilian company registry is uniquely valuable as MCP fodder because the underlying queries are cheap (millisecond lookups in our index) but expensive to recreate locally (the public dump is ~4 GB of CSV, with monthly drops, and joining HQ/branches/partners across the three core tables is non-trivial). It's the perfect "I'd rather call a tool than build a pipeline" surface area.
If you're working on:
- Lead generation or B2B sales tooling
- Compliance / KYC / AML automation
- Investigative journalism or OSINT
- Tax / accounting software for Brazil
- Anything that touches Brazilian invoices or vendor onboarding
…you probably want this in your agent's toolkit. Free tier covers more than enough for prototyping.
What's next
v0.1 is intentionally minimal. On the roadmap:
- Async client in Python (AsyncClient) for FastAPI / async agents. Node already returns Promises everywhere.
- Hosted MCP at mcp.cnpjaberto.com.br over HTTP+SSE: paste a URL, no local install.
- Programmatic access to Pro endpoints: list every company in a city, lead generator, full corporate ownership trees.
PRs welcome on both repos.
Links
- Python: pypi.org/project/cnpjaberto · github.com/cnpjaberto/cnpjaberto-py
- Node/TypeScript: npmjs.com/package/cnpjaberto · github.com/cnpjaberto/cnpjaberto-js
- Docs and live demo: cnpjaberto.github.io/cnpjaberto-py
- Plain REST API (no SDK, no MCP, just X-API-Key and curl): cnpjaberto.com.br/desenvolvedores
License is MIT, and the company directory itself is and stays free. If you build something with it, ping me here or on the repo issues. I read everything.