DEV Community

Pedro Parker

Posted on • Originally published at cnpjaberto.com.br

Give your AI agent access to 67M Brazilian companies with a free MCP server

Originally posted on the CNPJ Aberto blog (PT-BR). This post is the English version, kept in sync.

If you've ever asked Claude to look up a Brazilian company by its CNPJ (Brazil's federal registry number for companies, roughly the equivalent of a US EIN), you know how it goes:

  • It guesses, badly.
  • It says "I can't access external data."
  • It fires up web search and lands on a captcha-walled aggregator that returns half a page.

There's a fourth option now. You can give Claude, or any Model Context Protocol (MCP) client such as Cursor, Cline, or Continue, a real tool. Plug in a free API key, and the LLM can pull a structured JSON record for any of ~67 million Brazilian companies (and ~70 million establishments), sourced from the public data dump published by Brazil's federal revenue service (Receita Federal do Brasil, RFB).

I shipped two packages for this last week: cnpjaberto on PyPI and cnpjaberto on npm. Both are dual-purpose: each one bundles a thin HTTP SDK and an MCP server in the same package, so you can pick whichever stack you're already in.

What it looks like in practice

A real prompt I sent to Claude Desktop after wiring it up:

"Look up CNPJ 18.236.120/0001-58. When was it founded, what's the main CNAE, and who are the partners?"

Claude calls lookup_cnpj under the hood and answers with a clean paragraph: Nu Pagamentos S.A., founded May 2013, registered as a financial holding (CNAE 6435-2/01), partners including David Vélez, Edward Wible, Cristina Junqueira… No hallucination, no captcha, no stale Wikipedia snippet.

Or this one, which would be nearly impossible without structured data:

"Find every active company where 'Maria Silva' appears as a partner, group by state, and tell me which industries dominate."

Claude chains companies_by_owner and cnae_stats on its own. No hand-written SQL. The whole point of MCP is that the model decides when to call which tool; you just expose them.
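Under the hood, "calling a tool" is a JSON-RPC 2.0 request with the method tools/call, as defined by the MCP specification. The sketch below builds the message a client would send for the first prompt above. The argument name cnpj is an assumption; the real input schema is whatever the server advertises in its tools/list response.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # No indent: the default json.dumps output contains no raw newlines.
    return json.dumps(message, ensure_ascii=False)


# Hypothetical argument name; check the server's tools/list schema for the real one.
print(tools_call_request(1, "lookup_cnpj", {"cnpj": "18.236.120/0001-58"}))
```

Over the stdio transport that the desktop configs in this post use, each message travels as one newline-delimited line of JSON, which is why the sketch serializes without indentation.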

Setup in 60 seconds

Two paths. Same result.

Option A — Node (zero-install via npx)

If you have Node 18+, you don't need to install anything. Drop this in ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "cnpjaberto": {
      "command": "npx",
      "args": ["-y", "cnpjaberto"],
      "env": {
        "CNPJABERTO_API_KEY": "your_key_here"
      }
    }
  }
}

npx -y cnpjaberto fetches the package on demand, so nothing is installed globally. Restart Claude and you're done.

Option B — Python

pip install "cnpjaberto[mcp]"
Then point Claude Desktop at the installed cnpjaberto-mcp command in the same config file:
{
  "mcpServers": {
    "cnpjaberto": {
      "command": "cnpjaberto-mcp",
      "env": {
        "CNPJABERTO_API_KEY": "your_key_here"
      }
    }
  }
}

The API key is free — sign up at cnpjaberto.com.br/planos, copy the key, paste it above. The free tier is 1,000 requests per day and covers every endpoint.
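Both options edit the same claude_desktop_config.json. If that file already lists other MCP servers, pasting either snippet over it would drop them. Here is a small sketch that merges the cnpjaberto entry in instead. merge_cnpjaberto and update_config_file are hypothetical helpers, not part of either package; only the file paths and JSON keys come from the snippets above.

```python
import json
from pathlib import Path


def merge_cnpjaberto(config: dict, api_key: str) -> dict:
    """Return a copy of `config` with the cnpjaberto server added or updated."""
    merged = dict(config)                          # shallow copy: keep other top-level keys
    servers = dict(merged.get("mcpServers", {}))   # keep any other configured servers
    servers["cnpjaberto"] = {
        "command": "npx",                          # Option A; for Option B use "cnpjaberto-mcp"
        "args": ["-y", "cnpjaberto"],              # and drop "args"
        "env": {"CNPJABERTO_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, api_key: str) -> None:
    """Read the config (if it exists), merge the entry, and write it back as JSON."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_cnpjaberto(config, api_key), indent=2) + "\n", encoding="utf-8")


# Example (macOS path from this post); restart Claude Desktop afterwards:
# update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "your_key_here")
```

Keeping the merge as a pure function over a dict makes it easy to check without touching the real file.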

What tools the LLM gets

Each tool is a thin wrapper over a documented REST endpoint. Names and contracts are the same on both Python and Node packages, so MCP clients see identical schemas regardless of which one you run.

| Tool | What it returns |
|---|---|
| lookup_cnpj | Full registry record: legal name, share capital, CNAE, partners, all establishments (HQ + branches), addresses, phones |
| list_filiais | Branches (filiais) of a parent company, paginated, with an optional state filter |
| search_companies | Search by legal name, brand name, or CNPJ digits (3+ characters) |
| companies_by_owner | Companies where a person appears as a partner; a partial CPF (individual taxpayer ID) disambiguates people with the same name |
| companies_at_same_address | Other companies registered at a given address (postal code, street, number) |
| companies_by_contact | Companies sharing the same email or phone (DDD area code + number) |
| cnae_stats | Aggregated stats for a CNAE code: total, top states, top municipalities, mortality |
| panorama_overview | National view: top states, top CNAEs, capital tiers, age tiers, 10-year history |
| panorama_year | Year-by-year cut: openings, closings, monthly series, MEI (individual microentrepreneur) share |

The whole design is "give the model boring, deterministic tools and let it compose them." With just two tool calls, Claude can answer questions that previously required either a paid SaaS or a custom SQL pipeline against the ~4 GB monthly RFB dump.
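To make "compose" concrete: for the Maria Silva prompt, once companies_by_owner has returned the matching companies, grouping them by state and industry is a small aggregation. If you did that step in your own backend with the SDK instead of leaving it to the model, it could look like the sketch below. The records are made up, and the field names uf, cnae_principal, and situacao are assumptions, not the API's documented response shape.

```python
from collections import Counter, defaultdict


def summarize_by_state(companies: list[dict]) -> dict[str, Counter]:
    """Count main CNAE codes per state, considering only active companies."""
    by_state: dict[str, Counter] = defaultdict(Counter)
    for company in companies:
        if company.get("situacao") != "ATIVA":  # hypothetical status field and value
            continue
        by_state[company["uf"]][company["cnae_principal"]] += 1
    return dict(by_state)


# Made-up records standing in for a companies_by_owner response.
sample = [
    {"uf": "SP", "cnae_principal": "4781-4/00", "situacao": "ATIVA"},
    {"uf": "SP", "cnae_principal": "4781-4/00", "situacao": "ATIVA"},
    {"uf": "SP", "cnae_principal": "5611-2/01", "situacao": "ATIVA"},
    {"uf": "RJ", "cnae_principal": "5611-2/01", "situacao": "BAIXADA"},
    {"uf": "MG", "cnae_principal": "8599-6/04", "situacao": "ATIVA"},
]

for uf, counts in summarize_by_state(sample).items():
    print(uf, counts.most_common(1))  # dominant industry per state
```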

Use it as a plain SDK too

If you're not on the LLM hype train (or you're building a backend that uses an LLM but doesn't embed MCP), the same package is a normal HTTP client. No MCP runtime in the call path.

Python:

from cnpjaberto import Client

with Client() as cnpj:                       # reads CNPJABERTO_API_KEY from env
    company = cnpj.lookup("18.236.120/0001-58")
    print(company["razao_social"])

    snap = cnpj.panorama_year(2024)
    print(f"{snap['abertas']:,} new companies in 2024")

TypeScript / JavaScript:

import { Client } from "cnpjaberto";

const cnpj = new Client();                   // reads CNPJABERTO_API_KEY from env

const company = await cnpj.lookup("18.236.120/0001-58");
console.log(company.razao_social);

const snap = await cnpj.panoramaYear(2024);
console.log(`${snap.abertas} new companies in 2024`);

The two packages are mirror images (only the casing differs: snake_case in Python, camelCase in TS), and this is the same HTTP client the MCP server uses internally.
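Both the SDK and the MCP tools accept the formatted CNPJ shown above. If input comes from a form or a spreadsheet, it can be worth rejecting typos locally before they spend a request from the daily quota. A minimal sketch of the standard mod-11 check-digit rule, covering only the all-digit CNPJ format; normalize_cnpj is a hypothetical helper, not part of the cnpjaberto packages. It also rejects repeated-digit strings such as 00000000000000, which pass the arithmetic but are conventionally treated as invalid.

```python
import re

# Weights for the first and second CNPJ check digits.
_W1 = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
_W2 = [6] + _W1


def _check_digit(digits: list[int], weights: list[int]) -> int:
    """Mod-11 rule: remainder < 2 gives 0, otherwise 11 - remainder."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def normalize_cnpj(raw: str) -> str:
    """Strip punctuation, verify both check digits, return the 14 bare digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 14 or len(set(digits)) == 1:
        raise ValueError(f"not a valid CNPJ: {raw!r}")
    nums = [int(c) for c in digits]
    if nums[12] != _check_digit(nums[:12], _W1) or nums[13] != _check_digit(nums[:13], _W2):
        raise ValueError(f"bad check digits: {raw!r}")
    return digits
```

Calling normalize_cnpj("18.236.120/0001-58") before cnpj.lookup(...) costs nothing, while a mistyped digit fails locally instead of reaching the API.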

Typed errors, because life

from cnpjaberto import Client, NotFoundError, RateLimitError, AuthError

with Client() as cnpj:
    try:
        cnpj.lookup("00000000000000")
    except NotFoundError:
        ...
    except RateLimitError as e:
        print("Daily quota:", e.payload)
    except AuthError:
        ...
The TypeScript equivalent:
import { Client, NotFoundError, RateLimitError, AuthError } from "cnpjaberto";

const cnpj = new Client();                   // reads CNPJABERTO_API_KEY from env

try {
  await cnpj.lookup("00000000000000");
} catch (e) {
  if (e instanceof NotFoundError) { /* ... */ }
  else if (e instanceof RateLimitError) console.log("daily quota:", e.payload);
  else if (e instanceof AuthError) { /* ... */ }
  else throw e;                              // don't swallow unexpected errors
}

Those three exception types make up the whole hierarchy. There's no custom retry logic to learn, just normal try/except (or try/catch).
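If you want retries for transient network failures, you can layer them on with the same plain try/except. A minimal sketch, assuming nothing about the SDK's internals: with_retries is a hypothetical helper, and because this post doesn't say which exception types the SDK raises on network failures, the retry_on default (built-in ConnectionError and TimeoutError) is only a placeholder to replace with the real types. It deliberately doesn't retry NotFoundError or AuthError, which won't change on a second attempt, nor RateLimitError, which in the example above reports a daily quota that won't clear in a few seconds.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run `call`, retrying only on `retry_on` errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            # base_delay * 1, 2, 4, ... plus up to 100 ms of random jitter
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise ValueError("attempts must be >= 1")


# Usage with the SDK client from above (any other exception propagates immediately):
# company = with_retries(lambda: cnpj.lookup("18.236.120/0001-58"))
```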

Why bother building this

Two reasons.

One, MCP makes "AI agents that touch real data" actually feasible. Before it, every team rolled its own function-calling glue, and every integration handled errors and tokens differently. Now you ship one package and any compliant client picks it up: Claude Desktop, Cursor, and Cline today, reportedly OpenAI's compatibility layer soon, plus a long tail of open-source agent frameworks. You pay the cost once.

Two, the Brazilian company registry is uniquely valuable as MCP fodder because the underlying queries are cheap (millisecond lookups in our index) but expensive to recreate locally (the public dump is ~4 GB of CSV, with monthly drops, and joining HQ/branches/partners across the three core tables is non-trivial). It's the perfect "I'd rather call a tool than build a pipeline" surface area.
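To make the "non-trivial join" concrete: in the dump, a company's headquarters and branches share the eight-digit CNPJ root (cnpj_basico), and that root is also the key linking companies, establishments, and partners. Below is a minimal sqlite3 sketch against a drastically simplified, hypothetical version of those three tables; the real files have many more columns and their own layout, and need loading and cleaning first. The HQ/branch flag is modeled here as matriz_filial (1 for HQ).

```python
import sqlite3

# Simplified, hypothetical schema; the real dump has many more fields per table.
SCHEMA = """
CREATE TABLE empresas (cnpj_basico TEXT PRIMARY KEY, razao_social TEXT);
CREATE TABLE estabelecimentos (cnpj_basico TEXT, cnpj_ordem TEXT, cnpj_dv TEXT,
                               matriz_filial INTEGER, uf TEXT);
CREATE TABLE socios (cnpj_basico TEXT, nome_socio TEXT);
"""


def company_profile(conn: sqlite3.Connection, cnpj_basico: str) -> dict:
    """Assemble one company: legal name, establishments (HQ first), partners."""
    row = conn.execute(
        "SELECT razao_social FROM empresas WHERE cnpj_basico = ?", (cnpj_basico,)
    ).fetchone()
    if row is None:
        raise KeyError(cnpj_basico)
    # Establishments and partners are queried separately: joining both tables
    # in one SELECT would return (establishments x partners) rows.
    establishments = conn.execute(
        """SELECT cnpj_basico || cnpj_ordem || cnpj_dv, matriz_filial = 1, uf
           FROM estabelecimentos WHERE cnpj_basico = ?
           ORDER BY matriz_filial, cnpj_ordem""",
        (cnpj_basico,),
    ).fetchall()
    partners = [name for (name,) in conn.execute(
        "SELECT nome_socio FROM socios WHERE cnpj_basico = ? ORDER BY nome_socio",
        (cnpj_basico,),
    )]
    return {
        "razao_social": row[0],
        "establishments": [
            {"cnpj": cnpj, "is_hq": bool(is_hq), "uf": uf} for cnpj, is_hq, uf in establishments
        ],
        "partners": partners,
    }


# Made-up rows (not real CNPJs) to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO empresas VALUES ('12345678', 'EXEMPLO LTDA')")
conn.executemany("INSERT INTO estabelecimentos VALUES (?, ?, ?, ?, ?)", [
    ("12345678", "0002", "00", 2, "RJ"),
    ("12345678", "0001", "00", 1, "SP"),
])
conn.executemany("INSERT INTO socios VALUES (?, ?)", [("12345678", "ANA"), ("12345678", "BRUNO")])
print(company_profile(conn, "12345678"))
```

Multiply this by roughly 70 million establishments and monthly reloads, and the appeal of a single lookup_cnpj call is clear.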

If you're working on:

  • Lead generation or B2B sales tooling
  • Compliance / KYC / AML automation
  • Investigative journalism or OSINT
  • Tax / accounting software for Brazil
  • Anything that touches Brazilian invoices or vendor onboarding

…you probably want this in your agent's toolkit. The free tier is more than enough for prototyping.

What's next

v0.1 is intentionally minimal. On the roadmap:

  • Async client in Python (AsyncClient) for FastAPI / async agents. Node already returns Promises everywhere.
  • Hosted MCP at mcp.cnpjaberto.com.br over HTTP+SSE — paste a URL, no local install.
  • Programmatic access to Pro endpoints: list every company in a city, lead generator, full corporate ownership trees.

PRs welcome on both repos.


The license is MIT, and the company directory itself is and will stay free. If you build something with it, ping me here or on the repo issues. I read everything.
