How to Get Canadian Contractor License Data by API (2026)

Quebec publishes its entire registry of active construction contractors as open data. Every licensed contractor in the province — roofers, electricians, general builders, excavators — sits in one file, updated daily, free to use under a Creative Commons license.

That sounds great until you download it. It's a ~10.8 MB zip. Inside is a CSV with 924,000 rows and French column headers. One contractor appears across many rows (one row per license category), so the raw file is not a list of contractors — it's a list of contractor-category pairs you have to group yourself. If you just want a clean, queryable list of "roofing contractors in Quebec with a phone number and email," the open data is technically available but practically painful.

This post shows how to skip the pain and pull that data through an API — either from your own code or directly from an AI agent over MCP. We'll use a small hosted scraper on Apify that turns the raw RBQ dump into structured, deduplicated records.

The source: RBQ's active license list

The data comes from the Régie du bâtiment du Québec (RBQ) — the provincial body that licenses construction contractors. Their "Liste des licences actives" dataset lives on the Données Québec open-data portal, is refreshed daily, and is published under CC-BY 4.0 (free to use with attribution). This is official government open data, not a scrape of a private portal.

If you want the DIY route, it's a few lines of Python:

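import io
import zipfile

import pandas as pd
import requests

# Resource URL from the Données Québec dataset page
URL = "https://www.donneesquebec.ca/.../licences-rbq.zip"

resp = requests.get(URL, timeout=120)
resp.raise_for_status()  # fail loudly instead of trying to unzip an error page

# The zip holds the CSV: semicolon-delimited, latin-1 encoded
with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
    csv_name = zf.namelist()[0]
    with zf.open(csv_name) as f:
        df = pd.read_csv(f, sep=";", encoding="latin-1")

print(df.shape)  # ~(924000, 25)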

Then you still have to: normalize French/accented column names, decode the license-category codes, group ~924k rows down to ~54k unique contractors, and re-run the whole thing every day to stay current. Doable, but it's a maintenance job, not a one-liner.
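To make the grouping step concrete, here is a minimal sketch of collapsing contractor-category rows into one record per license. It uses only the standard library and a three-row inline sample so it runs without the real download. The column names (Numero de licence, Nom, Categorie) and the category values A, B, C are placeholders I made up for illustration, not the real RBQ headers or codes; check the actual file before adapting it.

```python
import csv
import io

# Hypothetical column names: the real RBQ headers differ and must be checked.
LICENCE_COL = "Numero de licence"
NAME_COL = "Nom"
CATEGORY_COL = "Categorie"

# Made-up sample in the same shape as the raw file: one row per (contractor, category).
SAMPLE = """Numero de licence;Nom;Categorie
1111-1111-01;Toitures Exemple inc.;A
1111-1111-01;Toitures Exemple inc.;B
2222-2222-01;Excavation Exemple;C
"""


def group_by_licence(rows):
    """Collapse contractor-category rows into one dict per licence number."""
    grouped = {}
    for row in rows:
        number = row[LICENCE_COL].strip()
        entry = grouped.setdefault(
            number,
            {
                "licenceNumber": number,
                "companyName": row[NAME_COL].strip(),
                "categories": [],
            },
        )
        category = row[CATEGORY_COL].strip()
        # Skip blanks and duplicates so each category appears once per licence.
        if category and category not in entry["categories"]:
            entry["categories"].append(category)
    return list(grouped.values())


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
contractors = group_by_licence(reader)
for c in contractors:
    print(c["licenceNumber"], c["companyName"], c["categories"])
```

On the real file the same loop would take the ~924k rows down to the ~54k contractors mentioned above, but you would still need to decode the category codes yourself.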

The API route

Instead, we call a hosted Actor — truenorthdata/canada-contractor-licenses — that does the download, normalization, category decoding and grouping, and returns one clean record per license. You call it like any REST API.

The most convenient endpoint is Apify's run-sync-get-dataset-items: it starts the Actor, waits for it to finish, and returns the dataset items in one HTTP response. (Synchronous runs are capped at 300 seconds, so keep the result set filtered.)

Python

import requests

APIFY_TOKEN = "YOUR_APIFY_TOKEN"
ACTOR = "truenorthdata~canada-contractor-licenses"

url = f"https://api.apify.com/v2/acts/{ACTOR}/run-sync-get-dataset-items"

payload = {
    "keywords": ["toiture"],   # "toiture" = roofing
    "maxResults": 100,
}

resp = requests.post(
    url,
    params={"token": APIFY_TOKEN},
    json=payload,
    timeout=300,
)
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body as records
records = resp.json()

for r in records[:5]:
    print(r["licenceNumber"], r["companyName"], r.get("email"), r.get("phone"))

Note the tilde in the Actor ID (truenorthdata~canada-contractor-licenses) — the API uses ~ instead of / to separate the username and Actor name.

curl

Same call from the shell:

curl -X POST \
  "https://api.apify.com/v2/acts/truenorthdata~canada-contractor-licenses/run-sync-get-dataset-items?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"keywords": ["toiture"], "maxResults": 100}'

You get back a JSON array of contractor records. No zip handling, no latin-1 decoding, no grouping — the Actor already did it.
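Contact fields are only present where the RBQ publishes them, so for the "roofers with a phone number and email" case you will likely want a client-side filter. A minimal sketch, assuming missing values come back as absent keys, None, or empty strings (I have not confirmed which of these the Actor uses); the sample records are made up:

```python
def has_contact(record):
    """True when the record carries both a non-empty email and a non-empty phone."""
    email = (record.get("email") or "").strip()
    phone = (record.get("phone") or "").strip()
    return bool(email) and bool(phone)


def with_contact(records):
    """Keep only the records that can be reached by both email and phone."""
    return [r for r in records if has_contact(r)]


# Made-up records in the shape the post describes.
sample = [
    {"licenceNumber": "0000-0000-01", "email": "info@example.com", "phone": "555-0100"},
    {"licenceNumber": "0000-0000-02", "email": None, "phone": "555-0101"},
    {"licenceNumber": "0000-0000-03", "email": "  ", "phone": ""},
    {"licenceNumber": "0000-0000-04"},
]

reachable = with_contact(sample)
print([r["licenceNumber"] for r in reachable])
```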

The MCP route: call it straight from an AI agent

If you're building with LLMs, you don't even need the REST call. Apify exposes Actors over the Model Context Protocol (MCP), so an AI agent can invoke this scraper as a tool and reason over the results in the same turn.

Point your MCP client at:

https://mcp.apify.com?tools=truenorthdata/canada-contractor-licenses

In a Claude Desktop / Claude Code style config that looks roughly like this (the exact keys vary by client, so check your client's MCP documentation):

{
  "mcpServers": {
    "contractor-licenses": {
      "url": "https://mcp.apify.com?tools=truenorthdata/canada-contractor-licenses"
    }
  }
}

Once it's wired up, you can ask the agent in plain language — "Find active roofing contractors in Quebec that have both an email and a phone number" — and it will call the Actor, get structured records, and filter them for you. That's the real payoff of the API-first approach: the same data source works for a nightly Python job and for an interactive agent, with no glue code in between.

Verifying one contractor

The other common job isn't lead-gen, it's verification — someone hands you an RBQ number and you want to confirm it's active and see what it covers. Same endpoint, just search by the number:

payload = {"keywords": ["5678-1234-01"], "maxResults": 1}
resp = requests.post(url, params={"token": APIFY_TOKEN}, json=payload, timeout=60)
resp.raise_for_status()  # surface HTTP errors instead of treating them as "no match"
match = resp.json()

if match:
    c = match[0]
    print("Active:", c["companyName"])
    print("Categories:", ", ".join(c["categories"]))
    print("Restrictions:", c.get("restrictions") or "none")
    print("Bond claims:", c.get("bondClaims", 0))
else:
    print("No active license found for that number.")

Because the underlying dataset only lists active licenses, an empty result is itself a signal: a correctly typed number that returns nothing is not currently an active license.
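Since the lookup goes through a keyword search, it may be worth confirming that the record you got back really carries the number you asked about before reporting it as verified. A small sketch under that assumption; normalize_licence and find_exact are hypothetical helper names, and comparing digits only is my own choice for tolerating different separators:

```python
import re


def normalize_licence(number):
    """Keep digits only, so '5678-1234-01' and '5678 1234 01' compare equal."""
    return re.sub(r"\D", "", str(number))


def find_exact(records, number):
    """Return the record whose licenceNumber matches exactly, or None."""
    wanted = normalize_licence(number)
    for record in records:
        if normalize_licence(record.get("licenceNumber", "")) == wanted:
            return record
    return None


# Made-up search results: one near miss and one exact hit.
results = [
    {"licenceNumber": "5678-1234-02", "companyName": "Near Miss inc."},
    {"licenceNumber": "5678-1234-01", "companyName": "Exact Match inc."},
]

hit = find_exact(results, "5678 1234 01")
print(hit["companyName"] if hit else "No active license found for that number.")
```

With this in place you would ask for a few results rather than exactly one, then treat find_exact returning None the same way as an empty response.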

What a record looks like

Each item is one active license, with the RBQ's fields mapped to clean keys (24 of 25 source fields are carried through). The useful ones for lead-gen and verification:

licenceNumber — the RBQ license number (the primary key contractors quote)
companyName — legal / operating name
categories — decoded license categories (what they're actually allowed to build)
email and phone — contact info, where the RBQ publishes it
address, city, postal code
bondClaims — number of bond claims against the contractor (a rough risk signal)
restrictions — any restrictions on the license

Because records are already grouped by license number, one contractor is one record with all their categories in an array — not scattered across dozens of rows.
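If you would rather work with typed objects than raw dicts, a thin wrapper over the keys listed above is enough. This is a sketch, not the Actor's official schema: it covers only the keys this post names, leaves out the address fields because their exact key names are not given here, and treats restrictions as an opaque value since its type is not specified.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ContractorRecord:
    """One active licence, limited to the keys named in this post."""

    licence_number: str
    company_name: str
    categories: list = field(default_factory=list)
    email: Optional[str] = None
    phone: Optional[str] = None
    bond_claims: int = 0
    restrictions: Any = None  # type not documented here, so kept as-is

    @classmethod
    def from_item(cls, item):
        """Build a record from one dataset item, tolerating missing optional keys."""
        return cls(
            licence_number=item["licenceNumber"],
            company_name=item["companyName"],
            categories=list(item.get("categories") or []),
            email=item.get("email") or None,
            phone=item.get("phone") or None,
            bond_claims=int(item.get("bondClaims") or 0),
            restrictions=item.get("restrictions"),
        )


# Made-up item in the shape described above.
item = {
    "licenceNumber": "0000-0000-01",
    "companyName": "Toitures Exemple inc.",
    "categories": ["Example category"],
    "phone": "555-0100",
}
record = ContractorRecord.from_item(item)
print(record.licence_number, record.categories, record.email, record.bond_claims)
```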

Cost

The Actor is monetized pay-per-event: a small start fee (about $0.005 per run) plus $6 per 1,000 contractor-license records returned. You pay for what you pull, and a filtered query (say, roofers in one region) costs cents. There's no subscription and no infrastructure to run — the Actor and its daily refresh live on Apify.
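To budget a pull before running it, the pricing above reduces to one line of arithmetic. A rough estimator using the figures quoted in this post (the start fee is stated as approximate, so treat the result as an estimate):

```python
START_FEE_USD = 0.005          # approximate per-run start fee quoted above
PER_1000_RECORDS_USD = 6.00    # price per 1,000 records returned


def estimate_cost(records, runs=1):
    """Estimated cost in USD for `records` results spread over `runs` Actor runs."""
    return runs * START_FEE_USD + records / 1000 * PER_1000_RECORDS_USD


for n in (100, 1_000, 54_000):
    print(f"{n:>6} records: ${estimate_cost(n):.3f}")
```

So the 100-record roofing query from earlier works out to roughly $0.005 + $0.60, while pulling all ~54k contractors in one run would be in the low hundreds of dollars.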

Because the RBQ source is refreshed daily, so is the Actor's view of it. If you're maintaining your own database, the cheapest pattern is to run a filtered query on a schedule (nightly, weekly) and upsert by licenceNumber — you get new and updated contractors without re-pulling the whole registry every time.
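The upsert-by-licenceNumber pattern can be sketched with SQLite from the standard library. The table layout here is my own assumption (the full record stored as JSON next to a couple of indexed columns), and the ON CONFLICT clause needs SQLite 3.24 or newer:

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a local store keyed by licence number."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS contractors (
               licence_number TEXT PRIMARY KEY,
               company_name   TEXT,
               payload        TEXT NOT NULL,
               last_seen      TEXT NOT NULL
           )"""
    )
    return conn


def upsert(conn, records, seen_on):
    """Insert new contractors and overwrite existing ones with the latest data."""
    rows = [
        (r["licenceNumber"], r.get("companyName"), json.dumps(r, ensure_ascii=False), seen_on)
        for r in records
    ]
    conn.executemany(
        """INSERT INTO contractors (licence_number, company_name, payload, last_seen)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(licence_number) DO UPDATE SET
               company_name = excluded.company_name,
               payload      = excluded.payload,
               last_seen    = excluded.last_seen""",
        rows,
    )
    conn.commit()


# Two made-up nightly pulls: the second one renames the same contractor.
db = open_db()
upsert(db, [{"licenceNumber": "0000-0000-01", "companyName": "Old Name inc."}], "2026-01-01")
upsert(db, [{"licenceNumber": "0000-0000-01", "companyName": "New Name inc."}], "2026-01-02")
print(db.execute("SELECT licence_number, company_name, last_seen FROM contractors").fetchall())
```

Keeping a last_seen date also gives you a cheap way to spot licenses that have stopped appearing in the active list: they are the rows whose date falls behind the latest run.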

When to use which route

REST API (run-sync-get-dataset-items) — best for scripts, cron jobs, and pipelines. Filter with keywords / maxResults so runs finish inside the 300-second sync window; for very large pulls, start an async run and read the dataset afterward.
MCP — best when an AI agent should fetch and reason over the data itself, no wrapper code.
Raw open data — best if you genuinely want all 924k rows and are happy owning the normalization and daily refresh yourself. The source is free and CC-BY; the Actor just saves you the plumbing.
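For the large-pull case in the REST option above, the async flow is: start a run, poll until it reaches a terminal status, then read the run's default dataset. Below is a sketch of that flow against Apify's general API (POST /v2/acts/{actorId}/runs, GET /v2/actor-runs/{runId}, GET /v2/datasets/{datasetId}/items). It uses urllib so it has no third-party dependencies; call_json and run_async are hypothetical helper names, the 5-second poll interval is arbitrary, and I have not tested it against this specific Actor.

```python
import json
import time
import urllib.request

API = "https://api.apify.com/v2"
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def call_json(method, url, token, body=None):
    """Send one request to the Apify API and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def run_async(actor_id, payload, token, call=call_json, poll_seconds=5):
    """Start an Actor run, wait for it to finish, and return its dataset items."""
    run = call("POST", f"{API}/acts/{actor_id}/runs", token, payload)["data"]
    while run["status"] not in TERMINAL:
        time.sleep(poll_seconds)
        run = call("GET", f"{API}/actor-runs/{run['id']}", token)["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run {run['id']} ended with status {run['status']}")
    dataset_id = run["defaultDatasetId"]
    return call("GET", f"{API}/datasets/{dataset_id}/items?format=json", token)
```

Usage would look like run_async("truenorthdata~canada-contractor-licenses", {"keywords": ["toiture"], "maxResults": 5000}, APIFY_TOKEN). The call parameter exists so the polling logic can be exercised with a fake in tests, without touching the network.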

Try it

The scraper is live on the Apify Store as canada-contractor-licenses (truenorthdata/canada-contractor-licenses). Grab an Apify token, drop in the Python snippet above, and you'll have clean Quebec contractor data in one call.

Next on the roadmap: adding Ontario's HCRA builder directory so the same Actor covers licensed contractors across two provinces. If that'd be useful to you, let me know in the comments.

Data source: Régie du bâtiment du Québec — Liste des licences actives, via Données Québec, licensed CC-BY 4.0.