France has one of the most comprehensive business registries in Europe. The SIRENE database, maintained by INSEE (the national statistics institute), contains records for over 30 million establishments, of which more than 12 million are currently active. And since 2017, this data has been freely available as open data.
What caught my attention was a specific subset: the artisans. Plumbers, electricians, bakers, carpenters, stonemasons — the backbone of local economies across France. Could you build a genuinely useful local directory of craftspeople using nothing but open data? That question led to mon-site-artisan.net, and the answer turned out to be: yes, but with caveats worth sharing.
The SIRENE Database: A Gold Mine with Rough Edges
SIRENE (Systeme Informatise du Repertoire National des Entreprises et des Etablissements) assigns every legal entity in France a unique 9-digit SIREN number and every establishment of that entity a 14-digit SIRET number (the SIREN followed by a 5-digit establishment suffix). The public dataset includes:
- Business name and legal form
- Address (street, postal code, commune)
- NAF code (activity classification, similar to NAICS/NACE)
- Creation date
- Workforce category
- Whether the business is registered with the Chambre de Metiers (artisan registry)
The last field is key. France maintains a separate Repertoire des Metiers for artisans — businesses that perform primarily manual work requiring specific qualifications. Registration with the Chambre de Metiers et de l'Artisanat (CMA) is mandatory for these businesses, and this flag appears in SIRENE data.
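Because every record hangs off these identifiers, it helps to validate them before doing anything else. The sketch below splits a SIRET into its SIREN and 5-digit suffix (usually called the NIC) and applies the Luhn checksum that SIREN and SIRET numbers normally satisfy. The function names are ours, and as far as we know a handful of special cases (La Poste establishments are the usual example) do not follow the plain Luhn rule, so treat a failure as a warning rather than proof of a bad record.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def split_siret(siret: str) -> tuple[str, str]:
    """Split a 14-digit SIRET into (SIREN, establishment suffix)."""
    if len(siret) != 14 or not siret.isdigit():
        raise ValueError("a SIRET is exactly 14 digits")
    return siret[:9], siret[9:]
```

Grouping rows by the first element of `split_siret` is a cheap way to see every establishment that belongs to the same legal entity.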
Identifying Artisans in the Data
Filtering for artisans involves two complementary approaches:
1. The Artisan Registration Flag
SIRENE includes a field indicating registration with the Repertoire des Metiers. This is the most reliable indicator, but not all artisan-type businesses are flagged — some register only with the Registre du Commerce et des Societes (RCS) if they also have commercial activity.
2. NAF Code Filtering
The NAF (Nomenclature d'Activites Francaise) classification maps well to artisan trades:
| NAF Code | Activity |
|---|---|
| 43.21A | Travaux d'installation electrique |
| 43.22A | Travaux d'installation d'eau et de gaz |
| 43.91A | Travaux de charpente |
| 43.32A | Travaux de menuiserie bois et PVC |
| 10.71C | Boulangerie et boulangerie-patisserie |
| 96.02A | Coiffure |
| 45.20A | Entretien et reparation de vehicules automobiles |
Combining both approaches gives you a reasonably complete picture, though you will always have edge cases — the plumber who incorporated as an SAS and somehow slipped through the artisan registry.
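As a rough illustration of combining the two signals, here is a minimal predicate over one row of the establishment file, represented as a dict. The column names are the ones we understand the bulk export to use, but treat them as assumptions and check them against the current file description. The NAF set is only a subset of the table above.

```python
# Assumed column names from the SIRENE establishment export.
NAF_FIELD = "activitePrincipaleEtablissement"
RM_FIELD = "activitePrincipaleRegistreMetiersEtablissement"

# Illustrative subset of artisan NAF codes; extend to suit your directory.
ARTISAN_NAF_CODES = {"43.21A", "43.22A", "43.91A", "43.32A", "96.02A", "45.20A"}


def is_artisan(row: dict[str, str]) -> bool:
    """True if the row carries a trades-register code or an artisan NAF code."""
    flagged = bool(row.get(RM_FIELD, "").strip())
    by_naf = row.get(NAF_FIELD, "") in ARTISAN_NAF_CODES
    return flagged or by_naf
```

Using `or` favors recall: a business matched by only one signal is kept, and the edge cases can be reviewed later instead of being silently dropped.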
Accessing the Data
INSEE provides SIRENE data through two channels:
Bulk download — Monthly CSV exports of the entire database. We are talking about 12+ million active establishments. The files are large (several GB compressed) but manageable.
API Sirene — A REST API for real-time queries. Rate-limited but useful for targeted lookups.
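```python
import os

import requests

# Assumes an API token is stored in the INSEE_API_TOKEN environment variable.
token = os.environ["INSEE_API_TOKEN"]

# Search for electricians in postal codes starting with 13. Marseille is
# 13001-13016, but 13* matches the whole Bouches-du-Rhone department.
# Historized fields such as the activity code are queried inside periode(...).
response = requests.get(
    "https://api.insee.fr/entreprises/sirene/V3.11/siret",
    headers={"Authorization": f"Bearer {token}"},
    params={
        "q": "periode(activitePrincipaleEtablissement:43.21A) AND codePostalEtablissement:13*",
        "nombre": 100,
    },
    timeout=30,
)
response.raise_for_status()
```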
For building a directory, the bulk approach makes more sense. You download the full dataset, filter for artisan trades, and build your local database. Then use the API for incremental updates.
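A minimal sketch of the bulk step, assuming the export is a CSV with the column names used below (verify them against the current file description) and that SQLite is good enough as the local database. The table layout and function name are ours. The file is streamed row by row, so the multi-gigabyte export never has to fit in memory.

```python
import csv
import sqlite3
from typing import IO

# Illustrative subset; pass your own set of NAF codes in practice.
WANTED_NAF = {"43.21A", "43.22A"}


def load_artisans(csv_file: IO[str], db: sqlite3.Connection,
                  naf_codes: set[str] = WANTED_NAF) -> int:
    """Stream the export into SQLite, keeping active rows with a wanted NAF code."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS establishment ("
        "siret TEXT PRIMARY KEY, naf TEXT, postal_code TEXT)"
    )
    rows = (
        (r["siret"], r["activitePrincipaleEtablissement"], r["codePostalEtablissement"])
        for r in csv.DictReader(csv_file)
        # "A" marks an administratively active establishment.
        if r["etatAdministratifEtablissement"] == "A"
        and r["activitePrincipaleEtablissement"] in naf_codes
    )
    with db:  # commit once at the end, roll back on error
        db.executemany("INSERT OR REPLACE INTO establishment VALUES (?, ?, ?)", rows)
    return db.execute("SELECT COUNT(*) FROM establishment").fetchone()[0]
```

Because the insert is keyed on `siret` with `INSERT OR REPLACE`, re-running the load on a newer monthly file updates existing rows instead of duplicating them.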
The Enrichment Challenge
Raw SIRENE data tells you what a business does and where it is. It does not tell you:
- Whether they are still actively working. A business can be registered but dormant.
- Their actual service area. A plumber registered in Aix-en-Provence might serve the entire department.
- Contact details. SIRENE does not include phone numbers, email, or websites.
- Reviews or reputation. No quality signal whatsoever.
This is where building a useful directory diverges from just repackaging open data. We addressed these gaps through several strategies:
Web presence detection. We run automated searches to find business websites, Google Business profiles, and social media pages. A business with an active web presence is almost certainly still operating.
Geographic service areas. For rural artisans, we estimate service areas based on local population density. A roofer in a small village likely serves a 30-40 km radius. One in central Lyon might cover just the city.
Cross-referencing with CMA data. Some Chambres de Metiers publish their own directories with additional information. Where available, we use these to enrich our records.
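To make the service-area idea concrete, here is a simplified sketch. The density thresholds and radii are hypothetical placeholders chosen to echo the 30-40 km rural figure above, not the values used on the site. Distance is the great-circle (haversine) distance between two latitude/longitude points.

```python
import math

EARTH_RADIUS_KM = 6371.0


def service_radius_km(density_per_km2: float) -> float:
    """Hypothetical heuristic: the sparser the area, the wider the radius."""
    if density_per_km2 < 50:
        return 40.0
    if density_per_km2 < 500:
        return 20.0
    return 10.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def serves(artisan: tuple[float, float], customer: tuple[float, float],
           density_per_km2: float) -> bool:
    """True if the customer lies within the artisan's estimated radius."""
    return haversine_km(*artisan, *customer) <= service_radius_km(density_per_km2)
```

The straight-line distance ignores roads and mountains, which matters in places like the Alps, so a travel-time estimate would be a natural refinement.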
Building the Search Experience
The core user journey is simple: someone needs a plumber in Gap, or a baker near Manosque, or an electrician in Aix-en-Provence. The search needs to be:
Trade-aware. Users think in terms of "plumber" or "electrician", not NAF codes. We built a mapping layer that translates common trade names (including regional variations and slang) to the appropriate activity codes.
Geographically smart. Searching for artisans "near Avignon" should include nearby communes like Villeneuve-les-Avignon (technically in a different department) and Le Pontet.
Transparent about data freshness. We display when each record was last verified and flag businesses that may have ceased activity.
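The trade-aware mapping can be as simple as a normalized lookup table. The sketch below is illustrative only: the entries are a tiny hypothetical sample, not our production mapping, and the normalization just lowercases and strips accents so that "Électricien" and "electricien" land on the same key.

```python
import unicodedata

# Hypothetical sample: normalized trade name -> NAF codes.
TRADE_TO_NAF = {
    "plombier": ["43.22A"],
    "plumber": ["43.22A"],
    "electricien": ["43.21A"],
    "electrician": ["43.21A"],
    "charpentier": ["43.91A"],
    "menuisier": ["43.32A"],
    "coiffeur": ["96.02A"],
    "garagiste": ["45.20A"],
}


def normalize(term: str) -> str:
    """Lowercase, trim, and remove accents from a search term."""
    decomposed = unicodedata.normalize("NFKD", term.strip().lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def naf_codes_for(term: str) -> list[str]:
    """Return the NAF codes for a trade name, or an empty list if unknown."""
    return list(TRADE_TO_NAF.get(normalize(term), []))
```

Mapping to a list rather than a single code leaves room for trades that span several activity codes, and regional variants become extra keys pointing at the same list.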
What We Learned About French Artisan Data
The artisan economy is hyperlocal. 85% of artisan businesses have fewer than 5 employees. They serve their immediate area and rely heavily on word-of-mouth. A directory adds value precisely because it makes these small, local businesses discoverable to people outside their usual network.
Data quality varies by region. Some departments have well-maintained CMA databases with rich information. Others have minimal data beyond what SIRENE provides. Rural areas tend to have less digital presence, making enrichment harder.
Seasonal patterns matter. Construction artisans (roofers, masons, painters) are heavily seasonal. Displaying availability or at least seasonal context helps users set realistic expectations.
The "artisan" label carries weight. In France, calling yourself an artisan is legally protected. It requires specific qualifications (usually a CAP/BEP or 3 years of experience). This built-in quality signal is one of the strengths of using official data.
Open Data as a Foundation, Not a Product
The biggest takeaway from this project is that open data is a starting point, not an end product. SIRENE gives you a solid foundation — verified, comprehensive, regularly updated. But transforming it into something people actually want to use requires significant effort in enrichment, UX, and ongoing maintenance.
If you are considering building on French open data, SIRENE is one of the best datasets to work with. The documentation is solid, the API is well-designed, and the community of developers using it is active. Just be prepared for the gap between raw data and a finished product — that is where most of the work lives.
The full SIRENE dataset is available at sirene.fr and through the INSEE API portal. Documentation for the API is at api.insee.fr.