The French government's Base Adresse Nationale (BAN) contains 26 million addresses, covering every street, house number, and hamlet across mainland France and the overseas territories. We built GEOREFER to make this data accessible through a single REST API, combined with company lookup from the SIRENE database.
This is the technical story of how we did it.
The Problem: Fragmented French Geographic Data
If you're building a FinTech product in France, you need to validate customer addresses for KYC compliance. Sounds simple, right?
Here's what the landscape looks like in 2026:
- API Adresse (BAN) — Free, but no SLA, rate-limited to 50 req/s, and no company data
- La Poste RNVP — The gold standard for postal validation, but no public REST API
- Google Address Validation — Global coverage but $0.005/request adds up fast, and no SIRENE integration
- INSEE API SIRENE — Company data, but separate authentication, slow responses (~500ms), and no address validation
To do proper KYC, you need at least two of these APIs, with different auth mechanisms, different response formats, and different rate limits.
We decided to build one API that does it all.
Architecture Overview
GEOREFER is built on a straightforward Java stack:
Java 11 + Spring Boot 2.7.5
PostgreSQL 16 (42M+ rows across 12 tables)
Redis 7 (API key cache, TTL 5min)
Elasticsearch 7.17 (city autocomplete, fuzzy search)
The architecture follows a clean layered approach:
REST Controllers (17 controllers, 39 endpoints)
↓
Business Services (12 interfaces, 16 implementations)
↓
Repositories (JPA + Elasticsearch)
↓
PostgreSQL + Redis + Elasticsearch
Importing 26M Addresses from the BAN
The BAN publishes its data as CSV files, updated monthly. The full dataset is around 3.5 GB compressed.
Our import strategy:
- Download the latest BAN CSV export
- Parse with a streaming CSV reader (the full file is never held in memory)
- Batch insert using JDBC batch operations (batch size = 5000)
- Index city data into Elasticsearch for autocomplete
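The streaming and batching steps above can be sketched in plain Java. This is a minimal illustration rather than GEOREFER's actual importer: the class name `BanBatchImporter`, the configurable delimiter, and the `flush` callback are assumptions, and the naive `split` does not handle quoted fields. In a real importer the callback would presumably wrap `PreparedStatement.addBatch()` and `executeBatch()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

final class BanBatchImporter {

    /**
     * Streams a delimited file line by line and hands the parsed rows to
     * `flush` in batches of at most `batchSize`, so the whole file is never
     * held in memory. Returns the number of data rows read.
     */
    static long importCsv(Reader source, String delimiter, int batchSize,
                          Consumer<List<String[]>> flush) {
        Pattern separator = Pattern.compile(Pattern.quote(delimiter));
        long rows = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();          // first line is the header, skipped
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;                         // ignore blank lines
                }
                batch.add(separator.split(line, -1)); // -1 keeps trailing empty columns
                rows++;
                if (batch.size() == batchSize) {
                    flush.accept(batch);              // e.g. one JDBC executeBatch()
                    batch = new ArrayList<>(batchSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!batch.isEmpty()) {
            flush.accept(batch);                      // final partial batch
        }
        return rows;
    }
}
```

With a batch size of 5000, memory use is bounded by one batch of rows regardless of the size of the export.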
The key challenge was handling the French administrative hierarchy:
Region (18) → Department (101) → Commune (35,000+) → Address (26M)
Each commune has a five-character INSEE code (all digits except in Corsica, where codes begin with 2A or 2B), one or more postal codes, and belongs to exactly one department. Paris, Lyon, and Marseille have arrondissements that function as sub-communes with their own INSEE codes.
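To make the hierarchy rules concrete, here is a small sketch of how a department code can be derived from a commune's INSEE code, and how the arrondissement codes of Paris, Lyon, and Marseille can be recognized. The class and method names are hypothetical and this is not taken from the GEOREFER codebase; it only encodes the public INSEE numbering conventions.

```java
final class InseeCodes {

    /** Returns the department code for a 5-character INSEE commune code. */
    static String departmentOf(String inseeCode) {
        if (inseeCode == null || inseeCode.length() != 5) {
            throw new IllegalArgumentException(
                    "INSEE commune codes have 5 characters: " + inseeCode);
        }
        // Overseas codes starting with 97 use a 3-character prefix (971, 972, ...).
        if (inseeCode.startsWith("97")) {
            return inseeCode.substring(0, 3);
        }
        // Mainland departments and Corsica (2A, 2B) use the first two characters.
        return inseeCode.substring(0, 2);
    }

    /** True for the municipal arrondissements of Paris, Lyon and Marseille. */
    static boolean isArrondissement(String inseeCode) {
        return between(inseeCode, "75101", "75120")   // Paris 1er to 20e
                || between(inseeCode, "69381", "69389") // Lyon 1er to 9e
                || between(inseeCode, "13201", "13216"); // Marseille 1er to 16e
    }

    // Lexicographic comparison is safe here because all codes have equal length.
    private static boolean between(String code, String low, String high) {
        return code.length() == 5
                && code.compareTo(low) >= 0
                && code.compareTo(high) <= 0;
    }
}
```

For example, `departmentOf("75056")` yields `"75"` for the commune of Paris, while `isArrondissement("75102")` is true for its 2nd arrondissement.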
We store communes in a french_town_desc table with full hierarchy:
SELECT f.name, f.insee_code, f.postal_code,
       d.name AS department, r.name AS region
FROM georefer.french_town_desc f
JOIN georefer.department d ON f.department_code = d.code
JOIN georefer.region r ON d.region_code = r.code
WHERE f.name ILIKE 'paris%';
Address Validation with GeoTrust Scoring
The core feature is POST /addresses/validate. You send a French address, and we return:
- Confidence score (0-100) — how sure we are the address exists
- GeoTrust Score (0-100) — composite reliability score for KYC
- Validated address — normalized, corrected, with GPS coordinates
- AFNOR format — postal-standard NF Z 10-011 formatting
The GeoTrust Score is a weighted composite:
| Component | Weight | What it measures |
|---|---|---|
| Confidence | 35% | Street-level address matching |
| Geo Consistency | 25% | Cross-validation: postal code vs commune vs department |
| Postal Match | 20% | Postal code precision (exact, partial, invalid) |
| Country Risk | 20% | FATF/GAFI country risk rating |
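As a rough sketch, the weighting in the table can be written as a weighted sum. The weights come from the table; everything else is an assumption, in particular that `country_risk` is a risk value (0 = safest) and is therefore inverted before weighting. The exact production formula, including any rounding, capping, or extra rules, is not specified here, so results from this sketch need not match the API's `overall` value.

```java
final class GeoTrustScore {

    // Weights from the table above, in percent.
    static final int W_CONFIDENCE = 35;
    static final int W_GEO_CONSISTENCY = 25;
    static final int W_POSTAL_MATCH = 20;
    static final int W_COUNTRY_RISK = 20;

    /**
     * Weighted composite of the four components, each on a 0-100 scale.
     * ASSUMPTION: countryRisk is "higher = riskier", so it contributes
     * (100 - countryRisk) to the reliability score.
     */
    static double overall(int confidence, int geoConsistency,
                          int postalMatch, int countryRisk) {
        int weighted = W_CONFIDENCE * check("confidence", confidence)
                + W_GEO_CONSISTENCY * check("geoConsistency", geoConsistency)
                + W_POSTAL_MATCH * check("postalMatch", postalMatch)
                + W_COUNTRY_RISK * (100 - check("countryRisk", countryRisk));
        return weighted / 100.0; // weights sum to 100, so the result stays in 0-100
    }

    private static int check(String name, int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(name + " must be in 0..100, got " + value);
        }
        return value;
    }
}
```

Keeping the weights as integer percentages avoids floating-point drift until the final division.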
Example request:
curl -X POST 'https://georefer.io/geographical_repository/v1/addresses/validate' \
  -H 'Content-Type: application/json' \
  -H 'X-Georefer-API-Key: YOUR_API_KEY' \
  -d '{
    "street_line": "15 Rue de la Paix",
    "postal_code": "75002",
    "city": "Paris",
    "country_code": "FR"
  }'
Response:
{
  "success": true,
  "data": {
    "validated_address": {
      "street_line": "15 Rue de la Paix",
      "postal_code": "75002",
      "city": "PARIS",
      "country": "France"
    },
    "confidence_score": 95,
    "geotrust_score": {
      "overall": 92,
      "level": "LOW",
      "components": {
        "confidence": 95,
        "geo_consistency": 100,
        "postal_match": 100,
        "country_risk": 0
      }
    }
  }
}
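The same request can be issued from Java 11 with the built-in `java.net.http` client. This is a minimal sketch: the endpoint URL and header names come from the curl example, while the class name, the 10-second timeout, and the decision to return the raw response are assumptions. No JSON library is used, so the body is passed in as a string.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ValidateAddressClient {

    static final String ENDPOINT =
            "https://georefer.io/geographical_repository/v1/addresses/validate";

    /** Builds the POST request; nothing is sent until send() is called. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .timeout(Duration.ofSeconds(10))          // assumed client-side timeout
                .header("Content-Type", "application/json")
                .header("X-Georefer-API-Key", apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request and returns the response with its body as a string. */
    static HttpResponse<String> send(HttpRequest request)
            throws IOException, InterruptedException {
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

A caller would pass the JSON document from the curl example to `buildRequest`, call `send`, and then inspect `statusCode()` and `body()` on the result.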
Elasticsearch for City Autocomplete
City autocomplete needs to be fast — under 50ms for a good UX. We use Elasticsearch's Completion Suggester with a custom analyzer:
city_analyzer: edge_ngram (min_gram=2, max_gram=15) + asciifolding
city_search_analyzer: standard + asciifolding
ASCII folding is critical for French city names. Users type "Beziers", but the official name is "Béziers". Because accents are folded at both index time and search time, the analyzer matches either spelling.
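To illustrate what the two analysis steps do to a city name, here is a plain-Java approximation. It is not Elasticsearch code: `fold` strips combining accents via Unicode NFD normalization, which covers common French accents but is narrower than Elasticsearch's folding filter (it does not expand ligatures such as "œ"), and the lowercasing step is an assumption of this sketch. `edgeNgrams` produces the prefixes that an edge n-gram filter would index.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CityTokens {

    /** Removes accents (combining marks) and lowercases the name. */
    static String fold(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    /** Prefixes of `term` with lengths from minGram up to maxGram (inclusive). */
    static List<String> edgeNgrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }
}
```

Under these assumptions both "Beziers" and "Béziers" fold to `beziers`, and a partial query such as `bez` matches one of the indexed prefixes.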
The GET /cities/autocomplete?q=marseil&limit=5 endpoint returns results in under 50ms, even with 35,000+ communes indexed.
We also support fuzzy search with GET /cities/search?q=Monplier — using Elasticsearch's fuzziness AUTO parameter, this correctly returns "Montpellier" despite the typos.
Multi-Tenant API Keys & Rate Limiting
GEOREFER is a SaaS with 5 subscription plans:
| Plan | Daily Limit | Rate/min | Price |
|---|---|---|---|
| DEMO | 50 | 10 | Free |
| FREE | 100 | 10 | Free |
| STARTER | 5,000 | 30 | 49 EUR/mo |
| PRO | 50,000 | 60 | 199 EUR/mo |
| ENTERPRISE | Unlimited | 200 | Custom |
Each API key gets its own token bucket (Bucket4j) for rate limiting. Authentication goes through a Spring filter chain:
Request → API Key validation (Redis cache) → Quota check → Rate limit → Feature gate → Controller
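The token-bucket idea behind the per-key rate limit can be sketched in plain Java. This is an illustration of the technique only, not Bucket4j's API and not GEOREFER's configuration: the class `SimpleTokenBucket`, its constructor parameters, and the injectable clock are all hypothetical. The bucket starts full and earns tokens back evenly over the period.

```java
import java.util.function.LongSupplier;

final class SimpleTokenBucket {

    private final long capacity;       // maximum tokens, e.g. requests per minute
    private final long periodNanos;    // time needed to refill an empty bucket
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long periodNanos, LongSupplier nanoClock) {
        if (capacity <= 0 || periodNanos <= 0) {
            throw new IllegalArgumentException("capacity and period must be positive");
        }
        this.capacity = capacity;
        this.periodNanos = periodNanos;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                      // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller is rate-limited. */
    synchronized boolean tryConsume() {
        refill();
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    private void refill() {
        long now = nanoClock.getAsLong();
        long elapsed = now - lastRefillNanos;
        if (elapsed >= periodNanos) {                // a full period passed: bucket is full
            tokens = capacity;
            lastRefillNanos = now;
            return;
        }
        long earned = elapsed * capacity / periodNanos;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            // Advance only by the time those tokens account for, keeping the remainder.
            lastRefillNanos += earned * periodNanos / capacity;
        }
    }
}
```

A limit of 30 requests per minute would correspond to `new SimpleTokenBucket(30, TimeUnit.MINUTES.toNanos(1), System::nanoTime)`, with one bucket instance kept per API key.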
The Feature Gate controls which endpoints each plan can access. For example, company search (/companies) requires PRO or higher, while city search is available on all plans.
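A feature gate of this kind can be modeled as a mapping from endpoint prefix to minimum plan. The sketch below is hypothetical: only the `/companies` rule comes from the text above, while the class name, the assumption that the plans are ordered as tiers in table order, and the "no rule means allowed" default are illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FeatureGate {

    /** Plans in ascending order of entitlement (assumed to follow the pricing table). */
    enum Plan { DEMO, FREE, STARTER, PRO, ENTERPRISE }

    // Endpoint path prefix -> minimum plan required to call it.
    private static final Map<String, Plan> MINIMUM_PLAN = new LinkedHashMap<>();

    static {
        MINIMUM_PLAN.put("/companies", Plan.PRO); // company search: PRO or higher
    }

    /** Returns true if `plan` may call the endpoint at `path`. */
    static boolean isAllowed(Plan plan, String path) {
        for (Map.Entry<String, Plan> rule : MINIMUM_PLAN.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return plan.compareTo(rule.getValue()) >= 0; // enum order = tier order
            }
        }
        return true; // no rule for this path: available on all plans (assumption)
    }
}
```

A stricter deployment might prefer to deny by default and list every public endpoint explicitly; the open default here simply mirrors the statement that city search is available on all plans.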
What's Next
So far we have imported 16.8 million SIRENE establishments and indexed 35,000+ communes. The API exposes 39 endpoints covering geographic data, address validation, company search, and admin/billing.
If you're building anything that touches French addresses or company data, give it a try:
- Free tier: 100 requests/day, no credit card required
- Docs: https://georefer.io/docs
- Sign up: https://georefer.io/#signup
- Examples: https://github.com/azmoris-group/georefer-examples
In the next article, we'll deep-dive into how we query 16.8M SIRENE establishments in 66ms using PostgreSQL trigram indexes.
AZMORIS Engineering — "Software that Endures"