Building a Nursery Search Platform for French Parents: Geocoding Meets Childcare
Finding the right childcare facility is one of the biggest challenges French families face. Between municipal crèches, private nurseries, and family day-care providers, the options are overwhelming—and fragmented across dozens of regional databases.
We decided to solve this by building a unified search platform that aggregates nursery data from official French government sources and makes it instantly searchable by location, capacity, and availability.
The Problem We Solved
Parents spent hours calling municipalities, checking outdated websites, and piecing together information from multiple sources. There was no single source of truth for nursery availability across France.
The Ministry of Solidarity maintains open datasets of accredited childcare facilities, but they're scattered across regional portals and not easily consumable by regular users. We saw an opportunity.
Architecture: From Data to Discovery
1. Data Sourcing
We tapped into France's open data ecosystem:
- INSEE (Institut National de la Statistique et des Études Économiques): municipality boundaries and postal codes
- Ministère des Solidarités: official nursery registry with accreditation status
- Regional health agencies (ARS): capacity, operating hours, and specializations
All data is public, licensed under ODbL, and refreshed quarterly.
2. Geocoding Strategy
The core challenge: matching user searches ("nurseries near my home") to actual facility locations.
We implemented a two-tier geocoding system:
Tier 1: Address-level geocoding (Google Maps API)
- User enters their address
- We geocode it to exact latitude/longitude
- A radius search returns facilities within a configurable distance (500 m to 5 km)
Tier 2: Municipality-level fallback (INSEE data)
- If user only provides postal code, we identify their municipality
- We return all facilities in that municipality plus those in adjacent municipalities
- Useful for users who don't want to share their exact address
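The choice between the two tiers can be sketched as a small routing function. This is a hypothetical illustration, not our production code. It assumes that a bare five-digit input is a French postal code and treats anything else as a full address. It also clamps the radius to the 500 m to 5 km range above. The names chooseGeocodingTier and clampRadiusKm are illustrative.

```javascript
// Hypothetical routing between the two geocoding tiers.
// Assumption: a bare 5-digit input is a French postal code (tier 2);
// anything else is treated as a full address (tier 1).
const MIN_RADIUS_KM = 0.5; // 500 m, lower bound of the configurable radius
const MAX_RADIUS_KM = 5;   // 5 km, upper bound

const clampRadiusKm = (radiusKm) =>
  Math.min(MAX_RADIUS_KM, Math.max(MIN_RADIUS_KM, radiusKm));

const chooseGeocodingTier = (rawInput, radiusKm = 1) => {
  const input = String(rawInput).trim();
  if (/^\d{5}$/.test(input)) {
    // Tier 2: municipality-level lookup from the postal code
    return { tier: 2, postalCode: input };
  }
  // Tier 1: geocode the address to exact coordinates, then radius search
  return { tier: 1, address: input, radiusKm: clampRadiusKm(radiusKm) };
};

console.log(chooseGeocodingTier("75011"));
// → { tier: 2, postalCode: '75011' }
console.log(chooseGeocodingTier("12 rue Oberkampf, Paris", 10));
// → { tier: 1, address: '12 rue Oberkampf, Paris', radiusKm: 5 }
```

A single postal code can cover several communes. A real tier-2 lookup would therefore map it to one or more INSEE commune codes before it looks for adjacent municipalities.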
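// Simplified example using the MongoDB Node.js driver.
// Assumes `db` is a connected Db instance and `location` has a 2dsphere index.
const findNearbyNurseries = async (userLat, userLng, radiusKm = 1) => {
  const facilities = await db.collection("nurseries").find({
    location: {
      $near: {
        $geometry: { type: "Point", coordinates: [userLng, userLat] }, // GeoJSON order: [lng, lat]
        $maxDistance: radiusKm * 1000 // convert km to meters
      }
    }
  }).toArray(); // $near returns results sorted nearest first
  return facilities;
};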
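MongoDB computes $near distances on the server. For the distance shown on each result, or for unit-testing the radius logic without a database, the haversine great-circle formula is a common approximation. The sketch below is an in-memory equivalent of findNearbyNurseries. It assumes a spherical Earth with a mean radius of 6,371 km and facilities that carry GeoJSON [lng, lat] coordinates. The names haversineMeters and filterByRadius are hypothetical.

```javascript
// Haversine great-circle distance between two [lng, lat] points, in meters.
const EARTH_RADIUS_M = 6371000; // mean Earth radius (spherical approximation)
const toRadians = (deg) => (deg * Math.PI) / 180;

const haversineMeters = ([lng1, lat1], [lng2, lat2]) => {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;
  // Math.min guards against rounding pushing sqrt(a) slightly above 1
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(a)));
};

// In-memory analogue of the $near query: keep facilities within the radius,
// annotate each with its distance, and sort nearest first.
const filterByRadius = (facilities, userLat, userLng, radiusKm = 1) =>
  facilities
    .map((f) => ({
      ...f,
      distanceM: haversineMeters([userLng, userLat], f.location.coordinates),
    }))
    .filter((f) => f.distanceM <= radiusKm * 1000)
    .sort((a, b) => a.distanceM - b.distanceM);
```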
3. Data Normalization
Government datasets use inconsistent formats:
- Accreditation types differ by region (crèche collective, micro-crèche, crèche familiale, crèche d'entreprise)
- Capacity reported in different units (child-places vs. staff ratios)
- Operating hours are recorded in multiple date and time formats
We built a normalization layer that:
- Maps regional accreditation types to a standard taxonomy
- Converts all capacities to "max children under 3" and "max children 3-6"
- Standardizes operating hours (Monday-Friday, holiday closures)
- Flags missing or incomplete data
// Normalization example (the helper functions called here are not shown)
const normalizeNursery = (rawData) => {
  return {
    id: rawData.siret,
    name: String(rawData.nom ?? '').trim(),
    type: mapAccreditationType(rawData.type_agrement),
    capacity: {
      under3: extractCapacity(rawData, 'moins_3_ans'),
      age3to6: extractCapacity(rawData, '3_6_ans')
    },
    hours: parseOperatingHours(rawData.horaires),
    location: {
      type: 'Point',
      // GeoJSON order is [longitude, latitude]; raw values may arrive as strings
      coordinates: [Number(rawData.lon), Number(rawData.lat)]
    }
  };
};
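The normalization example calls helpers that are not shown. Below is one hypothetical version of mapAccreditationType, plus a check that flags incomplete records. It normalizes accents, apostrophes, and case before looking labels up in a Map. The taxonomy values (collective, micro, family, workplace) and the flag names are assumptions for illustration.

```javascript
// Hypothetical standard taxonomy keyed by normalized French labels.
const ACCREDITATION_TAXONOMY = new Map([
  ["creche collective", "collective"],
  ["micro-creche", "micro"],
  ["creche familiale", "family"],
  ["creche d'entreprise", "workplace"],
]);

// Strip accents, unify apostrophes, lowercase, and collapse whitespace,
// so "Micro-Crèche " and "micro-creche" map to the same key.
const normalizeLabel = (label) =>
  String(label ?? "")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // remove combining diacritics
    .replace(/\u2019/g, "'")          // typographic apostrophe -> ASCII
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();

const mapAccreditationType = (rawType) =>
  ACCREDITATION_TAXONOMY.get(normalizeLabel(rawType)) ?? "unknown";

// Return a list of data-quality flags instead of rejecting the record.
const flagIncompleteData = (nursery) => {
  const flags = [];
  const [lng, lat] = nursery.location?.coordinates ?? [];
  const validCoords =
    Number.isFinite(lng) && Number.isFinite(lat) &&
    Math.abs(lng) <= 180 && Math.abs(lat) <= 90 &&
    !(lng === 0 && lat === 0); // (0, 0) usually means "not geocoded"
  if (!validCoords) flags.push("invalid_coordinates");
  if (nursery.type === "unknown") flags.push("unknown_accreditation_type");
  const { under3, age3to6 } = nursery.capacity ?? {};
  if (!Number.isFinite(under3) && !Number.isFinite(age3to6)) flags.push("missing_capacity");
  return flags;
};
```

Returning flags rather than throwing keeps incomplete records searchable while they wait for manual geocoding or a correction.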
Technical Decisions
Database Choice
We use MongoDB with geospatial indexes because:
- Geospatial queries are fast: the $near operator on a 2dsphere index handles radius searches in milliseconds
- Schema flexibility: facilities have varying attributes, and MongoDB's document model accommodates this
- Scaling: sharding by region is straightforward
Caching Layer
Geocoding requests are expensive. We implemented Redis caching:
- Cache user address → coordinates (24-hour TTL)
- Cache municipality → nearby facilities (7-day TTL)
- Cache facility details (30-day TTL, invalidate on data refresh)
This reduced API costs by 70%.
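The caching rules above follow a cache-aside pattern: look up the key, and call the geocoder or database only on a miss. The sketch below is minimal and uses an in-memory Map as a stand-in for Redis so the TTL logic is visible and testable. TtlCache, cached, and the key prefixes are hypothetical.

```javascript
// TTLs from the list above, in milliseconds.
const HOUR_MS = 60 * 60 * 1000;
const TTL_MS = {
  geocode: 24 * HOUR_MS,           // address -> coordinates: 24 hours
  municipality: 7 * 24 * HOUR_MS,  // municipality -> nearby facilities: 7 days
  facility: 30 * 24 * HOUR_MS,     // facility details: 30 days
};

// In-memory stand-in for Redis; `now` is injectable so expiry can be tested.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.entries = new Map();
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }
  deleteByPrefix(prefix) {
    // e.g. invalidate every "facility:" key after the quarterly data refresh
    for (const key of this.entries.keys()) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}

// Cache-aside: return the cached value, or compute it once and store it.
const cached = async (cache, key, ttlMs, compute) => {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await compute();
  cache.set(key, value, ttlMs);
  return value;
};
```

With a real Redis client, set would correspond to SET key value PX ttl, and Redis would expire the key itself. The "invalidate on data refresh" rule would delete the facility keys after each import.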
Frontend Search Experience
We prioritize instant feedback:
- User types address → debounced geocoding
- Results update in real-time as they type
- Results show distance, capacity status, and next availability
- One-click comparison of multiple facilities
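Debouncing alone does not guarantee correct real-time results, because a slow response for an earlier prefix can arrive after a newer one. The sketch below covers both pieces and assumes a 300 ms pause threshold, which is a placeholder rather than a measured value. debounce and latestOnly are generic helpers, not a specific library's API.

```javascript
// Delay `fn` until `waitMs` have passed without a new call, so geocoding
// runs once per pause in typing rather than on every keystroke.
const debounce = (fn, waitMs = 300) => {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
};

// Drop responses from superseded requests, so a slow geocode for "12 rue"
// cannot overwrite results for "12 rue Oberkampf".
const latestOnly = (asyncFn) => {
  let latestId = 0;
  return async (...args) => {
    const id = ++latestId;
    const result = await asyncFn(...args);
    return id === latestId ? result : undefined; // undefined means stale
  };
};
```

In the page, the input handler would be wrapped in debounce and the geocode-and-search call in latestOnly. Only a non-undefined result would be rendered.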
Challenges We Overcame
1. Data Quality Issues
Problem: ~15% of facilities had no valid coordinates or incomplete capacity data.
Solution:
- Manual geocoding for facilities with address-only data (120 hours of work)
- Contacted regions for missing capacity info
- Built a public contribution form for parents to flag outdated information
2. Regional Variations
Problem: Each local CAF (Caisse d'Allocations Familiales, organized by département) has different accreditation processes, funding formulas, and waiting list systems.
Solution:
- Created region-specific help pages
- Partnered with CAF offices to provide accurate local guidance
- Added community moderators in each region
3. Maintenance Burden
Problem: Government data changes quarterly; we can't manually update 8,000+ facilities.
Solution:
- Automated data fetch from official APIs every 3 months
- Diff detection to flag new/closed facilities
- Email alerts to affected users ("Your saved nursery is now closed")
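Diff detection between quarterly imports can be sketched as a comparison of two snapshots keyed by the facility id, which is the SIRET in the normalization example. The function diffSnapshots and the list of compared fields are hypothetical. A facility missing from the new snapshot is only a closure candidate, since it may simply have been dropped from one source.

```javascript
// Compare two quarterly snapshots keyed by id and report what changed.
// `fields` lists the attributes to compare (hypothetical selection).
const diffSnapshots = (previous, current, fields = ["name", "type", "capacity", "hours"]) => {
  const prevById = new Map(previous.map((f) => [f.id, f]));
  const currById = new Map(current.map((f) => [f.id, f]));
  const added = [];
  const closed = [];
  const changed = [];

  for (const [id, facility] of currById) {
    const old = prevById.get(id);
    if (!old) {
      added.push(id); // new facility this quarter
      continue;
    }
    // JSON.stringify as a simple deep comparison; assumes stable key order,
    // which holds when both snapshots come from the same normalizer
    const changedFields = fields.filter(
      (k) => JSON.stringify(old[k]) !== JSON.stringify(facility[k])
    );
    if (changedFields.length > 0) changed.push({ id, fields: changedFields });
  }
  for (const id of prevById.keys()) {
    // Candidates for "Your saved nursery is now closed" alerts, pending review
    if (!currById.has(id)) closed.push(id);
  }
  return { added, closed, changed };
};
```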
Results
- 8,200+ facilities indexed across France
- Average search time: 150 ms (including geocoding)
- User feedback: 92% found their search results relevant
What We Learned
- Government data is a goldmine, but cleaning it is 60% of the work
- Geocoding accuracy matters — even 50m errors frustrate users
- Community contributions beat perfectionism — let users flag errors rather than trying to get 100% accuracy upfront
If you're building location-based services in Europe, France's open data ecosystem is better organized than most. Start there.
Want to try it? Visit trouver-creche.fr to search nurseries in your area.
Have you built location-based services with government data? Share your approach in the comments—I'd love to hear what worked for you.