SIKOUTRIS

Building a Nursery Search Platform for French Parents: Geocoding Meets Childcare

Finding the right childcare facility is one of the biggest challenges French families face. Between municipal crèches, private nurseries, and family day-care providers, the options are overwhelming—and fragmented across dozens of regional databases.

We decided to solve this by building a unified search platform that aggregates nursery data from official French government sources and makes it instantly searchable by location, capacity, and availability.

The Problem We Solved

Parents spent hours calling municipalities, checking outdated websites, and piecing together information from multiple sources. There was no single source of truth for nursery availability across France.

The Ministry of Solidarity maintains open datasets of accredited childcare facilities, but they're scattered across regional portals and not easily consumable by regular users. We saw an opportunity.

Architecture: From Data to Discovery

1. Data Sourcing

We tapped into France's open data ecosystem:

  • INSEE (Institut National de la Statistique et des Études Économiques): municipality boundaries and postal codes
  • Ministère des Solidarités: official nursery registry with accreditation status
  • Regional health agencies (ARS): capacity, operating hours, and specializations

All data is public, licensed under ODbL, and refreshed quarterly.

2. Geocoding Strategy

The core challenge: matching user searches ("nurseries near my home") to actual facility locations.

We implemented a two-tier geocoding system:

Tier 1: Address-level geocoding (Google Maps API)

  • User enters their address
  • We geocode it to exact latitude/longitude
  • A radius search returns facilities within a configurable distance (500m to 5km)

Tier 2: Municipality-level fallback (INSEE data)

  • If the user provides only a postal code, we identify their municipality
  • We return all facilities in that municipality and the adjacent ones
  • Useful for users who don't want to share their exact address
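// Simplified example (assumes `db` is a Prisma client for MongoDB and that
// `location` holds GeoJSON points covered by a 2dsphere index)
const findNearbyNurseries = async (userLat, userLng, radiusKm = 1) => {
  const facilities = await db.nurseries.findRaw({
    filter: {
      location: {
        $near: {
          // GeoJSON order is [longitude, latitude]
          $geometry: { type: "Point", coordinates: [userLng, userLat] },
          $maxDistance: radiusKm * 1000 // convert km to meters
        }
      }
    }
  });
  return facilities;
};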
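The choice between the two tiers can be sketched as a small dispatcher. This is an illustration rather than the platform's actual code: `pickSearchTier` and the returned object shape are hypothetical, and the check assumes the standard five-digit French postal code format.

```javascript
// Hypothetical helper: decide which geocoding tier a raw search input should use.
// Assumes French postal codes are exactly five digits (e.g. "75011").
const POSTAL_CODE_PATTERN = /^\d{5}$/;

const pickSearchTier = (rawInput) => {
  const input = String(rawInput ?? "").trim();
  if (input === "") {
    return { tier: "none", query: "" };
  }
  if (POSTAL_CODE_PATTERN.test(input)) {
    // Tier 2: postal code only, resolve to a municipality and its neighbours.
    return { tier: "municipality", query: input };
  }
  // Tier 1: treat anything else as a street address to geocode precisely.
  return { tier: "address", query: input };
};
```

Keeping this decision in one place means the privacy-friendly postal code path never touches the address geocoder.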

3. Data Normalization

Government datasets use inconsistent formats:

  • Accreditation types differ by region (crèche collective, micro-crèche, crèche familiale, crèche d'entreprise)
  • Capacity reported in different units (child-places vs. staff ratios)
  • Operating hours appear in multiple time and date formats

We built a normalization layer that:

  1. Maps regional accreditation types to a standard taxonomy
  2. Converts all capacities to "max children under 3" and "max children 3-6"
  3. Standardizes operating hours (Monday-Friday, holiday closures)
  4. Flags missing or incomplete data
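// Normalization example
// mapAccreditationType, extractCapacity and parseOperatingHours are
// helpers defined elsewhere in the normalization layer.
const normalizeNursery = (rawData) => {
  return {
    id: rawData.siret, // SIRET number used as a stable identifier
    name: rawData.nom.trim(),
    type: mapAccreditationType(rawData.type_agrement),
    capacity: {
      under3: extractCapacity(rawData, 'moins_3_ans'),
      age3to6: extractCapacity(rawData, '3_6_ans')
    },
    hours: parseOperatingHours(rawData.horaires),
    location: {
      // GeoJSON point: numeric [longitude, latitude], in that order
      type: 'Point',
      coordinates: [Number(rawData.lon), Number(rawData.lat)]
    }
  };
};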
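Step 1 of the normalization layer, mapping accreditation types, might look like the lookup below. Only the four French labels come from the list above; the taxonomy values on the right are invented for illustration, and this `mapAccreditationType` is a guess at what such a helper could do, not the platform's implementation.

```javascript
// Hypothetical taxonomy: the standard values are illustrative only.
const ACCREDITATION_TYPES = new Map([
  ["creche collective", "collective"],
  ["micro-creche", "micro"],
  ["creche familiale", "family"],
  ["creche d'entreprise", "company"],
]);

// Strip accents, lowercase, and collapse whitespace so spelling variants
// such as "Crèche  Collective" and "CRECHE COLLECTIVE" share one key.
const canonicalLabel = (label) =>
  String(label ?? "")
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();

// Unrecognized labels fall back to "unknown" so they can be flagged for review.
const mapAccreditationType = (rawType) =>
  ACCREDITATION_TYPES.get(canonicalLabel(rawType)) ?? "unknown";
```

Returning an explicit `"unknown"` instead of throwing lets the pipeline keep the record and surface it under step 4 (flagging incomplete data).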

Technical Decisions

Database Choice

We use MongoDB with geospatial indexes because:

  1. Geospatial queries are fast: the $near operator on a 2dsphere index handles radius searches in milliseconds
  2. Schema flexibility: facilities have varying attributes, and MongoDB's document model accommodates this
  3. Scaling: sharding by region is straightforward
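For reference, `$near` on GeoJSON points needs a `2dsphere` index on the queried field. A minimal sketch, assuming the official MongoDB Node.js driver and documents that store GeoJSON in a `location` field as in the normalization example; `ensureGeoIndex` is a hypothetical helper name.

```javascript
// Hypothetical helper: create the geospatial index that $near queries rely on.
// `collection` is assumed to be a MongoDB Node.js driver Collection.
const ensureGeoIndex = async (collection) => {
  // Creating an index that already exists with the same spec is a no-op,
  // so this is safe to call at every application start.
  return collection.createIndex({ location: "2dsphere" });
};
```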

Caching Layer

Geocoding requests are expensive. We implemented Redis caching:

  • Cache user address → coordinates (24-hour TTL)
  • Cache municipality → nearby facilities (7-day TTL)
  • Cache facility details (30-day TTL, invalidate on data refresh)

This reduced API costs by 70%.
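A cache-aside wrapper for the address cache could look roughly like this. It is a sketch under assumptions: `client` is taken to be a connected node-redis v4 client (`get`, and `set` with an `EX` expiry in seconds), `geocode` stands in for whatever function calls the geocoding API, and the key format is made up.

```javascript
const ADDRESS_TTL_SECONDS = 24 * 60 * 60; // 24-hour TTL for address lookups

// Hypothetical wrapper: return cached coordinates, or geocode and cache them.
const geocodeWithCache = async (client, geocode, address) => {
  // Normalize the address so trivial variations share one cache entry.
  const key = `geocode:${address.trim().toLowerCase()}`;

  const cached = await client.get(key); // resolves to null on a cache miss
  if (cached !== null) {
    return JSON.parse(cached);
  }

  const coords = await geocode(address); // e.g. { lat, lng }
  await client.set(key, JSON.stringify(coords), { EX: ADDRESS_TTL_SECONDS });
  return coords;
};
```

The same pattern applies to the other two caches with a different key prefix and TTL.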

Frontend Search Experience

We prioritize instant feedback:

  1. User types address → debounced geocoding
  2. Results update in real-time as they type
  3. Results show distance, capacity status, and next availability
  4. One-click comparison of multiple facilities
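The debouncing in step 1 can be sketched with a plain timer. `debounce` here is a generic helper written for illustration, and both the handler name and the 300 ms delay in the usage comment are arbitrary choices, not details from the platform.

```javascript
// Generic debounce: run `fn` only after `delayMs` have passed without a new call.
const debounce = (fn, delayMs) => {
  let timer = null;
  const debounced = (...args) => {
    clearTimeout(timer); // drop the pending call, if any
    timer = setTimeout(() => {
      timer = null;
      fn(...args);
    }, delayMs);
  };
  // Let callers cancel a pending call, e.g. when the input is cleared.
  debounced.cancel = () => {
    clearTimeout(timer);
    timer = null;
  };
  return debounced;
};

// Usage (hypothetical handler name):
// const onAddressInput = debounce((text) => geocodeAndSearch(text), 300);
```

Only the last keystroke in a burst triggers a geocoding request, which keeps results responsive without sending a request per character.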

Challenges We Overcame

1. Data Quality Issues

Problem: ~15% of facilities had no valid coordinates or incomplete capacity data.

Solution:

  • Manual geocoding for facilities with address-only data (120 hours of work)
  • Contacted regions for missing capacity info
  • Built a public contribution form for parents to flag outdated information
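Finding the records that need this kind of attention is easy to automate. The check below is a sketch assuming normalized records shaped like the output of `normalizeNursery` (GeoJSON `location`, `capacity.under3`, `capacity.age3to6`); `findDataIssues` and the issue labels are hypothetical.

```javascript
const isFiniteNumber = (v) => typeof v === "number" && Number.isFinite(v);

// Hypothetical check: list the problems that make a record unusable in search.
const findDataIssues = (nursery) => {
  const issues = [];
  const [lng, lat] = nursery.location?.coordinates ?? [];

  const coordsValid =
    isFiniteNumber(lng) &&
    isFiniteNumber(lat) &&
    Math.abs(lng) <= 180 &&
    Math.abs(lat) <= 90 &&
    !(lng === 0 && lat === 0); // (0, 0) is often a placeholder for a failed geocode
  if (!coordsValid) issues.push("invalid_coordinates");

  // Capacity counts as missing only when neither age band has a number.
  const { under3, age3to6 } = nursery.capacity ?? {};
  if (!isFiniteNumber(under3) && !isFiniteNumber(age3to6)) {
    issues.push("missing_capacity");
  }
  return issues;
};
```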

2. Regional Variations

Problem: Each region's CAF (Caisse d'Allocations Familiales, the family benefits office) has different accreditation processes, funding formulas, and waiting list systems.

Solution:

  • Created region-specific help pages
  • Partnered with CAF offices to provide accurate local guidance
  • Added community moderators in each region

3. Maintenance Burden

Problem: Government data changes quarterly; we can't manually update 8,000+ facilities.

Solution:

  • Automated data fetch from official APIs every 3 months
  • Diff detection to flag new/closed facilities
  • Email alerts to affected users ("Your saved nursery is now closed")
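Diff detection between two quarterly snapshots reduces to comparing sets of IDs. A minimal sketch, assuming each facility carries a stable `id` (the normalization example uses the SIRET); `diffSnapshots` is a hypothetical name.

```javascript
// Hypothetical helper: compare two snapshots keyed by a stable facility id.
const diffSnapshots = (previous, current) => {
  const previousIds = new Set(previous.map((f) => f.id));
  const currentIds = new Set(current.map((f) => f.id));
  return {
    // Present now but not last quarter: newly listed facilities.
    added: current.filter((f) => !previousIds.has(f.id)),
    // Present last quarter but gone now: candidates for "closed" alerts.
    removed: previous.filter((f) => !currentIds.has(f.id)),
  };
};
```

A facility that disappears from one snapshot is only a candidate for closure; it may also have been re-registered under a new identifier, so a manual review step before emailing users is a sensible safeguard.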

Results

  • 8,200+ facilities indexed across France
  • Avg search time: 150ms (including geocoding)
  • User feedback: 92% found their search results relevant

What We Learned

  1. Government data is a goldmine, but cleaning it is 60% of the work
  2. Geocoding accuracy matters — even 50m errors frustrate users
  3. Community contributions beat perfectionism — let users flag errors rather than trying to get 100% accuracy upfront

If you're building location-based services in Europe, France's open data ecosystem is better organized than most. Start there.

Want to try it? Visit trouver-creche.fr to search nurseries in your area.


Have you built location-based services with government data? Share your approach in the comments—I'd love to hear what worked for you.
