DEV Community

SIKOUTRIS

Posted on • Originally published at mysoiltype.com

Parsing USDA Soil Survey Data: Building MySoilType.com

I've always been fascinated by soil—not in a "I want to be a geologist" way, but in the "why does my neighbor's garden thrive while mine fails?" way.

Turns out, the answer is usually in the soil data. The US government publishes remarkably detailed soil maps covering nearly all of the country. I built MySoilType to make that data accessible to regular people.

The USDA SSURGO Database

The USDA's Soil Survey Geographic Database (SSURGO) is a goldmine:

  • Soil type classifications
  • Water retention capacity
  • Drainage characteristics
  • pH levels
  • Suitable crops and land uses

But here's the catch: it's stored as massive GIS shapefiles and text files. Not user-friendly.

Step 1: Getting the Data

The USDA publishes SSURGO as:

  1. Shapefiles (GIS format) for soil polygons
  2. Delimited text files with soil attributes
  3. County-by-county downloads

My first attempt was downloading all 50 states—400GB+ of data. Not practical for a web app.

Instead, I opted for on-demand fetching via the USDA's Soil Data Access web service:

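const fetchSoilData = async (latitude, longitude) => {
  // Simplified request: the query is sent as the raw POST body.
  const response = await fetch(
    'https://sdmdataaccess.nrcs.usda.gov/Tabular/SDMTabularService.asmx',
    {
      method: 'POST',
      body: `SELECT mukey, muname FROM mapunit WHERE ST_Intersects(geom, ST_Point(${longitude}, ${latitude}))`
    }
  );
  if (!response.ok) {
    throw new Error(`Soil data request failed: HTTP ${response.status}`);
  }
  // fetch() resolves to a Response object; read the body text before parsing it.
  return parseSoilXML(await response.text());
};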

This query-based approach let me fetch only the soil data for the user's exact location.
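
Because the latitude and longitude are interpolated straight into the query text, it helps to validate them first. Here is a minimal sketch, assuming a hypothetical helper named `toWktPoint` (my own name, not part of any USDA library) that rejects non-numeric or out-of-range values and emits a WKT point in longitude-then-latitude order:

```javascript
// Hypothetical helper: validates coordinates before they reach a query string.
function toWktPoint(latitude, longitude) {
  // Number.isFinite does not coerce, so strings, null and NaN are all rejected.
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError(`Invalid latitude: ${latitude}`);
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError(`Invalid longitude: ${longitude}`);
  }
  // WKT puts x (longitude) first, then y (latitude).
  return `POINT(${longitude} ${latitude})`;
}

console.log(toWktPoint(45.5152, -122.6784)); // POINT(-122.6784 45.5152)
```

Rejecting anything that is not a finite number also means a stray string can never end up inside the SQL text.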

Step 2: Geocoding User Input

Users don't think in coordinates—they think "Portland, Oregon" or "near this road."

I integrated Google's Geocoding API:

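const geocodeAddress = async (address) => {
  // fetch() has no `params` option, so build the query string explicitly.
  const query = new URLSearchParams({ address, key: GOOGLE_API_KEY });
  const response = await fetch(
    `https://maps.googleapis.com/maps/api/geocode/json?${query}`
  );
  // The JSON body has to be read from the Response before accessing results.
  const data = await response.json();
  if (data.status !== 'OK') {
    throw new Error(`Geocoding failed: ${data.status}`);
  }
  return data.results[0].geometry.location; // { lat, lng }
};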

Once I had coordinates, I could fetch soil data.
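
The Geocoding API reports failures through a `status` field in the JSON body, so an address that matches nothing comes back as `ZERO_RESULTS` with an empty `results` array. A minimal sketch of handling that, assuming a hypothetical `extractLocation` helper and trimmed-down sample payloads:

```javascript
// Hypothetical helper: pulls { lat, lng } out of a Geocoding API JSON payload,
// returning null when the address could not be resolved.
function extractLocation(payload) {
  if (!payload || payload.status !== 'OK') return null;
  const first = Array.isArray(payload.results) ? payload.results[0] : undefined;
  const location = first?.geometry?.location;
  if (!location || !Number.isFinite(location.lat) || !Number.isFinite(location.lng)) {
    return null;
  }
  return { lat: location.lat, lng: location.lng };
}

// Sample payloads for illustration only; real responses carry many more fields.
const found = {
  status: 'OK',
  results: [{ geometry: { location: { lat: 45.5152, lng: -122.6784 } } }]
};
const missing = { status: 'ZERO_RESULTS', results: [] };

console.log(extractLocation(found));   // { lat: 45.5152, lng: -122.6784 }
console.log(extractLocation(missing)); // null
```

Returning `null` instead of throwing lets the UI show a "couldn't find that address" message without a try/catch around every search.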

Step 3: Parsing XML

The USDA returns nested, verbose XML. I parsed it into something useful:

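function parseSoilXML(xml) {
  // DOMParser is a browser API; Node.js needs a separate XML parsing library.
  const parser = new DOMParser();
  const doc = parser.parseFromString(xml, 'text/xml');
  // Malformed XML yields a <parsererror> document instead of throwing.
  if (doc.querySelector('parsererror')) {
    throw new Error('Could not parse soil data XML');
  }
  return {
    soilName: doc.querySelector('muname')?.textContent,
    mapUnit: doc.querySelector('mukey')?.textContent,
    drainageClass: doc.querySelector('drclassdcd')?.textContent,
    waterHoldingCapacity: doc.querySelector('whc')?.textContent
  };
}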
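
Every field that comes out of the parser is either a string or `undefined`, so a small cleanup pass makes the result easier to use downstream. This is a sketch only: `normalizeSoilRecord` is a hypothetical helper, and treating the water-holding value as numeric is an assumption about the data.

```javascript
// Hypothetical post-processing step for the object returned by parseSoilXML.
function normalizeSoilRecord(raw) {
  // Trim strings and turn missing or empty values into null.
  const clean = (value) => {
    const text = typeof value === 'string' ? value.trim() : '';
    return text === '' ? null : text;
  };
  const whcText = clean(raw.waterHoldingCapacity);
  const whc = whcText === null ? null : Number.parseFloat(whcText);
  return {
    soilName: clean(raw.soilName),
    mapUnit: clean(raw.mapUnit),
    drainageClass: clean(raw.drainageClass),
    // Assumes a numeric value; anything unparseable becomes null.
    waterHoldingCapacity: Number.isFinite(whc) ? whc : null
  };
}

// Made-up input values, for illustration only.
console.log(normalizeSoilRecord({
  soilName: '  Example loam ',
  mapUnit: '12345',
  drainageClass: undefined,
  waterHoldingCapacity: '0.18'
}));
```

With this in place, the rest of the app can check for `null` rather than juggling `undefined`, empty strings, and numeric strings.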

Step 4: Enriching with Interpretations

Raw soil data is useless to a gardener. "Loamy silt clay" means nothing without context.

I added human-readable interpretations:

const soilInterpretations = {
  'loamy_silt_clay': {
    description: 'Loam with silt and clay',
    drainageRating: 'Moderately well drained',
    waterRetention: 'High—retains moisture well',
    bestFor: ['vegetables', 'perennials'],
    amendments: 'Add compost for structure',
    phRange: '6.0-7.5',
    recommendations: [
      'Works well for most vegetables',
      'May compact if worked when wet',
      'Good water retention in dry seasons'
    ]
  }
};

Now when users search their address, they get actionable advice, not taxonomy codes.
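
Looking up an interpretation means turning a display name into a table key and coping with soils that have no entry yet. A minimal sketch, assuming hypothetical helpers `toInterpretationKey` and `interpretSoil` and a one-entry table shaped like the one above:

```javascript
// Assumed shape: same as the interpretations table, trimmed to one entry.
const interpretations = {
  loamy_silt_clay: {
    description: 'Loam with silt and clay',
    bestFor: ['vegetables', 'perennials']
  }
};

// Hypothetical helper: "Loamy Silt Clay" -> "loamy_silt_clay".
function toInterpretationKey(textureName) {
  return textureName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse spaces and punctuation into underscores
    .replace(/^_+|_+$/g, '');    // drop leading or trailing underscores
}

// Returns the matching entry, or a generic fallback when none exists.
function interpretSoil(textureName, table = interpretations) {
  const key = toInterpretationKey(textureName);
  if (Object.prototype.hasOwnProperty.call(table, key)) return table[key];
  return { description: textureName, bestFor: [], note: 'No interpretation available yet' };
}

console.log(interpretSoil('Loamy Silt Clay').description); // Loam with silt and clay
console.log(interpretSoil('Unknown soil').note);           // No interpretation available yet
```

The fallback keeps the page useful for soils that haven't been written up: the user still sees the raw name instead of an error.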

The Performance Challenge

Geocoding + USDA API calls are slow. Full roundtrip was 3-4 seconds.

Optimizations:

  1. Caching: Store coordinates and soil queries for 24h
  2. Async rendering: Show map immediately, load soil data in background
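  3. Precomputed data: Pre-cache top 100 cities

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const cachedSoilData = new Map();
const getSoilData = async (coords) => {
  // Rounding to 4 decimal places groups lookups within roughly 11 m of each other.
  const key = `${coords.lat.toFixed(4)},${coords.lng.toFixed(4)}`;
  const cached = cachedSoilData.get(key);
  // Serve from cache only while the entry is younger than the TTL.
  if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) return cached.data;
  const data = await fetchSoilData(coords.lat, coords.lng);
  cachedSoilData.set(key, { data, timestamp: Date.now() });
  return data;
};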
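
Pre-caching popular cities can be a simple loop run at startup or on a schedule. A minimal sketch, assuming a hypothetical `prewarmCache` function and a city list shaped like `{ name, lat, lng }`; the lookup function is passed in, so any cached getter will do:

```javascript
// Hypothetical pre-warming routine. `getSoil` is any async function that
// accepts { lat, lng }, for example a cached soil lookup.
async function prewarmCache(cities, getSoil) {
  const failed = [];
  // Run lookups one at a time to avoid hammering the upstream service.
  for (const city of cities) {
    try {
      await getSoil({ lat: city.lat, lng: city.lng });
    } catch (err) {
      // Remember failures so they can be retried later.
      failed.push(city.name);
    }
  }
  return { warmed: cities.length - failed.length, failed };
}

// Example with a stand-in lookup; the coordinates are approximate.
const demoCities = [{ name: 'Portland, OR', lat: 45.5152, lng: -122.6784 }];
prewarmCache(demoCities, async () => ({})).then((summary) => {
  console.log(summary); // { warmed: 1, failed: [] }
});
```

Sequential requests are slower than firing everything at once, but for a background job that is a reasonable trade for being gentle on a free public API.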

What Surprised Me

  1. The data is surprisingly fine-grained. Two addresses 100 m apart can show different soil types
  2. People care about their soil. Expected gardeners; got farmers, homeowners, and consultants
  3. USDA API docs are sparse. Took days to figure out the correct XML format

Lessons Learned

  1. Government data is free but clunky. Expect verbose formats and slow APIs
  2. Interpretation matters more than raw data. People need context, not codes
  3. Caching is essential for third-party API integration
  4. Map interfaces are powerful. Click-on-map was more intuitive than searching

Head to MySoilType, enter your address, and discover what's beneath your feet.
