DEV Community

Cover image for PostGIS Geometry Quality: Invalid Geometries, Mixed SRIDs, and Complexity
Philip McClarence
Philip McClarence

Posted on

PostGIS Geometry Quality: Invalid Geometries, Mixed SRIDs, and Complexity

Why Your PostGIS Spatial Queries Return Wrong Results (and How to Fix It)

PostGIS spatial functions assume their input geometries are valid. When they're not, things go wrong quietly. ST_Intersection returns NULL instead of a result. ST_Area reports negative values. Spatial joins silently drop matching rows. You don't get an error -- you get incomplete or wrong results that look plausible.

This article covers the three geometry quality problems that cause these failures and how to fix each one.

Problem 1: Invalid Geometries

A geometry is "invalid" under the OGC Simple Features specification when it has structural defects: self-intersecting polygon rings, duplicate consecutive points, or unclosed rings. These defects cause spatial operations to behave unpredictably.

The insidious part is that different PostGIS functions fail differently on invalid input. ST_Contains might return FALSE for a point that's clearly inside a polygon. ST_Union might throw a TopologyException. ST_Buffer might return an empty geometry. You often don't discover the problem until a specific query hits a specific invalid geometry.

Find them:

-- Count invalid geometries
SELECT count(*) AS invalid_count
FROM points_of_interest
WHERE NOT ST_IsValid(geom);

-- Get the specific reason for each invalid geometry
SELECT id, ST_IsValidReason(geom) AS reason
FROM points_of_interest
WHERE NOT ST_IsValid(geom)
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

ST_IsValidReason gives you actionable diagnostics: "Self-intersection at POINT (12.5 45.3)" tells you exactly what's wrong and where.

Fix them:

UPDATE points_of_interest
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);
Enter fullscreen mode Exit fullscreen mode

One caveat: ST_MakeValid can change the geometry type. A self-intersecting polygon may become a MULTIPOLYGON or a GEOMETRYCOLLECTION. Run it on a sample first and check the results.

Problem 2: Mixed SRIDs

An SRID (Spatial Reference Identifier) defines the coordinate system for a geometry. SRID 4326 is WGS 84 (latitude/longitude in degrees). SRID 3857 is Web Mercator (meters). If your table contains geometries with different SRIDs, spatial calculations between them produce meaningless results.

The danger: PostGIS doesn't enforce SRID consistency within a column by default. You can INSERT a row with SRID 4326 and another with SRID 3857 into the same column. ST_Distance between them returns a number -- it just means nothing.

Detect:

SELECT DISTINCT ST_SRID(geom) AS srid, count(*)
FROM points_of_interest
GROUP BY ST_SRID(geom);
Enter fullscreen mode Exit fullscreen mode

This should return exactly one row. Multiple rows means mixed coordinate systems.

Fix:

-- Normalize everything to WGS 84
UPDATE points_of_interest
SET geom = ST_Transform(geom, 4326)
WHERE ST_SRID(geom) != 4326;
Enter fullscreen mode Exit fullscreen mode

Problem 3: Overly Complex Geometries

A polygon with 200,000 vertices represents a shape with extreme precision -- often far more than your application needs. Every spatial operation scales with vertex count: ST_Intersects on a 200,000-vertex polygon is orders of magnitude slower than on a 5,000-vertex simplification of the same shape.

These geometries also consume significant memory. A 500,000-vertex polygon can occupy several megabytes in ST_MemSize, and spatial joins involving such geometries dominate query time.

Find them:

SELECT id,
       GeometryType(geom),
       ST_NPoints(geom) AS vertex_count,
       pg_size_pretty(ST_MemSize(geom)) AS memory_size
FROM regions
WHERE ST_NPoints(geom) > 10000
ORDER BY ST_NPoints(geom) DESC
LIMIT 20;
Enter fullscreen mode Exit fullscreen mode

Simplify:

-- Douglas-Peucker simplification
-- 0.001 degrees ≈ 111 meters at the equator for SRID 4326
SELECT id,
       ST_NPoints(geom) AS original_vertices,
       ST_NPoints(ST_Simplify(geom, 0.001)) AS simplified_vertices
FROM regions
WHERE ST_NPoints(geom) > 10000;
Enter fullscreen mode Exit fullscreen mode

If you need to preserve topology (prevent new intersections between adjacent polygons), use ST_SimplifyPreserveTopology instead.

Preventing All Three

The fix order matters: validity first, then SRID normalization, then simplification. But prevention is better than repair.

Add CHECK constraints that reject bad data at INSERT time:

-- Reject invalid geometries
ALTER TABLE points_of_interest
ADD CONSTRAINT enforce_valid_geom CHECK (ST_IsValid(geom));

-- Enforce a single SRID
ALTER TABLE points_of_interest
ADD CONSTRAINT enforce_srid_geom CHECK (ST_SRID(geom) = 4326);
Enter fullscreen mode Exit fullscreen mode

With these constraints in place, any INSERT or UPDATE that stores an invalid geometry or a geometry with the wrong SRID fails immediately with a clear error message. This is drastically better than silently accumulating bad data and finding it months later when a spatial query returns wrong results.

For vertex complexity, add validation in your data ingestion pipeline. Check ST_NPoints on incoming geometries and simplify anything above your threshold before insertion.

Continuous Monitoring

Constraints catch bad data at insert time, but they don't cover data that's already in your tables or data arriving through direct SQL imports, ETL jobs, or third-party integrations. Continuous monitoring fills that gap.

myDBA.dev includes PostGIS-specific health checks that flag invalid geometries, mixed SRIDs, and high-complexity geometries on every collection cycle. The PostGIS monitoring page tracks these metrics over time, making it straightforward to identify whether quality degraded after a specific import or ETL run.


Originally published at mydba.dev/blog/postgis-geometry-quality

Top comments (0)