Sam

Posted on Jun 13

Leveraging PostGIS to Write And Read FlatGeobuf Files

#python #sql #geospatial #flatgeobuf

Post adapted from https://www.openstreetmap.org/user/spwoodcock/diary/402948

Flatgeobuf in Python

Flatgeobuf is an excellent replacement for shapefile, particularly for geospatial data on the web.

With web/javascript being the main target, currently there is no official implementation in Python to read/write data.

Instead devs should most likely use the GDAL Python bindings.

To GDAL or not to GDAL

GDAL is an incredible geospatial library and underpins so much of what we do, including our databases (PostGIS).

However, sometimes it might be a bit heavyweight for what we are trying to achieve.

Installing it as a base system dependency inevitably installs everything - there are no options.

Install size is especially important when building container images, that we want to be as small as possible for distribution.

GDAL in PostGIS

PostGIS uses GDAL for most of it's geospatial processing, including reading and writing various geospatial file formats.

When developing a web application in Python, including a database, there is a good chance you are using Postgis already as part of your software stack.

So today I thought: why not just use the geospatial processing built into PostGIS for reading and writing flatgeobuf data?

This would save having to install GDAL alongside my Python API and reduce container image size significantly.

The solution wasn't super simple, but works quite nicely for my use case!

Database Access

First we need a way to access the database.

An example using SQLAlchemy could be:

from sqlalchemy.engine import create_engine
from sqlalchemy.orm import DeclarativeBase, Session

def get_engine(db: Union[str, Session]):
    """Get engine from existing Session, or connection string.
    If `db` is a connection string, a new engine is generated.
    """
    if isinstance(db, Session):
        return db.get_bind()
    elif isinstance(db, str):
        return create_engine(db)
    else:
        msg = "The `db` variable is not a valid string or Session"
        log.error(msg)
        raise ValueError(msg)

This example allows for an existing connection to be re-used from an endpoint, for example a FastAPI dependency.

The nitty-gritty SQL


from geojson import FeatureCollection
from sqlalchemy.orm import Session

def geojson_to_flatgeobuf(db: Session, geojson: FeatureCollection):
    """From a given FeatureCollection, return a memory flatgeobuf obj."""
    sql = f"""
        DROP TABLE IF EXISTS public.temp_features CASCADE;
        CREATE TABLE IF NOT EXISTS public.temp_features(
            id serial PRIMARY KEY,
            geom geometry
        );
        WITH data AS (SELECT '{geojson}'::json AS fc)
        INSERT INTO public.temp_features (geom)
        SELECT
            ST_AsText(ST_GeomFromGeoJSON(feat->>'geometry')) AS geom
        FROM (
            SELECT json_array_elements(fc->'features') AS feat
            FROM data
        ) AS f;
        WITH thegeom AS
        (SELECT * FROM public.temp_features)
        SELECT ST_AsFlatGeobuf(thegeom.*)
        FROM thegeom;
    """
    # Run the SQL
    result = db.execute(text(sql))
    # Get a memoryview object, then extract to Bytes
    flatgeobuf = result.fetchone()[0].tobytes()
    # Cleanup table
    db.execute(text("DROP TABLE IF EXISTS public.temp_features CASCADE;"))
    return flatgeobuf

The function requires a FeatureCollection geojson.

Now I'm sure this is a much more efficient way to write this by nesting SQL SELECTs, but I was too lazy to debug and I find this approach quite readable, albeit slightly less efficient.

Using the code

An example of using in FastAPI:

import geojson
from app.db.postgis_utils import geojson_to_flatgeobuf
from io import BytesIO

@router.post("/something")
async def something(
    upload_geojson: UploadFile = File(...),
    db: Session = Depends(database.get_db),
):
    json_obj = await upload_geojson.read()
    parsed_geojson = geojson.loads(json_obj)
    flatgeobuf = BytesIO(geojson_to_flatgeobuf(db, parsed_geojson))
    *do something with flatgeobuf file*

Limitations

There is one glaringly obvious limitation of this approach: if reading the FlatGeobuf is implemented in the same way then we lose the benefit of it's 'cloud native' encoding.

Reading requires downloading the entire file, passing to PostGIS, and returning a GeoJSON.

However, that was not the intended purpose of this workaround.

FlatGeobuf is primarily a format meant for browser consumption. With excellent support via the npm package.

So while the backend API can write data to FlatGeobuf without requiring dependencies, the frontend can then read the data if it's hosted somewhere online (i.e. an S3 bucket).

Reading The Data Again in Python

In some cases you may wish to access all of the data, say to convert to a different format.

This is also possible directly in the database.
I ended up writing the reverse query flatgeobuf --> geojson:

async def flatgeobuf_to_geojson( db: Session, flatgeobuf: bytes ) -> Optional[geojson.FeatureCollection]:
    """Converts FlatGeobuf data to GeoJSON.

    Args:
        db (Session): SQLAlchemy db session.
        flatgeobuf (bytes): FlatGeobuf data in bytes format.

    Returns:
        geojson.FeatureCollection: A FeatureCollection object.
    """
    sql = text(
        """
        DROP TABLE IF EXISTS public.temp_fgb CASCADE;

        SELECT ST_FromFlatGeobufToTable('public', 'temp_fgb', :fgb_bytes);

        SELECT jsonb_build_object(
            'type', 'FeatureCollection',
            'features', jsonb_agg(feature)
        ) AS feature_collection
        FROM (
            SELECT jsonb_build_object(
                'type', 'Feature',
                'geometry', ST_AsGeoJSON(fgb_data.geom)::jsonb,
                'properties', fgb_data.properties::jsonb
            ) AS feature
            FROM (
            SELECT *,
                NULL as properties
                FROM ST_FromFlatGeobuf(null::temp_fgb, :fgb_bytes)
            ) AS fgb_data
        ) AS features;
    """
    )

    try:
        result = db.execute(sql, {"fgb_bytes": flatgeobuf})
        feature_collection = result.first()
    except ProgrammingError as e:
        log.error(e)
        log.error(
            "Attempted flatgeobuf --> geojson conversion, but duplicate column found"
        )
        return None

    if feature_collection:
        return geojson.loads(json.dumps(feature_collection[0]))

    return None

There are two steps required.

First a table must be created with fields representing the field types in the flatgeobuf.

Then the data is extracted from the file, using the table type as reference.

This wasn’t very intuitive to me & the PostGIS docs are really lacking here, so I hope this helps someone!

DEV Community

Leveraging PostGIS to Write And Read FlatGeobuf Files

Flatgeobuf in Python

To GDAL or not to GDAL

GDAL in PostGIS

Database Access

The nitty-gritty SQL

Using the code

Limitations

Reading The Data Again in Python

Top comments (0)

Read next

Movie Finder

10 Essential Questions to Ask When Starting with NumPy Data Manipulation

How to Deploy your FastAPI Backend with PostgreSQL Database to Render

Use Gemini Pro Asynchronously in Python