There are plenty of existing libraries for dealing with GeoPackage (GPKG). GDAL is the undisputed champion of the geospatial world. If your target is web browsers, there is GeoPackage JS. Rust? We have gpkg-rs! But, unfortunately, sometimes you still have to write your own for various reasons.
In my case, the main reason was WebAssembly support. I'm building a browser-side converter for Japanese geospatial data, which converts Shapefiles to GeoParquet and GeoJSON. To add GPKG to that list, I needed a Rust crate for it, but gpkg-rs doesn't support Wasm.
repository: https://github.com/yutannihilation/ksj2gp
To be fair, it's not gpkg-rs's fault. rusqlite, the underlying crate for handling SQLite, only gained Wasm support last year, while gpkg-rs was implemented several years ago.
If I only wanted Wasm support, the easiest route would have been to fork gpkg-rs and update its rusqlite dependency. However, I had two additional ambitions that convinced me to build a new crate from the ground up: rusqlite-gpkg.
- Support 3D geometries by using geo-traits instead of geo-types
- Write spatial index
You can check out the repository here:
yutannihilation/rusqlite-gpkg: GeoPackage reader/writer built on top of rusqlite.
Overview
rusqlite-gpkg provides a small API around the main GeoPackage concepts:
- Gpkg represents the whole GeoPackage data.
- GpkgLayer represents a single layer in the data.
- GpkgFeature represents a single feature in the layer.
- Value represents a single property value related to the feature.
Apache Arrow support is available behind the arrow feature flag.
You can find some example code at the bottom of this README.
The library focuses on simple, explicit flows. You control how layers are created and which property columns are present.
Browser usage (to_bytes / from_bytes)
Web environments often cannot access files directly (OPFS can be used by rusqlite, but this crate does not currently expose a way to enable it). In those cases, the recommended workflow is to serialize a GeoPackage to bytes.
Enough with the backstory. Let's get into the heart of the matter: what does it actually feel like to write a GeoPackage library in 2026?
(Disclaimer: this post discusses GPKG as a vector data format. I have not yet explored how it works as a raster data format.)
The Age of LLMs
No "writing a library in 2026" post can skip mentioning AI. I actually used Codex a lot, especially for adding tests and documentation. If I were an LLM expert, I would have inserted lengthy advice here. Fortunately, I'm so inexperienced with LLMs that we can skip this boring part and save some time. :)
GPKG Is SQLite
Did you know GPKG is built on top of SQLite? More specifically, did you know a .gpkg file is just a plain SQLite database? There is no magic. If you read the specification, you'll find that GPKG is designed to be "portable" in the sense that it only requires SQLite to read the data.
Unlike other geospatial formats like GeoParquet, GPKG's specification doesn't define a binary file format (except for the geometry column). Instead, it defines the SQL statements that create the tables and triggers.
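Because the spec is "just SQL," any SQLite client can create and inspect these tables. A quick sketch with Python's stdlib sqlite3 module (the schema is copied from the spec's gpkg_spatial_ref_sys definition; no GPKG-specific tooling is involved):

```python
import sqlite3

# The GeoPackage spec prescribes table definitions as SQL, not a binary
# layout, so a plain SQLite client is all we need.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gpkg_spatial_ref_sys (
        srs_name TEXT NOT NULL,
        srs_id INTEGER NOT NULL PRIMARY KEY,
        organization TEXT NOT NULL,
        organization_coordsys_id INTEGER NOT NULL,
        definition TEXT NOT NULL,
        description TEXT
    )
""")

# sqlite_master shows the table exists like any other SQLite table.
names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(names)  # ['gpkg_spatial_ref_sys']
```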
There are pros and cons. On the positive side, we don't need to implement a parser for a binary format! We can simply rely on an existing SQLite client or library. If you have ever implemented a parser for a binary format, you know how much simpler this makes the code.
However, this also means we need an SQLite library as a dependency; I cannot imagine writing a full SQLite reader (and writer) from scratch. This makes the build harder, especially in Rust's case. Turso is a promising pure-Rust implementation of SQLite, but it's not ready at the moment. As explained above, rusqlite is ready to use, but it depends on the C library libsqlite3 and needs to bundle it in many cases.
Writer
Reading a GPKG file can be done without any geospatial knowledge. I mean, the data eventually needs to be interpreted by some other library that knows geospatial concepts, but the reader can simply extract and pass along the WKB binary and metadata such as the SRID.
Writing data into a GPKG file is, however, a bit more complicated. The writer cannot be ignorant of geospatial concepts, in at least two respects.
SRID and WKT
The gpkg_spatial_ref_sys table contains information about the CRSs used by the layers. The specification says these columns are mandatory:
- srs_id: Unique identifier for each Spatial Reference System within a GeoPackage
- definition: Well-known Text Representation of the Spatial Reference System
So, at the very least, the writer needs to know the SRID and the WKT representation of the CRS of the data it is trying to write. This is not easy. In practice, it effectively requires a PROJ dependency if the writer wants to support arbitrary SRIDs.
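Concretely, the writer has to produce a row like the one below (sketched with Python's stdlib sqlite3; the EPSG:4326 WKT string here is abbreviated, and a real writer would need PROJ or a lookup table to produce such strings for arbitrary SRIDs):

```python
import sqlite3

# A writer must supply both the numeric srs_id and the WKT definition.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gpkg_spatial_ref_sys (
        srs_name TEXT NOT NULL,
        srs_id INTEGER NOT NULL PRIMARY KEY,
        organization TEXT NOT NULL,
        organization_coordsys_id INTEGER NOT NULL,
        definition TEXT NOT NULL,
        description TEXT
    )
""")

# Abbreviated WKT for EPSG:4326 -- producing this for an arbitrary
# SRID is exactly the hard part discussed above.
wgs84_wkt = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",'
    'SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
)
conn.execute(
    "INSERT INTO gpkg_spatial_ref_sys VALUES (?, ?, ?, ?, ?, ?)",
    ("WGS 84", 4326, "EPSG", 4326, wgs84_wkt, None),
)
definition, = conn.execute(
    "SELECT definition FROM gpkg_spatial_ref_sys WHERE srs_id = 4326"
).fetchone()
```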
For comparison, consider GeoArrow's specification: GeoArrow allows various kinds of CRS representations. This is easy for writers. On the other hand, it shifts the responsibility of interpreting those representations to the reader's side. But a reader can still pass the CRS representation along without interpreting it, so I think this is the better strategy.
ST_ Functions
A GPKG file includes spatial indices built on SQLite’s R-tree extension. The table definitions contain triggers that update the corresponding index. Here's an example:
CREATE TABLE IF NOT EXISTS "points"(
"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
"geom" POINT,
"name" TEXT,
"elevation" REAL,
"active" BOOLEAN,
"category" TEXT,
"note" TEXT
);
CREATE TRIGGER "rtree_points_geom_insert" AFTER INSERT ON "points" WHEN (new."geom" NOT NULL AND NOT ST_IsEmpty(NEW."geom")) BEGIN INSERT OR REPLACE INTO "rtree_points_geom" VALUES (NEW."id",ST_MinX(NEW."geom"), ST_MaxX(NEW."geom"),ST_MinY(NEW."geom"), ST_MaxY(NEW."geom")); END;
CREATE TRIGGER "rtree_points_geom_update6" AFTER UPDATE OF "geom" ON "points" WHEN OLD."id" = NEW."id" AND (NEW."geom" NOTNULL AND NOT ST_IsEmpty(NEW."geom")) AND (OLD."geom" NOTNULL AND NOT ST_IsEmpty(OLD."geom")) BEGIN UPDATE "rtree_points_geom" SET minx = ST_MinX(NEW."geom"), maxx = ST_MaxX(NEW."geom"),miny = ST_MinY(NEW."geom"), maxy = ST_MaxY(NEW."geom") WHERE id = NEW."id";END;
CREATE TRIGGER "rtree_points_geom_update7" AFTER UPDATE OF "geom" ON "points" WHEN OLD."id" = NEW."id" AND (NEW."geom" NOTNULL AND NOT ST_IsEmpty(NEW."geom")) AND (OLD."geom" ISNULL OR ST_IsEmpty(OLD."geom")) BEGIN INSERT INTO "rtree_points_geom" VALUES (NEW."id",ST_MinX(NEW."geom"), ST_MaxX(NEW."geom"),ST_MinY(NEW."geom"), ST_MaxY(NEW."geom")); END;
CREATE TRIGGER "rtree_points_geom_update2" AFTER UPDATE OF "geom" ON "points" WHEN OLD."id" = NEW."id" AND (NEW."geom" ISNULL OR ST_IsEmpty(NEW."geom")) BEGIN DELETE FROM "rtree_points_geom" WHERE id = OLD."id"; END;
CREATE TRIGGER "rtree_points_geom_update5" AFTER UPDATE ON "points" WHEN OLD."id" != NEW."id" AND (NEW."geom" NOTNULL AND NOT ST_IsEmpty(NEW."geom")) BEGIN DELETE FROM "rtree_points_geom" WHERE id = OLD."id"; INSERT OR REPLACE INTO "rtree_points_geom" VALUES (NEW."id",ST_MinX(NEW."geom"), ST_MaxX(NEW."geom"),ST_MinY(NEW."geom"), ST_MaxY(NEW."geom")); END;
CREATE TRIGGER "rtree_points_geom_update4" AFTER UPDATE ON "points" WHEN OLD."id" != NEW."id" AND (NEW."geom" ISNULL OR ST_IsEmpty(NEW."geom")) BEGIN DELETE FROM "rtree_points_geom" WHERE id IN (OLD."id", NEW."id"); END;
CREATE TRIGGER "rtree_points_geom_delete" AFTER DELETE ON "points" WHEN old."geom" NOT NULL BEGIN DELETE FROM "rtree_points_geom" WHERE id = OLD."id"; END;
CREATE TRIGGER "trigger_insert_feature_count_points" AFTER INSERT ON "points" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count + 1 WHERE lower(table_name) = lower('points'); END;
CREATE TRIGGER "trigger_delete_feature_count_points" AFTER DELETE ON "points" BEGIN UPDATE gpkg_ogr_contents SET feature_count = feature_count - 1 WHERE lower(table_name) = lower('points'); END;
If you scroll through the SQL definitions, you will notice several ST_ functions (e.g. ST_IsEmpty). You might assume these functions are automatically provided by SQLite.
Surprisingly, they are not. Just because they are used in the table definition does not mean they exist. We must provide the following functions ourselves:
- ST_IsEmpty
- ST_MinX
- ST_MinY
- ST_MaxX
- ST_MaxY
This means the writer needs to parse WKB and iterate over coordinates. It is not extremely difficult, but it is an extra responsibility.
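Python's sqlite3 module exposes the same user-defined-function mechanism that rusqlite does, so the idea can be sketched like this. Note that a plain WKB point is parsed here; a real GeoPackage geometry blob additionally carries a "GP" header before the WKB that would have to be skipped first:

```python
import sqlite3
import struct

# SQLite does not ship ST_* functions; the library must register them.
def st_min_x(wkb: bytes) -> float:
    byte_order = wkb[0]                # 1 = little-endian, 0 = big-endian
    endian = "<" if byte_order == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    assert geom_type == 1              # 1 = Point in WKB
    return x                           # for a point, min X is just X

conn = sqlite3.connect(":memory:")
conn.create_function("ST_MinX", 1, st_min_x)

# Little-endian WKB for POINT (139.69 35.69)
wkb = struct.pack("<BIdd", 1, 1, 139.69, 35.69)
minx, = conn.execute("SELECT ST_MinX(?)", (wkb,)).fetchone()
print(minx)  # 139.69
```

Once registered, the triggers above can call the function just like a built-in, which is how the R-tree index stays in sync on insert and update.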
Writing an SQLite Database Actually Writes Multiple Files
If you have worked with a GPKG file, you may have noticed additional files such as .gpkg-wal and .gpkg-shm created when you open a .gpkg file. This adds complexity.
This even affects the API design of the SQLite library: rusqlite's Connection::open() takes a Path, and there is no variant that accepts impl std::io::Read.
pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>
In other words, reading and writing an SQLite database requires a filesystem, not just a file handle. This is a problem in web browsers, where a normal filesystem does not exist. In theory, sqlite-wasm-rs, the crate behind rusqlite's Wasm support, should support OPFS, but enabling it was not straightforward (sqlite_wasm_vfs::sahpool::install() is async while the other APIs are sync).
Compared to other cloud-native formats that work as a single file, this feels like a disadvantage.
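The sidecar files are easy to observe with Python's stdlib sqlite3 (the .gpkg extension here is just a filename; any SQLite database behaves the same way):

```python
import os
import sqlite3
import tempfile

# Opening a file-backed SQLite database in WAL mode creates -wal and
# -shm sidecar files next to it, which is exactly what you see as
# .gpkg-wal / .gpkg-shm next to a .gpkg file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.gpkg")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    # While the connection is open, the sidecar files exist on disk.
    sidecars = sorted(
        f for f in os.listdir(tmp) if f.startswith("example.gpkg-")
    )
    conn.close()
print(sidecars)  # ['example.gpkg-shm', 'example.gpkg-wal']
```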
So, How Does It Feel?
Honestly, not very comfortable.
The need for geospatial logic in the writer and the requirement for a filesystem both make GPKG harder to use in modern, browser-based environments. Compared to other cloud-native formats available in 2026, it feels less flexible.
That said, GPKG still has clear strengths, especially for desktop use. And as SQLite is becoming cool again, I believe the ecosystem will improve (especially Turso)!