Chris Karvouniaris

Originally published at levelup.gitconnected.com

Google Cloud Datastore Deserves a Better Python DX: Introducing google-cloud-datastore-odm

Intro

Google Cloud Datastore is a serverless, fully managed NoSQL document database that scales horizontally without configuration. For teams building on Google Cloud, particularly those running workloads on App Engine or Cloud Run, it's a natural fit: no clusters to manage, no connection pools to tune, and strong consistency guarantees backed by Google's infrastructure.

If you've been around Google Cloud long enough, you probably remember the old App Engine NDB library. For its time, NDB was genuinely delightful: you declared your models with expressive Python classes, queried them with clean syntax, and got automatic in-memory caching almost for free. The developer experience was the closest thing the Google Cloud ecosystem had to Django's ORM or SQLAlchemy for NoSQL.

Then came the great migration. App Engine's second generation dropped the NDB runtime, and the recommended path forward became google-cloud-ndb (a Python 3 port) or google-cloud-datastore (the raw modern SDK). Both are maintained. Both work. And both, in different ways, leave something to be desired.

The Problem with Existing Libraries

Let's be honest about the current state of Python libraries for Google Cloud Datastore.

google-cloud-ndb

The successor to the legacy NDB library was built primarily to provide a migration path, and it does that reasonably well. But it also carries a lot of architectural baggage.

It relies heavily on a thread-local context cache that made sense in the synchronous, single-threaded App Engine runtime of a decade ago, but becomes problematic in async web frameworks like FastAPI. If context boundaries are not handled carefully, stale cache entries and unexpected state sharing can occur between requests. The API also depends on patterns like global context managers that feel increasingly out of place in modern Python applications.
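
To make that concrete, here's a minimal sketch of what the per-request context dance looks like with google-cloud-ndb in a FastAPI app (the route and model are illustrative, not from the library's docs):

from fastapi import FastAPI
from google.cloud import ndb

app = FastAPI()
ndb_client = ndb.Client()

class Article(ndb.Model):
    title = ndb.StringProperty()

@app.get("/articles/{article_id}")
def get_article(article_id: int):
    # Every request has to open its own NDB context; forgetting this, or
    # sharing one across concurrent tasks, is where stale cache entries
    # and cross-request state leak in.
    with ndb_client.context():
        article = Article.get_by_id(article_id)
        return {"title": article.title if article else None}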

Beyond the architectural concerns, google-cloud-ndb also hasn't fully embraced newer Datastore capabilities. Server-side aggregation queries such as COUNT, SUM, and AVG were introduced to Datastore in recent years and are genuinely useful for production workloads. With google-cloud-ndb, support for these features is limited, often pushing developers toward workarounds like keys-only queries and application-side aggregation.
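
In practice, that workaround usually looks something like this sketch (model and property names are illustrative):

from google.cloud import ndb

class SaleRecord(ndb.Model):
    price = ndb.IntegerProperty()

client = ndb.Client()

with client.context():
    # Counting via a keys-only scan instead of a server-side COUNT
    total_sales = len(SaleRecord.query().fetch(keys_only=True))

    # Summing by pulling every entity into memory and aggregating in Python
    total_revenue = sum(r.price or 0 for r in SaleRecord.query().fetch())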

google-cloud-datastore

The raw modern SDK is thorough, stable, and actively maintained, but intentionally low-level. You work directly with Entity objects, which are essentially dictionaries with attached keys. There's no schema declaration, no type enforcement, no query DSL, and no IDE-friendly typing experience.

Queries are assembled manually through filter objects. Here's a minimal example of saving and querying an entity with the raw SDK:

from google.cloud import datastore

client = datastore.Client()

# Saving an entity
key = client.key("Article", "my-first-post")

entity = datastore.Entity(key=key)
entity.update({
  "title": "Hello World",
  "author": "Alice",
  "score": 10,
  "is_published": True
})
client.put(entity)

# Querying
query = client.query(kind="Article")
query.add_filter(
    filter=datastore.query.PropertyFilter(
        "is_published", "=", True
    )
)
query.add_filter(
    filter=datastore.query.PropertyFilter(
        "score", ">=", 5
    )
)

query.order = ["-score"]

results = list(query.fetch(limit=10))

This is perfectly fine for scripts or small utilities. But in a production application with dozens of entity kinds, validation rules, multiple contributors, and evolving schemas, it quickly becomes a maintenance burden.

There's no single place where your data contract lives. No protection against typos in property names. No strong typing that your IDE can actually understand. And no abstraction layer for common patterns like validation, lifecycle hooks, or transactions.

The gap between "I want to build something with Datastore" and "I have a maintainable, production-grade data layer" is real.

That's the gap this library attempts to close.

Where This ODM Steps In

An Object Document Mapper (ODM) does for document databases what an ORM does for relational ones: it maps your application's objects to the database's native storage format, abstracting away the low-level API while still keeping you close enough to the metal when needed.

google-cloud-datastore-odm is a modern, fully typed Python library (Python 3.10+) built on top of google-cloud-datastore.

Its design takes direct inspiration from NDB's developer ergonomics, such as declarative models, operator-overloaded queries, and lifecycle hooks, but rebuilds them on modern foundations: strict type annotations, descriptor-based properties, AST-driven query construction, explicit validation layers, and cleaner runtime semantics.
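
To give a flavor of what "operator-overloaded queries" and "descriptor-based properties" mean in practice, here's a purely conceptual sketch of the technique (not the library's actual internals): comparing against a class-level property builds a filter node instead of returning a boolean.

# Conceptual sketch only: illustrates the operator-overloading technique,
# not the ODM's real internals or API.
from dataclasses import dataclass

@dataclass
class FilterNode:
    prop: str
    op: str
    value: object

class Property:
    def __set_name__(self, owner, name):
        self._name = name

    def __eq__(self, other):
        # Comparing against the class attribute builds an AST node, not a bool
        return FilterNode(self._name, "=", other)

    def __ge__(self, other):
        return FilterNode(self._name, ">=", other)

class Article:
    score = Property()

node = Article.score >= 5
print(node)  # FilterNode(prop='score', op='>=', value=5)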

The philosophy is simple: 

  • Your data model should be readable Python code, not a collection of dictionary accesses scattered across your codebase.
  • Google Cloud Datastore is a genuinely good database. It deserves a Python interface that matches.

Install it:

pip install google-cloud-datastore-odm

GitHub repo: https://github.com/trebbble/google-cloud-datastore-odm
Documentation: https://trebbble.github.io/google-cloud-datastore-odm/

Key Features and Basic Usage

Declarative Models with a Rich Property System

The core of the library is the Model base class. You define your entity schema by declaring properties as class-level descriptors, the same pattern NDB used, now rebuilt with proper typing support and modern validation semantics:

import datetime
from google.cloud.datastore.helpers import GeoPoint

from google_cloud_datastore_odm import (
    Model,
    StringProperty,
    IntegerProperty,
    FloatProperty,
    BooleanProperty,
    TextProperty,
    DateTimeProperty,
    DateProperty,
    TimeProperty,
    JsonProperty,
    PickleProperty,
    BytesProperty,
    GenericProperty,
    KeyProperty,
    StructuredProperty,
    ComputedProperty,
    GeoPtProperty,
)

class Address(Model):
    city = StringProperty()
    country = StringProperty()

class Article(Model):
    # Primitives
    title = StringProperty(required=True)
    word_count = IntegerProperty(default=0)
    score = FloatProperty()
    is_published = BooleanProperty(default=False)

    # Large text (automatically unindexed)
    body = TextProperty(compressed=True)

    # Dates & Times
    created_at = DateTimeProperty(
        auto_now_add=True,
        tzinfo=datetime.timezone.utc
    )

    updated_at = DateTimeProperty(
        auto_now=True,
        tzinfo=datetime.timezone.utc
    )

    publish_date = DateProperty()

    # Complex structures
    metadata = JsonProperty(compressed=True)
    raw_payload = BytesProperty()
    legacy_object = PickleProperty()
    schemaless_data = GenericProperty()

    # Relational & structured
    author_key = KeyProperty("Author")
    location = StructuredProperty(Address)
    coordinates = GeoPtProperty()

    # Computed property
    @ComputedProperty
    def read_time(self):
        return max(1, self.word_count // 200) if self.word_count else 0

The available property types cover the full spectrum of Datastore-supported values while also preserving much of what developers appreciated from NDB, with some cleanup, consolidation, and modernization along the way.

Basic CRUD is exactly what you'd expect:

# Create
article = Article(
    title="Getting Started with Datastore ODM",
    word_count=800
)

article.put()

# Read by ID
fetched = Article.get_by_id(article.key.id)

# Update
fetched.score = 9.5
fetched.put()

# Delete
fetched.delete()

# Bulk operations
Article.put_multi([article1, article2, article3])
Article.get_multi([key1, key2, key3])
Article.delete_multi([key1, key2])

Flexible validation at field and model level

NDB provided basic Python type validation, inline property validators, and hooks. All of that is covered and extended here with modern syntax for field and model validators, prioritizing clarity, reusability, and extensibility:

  • Property-Level Constraints: Built-in checks like required=True, choices=[...], and strict type enforcement.
  • Field-Level Validators: Custom functions that run immediately whenever a specific property is assigned a value.
  • Model-Level Validators: Complex, cross-property checks that run right before the entity is saved to the database.
  • Lifecycle hooks: Pre and post hooks for all read, write and delete operations.

from google_cloud_datastore_odm import (
    Model, StringProperty, IntegerProperty, 
    field_validator, model_validator
)

# A reusable inline validator
def no_emoji_allowed(value: str) -> str:
    for char in value:
        if ord(char) > 127:
            raise ValueError(f"Value '{value}' contains non-ASCII characters.")
    return value

class Article(Model):
    # Built-in Constraints & Inline Validators
    title = StringProperty(required=True)
    status = StringProperty(default="draft", choices=["draft", "published"])
    clean_notes = StringProperty(validators=[no_emoji_allowed])
    word_count = IntegerProperty(default=0)

    # Field-Level Validators
    @field_validator("title")
    def validate_title(self, value: str) -> str:
        if len(value) < 3 or len(value) > 200:
            raise ValueError("Title must be between 3 and 200 characters.")
        return value.strip() # You can modify and clean data here!

    # Model-Level Validators
    @model_validator
    def validate_published_requires_content(self):
        if self.status == "published" and (self.word_count or 0) == 0:
            raise ValueError("A published article must have a word count > 0")

class TrackedTask(Model):
    description = StringProperty()

    # --- Write Hooks (Instance Methods) ---
    def _pre_put_hook(self):
        print(f"Preparing to save task: {self.description}")
        # Ideal place to update 'last_modified' timestamps manually

    def _post_put_hook(self):
        print(f"Successfully saved task with ID: {self.key.id_or_name}")
        # Ideal place to trigger event-driven webhooks

    # --- Read Hooks (Class Methods) ---
    @classmethod
    def _pre_get_hook(cls, key):
        print(f"Preparing to fetch key: {key.id_or_name}")

    @classmethod
    def _post_get_hook(cls, key, instance):
        print(f"Fetched key: {key.id_or_name}. Found instance? {instance is not None}")
        # Ideal place to decrypt sensitive fields coming from the DB

    # --- Delete Hooks (Class Methods) ---
    @classmethod
    def _pre_delete_hook(cls, key):
        print(f"Preparing to delete key: {key.id_or_name}")

    @classmethod
    def _post_delete_hook(cls, key):
        print(f"Successfully deleted key: {key.id_or_name}")
        # Ideal place to remove associated files from Cloud Storage

An Expressive, Strongly-Typed Query Builder

Rather than constructing property filters manually, queries are built directly through Python comparison operators on model properties.

Under the hood, the ODM constructs an AST and translates it into native Datastore query objects at execution time:

from google_cloud_datastore_odm import OR, AND

# Simple filter
q = Article.query().filter(
    Article.status == "published"
)

# Multiple conditions
q = Article.query().filter(
    Article.author == "Alice",
    Article.score >= 5
)

# Explicit OR / AND logic
q = Article.query().filter(
    OR(
        AND(
            Article.status == "published",
            Article.score > 10
        ),
        Article.author == "Alice"
    )
)

# Bitwise syntax
q = Article.query().filter(
    (
        (Article.author == "Alice")
        & (Article.status == "draft")
    )
    | (Article.score >= 4)
)

# IN / NOT_IN
q = Article.query().filter(
    Article.status.IN(["draft", "archived"])
)

# Ordering and limiting
articles = list(
    Article.query()
    .order(-Article.score, Article.title)
    .fetch(limit=10)
)

# First result only
best = (
    Article.query()
    .order(-Article.score)
    .get()
)

Cursor-based pagination is also supported, which matters for any API endpoint dealing with large collections:

page_query = Article.query().order(Article.title)

cursor = None
while True:
    # Fetches up to 20 items and returns the next cursor
    page, cursor, has_more = page_query.fetch_page(page_size=20, start_cursor=cursor)

    for article in page:
        print(article.title)

    if not has_more:
        break

Deep queries into nested StructuredProperty models work through natural dot-notation:

from google_cloud_datastore_odm import StructuredProperty

class Address(Model):
    city = StringProperty()

class User(Model):
    location = StructuredProperty(Address)

# Query deeply into the embedded entity
q = User.query().filter(User.location.city == "Athens")

Modern Transactions

Transactions use native Python context managers and support automatic retries with exponential backoff for optimistic concurrency conflicts:

from google_cloud_datastore_odm import Model, IntegerProperty, transaction

class LedgerAccount(Model):
    balance = IntegerProperty(default=0)

alice = LedgerAccount.get_by_id("alice")
bob = LedgerAccount.get_by_id("bob")

try:
    with transaction():
        # Read from the transaction snapshot
        alice = LedgerAccount.get(alice.key)
        bob = LedgerAccount.get(bob.key)

        # Mutate memory state
        alice.balance -= 50
        bob.balance += 50

        if alice.balance < 0:
            raise ValueError("Insufficient funds!")

        LedgerAccount.put_multi([alice, bob])

except Exception as e:
    print(f"Transaction aborted: {e}")

Bonus feature: Multi-Tenancy as a First-Class Feature

For SaaS applications, multi-tenancy often becomes an architectural concern only after the system has already grown complicated.

The ODM treats it as a first-class capability.
Every model can declare routing defaults through its Meta configuration, and those defaults can be overridden dynamically at runtime:

from google_cloud_datastore_odm import (
    Model,
    StringProperty
)

class SystemLog(Model):
    event = StringProperty()
    user_id = StringProperty()

    class Meta:
        kind = "AuditLog"
        project = "central-logging-system"
        database = "db-1"
        namespace = "system-events"

log = SystemLog(
    event="Startup",
    user_id="system"
)

log.put()

def log_user_action(
    tenant_id: str,
    action: str,
    user: str
):
    tenant_log = SystemLog(
        event=action,
        user_id=user,
        namespace=tenant_id,
        database="customer-db"
    )

    tenant_log.put()

log_user_action("tenant-a", "Login", "alice")
log_user_action("tenant-b", "Download", "bob")

Queries respect the same routing boundaries:

# Default partition
central_logs = SystemLog.query().fetch()

# Tenant-specific partition
customer_b_logs = SystemLog.query(
    namespace="tenant-b",
    database="customer-db"
).fetch()

Since Datastore namespaces are fully isolated at the storage layer, cross-tenant leakage becomes structurally difficult rather than merely conventionally discouraged.

Taking the Best from Both Worlds

The ODM intentionally borrows strengths from both google-cloud-ndb and google-cloud-datastore, while avoiding many of their long-standing tradeoffs.

From google-cloud-ndb: the developer experience people actually loved

The declarative model syntax, operator-overloaded query filters, lifecycle hooks, and overall "Pythonic" feel of NDB were genuinely productive abstractions. That familiarity is preserved intentionally.

If you've worked with NDB before, the transition should feel natural rather than disruptive.

At the same time, some legacy assumptions were deliberately left behind.

The implicit thread-local context cache, for example, made sense in the original App Engine runtime model, but becomes increasingly fragile in modern async applications. In frameworks like FastAPI, hidden cross-request state is more liability than convenience. The ODM avoids implicit caching entirely and instead plans to expose explicit cache integration hooks so applications can choose their own caching strategy safely and transparently.

Another important shift is around transactions and entity groups.

Historically, Datastore developers had to carefully design ancestor hierarchies around strict entity group write limitations. Modern Cloud Datastore has relaxed many of those historical constraints, and the ODM does not artificially preserve outdated patterns that no longer reflect the platform's capabilities.

Migration from NDB should be a straightforward path. It is already documented and will be enriched as the library evolves.
Questions, migration edge cases, and feature requests are encouraged through the project's GitHub discussions and issue tracker.

From google-cloud-datastore: access to modern Datastore capabilities

The lower-level SDK exposes newer Datastore functionality that older abstractions never fully adopted. The ODM embraces those capabilities directly instead of hiding them.

One of the clearest examples is native server-side aggregation queries:

from google_cloud_datastore_odm import Model, IntegerProperty

class SaleRecord(Model):
    price = IntegerProperty()

base_query = SaleRecord.query()

# Individual Aggregations
total_sales = base_query.count()
total_revenue = base_query.sum(SaleRecord.price)
avg_price = base_query.avg(SaleRecord.price)

Instead of fetching thousands of entities or keys and aggregating in Python, the database performs the work server-side.
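
Aggregations hang off the same query builder, so filters should compose with them just like they do with regular fetches. A short sketch building on the example above (assuming that composition):

# Sketch: combining a filter with server-side aggregation
# (assumes filters compose with aggregations the same way they do with fetch)
big_sales = SaleRecord.query().filter(SaleRecord.price >= 100)

big_sale_count = big_sales.count()
big_sale_revenue = big_sales.sum(SaleRecord.price)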

The ODM also exposes native IN and NOT_IN filtering support directly, rather than emulating them through internally expanded OR conditions like older NDB implementations historically did.
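
The earlier query section only showed IN; a NOT_IN filter should follow the same shape (a sketch, assuming the method name mirrors the IN helper):

# Sketch: exclude drafts and archived posts
# (assumes NOT_IN mirrors the IN helper shown in the query section)
q = Article.query().filter(
    Article.status.NOT_IN(["draft", "archived"])
)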

Most importantly, the library is built directly on the actively maintained google-cloud-datastore SDK itself, so new Datastore functionality, fixes, and backend improvements can be surfaced cleanly without depending on legacy runtime assumptions.

Project state

Features

The library is at v0.1.2, installable from PyPI today. The core feature set is solid: models, properties, validation, queries, aggregations, transactions, multi-tenancy, lifecycle hooks, schema introspection, and key allocation helpers are all shipped and tested.

Roadmap

The roadmap is also available in the documentation, with async support being the most significant planned addition, important for anyone building with async frameworks. Cache integration support is also planned.

Testing

The test suite targets near-100% coverage for shipped functionality, with regression testing across supported Python versions and compatible google-cloud-datastore dependency ranges.

Contribution

The most useful thing you can do right now, more than starring the repository, more than opening a PR, is to use it in a real project and share what you find.

Try it. Break it. Tell me where it feels awkward.

  • Does the API feel natural?
  • Does something behave unexpectedly?
  • Is there functionality you assumed would exist that doesn't yet?

Early feedback shapes a library far more than polished marketing ever will.

Links

GitHub repository
Documentation
Contributing guide

Outro

Cloud Datastore has always occupied an interesting place in the Google Cloud ecosystem.

The infrastructure story is excellent: fully managed, horizontally scalable, operationally lightweight, and deeply integrated into the broader Google platform. But for Python developers, the experience around modeling, querying, validation, and application architecture has long felt split between a legacy abstraction layer and a low-level SDK.

google-cloud-datastore-odm is an attempt to close that gap.

Not by hiding Datastore behind heavy abstractions, but by making the database feel natural to work with again, using modern Python conventions, explicit behavior, strong typing, and APIs designed for contemporary applications rather than legacy runtimes.

Datastore deserves that level of tooling. Python developers do too.
