A First Look at Google Cloud Datastore

#gcp #aws #exploration

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on my personal blog.

At first glance, Datastore looks equivalent to DynamoDB, though I personally think it's closer to SimpleDB. Unfortunately, SimpleDB isn't accepting new customers or being deployed to new regions, so it's not a useful comparison.

That said, the fundamental idea behind Datastore is the same as DynamoDB's (a hosted NoSQL database), but the implementation is very different.

App Engine Datastore vs Cloud Datastore?

At one point, Cloud Datastore was (is?) part of App Engine, but it's since been split out. Presumably as part of this legacy, Datastore appears limited to the regions that App Engine is in, which unfortunately isn't all of Google's regions.

Additionally, an App Engine account is created for Cloud Datastore, and it's required if you use the Datastore SDKs as well. Why this dependency exists and why it's exposed are open questions.

Even more confusing: the docs for App Engine Datastore list DB Datastore as superseded, but then link to docs about Cloud Datastore. App Engine also mentions an NDB Client Library, which as far as I can tell wraps the actual Cloud Datastore service but is specific to App Engine. There is also at least one more article that treats Cloud Datastore and the DB/NDB libraries as separate things.

The only thing I can suggest is to check the URL: make sure the docs you're reading start with https://cloud.google.com/datastore/.

Pros:

SQL-like semantics (transactions!)
More granular breakdowns for multi-tenancy: namespaces/'Kind'/'ancestor path' (Google says a Kind is functionally equivalent to a table). I'm not sure about the usefulness of the namespace/kind distinction, but it's an extra way to get multi-tenancy and is ignored by default, so meh.
Per-request pricing! DynamoDB charges for the capacity you provision, not what you actually use. Given AWS's obsessive focus on "pay for what you use", Dynamo's provisioned read/write units are odd.
Automatic indexes on every property enable ad hoc querying on any single property, unlike AWS, where you must define every index you want yourself
A dashboard that allows SQL-like queries to be run (but only SELECT queries)
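A Datastore key combines an optional namespace with an ancestor path of (Kind, identifier) pairs, which is where the extra multi-tenancy granularity comes from. The toy model below is purely illustrative: `ToyKey` and `is_descendant_of` are hypothetical names, not part of any Google SDK. (The official Python client, google-cloud-datastore, builds real keys with calls along the lines of `client.key("Org", "acme", "Task", 42, namespace="tenant-a")`.)

```python
from dataclasses import dataclass
from typing import Tuple, Union

KeyId = Union[str, int]


@dataclass(frozen=True)
class ToyKey:
    """Hypothetical stand-in for a Datastore key (not an SDK class).

    `path` runs from the root ancestor down to the entity itself as
    (kind, id) pairs; `namespace` partitions keys per tenant.
    """

    namespace: str
    path: Tuple[Tuple[str, KeyId], ...]

    @property
    def kind(self) -> str:
        # The entity's own Kind is the last (kind, id) pair in its path.
        return self.path[-1][0]

    def is_descendant_of(self, other: "ToyKey") -> bool:
        # Same namespace, and `other`'s path is a strict prefix of this path.
        return (
            self.namespace == other.namespace
            and len(other.path) < len(self.path)
            and self.path[: len(other.path)] == other.path
        )


org = ToyKey("tenant-a", (("Org", "acme"),))
task = ToyKey("tenant-a", (("Org", "acme"), ("Task", 42)))
other_tenant_task = ToyKey("tenant-b", (("Org", "acme"), ("Task", 42)))

print(task.kind)                                # Task
print(task.is_descendant_of(org))               # True
print(other_tenant_task.is_descendant_of(org))  # False: different namespace
```

Two keys with identical paths in different namespaces never match in this model, which is what makes a namespace usable as a tenant boundary on top of Kinds and ancestor paths.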

Cons:

Nothing like DynamoDB Streams (which are awesome for replication, or for async actions implicitly triggered by a data change)
Dynamo has 25x the storage on the free tier compared to Datastore (25GB vs 1GB)
Dynamo offers more total read/write operations per day - good if you have a consistent request rate, bad if you have bursts
Storage for index data (indexes are created by default; you have to opt out) seems to be charged for
Creating a custom index requires the use of the gcloud CLI tool. There is no mention of any other method in the index documentation.
If you have a query that involves filtering on more than one property, you might run into a situation that isn't covered by the built-in indexes or is otherwise impacted by one of a decently long list of query restrictions.

While you could get away with a scan + filter combination in Dynamo, GQL will reject the query with a "Your Datastore does not have the composite index (developer-supplied) required for this query." error. (My use case was select * from kind where property1 < value order by property2.)

I haven't found a way to get Datastore to scan and filter server-side, so I have to retrieve everything and throw away the data I don't want on the client.
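The client-side workaround for that query amounts to fetching every entity of the kind and doing the filter and the ordering locally. A minimal sketch, assuming entities arrive as plain mappings; `filter_then_sort` is a hypothetical helper, and `property1`/`property2` are the placeholder names from the query above.

```python
from typing import Any, Iterable, List, Mapping


def filter_then_sort(
    entities: Iterable[Mapping[str, Any]],
    filter_prop: str,
    upper_bound: Any,
    sort_prop: str,
    descending: bool = False,
) -> List[Mapping[str, Any]]:
    """Client-side stand-in for:
        SELECT * FROM kind WHERE filter_prop < upper_bound ORDER BY sort_prop

    Every entity has already been retrieved; non-matching ones are discarded here.
    """
    # Skip entities that lack either property (Datastore also leaves those out of
    # query results), then apply the inequality filter.
    kept = [
        e for e in entities
        if filter_prop in e and sort_prop in e and e[filter_prop] < upper_bound
    ]
    # Apply ORDER BY locally.
    kept.sort(key=lambda e: e[sort_prop], reverse=descending)
    return kept


rows = [
    {"property1": 1, "property2": "c"},
    {"property1": 5, "property2": "a"},
    {"property1": 2, "property2": "b"},
    {"property2": "z"},  # no property1: excluded
]
print([r["property2"] for r in filter_then_sort(rows, "property1", 3, "property2")])
# ['b', 'c']
```

With the google-cloud-datastore client, `entities` could be the iterator returned by `client.query(kind="MyKind").fetch()`, since its entities are dict subclasses ("MyKind" is a placeholder). This still pays for reading every entity, which is exactly the cost of the workaround.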

Pricing

A bit more about the price, because the pricing models of the two products are really different.

Dynamo's pricing model makes sense if you're doing a fairly consistent number of requests per second. Dynamo attempts to support bursting by keeping a bucket of up to 300 seconds' worth of provisioned but unused read/write capacity, and bursting out of that. When you exhaust the bucket, requests are denied.

So if you're trying to save money, drop your read/write units to 1, and then do something request-heavy, you're going to have a bad time unless you increase the units before running your operation. Dynamo's new auto scaling feature takes some time to react as well (the scale-up alarms take 5 minutes to fire: the CloudWatch alarm is set on ConsumedWriteCapacityUnits > NN for 5 minutes).
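The burst behavior described above can be approximated with a token bucket. This is a simplified sketch under stated assumptions, not AWS's actual accounting: the bucket is assumed to start full, refill by the provisioned amount each second, and cap at 300 seconds of capacity. `simulate_burst` is a hypothetical helper.

```python
from typing import List, Tuple

BURST_WINDOW_SECONDS = 300  # unused capacity retained for up to 300 s, per the description above


def simulate_burst(provisioned_units: int, demand_per_second: List[int]) -> Tuple[int, int]:
    """Simplified token-bucket model of DynamoDB burst capacity (an assumption,
    not AWS's published algorithm). Returns (served, throttled) request counts.

    Assumes the bucket starts full, gains `provisioned_units` each second,
    and never holds more than BURST_WINDOW_SECONDS * provisioned_units.
    """
    capacity = BURST_WINDOW_SECONDS * provisioned_units
    bucket = capacity  # table assumed idle long enough to fill the bucket
    served = throttled = 0
    for demand in demand_per_second:
        # Top up with this second's provisioned capacity, capped at the window.
        bucket = min(capacity, bucket + provisioned_units)
        granted = min(demand, bucket)
        bucket -= granted
        served += granted
        throttled += demand - granted  # these requests are denied
    return served, throttled


# 1 unit provisioned, then a 10-second burst of 100 requests per second.
print(simulate_burst(1, [100] * 10))  # (309, 691)
```

Under this model, a table at 1 unit hit with 100 requests per second for 10 seconds serves 309 of the 1,000 requests: the 300-unit bucket covers roughly the first three seconds, after which only 1 request per second gets through. A scale-up alarm that waits 5 minutes would not fire until long after that burst is over.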

In contrast, Datastore's charge-per-request model fits dramatically varying traffic patterns better, mainly because you're not paying for capacity that sits unused.
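To see why per-request billing suits spiky traffic, compare a day with one busy hour under both models. Both functions are hypothetical and the prices are made-up placeholders, not either vendor's actual rates; the provisioned model also assumes capacity is sized for the peak hour and left there (no auto scaling), with requests spread evenly within each hour.

```python
from typing import List

SECONDS_PER_HOUR = 3600


def provisioned_cost(hourly_requests: List[int], price_per_unit_hour: float) -> float:
    """Capacity sized for the busiest hour and left there all day (no auto scaling).

    Assumes one unit serves one request per second and that requests are spread
    evenly within each hour; the price is a placeholder.
    """
    # Units needed to serve the peak hour: ceil(requests / 3600).
    peak_units = max(-(-r // SECONDS_PER_HOUR) for r in hourly_requests)
    return peak_units * price_per_unit_hour * len(hourly_requests)


def per_request_cost(hourly_requests: List[int], price_per_request: float) -> float:
    """Billed only for operations actually performed; the price is a placeholder."""
    return sum(hourly_requests) * price_per_request


day = [0] * 23 + [360_000]  # one busy hour, 23 idle hours
print(round(provisioned_cost(day, 0.001), 2))     # 2.4
print(round(per_request_cost(day, 0.000001), 2))  # 0.36
```

The provisioned model pays for 100 units around the clock even though 23 of the 24 hours are idle, while the per-request model pays only for the 360,000 operations that actually happened. With perfectly flat traffic the gap narrows, which matches the point that Dynamo's model suits consistent request rates.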

If you're doing any sort of table scanning in Dynamo to find elements by properties, or you have indexes on single properties, chances are Datastore will work better for you by virtue of the built-in-by-default indexes. You can get the same functionality out of Dynamo, but it's harder to set up, and functions as (and is charged as) a separate table.
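The difference between scanning and a built-in index shows up clearly in a toy model: if every property gets its own sorted (value, key) list at write time, a single-property range query becomes a binary search rather than a pass over every item. `ToyKindStore` is a hypothetical illustration of the idea, not how Datastore stores indexes internally; it assumes all values of a given property are mutually comparable.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyKindStore:
    """Hypothetical model of a Kind with an automatic index on every property.

    Each property name maps to a list of (value, key) pairs kept sorted at
    write time. Values of one property are assumed mutually comparable.
    """

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.indexes: Dict[str, List[Tuple[Any, str]]] = defaultdict(list)

    def put(self, key: str, props: Dict[str, Any]) -> None:
        # On overwrite, drop the old index rows first so the indexes stay in sync.
        old = self.entities.get(key)
        if old is not None:
            for name, value in old.items():
                self.indexes[name].remove((value, key))
        self.entities[key] = dict(props)
        # Index every property automatically; nothing has to be declared.
        for name, value in props.items():
            bisect.insort(self.indexes[name], (value, key))

    def query_less_than(self, prop: str, bound: Any) -> List[str]:
        """Indexed: binary search for the first row with value >= bound."""
        rows = self.indexes.get(prop, [])
        # (bound,) sorts before every (bound, key) pair, so rows with value == bound are excluded.
        cut = bisect.bisect_left(rows, (bound,))
        return [key for _, key in rows[:cut]]

    def scan_less_than(self, prop: str, bound: Any) -> List[str]:
        """Scan + filter: touches every entity, as a table scan would."""
        return [k for k, p in self.entities.items() if prop in p and p[prop] < bound]


store = ToyKindStore()
store.put("a", {"age": 30})
store.put("b", {"age": 20})
store.put("c", {"name": "x"})
store.put("d", {"age": 25})
print(store.query_less_than("age", 26))  # ['b', 'd']
print(store.scan_less_than("age", 26))   # ['b', 'd']
```

Both methods return the same keys; the indexed one just avoids touching entities that can't match. In Dynamo the equivalent index is something you define and pay for separately.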

If you have composite (multi-property) indexes, things are a bit more complicated. Datastore does a far better job of hiding the index complexity (once it's set up) and actually using the indexes. But the setup process is hit-or-miss, requiring you to know things like sort orders in advance.
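Why sort order has to be known up front can be seen in a toy composite index: rows are sorted by (equality property, sort property in a fixed direction) when they're written, so a query like WHERE done = false ORDER BY priority DESC is a single contiguous forward read, and a different direction needs a different row order, i.e. another index. `ToyCompositeIndex`, `done`, and `priority` are hypothetical; this is a simplified model assuming numeric sort values and forward-only reads, not Datastore's implementation.

```python
import bisect
from typing import Any, Dict, List, Tuple


class ToyCompositeIndex:
    """Hypothetical composite index: equality on `eq_prop`, then `sort_prop`
    in a direction fixed when the index is defined.

    Sort values are assumed numeric so descending order can be stored by negation.
    """

    def __init__(self, eq_prop: str, sort_prop: str, descending: bool = False) -> None:
        self.eq_prop = eq_prop
        self.sort_prop = sort_prop
        self.descending = descending
        self.rows: List[Tuple[Any, Any, str]] = []  # (eq value, encoded sort value, key)

    def add(self, key: str, props: Dict[str, Any]) -> None:
        # Entities missing either indexed property get no row in this index.
        if self.eq_prop not in props or self.sort_prop not in props:
            return
        value = props[self.sort_prop]
        encoded = -value if self.descending else value
        bisect.insort(self.rows, (props[self.eq_prop], encoded, key))

    def query(self, eq_value: Any) -> List[str]:
        """WHERE eq_prop = eq_value ORDER BY sort_prop, as one forward read."""
        start = bisect.bisect_left(self.rows, (eq_value,))
        keys = []
        for value, _, key in self.rows[start:]:
            if value != eq_value:
                break  # rows sharing eq_value are contiguous, so stop at the first mismatch
            keys.append(key)
        return keys


tasks = {
    "t1": {"done": False, "priority": 1},
    "t2": {"done": False, "priority": 5},
    "t3": {"done": True, "priority": 9},
    "t4": {"done": False, "priority": 3},
}
by_priority_desc = ToyCompositeIndex("done", "priority", descending=True)
by_priority_asc = ToyCompositeIndex("done", "priority")
for key, props in tasks.items():
    by_priority_desc.add(key, props)
    by_priority_asc.add(key, props)

print(by_priority_desc.query(False))  # ['t2', 't4', 't1']
print(by_priority_asc.query(False))   # ['t1', 't4', 't2']
```

Each query shape, and each direction, gets its own physically sorted row list in this model, which is why the setup step wants sort orders before any query runs.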

If you're not doing anything fancy, and just accessing everything directly by key, Dynamo is better for small scale stuff by virtue of the massively greater free storage space (25GB vs 1GB).
