Discussion on: Why does writing code for DynamoDb get my spidey senses tingling?

Replies for: DynamoDB requires different thinking compared to most mainstream solutions. Could it be you’re bringing habits over from other systems? I have sure...

You are absolutely correct, Andrew; those habits are ingrained and hard to shake.

But what I'm finding is even simple tasks require an amount of code and a degree of 'fluffiness' that I'm not comfortable writing.

For example, a fairly common and simple scenario is a customer looking to return 'products' based on category.
With an RDBMS, this could be solved with two relational tables and a SQL statement like "SELECT a.* FROM PRODUCTS a, PRODUCT_CATEGORIES b WHERE a.id = b.product_id AND (b.category_name LIKE 'driink' OR b.category_name LIKE 'foood') ORDER BY a.product_name". (The parentheses matter: without them, AND binds tighter than OR and the second category condition would bypass the join.)

What would the code look like to get a list of product items from dynamodb, where a product can have many categories, based on a lookup of multiple categories?

Don't get me wrong, I like the promise of dynamodb, and would not consider myself a novice at modelling and using it. But the code I need to write to solve tasks similar to the above, or even basic CRUD, seems very verbose currently, and more code means more liability.

Andrew Berth • Jul 26 '20

Can you tell me how you’re going about tackling such a problem right now in Dynamo? For example, what do your tables look like? What are the kind of API calls you’re doing?

Conor Woods • Jul 26 '20

Hi Andrew, I appreciate the response but in truth, I was hoping to see some code to show me how you would personally solve the task using DynamoDb. But I'll tell you how I would approach it using DynamoDb and how I'll actually probably end up approaching it.

Just one of the api calls that I need to satisfy is as follows:
/products/?search=[freetext]&category=[cat1]&category=[cat2]

The data/schema for a product currently looks like the following:
{
  "entity": "Product",
  "sk": "ORG-1#PRODUCT",
  "val": {
    "name": "Product 1",
    "orgId": "1",
    "id": "EwfoHf7zAdRvNsiHw2SbxTeSnPb2",
    "categories": [
      {
        "name": "Business",
        "primary": true
      },
      {
        "name": "Executive"
      },
      {
        "name": "Career"
      }
    ],
    "status": "ACTIVE",
    "createDate": "2020-07-23T14:56:29.994Z"
  },
  "pk": "PRODUCT-EwfoHf7zAdRvNsiHw2SbxTeSnPb2",
  "updatedDateTime": "2020-07-23T14:56:29.994Z",
  "entityId": "EwfoHf7zAdRvNsiHw2SbxTeSnPb2"
}

If I were to solve this with DynamoDb, I would either use a GSI to store a composite key of the categories or denormalize and duplicate the data. However, even if I could do a full-text search on the composite key (which I don't believe you can), it won't help with a multi-category search (unless I used a filter). And if I went down the denormalization route, I would still have the full-text issue, and I would also need to write the code and employ extra infrastructure to manage it (stream + SNS probably).
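
For illustration, here is a minimal sketch of the denormalization route: one extra item per category, written alongside the product so a GSI can partition by category. The attribute names `gsi1pk` and `gsi1sk`, the `CATEGORY#` prefixes, and the `ProductCategory` entity name are assumptions made up for this example, not part of the schema above.

```python
from typing import Any, Dict, List


def category_index_items(product: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one denormalized item per category of a product item.

    gsi1pk / gsi1sk are hypothetical attribute names for a GSI keyed by
    organisation + category and sorted by product name.
    """
    val = product["val"]
    items = []
    for category in val["categories"]:
        name = category["name"]
        items.append({
            "pk": product["pk"],                      # same partition as the product
            "sk": f"CATEGORY#{name}",                 # one item per category
            "entity": "ProductCategory",
            "gsi1pk": f"ORG-{val['orgId']}#CATEGORY#{name}",
            "gsi1sk": f"{val['name']}#{val['id']}",   # name first, so results sort by name
            "productName": val["name"],
            "status": val["status"],
        })
    return items


# Trimmed copy of the product item shown above.
product = {
    "pk": "PRODUCT-EwfoHf7zAdRvNsiHw2SbxTeSnPb2",
    "sk": "ORG-1#PRODUCT",
    "val": {
        "name": "Product 1",
        "orgId": "1",
        "id": "EwfoHf7zAdRvNsiHw2SbxTeSnPb2",
        "categories": [
            {"name": "Business", "primary": True},
            {"name": "Executive"},
            {"name": "Career"},
        ],
        "status": "ACTIVE",
    },
}

for item in category_index_items(product):
    print(item["gsi1pk"], "->", item["gsi1sk"])
```

The duplicated `productName` and `status` are exactly the data that then has to be kept in sync whenever the product changes, which is the extra code and infrastructure cost mentioned above.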

How I'll actually end up solving it is by introducing another piece of infrastructure like Elasticsearch and populating it using DynamoDB streams.
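
A rough sketch of what the stream side of that could look like: turning one DynamoDB stream record into an "index" or "delete" action for a search index. The record layout (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage` in DynamoDB's typed JSON) is what streams deliver when new images are enabled; the function names, the shape of the search document, and the assumption that every record is a product item are mine. The actual call to the search service is left out.

```python
from typing import Any, Dict, Optional, Tuple


def unmarshal(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in value.items()}
    if type_tag == "L":
        return [unmarshal(v) for v in value]
    raise ValueError(f"unsupported attribute type: {type_tag}")


def to_search_action(record: Dict[str, Any]) -> Tuple[str, str, Optional[Dict[str, Any]]]:
    """Map a stream record to (action, document id, document or None)."""
    keys = unmarshal({"M": record["dynamodb"]["Keys"]})
    doc_id = keys["pk"]
    if record["eventName"] == "REMOVE":
        return ("delete", doc_id, None)
    # INSERT and MODIFY both re-index the whole document.
    val = unmarshal({"M": record["dynamodb"]["NewImage"]})["val"]
    doc = {
        "name": val["name"],
        "orgId": val["orgId"],
        "categories": [c["name"] for c in val["categories"]],
        "status": val["status"],
    }
    return ("index", doc_id, doc)


# Hand-written example record in the stream's typed JSON format.
record = {
    "eventName": "INSERT",
    "dynamodb": {
        "Keys": {"pk": {"S": "PRODUCT-1"}, "sk": {"S": "ORG-1#PRODUCT"}},
        "NewImage": {
            "pk": {"S": "PRODUCT-1"},
            "sk": {"S": "ORG-1#PRODUCT"},
            "val": {"M": {
                "name": {"S": "Product 1"},
                "orgId": {"S": "1"},
                "status": {"S": "ACTIVE"},
                "categories": {"L": [
                    {"M": {"name": {"S": "Business"}, "primary": {"BOOL": True}}},
                    {"M": {"name": {"S": "Career"}}},
                ]},
            }},
        },
    },
}

print(to_search_action(record))
```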

My point is this: I'm jumping through hoops, adding extra infrastructure and writing way more code to solve previously simple tasks.

I am aware of the trade-offs, but I really wish there was an abstraction over DynamoDb to take the development pain away. I want to have my cake and eat it; I want the blazing fast speed and I want to write/support/maintain as little code as possible.

Maybe I'm just lazy, but something doesn't sit right with me.

Andrew Berth • Jul 26 '20

I did not write any code because, as I suspected, no amount of code is going to do what you want.

DynamoDB’s entire ‘thing’ is getting certain chunks of data (partitions) really fast, no matter how much data you’re dealing with. They do this by having all related data together: no multiple tables, no joins. They say you should duplicate your data in the format you’re going to want to access it, all to prevent you from having to compile stuff when you’re looking for it.

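Getting all products from one or more categories should be no problem. With a GSI, as you said, you could make partitions of your data based on their category. Then you could fetch one category with a single Query against the index, or several categories with one Query per category. (GetItem and BatchGetItem only work against the base table's primary key, not against a GSI.)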
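
To make the multi-category case concrete, here is a hedged sketch. Since a GSI is read with Query rather than GetItem, it runs one query per category partition and merges the results client-side, matching the OR semantics of the SQL example. The partition key format `ORG-<id>#CATEGORY#<name>`, the `productName` attribute, and the injected `query_partition` function are all assumptions for illustration; an in-memory dictionary stands in for the index so the example runs without AWS.

```python
from typing import Any, Callable, Dict, Iterable, List

QueryFn = Callable[[str], List[Dict[str, Any]]]


def products_in_categories(
    query_partition: QueryFn, org_id: str, categories: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return products in ANY of the categories, de-duplicated, sorted by name."""
    seen: Dict[str, Dict[str, Any]] = {}
    for category in categories:
        for item in query_partition(f"ORG-{org_id}#CATEGORY#{category}"):
            # A product in several categories appears once per category.
            seen.setdefault(item["pk"], item)
    return sorted(seen.values(), key=lambda item: item["productName"])


# With boto3, query_partition could wrap something like:
#   table.query(IndexName="gsi1",
#               KeyConditionExpression=Key("gsi1pk").eq(partition_key))["Items"]
# where Key comes from boto3.dynamodb.conditions, "gsi1"/"gsi1pk" are assumed
# names, and pagination via LastEvaluatedKey still has to be handled.

# In-memory stand-in for the index.
FAKE_INDEX = {
    "ORG-1#CATEGORY#Business": [
        {"pk": "PRODUCT-A", "productName": "Product 1"},
        {"pk": "PRODUCT-B", "productName": "Product 2"},
    ],
    "ORG-1#CATEGORY#Career": [
        {"pk": "PRODUCT-A", "productName": "Product 1"},
        {"pk": "PRODUCT-C", "productName": "Apple Guide"},
    ],
}


def fake_query(partition_key: str) -> List[Dict[str, Any]]:
    return FAKE_INDEX.get(partition_key, [])


result = products_in_categories(fake_query, "1", ["Business", "Career"])
print([item["productName"] for item in result])
```

Merging and sorting happen in application code here, which is part of the verbosity being discussed: the database returns each partition already sorted, but not the union.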

Free text search, on the other hand, is an entirely different beast. Free text search is all about getting some arbitrary values out of a certain group. DynamoDB was not made to do this kind of thing. The same goes for relational databases, really. The LIKE operator lets you do some searching, but in a really limited way. (And it’s slow, which Dynamo does not allow you to be.) That’s why they made an Elasticsearch integration.

Now, of course it would be nice to have one service that would do exactly what you need, quickly, and with an API so simple even a toddler could use it. I know I would use the features you’re describing in a heartbeat. But for now, it simply does not exist.

Conor Woods • Jul 26 '20

Don't disagree with anything there, Andrew.
By chance I came across this from the burning monk:
lumigo.io/aws-serverless-ecosystem...
Now we're getting somewhere.
Coupled with Jeremy Daly's DynamoDB Toolbox, it could scratch an itch.
github.com/jeremydaly/dynamodb-too...