DEV Community


Posted on • Updated on

We used CosmosDB instead of a relational database, here's what we went through and learned.

Almost two years ago, we took on a pretty sizable project. We're talking about a potential of users from everywhere in the world, if all goes well, and; a brilliantly complex domain that makes a lot of sense to the aim of the product.

I was involved on designing the solution architecture, everything was agreed upon except with the database. Taking to consideration the complexity of the relationships of the entities, I proposed to use Azure SQL and find a way with Elastic Database tools to scale it horizontally. On the other side of the table, it was decided to use CosmosDB, why? mainly for these reasons:

  1. FAST. We're guaranteed that some queries would be faster than 10ms.

  2. CosmosDB offers an almost painless scaling. Set a partition key, write a few lines of code to handle partitioning, and there you go.

  3. Since we're dealing with multi-tenancy, we found that CosmosDB's partitioning could help us to properly segregate data for each tenant.

I've worked on CosmosDB back when it was still called DocumentDB. I was never sold that you could totally replace a relational database with a NoSQL database, though, you can still pull it off if you do it properly or the complexity permits. The key is modelling your entities correctly, you should identify when you need to reference or embed, it's all about how volatile the data is. I was given a sandwich for my idea and told me that we would go with CosmosDB. We did the dirty work, and here's what we went through:

  1. Stored procedures run on their own transaction. You can't write a transaction that would run multiple stored procedures and sweetly roll back all the changes if something went wrong.

  2. The SQL engine isn't really all your familiar SQL. SQL statements are very limited, and JOIN doesn't work like the JOIN you know from relational databases. You can't do cross collection queries.

  3. Back then, the SDK doesn't support that much LINQ statements. We had a hard time implementing pagination. Currently, more LINQ statements are supported and OFFSET LIMIT statements was introduced to the SQL engine.

  4. Partitioning can get in the way of your queries. We had a lot of filters in our application. There were times we need to restructure the partitions to accommodate filters or totally say we can't do that filters since the partitioning forbids us.

  5. You have to deal with error 429's. I wouldn't go deep about it, it needs another wall of text. It basically means that your collection is getting overloaded with requests and you have to handle this error and retry the request. The SDK can help you handle it automatically or just pay more for more request units.

  6. Hot partitions. We tried the auto-scale feature for the request units (RU's). The idea is when we hit the threshold, more RU's will be automatically allocated and will go back down if there's no need. Still we're getting a lot of error 429's, we found out that we're experiencing hot partitions. The partition keys developers set are the logical partitions, which are then mapped to physical partitions, these physical partitions take up a definite amount of RU's; if the physical partition's RU's are used up, all requests for all the logical partitions mapped to it will result to error 429.

  7. I am personally convinced that you cannot live without error 429's, unless you are filthy rich and willing to pay the price. We optimized our code, did caching, and implemented automatic scaling. We still get error 429's and the cost didn't really go down.

I was never convinced that CosmosDB is the right way to go. We're spending way too much and we can't do so much about it. We hopelessly combed through the diagnostic logs to find some light, and even bothered support to give us answers for our questions that doesn't even make sense. Up until the day I left, we're still trying to make do with CosmosDB and bled our pockets for it. Through the experience, here's what we realized:

  1. CosmosDB really does offer painless scaling through partitioning.

  2. It would really shine for you if you are not doing fancy things with your data. If you have data that you would store as you display it or display it as store it, you could really benefit from the speed CosmosDB promised.

  3. If you have huge amount of data that you just need to store outright, you'll could take advantage on how easy you can scale storage through partitioning. We had other projects where CosmosDB really paid-off. We have an IoT project that used CosmosDB to save tons of telemetry. The team was able to scale for terabytes of data without breaking a sweat and worrying for storage.

  4. You could create collection triggers and have a change feed processor if you need to something event-driven.

  5. You couldn't totally replace a relational database with a NoSQL Database for a domain with business logic relying heavily on entity relationship.

That's all. I do believe that CosmosDB is a great product. On the right design and with proper documentation, developers could really benefit with the speed and almost painless storage scaling. Let me know if I missed something, you agree or disagree with my thoughts, discussion is well appreciated.

DISCLAIMER: This post is not meant to criticize CosmosDB negatively or convince developers to stay away from it. CosmosDB is a really good NoSQL database and gives a lot of benefits on proper implementation.

Top comments (2)

jmarbutt profile image
Jonathan Marbutt

I would love to hear more about what kind of expense you mean for RU/s? Our SQL Azure db is growing so much we are paying a fortune for and would love to get your thoughts on comparison.

neil445 profile image

I don't have the figures on the top of my head and I can't go in and take a look since I've left the project already, but I'll see if I can get a hold of the figures through other means.

but what I remember, we're spending at the minimum of 150$ a week to a maximum of 300+$, with no paying clients onboard, and looking at the usage , it's automatically scaling between 10K - 100K RUs, let's just say that in most days we're on 50K RUs.