Patrice Ferlet

Posted on Nov 20, 2023

You forget RethinkDB, it's a shame

#database #development #programming #opensource

There isn't only MongoDB... and while you may think of ArangoDB or CouchDB, you probably haven't thought of RethinkDB. This project must be saved, urgently!

In the world of database servers, there are many different solutions. PostgreSQL is potentially one of the most reliable solutions, MariaDB has beaten MySQL to the punch, and there are countless projects based on MongoDB.

Document databases have become commonplace.

They're practical, quick to set up, and make it easy to manage the schema within the application. As a result, the data is stored in JSON/BSON form and is structured by the application.

All's well in the best of worlds.

OK, but have you ever tried to quickly:

scale your database, and I mean "QUICKLY"?
listen for changes in a table and use it like a bus, pushing events to your clients/users?
get a UI where you can see what happens on your tables, and where the replicas and shards are?

Yes, of course, you can scale, shard, and monitor MongoDB or CouchDB. But please bear with me and read on.

I've been poking around, testing and breaking database servers for a long time (more than 20 years now). But a few years ago I came across a jewel, the grail, one of the best solutions available. Under the radar, shunned for whatever reason, RethinkDB is nonetheless one of the finest database server projects I've ever tested.

And the pleasure of using this server can be summed up in 3 points:

It's as simple as can be, with a web-based interface for querying, administration, and monitoring.
With the "changes" API, I can listen to a table and wait for changes. So it can be used as a bus, and I can use server-sent events with ease in my web applications.
It scales efficiently through sharding and replication, which can be configured in no time at all, with no need for in-depth skills.

Wanna test?

Below, you can of course replace "podman" with "docker" in the commands. I prefer podman because I'm using Fedora and because it works rootless.

# Create a network. It's important for what happens next.
podman network create rdb

# Start our first node and bind port 8080 to see the
# web interface
# It's important to name the container, and
# to connect the container to a network. 
# You'll see why...
podman run --rm -it \
   --network rdb \
   --name rdb1 -p 8080:8080 \
   docker.io/rethinkdb

OK, now visit http://localhost:8080 and see the web interface.

Port 8080 serves the web admin page. By default, RethinkDB also uses port 28015 for client queries (from your application) and port 29015 for clustering.
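If you want to check from a script that a node is up, you can probe these ports. This is a minimal sketch using only Python's standard library; the function names are mine, not part of RethinkDB. Note that the podman command above only publishes 8080, so from the host the two other ports will show as closed unless you also publish them (for example with -p 28015:28015).

```python
import socket

# Default RethinkDB ports, as listed above.
RETHINKDB_PORTS = {"web admin": 8080, "client driver": 28015, "cluster": 29015}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all of them as "closed".
        return False


def probe(host: str = "localhost") -> dict:
    """Map each port's role to whether it is reachable on the given host."""
    return {role: port_is_open(host, port) for role, port in RETHINKDB_PORTS.items()}
```

Calling probe() returns a dictionary such as {"web admin": True, ...}, which is enough for a quick health check before pointing an application at the node.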

You can now visit the "Tables" tab to create databases and tables. There's no schema to define, of course. With a single node, we cannot yet replicate or shard the data, but that will change in a few seconds.

Why? Because we're going to add a new node to the cluster. Ready?

# Create a new node in the network, but now
# we tell rethinkdb to "join" one node
# (whatever the node!), here "rdb1".
podman run --rm -it \
    --network rdb \
    --name rdb2 \
    docker.io/rethinkdb \
    rethinkdb -j rdb1

With podman and docker, it's important that both containers are in the same network so they can reach each other by name (via name resolution).

Now go back to the web interface and see:

Yes, 2 servers now!

You can now configure your tables to be replicated and/or sharded.

Add as many nodes as you like: you can tell each new "rethinkdb" process to join any node in the cluster, and RethinkDB will do the rest for you.

In our example, with Docker or Podman, you only need to connect the containers to the same network so that name resolution works. In Kubernetes, you will need to create a StatefulSet and use a headless Service. On bare metal and VMs, use whatever lets you reach one node.
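To avoid retyping the command for every extra node, you can generate it. The helper below is a hypothetical convenience of mine that reproduces the two commands shown earlier (node 1 publishes the web UI, the others join "rdb1"). The optional bind_all flag appends "--bind all", a real RethinkDB option that makes the server listen on all interfaces; whether your setup needs it when you override the container command is an assumption you should verify.

```python
import shlex

IMAGE = "docker.io/rethinkdb"  # same image as in the commands above


def node_command(index, network="rdb", join="rdb1", engine="podman", bind_all=False):
    """Build the container command for node number `index` (1-based).

    Node 1 starts the cluster and publishes the web UI on 8080.
    Every other node is told to join the node named `join`.
    """
    cmd = [engine, "run", "--rm", "-it", "--network", network, "--name", f"rdb{index}"]
    if index == 1:
        return cmd + ["-p", "8080:8080", IMAGE]
    cmd += [IMAGE, "rethinkdb", "-j", join]
    if bind_all:
        # Assumption: only needed if other nodes cannot reach this one otherwise.
        cmd += ["--bind", "all"]
    return cmd


# Print the commands for nodes 2 to 4, ready to paste into a terminal.
for i in range(2, 5):
    print(shlex.join(node_command(i)))
```

Switching to Docker is then just node_command(3, engine="docker").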

Let's create a table

In the "Tables" tab, let's create an "app" database and add a "movies" table.

It's time to create some entries. Go to the "Data Explorer" tab and type this:

r.db("app").table("movies").insert([
  {name: "Titanic", director: "Cameron"},
  {name: "Citizen Kane", director: "Welles"}
])

This inserts 2 new entries, because we gave insert() an array. Of course, you can also insert a single entry.

I won't go into details, but yes, you can:

add indexes

add structures

create databases and tables with ReQL (the JavaScript-like query language)

and many other things
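From application code, the same insert can look like the sketch below. It assumes the official "rethinkdb" Python package, where you get the driver object with from rethinkdb import RethinkDB; r = RethinkDB() and a connection with r.connect("localhost", 28015) (so port 28015 must be published on the container). The function names are hypothetical, and the index on "director" is only there to illustrate the "add indexes" point.

```python
MOVIES = [
    {"name": "Titanic", "director": "Cameron"},
    {"name": "Citizen Kane", "director": "Welles"},
]


def insert_movies(r, conn, docs=MOVIES):
    """Insert documents into app.movies and return the server's summary.

    `r` is the driver object, `conn` an open connection from r.connect(...).
    Passing a list inserts several documents in one query.
    """
    return r.db("app").table("movies").insert(docs).run(conn)


def index_director(r, conn):
    """Create a secondary index on `director` and wait until it is ready."""
    table = r.db("app").table("movies")
    table.index_create("director").run(conn)
    return table.index_wait("director").run(conn)
```

Both functions take the driver object as a parameter, so the block has no hard dependency on the package and is easy to exercise with a stub.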

If you now visit the table view, you can see this:

Nice, isn't it? It's a monitor for the table. OK, let's replicate the data so that, if one node fails, we can get it back from a second server. Click the "Reconfigure" button and set the number of replicas to 2.

By saying that we now need 2 replicas, we let RethinkDB do whatever it must to replicate the data.

And, as you can see now, the documents are distributed across several servers.

Of course, everything we did here can be done programmatically!

The queries are easy to understand:
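As a rough idea of what that looks like with the Python driver, here is a minimal sketch of the steps we clicked through: create the database, create the table, then ask for 2 replicas. It assumes the official "rethinkdb" package (r = RethinkDB()) and a connection opened with r.connect(...); the function name and its defaults are hypothetical.

```python
def create_replicated_table(r, conn, db="app", table="movies", shards=1, replicas=2):
    """Create `db` and `table`, then request `replicas` copies of each shard.

    The server reports an error if the database or table already exists,
    and also if `replicas` is larger than the number of servers in the cluster.
    """
    r.db_create(db).run(conn)
    r.db(db).table_create(table).run(conn)
    # Same operation as the "Reconfigure" button in the web interface.
    return r.db(db).table(table).reconfigure(shards=shards, replicas=replicas).run(conn)
```

With the two-node cluster from this article, replicas=2 places one copy on each server.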

The "Changes" API

One of the best features of RethinkDB is the "changes" API. It's a very interesting feature that I use a lot.

You can listen for changes in a table and react to them. Then you can create a server-sent events handler in your web server, or whatever you want, to send messages to clients. It's very easy to understand:

r.db("app").table("movies").changes()

Each change provides the old and the new value (old_val and new_val) of the document that changed in the table. And of course, you can filter the changes!
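To make the "bus" idea concrete, here is a minimal sketch that turns changefeed items into server-sent events frames. The formatting part is plain Python and works on any iterable of change dictionaries; movie_feed() assumes the official "rethinkdb" package (r = RethinkDB(), conn = r.connect(...)). All function names, the "movie" event name, and the payload layout are my own choices, not something RethinkDB imposes.

```python
import json


def change_to_sse(change, event="movie"):
    """Format one changefeed item as a server-sent events frame.

    insert -> old_val is None, delete -> new_val is None, update -> both are set.
    """
    old, new = change.get("old_val"), change.get("new_val")
    if old is None:
        kind = "insert"
    elif new is None:
        kind = "delete"
    else:
        kind = "update"
    payload = json.dumps({"type": kind, "old": old, "new": new})
    # An SSE frame is a set of "field: value" lines closed by a blank line.
    return f"event: {event}\ndata: {payload}\n\n"


def sse_stream(feed):
    """Yield one SSE frame per item of `feed` (any iterable of change dicts)."""
    for change in feed:
        yield change_to_sse(change)


def movie_feed(r, conn, director=None):
    """Open a changefeed on app.movies, optionally filtered on one director."""
    query = r.db("app").table("movies")
    if director is not None:
        query = query.filter({"director": director})
    return query.changes().run(conn)
```

In a web handler you would write the frames from sse_stream(movie_feed(r, conn)) to a response with the text/event-stream content type. Keep in mind that iterating over a changefeed blocks until the next change arrives, so it usually belongs in its own thread or task.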

Using with Python, Go, JS, Ruby, ...

RethinkDB officially supports Python, JS, Java and Ruby drivers. But there are plenty of drivers for Go, C#, Dart, Perl, PHP, Lua... See the documentation page for the full list.

But it's dying by inches (or not)

The developers of this service have gone to work for Stripe. Kudos to them, without any irony, because they richly deserve their success.

The sad thing is that RethinkDB now only gets a few improvements and some maintenance work. It works, the project is maintained at arm's length, and some pull requests continue to be merged. But fear is growing among its users, because you don't want to lose a tool like this.

Really, we've got to do something to save it.

But what? Of course, sponsorship is the first thing to do. If companies are listening, the message is clear:

You benefit from free software, and so much the better! But help us offer you continuity of development. Give to the projects you use.

But there's another thing you can do at your own level: use it!

Because a tool can only feel useful if it's used.

I use RethinkDB against all fears of seeing it disappear. I refuse to succumb to the fear of obsolescence that I constantly hear.

RethinkDB is developed in C++ and is completely opensource. There are no major bugs and as a result:

we can say that it doesn't need frequent updates
we can be sure that it can be recompiled for future OSes

Clearly, I'm not afraid to say that RethinkDB is production-ready, reliable, usable and future-proof.

So I've got three messages for you:

Users: Go for it, use it, thank the developers, go and click on the little star in GitHub to show your encouragement not to let it die in a corner.
Companies, CIOs, CTOs: use it. Test it. Stop falling into the trap of commercial arguments for certain solutions, even opensource ones. RethinkDB works and offers you a service. A service you won't find as complete with other solutions. And for God's sake, donate to the opensource tools you use!
Developers, RethinkDB authors: please don't give up. To date, there's no document database solution that offers such simple scalability, with such a well-designed "changes" API system, and that combines simplicity and efficiency.

Top comments (9)

Alicia Sykes • Nov 20 '23

I had actually forgotten all about Rethink. I remember getting all excited when it was open sourced. Your post does a good job of highlighting its many benefits, and explaining how it is still a viable option in 2023 :)

My worry is that it's no longer actively developed, so no chance on seeing those nice-to-have features we've come to expect (restarting feeds, gzipping support, GraphQL interface...) as well as bug fixes and performance improvements for running on newer architectures

Paweł Świątkowski • Nov 20 '23

I used to be a great fan and proponent of RethinkDB many years ago. We even used it in production for some heavy-load analytic cases. Then I watched it fall when the company behind it went bankrupt. Then it was reborn as open source. Sad to hear it's struggling again.

Unfortunately, I think it might be hard to convince decision-makers in most companies to use unstable tech. Is there some "golden reference production use" for it?

Patrice Ferlet • Nov 20 '23

Actually, there are still some PRs and the community is alive.

I'm OK with your point that it's hard to convince decision-makers to use it, but RethinkDB is actually stable. That's a point in my article: there is no critical issue, and the database works without any problem, even with thousands of writes per second in my projects.

Unfortunately, I can't find golden references... I'm pretty sure there are some. 😉

Anyway, my call is still the same: we, engineers, should propose it; we must show how it works, why it's interesting, and what there is to gain from using it. The more we use it and the more we demonstrate it, the more active the project can become.

PS: really, RethinkDB is not dead. It's sleeping a bit.

Eckehard • Nov 20 '23

Do you have any information about using RethinkDB as a time series database?

Recently I had some problems with a bloated Postgres database that flooded my server. For some reason we had the same data also written to Influx, which occupied only a fraction of the space. I cannot quantify the effect, as I do not know what else is written to Postgres, but the difference seems to be considerable.

Patrice Ferlet • Nov 20 '23

RDB is not specifically a time series database server. But it can be :)

I already used RethinkDB to store logs from a Rancher (Kubernetes) cluster with approximately 500 pods. That was OK, but I had to write a task to store and then drop entries that were too old. The data size was quite reasonable, and I was limited by the platform itself, not by RDB.

Actually, RethinkDB could be a bit lighter than PostgreSQL: there are fewer controls, fewer data links, and you can replicate and shard across several RDB servers.

That means that you will need to manage a load balancer to send data to all nodes (I did it with a Service in Kubernetes). Or you can use one RDB node as a proxy (it's a simple option) that will load balance the flow.

All depends on the load you will apply.

Influx is very well-made, and is optimized to store that kind of data. Prometheus is a very good solution too.

Jonathan Gros-Dubois • Aug 24 '25

In terms of simplicity, architecture and ease of scaling, it may be the greatest database of all time.

Its query language is also remarkably minimalist. Its design encourages writing efficient queries. More so than even MongoDB.

The fact that it has not gained more traction is baffling. I see it as a testament to a steep decline of craftsmanship in the tech industry at around that time. It was a time when good ideas and projects were put to rest and replaced by bad ideas and bad projects... The period following 2016 will be looked back on as some kind of software dark ages IMO. Where software ceased to be about science and became almost all about money and politics.

Alicia Sykes • Nov 20 '23

RethinkDB is developed in C++ and is completely opensource. There are no major bugs and as a result

Meanwhile on the Rethink repo....

Patrice Ferlet • Nov 22 '23

"Issues" doesn't mean "major bugs" 😉

Actually, if you check the list, it's mainly feature requests, some display bugs in the docker container (the version is not OK), and some problems with specific mechanisms.

I know you were kidding