DEV Community

alex
alex

Posted on

Postgres or Memgraph: a terrible comparison

Intro

  • I'm trying to decide on Postgres or Memgraph for a personal project.
  • I'm not a database expert. Until yesterday, I didn't even know databases use fsync to write stuff to disk.
  • As usual this is more for me than anyone else.
  • There're a lot of nuances and caveats and details in this topic. I will ride roughshod all over them. With no apologies!

The project

The project itself is small for now. It's an events app because I'm a fool for trying to compete with the likes of EventBrite and Meetup instead of jumping on the AI/ML bandwagon like everyone else.

An events app, it turns out, can be quite complex. Each event requires subject matter, locations, reference materials, languages, budgets networks of people with myriad interests, and an assortment of other details. It's highly graphable.

So I could use a relational database like Postgres, the obvious traditional workhorse of software development everywhere. Or, this enticing little thing called a graph database.

The old ways: relational + data warehouse

Here, I draw mainly from some experience and also from Martin Kleppmann's Designing Data-intensive Applications, 2017, first edition.

As far as archaeologists can tell, from around 10,000 BC, hominids have preferred this workflow:

  • Relational database for transactions and source of truth
  • Automate regular ETL to move required data from relational database to data warehouse
  • Run analysis on ETLed data from warehouse and related ecosystem

Why can't I use this tried-and-tested way? Because I'm just one person and I'm like constantly exhausted all the time these days so why would I want to maintain more than one database.

Trade-offs: performance and source of truth

The method sketched out above makes sense because relational databases remain more performant than graph databases for transactions. I've been reading about things on and off in between work weeks so I don't know where I got this sense from and I might be totally wrong. And also, "performant" is a big broad word.

Another reason the above is still a good workflow is that graph databases might not be suitable as a source of truth. For instance, Memgraph does not write every transaction to disk. It holds the latest ones in memory. If it crashes for some reason, those are forever gone. On the other hand, Postgres writes updates to disk.

Postgres docs says "If [fsync] parameter is on, the PostgreSQL server will try to make sure that updates are physically written to disk". As far as I understand, it's on by default. And while it says "try", it probably does. But who knows. You tell me, you experts.

Memgraph does have a --storage-wal-file-flush-every-n-tx flag to configure how many transactions to accumulate before calling fsync to write to disk. I have no idea yet how this affects read and write performance.

Thanks to Marko Budiselić (CTO at memgraph) for discussing this with me on this Twitter thread. (Do NOT follow me there. I talk tech once in a very long while. Mostly, Twitter is just my scream space.)

How I settled on Memgraph

I research this on weekends when I'm exhausted so it's not like I'm gonna be comprehensive about it. But anyway.

For my purpose I want something with:

  • High availability (HA)
  • An open source language

Here's what I found.

  • Neo4j has incredible marketing outreach. It overshadows every other graph db out there. TBH, this usually rings all the alarm bells in my head. I had to do "-neo4j" on Google for every search otherwise pages of results are clogged with it. Sorry I'm half ranting about this but it really gets my goat.
  • A lot of graph databases, including Neo4j, either don't have HA outside of enterprise tier or have proprietary graph query languages.

For the moment, Memgraph seems to fit the bill, given some trade-offs I can probably live with. It has HA and uses Cypher.

However, I may be a little wrong on the HA. From Memgraph's docs on replication:

"To enable replication, there must be at least two instances of Memgraph in a cluster. Each instance has one of two roles: MAIN or REPLICA. The MAIN instance accepts read and write queries to the database and REPLICA instances accept only read queries."

So there's still a bottleneck for writes.

How I'm unsettling from Memgraph

For a couple of weekends I had fun playing with Cypher on the dockerized Memgraph platform. Then I started to think about actual implementation and realised that as far as I could tell:

  • There's no schema. I'm spoilt by strong typing so I feel really as though I'm unmoored in an open sea without this.
  • You can't have multiple databases. If you want separate graphs, here's a Stack Overflow answer by Katarina Šupe, DevRel at Memgraph:

    "You can achieve this behavior using labels. For example, you can add the label GraphOne to each node of the first graph, the label GraphTwo to each node of the second graph, and so on... Aside from this method, the only other option is to fire up a new instance."

With this I began to realise why the old workflow is still pretty standard in order to get the best of both worlds. But it's usually a full team or two who do this. Let's not forget I still have front end, back end, and maybe even k8s on my plate.

Conclusion

It kills me to drop graph databases. Playing with Cypher on Memgraph tickles my brain in new ways. It makes me think about my data in an actually relational way, which is ironic given that writing SQL for relational DB's doesn't give me that feeling.

We can achieve the same thing with relational tables and lots of SQL joins. But the cypher syntax and the idea of nodes and edges inflects my thinking in a way that feels so much more ergonomic to relationships.

At the very least I don't have to do a hundred annoying joins with Cypher. NGL, this was a compelling reason I looked into graph dbs.

On the whole I got a bit of that feeling like going from Flatland to 3d/Space land. It's an incredible sensation, one of the best feelings in the world.

But the lack of a schema on Memgraph, and the lack of open source languages or of HA on community tiers for the others make graph databases unsuitable for me right now.

It's not me, it's them. Or maybe it's me, who knows. Maybe I just don't have the skills in this area.

So, I think... I'll stick with Postgres? 💔

I'll chew on it a bit more but it's likely. 😤

Thanks

Comment if you like but I engage very, very slowly. Usually only on weekends. So maybe like twice a month. I appreciate feedback though! 🙏🏼

Top comments (0)