So... what database does Gatsby use?

#gatsby

Recently I started migrating an older Drupal-based blog to a combination of Gatsby and Netlify CMS (source), neither of which I had any prior experience with. In this blog series I'll talk you through the experiences, hurdles and solutions. In part 5 I'll answer one of the questions that kept nagging me during development: what database does Gatsby use to query from GraphQL?

GraphQL queries from databases, right?

In my previous understanding, one of the defining features of GraphQL is being able to query efficiently from multiple data sources. I always understood that these data sources can be REST APIs, other GraphQL APIs or databases such as SQL databases and MongoDB.

Thus arose my question: what database does GatsbyJS use?

No database, but Redux

Gatsby actually does not use a database at all. Under the hood everything that is processed by the plugins gets transformed into Nodes and added to a nodes collection in Redux.

Other collections that exist in Redux are: pages, components, schema, jobs, webpack and metadata.

The combination of plugins and actions (such as those defined in gatsby-node.js like createPage()) will populate these collections.
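
The way actions populate these collections can be sketched with a tiny Redux-style store. This is a simplified illustration, not Gatsby's actual reducer code: the state shape, the action type strings and the hand-rolled createStore() are assumptions made for the example, and only two of the collections are modelled.

```javascript
// Hypothetical, simplified state: two of the collections named above.
const initialState = { nodes: new Map(), pages: new Map() };

// A reducer in the Redux style: (state, action) => next state.
// The action type strings are illustrative assumptions.
function reducer(state = initialState, action) {
  switch (action.type) {
    case "CREATE_NODE": {
      const nodes = new Map(state.nodes);
      nodes.set(action.payload.id, action.payload);
      return { ...state, nodes };
    }
    case "CREATE_PAGE": {
      const pages = new Map(state.pages);
      pages.set(action.payload.path, action.payload);
      return { ...state, pages };
    }
    default:
      return state;
  }
}

// Minimal stand-in for a Redux store. The state lives in a local
// variable, so it only exists in memory while the process runs.
function createStore(reduce) {
  let state = reduce(undefined, { type: "@@INIT" });
  return {
    getState: () => state,
    dispatch: (action) => {
      state = reduce(state, action);
    },
  };
}

const store = createStore(reducer);

// A source plugin would add nodes ...
store.dispatch({
  type: "CREATE_NODE",
  payload: { id: "post-1", frontmatter: { title: "Hello" } },
});

// ... and a createPage() call would add a page.
store.dispatch({
  type: "CREATE_PAGE",
  payload: { path: "/hello/", component: "src/templates/post.js" },
});

console.log(store.getState().nodes.get("post-1").frontmatter.title);
console.log(store.getState().pages.has("/hello/"));
```

Because nothing in this sketch touches the disk, restarting the process starts again from the empty initial state.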

Redux stores these collections in memory during build / dev time, and GraphQL queries against these collections based on an automatically generated GraphQL schema.

The in-memory approach also means that data is not persisted! A regular database would write its data to a database file on a system. In-memory databases recreate the whole database every time. Gatsby does provide a caching system to speed up subsequent builds though.

The schema itself is also stored in Redux and used to execute GraphQL queries that query other Redux collections.

That's quite some Redux-ception!

There is still one problem to be solved. Redux collections store plain JavaScript objects, but GraphQL queries are written using the GraphQL syntax. The solution is sift.js, a library that filters plain JavaScript objects using MongoDB-style queries. Further reading can be found here.

The in-memory approach does have some drawbacks at scale (100k+ documents)...
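
To make the idea concrete, here is a rough sketch of how a GraphQL-style filter argument could be turned into a MongoDB-style query and run against an array of plain objects. It does not use sift.js itself: matches() is a hand-written stand-in that supports only three operators, and toMongoQuery() is a hypothetical helper, so treat this as an illustration of the technique rather than of Gatsby's internals.

```javascript
// Operators this sketch understands (an assumption, far from complete).
const OPERATORS = new Set(["eq", "ne", "in"]);

// Hypothetical helper: flattens a nested filter such as
// { frontmatter: { title: { eq: "Hello" } } } into the MongoDB-style
// query { "frontmatter.title": { $eq: "Hello" } }.
function toMongoQuery(filter, prefix = "") {
  const query = {};
  for (const [key, value] of Object.entries(filter)) {
    if (OPERATORS.has(key)) {
      query[prefix] = { ...(query[prefix] || {}), ["$" + key]: value };
    } else {
      const path = prefix ? `${prefix}.${key}` : key;
      Object.assign(query, toMongoQuery(value, path));
    }
  }
  return query;
}

// Reads a dotted path such as "frontmatter.title" from an object.
const get = (obj, path) =>
  path.split(".").reduce((o, k) => (o == null ? undefined : o[k]), obj);

const TESTS = {
  $eq: (actual, expected) => actual === expected,
  $ne: (actual, expected) => actual !== expected,
  $in: (actual, expected) => expected.includes(actual),
};

// Stand-in for the filter function a library like sift.js would provide.
function matches(node, query) {
  return Object.entries(query).every(([path, conditions]) =>
    Object.entries(conditions).every(([op, expected]) =>
      TESTS[op](get(node, path), expected)
    )
  );
}

// Plain JavaScript objects, as they might sit in the nodes collection.
const nodes = [
  { id: "1", frontmatter: { title: "Hello" } },
  { id: "2", frontmatter: { title: "World" } },
];

const query = toMongoQuery({ frontmatter: { title: { eq: "Hello" } } });
const result = nodes.filter((node) => matches(node, query));

console.log(JSON.stringify(query));
console.log(result.map((node) => node.id)); // only node "1" matches
```

The point of the translation step is that the filtering itself never needs to know about GraphQL: once the arguments are expressed as a query object, any MongoDB-style matcher can be applied to the in-memory array.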

Next up, LokiJS

To address the scaling issues, the Gatsby team is investigating replacing Redux with the LokiJS library. This storage / retrieval mechanism is still hidden behind a feature flag. LokiJS is also an in-memory database, but promises better performance.

If you'd like to know more, take a peek in the source code and follow the code 🔎.

That's it! Hopefully you now understand where the data that GraphQL queries comes from!

This blog is part of a series on how I migrated away from a self-hosted Drupal blog to a modern JAM stack with Gatsby and Netlify CMS.