One of the things that made NoSQL databases like MongoDB so popular was how easy it was to just throw in a blob of JSON when that's what you needed. What you may not realise, though, is that Postgres supports JSON pretty much equally well. In addition to just throwing some JSON in a field, you can query it using all the usual SQL operations. You get JOINs, transactions, indexes etc.
JSON vs. JSONB
The first step, when adding a JSON field to Postgres, is to decide between JSON and JSONB. I'll make this easy for you:
Always use JSONB, never use JSON
JSONB is just like JSON, except that it doesn't store the actual string of JSON; instead it stores an efficient binary representation. The only reason you might ever want to store JSON is if you wanted to keep track of the whitespace in the original JSON serialisation, or of the key order in your objects. If you want a "pretty" view of your JSON, you can just use JSON.stringify(value, null, 2) to prettify it.
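You can see the difference for yourself by casting the same string to each type (expected output shown in comments; JSONB normalises whitespace and key order on input):

SELECT '{"b": 1,  "a": 2}'::json;
-- {"b": 1,  "a": 2}   (stored verbatim: whitespace and key order preserved)
SELECT '{"b": 1,  "a": 2}'::jsonb;
-- {"a": 2, "b": 1}    (binary representation: keys reordered, whitespace normalised)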
Creating the Table
Now that you've decided on JSONB as the format, you can create the table just like you normally would.
CREATE TABLE my_data (
  id TEXT NOT NULL PRIMARY KEY,
  data JSONB NOT NULL
);
This creates a table with a primary key called id that's of type TEXT, and a data column to store our JSON data.
Reading and Writing JSON Data
If you're using the @databases/pg client, you can read and write Postgres data just like you would any other value:
import connect, {sql} from '@databases/pg';

const db = connect();

// fetch the JSON blob stored under a given key, or null if there isn't one
export async function get(id) {
  const [row] = await db.query(sql`
    SELECT data
    FROM my_data
    WHERE id=${id}
  `);
  return row ? row.data : null;
}

// insert a JSON blob, or overwrite the existing one for this key
export async function set(id, value) {
  await db.query(sql`
    INSERT INTO my_data (id, data)
    VALUES (${id}, ${value})
    ON CONFLICT (id)
    DO UPDATE SET data = EXCLUDED.data;
  `);
}
This gives us a simple key-value store for JSON blobs using a Postgres database.
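For example, a quick round trip with the two functions above might look like this (a sketch; 'greeting' is just a made-up key):

const saved = {message: 'hello world'};
await set('greeting', saved);
const loaded = await get('greeting');
// loaded is {message: 'hello world'} — the JSONB comes back as a parsed object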
Querying JSON
Imagine we’re storing some blog posts in our “NoSQL Postgres Database”:
await set('post-a', {
  author: 'ForbesLindesay',
  title: 'Post A',
  body: 'This post is about the letter A',
});
await set('post-b', {
  author: 'ForbesLindesay',
  title: 'Post B',
  body: 'This post is about the letter B',
});
await set('post-a-rebuttal', {
  author: 'JoeBloggs',
  title: 'Post A - Rebuttal',
  body: 'Forbes was wrong about the letter A',
});
Now imagine we want to get a list of all the blog posts by ForbesLindesay. The author field is buried in the JSONB field, but that doesn't mean we can't use it in our query.
export async function listByAuthor(author) {
  return await db.query(sql`
    SELECT data
    FROM my_data
    WHERE data ->> 'author' = ${author}
  `);
}
Here, the ->> operator means "get the value of this property as text". It's what you want when the value is a string, number or boolean. If the value is another object, you must use the -> operator, which means "get the value of this property as JSON" and lets you keep drilling down.
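For example, if each post also carried a nested metadata object (a hypothetical extension of the posts above, not a field we stored earlier), you could chain -> for the intermediate objects and ->> for the final value:

export async function listByCategory(category) {
  // data -> 'metadata' returns JSON, so we can keep drilling down;
  // the final ->> returns plain text that we can compare to the parameter
  return await db.query(sql`
    SELECT data
    FROM my_data
    WHERE data -> 'metadata' ->> 'category' = ${category}
  `);
}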
Hopefully it’s clear that this means you can use the full power of SQL here, but just to give another example, we could get a list of all the authors:
export async function getAuthors() {
  return (
    await db.query(sql`
      SELECT DISTINCT data ->> 'author' AS author
      FROM my_data
    `)
  ).map(({author}) => author);
}
Here, we're extracting the author from the data, and then using SQL's DISTINCT operator to only return each author once.
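You also get the indexes mentioned at the start. If queries like these became hot paths, you could index the JSON directly; a sketch against the table above (the index names are arbitrary):

-- expression index to speed up equality checks on data ->> 'author'
CREATE INDEX my_data_author_idx ON my_data ((data ->> 'author'));

-- or a GIN index on the whole document, which supports containment
-- queries such as: WHERE data @> '{"author": "ForbesLindesay"}'
CREATE INDEX my_data_gin_idx ON my_data USING GIN (data);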
Conclusion
In Postgres, you can use JSON just like any other value, and it would be easy to set up a NoSQL-like store for JSON blobs and just use that as your entire database. That doesn't necessarily mean you should. This JSON data is totally schema-less, so it's very important that you properly validate that it matches any expected structure before inserting it into the database. JSON fields are very useful when you need to store large JSON structures and you're not yet sure how you're going to want to query them, but for the most part I still recommend having explicit fields in SQL, and using JOINs to store nested lists etc.
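For example, a minimal guard in front of the set function from earlier might look like this (just a sketch; in a real application you'd probably reach for a proper schema validator):

export async function setPost(id, post) {
  // hypothetical check that the blob matches the blog post shape we expect
  if (
    typeof post.author !== 'string' ||
    typeof post.title !== 'string' ||
    typeof post.body !== 'string'
  ) {
    throw new Error('Invalid post: author, title and body must be strings');
  }
  await set(id, post);
}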
Originally published on Medium
Top comments (2)
I had some bad experiences with storing very large JSONB structures and then selecting parts of them in a SQL query (SELECT "field" -> 'key' FROM "myTable"). Performance got much better by storing the data that is part of the JSON blob in a separate column. I must say that some of those blobs are a few megabytes, as we store data that has been extracted by an ML classifier. Nevertheless, I am curious what data people usually store in JSON columns; for my use cases it is mainly multi-dimensional data that would not make sense to normalise.

I don't have any experience trying to store ML data in JSONB. Currently I can think of two places we use them at Threads Styling:
We have a "MediaStore" service for storing and processing images and videos. This service is consumed by a range of different services internally, so we allow storing a small blob of metadata along with each image. This makes it really flexible for supporting lots of different scenarios.
We have a service that synchronises contacts between Google Contacts and an internal database (for our employees). We store the data we currently care about (name, phone numbers etc.) in normalised columns in Postgres, but we also dump the entire JSON document for each contact into a JSONB column, which has proved useful for debugging, and would allow us to extract extra info if we needed it by running a db migration, rather than needing to re-sync with Google.