A Unified Multi-Tenant User Cache with PostgreSQL
Dian Fay Dec 13 '17 Originally published at di.nmfay.com on Dec 10, 2017
I've been working on a multitenant Node.js product which recently moved its authentication into a Single Sign-On (SSO) system. With PostgreSQL we were able to structure and retrieve user data efficiently through an interesting combination of uncommon or unique database functionality:
- Foreign data wrappers (FDWs)
- Table inheritance
- Materialized views
Foreign Data Wrappers
When we began redesigning the application's user infrastructure we wanted to avoid maintaining a copy of user data independent from the chosen SSO system, Keycloak. We knew we could represent data from other sources through a foreign data wrapper. This is (so far as I know) a unique feature to Postgres, which lets you represent data in other sources as tables by implementing a standard connecting API.
The bad news: Postgres is written in C, and while I could probably brush up on pointers and make it work eventually, higher-level languages have spoiled me. Fortunately, there's a project which enables FDW development in Python: Multicorn. With my coworker's efforts on foreign-keycloak-wrapper, that got us as far as being able to create a table representing a particular "realm" or user organization in Keycloak (we kept our
organizations table in order to have referential integrity in our data ownership) and retrieve its users through the Keycloak API.
CREATE SERVER "myrealm_server" FOREIGN DATA WRAPPER multicorn OPTIONS ( wrapper 'keycloak.Keycloak', url 'url to the Keycloak instance', username 'a realm admin username', password 'a realm admin password', realm 'myrealm', client_id 'a realm client id', grant_type 'password', client_secret 'a realm client secret', organization_id 'id of an entry in the organizations table' ); CREATE FOREIGN TABLE "myrealm" ( id uuid, username text, "firstName" text, "lastName" text, email text, organization_id uuid ) SERVER "myrealm_server"; SELECT * FROM myrealm;
We'd still have to create a foreign server and table for each realm, since the Keycloak API only retrieves users per realm by design. In a multitenant system, we want bootstrapping new organizations to be easy and automated on the backend. Here Keycloak has the ability to export realm connection information as JSON, which lets us access the information required to
CREATE FOREIGN SERVERs and
Table inheritance is another feature unique to Postgres among the four major RDBMSs. Setting up a base
users table and declaring that the
INHERITS (users) accomplishes two things:
myrealm builds on top of
users' column list. This mostly makes the
CREATE FOREIGN TABLE statement shorter (it's also optional), since we have no new columns to add as long as the base
users schema conforms to the Keycloak API contract.
myrealm's data can be accessed through
users with a simple
SELECT. In fact, this is the default behavior, and a
SELECT must specify
FROM ONLY users in order to omit rows from descendant tables.
CREATE TABLE "users" ( id uuid, username text, "firstName" text, "lastName" text, email text, organization_id uuid ); CREATE FOREIGN TABLE "myrealm" ( ) INHERITS "users" SERVER "myrealm_server"; SELECT * FROM users;
SELECT now combines information from every active realm, so for higher-level APIs the only question is one of ensuring the requesting user is authorized to see the information retrieved. This is exactly the way we had it with the local users table, so we've already got that authorization infrastructure in place and overall impact to the rest of the application is minimal.
The Keycloak server being separate from the application database server means longer roundtrip time in any query involving user records. There are some advantages to having the data stored locally, after all! However, the real problem isn't in having the data but in ensuring it stays current: what we need is a cache. A materialized view is exactly that.
Materialized views are found in Postgres, SQL Server, and Oracle. If you use MySQL, you're out of luck (but then, if you're following this whole thing, it's Postgres or bust anyway). It's defined just like a regular view, with the
VIEW the important difference indicating that the results of the view query are to be stored until refreshed. The stored results can be indexed just like tables, too.
CREATE MATERIALIZED VIEW cached_users AS SELECT * FROM users;
If we add a new realm and its foreign table, or if information inside an existing realm changes (such as if we see a previously-unknown user try to login), we can
REFRESH MATERIALIZED VIEW CONCURRENTLY cached_users; to.... refresh the cache. The
CONCURRENTLY means it happens in the background, so
SELECTs happening while the data is being retrieved see the old version. It's not staying as close to realtime as possible; we could do that with cron or a systemd timer if we really wanted to, but for our purposes refreshing on new organizations being created or unknown users authenticating suffices.
Overall this has added some complexity to our database setup. We're no longer running stock Postgres since both Multicorn and foreign-keycloak-wrapper must be installed. Discrepancies between Python versions bundled with Postgres on various operating systems have also caused some issues -- universally resolved with a careful inspection of the Postgres configuration and the install logs, but annoying. Docker's taken some of the pain out of that, since we can ship an image with everything ready to go and use volumes to persist data.
Lastly, realms have to be created in Keycloak before we can do anything with them, so there are more moving parts to keep track of. Oh well; we have our unified user cache so the application logic stays simple, and that's all we wanted out of it. SSO is supposed to make life easier for users, not necessarily for architects!