Database Schema Linting

#sql #linting #postgres

I started writing custom linter rules for Submotion and it’s been a surprisingly rewarding experience. I felt that my database schema was uncomfortably exempt from this scrutiny so I created a linting tool for it:

https://github.com/kristiandupont/schemalint

This tool will extract the schema from a live database and inspect it according to rules just like a code linter does with code. It uses nodejs but it’s just a command line tool so you can use it with whatever, as long as you have a postgres database.

You install it with npm:

$ npm install --save-dev schemalint

You then create a .schemalintrc.js configuration file that looks something like this:

module.exports = {
  connection: {
    host: 'localhost',
    user: 'postgres',
    password: 'postgres',
    database: 'acme',
    charset: 'utf8',
  },

  rules: {
    'name-casing': ['error', 'snake'],
    'name-inflection': ['error', 'singular'],
    'prefer-text-to-varchar': ['error'],
  },

  schemas: [{ name: 'public' }],
};

..and run the tool:

$ npx schemalint

I have written a bunch of rules for it. Some of them are general-purpose enough to be included in the package. For instance, some of the schema-related advice on The Postgresql Wiki like “prefer text over varchar(n) types”. If you enable that rule, you will get an error message if any of your tables has a column that is varchar(n) typed.

However, while the built-in rules might be of value, the true benefit comes when you start writing custom rules.

Linter rules, if written well, can help you enforce architecture. This is something that can also be accomplished with types to some degree. You can write a framework and define the shapes of things that come in and out of it. But there are plenty of situations where type systems fall short or where you would need to engage in seriously convoluted exercises to make it work. Linter rules, on the other hand, can be really flexible and based on quirky conditions and hints. This makes them more powerful than types for certain types of nudging. And the great thing is that they are easy to disable on a case-by-case basis so you can write something that works well 99% of the time rather than then 100% typically required by types (where that one last percent can be a true deal breaker).

Something similar applies to testing, but tests work really well for existing code, where linting applies to future code. It basically feels like you are pair programming with your future self, or with future developers working on your code base.

If you haven’t tried it, I highly encourage you to try writing a linter rule for yourself. It’s easier than you might think. If you are using JS, go to https://astexplorer.net/ and choose ESLint in the Transform dropdown. It gives you a nice playground to work from.

Lint-driven refactoring: before changing any code, make a linter-rule that outlaws the existing pattern. If possible, make it supply an automatic fix. This beats codemods.

Ok, so what do I do with my schema? Well, for instance you could enforce naming policies. Let’s try creating a rule that requires primary key columns to be named like this: member_id.

The identifierNaming object is a rule. It contains three properties: name, docs and process. The process function is where the validation happens. It takes a schemaObject parameter which is a json representation of your database, based on the format described in the extract-pg-schema package. The second parameter, report is the function you call when you want to inform the system that something doesn’t look right. You call that function with the parameters shown. The suggestedMigration string is optional — include it if you can create a meaningful migration that will resolve the issue you are pointing out. The identifier property should be something that will uniquely identify the problematic item. The user should be able to specify this in an ignore object and ignore a specific instance.

To see this particular example in action, take a look at the example folder in the repository: https://github.com/kristiandupont/schemalint/example.

You can do more advanced stuff than this, though. The schema object that you receive contains comments on tables, views and columns. You can set those in Postgres, so the comments live in the database. And in order to store additional metadata, you can use @tags in your comments.

So say you run this SQL:

COMMENT ON TABLE "member" IS 'Members of an organization @cached @alias:person'

The object for the member table will contain:

a comment with the value 'Members of an organization', and
a tags value of { cached: true, alias: 'person' }

Using this, you can create linter rules based on specific metadata.

In Submotion, I have machinery for automatically synchronizing a number of tables between backend and frontend. All of those have the tag @workspace on them, so I have specific linter rules like requiring that those tables don’t have columns with encrypted data in them (as I wouldn’t want that automatically synchronized to the frontend).

I feel that there is quite a bit of opportunity in this space and I would love to hear feedback and ideas.