DEV Community

Andy Wermke
Andy Wermke

Posted on

Built an unconventional database thing

Hey folks!

I just built a tool to statically evaluate SQL queries in JavaScript and TypeScript code: Postguard.

It's born out of the desire to write explicit database queries without a query builder, yet statically analyzable, and built on top of Babel and the TypeScript compiler API.

What does it do?

You query the database like this:

import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"

export async function queryUserById(id: string): Promise<UserRecord | null> {
  const { rows } = await database.query<UserRecord>(sql`
    SELECT * FROM users WHERE if = ${id}
  `)
  return rows[0] || null
}

Define a schema for your tables. The record type can be derived from the schema when using TypeScript:

// schema.ts
import { defineTable, Schema, TableRow } from "squid"

export type UserRecord = TableRow<typeof usersTable>

const usersTable = defineTable("users", {
  id: Schema.Number,
  name: Schema.String,
  created_at: Schema.Date
})

Now let's run postguard to validate what we just wrote:

$ npx postguard ./sample/*.ts

✖ Query validation failed in sample/test.ts:7:32:

  No table in the query's scope has a column "if".
  Tables in scope: "users"

   5 | export async function queryUserByID(id: number) {
   6 |   const { rows } = await database.query<UserRecord>(sql`
>  7 |     SELECT * FROM users WHERE if = ${id}
     |                               ^
   8 |   `)
   9 |   return rows[0] || null
  10 | }

Ahhh, we have a typo in our query! Let's fix it:

const { rows } = await database.query<UserRecord>(sql`
  SELECT * FROM users WHERE id = ${id}
`)

Let's run it again:

$ npx postguard ./sample/*.ts

✔ Validated 1 queries against 1 table schemas. All fine!

More advanced stuff

Let's take our previous sample code and change the SELECT query:

import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"

export async function queryUserById(id: string): Promise<UserRecord | null> {
  const { rows } = await database.query<UserRecord>(sql`
    SELECT id, name FROM users WHERE id = ${id}
  `)
  return rows[0] || null
}

$ npx postguard ./sample/*.ts

✖ Query validation failed in sample/test.ts:6:40:

  Query's result does not match the expected result type.
    Missing columns in result rows: "created_at"
    Actual columns in result rows: "id", "name"

  4 |
  5 | export async function queryUserByID(id: number) {
> 6 |   const { rows } = await database.query<UserRecord>(sql`
    |                                        ^^^^^^^^^^^^
  7 |     SELECT id, name FROM users WHERE id = ${id}
  8 |   `)
  9 |   return rows[0] || null

What happened? We defined that we expect the query to return rows of type UserRecord:
await database.query<UserRecord>(/*...*/)

Postguard evaluated the query and noticed that the result rows of that SELECT query do not match the TypeScript type UserRecord, since that type has a created_at property.

The fix is trivial:

import { sql } from "squid/pg"
import database from "./database"
import { UserRecord } from "./schema"

export async function queryUserById(id: string): Promise<UserRecord | null> {
  const { rows } = await database.query<UserRecord>(sql`
    SELECT * FROM users WHERE id = ${id}
  `)
  return rows[0] || null
}

$ npx postguard ./sample/*.ts

✔ Validated 1 queries against 1 table schemas. All fine!

Why?

So why did I even spent my time on that? Having worked with ORMs for years, I slowly grew more and more tired of them. A query builder seemed like a better solution.

Now the issue with query builders is that you effectively write SQL queries, but not as SQL queries; you formulate them using the query builder's proprietary API. Now I need to read up on two complex interfaces: My Postgres' fancy SQL features AND the query builder's comprehensive API...

Writing good old SQL feels like the natural way, so back to that. But now I've lost all confidence in my code and need to keep a really high test coverage, since I cannot statically reason about my queries. I actually need to run them to see if they work at all.

Enter the stage: Postguard. Back to confidence and short feedback cycles.

Feedback

So this tool is still pretty damn young and should not be used for production code yet.

I would love to hear some feedback regarding the overall approach, though!

The concept might seem a bit esoteric, but the code feels quite natural.

Share anything that's on your mind :)

Happy hacking!

Top comments (9)

Collapse
 
samuraiseoul profile image
Sophie The Lionhart

I really like the idea here, and would like to see what some programmers from other languages think about it as well. That said, my only think I don't like is this bit,

sql`SELECT * FROM users WHERE id = ${id}`) 

I personally hate the template function syntax thing, so the leading sql bit I guess.

Also personally I still think that grouping table queries together makes sense so having some kind of repository makes sense I feel. Like:

abstract class Repository {
    /* 
     * This would call whatever library is actually connecting to sqls' raw query method
    */
    protected abstract query(string sql); 
}

class Users extends Repository {
    protected query(string sql) : object|object[] { //idk the best return for this off the top of my head
        return DB.query(sql); //whatever lib that may be :P
    }

    public function getUsers() : User[] {
        return this.query('SELECT * FROM users;');
    }
}

and then since the actual checker in this case is not part of TS, but rather I assume looks for the sql prepended templates, that we could instead to some full tokenization of all the repository class files, and find any string that contained sql keywords, then validate them.

But I really like the idea of this, no need for a meh ORM, nor a meh query builder, but you get safe queries, and still get the benefit of model objects with types.

Excited to get a chance to play with this. :)

Collapse
 
andywer profile image
Andy Wermke

Nice to hear that you like the concept, Scott :)

I personally hate the template function syntax thing, so the leading sql bit I guess.

Aesthetic preference is of course one thing to consider, but I don't see any alternative to the tagged template string that would offer similar benefits. The reason why it's a tagged template in a nut shell:

  • We can reliably tag any SQL string and identify it with confidence later; it also seems natural to use a template tag for it as its purpose is specifically to work on specialized string data
  • Using the tag function we can interpolate the expressions into the SQL query string, SQL-injection-proof them and return a postgres query object (these tagged templates here don't just return a string, but a postgres query object; see the squid docs)
  • Template tags are a standardized language feature that have been developed with this purpose in mind

Also personally I still think that grouping table queries together makes sense

Agreed. Absolutely. :)

If you like to have traditional OOP and classes, feel free to go for it. I also keep the query functions for each table together in one file. I'm just a big advocate of a more functional programming style, that's all ;)

we could instead to some full tokenization of all the repository class files, and find any string that contained sql keywords

Might work, too, but would force users into a particular style of programming.


Thanks a lot for sharing this! I think this kind of dialog is very important, especially when having different views on a few things.

Tbh I didn't even question the template tags before, so this raised my awareness already quite a bit.

I hope we can have more such exchanges :)

Collapse
 
samuraiseoul profile image
Sophie The Lionhart

I'd love to have more exchanges about it. :)

The tagged sql template isn't any more identifiable than Repository.query in terms of a token per se I would argue. As for its use in terms of how it deals with interpolation and then injection proofing, it would already either have to parse the sql string, or otherwise impose structure to how the interpolation happens right?

const otherComment = '/* Another comment cause I am mean. */';
return `sql`
    SELECT ${fields} FROM ${table} WHERE ${conditions}; /*${sqlComment}*/ ${anotherComment}
`;

How will it know how to deal with all of those? I haven't looked at the code yet, but I imagine you will really need to parse out things well and guard against various things. Also on the return of the Postgres Query Object, simply require that be the return type, though marrying it to Postgres or a library's raw query function's return type/param type is perhaps not the best option. This all said, I dislike tagged template literal functions in general so maybe I'm not rational here. But I also am concerned about someone forgetting to wrap it in the literal function so doing

DB.rawQuery(`SELECT * FROM Users WHERE ${dropTablesInjection};`);

will result in bad things, but having a single spot in the repo option saves you mostly providing some developer doesn't go rogue and try to interact with the DB directly. You could have a check for that, but then you're back to looking for DB.rawQuery as a token instead of Repository.query.


I'm intrigued as to the functional approach you mention though. I assume you mean you have something like:

export getUser(id) { // blah }
export getAllUsers() { //blah } 
export updateUser(id, fields) { //blah }
export deleteUser(id) { //blah }

Then do something like:

import {getUser} from './queries/Users.js';
const user = getUser(id);

or something like:

import * as Users from './queries/Users.js'; 

const user = Users.getUser(id);

and especially in the second example I'm not sure how that's more functional. Its basically a repository class without making an actual class.

This is providing I'm not just on a whole different idea here. :P


This is fun dialogue and I hope we can continue. I worry that my top part about the tagged template literals my not be clear and comprehensible though, let me know if so.

Thread Thread
 
andywer profile image
Andy Wermke

The tagged sql template isn't any more identifiable than Repository.query

In fact it is. It's subtle, but there is one major difference: Repository.query(sqlString) might be fine, but Repository.query(createSomeQuery(someVar)) will be hard to resolve, especially if createSomeQuery is imported from another file. The template tag and the string will be tightly coupled, though. No further indirections possible, just need to check if the template tag is an export of the squid package 👌

I imagine you will really need to parse out things well and guard against various things

It's really using Babel to parse the source code to an abstract syntax tree, analyzing it, and when it finds an sql query, it will take the template string, replace the interpolated expressions with placeholders and parse the resulting SQL string using the native Postgres SQL parser implementation.

So we are actually utilizing a proper compiler toolchain and as long as we can trust Babel and the Postgres parser to do their job, there is not that much that can go wrong in that regard :)

But I also am concerned about someone forgetting to wrap it in the literal function

User might forget to use the sql template tag... That's a very interesting point!

We can prevent that source of error, though, I think. Let's come up with a check that makes sure that every query invocation involving a template string requires the template string to be tagged with either sql or sql.raw. Anything else would be considered an error.

I'm intrigued as to the functional approach

Yeah, your code sample is basically it :)

The reason why I think it's a better choice than having a class: If you decide for the class-approach you will either

a) Keep the class completely stateless and all methods can be static. As you noted, this is basically equivalent to the exported functions approach.
b) Might introduce some statefulness to the class, maybe instantiate a singleton-ish instance, because that's what distinguishes the class from a set of static exports. That's what I would try to prevent in the first place. Introducing state here would lead to creating an ORM with most of its downsides. This is what we are trying to get away from ^

The difference between the concepts of classes and static functions is that the classes can be stateful. Since I would do whatever I can to prevent any statefulness here, there is no reason for me to introduce classes here.

But I suppose you are coming from the other side: You have a lot of classes and you try to figure out why not to use one here...


I think we are laying the foundation for a whole new blog post here right now... :D

Thread Thread
 
samuraiseoul profile image
Sophie The Lionhart

Haha I think we really are laying the foundation for a blog post indeed. I'm not sure what about yet, perhaps a few different kinds really at this point. But anyways...


I hear what you mean on statefulness. Statefulness is indeed public enemy number one, I like the class with static methods approach a bit more just from the standpoint of classes are approachable to non-JS devs. What I've done recently which I really like is the following:

function invisibleHelperFunction(tooShortWord : string) : string {
    return $var + 'adding random words to strings is always helpful.';
}

const ClassThatActsOnStuff = {
    exposedMethod : function (customObject : CustomObject ) : string {
        return invisibleHelperFunction(customObject.stringField);
    }
}

Object.freeze(ClassThatActsOnStuff); //prevents adding new methods and changing state and modification in general

export default ClassThatActsOnStuff;

I like this because it lets you see that you might have added stuff to the file that doesn't belong in the idea that you can throw a function into it, export it, and go on, where as if I throw a function into the class that isn't used outside, then something has gone wrong, its also easier to keep track of what's being exported and hat isn't. And if I need one of the helper functions somewhere else, then I might have some un-needed coupling or another. Obviously this isn't real OOP by any stretch of the word and is really just a slightly different way of doing things, but I'm a fan. :P


Repository.query(createSomeQuery(someVar))

being hard to resolve is true, but is

sql`${createSomeQuery(someVar)}`

actually easier to resolve?

Now full disclosure here, I haven't actually gotten to look through the source code yet so I'm not 100% on how squid does things, but I am familiar with how tagged templates functions work. I would assume that the sql part is just a tag to identify it later, and the template function doesn't do anything fancy at all to it. And so later when we parse everything out, we still have to resolve that function.

Unless the sql safety check is at run time? I just looked at the function in question I believe(raw.githubusercontent.com/andywer/...) and it seems to do the actual sql query parameter wrapping thing, but doesn't actually check the validity of the query in the DB for you.

But for the checking that the query is valid syntactically with the schema, that is an extra build step right? And that will require the same bit regardless of the identifier unless there's something I'm really truly missing. I get that the template literal functions(this is getting annoying to repeatedly type out lol) get the params passed in the second argument, but that only helps you providing like I said that someone hasn't forgotten the sql tag, or done like

sql`SELECT * FROM Users WHERE ` + `${invalidatedInput};`;

though at this point I suppose the developer is just trying to hurt themselves and we should allow darwinism to run its course.


Let's come up with a check that makes sure that every query invocation involving a template string requires the template string to be tagged with either sql or sql.raw

This is kind of iffy to me as a solution. Checking every query invocation means that squid HAS to be tied to a specific framework or set of frameworks. And only working on template strings means that someone not using one, because either they're not a fan of string interpolation or just kind of accidentally forgets they exists, is left in the dust.

I find trying to guard against developer accidents and incompetence as much as possible is good practice, so I like a more general solution. Also since its in a build step, even if this process takes a bit longer, I think its worth it. This is where having a Repository Interface that requires them to implement those methods can help I feel.


The above points I guess bring me to two other things that I'd like to say that maybe are the source of some of our disconnect here:

  • Squid seems kind of tightly coupled at the moment to a specific database library and sql implementation(postgres)
  • Squid seems to actually be doing two things, both really with the template literal function(typed it again....)
    • At run time, wraps what anything it assumes is user supplied input to prevent sql injection and the like (this is where the functionality of the template literal function(again...) matters
    • At compile time, validates the query contents to be sure that no queries are incorrect or outdated. (this is where using it as a tag matters)

I think perhaps that decoupling the two sub point, and the two main points could be quite beneficial. Also I feel that whatever querying library you use should be wrapping stuff for you so doing it yourself may not be necessary, though that might stick us back into query builder kind of land, I'm not sure about it without a lot more investigation, but I feel end of the day it's DB.query(sql, params); as the call, so allowing the developer to decide how to wrap it and call it is up to them, we just do some fancy validation and parsing. :P Not care about the implementation of it at all nor the library used.


This is a fun discussion that I hope can continue! Also I've been using the inclusive 'we' here, but Its not my project and I haven't made any OSS to it, so forgive me!

Thread Thread
 
andywer profile image
Andy Wermke

Nahhh, I appreciate the "we". Sounds like a community effort already :)

I try to keep it shorter this time... ^

So the big picture overview as of right now is:

Squid:

  • has two functionalities
    • mark sql queries as such via the sql template tag
    • turn the query string with the interpolated expressions into a query object (at runtime)
  • that query object varies between database drivers, so as of right now it requires either
    • having a tiny adapter for each db in squid (maybe three lines of code each, but yeah, not 100% elegant)
    • wrapping the db driver's query function into a three-line adapter function
  • it's the runtime side of the whole thing

Postguard

  • is only a static build step, no runtime
  • might make it more generic in the future, supporting other databases as well
  • or create sibling tools for other databases, sharing the babel/typescript parts of the code

Checking every query invocation means that squid HAS to be tied to a specific framework or set of frameworks

No, squid could basically be completely db-agnostic, since all it does is super generic. All the validation happens on build time by postguard.

The reason why it's SQL-injection proofed is that squid doesn't merge the expressions into the SQL query string yet, but produces a string with parameters á la $1 and an array of the expression values, so the database driver can figure out how to escape them in a water-proof way.

I hope that shines some light on things, because it seems there was some confusion about the big picture concepts (-> will need to improve the readme files, i guess).

But sure, there is still room for various improvements and I love to hear your thoughts :)

Collapse
 
mindplay profile image
Rasmus Schultz • Edited

I've seen your squid project as well, and I'm glad to see someone exploring this direction. 😀

Running static analysis on the SQL bits is interesting - but appears to miss out on refactoring safety? I believe there is a much simpler and refactoring-safe way to approach this - have a look at this old thing I built in PHP some years back:

github.com/mindplay-dk/sql#select-...

That is, to use static references for all table and column symbols, e.g. interpolating schema/column objects and values into template literals:

sql`SELECT * FROM ${user} WHERE ${user.id} = ${id}`
Enter fullscreen mode Exit fullscreen mode

This approach would work much better with JS/TS than it did with PHP, since user and user.id would have types that the sql function can recognize. (in PHP the string interpolation happens before the call, which creates some corner cases and inconsistencies for e.g. INSERT statements in my library.)

Now you can rename user or user.id (with TS at least) and have automatic refactoring. You also get errors highlighted (by TS) for example after removing a column. Running static analysis on the SQL queries is probably much less necessary then, and the whole thing could be quite a bit simpler?

I know some users are averse to interpolation-heavy syntax - personally, considering how much this pays back in terms of safety and refactoring, I don't mind at all.

Prisma, TypeORM and Sequelize just don't do it for me - they're either too complex or too removed from SQL. I hope the idea of leveraging string interpolation catches on more widely in the future.

Cheers. 🙂

Collapse
 
david_j_eddy profile image
David J Eddy

Interesting library Andy. I as well have run into the 'learn SQL engine AND query builder API" issue in the past. If you target this library squarely in the 'you have to know your SQL' and keep high performance I think you could be on to something here.

Collapse
 
andywer profile image
Andy Wermke

Good point, David. One thing that jumps to my mind:

The "keep high performance" part makes perfect sense, but even if that's not your ambition you might still prefer SQL over any query builder's custom API.

Would you agree? Of course you would also want to have some simple IDE integration for a better dev experience. I think this can be solved with reasonable effort, though.