DEV Community

Gajus Kuizinas
Gajus Kuizinas

Posted on

Dynamically generating SQL queries using Node.js

Ever since I have released Slonik (PostgreSQL client for Node.js) and written a controversial Stop using Knex.js article (tl;dr; query builders are designed to be building blocks for ORMs; they do not add value when majority of the query is static.), I have been asked a lot – then how do I generate dynamic queries? I will answer this by sharing a couple of real-life examples.

All of the queries in this article are actual queries used in real-life business, Applaudience which heavily relies on PostgreSQL.

Disclaimer: (1) All examples discuss only SQL injection threats. Authorization logic (e.g. whitelisting columns user is authorized to access) is not in the scope of this article. (2) All statements assume there are no bugs in Slonik implementation.

Static query with dynamic value bindings

If your query logic does not change depending on user's input, then simply construct SQL query using sql tagged template literal, e.g.

sql`
  SELECT c1.country_id
  FROM cinema_movie_name cmn1
  INNER JOIN cinema c1 ON c1.id = cmn1.cinema_id
  WHERE cmn1.id = ${cinemaMovieNameId}
`;

If you are using Slonik, it is safe to pass values as template literal placeholders. sql will interpret all placeholder tokens and construct final SQL query. In this case, the only dynamic part of the query is the value bindings themselves, therefore the final query is:

SELECT c1.country_id
FROM cinema_movie_name cmn1
INNER JOIN cinema c1 ON c1.id = cmn1.cinema_id
WHERE cmn1.id = $1

Query and bound values will be sent to PostgreSQL separately: no risk of SQL injection.

Binding a list of values

When your query input is a list of values (e.g. such as when retrieving rows matching multiple identifiers), then you may use sql.valueList, e.g.

sql`
  SELECT m1.*
  FROM movie m1
  WHERE m1.id IN (${sql.valueList(movieIds)})
`;

This will generate a query with dynamic set of value bindings, i.e. if movieIds is [1, 2, 3] the query that is sent to PostgreSQL will be:

SELECT m1.*
FROM movie m1
WHERE m1.id IN ($1, $2, $3)

However, despite this being a common pattern, I do not advise to use this pattern. Instead, use sql.array, e.g.

sql`
  SELECT m1.*
  FROM movie m1
  WHERE m1.id = ANY(${sql.array(movieIds, 'int4')})
`;

This will generate a fixed-length query that does not change based on its inputs, i.e.

SELECT m1.*
FROM movie m1
WHERE m1.id = ANY($1::"int4"[])

Continue reading sql.array vs sql.valueList.

Query with dynamic columns

If your query result refers to columns that depend on user's input, then use sql.identifier to generate SQL that identifies those columns, e.g.

(Note: Not an actual query used in business. See next paragraph.)

sql`
  SELECT m1.id, ${sql.identifier(['m1', movieTableColumnName])}
  FROM movie m1
  WHERE
    m1.id = ${moveId}
`;

This query will produce a query that selects exactly 1 dynamically identified column. There is no risk of SQL injection, i.e. even if logic leading to generation of movieTableColumnName was somehow compromised, the worst that can happen is that query attacker will be able to return any column under m1 alias or execute query with invalid column identifier values (both carry risk; business logic is not in scope of this article).

Just because you can do this, you probably shouldn't. When your application requires to return different columns depending on user's query, it is better to select all columns that are in scope of the business logic and pick value of the needed column, i.e. If the intent of the latter query was to return a different movie identifier based on movieTableColumnName, then it is better to write a static query:

sql`
  SELECT
    m1.id,
    m1.foreign_comscore_id,
    m1.foreign_imdb_id,
    m1.foreign_metacritic_id
    m1.foreign_rottentomatoes_id,
    m1.foreign_tmdb_id,
    m1.foreign_webedia_id
  FROM movie m1
  WHERE
    m1.id = ${moveId}
`;

The latter has does return some superfluous data on every query, but it has several advantages:

  1. It reduces risk of SQL injection (regardless of how much you trust code generation logic, static code is always safer than dynamic code).
  2. It produces only one entry pg_stat_statements. You will learn to appreciate as few as possible queries in pg_stat_statements as your application scales.

Query with multiple dynamic columns

Same as the above, but sql.identifierList.

Nesting dynamic SQL queries

sql tagged template literals can be nested, e.g.

(Note: Simplified version of an actual query used in business.)

const futureEventEventChangeSqlToken = sql`
  SELECT
    ec1.event_id,
    ec1.seat_count,
    ec1.seat_sold_count
  FROM event_change_future_event_view ec1
`;

sql`
  SELECT
    event_id,
    seat_count,
    seat_sold_count
  FROM (
    ${futureEventEventChangeSqlToken}
  ) AS haystack
  WHERE ${paginatedWhereSqlToken}
  ORDER BY ${orderSqlToken}
  LIMIT ${limitSqlToken}
`

This allows to pass pre-bound SQL queries as first-class citizens across your program. This is handy when the intent is to isolate SQL generation logic for testing or when large SQL fragments are shared between queries or when the intent is to simply reduce concentration of code complexity in one place.

Injecting dynamic SQL fragments

sql.raw is used to inject dynamic SQL fragments, i.e.

sql`
  SELECT ${sql.raw('foo bar baz')}
`

translates to (invalid) query:

SELECT foo bar baz

Unlike the previous example using sql tagged template, sql.raw is not safe – it allows to create dynamic SQL using user input.

There are no known use cases for generating queries using sql.raw that aren't covered by nesting bound sql expressions (described in "Nesting dynamic SQL queries") or by one of the other existing query building methods. sql.raw exists as a mechanism to execute externally stored static (e.g. queries stored in files).

Query with a dynamic comparison predicate members or operator

If an operator of a comparison predicate present in your query is dynamic, then use sql.comparisonPredicate, e.g.

(Note: Not an actual query used in business.)

sql`
  SELECT
    c1.id,
    c1.nid,
    c1.name
  FROM cinema c1
  WHERE
    ${sql.comparisonPredicate(
      sql`c1.name`,
      nameComparisonOperator,
      nameComparisonValue
    )}
`;


nameComparisonOperator can be values such as =, >, <, etc. Assuming nameComparisonOperator is "=", then the resulting query is going to be:

SELECT
  c1.id,
  c1.nid,
  c1.name
FROM cinema c1
WHERE
  c1.name = $1

The latter is an extremely rare use case, reserved almost entirely to building higher level SQL abstraction tools (such as ORMs). It may be useful for "advance search" scenarios, however continue reading to familiarise with alternative patterns (see sql.booleanExpression).

Query with dynamic WHERE clause members

If presence of WHERE clause members is dynamic, then use sql.booleanExpression.

const findCinemas = (root, parameters, context) => {
  const booleanExpressions = [
    sql`TRUE`,
  ];

  if (parameters.input.query) {
    const query = parameters.input.query;

    if (query.countryId !== undefined) {
      booleanExpressions.push(
        sql`c2.id = ${query.countryId}`
      );
    }

    if (query.nid !== undefined) {
      booleanExpressions.push(
        sql`c1.nid % ${query.nid}`
      );
    }

    if (query.name !== undefined) {
      booleanExpressions.push(
        sql`c1.name % ${query.name}`
      );
    }
  }

  const whereSqlToken = sql.booleanExpression(
    booleanExpressions,
    'AND'
  );

  return context.pool.any(sql`
    SELECT
      c1.id,
      c1.nid,
      c1.name,
      c2.code_alpha_2 country_code,
      c2.name country_name
    FROM cinema c1
    INNER JOIN country c2 ON c2.id = c1.country_id
    WHERE ${whereSqlToken}
  `);
},

findCinemas is an implementation of a GraphQL resolver. WHERE clause of the query is constructed using a combination of 3 possible boolean expressions. As is the case with all the other query building methods in Slonik, all expressions can be nested: you can have other boolean expressions as members of a boolean expression or even SQL expression constructed using sql tagged template literal.

Summary

These examples cover every common dynamic SQL building scenario and provide enough knowledge of how Slonik works to enable reader to continue journey of familiarising with other query building methods provided by Slonik. The primary intent of this article was to demonstrate that Slonik provides a safe abstraction for constructing SQL queries keeping the static parts of the query intact.

If you value my work and want to see Slonik and many other of my Open-Source projects to be continuously improved, then please consider becoming a patron:

Buy Me A Coffee
Become a Patron

Finally, I missed a use case scenario that you would like me to cover, mention it in the comments and I will happily include it.

Oldest comments (26)

Collapse
 
shijiezhou profile image
Shijie Zhou

I think you should look into graphic QL

Collapse
 
gajus profile image
Gajus Kuizinas

What about GraphQL?

Collapse
 
craigmichaelmartin profile image
craig martin

Hey Garjus - this is the first time I'm coming across your articles/project. Looks really interesting! I look foward to digging in :) I would love to hear your thoughts on github.com/craigmichaelmartin/pure... - a pure ORM (no query builder dimension) which allows you to write regular native SQL and receive back properly nested/structured nested pure (not db-aware stateful) business objects.

Collapse
 
ivan_paqmind profile image
Ivan Kleshnin • Edited

Is it ok to use sql.raw to create a fully dynamic ORDER BY query?

let field = "createdAt"
let dir = "DESC" // generated like field.startsWith("+") ? "ASC" : "DESC"

let orderToken = sql`
  ORDER BY ${sql.identifier(field)} ${sql.raw(dir)}'
`

let q = sql`
  SELECT * FROM "post"
  ${orderToken}
  LIMIT 3
`
Enter fullscreen mode Exit fullscreen mode
Collapse
 
gajus profile image
Gajus Kuizinas • Edited

You don't need to: sql tags can be nested.

let field = 'createdAt';
let dir = field.startsWith('+') ? sql`ASC` : sql`DESC`;

let orderToken = sql`
  ORDER BY ${sql.identifier([field])} ${dir}'
`;

let q = sql`
  SELECT * FROM "post"
  ${orderToken}
  LIMIT 3
`;

Collapse
 
ivan_paqmind profile image
Ivan Kleshnin • Edited

Thanks. What about optional ORDER BY? For WHERE you started with TRUE so the absence of conditions results in WHERE (true) which is syntactically correct and has no performance implications. I can't find anything than can be used to order by default:

ORDER BY ??? default ???
Enter fullscreen mode Exit fullscreen mode

Another attempt with ?: breaks the placeholder ordering in query:

SELECT ${makeProjection(pick)} 
FROM ${sql.identifier([table])}
WHERE ${makeWhere(filter)}
${sort ? sql`ORDER BY ${makeOrder(sort)}` : ``} -- doesn't work
LIMIT ${makeLimit(limit)}
OFFSET ${makeOffset(offset)}
Enter fullscreen mode Exit fullscreen mode
syntax error at or near "$1"
Enter fullscreen mode Exit fullscreen mode
Thread Thread
 
gajus profile image
Gajus Kuizinas
${sort ? sql`ORDER BY ${makeOrder(sort)}` : sql``}

Enter fullscreen mode Exit fullscreen mode
Thread Thread
 
ivan_paqmind profile image
Ivan Kleshnin

Causes

InvalidInputError: Unexpected SQL input. Query cannot be empty.
Thread Thread
 
gajus profile image
Gajus Kuizinas
sql`--`

Thread Thread
 
gajus profile image
Gajus Kuizinas

I cannot recall the original reason for this constraint. Will likely remove this constraint. Track github.com/gajus/slonik/issues/93.

Collapse
 
paulovieira profile image
Paulo Vieira

Hello Gajus,

I've been experimenting with Slonik for the first time today. Seems interesting, I'm very much aligned with the philosophy of "just write plain old sql". This article was useful, especially the final example. Thanks.

Collapse
 
ivan_paqmind profile image
Ivan Kleshnin • Edited

What's the difference between tuple and valueList? The only one I see is that tuple adds ( and ) parens around its values... The implementation and types are almost identical to valueList, to the point I wonder why not to express one in terms of another or just ditch tuple as unnecessary. Most probably I miss something – that's just my first impression.

Collapse
 
gajus profile image
Gajus Kuizinas

Eventually sql.valueList, sql.rawList, sql.tupleList, sql.identifierList and sql.booleanExpression will be removed in favour of a single method sql.expressionList.

github.com/gajus/slonik/issues/91

The original intention for separation was to force semantic resemblance and type strictness specific to the code fragment being generated. However, since then types have evolved for all of these helpers to allow a lot broader spectrum of allowed values.

Collapse
 
ivan_paqmind profile image
Ivan Kleshnin • Edited

Another question. Why valueList and other similar functions support only primitive values?

Is TypeScript a reason? The thing I like about pg driver is that, unlike MySQL, I can SELECT or INSERT a date, a boolean, an object, etc. value an it will do "the right thing" without an extra hassle.

Maybe it's not entirely safe or somehow "not a good practice" – I dunno.
For now, the API feels limiting for no obvious reason.

Collapse
 
gajus profile image
Gajus Kuizinas

Value token can be any primitive that is shared between pg and JavaScript (string, integer, float, boolean, null).

If you have a requirement for automatic coalescing of other object types (such as Date), I suggest raising an issue. I cannot think of other types that would be relatively safe to cast, though.

Collapse
 
ivan_paqmind profile image
Ivan Kleshnin

JS Object with JSON.stringify?

Thread Thread
 
gajus profile image
Gajus Kuizinas

Too many edges cases where an object could be passed accidentally inserting potentially sensitive data to the database.

There is sql.json() for that, though.

Collapse
 
cvh23 profile image
cvh23

Slonik's approach is very interesting. Although it's maybe not directly related to Slonik, but because we don't use an ORM: What would be the best way for transforming a SELECT query result of rows into an array of nested objects? For instance if we are joining multiple tables and have a master-detail-relationship. The web client usually expects some kind of nested JSON structure.

Collapse
 
gajus profile image
Gajus Kuizinas

You can return JSON from PostgreSQL query.

Collapse
 
cvh23 profile image
cvh23

So your recommendation is to do everything in PostgreSQL with its JSON functions? So we already have the ready-to-use object or array of objects as query result?

Thread Thread
 
gajus profile image
Gajus Kuizinas

Not at all. I think it is entirely unnecessary. Keep the queries simple. If you need to nest objects, either use frameworks that automate nesting (e.g. GraphQL), or write simple iteration routines to amend the data structures.

Thread Thread
 
cvh23 profile image
cvh23

Sounds good, thank you! Do you know of any application which uses Slonik + GraphQL and is open source, so that I can study this approach?

Collapse
 
baerrach profile image
Barrie Treloar

sql.booleanExpression was removed in favour of sql.join but the article has not been updated.

See feat: remove multiple methods in favor of sql.join

The doc link is now github.com/gajus/slonik#slonik-que...

Collapse
 
5422m4n profile image
Sven Kanoldt

Seems that sql.valueList does not exist anymore. Also the link above provided Continue reading sql.array vs sql.valueList. does not lead to the right section of the README in the repo.

Was this feature removed lately?

Collapse
 
ackvf profile image
Vítězslav Ackermann Ferko

Where have all the sql builder functions go and how to replace them?
comparisonPredicate, valueList, etc.

I am trying to construct multiple WHERE statements dynamically, but I get all sorts of errors. The thing is, some of them might be undefined and thus ommited, but then sql builder yells that it's not a valid value

Collapse
 
sajedulkarim profile image
Sajedul karim

I am fetching issue where I have to generate dynamic ORDER BY clause for multiple columns.
Here is my sample query:

SELECT *
FROM user
WHERE gender = 'male'
  ORDER BY created_at ASC, updated_at DESC
LIMIT 10 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

My Try:

const orderBy: string = 'ORDER BY created_at ASC, updated_at DESC';

SELECT *
FROM user
WHERE gender = 'male'
  ${orderBy}
LIMIT 10 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

Here, if I pass ORDER BY clause as ${orderBy}, then it take it as a value. It shows error.

The generated sql is like below:

{
  sql: "\n  SELECT *\nFROM user\nWHERE gender = 'male'\n  $1\nLIMIT 10 OFFSET 0;\n",
  type: 'SLONIK_TOKEN_SQL',
  values: [ 'ORDER BY created_at ASC, updated_at DESC' ]
}
Enter fullscreen mode Exit fullscreen mode

Please help me.