My latest attitude towards the database has been to respect it more, by taking advantage of more of its powerful features. That means using features specific to my go-to database, PostgreSQL, and locking myself in. I have a pending blog post with my thoughts on that, but for now I wanted to make a quick note on something that seems so simple and obvious, yet that I'd never considered before.
Quite often in our application we'll want to customise our WHERE clauses to filter results. A common example is implementing filters for a table of results. To do this, we might loop over a list of filters that come to us from the client (e.g., query parameters in a GET request) and include only those that have a value set. If one filter is requested, we need to add a WHERE clause; if more than one, we need to join the conditions with AND. It's a small thing to do, but in building up our SQL by joining strings together, we need to be a little careful to construct it correctly.
Alternatively, when we're using an ORM or query builder (for Go, I like to use squirrel), adding the correct WHERE and AND terms is handled automatically. It's a small convenience we could implement ourselves, but then we'd find ourselves building our own in-house micro query builder.
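For illustration, here's a minimal sketch of the same filtering with squirrel; the hard-coded name and city values stand in for whatever the client sends:

package main

import (
	"fmt"

	sq "github.com/Masterminds/squirrel"
)

func main() {
	// Start from the base query, using PostgreSQL-style $n placeholders.
	q := sq.Select("*").From("person").PlaceholderFormat(sq.Dollar)

	// Each Where call is ANDed on; squirrel decides whether a WHERE
	// keyword is needed at all.
	name, city := "Homer", "Springfield"
	if name != "" {
		q = q.Where(sq.Eq{"name": name})
	}
	if city != "" {
		q = q.Where(sq.Eq{"city": city})
	}

	query, args, err := q.ToSql()
	if err != nil {
		panic(err)
	}
	fmt.Println(query) // SELECT * FROM person WHERE name = $1 AND city = $2
	fmt.Println(args)  // [Homer Springfield]
}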
The nice thing about not using a query builder or ORM is that you work directly with the SQL, a great advantage when you're familiar with it and know exactly what you want to write. You don't run into situations where you think, "I know what the SQL query needs to be, but I don't know how to get this ORM/query builder to write it". The problem is that you end up composing your SQL queries with your own equivalent of a query builder anyway, and then the details of your queries are split all over the place. You can't look in one place and see the query as a whole to understand it; instead, it's scattered in piecemeal strings throughout your function or wider code.
I've been playing around with Rust some more, and looking at sqlx as an option that lets me stay close to the SQL, with some additional compile time checks. sqlx sends queries to the database at compile time to check that they are valid, which sounds great -- compile time checks for validity save time.
This, of course, won't work when you're constructing your clauses from parts and therefore don't have the whole query written anywhere at compile time that can be checked against the database. For example (written in Go), you might have:
package main

import (
	"fmt"
	"strings"
)

var filters = []struct {
	Name  string
	Value interface{}
}{
	{
		Name:  "name",
		Value: "Homer",
	},
	{
		Name:  "city",
		Value: "Springfield",
	},
}

func main() {
	query := "SELECT * FROM person"
	var where []string
	var args []interface{}
	index := 0
	// Make sure 'filters' are not provided by the client, or are checked
	// against a list of allowed values, so that you don't give an opening
	// for a SQL injection attack:
	for _, filter := range filters {
		index++
		where = append(where, fmt.Sprintf("%s = $%d", filter.Name, index))
		args = append(args, filter.Value)
	}
	if len(where) > 0 {
		query = query + " WHERE " + strings.Join(where, " AND ")
	}
	fmt.Println(query)
	// db.Query(query, args...)
	_ = args
}
We've built our own home-brewed SQL-building solution, and this example doesn't even deal with ranges -- for example, finding everyone with a date of birth between two dates (or just after a particular date). Those have a different syntax that also needs handling, complicating our 'quick' home-brewed query builder. Note also that the full query is scattered across parts throughout this function. The query it generates in this case would be:
SELECT * FROM person WHERE name = $1 AND city = $2
But it's very hard to see that at a glance, and it can't be checked in advance. How do we get compile-time checks for these runtime-constructed queries? I've been thinking about options where we inspect the database to find out what tables and fields exist, and then, inside Rust, use a macro with simple tags inside our query string that the compiler checks are valid. We could use this to check that we're at least using correct table and field names.
However, I saw another solution for this in an issue on the sqlx project: have the query itself handle the optional cases! Instead of including conditions only when they're set, you always write them all out in the query in your code, so no runtime building-up of conditions is needed. If a filter isn't provided, the query simply doesn't filter on that value. Simple but effective, and it should still let sqlx check the query at compile time! And since the query is fully written out, it's easy to read all in one place, and therefore easy to understand.
Here's an updated version of the above using this method:
package main

import (
	"database/sql"
)

type filter struct {
	Name      sql.NullString
	City      sql.NullString
	DOBBefore sql.NullTime
	DOBAfter  sql.NullTime
}

func main() {
	var db *sql.DB // assume db has been opened with sql.Open elsewhere
	var f filter
	f.Name = sql.NullString{String: "Homer", Valid: true}
	f.City = sql.NullString{String: "Springfield", Valid: true}
	query := `
		SELECT * FROM person
		WHERE
			($1 IS NULL OR name = $1) AND
			($2 IS NULL OR city = $2) AND
			($3 IS NULL OR dob < $3) AND
			($4 IS NULL OR dob > $4)
	`
	rows, err := db.Query(query, f.Name, f.City, f.DOBBefore, f.DOBAfter)
	// [...]
	_, _ = rows, err
}
This is both shorter and much clearer about what's going on. Moreover, the query can be checked against the database at compile time, and it's easily portable. It even handles date-range filters without us needing to enhance our quickly built in-house query builder to support them. Also, since we've hard-coded our filter options, we don't run the risk of accidentally allowing a SQL injection attack because we forgot to properly check our filter names.
It's such a simple and obvious way to handle these things, but it never occurred to me before. I'll be interested to see if this kind of solution can work in most situations, or if it will have common limitations I haven't yet considered.
Top comments (2)
One point against that final query you show there: using OR anywhere in a query generally destroys performance entirely, usually resulting in a full table scan. If you're dealing with a couple of hundred or maybe a thousand rows, that's fine. In my line of work, a billion rows per table is quite common, and a query like that would literally take hours to execute. Not entirely ideal on the DBA side, just to save some cleanliness on the developer's side.
Edit: I should have used an earlier date for my tests, and maybe a second clause. A quick test with a second where clause (not included below) seems to show exactly the same plan is used both with and without the OR parts.
The largest tables I deal with are less than 1 million rows, so my pressures are definitely different to yours.
I did some tests in PostgreSQL, and I can't see any difference in the query plan that the database uses, whether I include ORs or not. I'm far from a database expert, so please let me know if I've misunderstood something.
There are two cases here: a date is given to be filtered on, or it isn't. And there are two approaches for each case -- using an OR, and not using one.
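Something like the following prepared statements set that up (a sketch; the exact statements are my assumption, as is the person table with a dob column):

-- dateplan: filters on date, no OR.
PREPARE dateplan (date) AS
	SELECT * FROM person WHERE dob > $1;

-- dateplan_all: no date filter at all.
PREPARE dateplan_all AS
	SELECT * FROM person;

-- dateplan_or: always includes the clause, OR-ing away a NULL parameter.
PREPARE dateplan_or (date) AS
	SELECT * FROM person WHERE ($1 IS NULL OR dob > $1);

-- Compare plans when filtering on a date, and when not:
EXPLAIN EXECUTE dateplan('1990-01-01');
EXPLAIN EXECUTE dateplan_or('1990-01-01');
EXPLAIN EXECUTE dateplan_all;
EXPLAIN EXECUTE dateplan_or(NULL);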
dateplan and dateplan_all are used for the cases where the date is filtered on or not, respectively, with no OR. dateplan_or includes the where clause whether we filter on the date or not.

The two plans that are actually used when filtering on a date, using dateplan_or and dateplan: [query plans not shown]

And the two plans for dateplan_or and dateplan_all when no date is given: [query plans not shown]

If I create an index, it gets used by both queries when filtering by date: [query plans not shown]
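The index statement itself would be something along these lines (the index name is my assumption):

-- Hypothetical index on the date-of-birth column:
CREATE INDEX person_dob_idx ON person (dob);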