DEV Community

Robert Ozimek
Robert Ozimek

Posted on

Parsing Filter Expressions in NestJS with a Context-Free Grammar

Flat query param filtering breaks down fast. ?status=active&age_gte=18 works until someone needs (status=active OR status=pending) AND age>=18. At that point you're either inventing your own encoding convention or telling the client it can't do that.

The underlying issue is that boolean logic is recursive. AND and OR nest arbitrarily, and flat key-value params have no way to express that structure.

What a Grammar Actually Is

A Context-Free Grammar (CFG) defines a language through production rules. Each rule describes how a named concept (non-terminal) expands into tokens (terminals) or other non-terminals.

The arithmetic grammar is the classic example:

expression → term ( ("+" | "-") term )*
term       → factor ( ("*" | "/") factor )*
factor     → NUMBER | "(" expression ")"
Enter fullscreen mode Exit fullscreen mode

Operator precedence isn't hardcoded, it falls out of the nesting. term sits inside expression, so * binds tighter than + automatically. factor → "(" expression ")" lets expressions nest to any depth. The grammar describes the full language in a handful of rules, and a parser derived from it handles every valid input correctly by construction.

That last point is the main reason to use a grammar over ad-hoc parsing code. You're not handling cases, you're defining a structure that makes invalid inputs unrepresentable.

Defining the Tokens

Before you can parse, you need a lexer, something that turns a raw string into a stream of tokens. In nestjs-filter-grammar this is done with Chevrotain, a parser building toolkit for TypeScript.

Tokens are defined with createToken. The order they appear in matters. Chevrotain matches greedily, so multi-character operators need to come before their single-character prefixes:

import { createToken, Lexer } from 'chevrotain';

// Multi-char operators first
export const GreaterEqual = createToken({ name: 'GreaterEqual', pattern: />=/ });
export const LessEqual    = createToken({ name: 'LessEqual',    pattern: /<=/ });
export const NotEqual     = createToken({ name: 'NotEqual',     pattern: /!=/ });
export const IContains    = createToken({ name: 'IContains',    pattern: /\*~/ });
export const Contains     = createToken({ name: 'Contains',     pattern: /\*=/ });
export const IStartsWith  = createToken({ name: 'IStartsWith',  pattern: /\^~/ });
export const StartsWith   = createToken({ name: 'StartsWith',   pattern: /\^=/ });
export const IEndsWith    = createToken({ name: 'IEndsWith',    pattern: /\$~/ });
export const EndsWith     = createToken({ name: 'EndsWith',     pattern: /\$=/ });
export const INotEqual    = createToken({ name: 'INotEqual',    pattern: /!~/ });

// Single-char operators after
export const GreaterThan  = createToken({ name: 'GreaterThan',  pattern: />/ });
export const LessThan     = createToken({ name: 'LessThan',     pattern: /</ });
export const IEquals      = createToken({ name: 'IEquals',      pattern: /~/ });
export const Equals       = createToken({ name: 'Equals',       pattern: /=/ });

// Structural tokens
export const Semicolon    = createToken({ name: 'Semicolon',    pattern: /;/ });  // AND
export const Pipe         = createToken({ name: 'Pipe',         pattern: /\|/ }); // OR
export const Comma        = createToken({ name: 'Comma',        pattern: /,/ });
export const LParen       = createToken({ name: 'LParen',       pattern: /\(/ });
export const RParen       = createToken({ name: 'RParen',       pattern: /\)/ });

// Literals
export const StringLiteral = createToken({
  name: 'StringLiteral',
  pattern: /"(?:[^"\\]|\\.)*"/,
});

export const Token = createToken({
  name: 'Token',
  pattern: /[^,"|;!=><^$\*~()\s]+/,
});

// NullLiteral uses longer_alt so "nullable" still matches Token
export const NullLiteral = createToken({
  name: 'NullLiteral',
  pattern: /null/,
  longer_alt: Token,
});

export const WhiteSpace = createToken({
  name: 'WhiteSpace',
  pattern: /\s+/,
  group: Lexer.SKIPPED,
});
Enter fullscreen mode Exit fullscreen mode

The longer_alt on NullLiteral is worth noting. Without it, a field named nullable would partially match as null followed by able. Chevrotain's longer_alt says: only match this token if there isn't a longer match available for the alternative token.

The Filter Grammar

With tokens defined, the grammar itself describes how those tokens compose into valid filter expressions:

filter     → or_expr
or_expr    → and_expr ( "|" and_expr )*
and_expr   → atom ( ";" atom )*
atom       → condition | "(" or_expr ")"
condition  → field operator values
values     → value ( "," value )*
value      → Token | StringLiteral | NullLiteral
Enter fullscreen mode Exit fullscreen mode

AND binds tighter than OR for the same reason * binds tighter than + in arithmetic. and_expr is nested inside or_expr. The atom → "(" or_expr ")" rule is where arbitrary grouping comes from. Comma-separated values in values naturally handles IN queries without any special casing.

Some valid inputs from this grammar:

status=active
status=active;age>=18
status=active|status=pending
(status=active|status=pending);age>=18
status=active,pending
name="John Doe"
name*~john
deletedAt=null
Enter fullscreen mode Exit fullscreen mode

Recursive Descent

The standard way to implement a parser from a CFG like this is recursive descent, where each non-terminal becomes a function, the structure of the functions mirrors the grammar, and mutual recursion is what handles nesting. Chevrotain generates this parser from the grammar rules you define.

When the parser hits (, it recurses back up to or_expr. The call stack is the implicit parse stack. This is why recursive descent parsers map so naturally to CFGs. The language's recursive structure and the code's recursive structure are the same thing.

The output is a parse tree (AST):

(status=active|status=pending);age>=18

AND
├── OR
│   ├── condition: status = active
│   └── condition: status = pending
└── condition: age >= 18
Enter fullscreen mode Exit fullscreen mode

This tree is what gets handed to the ORM adapter. Translating it to a Prisma where or TypeORM QueryBuilder is a recursive walk. and nodes become AND: [...], or nodes become OR: [...], leaf conditions become field predicates. The parser and the ORM adapter don't know about each other; the tree is the contract between them.

Building This for NestJS

This is the approach behind nestjs-filter-grammar. The packages are:

The library adds two things on top of the raw grammar: field validation and ORM adapters.

Field validation is necessary because a raw parser accepts any field name and operator. You need to reject requests that filter on undeclared fields, use operators the field doesn't support, or pass values that can't be coerced to the right type. This is handled with decorators:

@Filterable()
class UserQuery {
  @FilterableColumn([FilterOperator.eq, FilterOperator.iContains])
  @SortableColumn()
  name!: string;

  @FilterableColumn([FilterOperator.eq, FilterOperator.neq], { type: Status })
  @SortableColumn()
  status!: Status;

  @FilterableColumn([FilterOperator.gte, FilterOperator.lte], { type: 'number' })
  @SortableColumn()
  age!: number;
}
Enter fullscreen mode Exit fullscreen mode

You're explicitly declaring which fields are filterable and which operators each one accepts. After parsing, each condition node in the tree is checked against this schema before anything reaches the ORM.

The controller side uses a @Filter() decorator that parses the query string and runs validation:

@Controller('users')
export class UsersController {
  @Get()
  findAll(@Filter(UserQuery) { filter, sort, query }: FilterResult) {
    // filter is a validated parse tree, or undefined
    // sort is a parsed list of sort entries, or undefined
  }
}
Enter fullscreen mode Exit fullscreen mode

ORM adapters take the validated tree and produce queries. For Prisma:

import { applyFilter, applySort } from '@nestjs-filter-grammar/prisma';

const users = await prisma.user.findMany({
  where:   filter ? applyFilter(filter) : undefined,
  orderBy: sort   ? applySort(sort)     : undefined,
});
Enter fullscreen mode Exit fullscreen mode

Same pattern for TypeORM and MikroORM, different adapter package, same interface.

The Client Query Builder

The grammar being formal means it can be exposed to clients in a type-safe way. @nestjs-filter-grammar/core automatically enriches your OpenAPI spec with x-filter-grammar extensions when @nestjs/swagger is installed. These extensions describe exactly which fields are filterable, which operators each supports, and what types they expect.

@nestjs-filter-grammar/client-query-builder reads those extensions and generates typed filter builders:

import { and, or, sort } from '@nestjs-filter-grammar/client-query-builder';
import { FilterUsers, SortUsers } from './generated';

const filter = and(
  or(
    FilterUsers.status.eq('active'),
    FilterUsers.status.eq('pending'),
  ),
  FilterUsers.age.gte(18),
).build();
// → "(status=active|status=pending);age>=18"

const sortStr = sort(SortUsers.name.asc(), SortUsers.age.desc()).build();
// → "+name,-age"

fetch(`/api/users?filter=${filter}&sort=${sortStr}`);
Enter fullscreen mode Exit fullscreen mode

The codegen step reads your OpenAPI spec:

npx filter-grammar generate ./openapi.json -o ./src/generated
Enter fullscreen mode Exit fullscreen mode

Orval is one example of an OpenAPI client generator you can pair this with. Orval generates your API client hooks (React Query, SWR, Axios etc.) and filter-grammar generate generates the typed filter builders from the same spec. But any OpenAPI client generator works; the filter builder generation is a separate step that only needs the spec. The point is that the filter language your server accepts gets surfaced to the client with full type coverage, without maintaining a separate contract.

The Sort Grammar

Sorting is a simpler case, no boolean logic needed, just a list of fields with direction:

sort       → sort_entry ( "," sort_entry )*
sort_entry → direction? field
direction  → "+" | "-"
Enter fullscreen mode Exit fullscreen mode

sort=+name,-createdAt means name ascending, createdAt descending. + is the default so it can be omitted. This is a linear scan with no recursion.


GitHub: robertozimek/nestjs-filter-grammar

Top comments (0)