Expression Parser : Part 2 - Define and implement a visitor for MongoDB

#javascript #typescript #parser #visitor

In the first part, we saw how to use Chevrotain to write a small parser. The post is available here.

To use the output of a parser, i.e. a syntax tree, we have several solutions. We can discard the interpreter, which is unsuitable in our case, and focus on either the Listener or the Visitor.

The main difference between the Listener and the Visitor is that the Listener will walk through the tree in one pass, node by node, from start to finish, triggering events related to the traversal, while the Visitor can decide when and how the nodes will be visited.

A Xml type language can be parsed with a Listener, as SAX parsers do. A language such as C# will have to go through a Visitor to allow further analysis and optimizations that will require to go through some nodes several times.

Defining the Visitor

Our goal is for our micro filtering language to be usable by multiple database providers, we need to start by defining the interfaces that represent it, in order to provide a model for the various implementations.

Each non-terminal node will be represented by a method. Each method will take a context object that will contain the specific data to understand and use them during the traversal.

andOrExp example

So let's try to define the andOrExp node. To start with, let's create a method to represent it.

  /**
   * [expression]--(AndOp | OrOp)?--[expression]?
   * @param ctx 
   */
  andOrExp:(ctx: AndOrExpNodeContext) => unknown;

The method should return an unknown type because we cannot define the method return type. It will be set depending on the database provider.

The AndOrExpNodeContext object should represent all data that allow us to interact with all tokens or non terminal nodes connected to this one.

export type AndOrExpNodeContext = CstChildrenDictionary & {
  lhs    : [ExpressionNode] ;
  rhs   ?: ExpressionNode[] ;
  AndOp ?: IToken[]         ;
  OrOp  ?: IToken[]         ;
}

The nodes and tokens available through the context will be represented as an array, because these elements can be defined several times. The node on the left can only be defined once, so it is typed as an array of a single element.

We need to do the same for each non terminal node. The definition will look like this :

export interface IFilterInterpretor {
  /**
   * [andOrExp]--[orderBy]?--[skip]?--[take]?
   * @param ctx ExpressionsContext
   */
  expressions: (ctx: ExpressionsContext) => unknown;

  /**
   * [expression]--(AndOp | OrOp)?--[expression]?
   * @param ctx 
   */
  andOrExp: (ctx: AndOrExpNodeContext) => unknown;

  /**
   * (OrderBy)--(Identifier)+--(Asc | Desc)+
   * @param ctx 
   */
  orderBy: (ctx: OrderByNodeContext) => unknown;

  /**
   * (Take)--(Integer)
   * @param ctx 
   */
  take: (ctx: TakeNodeContext) => unknown;

  /**
   * (Skip)--(Integer)
   * @param ctx 
   */
  skip: (ctx: SkipNodeContext) => unknown;

  /**
   * [compareRule] | [inExp] | [notInExp] | [parentAndOrExp]
   * @param ctx 
   */
  expression: (ctx: ExpressionNodeContext) => unknown;

  /**
   * (Identifier)--(EqOp | NotEqOp | GtOp | GteOp | LtOp | LteOp)?--[atomicExp]
   * @param ctx 
   */
  compareRule: (ctx: CompareRuleNodeContext) => unknown;

  /**
   * (Identifier)--(InOp)--[array]
   * @param ctx 
   */
  inExp: (ctx: InExpNodeContext) => unknown;

  /**
   * (Identifier)--(NotInOp)--[array]
   * @param ctx 
   */
  notInExp: (ctx: NotInExpNodeContext) => unknown;

  /**
   * (LParen)--[andOrExp]--(RParen)
   * @param ctx 
   */
  parentAndOrExp: (ctx: ParentAndOrExpNodeContext) => unknown;


  /**
   * (Integer) | (Float) | (String) | [dateExp]
   * @param ctx 
   */
  atomicExp: (ctx: AtomicExpNodeContext) => unknown;

  /**
   * (Dt)--(LCurly)--(String)--(RCurly)
   * @param ctx 
   */
  dateExp: (ctx: DateExpNodeContext) => unknown;

  /**
   * (LBraket)--[atomicExp]--(Comma)*--[atomicExp]*--(RBraket)
   * @param ctx 
   */
  array: (ctx: ArrayNodeContext) => unknown;
}

Implementing the visitor for MongoDB

We will see the strategy used to transform our initial filter into a MongoDB usable version. For this we need to implement a visitor based on the previous definition.

The global rule definition

We need to return the global filtering object as it is needed by MongoDB.

  expressions(ctx: Filter.ExpressionsContext)  {
    const query = ctx.andOrExp ? { "$query" : this.visit(ctx.andOrExp) } : {};

    return  {
      filter: query ,
      aggregate: [
        ctx.orderBy && this.visit(ctx.orderBy, true),
        ctx.skip    && this.visit(ctx.skip),
        ctx.take    && this.visit(ctx.take)
      ].filter(_ => _)
    } as ExpressionResult;
  }

As you can see, we focus only on what the current rule should do, and rely on the result returned by other nodes when necessary.

To get the result of an orderBy rule, for example, we just have to call the visit method with the orderBy context available in the current context. .filter(_ => _) is used to remove empty elements.

Returning the result as ExpressionResult type will allow the method to infer the result and force the unknown type to become an ExpressionResult type instead of an any type.

A more complex one, the andOrExp

  andOrExp(ctx: Filter.AndOrExpNodeContext) {
    let leftHandSide = this.visit(ctx.lhs);

    let opTokens = [] as IToken[];
    ctx.AndOp && opTokens.push(...ctx.AndOp);
    ctx.OrOp  && opTokens.push(...ctx.OrOp);

    let rightHandSide = [] as any[];

    if (ctx.rhs) {
      rightHandSide = ctx.rhs.map(_ => this.visit(_));
    }

    rightHandSide.unshift(leftHandSide);
    opTokens = opTokens.sort((a,b) => a.startOffset - b.startOffset);

    if (rightHandSide.length === 1) return rightHandSide.pop();
    let prev = rightHandSide.shift();

    opTokens.forEach(_ => {
      prev = { [`$${_.image}`] : [ prev, rightHandSide.shift() ] }
    });

    return prev;
  }

What makes it more complex ? The answer is simple, Chevrotain vitisor contexts are table based and not recursived. This means that if the current node has a many chained node, all occurences of the node are represented in an array at the same level.

So if in the current node we have this : ( XXX eq 10 and (YYY eq 20 or YYY eq 25)) and ZZZ eq 30 or ZZZ eq 35, how to properly handle all AND and all OR tokens ?

In our rule definition, AND and OR operators are alternatives, but declared as 2 arrays. And each right-hand expression that comes after an operator is provided in an expression type array too.

As we can have left and right expression, we need to sort everything in order to build the correct filter as a result.

expression nodes

Left and right expression rule is named lhs and rhs, for Left and Right hand side, but are of the same type. We know that the left expression is always defined, but not the right ones.

We can build an expression array to get all right expressions, and add the left one at the begining. This array will contain all expressions already sorted by default.

For the operators, we need to merge and sort all of them in one array too.

 let opTokens = [] as IToken[];
 ctx.AndOp && opTokens.push(...ctx.AndOp);
 ctx.OrOp  && opTokens.push(...ctx.OrOp);
 /* ... */
 opTokens = opTokens.sort((a,b) => a.startOffset - b.startOffset);

Now that all operators and expressions are sorted, we can process all operators from the operator array, and we will find the corresponding expression at the same index in the expression array.

The final class looks like this :


export class MongoDBFilterVisitor extends BaseCstVisitor  implements IFilterInterpretor {
  constructor() {
    super();
    this.validateVisitor();
  }

  expressions(ctx: Filter.ExpressionsContext)  {
    const query = ctx.andOrExp ? { "$query" : this.visit(ctx.andOrExp) } : {};

    return  {
      filter: query ,
      aggregate: [
        ctx.orderBy && this.visit(ctx.orderBy, true),
        ctx.skip    && this.visit(ctx.skip),
        ctx.take    && this.visit(ctx.take)
      ].filter(_ => _)
    } as ExpressionResult;
  }

  andOrExp(ctx: Filter.AndOrExpNodeContext) {
    let leftHandSide = this.visit(ctx.lhs);

    let opTokens = [] as IToken[];
    ctx.AndOp && opTokens.push(...ctx.AndOp);
    ctx.OrOp  && opTokens.push(...ctx.OrOp);

    let rightHandSide = [] as any[];

    if (ctx.rhs) {
      rightHandSide = ctx.rhs.map(_ => this.visit(_));
    }

    rightHandSide.unshift(leftHandSide);
    opTokens = opTokens.sort((a,b) => a.startOffset - b.startOffset);

    if (rightHandSide.length === 1) return rightHandSide.pop();
    let prev = rightHandSide.shift();

    opTokens.forEach(_ => {
      prev = { [`$${_.image}`] : [ prev, rightHandSide.shift() ] }
    });

    return prev;
  }

  orderBy(ctx: Filter.OrderByNodeContext, shouldAggregate: boolean = false) { 
    const ids = ctx.Identifier.sort((a,b) => a.startOffset - b.startOffset);
    const dirs = [...ctx?.Asc ?? [], ...ctx?.Desc ?? []].sort((a,b) => a.startOffset - b.startOffset);

    const items = {} as any;
    ids.forEach((_, i) => {
      items[_.image] = dirs[i].image === "asc" ? 1 : -1;
    });

    return { [shouldAggregate ? "$sort" : "$orderby"]: items };
  }

  take(ctx: Filter.TakeNodeContext) { 
    return { "$limit": Number(ctx.Integer[0].image) };
  }

  skip(ctx: Filter.SkipNodeContext) { 
    return { "$skip": Number(ctx.Integer[0].image) };
  }

  expression(ctx: Filter.ExpressionNodeContext) { 
    if (ctx.compareRule) return this.visit(ctx.compareRule);
    if (ctx.inExp)       return this.visit(ctx.inExp);
    if (ctx.notInExp)    return this.visit(ctx.notInExp);
    return this.visit(ctx.parentAndOrExp);
  }

  compareRule(ctx: Filter.CompareRuleNodeContext) { 
      const cmp = {} as any;
      let cmpOp = "";

      if (ctx.EqOp)     cmpOp = "$eq";
      if (ctx.NotEqOp)  cmpOp = "$ne";
      if (ctx.GtOp)     cmpOp = "$gt";
      if (ctx.GteOp)    cmpOp = "$gte";
      if (ctx.LtOp)     cmpOp = "$lt";
      if (ctx.LteOp)    cmpOp = "$lte";

      cmp[ctx.Identifier[0].image] = {
        [cmpOp]: ctx.Identifier[0].image === "id" ? new MongoDB.ObjectID(this.visit(ctx.atomicExp)) : this.visit(ctx.atomicExp)
      };

      return cmp;
  }

  inExp(ctx: Filter.InExpNodeContext) { 
    return {
      [ctx.Identifier[0].image] : {
        "$in": this.visit(ctx.array, ctx.Identifier[0].image === "id")
      }
    }
  }

  notInExp(ctx: Filter.NotInExpNodeContext) { 
    return {
      [ctx.Identifier[0].image] : {
        "$nin": this.visit(ctx.array)
      }
    }
  }

  parentAndOrExp(ctx: Filter.ParentAndOrExpNodeContext) { 
    return this.visit(ctx.andOrExp);
  }

  atomicExp(ctx: Filter.AtomicExpNodeContext) { 
    if (ctx.Float)   return Number(ctx.Float[0].image); 
    if (ctx.Integer) return Number(ctx.Integer[0].image); 
    if (ctx.String)  return ctx.String[0].image.slice(1, ctx.String[0].image.length - 1); 
    if (ctx.dateExp) return this.visit(ctx.dateExp); 
  }

  dateExp(ctx: Filter.DateExpNodeContext) { 
    return Date.parse(ctx.String[0].image.slice(1, ctx.String[0].image.length - 1));
  }

  array(ctx: Filter.ArrayNodeContext, convertToId: boolean = false) { 
    const res = ctx.atomicExp.map(_ => this.visit(_));
    return convertToId ? res.map(_ => new MongoDB.ObjectID(_)) : res;
  }          

}

Conclusion

We have seen how to implement our visitor to provide something that can be processed by MongoDB. Following this, we can imagine to implement the same for SQLite or MySql (MariaDB)...

Enjoy !