DEV Community

DealerOn Dev

Alternative view engine for ASP.NET Core: Parsing template syntax

elfalem ・ 6 min read

In my last post, we looked at how we can hook into the ASP.NET Core architecture to create a custom view engine. Although we had a working view engine, it was quite rudimentary. One of the limitations mentioned was that there is only one variable named Message that we can bind to. In this post, we'll look at how we can use as many variables as we need for a given template. Be sure to check out the last post if you haven't done so already as it will make it easier to follow this one.


As it currently stands, our view rendering logic is using direct text replacement to look for {{Message}}:

var processedOutput = template.Replace("{{Message}}", context.ViewData["Message"]?.ToString());
return context.Writer.WriteAsync(processedOutput);

In order to build a more robust view engine, we will need to more intelligently analyze the template syntax by tokenizing and parsing it.

Tokenization and Parsing

Tokenization (or lexical analysis) is the process by which we identify tokens from the view template. Tokens are the specific components that are the building blocks of our view template syntax. Suppose that we're tokenizing the following arithmetic expression:

3 * -4 + 55

The specific tokens would be: 3, *, -, 4, +, 55. We can also assign categories to tokens. In this case, 3, 4, and 55 would be operands or numbers and *, - and + would be operators.
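The idea can be sketched in a few lines of C#. This is a minimal, hand-rolled tokenizer for the arithmetic example only; the name ArithmeticTokenizer and the tuple shape are illustrative and not part of the view engine we're building:

```csharp
using System;
using System.Collections.Generic;

public static class ArithmeticTokenizer
{
    public static List<(string Category, string Value)> Tokenize(string input)
    {
        var tokens = new List<(string, string)>();
        for (var i = 0; i < input.Length; i++)
        {
            var c = input[i];
            if (char.IsWhiteSpace(c)) continue;            // skip spaces
            if (char.IsDigit(c))
            {
                // consume consecutive digits as one number token
                var start = i;
                while (i + 1 < input.Length && char.IsDigit(input[i + 1])) i++;
                tokens.Add(("number", input.Substring(start, i - start + 1)));
            }
            else
            {
                tokens.Add(("operator", c.ToString()));    // *, -, +
            }
        }
        return tokens;
    }
}

// ArithmeticTokenizer.Tokenize("3 * -4 + 55") yields:
// number "3", operator "*", operator "-", number "4", operator "+", number "55"
```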

After tokenization, the next part of the process is parsing. Parsing takes the tokens and generates a data structure (such as a parse tree or abstract syntax tree) that represents the syntactic structure. Parsing the above arithmetic expression would result in a tree as follows:

     +
    / \
   *  55
  / \
 3   -
     |
     4

Having a structure like the above tree allows us to see the relationship between tokens so that we can process it appropriately whether that is evaluating an arithmetic expression or rendering a view from a template.
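To make that concrete, here is a minimal sketch of evaluating such a tree. The node types (Num, Neg, BinOp) are illustrative stand-ins and not part of the view engine code:

```csharp
using System;

// One record type per kind of tree node
public abstract record Expr;
public record Num(int Value) : Expr;
public record Neg(Expr Operand) : Expr;
public record BinOp(char Op, Expr Left, Expr Right) : Expr;

public static class Evaluator
{
    // Recursively evaluate the tree bottom-up
    public static int Eval(Expr e) => e switch
    {
        Num n => n.Value,
        Neg neg => -Eval(neg.Operand),
        BinOp b when b.Op == '+' => Eval(b.Left) + Eval(b.Right),
        BinOp b when b.Op == '*' => Eval(b.Left) * Eval(b.Right),
        _ => throw new ArgumentException("unknown node")
    };
}

// The tree for 3 * -4 + 55:
// Evaluator.Eval(new BinOp('+',
//     new BinOp('*', new Num(3), new Neg(new Num(4))),
//     new Num(55)))
// evaluates to -12 + 55 = 43
```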

Looking at our mustache syntax template, we can tokenize and parse it in a similar fashion.

Original template:

Your Message: {{Message}}

The tokens from that template are: Your Message: (string), {{ (open braces), Message (string), }} (close braces).

In terms of parsing, since our mustache syntax is not quite hierarchical, a tree structure is not ideal. Instead we can use a simple list structure with two types of elements: text (that we render directly) and expression (that we treat as a variable and replace with a value). Here it is represented as a table:

------------------------------
| Type       | Value         |
------------------------------
| text       | Your Message: |
| expression | {{Message}}   |
------------------------------

Implementing a parser

Now that we have some idea of how we can tokenize and parse our template syntax, we'll turn our attention to how we can actually implement it in code. Depending on the template syntax that you're using, it may be possible to use an existing parser. For example, if the syntax is similar to HTML, you can perhaps use the Html Agility Pack. Alternatively, if it's mustache or Liquid syntax, you can use the Fluid parser. If there is no existing parser for the syntax you are using, then you would have to create one. Although we could probably use the Fluid parser for our case, we're going to build our own.

Creating a parser can be a complex process, but there are tools and libraries to facilitate it. One such library is Superpower, a parser construction toolkit. Superpower simplifies the process of creating tokenizers and parsers. The first step is to install the NuGet package into our app from the terminal:

dotnet add package Superpower

Once installed, create a new file Stache/StacheParser.cs. In addition to the StacheParser class, I'm going to include other small classes in the same file. Feel free to create separate files for these if that makes better organizational sense to you. You can find the full contents in this GitHub gist.

We'll first create an enum to represent our token categories.

public enum Tokens
{
  String,
  OpenBraces,
  CloseBraces
}

Next, we'll create a tokenizer in the StacheParser class.

public static Tokenizer<Tokens> Tokenizer  = new TokenizerBuilder<Tokens>()
  .Match(Span.EqualTo("{{"), Tokens.OpenBraces)
  .Match(Span.EqualTo("}}"), Tokens.CloseBraces)
  .Match(Span.MatchedBy(Character.ExceptIn('{','}')).Many(), Tokens.String)
  .Build();

Basically we read the input and attempt to match a specific set of characters that map to a token. The Match statements will be tried in order until a match is found. Note that the above tokenizer will run into issues if there are single braces present in the template (e.g. {{foo{bar}}). We will not address this limitation in this post as it involves creating more advanced tokenizers as opposed to using the building blocks provided by Superpower.
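Before wiring the tokenizer into the view engine, we can sanity-check it from a unit test or a scratch console app. This sketch assumes the Tokenizer field defined above; in Superpower, each Token<TKind> exposes its category via Kind and the matched text via ToStringValue():

```csharp
var tokens = StacheParser.Tokenizer.Tokenize("Your Message: {{Message}}");

foreach (var token in tokens)
{
    Console.WriteLine($"{token.Kind}: {token.ToStringValue()}");
}

// Expected output:
// String: Your Message: 
// OpenBraces: {{
// String: Message
// CloseBraces: }}
```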

The next step is to create a parser. Let's create a class called ParseNode to represent parsed elements. It will be an abstract class that three other classes derive from: TextNode for literal text, ExpressionNode for a mustache expression that we'll need to replace, and DocumentNode for a top-level structure that holds a list of the other two node types.

public abstract class ParseNode 
{
}

public class TextNode : ParseNode
{
  public string Value { get; set; }
}

public class ExpressionNode : ParseNode
{
  public string Value { get; set; }
}

public class DocumentNode : ParseNode
{
  public List<ParseNode> Nodes { get; set; }
}

Superpower is a parser combinator so it allows us to combine simpler parsers to create a more complex one. In our case we will create a simple parser for text literals in the StacheParser class.

private readonly static TokenListParser<Tokens, ParseNode> LiteralParser =
  from str in Token.EqualTo(Tokens.String)
  select (ParseNode)new TextNode {
    Value = str.ToStringValue()
  };

The above parser consumes a token of type String and returns a TextNode.

We will create another parser for mustache expressions.

private readonly static TokenListParser<Tokens, ParseNode> ExpressionParser = 
  from ob in Token.EqualTo(Tokens.OpenBraces)
  from str in Token.EqualTo(Tokens.String)
  from cb in Token.EqualTo(Tokens.CloseBraces)
  select (ParseNode)new ExpressionNode {
    Value = str.ToStringValue()
  };

This parser will consume three tokens in order: an open-braces token, followed by a string, followed by a close-braces token. It returns an ExpressionNode whose value comes from the String token, since that represents the variable we need to replace.

Putting it all together, we have the combined parser.

public readonly static TokenListParser<Tokens, ParseNode> MainParser =
  from nodes in ExpressionParser.Or(LiteralParser).Many()
  select (ParseNode)new DocumentNode {
    Nodes = nodes.ToList()
  };

The combined parser looks for tokens to match either the expression parser or the literal parser. The Many() tells it to match as many times as possible so that we consume all the tokens and process the entire template. It returns a DocumentNode with a list of the nodes returned from the other two parsers. We now have our tokenizer and parser that we can use to render views!
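As a quick check (again, outside the view engine itself), we can run a small template through both stages and inspect the node types that come back:

```csharp
var tokens = StacheParser.Tokenizer.Tokenize("Hi {{name}}!");
var result = StacheParser.MainParser.TryParse(tokens);

if (result.HasValue)
{
    var document = (DocumentNode)result.Value;
    foreach (var node in document.Nodes)
    {
        // Prints: TextNode, ExpressionNode, TextNode
        Console.WriteLine(node.GetType().Name);
    }
}
```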

Using our parser

In StacheView.cs, we can replace the contents of RenderAsync() with the following:

var template = File.ReadAllText(Path);

var tokens = StacheParser.Tokenizer.Tokenize(template);
var parsedResult = StacheParser.MainParser.TryParse(tokens);

var processedOutput = new StringBuilder();

if(parsedResult.HasValue){
  var document = (DocumentNode)parsedResult.Value;

  foreach(var node in document.Nodes){
    switch(node){
      case TextNode textNode:
        processedOutput.Append(textNode.Value);
      break;
      case ExpressionNode expNode:
        processedOutput.Append(context.ViewData[expNode.Value]?.ToString());              
      break;
    }
  }
}else{
  throw new Exception(parsedResult.ErrorMessage);
}

return context.Writer.WriteAsync(processedOutput.ToString());

In the above code, we run the template through the tokenizer. The resulting tokens are then parsed giving us a DocumentNode structure. We then process each parsed node and directly write text nodes or find variable values in ViewData for expression nodes.

We are now able to use any variable in our view template and bind to it in our controller.

For example, we can change our template to:

<h2>{{header}}</h2>

<p>{{content}}</p>

<small>{{footer}}</small>

and the Bar action in HomeController.cs to:

public IActionResult Bar(){
    ViewData["header"] = "Hello World!";
    ViewData["content"] = "Greetings from a mustache template.";
    ViewData["footer"] = "Powered by ASP.NET Core.";

    return View();
}

Then run the app (dotnet run) and navigate to https://localhost:5001/Home/Bar; you should see:

Hello World!

Greetings from a mustache template.
Powered by ASP.NET Core.

Summary

In this post we explored the process of building a more flexible view engine that can tokenize and parse template syntax. We've now gotten rid of one of the limitations mentioned in the first post, and using the Superpower library we can design a template syntax that suits our needs. Stay tuned for the next post, in which we'll explore supporting inline C# expressions such as {{1 + 1}}.

Cover photo by Paul Skorupskas via Unsplash
