Suyash Thakur

Posted on Apr 16

How We Built an Expression Engine Inside Image Templates

#api #webdev #javascript #saas

I'm Suyash, solo founder of Pictify. What started as a side project for GIF templating has slowly turned into a full infrastructure layer for dynamic image generation. I build the whole thing alone: backend, frontend, editor, renderer, and apparently now, a custom expression engine.

I want to tell you the story of the engineering problem that almost made me rethink the entire product, and the ~1,000-line solution that turned Pictify from a rendering utility into something genuinely useful.

The Ticket That Started Everything

Last year, a customer on our Slack channel sent this message:

"Hey, love the API. Quick question: can I show a 'PRO' badge on the card only when the user is on a paid plan? Right now I'm maintaining two separate templates and picking one in my code."

I said no. At the time, Pictify's templates were exactly what every image generation API offers: a canvas with variable placeholders. You mark a text element as {{ name }}, send { "name": "Sarah" } in the API call, and the renderer swaps it in. Find and replace. Done.

That customer was maintaining two templates that were identical except for a tiny badge. But here's the thing. Their next question was worse:

"Also, can I change the background color based on a score? Green for 80+, yellow for 50-79, red for below 50?"

Now they needed six templates. Two badge states × three colors. And they could see where this was going. Every new condition multiplied the template count. Their "simple image API" was turning into a combinatorial nightmare managed entirely in application code.

We heard this same pattern from five other customers that month. The template system was the bottleneck, and the bottleneck was that templates had no logic.

The Template Explosion Problem

Let me make this concrete, because it's the core of why we built the expression engine.

First, the world every image API lives in. Your backend code picks a template based on conditions, passes variables, and gets an image. Two badge states × three colors = six templates. Add "show testimonial only if available" and you're at twelve. Add "different CTA for mobile vs desktop" and you're at twenty-four.

Then, what happens when the template itself can evaluate conditions. One template. Any number of rules. Zero code changes. The API caller sends data; the template decides what to show.

This is the product insight that drove the whole project: the people managing templates (marketers, product managers, designers) shouldn't need to file engineering tickets to add personalization logic. The logic should live inside the template.
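To put numbers on the explosion: when every combination gets its own template, the count is just the product of the variants each condition introduces. A quick sketch (the function name and input shape are mine, purely for illustration):

```javascript
// Number of templates needed when every combination of variants
// gets its own template. `variantCounts` is a hypothetical input:
// one entry per condition, holding how many variants it introduces.
function templateCount(variantCounts) {
  return variantCounts.reduce((total, count) => total * count, 1);
}

console.log(templateCount([2, 3]));       // 6:  badge on/off × three colors
console.log(templateCount([2, 3, 2]));    // 12: + testimonial present/absent
console.log(templateCount([2, 3, 2, 2])); // 24: + mobile/desktop CTA
```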

Why We Couldn't Use Existing Tools

Before building anything, we spent a week evaluating existing solutions. We needed something that could:

Evaluate boolean expressions safely (user-supplied input, no code execution)
Work on canvas object properties (not HTML strings)
Be readable by non-engineers (no JSON blobs)

Here's what we found:

eval() / new Function() were dead on arrival. Users supply expressions. A malicious user could access process.env, require('child_process'), or worse. Even with sandboxing attempts, the attack surface is enormous.

Handlebars, Nunjucks, and Liquid are string template languages. They're designed to interpolate variables into HTML. Our templates aren't strings. They're FabricJS canvas objects: a JSON tree of rectangles, text nodes, images, and groups. We needed to evaluate expressions on object properties (should this rectangle be visible? what color should it be?), not render HTML.

JSON Logic / json-rules-engine had the right concept, but the syntax is hostile. {"and": [{">=": [{"var":"score"}, 80]}, {"var": "isPremium"}]} versus score >= 80 && isPremium. The people building templates in our visual editor aren't writing JSON DSLs. They're typing conditions into a text field.

So we built our own. A tokenizer + recursive descent parser that runs in a sandboxed context. No eval, no prototype access, no arbitrary code execution. About 1,000 lines including all 45 built-in functions.

The Architecture: Three Layers

Before diving into the expression engine itself, here's where it fits in the rendering pipeline:

Four steps, running sequentially:

API request arrives with a template ID and variables ({ name: "Sarah", plan: "pro", score: 92 })
Data Mapper extracts and transforms data using JSONPath ($.user.name → "Sarah") with fallback defaults
Expression Engine processes the template: evaluates conditionals, expands loops, interpolates text, applies filters
Canvas Renderer takes the processed FabricJS JSON, renders it with node-canvas, and uploads to S3/CDN

The expression engine adds ~2-5ms to total render time. The bottleneck is always canvas rendering (50-150ms). So we had room to add significant logic without hurting performance.

Building the Tokenizer

The first piece: turning a string like score >= 80 && isPremium into a stream of typed tokens.

The tokenizer does a single character-by-character pass over the expression. There is no regex matching across the whole string; the only patterns are single-character class checks. It recognizes:

Identifiers: variable names like score, isPremium, user.name
Operators: ==, !=, >, <, >=, <=
Logical operators: &&, ||, !, and, or, not
Literals: numbers, strings (single/double quoted), true, false, null
Special characters: (, ), ., ,, |, [, ]

The critical design choice: identifiers resolve against a variable context object, not the JavaScript scope. When the evaluator sees score, it doesn't look for a JavaScript variable called score. It does context["score"]. This is what makes the system safe. The expression can only read values you explicitly pass in.

// This is all the tokenizer does for identifiers:
if (/[a-zA-Z_$]/.test(expr[i])) {
  let value = '';
  while (i < expr.length && /[a-zA-Z0-9_$]/.test(expr[i])) {
    value += expr[i];
    i++;
  }
  // Check for keywords (true, false, null, and, or, not)
  // Otherwise: it's an IDENTIFIER token
  tokens.push({ type: TOKEN_TYPES.IDENTIFIER, value });
}

If someone tries require('fs'), the tokenizer sees require as an IDENTIFIER. The evaluator looks it up in the context, finds undefined, and returns false. No code execution happens.
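For a fuller picture, here's a minimal sketch of a tokenizer along these lines. It is not our production code: the token type names beyond IDENTIFIER, the KEYWORD grouping, and the error messages are assumptions made for this example.

```javascript
// Minimal tokenizer sketch. Token type names other than IDENTIFIER
// are illustrative assumptions, not the real engine's names.
const TOKEN_TYPES = {
  IDENTIFIER: 'IDENTIFIER', KEYWORD: 'KEYWORD', NUMBER: 'NUMBER',
  STRING: 'STRING', OPERATOR: 'OPERATOR', LOGICAL: 'LOGICAL', PUNCT: 'PUNCT',
};
const KEYWORDS = new Set(['true', 'false', 'null', 'and', 'or', 'not']);
const TWO_CHAR = {
  '==': 'OPERATOR', '!=': 'OPERATOR', '>=': 'OPERATOR', '<=': 'OPERATOR',
  '&&': 'LOGICAL', '||': 'LOGICAL',
};
const ONE_CHAR = { '>': 'OPERATOR', '<': 'OPERATOR', '!': 'LOGICAL' };

function tokenize(expr) {
  const tokens = [];
  let i = 0;
  while (i < expr.length) {
    const ch = expr[i];
    if (ch === ' ' || ch === '\t' || ch === '\n') { i++; continue; }

    // Identifiers and keywords: letters, digits, _ and $
    if (/[a-zA-Z_$]/.test(ch)) {
      let value = '';
      while (i < expr.length && /[a-zA-Z0-9_$]/.test(expr[i])) value += expr[i++];
      const type = KEYWORDS.has(value) ? TOKEN_TYPES.KEYWORD : TOKEN_TYPES.IDENTIFIER;
      tokens.push({ type, value });
      continue;
    }
    // Numbers: digits with an optional decimal part (no validation beyond that)
    if (/[0-9]/.test(ch)) {
      let value = '';
      while (i < expr.length && /[0-9.]/.test(expr[i])) value += expr[i++];
      tokens.push({ type: TOKEN_TYPES.NUMBER, value: Number(value) });
      continue;
    }
    // Strings: single or double quoted, no escape handling in this sketch
    if (ch === '"' || ch === "'") {
      let value = '';
      i++; // skip opening quote
      while (i < expr.length && expr[i] !== ch) value += expr[i++];
      if (i >= expr.length) throw new Error('Unterminated string literal');
      i++; // skip closing quote
      tokens.push({ type: TOKEN_TYPES.STRING, value });
      continue;
    }
    // Two-character operators are checked before one-character ones,
    // so "||" is not read as two pipe characters
    const two = expr.slice(i, i + 2);
    if (TWO_CHAR[two]) { tokens.push({ type: TWO_CHAR[two], value: two }); i += 2; continue; }
    if (ONE_CHAR[ch]) { tokens.push({ type: ONE_CHAR[ch], value: ch }); i++; continue; }
    if ('().,|[]'.includes(ch)) { tokens.push({ type: TOKEN_TYPES.PUNCT, value: ch }); i++; continue; }

    throw new Error(`Unexpected character "${ch}" at position ${i}`);
  }
  return tokens;
}

console.log(tokenize("score >= 80 && isPremium").map((t) => t.type));
```

Note that nothing here knows what require means. It comes out as a plain IDENTIFIER token followed by punctuation and a string.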

The Recursive Descent Parser

The evaluator implements a standard recursive descent parser with operator precedence:

parseLogicalOr    →  ||, or     (lowest precedence)
  parseLogicalAnd →  &&, and
    parseNot      →  !, not
      parseComparison → ==, !=, >, <, >=, <=
        parsePrimary  → literals, identifiers, function calls, parens
                        (highest precedence)

Each level calls the next one down, which gives us correct evaluation order automatically. score > 80 && isPremium || isAdmin parses as (score > 80 && isPremium) || isAdmin. No special precedence-climbing code needed.
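To make the precedence ladder concrete, here's a minimal sketch of a direct-evaluating recursive descent parser with the same five levels. It's an illustration under assumptions, not our engine: the lex helper is a compact regex-based stand-in used only to keep the example short, function calls and dotted paths are left out, and this version always parses the right-hand operand so the token cursor advances.

```javascript
// Stand-in lexer (regex-based for brevity only). Tokens are raw strings.
function lex(expr) {
  const re = /\s*(&&|\|\||==|!=|>=|<=|[()<>!]|\d+(?:\.\d+)?|'[^']*'|"[^"]*"|[A-Za-z_$][\w$]*)/y;
  const source = expr.trimEnd();
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    re.lastIndex = pos;
    const m = re.exec(source);
    if (!m) throw new Error(`Unexpected input at position ${pos}`);
    tokens.push(m[1]);
    pos = re.lastIndex;
  }
  return tokens;
}

function evaluate(expr, context) {
  const tokens = lex(expr);
  let i = 0;
  const peek = () => tokens[i];
  const next = () => tokens[i++];

  function parseLogicalOr() {            // lowest precedence
    let left = parseLogicalAnd();
    while (peek() === '||' || peek() === 'or') {
      next();
      const right = parseLogicalAnd();
      left = left || right;
    }
    return left;
  }
  function parseLogicalAnd() {
    let left = parseNot();
    while (peek() === '&&' || peek() === 'and') {
      next();
      const right = parseNot();
      left = left && right;
    }
    return left;
  }
  function parseNot() {
    if (peek() === '!' || peek() === 'not') { next(); return !parseNot(); }
    return parseComparison();
  }
  function parseComparison() {
    const left = parsePrimary();
    const op = peek();
    if (!['==', '!=', '>', '<', '>=', '<='].includes(op)) return left;
    next();
    const right = parsePrimary();
    switch (op) {
      case '==': return left == right;
      case '!=': return left != right;
      case '>':  return left > right;
      case '<':  return left < right;
      case '>=': return left >= right;
      default:   return left <= right;
    }
  }
  function parsePrimary() {              // highest precedence
    const tok = next();
    if (tok === undefined) throw new Error('Unexpected end of expression');
    if (tok === '(') {
      const value = parseLogicalOr();
      if (next() !== ')') throw new Error('Expected ")"');
      return value;
    }
    if (tok === 'true') return true;
    if (tok === 'false') return false;
    if (tok === 'null') return null;
    if (/^\d/.test(tok)) return Number(tok);
    if (tok[0] === "'" || tok[0] === '"') return tok.slice(1, -1);
    if (!/^[A-Za-z_$]/.test(tok)) throw new Error(`Unexpected token "${tok}"`);
    // Identifier: read own properties of the context only, never the prototype chain
    return Object.prototype.hasOwnProperty.call(context, tok) ? context[tok] : undefined;
  }

  const result = parseLogicalOr();
  if (i < tokens.length) throw new Error(`Unexpected token "${peek()}"`);
  return result;
}

// && binds tighter than ||, so isAdmin alone is enough here
console.log(evaluate('score > 80 && isPremium || isAdmin',
  { score: 50, isPremium: false, isAdmin: true })); // true
```

If || bound tighter than &&, the example above would come out false, because score > 80 fails. The nesting of the functions is the whole precedence table.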

Here's the heart of the comparison evaluator:

parseComparison() {
  let left = this.parsePrimary();

  if (this.peek()?.type === TOKEN_TYPES.OPERATOR) {
    const operator = this.consume().value;
    const right = this.parsePrimary();

    switch (operator) {
      case '==': return left === right || left == right;
      case '!=': return left !== right && left != right;
      case '>':  return left > right;
      case '<':  return left < right;
      case '>=': return left >= right;
      case '<=': return left <= right;
    }
  }
  return left;
}

We intentionally use both strict and loose equality for ==, so "80" == 80 is true. Template builders don't think about type coercion, and API payloads sometimes send numbers as strings. This "do what I mean" behavior prevents a whole class of support tickets.
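To see what this buys you, and where it can surprise you, here's a small sketch that applies the same check to a few pairs. The helper name looseEquals is just for illustration. Since any pair that is strictly equal is also loosely equal, the combined check behaves the same as == on its own.

```javascript
// Same check as the '==' case in parseComparison (helper name is illustrative)
const looseEquals = (left, right) => left === right || left == right;

const cases = [
  ['"80" vs 80', looseEquals('80', 80)],                     // true: string coerced to number
  ['"1" vs true', looseEquals('1', true)],                   // true: both coerce to 1
  ['null vs undefined', looseEquals(null, undefined)],       // true
  ['"" vs 0', looseEquals('', 0)],                           // true: empty string coerces to 0
  ['null vs 0', looseEquals(null, 0)],                       // false: null only equals undefined
  ['"abc" vs 0', looseEquals('abc', 0)],                     // false: "abc" coerces to NaN
];

for (const [label, result] of cases) {
  console.log(`${label} -> ${result}`);
}
```

The empty-string case is the one worth remembering when a payload sends "" for a missing number.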

Text Interpolation and Filters

Inside text elements, we support double-brace syntax with pipe filters:

Hello, {{ name }}!
Price: {{ amount | currency }}
{{ title | uppercase | truncate: 40 }}

The interpolation engine finds all {{ ... }} blocks, splits on | to separate the expression from filters, evaluates the expression, then pipes the result through each filter.
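Here's a minimal sketch of that flow. It makes several assumptions: the three filters are simplified stand-ins for the real ones, the expression step is reduced to a plain variable lookup instead of the full evaluator, and splitting on every | means an expression containing || would need smarter handling than this.

```javascript
// Hypothetical mini filter registry (the real engine ships ~45 filters).
// A Map avoids accidentally resolving names like "constructor".
const FILTERS = new Map([
  ['uppercase', (value) => String(value).toUpperCase()],
  // Assumed semantics: cut to `max` characters, then append "..."
  ['truncate', (value, max) => {
    const text = String(value);
    return text.length > max ? text.slice(0, max) + '...' : text;
  }],
  ['default', (value, fallback) => (value === null || value === undefined ? fallback : value)],
]);

// Filter arguments arrive as text: numbers become numbers, quotes are stripped
function parseFilterArg(raw) {
  const trimmed = raw.trim();
  if (/^-?\d+(\.\d+)?$/.test(trimmed)) return Number(trimmed);
  return trimmed.replace(/^['"]|['"]$/g, '');
}

function interpolate(text, context) {
  return text.replace(/\{\{\s*(.+?)\s*\}\}/g, (match, inner) => {
    const [expression, ...filterParts] = inner.split('|').map((s) => s.trim());

    // Stand-in for evaluateExpression: own-property lookup only
    let value = Object.prototype.hasOwnProperty.call(context, expression)
      ? context[expression]
      : undefined;

    // Pipe the value through each filter, left to right
    for (const part of filterParts) {
      const colon = part.indexOf(':');
      const name = colon === -1 ? part : part.slice(0, colon).trim();
      const args = colon === -1 ? [] : part.slice(colon + 1).split(',').map(parseFilterArg);
      const filter = FILTERS.get(name);
      if (!filter) throw new Error(`Unknown filter "${name}"`);
      value = filter(value, ...args);
    }
    return value === null || value === undefined ? '' : String(value);
  });
}

console.log(interpolate('Hello, {{ name }}!', { name: 'Sarah' }));
// Hello, Sarah!
console.log(interpolate('{{ title | uppercase | truncate: 5 }}', { title: 'expression engine' }));
// EXPRE...
```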

We ship ~45 built-in filter functions:

// String
uppercase("sarah")         → "SARAH"
titleCase("new york")      → "New York"
truncate(longText, 50)     → "The quick brown fox..."

// Numbers
currency(29.99, "USD")     → "$29.99"
percent(0.85)              → "85%"
round(4.678, 1)            → 4.7

// Dates
date("2026-04-16", "short") → "Apr 16, 2026"

// Arrays
join(["a","b","c"], ", ")  → "a, b, c"
first(items)               → items[0]

// Logic
default(value, "N/A")      → "N/A" if value is null
isEmpty(text)              → true if "", null, or []

We also support inline conditionals inside text. Useful for things like "You have {{ if count == 1 }}1 item{{ else }}{{ count }} items{{ endif }}":

const conditionalRegex = 
  /\{\{\s*if\s+(.+?)\s*\}\}(.*?)(?:\{\{\s*else\s*\}\}(.*?))?\{\{\s*endif\s*\}\}/gs;

result = result.replace(conditionalRegex, (match, condition, truePart, falsePart) => {
  const conditionResult = evaluateExpression(condition.trim(), context);
  return conditionResult ? (truePart || '') : (falsePart || '');
});

This is intentionally not a full template language. No loops in text, no partials, no extends. Text interpolation handles the "put data in a string" case. Structural logic (showing/hiding whole elements, repeating them) happens at the object level. Clean separation.

Conditional Visibility: `showWhen` / `hideWhen`

This is where the expression engine becomes genuinely powerful for image generation. Every canvas object (text, rectangle, image, group) can have a showWhen or hideWhen expression:

{
  "type": "i-text",
  "text": "PRO",
  "showWhen": "plan == 'pro' || plan == 'enterprise'",
  "fill": "#FFD700",
  "left": 400,
  "top": 20
}

At render time, the engine evaluates the expression against the provided variables. If the condition is falsy, the entire object is removed from the canvas before rendering. Not hidden with CSS, not set to opacity 0. Removed from the FabricJS object tree entirely. The rendered image has no trace of it.

function processObjectWithContext(obj, context) {
  if (obj.showWhen) {
    const shouldShow = evaluateExpression(obj.showWhen, context);
    if (!shouldShow) return null; // Object is gone
  }

  if (obj.hideWhen) {
    const shouldHide = evaluateExpression(obj.hideWhen, context);
    if (shouldHide) return null;
  }

  // ... process text interpolation, variable bindings, etc.
  return obj;
}

Remember that customer with six templates? Now they have one. The PRO badge has showWhen: "plan == 'pro'". The green background has showWhen: "score >= 80". The yellow one has showWhen: "score >= 50 && score < 80". The red one has showWhen: "score < 50". Three overlapping colored rectangles, each with a condition. The right one survives; the rest are removed before rendering.

It works recursively too. When a group's showWhen is falsy, the group is removed along with all of its children. This means you can build complex conditional sections: a "premium features" panel that's a group containing a badge, a description, and a border, all controlled by one showWhen: "isPremium" on the group.
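A minimal sketch of the recursion, assuming children live in an objects array (as in FabricJS's serialized groups). The pruneObjects name is made up for this example, and evaluateExpression here is a deliberately tiny stand-in that treats the expression as a single variable name.

```javascript
// Stand-in evaluator for this sketch only: the "expression" is one variable name
const evaluateExpression = (expr, context) => Boolean(context[expr]);

// Returns a pruned copy of the object tree; removed objects leave no trace
function pruneObjects(objects, context) {
  const kept = [];
  for (const obj of objects) {
    if (obj.showWhen && !evaluateExpression(obj.showWhen, context)) continue;
    if (obj.hideWhen && evaluateExpression(obj.hideWhen, context)) continue;

    const copy = { ...obj };
    // Groups: recurse into children. A removed group never gets this far,
    // so its children are dropped without being evaluated at all.
    if (Array.isArray(obj.objects)) {
      copy.objects = pruneObjects(obj.objects, context);
    }
    kept.push(copy);
  }
  return kept;
}

const canvasObjects = [
  { type: 'rect', fill: '#ffffff' },
  {
    type: 'group',
    showWhen: 'isPremium',
    objects: [{ type: 'i-text', text: 'PRO' }, { type: 'rect', stroke: '#FFD700' }],
  },
  { type: 'i-text', text: 'Upgrade today', hideWhen: 'isPremium' },
];

console.log(pruneObjects(canvasObjects, { isPremium: false }).map((o) => o.type));
// [ 'rect', 'i-text' ]
console.log(pruneObjects(canvasObjects, { isPremium: true }).map((o) => o.type));
// [ 'rect', 'group' ]
```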

Loops: Repeating Elements

For elements that need to repeat (leaderboard rows, testimonial cards, product grids), we support loop expansion directly on canvas objects:

You define one template object with loopVariable pointing at an array. The engine clones it for each item, adjusts positions based on direction and spacing, and gives each clone its own context:

loopItems.forEach((item, index) => {
  const clonedObj = structuredClone(obj);

  const loopContext = {
    ...variables,
    [itemName]: item,        // current item
    [indexName]: index,      // 0, 1, 2...
    __loopFirst: index === 0,
    __loopLast: index === loopItems.length - 1,
    __loopLength: loopItems.length,
  };

  const processedObj = processObjectWithContext(clonedObj, loopContext);

  // processObjectWithContext returns null when showWhen/hideWhen removes
  // the object, so skip this iteration instead of positioning null
  if (!processedObj) return;

  // Position based on layout direction
  if (obj.loopDirection === 'vertical') {
    processedObj.top = (obj.top || 0) + (index * (obj.loopSpacing || 50));
  } else if (obj.loopDirection === 'horizontal') {
    // Assumed to mirror the vertical case along the x-axis
    processedObj.left = (obj.left || 0) + (index * (obj.loopSpacing || 50));
  } else if (obj.loopDirection === 'grid') {
    const cols = obj.loopColumns || 3;
    processedObj.left = (obj.left || 0) + ((index % cols) * obj.loopSpacingX);
    processedObj.top = (obj.top || 0) + (Math.floor(index / cols) * obj.loopSpacingY);
  }

  processedObj.id = `${obj.id}_loop_${index}`;
  processedObjects.push(processedObj);
});

Three layout modes: vertical (stack down), horizontal (lay out right), and grid (wrap into rows with configurable columns). Inside each iteration, {{ item.name }} and {{ if __loopLast }} just work because the loop context is merged with the parent variables.

If the array is empty, the object disappears entirely. No "No items to display" placeholder unless you explicitly add one with showWhen: "length(items) == 0".

Expression-Driven Properties

Beyond show/hide, expressions can drive visual properties directly:

{
  "type": "rect",
  "fillExpression": "score >= 80 ? '#22c55e' : '#ef4444'",
  "opacityExpression": "confidence / 100",
  "backgroundColorExpression": "isHighlighted ? '#ffc480' : 'transparent'"
}

The engine evaluates each *Expression property and applies the result:

if (obj.fillExpression) {
  const fillResult = evaluateExpression(obj.fillExpression, context);
  if (fillResult) obj.fill = String(fillResult);
}

if (obj.opacityExpression) {
  const opacityResult = evaluateExpression(obj.opacityExpression, context);
  if (typeof opacityResult === 'number') {
    obj.opacity = Math.max(0, Math.min(1, opacityResult));
  }
}

This means a single rectangle can change color, opacity, and visibility based on data, without duplicating objects on the canvas.

Context Variables: Personalization Without Data Sources

Here's my favorite part. For our Smart Links feature, we needed personalization that doesn't come from the API caller. It comes from the viewer.

When someone clicks a smart link, we resolve variables from the HTTP request itself:

The context variables system extracts geo-IP data, user-agent info, and timezone calculations from the viewer's request:

function buildContextVariables(requestContext) {
  const { device, geo, time, browser, referrer } = requestContext;

  return {
    'ctx.country': geo.country || '',
    'ctx.country_name': COUNTRY_NAMES[geo.country] || '',
    'ctx.city': geo.city || '',
    'ctx.device': device.type || 'desktop',
    'ctx.time_of_day': getTimeOfDay(time.hour),
    'ctx.greeting': getGreeting(time.hour),
    'ctx.referrer': referrer.domain || '',
  };
}
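The getTimeOfDay and getGreeting helpers aren't shown here. A minimal sketch of what they could look like follows; the hour boundaries and the greeting strings are assumptions for this example, and hour is taken to be the viewer's local hour from 0 to 23.

```javascript
// Hypothetical helper: bucket a local hour (0-23) into a time of day.
// The boundaries below are assumptions, not the production values.
function getTimeOfDay(hour) {
  if (hour >= 5 && hour < 12) return 'morning';
  if (hour >= 12 && hour < 17) return 'afternoon';
  if (hour >= 17 && hour < 21) return 'evening';
  return 'night';
}

// Hypothetical helper: map the bucket to a greeting string
function getGreeting(hour) {
  const greetings = {
    morning: 'Good morning',
    afternoon: 'Good afternoon',
    evening: 'Good evening',
    night: 'Good evening', // assumption: no separate late-night greeting
  };
  return greetings[getTimeOfDay(hour)];
}

console.log(`${getGreeting(9)}, San Francisco`); // Good morning, San Francisco
```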

Combined with showWhen, one template can serve a different image to every viewer:

"Good morning, San Francisco" on a share card, without the sender knowing the viewer's location
Mobile-specific CTAs with showWhen: "ctx.device == 'mobile'"
Localized content with showWhen: "ctx.country == 'DE'" showing German text
Referrer-aware messaging: different CTA when clicked from Twitter vs. email

The viewer has no idea they're looking at a dynamically generated image. It looks like a static PNG. But it was rendered on-the-fly with their context, in ~200ms, and cached on the CDN.

The Data Mapper: JSONPath to Variables

Before expressions even run, the data mapper transforms messy API responses into clean template variables. We built a custom JSONPath implementation (no external dependency) that handles the shapes we see in practice:

// Simple paths
$.product.name            → "Pictify Pro"

// Array access
$.items[0].price          → 29
$.items[*]                → [all items]

// Fallback syntax
$.stats.revenue || 0      → 0 if field is missing
$.user.bio || "No bio"    → "No bio" if null/undefined/empty

The || fallback is built into the mapper, not the expression engine. This means data source failures degrade gracefully. If an external API returns null for a field, the template gets the default value instead of rendering undefined as text.

const applyFallback = (value, expression, defaults = {}) => {
  if (value !== undefined && value !== null && value !== '') {
    return value;
  }

  if (expression.includes('||')) {
    const [, fallbackPart] = expression.split('||').map(s => s.trim());
    try { return JSON.parse(fallbackPart); }
    catch { return fallbackPart.replace(/^['"]|['"]$/g, ''); }
  }

  return value;
};

This one function eliminates an entire category of broken renders. Before we added it, every time an external data source had a missing field, the image rendered with "undefined" as literal text. Now it fails gracefully.
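The path resolution itself isn't shown above. Here's a minimal sketch of a resolver for the three path shapes in the examples (dotted properties, [n] index access, and [*]). The resolvePath name and its error messages are made up for this example, and it expects the || fallback part to have been split off already.

```javascript
// Minimal JSONPath-like resolver sketch (not a full JSONPath implementation).
// Supports: $.a.b, $.items[0], $.items[*], and combinations of those.
function resolvePath(data, path) {
  if (!path.startsWith('$')) throw new Error('Path must start with "$"');

  // Split "$.items[0].price" into segments: ["items", "0", "price"]
  const segmentPattern = /\.([A-Za-z_$][\w$]*)|\[(\d+|\*)\]/g;
  const segments = [];
  let consumed = 1; // position just after "$"
  let m;
  while ((m = segmentPattern.exec(path)) !== null) {
    if (m.index !== consumed) break; // gap means unsupported syntax
    segments.push(m[1] !== undefined ? m[1] : m[2]);
    consumed = segmentPattern.lastIndex;
  }
  if (consumed !== path.length) {
    throw new Error(`Unsupported path syntax at position ${consumed}`);
  }

  // Walk the data, tracking a list of current values so [*] can fan out
  let current = [data];
  let usedWildcard = false;
  for (const segment of segments) {
    if (segment === '*') {
      usedWildcard = true;
      current = current.flatMap((value) => (Array.isArray(value) ? value : []));
    } else {
      current = current.map((value) =>
        value !== null && value !== undefined &&
        Object.prototype.hasOwnProperty.call(value, segment)
          ? value[segment]
          : undefined
      );
    }
  }
  // A wildcard path yields an array; any other path yields a single value
  return usedWildcard ? current : current[0];
}

const payload = {
  product: { name: 'Pictify Pro' },
  items: [{ price: 29 }, { price: 49 }],
};

console.log(resolvePath(payload, '$.product.name'));   // Pictify Pro
console.log(resolvePath(payload, '$.items[0].price')); // 29
console.log(resolvePath(payload, '$.items[*].price')); // [ 29, 49 ]
console.log(resolvePath(payload, '$.stats.revenue'));  // undefined
```

A missing field resolves to undefined rather than throwing, which is exactly the value applyFallback is designed to catch.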

Performance

The expression engine adds 2-5ms to a typical template with 10-20 objects and a few conditions. Here's where time actually goes:

Phase	Time	What it does
Data mapping	~1ms	JSONPath extraction + fallbacks
Expression engine	~2-5ms	Conditionals, loops, text interpolation
Font registration	~10-30ms	Load custom fonts from S3
Canvas render	~50-150ms	FabricJS → node-canvas → PNG
S3 upload	~20-50ms	Fire-and-forget (non-blocking)
Total	~80-200ms

Design choices that keep the expression engine fast:

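No regex for parsing. The tokenizer is a single character-by-character while loop. The only patterns are single-character class tests such as /[a-zA-Z_$]/, so there are no match() calls over the whole string and no backtracking.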
No AST construction. The recursive descent parser evaluates directly. No intermediate tree allocation. The result of parseLogicalOr is the final boolean, not a node.
structuredClone() for loops. Faster than custom deep-clone for our object shapes.
Short-circuit evaluation. && and || skip the right side when unnecessary.

What We'd Do Differently

Ternary expressions. We don't support condition ? valueA : valueB in the expression language. Users work around it with three overlapping objects, each with a showWhen. It works but creates canvas clutter. If we were starting over, we'd add ternaries from day one.

A match function. Five overlapping objects for five states is ugly. Something like match(status, "active", "#22c55e", "pending", "#eab308", "error", "#ef4444") would cut object count significantly. This is on our roadmap.
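Since this isn't built yet, the following is only a sketch of how such a function could behave. It assumes value/result pairs compared with the same loose equality the engine uses for ==, plus an optional trailing default, which is my addition for the example.

```javascript
// Hypothetical match(): returns the result paired with the first matching case.
// Arguments after `value` are case/result pairs; an odd trailing argument
// is treated as the default (an assumption of this sketch).
function match(value, ...pairs) {
  const hasDefault = pairs.length % 2 === 1;
  const pairEnd = hasDefault ? pairs.length - 1 : pairs.length;

  for (let i = 0; i < pairEnd; i += 2) {
    // Loose equality, so "1" matches 1 like the engine's == operator
    if (pairs[i] == value) return pairs[i + 1];
  }
  return hasDefault ? pairs[pairs.length - 1] : undefined;
}

console.log(match('pending', 'active', '#22c55e', 'pending', '#eab308', 'error', '#ef4444'));
// #eab308
console.log(match('archived', 'active', '#22c55e', 'error', '#ef4444', '#9ca3af'));
// #9ca3af (falls through to the default)
```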

Custom functions per template. Power users want formatters like formatPhoneNumber or calculateDiscount. We currently ship a fixed function set. Sandboxed user-defined functions are the next big expression engine feature.

The Takeaway

An expression engine for image templates is a surprisingly small project. Ours is ~1,000 lines including all 45 built-in functions. But it completely changes what a "template" can do.

Without it, every conditional personalization is an API-side if/else that selects a different template. The template count grows exponentially with the number of conditions.

With it, one template handles every variant. The person managing the template adds a condition in a text field. No code changes. No deploys. No engineering tickets.

If you're building any kind of template-to-image system (OG cards, certificates, share cards, reports), the expression layer is the thing that turns it from a rendering utility into a product. It's the difference between "we have an image API" and "we have a programmable image engine."

Don't skip it.

I'm building Pictify, an image generation API where templates have logic, not just variables. If your team is drowning in template variants or fighting Puppeteer servers, take a look.
