Sasa

Posted on May 19

How to detect and protect ESP tokens across 5 different template syntaxes

#email #webdev #python #javascript

When you're building a multilingual email workflow, one problem shows up immediately: every major email service provider uses a different syntax for personalization tokens. And every one of them will break silently if a translator touches the wrong characters.

This is the problem we solved building Transendly — a localization workspace for HTML email campaigns. Here's what we learned about token detection across the five most common ESP syntaxes.

The five syntaxes you'll encounter

1. Handlebars — SendGrid, Postmark

SendGrid Dynamic Templates and Postmark both use Handlebars-compatible syntax:

{{first_name}}
{{#if customer.plan}}
  You're on the {{customer.plan}} plan.
{{/if}}
{{#each items}}
  {{this.name}} — {{this.price}}
{{/each}}
{{{unescaped_html}}}

Key things to detect:

Double-stache {{variable}} — simple interpolation
Triple-stache {{{variable}}} — unescaped HTML output (Postmark uses this for {{{pm:unsubscribe}}})
Block helpers: {{#if}}...{{/if}}, {{#each}}...{{/each}}
Nested paths: {{customer.first_name}}

The triple-stache is a common failure point. Translators working in raw HTML often "fix" the extra brace because it looks like a typo.
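One way to catch a "fixed" triple-stache is to check that every token opens and closes with the same number of braces. The following is a minimal sketch, not part of any ESP's tooling; the helper name `find_unbalanced_staches` is hypothetical.

```python
import re

# A run of 2+ opening braces, a brace-free body, then a run of 2+ closing braces.
STACHE = re.compile(r'(\{{2,})([^{}]*)(\}{2,})')

def find_unbalanced_staches(html: str) -> list[str]:
    """Return tokens whose opening and closing brace counts differ."""
    return [
        m.group(0)
        for m in STACHE.finditer(html)
        if len(m.group(1)) != len(m.group(3))
    ]

print(find_unbalanced_staches("{{{pm:unsubscribe}} and {{first_name}}"))
```

Running this check on the translated file flags a token such as `{{{pm:unsubscribe}}` where one closing brace was removed, while leaving well-formed double and triple staches alone.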

2. Django template tags — Klaviyo

Klaviyo uses Django-style syntax, which adds one thing Handlebars lacks: filters applied to a variable with a pipe.

{{ first_name }}
{{ first_name|default:"there" }}
{{ order_total|currency }}
{{ description|truncatewords:20 }}
{% if person.plan == "vip" %}
  Exclusive content here.
{% endif %}

The |filter syntax is the dangerous part. A translator who sees {{ first_name|default:"there" }} will sometimes translate "there" — which is actually correct — but will also sometimes translate the filter name default or the variable name first_name. Both break the template.

Django-style tags are conventionally written with spaces inside the braces ({{ variable }} rather than {{variable}}), but both forms show up in real templates, so your regex needs to handle both variants.
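To see which parts of a Django-style token are code and which are candidate text, it can help to split the token into its variable, filter names, and filter arguments. This is a rough sketch under stated assumptions: it is not a full Django parser, it assumes no "|" appears inside a quoted argument, and `parse_django_token` is a hypothetical name.

```python
import re

# Accepts both {{ variable }} and {{variable}} thanks to the optional whitespace.
TOKEN = re.compile(r'\{\{\s*(?P<expr>.+?)\s*\}\}')
# A filter name with an optional argument: quoted string, number, or dotted name.
FILTER = re.compile(r'(?P<name>\w+)(?::(?P<arg>"[^"]*"|\'[^\']*\'|[\w.]+))?')

def parse_django_token(token: str) -> dict:
    """Split a variable token into its variable name and (filter, argument) pairs."""
    m = TOKEN.fullmatch(token.strip())
    if m is None:
        raise ValueError(f"Not a variable token: {token!r}")
    # Naive split: assumes no "|" inside quoted arguments.
    variable, *raw_filters = [part.strip() for part in m.group("expr").split("|")]
    filters = []
    for raw in raw_filters:
        fm = FILTER.fullmatch(raw)
        if fm is None:
            raise ValueError(f"Unrecognized filter: {raw!r}")
        filters.append((fm.group("name"), fm.group("arg")))
    return {"variable": variable, "filters": filters}

print(parse_django_token('{{ first_name|default:"there" }}'))
```

Only the quoted argument in the result is ever a candidate for translation; the variable and filter names must come back unchanged.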

3. Liquid — Shopify Email, some ActiveCampaign flows

Liquid is used in Shopify's email templates and some other platforms:

{{ customer.first_name }}
{{ customer.email | upcase }}
{% if customer.orders_count > 1 %}
  Thanks for being a returning customer.
{% endif %}
{% for item in order.line_items %}
  {{ item.title }}: {{ item.price | money }}
{% endfor %}

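Liquid looks similar to Django. The most visible difference is filter spacing; block tags and dot notation look the same in both, so these are the conventions to detect: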

Filter syntax uses | with a space on both sides: {{ value | filter }}
Block tags use {% %} with the tag name: {% if %}, {% for %}
Object access uses dot notation: customer.first_name

The {% %} blocks are particularly problematic because translators sometimes interpret them as HTML comments or unknown tags and delete them.
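A deleted block tag usually leaves its partner behind, so one cheap check is to confirm that opening and closing tags still pair up after translation. The sketch below uses a stack and covers only a few common Liquid block tags; the tag table and the name `check_block_pairs` are assumptions for illustration.

```python
import re

# Captures the tag name from {% name ... %}, with optional whitespace-control dashes.
TAG = re.compile(r'\{%-?\s*(\w+)[^%]*?-?%\}')
# Assumed subset of block tags; extend for the tags your templates use.
OPENERS = {"if": "endif", "unless": "endunless", "for": "endfor", "case": "endcase"}
CLOSERS = set(OPENERS.values())

def check_block_pairs(template: str) -> list[str]:
    """Report block tags that were opened but not closed, or closed but not opened."""
    errors = []
    stack = []
    for m in TAG.finditer(template):
        name = m.group(1)
        if name in OPENERS:
            stack.append(name)
        elif name in CLOSERS:
            if stack and OPENERS[stack[-1]] == name:
                stack.pop()
            else:
                errors.append(f"Unexpected {{% {name} %}}")
    errors.extend(f"Unclosed {{% {name} %}}" for name in stack)
    return errors

print(check_block_pairs("{% if customer.orders_count > 1 %}Thanks for returning."))
```

Tags such as `else` and `elsif` are ignored here because they neither open nor close a block.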

4. Merge tags — Mailchimp, Constant Contact

Mailchimp uses a completely different pattern — asterisk-pipe delimiters:

*|FNAME|*
*|LNAME|*
*|EMAIL|*
*|UNSUB|*
*|MC:SUBJECT|*
*|IF:FNAME|* Hello *|FNAME|*, *|ELSE:|* Hello there, *|END:IF|*

This syntax stands out visually, which is good — translators usually recognize it as "code". But the conditional blocks (*|IF:...|*, *|ELSE:|*, *|END:IF|*) are often mishandled because they look more like markup than the simpler variable tags.

5. Percent-delimited — ActiveCampaign, some legacy ESPs

ActiveCampaign uses percent signs as delimiters:

%FIRSTNAME%
%LASTNAME%
%EMAIL%
%UNSUBSCRIBELINK%
%CUSTOM_FIELD_NAME%

This is the simplest syntax to detect but also the easiest to accidentally break. The uppercase convention helps — translators rarely translate uppercase strings — but %UNSUBSCRIBELINK% occasionally gets "translated" to %LIENDEDESABONNEMENT% in French workflows.

Detection approach

For each syntax, you need a regex that:

Matches the full token including delimiters
Handles nested or block structures
Avoids false positives on similar-looking content

Here's a starting point for each:

import re

PATTERNS = {
    # Handlebars: {{variable}}, {{{variable}}}, {{#helper arg}}...{{/helper}}
    # The name class allows ":" so {{{pm:unsubscribe}}} matches, and the
    # argument class allows "." so {{#if customer.plan}} matches.
    "handlebars": re.compile(
        r'\{{2,3}[#/^]?\s*[\w.:]+(?:\s+[\w"\'=\s,.]+)?\s*\}{2,3}'
    ),

    # Django: {{ variable }}, {{ variable|filter }}, {% tag %}
    # Comparison operators (<, >, !) are allowed inside {% %} tags.
    "django": re.compile(
        r'\{%[-\s]*\w[\w\s"\'=,.|:()<>!]*[-\s]*%\}|\{\{[-\s]*[\w.|:()"\' ]+[-\s]*\}\}'
    ),

    # Liquid: {{ variable | filter }}, {% tag %}, with optional {%- -%} dashes
    "liquid": re.compile(
        r'\{%-?\s*[\w\s"\'=,.|:()<>!\-]+\s*-?%\}|\{\{-?\s*[\w.|:()"\' ]+\s*-?\}\}'
    ),

    # Mailchimp merge tags: *|TAG|*, *|IF:TAG|*...*|END:IF|*
    "mailchimp": re.compile(
        r'\*\|[A-Z0-9_:]+\|\*'
    ),

    # Percent-delimited: %VARIABLE%
    "percent": re.compile(
        r'%[A-Z][A-Z0-9_]+%'
    ),
}

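def detect_esp(html: str) -> str | None:
    """Detect which ESP syntax is present in an HTML template."""
    # Score each syntax by how many tokens its pattern finds.
    scores = {name: len(pattern.findall(html)) for name, pattern in PATTERNS.items()}

    if not any(scores.values()):
        return None

    # The brace-based patterns overlap (a bare {{name}} matches all three),
    # so on a tie max() returns whichever syntax comes first in PATTERNS.
    return max(scores, key=scores.get)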

def extract_tokens(html: str, esp: str) -> list[str]:
    """Extract all tokens from an HTML template for a given ESP."""
    if esp not in PATTERNS:
        raise ValueError(f"Unknown ESP: {esp}")
    return PATTERNS[esp].findall(html)
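Because Handlebars, Django, and Liquid all use double braces for plain variables, match counts alone can tie. One possible tie-breaker is to look for markers that only one family uses. The sketch below is a heuristic, not a guaranteed classifier; the function name and the marker choices are assumptions, and it does not try to tell Django from Liquid.

```python
import re
from typing import Optional

BLOCK_HELPER = re.compile(r'\{\{\s*[#/]\s*\w+')    # {{#if ...}}, {{/each}}
TRIPLE_STACHE = re.compile(r'\{\{\{[^{}]+\}\}\}')  # {{{raw_html}}}
PERCENT_TAG = re.compile(r'\{%-?\s*\w+')           # {% if ... %}, {%- for ... %}

def classify_brace_syntax(html: str) -> Optional[str]:
    """Guess the brace-based family from markers that only one family uses."""
    if BLOCK_HELPER.search(html) or TRIPLE_STACHE.search(html):
        return "handlebars"
    if PERCENT_TAG.search(html):
        return "django_or_liquid"
    # Only bare {{variable}} tokens (or none): the syntax is ambiguous.
    return None

print(classify_brace_syntax("{{#if customer.plan}}Hi{{/if}}"))
```

When the result is None, the ESP usually has to come from configuration rather than from the template itself.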

The protection strategy

Once you've detected and extracted tokens, the challenge is keeping them intact through translation while allowing surrounding text to be modified.

The approach we use:

1. Extract and replace with placeholders

def protect_tokens(html: str, esp: str) -> tuple[str, dict]:
    """Replace tokens with stable placeholders. Returns protected HTML and token map."""
    if esp not in PATTERNS:
        raise ValueError(f"Unknown ESP: {esp}")
    tokens = {}

    def to_placeholder(match: re.Match) -> str:
        # Use unusual characters unlikely to appear in translations
        placeholder = f"⟦T{len(tokens)}⟧"
        tokens[placeholder] = match.group(0)
        return placeholder

    # A single sub() pass swaps each match at its own position, so repeated
    # tokens each get their own placeholder in document order.
    protected = PATTERNS[esp].sub(to_placeholder, html)
    return protected, tokens

def restore_tokens(translated: str, token_map: dict) -> str:
    """Restore original tokens after translation."""
    restored = translated
    for placeholder, original in token_map.items():
        restored = restored.replace(placeholder, original)
    return restored
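Before restoring, it can be worth confirming that the translator's output still contains every placeholder exactly once, since a deleted or duplicated placeholder is easier to report at this stage than after the tokens are back. A minimal sketch, assuming the ⟦T0⟧-style placeholder format used above; `check_placeholders` is a hypothetical helper.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r'⟦T\d+⟧')

def check_placeholders(translated: str, token_map: dict) -> list[str]:
    """Report placeholders that are missing, duplicated, or unknown."""
    found = Counter(PLACEHOLDER.findall(translated))
    errors = []
    for placeholder in token_map:
        count = found[placeholder]
        if count != 1:
            errors.append(f"{placeholder} appears {count} times, expected 1")
    for unknown in sorted(set(found) - set(token_map)):
        errors.append(f"{unknown} is not in the token map")
    return errors

token_map = {"⟦T0⟧": "{{first_name}}", "⟦T1⟧": "{{customer.plan}}"}
print(check_placeholders("Bonjour ⟦T0⟧", token_map))
```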

2. Validate after restoration

from collections import Counter

def validate_tokens(original: str, restored: str, esp: str) -> list[str]:
    """Check that restored HTML has the same tokens, the same number of times."""
    # Counters rather than sets, so dropping one of two identical tokens is caught.
    original_tokens = Counter(extract_tokens(original, esp))
    restored_tokens = Counter(extract_tokens(restored, esp))

    # Counter subtraction keeps only positive counts.
    missing = original_tokens - restored_tokens
    added = restored_tokens - original_tokens

    errors = []
    if missing:
        errors.append(f"Missing tokens after translation: {dict(missing)}")
    if added:
        errors.append(f"Unexpected tokens after translation: {dict(added)}")

    return errors

Edge cases that will burn you

Translated filter arguments in Django/Liquid

# Source:
{{ first_name|default:"there" }}

# After translation (broken):
{{ first_name|default:"là" }}   # French translator translated "there"

The default fallback string is technically translatable content — but it's inside a token. You need to decide: protect the entire token including the argument, or extract just the argument for translation. We protect the entire token and handle fallback strings separately.
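One way to handle fallback strings separately is to pull the quoted default argument out as its own translation unit and write the translated value back afterwards. This is a sketch of that idea, not a description of any particular product's implementation; it assumes double-quoted arguments and translations that contain no double quotes, and both function names are hypothetical.

```python
import re

# Matches the quoted argument of a default filter, e.g. |default:"there"
DEFAULT_ARG = re.compile(r'(\|\s*default\s*:\s*)"([^"]*)"')

def extract_fallbacks(template: str) -> list[str]:
    """Collect fallback strings so they can be translated on their own."""
    return [m.group(2) for m in DEFAULT_ARG.finditer(template)]

def replace_fallbacks(template: str, translations: dict) -> str:
    """Write translated fallbacks back, leaving untranslated ones unchanged."""
    def swap(m: re.Match) -> str:
        translated = translations.get(m.group(2), m.group(2))
        return f'{m.group(1)}"{translated}"'
    return DEFAULT_ARG.sub(swap, template)

source = 'Hi {{ first_name|default:"there" }}!'
print(extract_fallbacks(source))
print(replace_fallbacks(source, {"there": "là"}))
```

The variable and filter names never reach the translator in this flow; only the extracted strings do.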

RTL languages and bidirectional tokens

Arabic and Hebrew email templates render right-to-left, but token syntax is always LTR. Browsers handle this with dir attributes and Unicode bidi control characters. If you're using a WYSIWYG translation interface, you need to ensure the token placeholders don't inherit RTL direction and render backwards.
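One option for plain-text editing surfaces is to wrap each placeholder in Unicode directional isolates (U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE) before showing it to the translator, then strip them again before restoring tokens. The sketch below assumes ⟦T0⟧-style placeholders; the function names are hypothetical, and an HTML-based editor could use a span with dir="ltr" instead.

```python
import re

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
PLACEHOLDER = re.compile(r'⟦T\d+⟧')
ISOLATED = re.compile(f"{LRI}(⟦T\\d+⟧){PDI}")

def isolate_placeholders(text: str) -> str:
    """Wrap each placeholder so it is laid out LTR inside RTL text."""
    return PLACEHOLDER.sub(lambda m: f"{LRI}{m.group(0)}{PDI}", text)

def strip_isolates(text: str) -> str:
    """Remove only the isolates that wrap a placeholder."""
    return ISOLATED.sub(r"\1", text)

wrapped = isolate_placeholders("مرحبا ⟦T0⟧")
print(repr(wrapped))
print(strip_isolates(wrapped) == "مرحبا ⟦T0⟧")
```

Stripping only the isolates that surround a placeholder leaves any directional characters the translator added on purpose untouched.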

Tokens inside HTML attributes

<a href="https://example.com/account/{{customer_id}}">View account</a>
<img src="{{product_image_url}}" alt="{{product_name}}">

Tokens inside href and src attributes are at higher risk. Some translation tools will attempt to "fix" URLs they detect, normalizing or encoding the token syntax in the process.
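To give attribute-embedded tokens extra scrutiny, you can list them separately from tokens in body text. The sketch below uses Python's standard html.parser module and a simplified double/triple-brace pattern; the class and function names are hypothetical, and other ESP syntaxes would need their own pattern.

```python
import re
from html.parser import HTMLParser

# Simplified brace-token pattern for illustration only.
TOKEN = re.compile(r'\{{2,3}[^{}]+\}{2,3}')

class AttributeTokenFinder(HTMLParser):
    """Collects (tag, attribute, token) for every token found in an attribute value."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            for token in TOKEN.findall(value or ""):
                self.found.append((tag, name, token))

def tokens_in_attributes(html: str) -> list:
    finder = AttributeTokenFinder()
    finder.feed(html)
    finder.close()
    return finder.found

sample = (
    '<a href="https://example.com/account/{{customer_id}}">View account</a>'
    '<img src="{{product_image_url}}" alt="{{product_name}}">'
)
print(tokens_in_attributes(sample))
```

Comparing this list before and after translation shows whether a tool rewrote or URL-encoded a token inside href or src.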

Conditional blocks that span multiple translated segments

{% if customer.plan == "premium" %}
  You have access to all features.
{% else %}
  Upgrade to unlock everything.
{% endif %}

If your translation tool splits segments at sentence boundaries, the {% if %} and {% endif %} may end up in different segments. The translator working on the first segment has no idea the second segment is conditional on the same block.
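One mitigation is to segment the template so that each conditional block travels as a single unit, with both branches visible together. The sketch below is deliberately simple: it assumes {% if %} blocks are not nested, and `split_keeping_blocks` is a hypothetical name.

```python
import re

# One whole {% if %}...{% endif %} block, captured so re.split keeps it.
# Assumes no nested if blocks.
IF_BLOCK = re.compile(r'(\{%-?\s*if\b.*?\{%-?\s*endif\s*-?%\})', re.DOTALL)

def split_keeping_blocks(template: str) -> list[str]:
    """Split into segments, keeping each conditional block in one piece."""
    parts = IF_BLOCK.split(template)
    return [part.strip() for part in parts if part.strip()]

template = """Hello.
{% if customer.plan == "premium" %}
  You have access to all features.
{% else %}
  Upgrade to unlock everything.
{% endif %}
See you soon."""

for segment in split_keeping_blocks(template):
    print(repr(segment))
```

Sentence-level segmentation can then run inside each unit, with the unit boundary telling the translator which sentences belong to the same condition.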

What we built

This is the core problem Transendly solves: a governed workflow where token detection, extraction, and restoration happen automatically, translators work in a clean interface without seeing raw token syntax, and validation runs before any locale can be exported to the ESP.

If you're building something similar or have hit edge cases we haven't covered here, I'd be interested to hear about it in the comments.
