SEN LLC

Posted on Jun 3

Try the Tech Radar #6 (Final) — Semantic Layer in 200 Lines: One Definition, Many SQL Consumers

#data #sql #webdev #javascript

Thoughtworks Technology Radar Vol 34 (April 2026) brought Semantic layer back to the Trial ring. It's the pattern of defining metrics, dimensions, and joins once — then letting BI tools, dashboards, and LLM agents consume them through the same definitions. Snowflake shipped "Semantic Views," Databricks shipped "Metric Views." It's no longer a BI plugin; it's infrastructure. I built a 500-line vanilla JS playground that takes a JSON metric model + a query → emits the SQL the engine would compile. The point isn't the SQL emission — it's the seam: one definition file, many query consumers, one compiler in the middle. This is also the final entry in the "Try the Tech Radar" series.

🌐 Demo: https://sen.ltd/portfolio/semantic-layer/
📦 GitHub: https://github.com/sen-ltd/semantic-layer

The failure mode the layer fixes

Three teams answer "what was revenue last month?" independently:

Executive dashboard: SELECT SUM(amount) FROM orders WHERE month = ...
BI team: SELECT SUM(amount) FROM orders WHERE status = 'paid' AND month = ...
LLM agent: SELECT SUM(amount) FROM orders (forgets the status filter entirely)

Three different numbers. Now the meeting is about whose number is right, not about the business decision. That's the world without a semantic layer.

With one, the metric lives in one definition:

{
  "name": "revenue",
  "source": "orders",
  "measure": { "agg": "sum", "column": "amount" },
  "dimensions": [
    { "name": "channel" },
    { "name": "country" },
    { "name": "month", "expr": "DATE_TRUNC('month', created_at)" }
  ]
}

Business rules (exclude refunds, only status='paid', fiscal-month boundary) get encoded inside the source view or hidden behind a base WHERE. Every consumer sees the same revenue. When an LLM does text-to-SQL, you point it at the semantic layer, not the raw schema. Radar's argument is that raw schemas alone aren't enough context for reliable LLM querying — they don't encode business rules.

What a query looks like

The consumer specifies "which metric, sliced by which dimensions, filtered how":

{
  "metric": "revenue",
  "group_by": ["channel"],
  "filters": [{ "column": "status", "op": "=", "value": "paid" }],
  "limit": 10
}

The compiler stitches it together:

SELECT
  channel AS channel,
  SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY channel
ORDER BY channel
LIMIT 10

Same revenue definition — country slice, month slice, both at once, filtered or not. Every consumer routes through the layer. Nobody is off-by-a-join from anyone else.

The guard: "you can only ask for declared dimensions"

The often-overlooked half of a semantic layer's value is what it forbids. Consumers can't group by a column the metric didn't declare:

export function validateQuery(model, query) {
  const issues = [];
  const metric = (model.metrics || []).find((m) => m.name === query.metric);
  if (!metric) {
    issues.push({ path: "$.metric", message: `unknown metric: ${query.metric}` });
    return issues;
  }
  const allowed = new Set((metric.dimensions || []).map((d) => d.name || d));
  for (let i = 0; i < (query.group_by || []).length; i++) {
    const g = query.group_by[i];
    if (!allowed.has(g)) {
      issues.push({ path: `$.group_by[${i}]`, message: `dimension '${g}' not defined for metric '${metric.name}'` });
    }
  }
  return issues;
}

If active_users doesn't declare channel, a query asking for it is rejected at validation time. "Let me just join on this random column and see what happens" stops being possible. That's the guardrail — and it's the reason LLM agents can actually be trusted to query the data: the model only sees blessed dimensions for each metric.

The compiler

Stays small on purpose:

export function compile(model, query) {
  const metric = model.metrics.find((m) => m.name === query.metric);
  const groupBy = query.group_by || [];
  const filters = query.filters || [];

  const dimCols = groupBy.map((g) => dimColumn(metric, g));
  const selectParts = [];
  for (let i = 0; i < groupBy.length; i++) {
    selectParts.push(`  ${dimCols[i]} AS ${groupBy[i]}`);
  }
  selectParts.push(`  ${aggExpression(metric.measure)} AS ${metric.name}`);

  const lines = [
    "SELECT",
    selectParts.join(",\n"),
    `FROM ${metric.source}`,
  ];
  if (filters.length > 0) {
    lines.push("WHERE " + filters.map(renderFilter).join("\n  AND "));
  }
  if (groupBy.length > 0) {
    lines.push("GROUP BY " + dimCols.join(", "));
    lines.push("ORDER BY " + dimCols.join(", "));
  }
  if (query.limit !== undefined) lines.push(`LIMIT ${query.limit}`);
  return lines.join("\n");
}

The clever bit is dimColumn: a plain dimension name like channel emits channel, but a dimension with an expr emits the expression:

{ "name": "month", "expr": "DATE_TRUNC('month', created_at)" }

group_by: ["month"] then yields GROUP BY DATE_TRUNC('month', created_at). The "what is a month" rule lives in the definition once. Change to fiscal-year boundary? One edit. Every consumer's queries update without anyone noticing.

Value formatting (the boring critical part)

Filter values render differently per type:

function formatValue(v) {
  if (typeof v === "number") return String(v);
  if (typeof v === "boolean") return v ? "TRUE" : "FALSE";
  if (v === null) return "NULL";
  return `'${String(v).replaceAll("'", "''")}'`;
}

Strings get single-quoted with ' escaping. Numbers render unquoted (amount > 1000, not amount > '1000' which breaks numeric comparison in many engines). Booleans become SQL-standard TRUE/FALSE. null becomes NULL. The test pins it down:

test("numeric values render unquoted", () => {
  const sql = compile(sampleModel, {
    metric: "revenue",
    filters: [{ column: "amount", op: ">", value: 1000 }],
  });
  assert.match(sql, /amount > 1000/);
  assert.doesNotMatch(sql, /amount > '1000'/);
});

The "we accidentally string-compared the price column" bug is real and easy to make. Tests at the seam stop it.

IN clauses

function renderFilter(f) {
  if (f.op === "in" || f.op === "not in") {
    const list = Array.isArray(f.value) ? f.value : [f.value];
    return `${f.column} ${f.op.toUpperCase()} (${list.map(formatValue).join(", ")})`;
  }
  return `${f.column} ${f.op} ${formatValue(f.value)}`;
}

{ column: "country", op: "in", value: ["JP", "US", "DE"] } becomes country IN ('JP', 'US', 'DE'). Each value still routes through formatValue, so strings get quoted, numbers don't.

Three domain presets to make it concrete

Semantic layers click only when you see them in context. The tool ships three:

E-commerce orders:

order_count (COUNT(*))
revenue (SUM(amount))
avg_order_value (AVG(amount))
Common dimensions: channel, country, status, month

SaaS user activity:

active_users (COUNT(DISTINCT user_id))
events_logged (COUNT(*))
Dimensions: plan, type, country, day

Support tickets:

ticket_count (COUNT(*))
avg_resolution_hours (AVG(resolution_hours))
Dimensions: priority, team, month

Swap the metric, swap the dimensions, swap the filter — SQL regenerates live. The shape "definition once, queries many" is visible in motion.

What real engines do that this doesn't

The omissions are deliberate; this is a teaching toy, not a product:

Multi-source joins — orders × customers to slice revenue by country of customer, not order
Time-grain expansion — month automatically rollable to quarter / year
Access control — role-based "marketing team can't see revenue data"
Materialisation — pre-aggregate expensive queries, cache, invalidate
Cross-metric queries — multiple metrics in one query, joined on common dimensions

Real engines that do all of this: dbt MetricFlow, Cube.dev, Snowflake Semantic Views, Databricks Metric Views.

Try it

Demo: https://sen.ltd/portfolio/semantic-layer/
GitHub: https://github.com/sen-ltd/semantic-layer

Try the E-commerce preset. Change group_by to ["country"]. Add a filter. The SQL recompiles. Swap the metric to avg_order_value. The SQL re-emits with AVG(amount). The definition didn't move; the query did. That's the point.

Takeaways

The semantic layer's value is "one definition, many consumers" — every BI / dashboard / LLM gets the same revenue number because they all compile through the same layer.
The validator is the half people miss. Restricting queries to declared dimensions is what makes the layer trustworthy for high-autonomy consumers like LLMs.
Expression dimensions (DATE_TRUNC('month', created_at)) keep business rules in one place.
Literal formatting is small but load-bearing — quote-mismatched numerics cause real production bugs.
The pattern predates the modern data stack. What's new is treating it as infrastructure rather than a BI plugin.
Real engines (MetricFlow / Cube / Snowflake / Databricks) handle joins, materialisation, access control. This playground covers the seam — definition / compiler / SQL — so the architectural idea lands.

This is OSS portfolio #252 from SEN LLC (Tokyo), the final entry in the "Try the Tech Radar" series. Previous five: #247 TOON, #248 Markdown → Typst, #249 Schema → LLM Prompt, #250 Server-driven UI, #251 Mutation Testing. https://sen.ltd/portfolio/

DEV Community