Matthew Dean

CSS: the language with no syntax

Congratulations! You have fallen prey to my clever click-bait-y title, but before you dismiss this topic out of hand ("Because," you say, "of course CSS has syntax") or leave a smarmy comment, allow me to walk you through what CSS is and how it's defined.

Some context: I've been a maintainer of Less.js (the CSS pre-processor) for about 10 years, and the lead maintainer for most of that time. When I started contributing, I knew very little about CSS parsing, or about parsing in general.

In the last 5 years, I've worked not just on Less but on other CSS-related parsing projects, including one (not yet released) that re-imagines both Less and Sass parsing for the modern era. I've used a variety of parsing approaches, parser generators, and parsing libraries.

In some cases, I had to abandon certain parsers or parsing approaches because they just weren't flexible enough for CSS, and the reason is that CSS has no single syntax.

What do I mean by "single syntax"?

You can think of CSS as a collection of "micro-syntaxes". CSS has a top-level syntax for how things are organized (okay, okay, yes, you can say that at the top level there is a syntax), and then individual specifications define their own micro-syntax for what may appear in a given location. On the whole, each micro-syntax has tried to look consistent with the micro-syntaxes that came before it, but that doesn't mean they actually are consistent, or that the same syntax carries the same semantics everywhere, as you would expect in other languages.

You might be scratching your head at this because, if you've never had to parse CSS, or had to write a parser that actually understands all its component pieces (as one does in a system like Less), CSS syntax looks the same everywhere.

So, let's illustrate this with some examples.

Assignment or selector?

What does the colon : mean in CSS? You might be tempted to say assignment, because of this:

.box {
  /* Look! There's an assignment `:` operator! */
  color: red;
}

Okay, then, what does this mean?

/* Look! There's... oh... */
div:hover {}

"Not a problem!" you say. This is at the start of a ruleset (also called a qualified rule), and we know that pseudo-classes and pseudo-elements start with : followed by an identifier. Easy peasy.

Alright, so a pseudo-selector looks like :ident, cool. Okay, how do we parse this?

.box {
  /* Look! There's a pseudo-... oh... */
  color:red;
}

Okay, never mind that, we say. We can just have two modes! When we're at the "top-level" where we're expecting rulesets, we can have : mean one thing, and then inside it, we can have it mean another.

Welp, not so fast. One of the reasons Less and Sass were so popular for so long was the convenience of nesting rulesets and at-rules inside each other, and both engines support this:

.box {
  button:hover {}
}

CSS Nesting has since been adopted by the core CSS language itself and, as of 2024, is supported in browsers. So this form is valid in CSS too.

If you know anything about parsing, you may recognize that we're looking at an example of "infinite lookahead" and backtracking. Some parsing engines don't support infinite lookahead, or support it only with major performance penalties. It's "infinite" because of cases like:

.box {
  foo:red blue green yellow purple brown black grey...

As we parse this, we have to ask:

  1. is this a property called "foo" with a value of red blue green yellow purple brown black grey etc? or
  2. is this a selector selecting an element called "foo" with a pseudo-class of :red and descendant elements of blue green yellow purple brown black grey etc?

There are ways to avoid infinite lookahead, of course, such as having a list of all known CSS properties, and/or having a list of all known valid HTML elements, but this is already a little too deep into parsing, so let's get back on track to syntax and how CSS is defined.
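
To make the lookahead problem concrete, here's a minimal Python sketch of the scan a parser would need at the start of a statement inside a block. It is not how Less, Sass, or any browser does it; `classify_statement` is a hypothetical name, and the sketch assumes the input contains no strings or comments. The point is only that the loop has no fixed bound: it keeps reading until it hits a `{` (nested rule) or a `;` / `}` (declaration).

```python
def classify_statement(source: str, start: int = 0) -> str:
    """Scan ahead from `start` and guess what kind of statement begins there.

    Returns "rule" if a `{` shows up before the statement ends, otherwise
    "declaration". Assumption: no strings or comments in the input.
    """
    depth = 0  # nesting depth of () and [], so a `;` inside url(...) is ignored
    for ch in source[start:]:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(0, depth - 1)
        elif depth == 0:
            if ch == "{":
                return "rule"
            if ch in ";}":
                return "declaration"
    # Ran out of input without seeing `{`: treat it as an unterminated declaration.
    return "declaration"


print(classify_statement("color:red;"))             # declaration
print(classify_statement("button:hover {}"))        # rule
print(classify_statement("foo:red blue green;"))    # declaration
print(classify_statement("foo:red blue green {}"))  # rule
```

Note that the last two inputs are identical until the final character or two, which is exactly why no fixed amount of lookahead is enough.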

How CSS syntax is actually defined

The way that CSS syntax is defined is that you can think of it as having a basic outer structure with "slots" where syntax definitions can go.

These "slots" are:

  • An at-rule's "prelude" (everything before its terminating ; or its {} block)
  • An at-rule's block ({}) contents
  • A property's value
  • A pseudo-selector's () block, if it exists
  • A CSS function's () block

Example 1: at-rules

Instead of having general parsing rules, parsing is specified by the at-rule specification itself. For example: the @supports at-rule parses (or can parse) property/value pairs, like:

@supports (transform-origin: 5% 5%) {
}

...whereas the @scope rule expects to be parsing selectors e.g.

@scope (.card) {}
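
To make the "slot" idea concrete, here's a minimal Python sketch of a parser that hands an at-rule's prelude to a different sub-parser depending on the at-rule's name. Everything in it (the function names, the `PRELUDE_PARSERS` table, the returned dictionaries) is hypothetical and heavily simplified; real @supports and @scope preludes accept far more than this.

```python
def parse_supports_prelude(text: str) -> dict:
    # Simplified assumption: the prelude is exactly "(property: value)".
    inner = text.strip()[1:-1]
    name, _, value = inner.partition(":")
    return {"kind": "declaration", "property": name.strip(), "value": value.strip()}


def parse_scope_prelude(text: str) -> dict:
    # Simplified assumption: the prelude is exactly "(selector)".
    return {"kind": "selector", "selector": text.strip()[1:-1].strip()}


# Each at-rule name owns the micro-syntax of its own prelude.
PRELUDE_PARSERS = {
    "supports": parse_supports_prelude,
    "scope": parse_scope_prelude,
}


def parse_at_rule_prelude(name: str, prelude: str) -> dict:
    parser = PRELUDE_PARSERS.get(name)
    if parser is None:
        # Unknown at-rule: there is no general rule to fall back on,
        # so all we can do is keep the raw text.
        return {"kind": "unknown", "raw": prelude.strip()}
    return parser(prelude)


print(parse_at_rule_prelude("supports", "(transform-origin: 5% 5%)"))
print(parse_at_rule_prelude("scope", "(.card)"))
```

The same parenthesized text means a declaration in one at-rule and a selector in the other, and the only way to know which is to look up the at-rule's name first.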

Example 2: :nth-child() and dimensions

In general, CSS treats a signed number like -1 as a distinct token. For example, this will be parsed as two values, 1 and -1:

/* This has a value of 1 -1 */
padding: 1-1;

However, the :nth- functions define their own micro-syntax: An+B. The argument to :nth-child(n-1) is not read the way an ordinary value would be; it's interpreted as "n minus 1", which, confusingly, is different from the calc() function (where + and - must be surrounded by whitespace) because, again, calc() has its own micro-syntax!
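
Here's a rough Python sketch of the contrast. `split_signed_numbers` is a crude stand-in for how a generic value like `1-1` falls apart into two numbers, and `parse_an_plus_b` handles a simplified subset of An+B. Both function names are hypothetical, and the real An+B grammar has stricter whitespace rules than this regular expression does.

```python
import re
from typing import List, Tuple


def split_signed_numbers(text: str) -> List[int]:
    """Crude approximation of generic number tokenizing: a sign binds to the digits after it."""
    return [int(n) for n in re.findall(r"[+-]?\d+", text)]


# Simplified An+B: optional sign and digits, "n", then an optional signed offset.
_AN_PLUS_B = re.compile(r"^([+-]?\d*)n(?:\s*([+-])\s*(\d+))?$")


def parse_an_plus_b(text: str) -> Tuple[int, int]:
    """Return (A, B) for a simplified subset of the An+B notation."""
    text = text.strip().lower()
    if text == "odd":
        return (2, 1)
    if text == "even":
        return (2, 0)
    if re.fullmatch(r"[+-]?\d+", text):
        return (0, int(text))  # a bare integer means A = 0
    match = _AN_PLUS_B.match(text)
    if match is None:
        raise ValueError(f"not a recognised An+B expression: {text!r}")
    coefficient, sign, offset = match.groups()
    if coefficient in ("", "+"):
        a = 1
    elif coefficient == "-":
        a = -1
    else:
        a = int(coefficient)
    b = int(offset) if offset else 0
    if sign == "-":
        b = -b
    return (a, b)


print(split_signed_numbers("1-1"))  # [1, -1]
print(parse_an_plus_b("n-1"))       # (1, -1)
print(parse_an_plus_b("2n+1"))      # (2, 1)
```

Same characters, two different readings: in a padding value the `-1` is a second number, while inside :nth-child() it is the B offset of a formula.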

CSS: What's a "list"?

One of the most frustrating and inconsistent parts of CSS syntax is that it has no formal definition for "lists". Or, rather, it has a collection of syntaxes which all can represent lists, but not all of them are consistent in describing what's in the list and what isn't.

That said, probably the most consistent list forms in CSS syntax and its micro-syntaxes right now are comma-separated and semi-colon-separated lists, which generally work like this:

// List members are ['1', '2', '3']
1, 2, 3

// List members are ['1 1', '2 2', '3 3']
1 1, 2 2, 3 3

// List members are ['a: 1', 'b: 2', 'c: 3']
a: 1; b: 2; c: 3;

That is, in general:

  1. Each comma splits the value into distinct members of the list i.e. if you have two commas, you have 3 members of the list.
  2. A semi-colon-separated list "groups" other values that, themselves, can have comma-separated lists.
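
The two rules above can be sketched as a single helper that splits on a separator only when it appears outside any brackets. This is a minimal illustration, not a real CSS tokenizer: `split_top_level` is a hypothetical name, and it ignores strings, comments, and escapes.

```python
from typing import List


def split_top_level(text: str, separator: str) -> List[str]:
    """Split `text` on `separator`, ignoring separators nested inside (), [] or {}."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        if ch == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:  # drop the empty piece after a trailing separator
        parts.append(tail)
    return parts


print(split_top_level("1, 2, 3", ","))            # ['1', '2', '3']
print(split_top_level("1 1, 2 2, 3 3", ","))      # ['1 1', '2 2', '3 3']
print(split_top_level("a: 1; b: 2; c: 3;", ";"))  # ['a: 1', 'b: 2', 'c: 3']
```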

You may be scratching your head and saying, "What do you mean, 'in general'? Isn't this always true in CSS?"

Sorry, but nope. This concept begins to fall apart in CSS function calls, because of these ambiguities in CSS syntax:

  1. CSS functions can receive arguments. Those arguments are CSS values.
  2. CSS function arguments are a comma-separated list, BUT (importantly),
  3. CSS values can also be a comma-separated list.

With early CSS functions, this wasn't an issue. All functions received values that didn't themselves have commas. Eventually, though, use cases and CSS proposals emerged for functions that could receive commas. And not just functions. There emerged use cases where other CSS values, which could be comma-separated, might themselves need sub-lists as individual parts of the value.

So, now what were proposal authors to do? If your guess is, "Hopefully, something consistent?" then prepare to be horribly disappointed.

The Many Failed Attempts at Lists-of-Lists Syntax in CSS

1. The / separator

You'll probably recognize this. In some CSS values, the list separator is a slash (/) character. But it would be a mistake to consider it a list separator like , or ;. I'll explain.

Let's take border-radius as an example:

border-radius: 10px 20px / 30px 40px;

In the border-radius syntax, the slash is used to figure out "radius pairs" for corners. I don't want to get into the minutiae of the border-radius syntax, but suffice it to say that the / separates a set of values preceding it from a set of values following it. In this case, it's a list like: ['10px 20px', '30px 40px'].

You might be tempted, then, to think of the / as just another list separator. But, of course, that isn't the case.

When calc() was introduced, it recognized that almost all programming languages use / as a "division operator". It wasn't going to re-invent the wheel, so the calc() micro-syntax uses it for division and not as a list separator:

/** This means "20px divided by 2" and not
 * "a list containing '20px' and '2'" */
calc(20px / 2);

Some property sets, like background-* for example, use comma-separated lists for some individual properties and slash-separated lists for others. So, when they get put together in a shorthand, you may get a value like this:

background: url(a.png) left top / cover, url(b.png) right / contain;

From a syntax perspective, what does this value mean? If you're familiar with the background properties, you may instinctively be able to read this unambiguously. But let's look at this from purely a syntax perspective.

Can you tell me what the "lists" or "lists of lists" are from these CSS values?

unknown: one two / three four;
unknown: one / two three, four;
unknown: one, two three / four;

If your brain says, "The comma splits the value into list pairs, and the slash further splits THOSE values into list pairs," it's a good guess, but, in CSS, that's just not true. It's entirely dependent on the micro-syntax of the given property. You have to know the property name to know how to understand the syntax that follows.

In the case of background, this is how these lists are actually grouped semantically:

background: url(a.png) left top / cover, url(b.png) right / contain;

/**
  background: [
    [
      'url(a.png)',
      'left top',
      'cover'
    ],
    [
      'url(b.png)',
      'right',
      'contain'
    ]
  ]
*/

If the / divided lists consistently across CSS, then the semantic division of this value would be:

background: [
    [
      'url(a.png) left top',
      'cover'
    ],
    [
      'url(b.png) right',
      'contain'
    ]
  ]

Even url(a.png) left top requires you to understand the micro-syntax of background to know that url(a.png) is the background-image and left top represents a single property background-position and not two distinct properties. ✨ Micro-syntaxes! ✨
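
The two groupings can be reproduced with a short Python sketch. `generic_groups` applies the "comma, then slash" rule blindly, while `background_groups` bakes in a sliver of the background micro-syntax. All names here are hypothetical, and `background_groups` assumes each layer is written as image first, then position, then an optional `/ size`, which the real grammar does not require.

```python
from typing import List, Optional


def split_top_level(text: str, separator: str) -> List[str]:
    """Split on `separator` wherever it appears outside (), [] or {}."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        if ch == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def generic_groups(value: str) -> List[List[str]]:
    """What you'd get if `,` and `/` were consistent list separators."""
    return [split_top_level(layer, "/") for layer in split_top_level(value, ",")]


def background_groups(value: str) -> List[List[Optional[str]]]:
    """Grouping that knows the slash only introduces the size (simplified)."""
    layers: List[List[Optional[str]]] = []
    for layer in split_top_level(value, ","):
        pieces = split_top_level(layer, "/")
        size = pieces[1] if len(pieces) > 1 else None
        # Assumption: the first whitespace-separated word is the image,
        # and everything else before the slash is the position.
        image, _, position = pieces[0].partition(" ")
        layers.append([image, position.strip(), size])
    return layers


value = "url(a.png) left top / cover, url(b.png) right / contain"
print(generic_groups(value))
print(background_groups(value))
```

The first call prints the "consistent" two-item grouping and the second prints the three-item grouping the property actually means. Nothing in the value itself tells you which function to call.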

2. Lists of lists in function (or "function-like") calls

One of the first functions that needed a list of lists was var(). You might think of it as having two values, and it does: the custom property name and the fallback value:

color: var(--theme-color, red);

However, the specification's authors had to resolve something: what about values that have commas? The solution in this case? Just treat everything after the first comma as a single value.

font-family: var(--theme-font, 'Helvetica', sans-serif);

Is the comma consistent here? Again, take the var() out of this and just think of CSS functions as a general syntax. What does this mean?

unknown: new-css-function(one, two, three);

Is that a function that is taking 3 value arguments? Or is it taking one comma-separated value argument?

Well, in the case of var(), delightfully, it's neither. It's taking 2 arguments, the second of which can be a comma-separated argument.
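
In Python terms, the var() rule looks less like "split on commas" and more like "split on the first comma only". Here's a minimal sketch of that difference; `parse_var_arguments` is a hypothetical helper that takes the text between the parentheses and ignores everything else a real parser has to validate.

```python
from typing import Optional, Tuple


def parse_var_arguments(arguments: str) -> Tuple[str, Optional[str]]:
    """Split var()'s arguments into (custom property name, fallback).

    Only the FIRST comma separates arguments; any later commas belong
    to the fallback value itself.
    """
    name, comma, fallback = arguments.partition(",")
    return name.strip(), (fallback.strip() if comma else None)


print(parse_var_arguments("--theme-color, red"))
# ('--theme-color', 'red')
print(parse_var_arguments("--theme-font, 'Helvetica', sans-serif"))
# ('--theme-font', "'Helvetica', sans-serif")

# Compare with the naive "every comma is an argument separator" reading:
print([part.strip() for part in "one, two, three".split(",")])
# ['one', 'two', 'three']
```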

From a language syntax perspective, that's kind of insane, but it gets better!

A note about Less / Sass's handling of lists of lists

Since both Less and Sass (the SCSS variant) are supersets of CSS, each has had to solve lists-of-lists from a holistic syntax perspective, since CSS offers no consistent guidance in this regard.

Sass lets you wrap values in parentheses to pass lists to functions and mixins:

value: sass-function((1, 2, 3), 4, 5);

Less prefers that you be more explicit, because parentheses may be part of a valid value (think media query syntax), so it asks you to use Less's escape ~ character to more clearly denote that you want the final value to be without the parentheses:

value: less-function(~(1, 2, 3), 4, 5);

3. The CSS Functions and Mixins Module

Probably the most bizarre and perverse twist on trying to come up with a solution for lists of lists is the new CSS functions and mixins module proposal.

It uses NONE of the prior solutions, including Sass / Less prior art, to define lists of lists and comes up with yet another. So, if you're counting, CSS and CSS preprocessor syntax now have:

  1. comma-separated lists
  2. semi-colon separated lists
  3. slash-separated lists (inconsistent)
  4. parentheses-grouped lists (Less and Sass)
  5. ...a brand new one defined by this module: curly-brace-grouped lists

This syntax ends up looking like this:

 width: --max-plus-x({ 1px, 7px, 2px }, 3px);

Let me be clear: it's not that I think this syntax is bad, necessarily; it's just that it's Yet Another Micro-syntax. IMO it has these two flaws:

Flaw 1

Until now, curly braces in CSS have actually been a mostly-consistent syntax, denoting a group of rules, like:

/* We thought we understood curly braces in CSS... */
.box {
  color: red;
  background: black;
}

The authors of this module proposal have decided to break that syntax convention, entirely, such that one of the few syntax "rules" will now be, again, removed from "reliable" syntax. (Not that CSS ever specified that curly braces should / would always contain rules; it's just been a consistent syntax convention until now.)

Flaw 2

Instead of the CSS specification authors saying, "You know what? Maybe we should make a formal 'list of lists' standard in CSS and start using it," the use of this format for this module has no general applicability, and there are no plans to migrate it anywhere else.

In other words, if CSS were a single syntax, a proposal like this would come with the expectation that other function-call and value syntaxes be updated to use it, such as:

/* Supporting this moving forward: */
font-family: var(--theme-font, { 'Helvetica', sans-serif });
/* Instead of the inconsistent `rgb(100 100 100 / 0.5)` syntax */
color: rgb({ 100, 100, 100 }, 0.5);

There's no such proposal here. It's just Yet Another Micro-syntax.

CSS - not one syntax, but a smorgasbord of micro-syntaxes

Listen: I love CSS! Sure, from a language syntax perspective, it resembles something like Howl's Moving Castle. But, from a beginner perspective, its relatively-lean syntax means that it's easy to write and quite forgiving. In some ways, its weaknesses are also its strengths.

Just... don't ever try to build a CSS parser. Become a tomato farmer instead.


Addendum: despite knowing better, I'm currently building a new CSS parsing & processing framework anyway, to replace Less / Sass / CSS modules / Tailwind / Styled Components / other styling systems, so follow me on here and/or leave a comment if you want to hear about it!

Top comments (1)

Paceaux

This is a fantastic article and it fits perfectly in line with an article I wrote a while back called How to Program Like a Linguist.

What you've done here is effectively applied linguistic principles and thinking to your programming language; you've thought about syntax and semantics and how the two affect each other. You've been a linguist of CSS — which is great! We need more of this kind of content where people are thinking hard about the little things of their languages.