Max Heiber

Posted on Nov 28, 2020 • Edited on Nov 30, 2020

What Makes a DSL Bad? Make, CSS, and how we can do better.

#programming #devops #css

There are many places where you can find arguments that DSLs (domain-specific languages) or "language oriented programming" are good. But I haven't found much on what makes them bad.

My experiences with DSLs tend to be negative. There are cases where I know exactly what I want the computer to do. I could write it in minutes in any "real" programming language, but struggle to express or approximate it in the DSL.

Here are some examples of "bad" DSLs:

Make
CSS
GitHub Actions

And here are sufficient conditions for a DSL to be bad:

Missing proper functions. By which I mean high school math functions: things that compute by turning variables into return values.
Missing proper variables. By which I mean a scoped way of naming values.
Mandatory. Regexes within a normal programming language are fine most of the time because we're not forced to use them: when things get too hard to express we can switch to writing a normal function with the full power of the programming language.

Honorable mention for a feature missing from bad DSLs: structured data.

On Make:

The Makefile DSL is mandatory for Make. When you discover that you need a variable or a function, you can't just write one. You have to trick Make with environment variables, stitched-together shell scripts, state stored on the file system, more and more complicated funny symbols, and so on.

Here's an example where I needed variables and functions recently: our 'test' rule depended on our 'build' rule, but we had more than one way of building. We wanted to run the same tests for each of these different ways of building. I managed to hack around this by having the makefile re-invoke itself with an environment variable, then using that environment variable as a macro together with Makefile ternary conditional syntax. That's just silly!

On CSS:

In a stylesheet, CSS is mandatory: you can't nip out and write code in a real language, unless you count Houdini. CSS doesn't have proper variables and functions, which is why in typical hand-written CSS one repeats oneself a lot, with "magic strings" for widths, fonts, etc. The only methods of composition are to repeat yourself or to rely on the cascade, which has all of the flavor of shared mutable state, such as action at a distance and no way to reason locally.

To see what CSS would look like if it had proper variables and functions, see how React Native does it: styles are just data, and can be assigned to variables, passed to functions, returned from functions, imported, distributed via package managers, etc.
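A rough sketch of the styles-as-data idea follows. It is written in Python rather than React Native's JavaScript, and every name and value in it is made up for illustration. It is not React Native's API, only the shape of the approach: styles are plain dictionaries, shared values get names, and composition happens explicitly where you ask for it.

```python
# Hypothetical style values; the keys mimic CSS property names.
SPACING = 8  # a named value instead of a repeated magic number

base_text = {"fontFamily": "Helvetica", "fontSize": 16}


def padded(style, units):
    """Return a copy of `style` with padding of `units` spacing steps."""
    return {**style, "padding": units * SPACING}


def merge(*styles):
    """Combine styles left to right; later keys win, and only where asked."""
    result = {}
    for style in styles:
        result.update(style)
    return result


heading = merge(base_text, {"fontSize": 24, "fontWeight": "bold"})
card_heading = padded(heading, 2)
```

Nothing here mutates `base_text` or `heading`, so each style can be understood by reading its own definition.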

There may be good performance reasons for not always executing a full-fledged programming language at runtime to style webpages. But then maybe instead of a limited language, there shouldn't be any language at all for styles: just data that is generated by ahead-of-time compilers.

If there's a perf argument for not having functions and variables in CSS, surely we can apply the same reasoning to CSS rule resolution. Rule resolution is hard work, maybe it's a bad idea for us to make browsers do it at runtime.
Most PLs without variables are doomed to eventually grow butchered variables. CSS has at least two kinds: custom properties and counters. The latter link goes to an entire article on how to number paragraphs with CSS. It's really an entire article showing you how to do the equivalent of this JavaScript: paragraphs.forEach((paragraph, i) => ....). (Note: the article is great; CSS is the target of my criticism.)
Even with the power to crash the browser with var and calc, accidental Turing-completeness, and ad-hockery (media query syntax, counters, more to come), it turns out people still want to style the browser using languages that have variables and functions (see Sass and CSS-in-JS).
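For comparison, the paragraph-numbering loop mentioned above takes a few lines in a general-purpose language. This is a sketch in Python; the function name and the "1. " label format are made up.

```python
def number_paragraphs(paragraphs):
    """Prefix each paragraph with its 1-based position."""
    return [f"{i}. {text}" for i, text in enumerate(paragraphs, start=1)]
```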

On GitHub Actions:

In GitHub workflows, one very often has repeated names for things (such as an artifact one wants to generate and upload). One must copy and paste and then hope things are in sync. Or suppose one wants to do similar, but slightly different, things on pushing to a branch and on making a release. Copy/paste again, this time making several small tweaks to different steps.
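As a sketch of the alternative, the two near-duplicate workflows could be generated from one function, with the artifact name defined once. The artifact name, the `make` commands, and the helper functions are hypothetical. The workflow keys are meant to mirror the usual workflow layout, but treat the exact shape as an assumption and check it against the GitHub Actions documentation.

```python
import json

ARTIFACT_NAME = "app-bundle"  # hypothetical; defined once, used everywhere


def upload_step(path):
    """A step that uploads `path` under the shared artifact name."""
    return {
        "uses": "actions/upload-artifact@v4",
        "with": {"name": ARTIFACT_NAME, "path": path},
    }


def workflow(name, trigger, extra_steps=()):
    """Build one workflow; variants differ only in trigger and extra steps."""
    steps = [
        {"uses": "actions/checkout@v4"},
        {"run": "make build"},  # hypothetical build command
        *extra_steps,
        upload_step("dist/"),
    ]
    return {
        "name": name,
        "on": trigger,
        "jobs": {"build": {"runs-on": "ubuntu-latest", "steps": steps}},
    }


on_push = workflow("push", {"push": {"branches": ["main"]}})
on_release = workflow(
    "release",
    {"release": {"types": ["published"]}},
    extra_steps=[{"run": "make sign"}],  # hypothetical release-only step
)

if __name__ == "__main__":
    # Emit the generated data; a real setup would write it to workflow files.
    print(json.dumps(on_release, indent=2))
```

The "similar, but slightly different" release workflow is now one extra argument, and the artifact name cannot drift out of sync between the two.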

Why do Bad DSLs happen to Good people?

If it's so awful working with these DSLs, then how and why did we get into this situation? The answer is a little different in each of these cases:

Why the Make language exists

I suspect Make has a DSL because high-level composable scripting languages weren't such a thing then. And if there were, it would be non-obvious how to mix the genuinely declarative aspects of Make with the "do this and then do that" nature of idiomatic Python. Rake seems to strike a nice balance here, though I haven't used it much. It's the mandatoriness that's the problem with Make, not the fundamental model.

Why CSS exists:

People are so used to CSS that it's hard to imagine alternatives, preprocessors aside. "The Languages which Almost Became CSS" gives some idea of constraints and alternatives. It was mainly for performance reasons that we didn't get a "real" programming language. That's fine, but it tells me that CSS makes more sense as a data format or bytecode than as something for humans. See "What about SQL" below for more on what I mean here.

Why the GitHub Actions language exists:

GitHub Actions is broken for no technical reason I can discern. The language wasn't butchered for performance reasons: you can write an infinite loop in a workflow with a "run" step containing some Bash, such as while true; do echo hello; done. You can waste memory and CPU all you want: GitHub will charge you for what you use once you've exceeded the limits of the free tier. It's not for security reasons either: everything is sandboxed anyway, and is already side-effecty.

What about SQL?

SQL is pretty good in practice, but meets my definition of a "bad" DSL. The reason SQL doesn't seem totally broken is that it is typically generated, not hand-written. SQL is stitched together by code in real programming languages. For example, programmers of general-purpose languages write functions that generate and execute prepared statements, and the parameters of those functions are bound to the parameters of the prepared statement. Or people use query builders or ORMs.
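Here is a small sketch of that pattern using Python's built-in `sqlite3` module and an in-memory database. The table layout, the function names, and the second row are assumptions made for illustration. The host language supplies the functions and variables, and the SQL text only carries `?` placeholders.

```python
import sqlite3


def insert_user(conn, user_id, age, name):
    """Function parameters are bound to the statement's placeholders."""
    conn.execute(
        "INSERT INTO users (user_id, age, name) VALUES (?, ?, ?)",
        (user_id, age, name),
    )


def users_older_than(conn, min_age):
    """Return the names of users older than `min_age`, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name",
        (min_age,),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, age INTEGER, name TEXT)")
insert_user(conn, "f62255a8259f", 30, "Peter")
insert_user(conn, "0a1b2c3d4e5f", 17, "Ada")  # hypothetical second row
```

Calling `users_older_than(conn, 18)` reuses the same statement text with a different bound value, so the composition happens in Python and not in SQL.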

SQL could be better, of course, and I'm not just talking about all the syntactic inconsistencies. As a compile target, ideally it would be better-suited to being generated. RethinkDB and MongoDB (in spite of its other faults) both get this right, treating queries more as data than as code in some butchered language without proper variables and functions, making query-building more composable and safe.

In the rare cases where apps are written in SQL, it's not really SQL, it's SQL+, where one has variables and functions: see the stored procedure languages for Postgres and SQL Server, which are particularly like real languages.

How can we do better?

Here are some alternatives to bad DSLs:

What's so bad about a real programming language with proper variables and functions? If your reasons are that you want something easy to learn and safe to embed, consider something like Lua. Lua seems to work well for Redis stored procedures, the video game industry (see Roblox for an example), Nginx config, etc. Guile is a similar option for parentheses-sympathetic audiences.
Would plain data meet your needs?
- Consider making it easy for app devs to write "real" code that generates the data, rather than providing them with a broken language that is somewhere between data and code.
  - ESLint and webpack both enable users to write their config in either JSON or JS code that generates JSON-like objects. This has some of the benefits of a DSL, while enabling abstraction when it is needed.
  - If one wants static guarantees about the data generated by code, one can go a lot further in refining the code-that-generates-data approach (disclosure: I currently work at the company behind that paper, opinions are my own).
- If there are strong reasons to control side-effects and enforce limited modes of indirection, consider Dhall, if your audience is ML-syntax-sympathetic
As soon as you find yourself reinventing variables, loops, and functions with your "plain data" STOP STOP and consider one of the alternatives above. It isn't plain data anymore, it's a bad programming language.

May our tools be bicycles for the mind.

Update: I rewrote the section on CSS in response to feedback.

Latest comments (3)

Max Heiber • Dec 3 '20 • Edited

Follow-ups:

Parser generators have mandatory restrictions, don't have functions, but can be tolerable for tiny tasks. I'm not sure why.
Maybe it's OK to not have functions if you have relations or channels. Maybe "named things you can pass around" are what bad DSLs are missing.

stereobooster • Nov 29 '20

I agree with the sentiment, but disagree with the arguments. A lot of people write SQL manually and are happy with it. There is something else which makes SQL pleasant to work with and CSS less pleasant. My take on this is predictability.

By looking at a SQL query I know what the outcome will be; by looking at CSS there is generally no way to predict the outcome. In CSS, another rule with the same selector included later can override a rule, or another style with higher specificity can, or margins do unexpected things.

Max Heiber • Nov 29 '20 • Edited

Re what's wrong with CSS: I agree the cascade is the worst part of cascading style sheets, for the reasons you describe. I think this is related to my idea that the problem is that it doesn't have variables or functions: the cascade is what CSS has instead of functions+variables+data. When writing inline styles in React (for example), one can imitate the cascade selectively using {...style1, ...style2}, exactly where one wants, with predictable results.

So no disagreement that SQL is better than CSS. But I still think SQL is bad: when I said no one writes it by hand, I meant that it is usually constructed by code in a host language, which is using things like prepared statements. Some people describe this as writing the SQL by hand, as compared to using an ORM, but I want to point out that the power of variables and functions is coming from the host language. Which is entirely fine. The weird thing is that we're manipulating code as strings, which suggests to me that a better alternative to SQL would be closer to data than code.

This Python/ReQL is easier to manipulate from the host language, safer, and (I expect) easier to learn than the corresponding SQL, as there is no new syntax:

Python/ReQL:

r.table("users").insert({
   "user_id": "f62255a8259f",
   "age": 30,
   "name": "Peter"
})

SQL:

INSERT INTO users(user_id,
                  age,
                  name)
VALUES ('f62255a8259f',
        30,
        'Peter')