DEV Community

Tomasz Wegrzanowski


100 Languages Speedrun: Episode 45: M4

First, a brief history lesson.

Preprocessors used to be a thing, the most notable being CPP (the C PreProcessor), used by C, C++, and occasionally a few other languages. It started with a fairly simple language, like old-style C, and people who really wished it had more functionality, such as constants or including one file in another. But instead of adding all that to the language itself, they'd pass the source code through a separate program like CPP first, and only then hand the result over to the compiler.

This sort of works, but it's really quite terrible - imagine debugging anything when you cannot see the code the compiler actually saw. Error line numbers are completely mysterious, the preprocessor doesn't know anything about the language, the language doesn't know anything about preprocessor directives, and it's a total mess.

C has been walking back from this mess. Step by step, the "compiler" got extended to get hints from the "preprocessor", the language got extended to support many things (enum and const notably), the "preprocessor" has been integrated with the compiler, so you can actually get meaningful error messages, and finally the whole "including file" was replaced by "precompiled headers" so the end result is a hybrid CPP/C or CPP/C++ language that doesn't really have a "preprocessor" anymore, but does its best to pretend it does.

Nowadays if you need to extend a language, you instead use something like Babel for JavaScript, which fully understands the syntax of the language it's translating, so you never get syntax errors from the final language itself - Babel catches them for you. Source maps are used so you get correct line numbers in runtime messages as well. This still isn't perfect, as the in-browser debugger will show you the translated code, but it's so much better than having a bunch of regular expressions translating => to function() { ... } and such.

Lesson learned, preprocessors are bad, don't use them.

Anyway, some people took a very different lesson out of the CPP mess, and decided to instead write a better preprocessor. That's how M4 came to be.

Hello, World!

You probably have m4 already installed, as it's used by some abominations like GNU autoconf.

Let's start with a Hello, World!

dnl Hello, World! in M4
define(`hello', `Hello, World!')dnl
hello

We can now preprocess our text:

$ m4 < hello.m4
Hello, World!

  • dnl works like a comment - it discards the rest of the line, including the newline
  • define(...) defines a macro - in this case a very simple one
  • even though the define(...) directive sits on line two, m4 does not think in terms of lines. It would print everything after the closing ), including the newline, so every definition needs to end with an ugly dnl, with no space in between.
  • notice unusual quoting syntax with opening backtick and closing single quote - this allows quotes to be nested

Macro arguments

We can pass string arguments to macros. They'll be available as $1, $2, etc.

define(`hello', `Hello, $1!')dnl
hello(Alice)

$ m4 <name.m4
Hello, Alice!


Math

M4 has very few builtin macros. It can do basic integer math: eval(expression) returns the result. It doesn't do floating point numbers:

define(`addexample',`$1 + $2 = eval($1+$2)')dnl
addexample(350, 70)
addexample(19, 50)

$ m4 <math.m4
350 + 70 = 420
19 + 50 = 69

Odd Even

M4 can do simple if/else logic. ifelse(A,B,THEN,ELSE) checks whether A is the same string as B; if so, it returns THEN, otherwise it returns ELSE. You can also add more arguments to create if/elsif/elsif/else chains.

define(`oddeven',`ifelse(eval($1%2),0,`$1 is even',`$1 is odd')')dnl

$ m4 <oddeven.m4
69 is odd
420 is even


FizzBuzz

There are no loops in M4, so we do the usual recursion. There's a lot of fiddling to get the newlines right:


The M4 documentation provides a generic forloop(var, from, to, statement), but its code is a lot more complex, so that var can be made available inside the statement.

You shouldn't be too surprised by what it does:

$ m4 <fizzbuzz.m4

Numbered list

Let's try to use M4 to handle list numbering automatically for us:

define(`item', `* listcounter. $1 nextlistcounter')dnl
Most popular animals:
item(Fish for some reason, boring)

$ m4 <list.m4
Most popular animals:

* 1. Cats
* 2. Dogs
* 3. Fish for some reason
* 4. Birds
* 5. Rabbits

It sort of works, but it's defines doing defines, and again, spacing is likely not what you'd like it to be - like extra space at the end of each list item line. M4 is probably a lot more sensible in places where you really don't care about all that extra spacing. It's sort of possible to control spacing, but it really increases complexity of the code.


Footnotes

To get output out of order, M4 has divert functionality. divert(number) diverts the output to temporary buffer number. divert with no arguments resumes normal output. Then you can call undivert(number) to get it all back.

define(`footnote',`[footnotecounter]divert(1)[footnotecounter] $1 nextfootnotecounter

Preprocessors footnote("like CPP or M4") are terrible for programming footnote("or pretty much anything else").

Which outputs:

$ m4 <footnotes.m4

Preprocessors [1] are terrible for programming [2].

[1] "like CPP or M4"
[2] "or pretty much anything else"

All non-empty diversions are automatically printed, in order of their numbers, unless they've been undiverted or discarded before. The divert system is probably the most clever part of M4.

Running system commands

This is perhaps not something you'd expect from a preprocessor, but M4 can run any system command. This goes against common security assumptions. Running an untrusted program is obviously dangerous, but most people would assume that compiling or preprocessing untrusted programs (or in case of M4, just some random text) is fine. Well, not with M4:

define(command,`$ $1
command(`ping -c 3')dnl

$ m4 <cmd.m4
$ ping -c 3
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=117 time=19.144 ms
64 bytes from icmp_seq=1 ttl=117 time=8.400 ms
64 bytes from icmp_seq=2 ttl=117 time=8.961 ms

--- ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 8.400/12.168/19.144/4.938 ms

File includes

Obviously M4 handles file includes as well. Both the text and the definitions from the included file are pulled in, just as if its contents had been copy-pasted.

$ m4 <include.m4
Hello, Alice!
Hello, Bob!
Hello, Carol!
Hello, Dave!

Should you use M4?

As a preprocessor for a programming language? Definitely no. For other things? Also no.

Preprocessors for programming are inherently a terrible idea, and if a language needs a specific feature, it just needs to get that feature. If it absolutely cannot, you should use a language-aware tool.

For other things, especially if you really don't care about spaces (as handling spaces correctly doubles the complexity of M4 code), it's tempting to use a preprocessor like CPP or M4. Every single time it was done, the result was a total mess. M4 is a very weak language, as you can see from how nasty the code for even those simple things was, so you could get slightly better results with a better preprocessor, but it's really the principle of using any preprocessor not aware of the language being preprocessed that's at fault here.

If you need to quickly hack together some small language, and you're thinking of using preprocessor macros, don't. Many languages, Ruby especially, let you write truly beautiful DSLs with none of a preprocessor's limitations, and you get the full power of a real language when you need it, with proper testing tools.


Code

All code examples for the series will be in this repository.

Code for the M4 episode is available here.
