ELI5: Why self-host a compiler?


What benefit is gained from making your new language's compiler self-hosting? To me, it seems this only adds complexity to your build. If it can generate code for your language, why does it matter what it's written in?

I understand that a compiler is a good proof-of-concept program. Compilers are generally dependency-free and will put your language through its paces. The exercise of writing a compiler in your new language is a good idea, I'm not arguing that point. I'm less clear on why making that self-hosted compiler the canonical version of your language is beneficial. Wouldn't keeping your compiler written in something like Haskell or OCaml make it more extensible and flexible, and quicker to build?

 

Ruby (C) isn't self-hosted, and neither are Python (C), Lua (C), PHP (C) or JS (V8 and Node are both written in C++ and JS). So quite a few very popular languages aren't actually self-hosting.

As for the actual question, you already anticipated the answer, which is most commonly "eating your own dog food". What better way to find the limitations of your new language than actually building something moderately complex but relatively dependency-free in it?

 

Gotcha, fair enough. So, there is no direct influence on the generated code, like how a multi-pass compilation process can build a more complete image of the program for more aggressive optimizations?

I'm just starting to graduate past interpreters into compilers and still don't have a full grasp of the end-to-end picture.

Great point about all these massively used tools being written in C or C++, but then I'm still not sure why, say, Rust has chosen this route. I totally get the dogfood argument, but I still feel like that sounds like a side-project that could dictate ongoing development, not the canonical implementation.

 

So, there is no direct influence on the generated code, like how a multi-pass compilation process can build a more complete image of the program for more aggressive optimizations?

I guess there is no general answer to this. On the one hand, nobody knows your own language and what optimizations it will benefit from better than you. On the other hand building in an established language like C gives you many optimizations on the compiler itself for free.
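To make the "multi-pass" idea from the question concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the toy IR (`Num`, `Var`, `Add`), the pass names, and the assumption that every variable is assigned before it is used. It is not taken from any real compiler. The first pass gathers whole-program information, and the second pass uses that information to fold constants. Passes like these can be written in whatever language the compiler itself is written in.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Hypothetical toy IR: integer literals, variable references, additions.
@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Add]
# A program is an ordered list of (variable name, expression) assignments.
# Assumption: every variable is assigned before it is used.
Program = List[Tuple[str, Expr]]


def collect_constants(program: Program) -> Dict[str, int]:
    """Pass 1: find variables assigned exactly once, directly to a literal."""
    counts: Dict[str, int] = {}
    literals: Dict[str, int] = {}
    for name, expr in program:
        counts[name] = counts.get(name, 0) + 1
        if isinstance(expr, Num):
            literals[name] = expr.value
    return {n: v for n, v in literals.items() if counts[n] == 1}


def fold(expr: Expr, constants: Dict[str, int]) -> Expr:
    """Pass 2: substitute known constants and fold additions of literals."""
    if isinstance(expr, Var) and expr.name in constants:
        return Num(constants[expr.name])
    if isinstance(expr, Add):
        left = fold(expr.left, constants)
        right = fold(expr.right, constants)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)
    return expr


def optimize(program: Program) -> Program:
    """Run pass 1 over the whole program, then pass 2 on each assignment."""
    constants = collect_constants(program)
    return [(name, fold(expr, constants)) for name, expr in program]
```

For example, given `x = 2` followed by `y = x + 3`, `optimize` rewrites the second assignment to `y = 5`, because the first pass has already seen that `x` is only ever the literal 2.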

I still feel like that sounds like a side-project that could dictate ongoing development, not the canonical implementation

Presumably one builds a new language because of a dissatisfaction with all existing alternatives. So if the space you're aiming for is systems programming (Rust, Go to a certain extent), then not self-hosting seems a bit odd. The languages I mentioned before are generally on a rather different level of abstraction than the ones they were implemented in (and often work as interpreters, not compilers).

dissatisfaction with all existing alternatives

That's a satisfying answer, thanks. You do it because you think you're doing it better than C would have.

 

Well, IMHO, if a language is worth creating, it is worth making self-hosting.

Or in other words, if even the creators of the language prefer to write in something else, why should I use it? Isn't it better to use the language the creators of the language use?

Anyway, self-hosting makes the language complete, mature and self-sufficient.

As an example, I can point to FlatAssembler.

It has been self-hosting since very early versions. As a result, it can be ported really easily to an arbitrary OS within one or two days! Actually, it is more portable than any high-level language. Well, maybe excluding C, but I am not very sure. :)

 

Isn't it better to use the language the creators of the language use

I think this is true to a point - Michael brought up a good point that it does depend on the purpose of the language. You skew very, very low level, though, and I imagine in that domain this does hold true.

ported really easily to an arbitrary OS within one or two days

This is very cool. I'm going to have to spend more time looking at this.

 

Well, I am not a big fan of specialized languages. The same learning effort, but a much narrower niche. ;)

 

"Eating your own dogfood" was mentioned and I think that's certainly part of it.

But I think it's more about "street cred": why would you use someone's toy language when they use something else? Being self-hosting is a rite of passage.

 

street cred

I guess the better question was "why does anybody do anything" ;)

Ben Lovy
Hobbyist. Learning Rust, ReasonML, JavaScript, C++. He/him.