Introduction
Compilers and interpreters are both computer programs that translate a high-level programming language into machine language 1 2 3.
Today, every programmer works either with a compiler or an interpreter, sometimes even both. There are only some edge cases that make it necessary for programmers to write assembly code themselves. In the 1950s and earlier, this was the primary way to write computer programs. So, every program had to be rewritten in a different assembly language to run on a different architecture 1 4. The first commercial compiler was written for Fortran in 1957 by IBM, but interpreters were used even earlier. These interpreters could only interpret assembly languages and were used to translate between the instruction sets of different architectures, making it possible to develop software for hardware still in development 1 4. The first interpreted high-level language was Lisp 1 3, developed in 1958.
Now that we know that compilers and interpreters are essential for developing software and making it portable, let us take a look at the differences.
From the user's perspective, the difference is clear:
- A compiler reads in source code and creates an executable program. The user can run the program independently from the compiler.
- An interpreter reads the source code and executes it line by line. No executable is created, and the interpreter is needed for every execution. Many interpreters also have an interactive mode where you can type in your source code line by line, and each line is executed immediately.
From a technical perspective, the lines are more blurry, especially when we're looking at modern compilers and interpreters:
- Modern compilers can have interpreters to run code during compilation to find good optimisation strategies 2.
- Modern interpreters use a compiler to compile the source code into bytecode. The interpreter then interprets the intermediate bytecode instead of the source code.
In the following sections, we will look at the compiler and the interpreter and even a hybrid approach. Then, we will compare both in terms of performance, development, flexibility, portability and security.
The Compiler
A compiler translates a high-level programming language into assembly code, object code or directly into machine code and produces an executable binary for a specific architecture and system. The compilation process consists of five steps, each step performed by its corresponding component 1 2.
- Lexical Analysis
- Parsing
- Semantic Analysis
- Optimisation
- Code Generation
The compilation process is split up into three phases: front-end (steps 1-3), middle-end (step 4) and back-end (step 5) 2.
Lexical Analysis
Lexical analysis is the first step in the compilation process. It analyses the source text and identifies all occurring words. In programming languages, we call these words tokens 1 2.
Example:
int x = y + 5;
Tokens: `int`, `x`, `=`, `y`, `+`, `5` and `;`
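To make this concrete, here is a minimal sketch of a tokenizer for the statement above (the token names and regular expressions are illustrative assumptions, not taken from any particular compiler):

```python
import re

# Illustrative token categories; a real lexer covers many more.
TOKEN_SPEC = [
    ("KEYWORD", r"\bint\b"),
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[=+\-*/;()]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split a source string into (kind, text) tokens, dropping whitespace."""
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize("int x = y + 5;")))
# [('KEYWORD', 'int'), ('IDENT', 'x'), ('OP', '='), ('IDENT', 'y'),
#  ('OP', '+'), ('NUMBER', '5'), ('OP', ';')]
```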
Parsing
Parsing applies the language's grammar to the token stream and builds an Abstract Syntax Tree (AST) 1 2. Let's look at a simple example:
int x = 3 * (y + 5);
As we know, a term within parentheses is processed first. A parser would therefore produce the following tree:
```
    x
    |
    *
   / \
  3   +
     / \
    y   5
```
During this step, the parser also checks for correct syntax and reports any errors it encounters.
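A sketch of how such a tree could be represented in memory (the node classes below are illustrative assumptions; real compilers attach far more information, such as source positions and types):

```python
from dataclasses import dataclass

@dataclass
class Num:        # integer literal
    value: int

@dataclass
class Var:        # variable reference
    name: str

@dataclass
class BinOp:      # binary operation such as + or *
    op: str
    left: object
    right: object

@dataclass
class Assign:     # variable assignment
    target: str
    value: object

# AST for: int x = 3 * (y + 5);
# The parentheses are gone -- operator precedence is now encoded in the tree shape.
ast = Assign("x", BinOp("*", Num(3), BinOp("+", Var("y"), Num(5))))
```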
Semantic Analysis
In Semantic Analysis, sometimes also referred to as static analysis 2, the compiler checks for scope to perform name resolution or type checks 1 2. The AST and the additional semantic information are then transformed into an Intermediate Representation (IR). The IR is no longer tied to a specific programming language and can be used to generate many different output formats 2.
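Reusing the node classes from the parsing sketch, a very small name-resolution and type check could look like this (the rules and error messages are made up for illustration):

```python
# Symbol table built during the walk: variable name -> declared type.
symbols = {"y": "int"}

def check(node):
    """Return the type of an expression node, raising on semantic errors."""
    if isinstance(node, Num):
        return "int"
    if isinstance(node, Var):
        if node.name not in symbols:                  # name resolution
            raise NameError(f"undeclared variable {node.name!r}")
        return symbols[node.name]
    if isinstance(node, BinOp):
        left, right = check(node.left), check(node.right)
        if left != right:                             # type check
            raise TypeError(f"cannot apply {node.op} to {left} and {right}")
        return left
    if isinstance(node, Assign):
        symbols[node.target] = check(node.value)      # record the new name
        return symbols[node.target]

check(ast)   # passes: x is resolved and gets the type "int"
```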
Optimisation
The optimisation step takes the IR of the source code and optimises the code without altering the program's intent. In this step, the compiler can optimise performance, memory usage and even power consumption 1.
More sophisticated optimisation techniques usually require knowledge of the underlying computer architecture 2. But some general optimisations can be done solely on the Intermediate Representation.
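As an example of a target-independent optimisation, here is a sketch of constant folding on a made-up three-address IR (the IR format and instruction names are illustrative assumptions):

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Made-up three-address IR: (destination, operation, argument1, argument2).
ir = [
    ("t1", "+", 2, 3),        # t1 = 2 + 3
    ("t2", "*", "t1", 4),     # t2 = t1 * 4
    ("x",  "=", "t2", None),  # x  = t2
]

def fold_constants(code):
    """Evaluate operations with constant inputs at compile time."""
    known = {}      # values already proven constant
    remaining = []  # instructions that still have to run at runtime
    for dest, op, a, b in code:
        a = known.get(a, a)                 # substitute known constants
        b = known.get(b, b)
        if op in OPS and isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)     # the instruction disappears
        elif op == "=" and isinstance(a, int):
            known[dest] = a
        else:
            remaining.append((dest, op, a, b))
    return remaining, known

print(fold_constants(ir))   # ([], {'t1': 5, 't2': 20, 'x': 20}) -- everything folds away
```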
Code Generation
Code Generation is the fifth and last part of the compilation process. During this step, the IR is converted into assembly language, object code or machine code 1 2. Since the compiler works only with the IR at this point, the back-end is language-agnostic, and the front- and back-end can be swapped out to support different languages and architectures.
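Continuing the AST sketch from the parsing section, a toy back-end for a made-up stack machine could look like this (a real back-end targets a concrete instruction set such as x86-64 or ARM and handles registers, calling conventions and much more):

```python
def generate(node, out):
    """Emit made-up stack-machine instructions for an expression AST."""
    if isinstance(node, Num):
        out.append(f"PUSH {node.value}")
    elif isinstance(node, Var):
        out.append(f"LOAD {node.name}")
    elif isinstance(node, BinOp):
        generate(node.left, out)
        generate(node.right, out)
        out.append("ADD" if node.op == "+" else "MUL")
    elif isinstance(node, Assign):
        generate(node.value, out)
        out.append(f"STORE {node.target}")

code = []
generate(ast, code)           # ast for: int x = 3 * (y + 5);
print("\n".join(code))
# PUSH 3, LOAD y, PUSH 5, ADD, MUL, STORE x
```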
The Interpreter
Like the compiler, the interpreter enables the user to execute source code written in a high-level programming language. However, an interpreter does not produce an executable; instead, it evaluates and executes the source code directly. Thus, to run a program multiple times, the source code must be processed each time by the interpreter.
Since the interpreter executes each instruction separately, it can be used interactively. This makes it easier to develop and debug programs.
The front-end (lexical analysis, parsing and semantic analysis) is the same for compilers and interpreters because they both need to transform the high-level programming language into a form that a computer program can understand.
To run the source code, an interpreter has a set of instructions that it can execute, and it maps the input source code to these instructions. For general-purpose languages, this instruction set is Turing complete and can therefore execute any algorithm described by the source code.
Interpreters for Domain Specific Languages (DSLs) and those used for educational purposes use the AST to execute instructions 2 3. This process is slow, and interpreters for general-purpose languages speed up the execution by first compiling the source code into bytecode.
Bytecode is a virtual instruction set that is executed by the interpreter's runtime. A more sophisticated interpreter therefore consists of a compiler to generate bytecode and a runtime to execute it 2. Examples of languages with such interpreters are Java, Python and JavaScript. Java is even a little bit special, as it also saves its bytecode (.class and .jar files) so that it can be executed later without recompiling the source code first. Nevertheless, the Java Virtual Machine (JVM) is still needed to execute the code.
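A minimal sketch of this split: a front-end (not shown) has produced bytecode for a made-up stack machine, and a small runtime loop executes it. The instruction set is an illustrative assumption, not Python's or Java's real bytecode:

```python
# Made-up bytecode for: x = 3 * (y + 5)
bytecode = [
    ("PUSH", 3),
    ("LOAD", "y"),
    ("PUSH", 5),
    ("ADD", None),
    ("MUL", None),
    ("STORE", "x"),
]

def run(code, variables):
    """Tiny stack-based runtime that executes the bytecode."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
    return variables

print(run(bytecode, {"y": 2}))   # {'y': 2, 'x': 21}
```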
Just-in-Time Compilation
Just-in-Time (JIT) compilation is a hybrid of both approaches. It is used inside interpreters to compile performance-critical bytecode into machine code at runtime and execute it natively. To identify code where the performance gain outweighs the overhead of compilation, the interpreter analyses the code while it runs 5.
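A toy illustration of the hot-spot idea, reusing the bytecode runtime sketched in the previous section. Instead of emitting real machine code, the "compiled" fast path here is just a pre-built Python function, so this only illustrates the counting-and-switching mechanism, not actual native code generation:

```python
HOT_THRESHOLD = 1000   # illustrative; real JITs tune this heuristically

class HotSpotInterpreter:
    """Interprets bytecode and switches to a 'compiled' fast path once it is hot."""

    def __init__(self, bytecode):
        self.bytecode = bytecode
        self.run_count = 0
        self.compiled = None   # filled in once the code counts as hot

    def execute(self, y):
        self.run_count += 1
        if self.compiled is None and self.run_count > HOT_THRESHOLD:
            # Stand-in for JIT compilation: build a callable from source text.
            self.compiled = eval("lambda y: 3 * (y + 5)")
        if self.compiled is not None:
            return self.compiled(y)                    # fast path
        return run(self.bytecode, {"y": y})["x"]       # slow path: interpret

interp = HotSpotInterpreter(bytecode)
print(interp.execute(2))   # 21, via the interpreter loop
```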
Even though research on JIT compilers had already been done for Lisp and Fortran, the first available JIT compiler was for Smalltalk in 1983 6. Today, this technique is used by the Java Virtual Machine, Microsoft's .NET 5 and PHP 8. There is also PyPy, an alternative Python implementation that offers JIT compilation 7, and other scripting languages are currently working on JIT compiler implementations.
Comparing Compilers and Interpreters
Now that we know the technical differences between compilers and interpreters, let us compare the aspects of performance, software development, flexibility, portability and security.
Performance Comparison
In terms of performance, compilers have the advantage, but interpreters with JIT compilation are able to make that gap significantly smaller. Compilers are also ahead when it comes to memory usage, because an interpreted program needs memory not only for the program itself but also for the runtime.
In most cases, modern hardware has enough resources not to worry about performance and memory, but if this is a concern, then a compiled language is more suitable.
Software Development
When it comes to developing software, it is usually easier with interpreted languages because the runtime offers a more approachable interface to the OS and hardware and provides garbage collection. Interpreted languages can also allow dynamic typing 2, making learning and prototyping easier.
But there are also compiled languages with a runtime environment that provide features like garbage collection and offer a vast standard library to abstract hardware and OS interactions.
Whether debugging is easier depends more on the IDE and debugger we use. If we use Notepad, Gedit or whatever our OS's equivalent is to write our code, we will have a hard time writing and debugging our program. However, there is one point that makes debugging compiled languages harder: optimisation. Sometimes it is harder to find bugs in optimised code. We usually deactivate optimisations for debug builds, but some errors don't appear in the debug build and only occur in the optimised end product.
This is not something the interpreter does better since it doesn't offer code optimisation, just something to keep in mind.
Flexibility and Portability
From the developer's perspective, it is much more convenient to distribute the source code and let the user run it with an interpreter. This also means, however, that the user has the burden of procuring an interpreter for the program.
Since the runtime handles all OS and hardware abstractions, code written in interpreted languages is more portable.
Big software projects like games, game engines and 3D tools also use interpreters for scripting, to give users better customisation options or to automate processes.
Security
Interpreters are also a great way to limit what code is allowed to do. Examples would be games or web browsers. Games sometimes offer the ability to use scripting languages to write add-ons. A custom interpreter is an easy way to limit what the player can do with these add-ons.
The same principles apply to web browsers.
Conclusion
Compilers and interpreters are both used to execute source code. The key difference from the user's perspective is that a compiler produces an executable that can be run independently. In contrast, the interpreter doesn't produce an executable and is needed each time the source code is executed. From a technical perspective, the lines are more blurry.
Interpreted languages are easier to learn and use because of the abstraction layers of their runtime system.
Compiled languages have better performance and sometimes offer the only way to access specific OS or hardware features, e.g. if the interface of the hardware is only exposed as a native interface to the OS.
References
1. CS143, Lecture series, Stanford University. https://youtube.com/playlist?list=PLoCMsyE1cvdUZRe1udlyjpzTww1U5olL2
2. Robert Nystrom, Crafting Interpreters, ISBN: 978-0990582939. https://craftinginterpreters.com/
3. Interpreter (computing), Wikipedia. https://en.m.wikipedia.org/wiki/Interpreter_(computing)
4. History of compiler construction, Wikipedia. https://en.m.wikipedia.org/wiki/History_of_compiler_construction
5. Just-in-time compilation, Wikipedia. https://en.m.wikipedia.org/wiki/Just-in-time_compilation
6. John Aycock. 2003. A brief history of just-in-time. ACM Comput. Surv. 35, 2 (June 2003), 97–113. https://doi.org/10.1145/857076.857077
7. PyPy. https://www.pypy.org/