There seems to be a considerable amount of debate about what defines an interpreted language.
My view is pretty cut-and-dried:
A compiled language is one that is primarily compiled to machine code which is executed natively by the CPU on most standard hardware (Intel, AMD, ARM, etc.). C, C++, and Ada are three examples of this.
An interpreted language is one that is primarily executed either as source code or bytecode through a dedicated virtual machine. Python, Ruby, and Java are three examples of this.
It should be understood that, in theory, if not in unconventional practice, any compiled language can also be run as source or bytecode in a virtual machine; conversely, any interpreted language can be theoretically converted to machine code.
(Mind you, I'm not really invested in defending my viewpoint. I want to see what everyone else thinks, and why.)
A Few Definitions
Just to keep us on topic, instead of descending into the pedantic word-mincing that sometimes occurs in these sorts of discussions, let's use these as our common definitions.
Even if you think these are "slightly off", please just roll with this to make the core topic accessible to everyone. If you need to use a different term, be sure to define it.
Remember, not everyone has a vast academic background in computer science!
Assemble — to convert source code, bytecode, or object code into machine code. (Contrast with compile.)
Assembler — produces machine code (assembly code).
Binary file — an executable file containing machine code.
Bundled executable — an executable file containing both code and a virtual machine.
Bytecode — Code which is intended primarily to efficiently provide direct instructions to a virtual machine, rather than to be human-readable. (Contrast with source code, machine code and object code.)
Central Processing Unit [CPU] — the physical piece of hardware on the computer in question which is responsible for executing instructions in the form of machine code.
Code — catch-all term that can include source code, machine code, bytecode, and object code.
Compile — to convert code to source code in another language, to bytecode, or to object code. (Contrast with assemble.)
Compiler — converts code to either source code in another language, or to bytecode. (Contrast with assembler.)
Executable file — a file containing either source code, bytecode, or machine code, but which can be "executed" directly on the operating system. (In the case of bytecode, it may invoke a virtual machine as part of its execution.)
Interpreter — see virtual machine [VM].
(Programming) Language — a complete set of syntax and grammar, in which source code is written.
Machine code — assembly language code, in the "flavor" of assembly native to the physical CPU on the computer in question. (Contrast with bytecode and object code.)
Object code — "intermediary" code which is not intended to be executed directly (contrast with bytecode), but is converted by the assembler into machine code. (Borrowed from C/C++ terminology; goes by other names in other languages.)
Source code — code which is intended primarily to be readable to humans. (Contrast with bytecode, object code, and machine code.)
(CPU) Virtualizer — a piece of software which is intended primarily to emulate a CPU, but which executes the same machine code a CPU would. (Contrast with virtual machine and CPU.) [I know I'm flubbing the vocabulary, but I want to make sure we don't confuse the two; they're not the same in this topic.]
Virtual machine [VM] or interpreter — a piece of software which interprets and executes instructions from bytecode or source code. Examples include the Java JVM and the Python interpreter. (Contrast with (CPU) Virtualizer and CPU.)
Ground Rules
Be polite! We're here to debate ideas, not people.
Latest comments (31)
Here is an explain like I'm five, in case you're wondering what is what.
What parsers and 2 year olds have in common. 🚼
Adam Crockett ・ Aug 7 ・ 2 min read
It is more an 'explain it like I have kids'
Well I do so I did 😁
With widespread use of AOT, JIT, and native language bindings, not to mention exotic things like Roslyn and hardware implementations (e.g. Java processor), the distinction is perhaps murky and mostly academic.
In practical terms, it comes down to a programmer's workflow. Do you run a compile step? No: interpreted.
Interesting approach, but I'm not sure the Java processor really makes it a compiled language in the typical sense of the term. They still run it off their own custom bytecode, instead of assembling it down to actual machine language for common architectures. It's kinda "cheating" to me (but props that it works regardless).
I'm playing devil's advocate.
It's not all that exotic, ARM has Jazelle.
Similarly, many CPUs have complex instructions that are decoded at execution time into smaller, native instructions (i.e. "interpreted").
Since the beginning Java has had javac - the "Java compiler". Surely having a compiler makes you compiled.
But mostly it doesn't really matter. It's an imprecise term that can be interpreted as pedantically as you like (pun intended).
Remember the definition, which I didn't make up.
You can compile to anything, but if the product of the compilation is not actually executable in and of itself, but requires an interpreter to execute, isn't that still an interpreted language?
After all, Python compiles to Python Bytecode (*.pyc), but that still can only be executed through the Python interpreter. According to the language's own documentation, Python is an interpreted language, and not a compiled language.
There is no debate, really. Not in academic computer science, at least. Any attempt at defining an "interpreted" language outside of CS is doomed to fail.
There is no such a thing as an interpreted or a compiled language. There are languages that make it hard to produce an efficient compiler - e.g., some very dynamic languages like Python, or languages with fexprs, languages heavily relying on runtime reflection, etc.
It is still possible to have a not very efficient compiled implementation for such languages though, so we cannot use the presence of such features as a definition for an "interpreted language".
Besides that, nothing in a PL semantics says it's "compiled" or "interpreted", not to mention that the boundary between compilation and interpretation is very blurred.
EDIT: another important point - a programming language is defined by its semantics, while syntax and grammar are only of a secondary importance. And how do we define semantics? Normally, via term rewriting rules, i.e., an interpretation. When you look from this angle, all the languages are "interpreted first".
At least two transformations happen for any code to run on your computer: a transformation from the source to machine code, and a transformation from the machine code to actual state changes internal to the hardware that do what you want.
Some languages add an additional transformation from the source to an intermediary code that then gets translated to machine code (C and FORTRAN translate to assembly before machine code, Java goes to JVM bytecode, C# goes to CIL bytecode, etc), and some hardware adds an extra layer between the CPU instructions and the internal state changes (most modern x86 CPUs translate from the high-level x86 'machine code' to a different lower-level machine code specific to the micro architecture), but both cases still fit that 'two transformations' model at a large scale.
The classical differentiation between compiled and interpreted languages is when the series of transformations from source code to machine code actually happens. For compiled languages, it's done ahead of time. For interpreted languages, it's done at runtime (either while executing, or in a single pass right before execution). Some languages though don't quite fit this concretely (Java - which may be compiled directly to machine code, or might be compiled to JVM byte code which is then transformed to machine code at runtime - is a good example of such a language).
These days though, the primary differentiation most people think of is that interpreted languages have the option of some kind of interactive REPL, while compiled languages usually do not. Pretty much every language can be classified in this manner, though it can be fuzzy here too (see for example: github.com/evmar/c-repl).
Ultimately though, it's largely irrelevant these days unless you're doing cross-builds or porting to a new platform (languages that fit the classical definition of being 'compiled' tend to be easier to use in both cases). Yeah, it has some impact on how you might develop and debug, but that impact isn't anywhere near as binary as the terminology implies, and may even vary by individual workflow.
Haskell is interpreted? I thought it was compiled.
There is at least one (unsupported and outdated) interpreted implementation, Hugs98.
I was under the impression it was interpreted, but I'll change my example to Ruby just to be safe.
A few thoughts.
Does it actually matter?
"Interpreted language" should really be regarded more as a shorthand. It's not really the language which has the nature of being interpreted or not, but the execution environment. It's just that some programs in some languages are more commonly run via an interpreter execution environment than others, and some languages lend themselves better to such an environment.
If a program is compiled (cross-compiled?) to bytecode, which is then translated to machine code immediately prior to it being run, then it is the bytecode which is the interpreted language, not the language in which the original source code was written.
It's really about what operations take place in a just-in-time-to-execute fashion. If the program is translated from its stored form to machine code just in time to execute it, then the program in the language in which it was stored can be said to have been interpreted from that language.
I think this makes the most sense. I'm a Java dev, and it doesn't matter that libraries are linked at runtime and bytecode is interpreted dynamically. Java has a compiler step, and it's actually quite difficult to get Java source code interpreted and run at runtime. There are solutions to execute Java from source at runtime, but they are all difficult and feel very hacky. Not to mention, the Java compiler is not distributed in the normal runtime; that's why there's a difference between the JDK and the JRE. Java is not an interpreted language; JVM bytecode is. But there are other JVM languages (Groovy) that do ship with their compiler and allow dynamic execution from source code, which are capable of working precisely because JVM bytecode is interpreted.
Now this I find intriguing.
Would you then say that Java is a compiled language? And, if so, what term do we apply to a language compiled down to machine code (C++), but not to a language compiled down to bytecode (Java)? That seems like it would matter, since shipping a completed C++ project (here's the binary file, have fun) and shipping a completed Java project (runtime needed) are vastly different undertakings. What term do we use to distinguish? "Assembled" language?
That's a matter of semantics. They're both compiled, because neither one can execute its own source code at runtime without considerable complexity. It's just that C/C++ is a language that effectively ships with its runtime environment; Java is not.
I'm not sure this is accurate. By the time a C or C++ application has been compiled and assembled, it is completely machine code, and is executed directly by the CPU. If it does not rely on any dynamically linked libraries (dependencies are a whole other topic), it can be executed on any machine for which the machine code is intended (X86, ARM, whatever). It isn't "shipping with its runtime environment," it doesn't have one.
Keep in mind that there is a difference between Java and the JVM. There are many different languages that run on the JVM, and not all of them are compatible with Java source code or Java libraries.
What I want to know is, is it commonly possible to compile (and assemble) Java down to pure machine code, such that it can be executed directly by the CPU without the "runtime environment", in the manner I described for C/C++?
P.S. I'm not talking about the special Java-specific hardware. I'm talking about conventional, mass market CPUs.
And that's why I said "effectively". If you want to be pedantic, the OS could technically be considered the runtime environment, since even native binaries are dependent upon OS system calls to request memory, files, networking, etc.
Why does it matter what it's compiled down to? The fact remains that you don't execute Java code from source; you compile it and execute the bytecode. It doesn't compile to the same thing as native languages, but there is an explicit compilation step.
Perhaps the word you're trying to describe is more accurately "embedded" than compiled?
What I find frustrating about this sort of conversation is, every time I talk to a Java developer, they have to keep moving the finish line (shuffle definitions, frequently add/remove pedantry) in an attempt to define how Java somehow "isn't interpreted." Python doesn't do this sort of dancing around the point; our language is interpreted, and there's no point in pretending it's otherwise.
Java is compiled to bytecode, which is executed by an interpreter. Without a Java interpreter on the target machine, the Java code cannot be executed. It is seldom, if ever, compiled/assembled down to machine code. This is a clear and distinct difference from C/C++, Ada, FORTRAN, COBOL, and many other languages traditionally called "compiled languages", which are compiled down to machine code and executed without the need for an additional interpreter.
Is there some sort of mass feeling in Java that identifying it as an interpreted language (in the most straightforward sense of the term) somehow delegitimizes it as a "real language" (which, it really wouldn't)? I'm really wondering at this point.
I just fail to see how a language that has an explicit compilation step, from source to binary bytecode, could be called anything other than compiled. You can't easily work in a REPL, you can't dynamically evaluate it from a String at runtime. Regardless of the implementation details, you use it in such a manner that once it's been compiled, you can't change it. The way you think about using it is fundamentally different from an interpreted language. In Python, PHP, Ruby, JS etc. you generally focus on including files to bring in libraries. You cannot do that in Java, or even anything like that without considerable complexity. You need to have your libraries compiled and ready to use bundled with your app at the moment it starts up, which is a fundamentally different way of thinking about dependencies, much more akin to a traditional natively compiled language.
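For contrast, here is a rough sketch of what "dynamically evaluate it from a String at runtime" looks like on the Python side, using only the built-in exec() and eval(). The snippet text and the names greet, namespace, greeting, and total are made up for illustration.

```python
# Source code that exists only as a string while the program is running.
snippet = "def greet(name):\n    return 'Hello, ' + name\n"

# exec() compiles and runs the string; the definitions land in `namespace`.
namespace = {}
exec(snippet, namespace)
greeting = namespace["greet"]("world")

# eval() does the same for a single expression and returns its value.
total = eval("sum(range(5))")

print(greeting)  # Hello, world
print(total)     # 10
```

Nothing here needed a separate build step or a bundled compiler, which is the workflow difference being described.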
All I mean by interpreted...and, actually, all I've ever known interpreted to mean...is that there's an additional software layer between the compilation result and execution.
Compiled language:
Final Compilation Result => CPU
Interpreted language:
Final Compilation Result => Interpreter/VM => CPU
That's all it ever means. Dependencies don't enter into it. Also, what you're describing with the REPL is an interactive language.
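To make that extra layer concrete, here is a toy sketch of an interpreter/VM sitting between a "final compilation result" and the CPU. The three-opcode instruction set (PUSH, ADD, MUL) and the run() function are invented for this example; they are not the bytecode of the JVM, CPython, or any real VM.

```python
# Hypothetical opcodes for a toy stack machine.
PUSH, ADD, MUL = 0, 1, 2


def run(bytecode):
    """Interpret a list of (opcode, operand) pairs; return the top of the stack."""
    stack = []
    for opcode, operand in bytecode:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# The "final compilation result" for (2 + 3) * 10.
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 10), (MUL, None)]
print(run(program))  # 50
```

The CPU never sees program directly; it only ever executes the machine code of whatever is running run(). That is the whole distinction being drawn above.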
By the way, Python also has an explicit compilation step. It just doesn't demand the programmer invoke it. That doesn't make it any less an interpreted language.
But it still isn't a (traditional natively) compiled language. "Compiled language" needs to mean exactly what it has always meant, or we're going to confuse people. Which...is exactly what happens.
The way I see it, Java is an interpreted language for the reason stated above — we cannot pretend it doesn't (commonly) need that intermediate layer for the shipped result to be executed — but it is not an interactive language.
Yes, I get that Java is technically "interpreted". But the semantics of the language are not the same as the semantics of Python or other more common "interpreted" languages.
Try to think of it from the perspective of a new developer, who doesn't know the difference between the two. If you tell them that Java is not compiled, then they will be extremely confused when you tell them they have to compile it before using it. That's all I'm trying to say.
While "interpreted" is a part of the underlying infrastructure of Java, it's not a common paradigm of the language, and it does no service to Java to call it interpreted.
Conversely, I don't think it does much to call native languages just compiled, because they are so much more. I think embedded is more accurate for what you're describing, helps keep the purity of those native languages, and also keeps the common semantics of "compiled" vs "interpreted" to mean what people typically think of when they hear those words, even if they aren't too knowledgeable of the paradigms/runtime properties themselves.
I'll agree the semantics are different, but the problem is that the misconceptions surrounding "interpreted" and "compiled" are made worse by this sort of pedantry.
When this comes up with Python (again, different...but then we're different from, say, Ruby), we simply define interpreted exactly as I did above, and explicitly separate out all other concerns, including but not limited to...
...et cetera. The confusion ceases immediately.
Maybe "embedded" would be a better term for C/C++, but then again, maybe not. That brings its own baggage, just as much as "interpreted" and "compiled" does.
In any case, the issue is probably that "compiled" is poorly defined. Maybe we should be referring to source -> bytecode as transpiling? Maybe source -> machine code should only be called assembling? Is C an assembled language?
At the least, Java devs would do well to say that they're a "compiled interpreted language," and then take the time to separate out the other concerns.
I can definitely agree with you here. These terms are murky, and Java is not purely one or the other, like Python or C++ are. Being more explicit about this in casual conversation might help to clear some of this confusion.
Thank you for such a thought-provoking thread and the good discussion!
Back at you!
By your logic, Python isn't really "purely" one or the other, either. Compilation does happen. Dependencies are handled differently than Java, but the interpreter doesn't just run the source any more than Java's VM does; it (implicitly) compiles it to bytecode first.
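The implicit compile step is easy to see from inside CPython itself. This is a small sketch using the built-in compile() and exec() plus the standard dis module; the source string and the names code_obj, raw_bytecode, and namespace are just illustrative, and the exact opcodes printed will vary between CPython versions.

```python
import dis

source = "x = 2 + 3\nresult = x * 10\n"

# compile() turns source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# co_code is the raw bytecode: a bytes object, not machine code.
raw_bytecode = code_obj.co_code

# The interpreter then executes the code object.
namespace = {}
exec(code_obj, namespace)

print(type(raw_bytecode).__name__, len(raw_bytecode))
dis.dis(code_obj)           # human-readable listing of the bytecode
print(namespace["result"])  # 50
```

When you run a script normally, the same two stages happen without you asking for them, which is why the step goes unnoticed.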
Yeah, I think it's not "interpreted" that's not clear. I think it's "compiled" that's the problem.
Just condense statements you've made.
'Java is an interpreted language, it is compiled'
Really the JIT is a system that confuses the definitions because this would be an accurate statement.
'JavaScript is interpreted, the JIT compiles it'
Generally not every line is run through the JIT. You could even say
'D is a compiled language, it is interpreted at compile time'
But I think you are trying to bring Java into the same category as Python so you can use it to back your position that Python is a real language.
Java, as a language, is not interpreted. Byte code is not the written language.
TypeScript is mixed because JavaScript is valid and not compiled in TypeScript. It is more analogous to the C preprocessor, which is referred to as a macro language.
Scripting languages are not well defined; I utilize D as my scripting language, but it is fully compiled to machine code. Then you throw in JIT and things get more confusing.
To better understand, it is best to look at the term for the time it was emerging. You had C and Bash, Lisp and Fortran. Languages like Perl and PHP follow closer to the style of Bash; these languages start execution at the file entry and don't define a special entry (main).
As for inferiority of scripting over a real language, we need to look at the level of understanding necessary to use the language.
Bash required writing your shell commands to a file then calling bash on it. Similarly, languages like Visual Basic would add container iteration. C required learning pointers and memory layout. Scripts could easily build the description of a task, but would be limited in performance. Today machines are resource abundant and optimization techniques are identified.
C++ was long considered a compiled language, but it wasn't until Walter that the first compiler to build machine code instead of C existed.
Pardon my ignorance, I really know very little about Python 😁 I had no idea it converted source to bytecode internally!