Concept of the Day: Homoiconicity

Andrew (he/him) · 3 min read

Cover photo by sum+it from Pexels

The word homoiconic can be understood by inspecting its roots: homo, meaning "the same" and icon, meaning "representation". According to Wikipedia, a language is homoiconic "if a program written in it can be manipulated as data using the language, and thus the program's internal representation can be inferred just by reading the program itself."

But this definition can be confusing. Many languages, especially interpreted ones, have an eval() function or something similar, which can read text (perhaps from a file) and execute that text as though it were source code. Does this mean any language with eval()-like functionality is homoiconic? Not quite.

In layman's terms, a programming language is homoiconic if its internal and external representations are the same. In a perfectly homoiconic language, source code could be run immediately, without any interpretation: if the external representation matches the program's internal representation, there is nothing left to interpret.

This strict definition excludes essentially every programming language from homoiconicity, since even a textual representation of binary code is still different from the binary code itself.

Accordingly, even languages like LISP, which purport to be homoiconic, are more accurately described as paraiconic, a term that the link above proposes. In these paraiconic languages, the source code of any program is also the written form of a particular data structure within that language, which can be eval()-uated and manipulated as an object within that language. This is where the common definition of homoiconicity, "code as data", comes from.

One way to imagine this would be a theoretical language written entirely in JSON (JavaScript Object Notation):

{
  "main" : {
    "type"   : "function",
    "params" : [ "args" ],
    "body"   : [ ]
  }
}
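As a rough illustration of what such a language could do with its own code, the following Java sketch models the hypothetical JSON program above as ordinary nested Maps and Lists, then edits it like any other data. It assumes a program shaped like {"main": {"type": "function", "params": ["args"], "body": []}}; the class ProgramAsData and its methods buildProgram and renameParam are made up for this example, and no JSON library is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a "program" stored as plain Java collections,
// mirroring the shape of the JSON example (not a real language runtime).
class ProgramAsData {
    // Builds {"main": {"type": "function", "params": ["args"], "body": []}}
    static Map<String, Object> buildProgram() {
        List<Object> params = new ArrayList<>();
        params.add("args");

        Map<String, Object> main = new LinkedHashMap<>(); // keeps key order for printing
        main.put("type", "function");
        main.put("params", params);
        main.put("body", new ArrayList<Object>());

        Map<String, Object> program = new LinkedHashMap<>();
        program.put("main", main);
        return program;
    }

    // Edits the program as data: renames one parameter of "main" in place.
    @SuppressWarnings("unchecked")
    static void renameParam(Map<String, Object> program, String from, String to) {
        Map<String, Object> main = (Map<String, Object>) program.get("main");
        List<Object> params = (List<Object>) main.get("params");
        params.replaceAll(p -> p.equals(from) ? to : p);
    }

    public static void main(String[] args) {
        Map<String, Object> program = buildProgram();
        renameParam(program, "args", "argv");
        System.out.println(program); // {main={type=function, params=[argv], body=[]}}
    }
}
```

Because the program is just a tree of maps and lists, renaming a parameter is an ordinary collection update rather than a text substitution.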

If this language also had the ability to manipulate JSON files, then it could perform any arbitrary alteration to its own source code. This is why the "code as data" idea makes some sense. Another (maybe slightly simpler) example is LISP, where an S expression can describe an entire program, but is itself an object within the LISP programming language:

(1 2 3)

When the LISP reader reads the text above (an S expression), it produces a list of three elements: 1, 2, and 3. (Evaluating (1 2 3) directly would try to call 1 as a function, so inside a program this list is usually written quoted, as '(1 2 3).) The printed representation of that list, (1 2 3), is exactly the text that reads back in as the same list. Since LISP can manipulate S expressions, and since all LISP programs are written entirely in S expressions, every LISP program, no matter how complex, is simply a (usually nested) LISP list. Every LISP program is an object which can be manipulated using the LISP language.
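The read/print round trip described above can be sketched in Java. This is a minimal, hypothetical reader and printer (the class SExpr and its methods read and print are invented for illustration) that handles only parenthesized lists, integers, and bare symbols; real Lisp readers also handle strings, quoting, dotted pairs, and much more. A parenthesized form becomes a java.util.List, so printing the result reproduces the original text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal S-expression reader/printer (integers, symbols, lists only).
class SExpr {
    // Splits "(1 (2 3))" into tokens: "(", "1", "(", "2", "3", ")", ")".
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Reads exactly one S-expression from the text.
    static Object read(String text) {
        List<String> tokens = tokenize(text);
        int[] pos = {0};
        Object result = readForm(tokens, pos);
        if (pos[0] != tokens.size()) throw new IllegalArgumentException("trailing input");
        return result;
    }

    // A "(...)" form becomes a List; an integer becomes a Long; anything else is a symbol (String).
    private static Object readForm(List<String> tokens, int[] pos) {
        if (pos[0] >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        String token = tokens.get(pos[0]++);
        if (token.equals("(")) {
            List<Object> list = new ArrayList<>();
            while (pos[0] < tokens.size() && !tokens.get(pos[0]).equals(")")) {
                list.add(readForm(tokens, pos));
            }
            if (pos[0] >= tokens.size()) throw new IllegalArgumentException("missing )");
            pos[0]++; // consume ")"
            return list;
        }
        if (token.equals(")")) throw new IllegalArgumentException("unexpected )");
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            return token;
        }
    }

    // Prints a value back in S-expression notation.
    static String print(Object form) {
        if (form instanceof List) {
            List<?> list = (List<?>) form;
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(print(list.get(i)));
            }
            return sb.append(')').toString();
        }
        return String.valueOf(form);
    }

    public static void main(String[] args) {
        Object list = read("(1 2 3)");
        System.out.println(list.getClass().getSimpleName()); // ArrayList
        System.out.println(print(list));                     // (1 2 3)

        Object program = read("(defun square (x) (* x x))");
        System.out.println(((List<?>) program).size());      // 4
        System.out.println(print(program));                  // (defun square (x) (* x x))
    }
}
```

In this sketch the four-element list read from the function definition is the same kind of Java object as the one read from (1 2 3), which mirrors the sense in which a LISP program is "just a list".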

Note that this list is not the same thing as the program's internal representation inside the interpreter or compiler (such as an abstract syntax tree, or AST), despite some claims online.

The same round trip does not hold in a language like Java, where the string representation of an object often differs from the source code required to create that object:

jshell> int arr[] = { 1, 2, 3 };
arr ==> int[3] { 1, 2, 3 }

jshell> arr.toString()
$3 ==> "[I@1bce4f0a"

In order to recreate this in a language like Java, we would need to be able to recover, via reflection or otherwise, the actual name of the variable arr. This is not possible in Java: local variable names are not available through reflection. We would also need to recover the full type of every object involved. For arrays this works, since arr.getClass() reports int[] (printed as [I above), but for generic types such as List<Integer>, the type arguments are removed by type erasure and are not available at runtime.
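The following Java sketch (the class JavaRepr and its helper methods are illustrative names, not a library API) shows which parts of that information survive at runtime: the default toString of an int[] prints a class code and a hash, Arrays.toString prints the contents but not valid source, the array's element type is still available through getClass(), and two differently parameterized ArrayLists share a single runtime class because of erasure.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Illustrative helpers showing what Java keeps (and loses) at runtime.
class JavaRepr {
    // Default Object.toString(): runtime class name + "@" + hex hash, e.g. "[I@1bce4f0a"
    static String defaultString(int[] arr) {
        return arr.toString();
    }

    // Contents only: "[1, 2, 3]", which is still not the source "{ 1, 2, 3 }"
    static String contents(int[] arr) {
        return Arrays.toString(arr);
    }

    // Arrays are reified, so the element type survives: "int[]"
    static String runtimeType(Object o) {
        return o.getClass().getSimpleName();
    }

    // Generic type arguments are erased, so differently parameterized lists share a class
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        int[] arr = { 1, 2, 3 };
        System.out.println(defaultString(arr)); // e.g. [I@1bce4f0a (hash varies per run)
        System.out.println(contents(arr));      // [1, 2, 3]
        System.out.println(runtimeType(arr));   // int[]
        System.out.println(sameRuntimeClass(new ArrayList<Integer>(), new ArrayList<String>())); // true
    }
}
```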

Paraiconicity means that LISP programs can evaluate, interpret, and modify other LISP programs very easily. Since a properly formatted S expression can be read back from its string representation, and since all LISP programs are simply complex S expressions, LISP can easily read in a LISP program from an external file and manipulate it as an object. "Code as data" indeed.
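To illustrate "code as data" a little further, here is a hypothetical Java sketch of a tiny arithmetic language whose programs are nested lists, in the style of S expressions. To keep it short, the program is built directly as a list rather than read from a file; the operators + and *, and the method names eval and rename, are assumptions made for this example. The same tree is evaluated, rewritten as data (every + becomes *), and evaluated again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: programs are nested Lists such as ["+", 1L, ["*", 2L, 3L]],
// i.e. the S expression (+ 1 (* 2 3)) stored as ordinary Java data.
class CodeAsData {
    // Evaluates Long literals and (+ ...) / (* ...) forms.
    static long eval(Object form) {
        if (form instanceof Long) {
            return (Long) form;
        }
        List<?> list = (List<?>) form;
        Object op = list.get(0);
        if (!"+".equals(op) && !"*".equals(op)) {
            throw new IllegalArgumentException("unknown operator: " + op);
        }
        boolean isSum = "+".equals(op);
        long acc = isSum ? 0L : 1L; // identity element for the operator
        for (Object arg : list.subList(1, list.size())) {
            long value = eval(arg);
            acc = isSum ? acc + value : acc * value;
        }
        return acc;
    }

    // Treats the program as data: returns a copy with every symbol `from` replaced by `to`.
    static Object rename(Object form, String from, String to) {
        if (form instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object child : (List<?>) form) {
                copy.add(rename(child, from, to));
            }
            return copy;
        }
        return from.equals(form) ? to : form;
    }

    public static void main(String[] args) {
        Object program = List.of("+", 1L, List.of("*", 2L, 3L)); // (+ 1 (* 2 3))
        System.out.println(eval(program));                       // 7
        System.out.println(eval(rename(program, "+", "*")));     // 6
    }
}
```

The rewritten program is (* 1 (* 2 3)), so eval returns 6 after the rewrite instead of 7; the transformation itself never touches source text, only the list structure.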


I hope this explanation has shed some light on the differences between homoiconicity and the more common paraiconicity, and how some languages enable this property while others make it difficult or impossible.

Discussion

Vlastimil Pospichal

An S expression in Lisp may be a program, input data, output data, a database, a query language, ...

For web communication:
HTML -> Lisp
CSS -> Lisp
JS -> Lisp
...
Lisp is the simplest language for transferring, storing, and manipulating information.