(Image from Stefan Alfbo's Post)
What do you think of when "Referential Transparency" comes to your mind? Perhaps you are thinking of purely functional programming languages. But people who think of it only this way do not have a complete picture of referential transparency.
———
Conception
I have been browsing Stack Overflow and found no satisfying explanation of it. For me, it is no surprise that I know of no language which allows for expressing real-time and system-programming behaviour in a referentially transparent way.
Usually, referential transparency (in programming languages) is seen as a property of the subset of expressions or functions whose result value can replace them as a constant without changing program semantics. Coders believe it is necessary to constrain themselves in order to obtain referential transparency: they think of functions which do not have side effects. While this is just a special case of referential transparency, it is not a necessary constraint.
Here is how I understand Referential Transparency:
Referential transparency is a property of a type system. With referential transparency, the type system offers types whose element values can be used to replace any specific language expression without causing a change in the language's specified execution semantics.
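To make the familiar special case concrete before broadening it below, here is a tiny Haskell sketch of a side-effect-free expression being replaced by its constant result value:

```haskell
-- A side-effect-free expression and its result value are interchangeable
-- anywhere in the program without changing what the program means.
area :: Double -> Double
area r = pi * r * r

main :: IO ()
main = do
  print (area 2.0)            -- the expression ...
  print 12.566370614359172    -- ... and the constant it can be replaced by
```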
———
What we need for referential transparency are values that can convey execution semantics (as much as they practically matter), including side effects.
This means we could have a referentially transparent OOP language, if only the type system were rich enough and permitted the corresponding type checks.
Code blocks or thunks would be values of such types. We could specify a parameter type that expects a value whose semantics does not alter shared variables. Or we could specify that a parameter value calls a function or performs IO a given number of times in a specific order (a side-effect iterator of some kind).
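As a rough sketch of what such a parameter type could look like (in Haskell, with made-up names like Thunk and Capability; this is an illustration of the idea, not an existing library):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

-- A hypothetical effect-indexed thunk type: the phantom type parameter
-- lists which side effects the wrapped action may perform.
data Capability = ReadsShared | WritesShared | WritesIO

newtype Thunk (caps :: [Capability]) a = Thunk { force :: IO a }

-- A parameter type that demands a thunk which does not touch shared
-- variables: only the 'WritesIO capability is admitted, and the
-- function itself determines how often and in which order it runs.
runTwice :: Thunk '[ 'WritesIO ] () -> IO ()
runTwice (Thunk action) = action >> action

greet :: Thunk '[ 'WritesIO ] ()
greet = Thunk (putStrLn "hello")

main :: IO ()
main = runTwice greet
```

The capability list is only a declaration in this sketch; a language taking the idea seriously would have to check the thunk's body against it.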
The truth is, purely functional programming languages have side channels and side effects too, but these are deliberately omitted from the language's semantic specification. If not optimized away, function calls have time and memory costs. "Leaky abstraction" is another catchphrase for this issue, which occurs when internal details are excluded from the public specification of an abstraction: they produce effects that cannot be explained within the simplified specification of the language. The information leakage is due to the incomplete explanation of the abstraction, not due to the abstraction itself.
Referential transparency is not a property reserved for functional programming. The problem is simply that other popular languages are not trying to enhance their type systems by adding side effects to their type checking.
In comparison, to a competent human being the semantics of programming code is sufficiently transparent (even when it requires compiler- and machine-specific knowledge). If we ask a competent developer to refactor code into a method call, they are able to do it (wherever it is possible at all). With improved types, programming language interpreters could refactor code for optimization purposes.
Global variables, (real) time, space requirements and other system functionality (IO, storage access and system calls) are typically considered to be side effects (technically, memory access is one too, because it modifies caches). But they are mainly called "side effects" because that stuff is not modelled in the type system.
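As an illustration of that last point: even Haskell's IO type only records that some effect may happen; which effect it is, and its time or memory cost, stays outside the type.

```haskell
import Control.Concurrent (threadDelay)

-- Both actions have exactly the same type. The type says "some effect",
-- but neither the kind of effect nor its time cost is modelled in it.
quick :: IO ()
quick = putStrLn "hello"

slow :: IO ()
slow = threadDelay 3000000 >> putStrLn "hello, three seconds later"
```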
———
Referential Fallacies
Referential transparency arises when the semantics is a closed system. In natural language, however, words generally have multiple or contextual meanings and may change dynamically over time; the expressed intension might even differ from the words' direct meaning, as in idioms or ironic statements.
If two things are equivalent, they can be exchanged and it will not alter the meaning. Referential opacity is the case where words are not equivalent even if they share a reference to the same entity (linguists say "co-referential").
Example time. If a person has two names, "Al" and "Bo" (of course I mean Alex and Boris, what were you thinking 😉?), these names are not equivalent. The expression "A name of Al" even has multiple meanings (it could be "Al" or "Bo" or both). If we use "the", as in "The name of Al", we are referring to something that mathematicians call a "principal value". It produces the same meaning as "The name of Bo". (I interpret it as "The [principal] name of Al".)
The reason why "Al" and "Bo" are not equivalent is that words carry more than one reference. They also refer to themselves as name entities, in a second meaning. (Disambiguating the meaning is subject to context.) The sentence "Al begins with an 'A'" shows this second referential meaning of "Al": the reference to itself as a name object. In that regard, "Bo" is not equivalent, because when "Bo" refers to a name object it is a different one with different letters, even if both names have the same referent (in a temporary context!).
Let's take a case from a popular animation show: assume "Ladybug" is a cover name of "Marinette", and there is a boy, "Adrien", who does not know Ladybug's true identity. We know that "Adrien loves Ladybug". Is the proposition referentially opaque because "Adrien loves Marinette" does not hold? No. Objectively seen, it is still true, because "Marinette" and "Ladybug" may refer to the same person and Adrien just does not know it. Subjectively, if Adrien were asked about it, he would probably say yes or no to the second statement, because he does not know the truth. (This is in fact the notable part early in the show: the perceived presence or absence of feelings does not have to match their actual truth.)
By another argument, Ladybug has her own skills and personality aside from Marinette. The two names do not have the same meaning, and so the meanings of "loves Ladybug" and "loves Marinette" really are different. The previous argument would hold, however, if we had two names which are exact aliases, not differing in any context.
———
The previously mentioned Wikipedia article also claims that "_ lives in _" (e.g. "She lives in London.") were referentially transparent and "_ contains _" (e.g. "'London' contains 6 letters.") were referentially opaque. Neither holds in general; there is no referential transparency property of natural-language predicates as such. It is the verbatim context (characters appearing inside a string) which makes the quoted part of the sentence referentially opaque. Of course I can correctly say "The name of London contains six letters." (or: the name of England's capital), and I can say "She lives in 'London'" (but not "She lives in 'England's capital'").
Does this mean code inside string quotes is referentially opaque? Not in general. Funny thing: did you know that Mr. Quine is also the one who came up with the idea of string interpolation in logic? (He called it quasi-quotation, using ⌜…⌝ as quotes.) In the context of string or code interpolation, we can make string contents referentially transparent.
Check out this case of referential transparency: "She lives in ⌜{_}⌝" or "⌜{_}⌝ contains 6 letters". (We can use "London" and "Capital of England" at the same time.)
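Carried over to code, a minimal sketch of the same point (the helpers livesIn and letterCount are made up for this illustration): inside a plain string literal the city name is just characters, but a value spliced in through an interpolation hole sits in a referentially transparent position again.

```haskell
livesIn :: String -> String
livesIn city = "She lives in " ++ city ++ "."

letterCount :: String -> String
letterCount name = "'" ++ name ++ "' contains " ++ show (length name) ++ " letters."

main :: IO ()
main = do
  let london = "London"
  putStrLn (livesIn london)      -- She lives in London.
  putStrLn (letterCount london)  -- 'London' contains 6 letters.
```

Replacing the literal with any equal expression, say "Lon" ++ "don", changes nothing about the output; that is exactly the substitution property a hard-coded string literal does not offer for the characters inside it.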
Typically, referential opacity occurs with words which are co-referential under the limited definitions of a type system, but whose actual meaning is not sufficiently captured by that type system.
———
Prospect
Why, after all, would it be a problem to think of referential transparency as only side-effect-free functions, if it works?
Because side effects matter in practical software: the ability to hide extraneous mechanical details together with their side effects can reduce reading and writing overhead, and the rejection of side effects can limit expressivity. One consequence of the view "referential transparency → no side effects" is that strictly referentially transparent languages (those I know of) have no real-time semantics (Haskell only has one for debugging purposes).
Nothing prevents us from defining types with time, space or other side-effect semantics. A value with a duration of 3 seconds could be interpreted as a busy-waiting loop with a sufficient number of instructions when it takes effect. A value can have a sequence of certain effects as properties, such as IO, memory operations or memory constraints. This means we could specify allocation regions as part of a type. These "effect values" only need to be first-class citizens, and they can also be combined with other values.
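Here is one possible sketch of such first-class effect values (the names Effect and run are made up for this post, not an existing library): the value itself is plain data, and a separate interpreter gives it its execution semantics, including the busy-waiting loop for a duration.

```haskell
import GHC.Clock (getMonotonicTime)

-- A hypothetical first-class "effect value": plain data describing a
-- resource effect, which only takes effect when an interpreter runs it.
data Effect
  = Wait Double    -- a duration in seconds
  | Emit String    -- an IO write
  deriving (Show, Eq)

-- One possible interpretation: 'Wait' becomes the busy-waiting loop
-- mentioned above, 'Emit' becomes a line on stdout.
run :: [Effect] -> IO ()
run = mapM_ step
  where
    step (Emit msg)  = putStrLn msg
    step (Wait secs) = do
      start <- getMonotonicTime
      let spin = do
            now <- getMonotonicTime
            if now - start < secs then spin else pure ()
      spin

main :: IO ()
main = run [Emit "start", Wait 3, Emit "done after 3 seconds"]
```

Because the effect values are ordinary data, they can be stored, compared and combined with other values before anything is executed.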
An impediment could be that it makes type checking more complicated. A future-relevant language should make an ideal specification and treat its realization as a research challenge. Maybe it is not decidable in 100% of the theoretical cases, but why not still permit such types for the cases that are practically decidable under a resource constraint? It does not sound plausible to me that someone actually needs code in practice which must have undecidable semantic properties. Practical software usage is bounded by limited resources, and code which exceeds specified resource limits of time and memory is unacceptable anyway.
This is the end of my monologue for now. Do you think I confused something? Let me know in the comments.