Andrew (he/him)

Posted on Nov 4, 2019

Why is our source code so boring?

#discuss #design #healthydebate

Last year, I was privileged enough to see Peter Hilton give a presentation about source code typography at geeCON 2018, and the main points of that talk still roll around in my head pretty regularly. In a nutshell:

source code is boring

Most developers write ASCII source code into plain text files using a fixed-width font, often trying to keep line length under 80 characters out of habit or "best practices":

public class MyClass {
  public static void main (String[] args) {
    System.out.println("oh god it's so boring");
  }
}

Fixed-width fonts and 80-column limits are a throwback to the IBM 80-column punch card, which dominated the industry for decades until it was slowly phased out in the latter half of the twentieth century.

When colour terminals became popular, we could add things like syntax highlighting:

public class MyClass {
  public static void main (String[] args) {
    System.out.println("boring, but in colour");
  }
}

And ligatures, which Hilton calls "the most recent innovation in source code" and which burst onto the source code scene around 2012, have been in common use in print typography for hundreds of years.

This raises the question...

Why is source code stuck in the past?

In a best-case scenario, modern users will define different fonts or styles for different keywords in VS Code in order to construct some kind of visual hierarchy within their programs.

...but we're not reaching the full potential of what's possible. Why not drag-and-drop modules? Why not AI-powered automatic code generation? Why not graphical coding like Scratch or Flowgorithm?

Why has there not been a single revolution in the way source code is written in nearly a hundred years?

Top comments (29)

Sven Varkel • Nov 4 '19 • Edited

Mine is not boring. It contains all kinds of known and unknown bugs. Plenty for entomologists to research... ;)

Austin S. Hemmelgarn • Nov 4 '19

Line length limits are there to keep things easy to follow. Most sane people these days go to 120 or so (possibly more), not 80, but the point is that the whole line still fits on a single line in your editor, because wrapped lines are hard to follow and horizontal scroll-off makes it easy to miss things. Those aspects haven't really changed since punch cards died off. Some languages are more entrenched in this than others (some dialects of Forth, for example), but the practice is generically useful no matter where you are, and it has stuck around for reasons other than just punch cards.
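As a rough illustration of the "fit the whole line in your editor" rule, here is a minimal Java (11+) sketch of the kind of check a linter might run. The class name LineLengthCheck and the 120-column limit are assumptions for illustration, not any specific tool's behaviour; also note that String.length() counts UTF-16 units rather than display columns, so tabs and wide characters are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class LineLengthCheck {
    // Returns the 1-based numbers of lines longer than maxColumns.
    // length() counts UTF-16 code units, so this is only exact for
    // plain ASCII source without tabs.
    public static List<Integer> overLimit(List<String> lines, int maxColumns) {
        List<Integer> offenders = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).length() > maxColumns) {
                offenders.add(i + 1);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
            "public class MyClass {",
            "  // " + "x".repeat(130),   // a 135-character comment line
            "}");
        System.out.println(overLimit(source, 120)); // prints [2]
    }
}
```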

Monospaced fonts are similar: they got used because that's all you had originally. That had little to do with punch cards; the earliest text terminals were character-cell designs, partly because of punch cards but also because it was just easier to interface with them that way (you didn't need any kind of font handling, and text wrapping was 100% predictable). These days they're still used partly because some people still code in text-only environments, but also because they make sure that everything lines up in a way that's easy to follow. Proportional fonts often have especially narrow spaces, which can make indentation difficult to follow, and the ability to line up a sequence of expressions seriously helps readability.

As far as graphical coding environments go, the issue there is efficiency. Information density is important, and graphical coding environments almost always have lower information density than textual source code (this, coincidentally, is part of why they're useful for teaching). I can write the same program in Scratch and in Python, and the Python code will take up less than half the on-screen space that the Scratch code does. On top of that, it often gets difficult to construct very large programs (read as 'commercially and industrially important programs') in graphical languages and keep track of them, both because of that low information density and because the flow control gets unwieldy in many cases.

As for ligatures, it's hit or miss whether they realistically help. I don't like them myself: they often screw with the perceptual alignment of the code (because they often have wider spacing on the sides than the equivalent constructs without ligatures, and it's not unusual for different two- or three-character ligatures to have different spacing as well), and they make it easier to accidentally misread code (for example, == versus === without surrounding code to compare length against).

I'm not particularly fond of using fonts to encode keyword differences, for many of the same reasons I don't like ligatures. It's also hard to find fonts that are distinctive enough relative to each other but still clearly readable (that sample picture fails the second requirement, even if it is otherwise a good demonstration).

You run into the same types of problems, though, when you start looking at custom symbols instead of plain ASCII text. APL has issues with this because of its excessive use of symbols, but I run into issues with this just using non-ASCII glyphs in string literals on a regular basis (if most people have to ask 'How do I type that on a keyboard?', you shouldn't be using it in your code).

Andrew (he/him) • Nov 4 '19

The funny thing is, APL probably has a higher "information density" than just about any other programming language that has ever been created, and it was one of the very first languages.

But people don't like entirely symbolic code, it seems. We want something in-between a natural, written language and just strings of symbols.

Will we be stuck with ASCII forever?

Austin S. Hemmelgarn • Nov 4 '19

There's a 'sweet spot' for information density in written languages (natural or otherwise). Too high, and the learning curve is too steep for it to be useful. Too low, and it quickly becomes unwieldy to actually use. You'll notice if you look at linguistic history that it's not unusual for logographic, ideographic, and pictographic writing systems to evolve towards segmental systems over time (for example, Egyptian hieroglyphs eventually being replaced by Coptic, or the Classical Yi script giving way to a syllabary), and the same principle is at work there.

Most modern textual programming languages sit right about there. There's some variance one way or the other for some linguistic constructs, but they're largely pretty consistent once you normalize symbology (that is, ignoring variance due to different choices of keywords or operators for the same semantic meaning).

The problem with natural language usage for this though is a bit different. The information density is right around that sweet spot, and there's even a rather good amount of erasure coding built in, but it's quite simply not precise enough for most coding usage. I mean, if we wanted to all start speaking Lojban (never going to happen), or possibly Esperanto (still unlikely, but much more realistic than Lojban), maybe we could use 'natural' language to code, but even then it's a stretch. There's quite simply no room in programming for things like metaphors or analogies, and stuff like hyperbole or sarcasm could be seriously dangerous if not properly inferred by a parser (hell, that's even true of regular communication).

As for ASCII, it's largely a matter of practicality these days. Almost any sane modern language supports Unicode in the source itself (not just in literals), and I've personally seen code using extended Latin (like å, ü, or é), Greek, or Cyrillic characters in variable names. The problem is that you have to rely on everyone else who might use the code having appropriate fonts to read it correctly, as well as contending with multi-byte encodings whenever you do any kind of text processing on it. It's just so much simpler to use plain ASCII, which works everywhere without issue (provided you don't have to interface with old IBM mainframes).
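To make the Unicode point concrete, here is a small Java sketch. It assumes the file is compiled as UTF-8 (javac -encoding UTF-8; UTF-8 is the default from JDK 18). Java accepts non-ASCII letters such as π or ö in identifiers, but a single visible character like é takes more than one byte once encoded, which is exactly the text-processing wrinkle mentioned above. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeSource {
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Non-ASCII letters are legal in Java identifiers (needs a UTF-8 source file).
        double π = Math.PI;
        int größe = 3;
        System.out.println(π * größe);

        // U+00E9 is one visible character and one UTF-16 char, but two bytes in UTF-8.
        String eAcute = "\u00e9";
        System.out.println(eAcute.length());    // 1
        System.out.println(utf8Bytes(eAcute));  // 2
        System.out.println(utf8Bytes("e"));     // 1
    }
}
```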

Thomas H Jones II • Nov 4 '19

How could you not love EBCDIC? :p

Thomas H Jones II • Nov 4 '19

Humans are funny critters. There was a recentishly-published study about information-density in various human languages. Basically, whether the language was oriented towards a lower or higher number of phonemes-per-minute, the amount of information conveyed across a given time-span was nearly the same.

One of the advantages of ASCII over Unicode is the greater distinctness of the available tokens. I mean, adding support for Unicode in DNS has been a boon for phishers. Further, the simplicity of ASCII means I have fewer letters that I need to pay close attention to. It also means fewer keys I have to build finger-memory for when I want to type at high speed ...returning us to the phonemes-per-minute versus effective information-density question.
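The DNS phishing problem comes down to homoglyphs: characters from different scripts that render almost identically. Here is a minimal Java sketch, assuming a simple "is every character 7-bit ASCII?" rule is the policy you want (real internationalized-domain defences are considerably more involved). The class name Homoglyphs is illustrative.

```java
public class Homoglyphs {
    // True if every UTF-16 unit is in the 7-bit ASCII range (0-127).
    // Surrogate pairs (emoji etc.) are >= 0xD800, so they count as non-ASCII.
    public static boolean isPlainAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String latin = "paypal";
        // Looks the same, but the first letter is CYRILLIC SMALL LETTER ER (U+0440).
        String spoofed = "\u0440aypal";
        System.out.println(latin.equals(spoofed));  // false
        System.out.println(isPlainAscii(latin));    // true
        System.out.println(isPlainAscii(spoofed));  // false
    }
}
```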

Ben Halpern • Nov 4 '19

Emojis are entirely valid in Ruby, and I think they kind of fit the paradigm as well. A lot of Ruby methods end in ? to indicate that they ask a question, such as my_array.empty?.

I could see my_array.🤔 being an intuitive way to inspect an array.

Defining this method is easy as 🥧

def 🤔
  puts "this is the output"
end

Even making it a method of Array, as above, is a straightforward monkey patch.

class Array
  def 🤔
    puts "this is the output"
  end
end

There are gotchas in terms of emoji unicode rendering across operating systems, but there is something to this. Emojis are a big part of our text-based communication these days, so why not in the source code?
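Not every language is as permissive as Ruby here. As a point of comparison, this Java sketch asks the JDK's own Character.isJavaIdentifierStart and Character.isJavaIdentifierPart whether a string could serve as an identifier: Greek letters pass, but an emoji such as 🤔 (U+1F914) is a symbol rather than a letter, so Java rejects it. The helper name isJavaIdentifier is hypothetical, and it deliberately ignores reserved words.

```java
public class IdentifierCheck {
    // Checks, code point by code point, whether s could be a Java identifier.
    // Reserved words such as "class" are not checked.
    public static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        // Iterate by code point so surrogate pairs (emoji) are judged as one character.
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifier("empty"));         // true
        System.out.println(isJavaIdentifier("\u03c0"));        // true: Greek pi is a letter
        System.out.println(isJavaIdentifier("\uD83E\uDD14"));  // false: U+1F914 THINKING FACE
    }
}
```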

P.S.

I find it interesting how Erlang uses very English-esque punctuation techniques...

Comma at the end of a line of normal code.
Semicolon at the end of each clause of a case or if expression, etc. The last clause doesn't have anything at the end.
A period at the end of a function.

Andrew (he/him) • Nov 4 '19

I see your point, but it's still just different characters in a text file. It's still the same medium.

Computer programs carry more information today than any other form of communication in human history, but they've always been (with few exceptions) text files.

Compare that to the information carried through artistic media. You can paint, sculpt, sketch, photoshop, or screen print and they're all just different kinds of visual artistic expression. Why is there only a single dominant form of programmatic expression?

Is it due to the exact nature of programs? That we need to tell machines precisely what it is we want them to do? Could we write a programming language where we only describe the gist of what it is we want to accomplish and let the interpreter figure out the rest?

Dian Fay • Nov 4 '19 • Edited

Could we write a programming language where we only describe the gist of what it is we want to accomplish and let the interpreter figure out the rest?

You mean like SQL? 😄

Ben Halpern • Nov 4 '19

The closest conversation we have here is "declarative vs imperative".

In the frontend world, ReactJS came along beating the drum about its declarative nature, which is fairly accurate in that you define possible end states and let the program figure out how to make the right changes.

It was a pretty big deal, but yeah, not that transformative.
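For a small-scale taste of the declarative vs imperative distinction, here is a Java sketch of the same computation written imperatively (spell out the loop) and in the declarative style of the Streams API (describe the result you want). Java streams are only declarative-ish compared with SQL or React, and the method names are illustrative.

```java
import java.util.List;

public class DeclarativeVsImperative {
    // Imperative: spell out how to iterate, test, and accumulate.
    public static int sumOfEvenSquaresImperative(List<Integer> xs) {
        int total = 0;
        for (int x : xs) {
            if (x % 2 == 0) {
                total += x * x;
            }
        }
        return total;
    }

    // Declarative style: describe what you want; the library drives the iteration.
    public static int sumOfEvenSquaresDeclarative(List<Integer> xs) {
        return xs.stream()
                 .filter(x -> x % 2 == 0)
                 .mapToInt(x -> x * x)
                 .sum();
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(sumOfEvenSquaresImperative(xs));   // 20
        System.out.println(sumOfEvenSquaresDeclarative(xs));  // 20
    }
}
```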

I think text-based characters are just so damn useful for telling the computer what we want to do. People are so damn good at typing in general that it's really hard to draw new power from the draggable-modules approach.

It seems like the best tooling in software is augmentative rather than replacing. Linters and autocomplete seem like the kind of intelligence with the most potential to build on top of, and they're generally progressive enhancement, allowing you to go all the way down to a text file if you need to.

GUIs that compile to code tend to produce gobbledygook that makes it hard to go in both directions. Apple has been trying stuff like this for a while, and the latest iteration might be more of a happy medium...

developer.apple.com/xcode/swiftui/

I want to think there is a big leap we can make that is entirely outside of the normal way we code, but I just don't think it's feasible to leap right there. I think it's a one-step-at-a-time thing that takes decades and decades, because big leaps can miss so much of the edge-case coverage we care about.

Dian Fay • Nov 4 '19 • Edited

I think art is not the place to be looking for inspiration -- programming languages, while quite restricted in scope, are languages, and in thousands of years we've only come up with so many modes of linguistic expression. It's pretty much just speech and writing, and writing is clearly the superior of the two for this kind of purpose.

Although it's interesting to consider programming a computer by means of tying quipu...

Andrew (he/him) • Nov 4 '19

A quiputer. 😉

I suppose that's true re: writing. But surely there's at least a better way to communicate these ideas.

Sometimes I find myself looking at the array of languages and paradigms available and thinking "that's it?" But then again, the book was invented a few hundred years ago and that's still going strong.

Maybe people will still be writing FORTRAN in 2520.

Ben Halpern • Nov 4 '19

the book was invented a few hundred years ago and that's still going strong.

I do most of my book reading through audiobook these days.

I wonder whether we could have a programming language optimized for audio consumption, one that can be effectively reviewed through one's ears.

idea.🤔

Andrew (he/him) • Nov 4 '19

Morgan Freeman reading LISP sounds terrible and soothing at the same time.

"Open parenthesis, open parenthesis, open parenthesis..."

Peyton McGinnis • Dec 26 '19

Haha!

Thomas H Jones II • Nov 4 '19

I think art is not the place to be looking for inspiration -- programming languages, while quite restricted in scope, are languages, and in thousands of years we've only come up with so many modes of linguistic expression. It's pretty much just speech and writing, and writing is clearly the superior of the two for this kind of purpose.

And, even in writing, it's pretty much just been glyphs and script ...and it's really only recently that we've sorta started to handle other-than-ASCII something resembling "well".

Thomas H Jones II • Nov 4 '19

Maybe people will still be writing FORTRAN in 2520.

Maybe not FORTRAN, but definitely COBOL. :p

Casey Brooks • Nov 4 '19

Why has there not been a single revolution in the way source code is written in nearly a hundred years?

I would argue that there has. To me, "source code" itself really isn't a thing. At least, the typography of ASCII-text code is not the important thing in the way that typography on a website or a book or a magazine article is. For that other media, the text is the end-result of the work, and the way it is presented is, in itself, part of the art of the whole piece.

But with code, the final presentation, the end-goal, is the execution of the source code and not the text of the source code itself. Text, as presented in a source file, is simply a medium for writing executable programs. Thus, typography does not improve the experience of coding in the same way that it does for reading a book or an article. In fact, usage of ligatures, fancy fonts, emojis, etc. in code can often detract from the experience of coding because it obscures what is actually underneath. A program, when compiled and executed, does not understand the ligature for ⇒. It understands =>. So while ligatures can help improve comprehension of prose materials, they can actually hinder the comprehension of code for someone who is not intimately familiar with the ASCII characters that are actually being interpreted.

But source code is not stuck in the past. We just have different mechanisms other than typography to improve comprehension of our code. Things like editor autocomplete, static code analysis, and yes, syntax highlighting, all help to improve our comprehension of the code as a parallel to the way typography improves comprehension for prose. Keep in mind that typography isn't important for its own sake: it is important because it helps our human minds interpret text faster and more accurately.

Code is interpreted and understood differently in our brains; namely, we do not read code strictly top-to-bottom and left-to-right. We understand it through its relationships with other parts of code. Thus, we can't simply expect the things that worked for prose to help code in the same way. This even extends to why most programmers prefer monospace fonts: the column relationships of individual characters matter in code, while they do not in prose.

So while typography has not changed much for source code, I would argue that is because typography fundamentally isn't as helpful there. There are better ways to help programmers than just improving the visual display of the text in an editor.

Casey Brooks • Nov 4 '19

As for why we don't see more things like graphical programming languages, I could draw a parallel to why you see many more people writing blog posts and books than you see people making videos or photo-essays. It's simply easier and faster for humans to compose our thoughts through text than through other mediums. Consider how many thousands of years humanity has been writing, compared to just decades of alternative media even existing, and it's easy to understand why our brains are adapted to prefer text-based media.

Andrew (he/him) • Nov 4 '19

Fair enough, the idea of "information density" was discussed above and it makes sense that a flow chart carries less information per unit area of screen space than some equivalent Python code, for instance.

Andrew (he/him) • Nov 4 '19

Code is interpreted and understood differently in our brains; namely, we do not read code strictly top-to-bottom and left-to-right. We understand it through its relationships with other parts of code.

And yet we still, generally, arrange code into libraries, packages, and files. If the "atom" of programming is the method / function / routine (maybe variables are the subatomic particles?), why don't we have a better way of visualising and modifying the interrelationships between those atoms?

Surely, blocks of code arranged sequentially in a file is not the best representation of those relationships.

Casey Brooks • Nov 4 '19 • Edited

I can see the adoption of component-based programming as an iteration on this very concept (the emergence of React and Vue; the change from Angular 1 to 2+; SwiftUI on iOS and Jetpack Compose on Android; even GraphQL; etc.). Take what was previously a big jumble of code and break it down into components that carry significantly more semantic and contextual meaning than plain functions.

At their core, they're not so different from moving code from one file to another, or from one function to another, but conceptually it really is a big leap forward. Encapsulating/passing data through a tree of components, rather than a system of functions, makes it easier to understand the relationships among them all. These relationships are still expressed through text, but it is objectively better-organized and helps to make the relationships explicit. It feels like the "atom" has taken a big step up from just functions and classes.
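A rough, hypothetical sketch of that idea in Java (16+, for records): each component owns its own rendering, and data flows down the tree from parent to children. The Component, Label and Column types are invented for illustration and are not modelled on any real framework's API.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ComponentTree {
    // Hypothetical minimal component model: anything that can render itself.
    public interface Component {
        String render();
    }

    // Leaf component: displays one piece of data.
    record Label(String text) implements Component {
        public String render() {
            return "<span>" + text + "</span>";
        }
    }

    // Container component: delegates rendering to its children.
    record Column(List<Component> children) implements Component {
        public String render() {
            return children.stream()
                           .map(Component::render)
                           .collect(Collectors.joining("", "<div>", "</div>"));
        }
    }

    // Data flows top-down: the parent decides what each child receives.
    public static Component greeting(String name, int unread) {
        return new Column(List.of(
            new Label("Hello, " + name),
            new Label(unread + " unread messages")));
    }

    public static void main(String[] args) {
        // prints <div><span>Hello, Ada</span><span>3 unread messages</span></div>
        System.out.println(greeting("Ada", 3).render());
    }
}
```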

Thorsten Hirsch • Nov 4 '19

Actually we have all that stuff already:

drag'n'drop modules: e.g. BPM tools, EAI/SOA tools (and some are pretty useful, you can really "orchestrate" your modules/workflow and get a nice high-level diagram for your documentation for free)
AI-powered automatic code generation: no need for AI, code generators are everywhere (e.g. in your IDE, in the Rails framework, in Oracle SQL Developer, importers for WSDL/proto/... files, ...)
Scratch or Flowgorithm... well if these are tools in which you can't type "if...then...else..." on your keyboard, but have to use your mouse to drag'n'drop an [if] box, a [then] box, and an [else] box on some canvas, then this is crap.

Furthermore, I believe that our mainstream programming languages have made huge progress in the last decade. Think of Java, for example. It was designed when most computers had a single-core CPU, so there were no nice abstractions for parallelism: you had to spawn threads manually. But now look at this beauty:

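// data is a collection of integers; performComputation stands for any expensive int -> int work
long sum = data.stream()
               .parallel()                                 // let the stream split work across threads
               .map(i -> (int) Math.sqrt(i))
               .map(number -> performComputation(number))
               .reduce(0, Integer::sum);                   // int accumulation, widened to long on assignment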

All the complicated stuff is hidden behind the scenes. And it's so easy to see what the code is doing; it's as if it's speaking to you. Or think of JavaScript and how beautifully it hides the complexity of asynchronous processing behind async/await.
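To show what "spawn threads manually" meant in practice, here is a hedged Java sketch that computes the same sum both ways. performComputation is the commenter's placeholder, stubbed here as a hypothetical number * 2. The stream version uses mapToLong(...).sum() instead of reduce(0, Integer::sum), a small change that accumulates in a long and so avoids int overflow on large inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelSum {
    // Hypothetical stand-in for the commenter's performComputation.
    static int performComputation(int number) {
        return number * 2;
    }

    // Old style: split the list by hand, start one thread per chunk, then join them all.
    public static long manualThreads(List<Integer> data, int threadCount) {
        long[] partials = new long[threadCount];
        List<Thread> threads = new ArrayList<>();
        int chunk = (data.size() + threadCount - 1) / threadCount; // ceiling division
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final int from = Math.min(data.size(), t * chunk);
            final int to = Math.min(data.size(), from + chunk);
            Thread worker = new Thread(() -> {
                long local = 0;
                for (int i = from; i < to; i++) {
                    local += performComputation((int) Math.sqrt(data.get(i)));
                }
                partials[id] = local; // each worker writes only its own slot
            });
            threads.add(worker);
            worker.start();
        }
        long sum = 0;
        for (int t = 0; t < threadCount; t++) {
            try {
                threads.get(t).join(); // join() also makes the worker's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
            sum += partials[t];
        }
        return sum;
    }

    // Streams style, as in the comment above, but summing into a long.
    public static long parallelStream(List<Integer> data) {
        return data.stream()
                   .parallel()
                   .map(i -> (int) Math.sqrt(i))
                   .mapToLong(number -> performComputation(number))
                   .sum();
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 10_000).boxed().collect(Collectors.toList());
        System.out.println(manualThreads(data, 4));
        System.out.println(parallelStream(data));
    }
}
```

The bookkeeping in manualThreads (chunk boundaries, per-thread slots, joining) is exactly what the one-line .parallel() call hides.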

A picture might be worth a thousand words, but pictures are hard to diff. I prefer modern programming language constructs, which make it possible for 20 words to be worth a thousand.

Jean-Michel 🕵🏻‍♂️ Fayard • Nov 5 '19 • Edited

Frederick P. Brooks saw it coming

There is no single development, in either technology or management technique, which by itself promises even one order-of-magnitude improvement within a decade in productivity, in reliability, in simplicity.

No Silver Bullet: Essence and Accident in Software Engineering

worrydream.com/refs/Brooks-NoSilve...

Ramón León • Nov 5 '19

Uh, there has been: many people keep attempting this, and they keep failing because nothing anyone has come up with yet beats text. There have been countless attempts at graphical programming, and they all fail. Why? Because nothing is as flexible as language; language is the best medium to communicate intricate details of operations. People have tried to evolve language as well; see the Subtext programming language.

Thomas H Jones II • Nov 4 '19

Frankly, I abhor the idea of gist-y programming. That's too in the ballpark of "figure out what I mean".

Humans that speak the same native language and come from similar socio-economic backgrounds have a hard enough time understanding each other. The more divergent our communication starting-points are, the worse it gets (e.g., how many times do you have to resort to transliteration to try to convey linguistic constructs that don't have true analogues across two languages?). The prospect of getting something as alien as a machine to "understand" a human's vaguely-conveyed intent specified in an arbitrary language? Don't get me wrong, I'm not arguing for going back to bashing out pure binary or even assembler …but precision counts.

Personally, what constitutes "boring" is less hewing to conventions like 80 columns of ASCII than the seeming loss of humorous code comments, error responses and other easter eggs. I'm also old enough to vaguely remember the days when Sendmail would crap out with a "giving up the ghost" message, and it has to have been nearly a decade since I've had to deal with "martian source" networking problems.

Sergiy Yevtushenko • Nov 5 '19

Drag'n'drop is used by Eclipse and Intellij IDEA for refactoring (moving classes and packages).

Graphical coding is not productive. IBM had a number of IDEs called "VisualAge", as far back as the mid-'90s. As tools they were as visual as possible: most coding looked like connecting pieces (UI components) with arrows using the mouse. You'd be hard-pressed to find a comparable level of visuality in modern, widely used tools, because it turned out that creating apps this way is slow and distracting.

Programming languages are, well, languages. Most languages are more convenient when used in spoken or written form. Computer languages are designed with the written form in mind. That's why coding is so text-oriented.