Discussion on: Why is our source code so boring?

Austin S. Hemmelgarn

There's a 'sweet spot' for information density in written languages (natural or otherwise). Too high, and the learning curve becomes too steep for the system to be useful. Too low, and it quickly becomes unwieldy to actually use. You'll notice if you look at linguistic history that it's not unusual for logographic, ideographic, and pictographic writing systems to evolve toward segmental systems over time (for example, Egyptian hieroglyphs eventually being replaced with Coptic, or the Classical Yi script giving way to a syllabary), and the same principle is at work there.

Most modern textual programming languages sit right around that sweet spot. There's some variance one way or the other for certain linguistic constructs, but they're largely pretty consistent once you normalize symbology (that is, once you ignore variance due to different choices of keywords or operators for the same semantic meaning).


The problem with using natural language for this, though, is a bit different. The information density is right around that sweet spot, and there's even a rather good amount of erasure coding built in, but it's quite simply not precise enough for most coding usage. I mean, if we all wanted to start speaking Lojban (never going to happen), or possibly Esperanto (still unlikely, but much more realistic than Lojban), maybe we could use 'natural' language to code, but even then it's a stretch. There's simply no room in programming for things like metaphors or analogies, and stuff like hyperbole or sarcasm could be seriously dangerous if not properly inferred by a parser (hell, that's even true of regular communication).
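As a concrete (if contrived) illustration of that precision gap, consider how an English phrase like "take every other line" has at least two readings, while slice syntax admits exactly one (a hypothetical sketch, not from the original article):

```python
# "Take every other line" is ambiguous in English: start with the
# first line, or the second? Each slice below states exactly one
# of those meanings.
lines = ["a", "b", "c", "d", "e"]
print(lines[::2])   # ['a', 'c', 'e'] -- every other line, starting at index 0
print(lines[1::2])  # ['b', 'd']      -- every other line, starting at index 1
```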


As for ASCII, sticking to it is largely a matter of practicality these days. Almost any sane modern language supports Unicode in the source itself (not just in literals), and I've even personally seen code using extended Latin characters (å, ü, or é), Greek, or Cyrillic in variable names. The problem with that is that you have to rely on everyone else who might be using the code having appropriate fonts installed to read it correctly, as well as contending with multi-byte encodings whenever you do any kind of text processing on it. It's just so much simpler to use plain ASCII, which works everywhere without issue (provided you don't have to interface with old IBM mainframes).
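A rough sketch of both sides of that trade-off (Python 3 chosen here just as a convenient example of a language that accepts Unicode identifiers):

```python
import math

# Python 3, like many modern languages, allows Unicode identifiers,
# so mathematical code can mirror its notation directly.
def gaussian(x, μ, σ):
    """Normal density with mean μ and standard deviation σ."""
    return math.exp(-((x - μ) ** 2) / (2 * σ**2)) / (σ * math.sqrt(2 * math.pi))

print(round(gaussian(0.0, 0.0, 1.0), 4))  # 0.3989

# The multi-byte catch: one character is not one byte once encoded,
# which any byte-oriented tooling on the source must account for.
print(len("σ"))                  # 1 character
print(len("σ".encode("utf-8")))  # 2 bytes in UTF-8
```

Whether that reads as elegance or as a font-rendering headache depends entirely on whose editor opens the file, which is exactly the practicality argument above.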

Thomas H Jones II

How could you not love EBCDIC? :p