DEV Community

Discussion on: Why is our source code so boring?

Andrew (he/him)

The funny thing is, APL probably has a higher "information density" than just about any other programming language that has ever been created, and it was one of the very first languages.
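
For a rough sense of what that density gap looks like (my own illustration, not something from the thread): in Dyalog-style APL the tacit train `(+/÷≢)` is an entire "average" function, while the Python sketch below spells the same idea out longhand. The function and variable names here are just made up for the example.

```python
# Sketch only: the Dyalog-style APL train (+/÷≢) means "sum (+/) divided by (÷) tally (≢)".
# The Python below does the same thing with far more tokens.
def average(values):
    total = 0
    count = 0
    for v in values:          # accumulate the sum and the tally by hand
        total += v
        count += 1
    return total / count      # APL: (+/÷≢) values

print(average([1, 2, 3, 4]))  # 2.5
```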

But people don't like entirely symbolic code, it seems. We want something in-between a natural, written language and just strings of symbols.

Will we be stuck with ASCII forever?

Austin S. Hemmelgarn

There's a 'sweet spot' for information density in written languages (natural or otherwise). Too high, and the learning curve becomes too steep for the language to be useful. Too low, and it quickly becomes unwieldy to actually use. You'll notice, if you look at linguistic history, that it's not unusual for logographic, ideographic, and pictographic writing systems to evolve towards segmental systems over time (for example, Egyptian hieroglyphs eventually being replaced by Coptic, or the Classical Yi script giving way to a syllabary), and the same principle is at work there.

Most modern textual programming languages sit right around that sweet spot. There's some variance one way or the other for certain linguistic constructs, but they're largely consistent once you normalize symbology (that is, once you ignore variance due to different choices of keywords or operators for the same semantic meaning).


The problem with using natural language for this, though, is a bit different. The information density is right around that sweet spot, and there's even a rather good amount of erasure coding built in, but natural language is quite simply not precise enough for most coding usage. I mean, if we all wanted to start speaking Lojban (never going to happen), or possibly Esperanto (still unlikely, but much more realistic than Lojban), maybe we could use 'natural' language to code, but even then it's a stretch. There's simply no room in programming for things like metaphors or analogies, and stuff like hyperbole or sarcasm could be seriously dangerous if not properly inferred by a parser (hell, that's even true of regular communication).


As far as ASCII goes, it's largely a matter of practicality at this point. Almost any sane modern language supports Unicode in the source itself (not just in literals), and I've personally seen code using extended Latin (å, ü, or é), Greek, or Cyrillic characters in variable names. The problem with that is that you have to rely on everyone else who might be using the code having fonts that can render it correctly, as well as contending with multi-byte encodings when you do any kind of text processing on it. It's just so much simpler to use plain ASCII, which works everywhere without issue (provided you don't have to interface with old IBM mainframes).
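
As a quick illustration of the "Unicode in the source itself" point, here's a minimal sketch in Python 3 (which documents this behaviour in PEP 3131); the variable names are just examples:

```python
# Python 3 accepts non-ASCII identifiers (PEP 3131), so this is legal source:
Δt = 0.5
température = 20.0
température += Δt
print(température)  # 20.5

# ...but the multi-byte encoding is still lurking underneath:
print(len("Δ"), len("Δ".encode("utf-8")))  # 1 code point, 2 bytes in UTF-8
```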

Thomas H Jones II

How could you not love EBCDIC? :p

Thomas H Jones II

Humans are funny critters. There was a recently published study on information density in various human languages. Basically, whether a language ran at a higher or lower number of phonemes per minute, the amount of information conveyed across a given time-span was nearly the same.

One of the advantages of ASCII versus even Unicode is the greater distinctness of the available tokens. I mean, adding Unicode support to DNS has been a boon for phishers. Further, the simplicity of ASCII means I have fewer letters that I need to pay close attention to. It also means fewer keys I have to build finger-memory for when I want to type at high speed ...which returns us to the phonemes-per-minute versus effective information-density question.
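
To make the "boon for phishers" point concrete, here's a small sketch (Python, with example domain names of my own choosing) showing how two names that render almost identically can be entirely different code points:

```python
import unicodedata

# Cyrillic 'а' (U+0430) looks nearly identical to Latin 'a' (U+0061) in most fonts,
# which is the basis of IDN homograph phishing. Domain names below are examples only.
latin = "apple.com"
lookalike = "\u0430pple.com"

print(latin == lookalike)              # False -- different code points
print(unicodedata.name(lookalike[0]))  # CYRILLIC SMALL LETTER A
print(unicodedata.name(latin[0]))      # LATIN SMALL LETTER A
```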