Discussion on: Why is our source code so boring?

Austin S. Hemmelgarn

There's a 'sweet spot' for information density in written languages (natural or otherwise). Too high, and the learning curve becomes too steep for the system to be useful. Too low, and it quickly becomes unwieldy to actually use. You'll notice if you look at linguistic history that it's not unusual for logographic, ideographic, and pictographic writing systems to evolve toward segmental systems over time (for example, Egyptian hieroglyphs eventually being replaced with Coptic, or the Classical Yi script giving way to a syllabary), and the same principle is at work there.

Most modern textual programming languages sit right around that sweet spot. There's some variance one way or the other for certain linguistic constructs, but they're largely pretty consistent once you normalize symbology (that is, once you ignore variance due to different choices of keywords or operators for the same semantic meaning).


The problem with using natural language for this, though, is a bit different. The information density is right around that sweet spot, and there's even a rather good amount of erasure coding built in, but it's quite simply not precise enough for most coding usage. I mean, if we all wanted to start speaking Lojban (never going to happen), or possibly Esperanto (still unlikely, but much more realistic than Lojban), maybe we could use 'natural' language to code, but even then it's a stretch. There's simply no room in programming for things like metaphors or analogies, and stuff like hyperbole or sarcasm could be seriously dangerous if not properly inferred by a parser (hell, that's even true of regular communication).
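As a concrete (if contrived) illustration of that precision gap, consider how an English phrase like "take every other line" has at least two readings, while slice syntax admits exactly one (a hypothetical sketch, not from the original article):

```python
# "Take every other line" is ambiguous in English: start with the
# first line, or the second? Each slice below states exactly one
# of those meanings.
lines = ["a", "b", "c", "d", "e"]
print(lines[::2])   # ['a', 'c', 'e'] -- every other line, starting at index 0
print(lines[1::2])  # ['b', 'd']      -- every other line, starting at index 1
```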


As for ASCII, sticking to it is largely a matter of practicality these days. Almost any sane modern language supports Unicode in the source itself (not just in literals), and I've even personally seen code using extended Latin characters (å, ü, or é), Greek, or Cyrillic in variable names. The problem with that is that you have to rely on everyone else who might be using the code having appropriate fonts installed to read it correctly, as well as contending with multi-byte encodings whenever you do any kind of text processing on it. It's just so much simpler to use plain ASCII, which works everywhere without issue (provided you don't have to interface with old IBM mainframes).
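A rough sketch of both sides of that trade-off (Python 3 chosen here just as a convenient example of a language that accepts Unicode identifiers):

```python
import math

# Python 3, like many modern languages, allows Unicode identifiers,
# so mathematical code can mirror its notation directly.
def gaussian(x, μ, σ):
    """Normal density with mean μ and standard deviation σ."""
    return math.exp(-((x - μ) ** 2) / (2 * σ**2)) / (σ * math.sqrt(2 * math.pi))

print(round(gaussian(0.0, 0.0, 1.0), 4))  # 0.3989

# The multi-byte catch: one character is not one byte once encoded,
# which any byte-oriented tooling on the source must account for.
print(len("σ"))                  # 1 character
print(len("σ".encode("utf-8")))  # 2 bytes in UTF-8
```

Whether that reads as elegance or as a font-rendering headache depends entirely on whose editor opens the file, which is exactly the practicality argument above.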

Thomas H Jones II

How could you not love EBCDIC? :p