This is something I've been thinking about for a while. Why are we creating programming languages for humans, rather than for an IDE, allowing the latter to represent the program in a human-readable format?
If you're doing math, text sucks.
c = sqrt(pow(a, 2) + pow(b, 2))
It gets worse with non-primitives such as vectors in languages that pass structs around by reference. If you want to avoid expensive memory allocations, you're stuck with:
add(c, pow(temp1, a, 2), pow(temp2, b, 2))
These are just the simplest formulas. Shader code I wrote back when I did graphics engine development was a nightmare to debug because it was so obtusely represented.
Then there's variable naming. Taking an example from math again: P(A|B) has a precise and well-understood meaning, plus it's concise and easily recognizable. Unfortunately the majority of programming languages don't allow us to name a variable P(A|B). Similarly, camelCase vs snake_case is a debate centered entirely on the inability to use spaces in variable names.
Another problem, posted a while ago: how can we code on a smartphone? Virtual keyboards are not great for code, and graph-based solutions might work, but I'm pretty sure not many people like drawing graphs on their desktop.
Tabs vs spaces is another debate based entirely on the assumption that code is text. Meanwhile, either approach is a severely outdated alignment tool.
In each of these cases, the key problem is that there is a tight coupling between formatting and semantics. I.e. we display the code as plain text, and that's what it is.
Ligatures and custom operators offer some help but are ultimately tacked-on solutions that fail to address the core problem.
So what if we changed our files from being plain text to something richer and more structured? By removing the coupling of formatting and semantics this way, we can also use wildly different formatting (e.g. a graph editor) on different systems, but modify the same underlying semantics.
What do you think?
Top comments (51)
You're missing the point: text files are easy. They can be a little verbose sometimes, but as a 'substance of expression', if you will, they're unmatched. You can create or modify text with the single most common human-interface device metaphor in existence, which children now learn to manipulate in or even before schooling. You can read text with a million different programs and style it, filter it, format it, cut and paste and perform a billion different operations on it. A certain clunkiness in writing out equations is a small price to pay for that kind of flexibility.
If you were to "decouple" formatting and semantics like you propose, you wouldn't be able to avoid coupling the semantics to the editor instead. And there'd only be one editor, until you developed something else that could interact with your structured representation of a syntax tree. This isn't to say it can't or shouldn't be done -- APL had its day, educational tools like Scratch do well, and there are of course a variety of "no programming experience required" flowcharting and modeling languages -- but it's an idea that can only compete against text files in some pretty specific niches.
Not necessarily: CSS/HTML is also a decoupling of formatting and semantics. The main point here is that the semantics on their own are no longer supposed to be read by humans.
HTML and CSS also aren't programming languages as such but markup and styling languages (although HTML5+CSS3 are evidently Turing complete if you're willing to put in the effort). But this is actually an interesting point: WYSIWYG markup editors which truly decoupled formatting or visual layout from the semantics of data binding and interaction were a thing in the 00s before everybody realized they were terrible and concentrated on building better plain-text templating languages instead.
True, but I wasn't suggesting using exactly those. In most languages, a program has a certain structure (rather like an AST, but not entirely). That is the "meaning" of a program. Where HTML has an <img> tag, an imperative program has a while loop.
Indentation, variable names, operators, import statements... those are all styling for humans. You could drop all of them and the language would be just as expressive, but not as readable.
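As a minimal sketch of that separation: the formula from the original post can be stored once as a structure and then displayed in more than one way. The nested-tuple format and the two renderer functions below are hypothetical, made up purely for illustration; they are not an existing file format or tool.

```python
# Hypothetical storage format: a nested tuple (operator, operand, ...).
# Leaves are variable names (str) or numbers.
HYPOT = ("sqrt", ("add", ("pow", "a", 2), ("pow", "b", 2)))


def render_calls(node):
    """Render the tree as nested function calls (a plain-text view)."""
    if not isinstance(node, tuple):
        return str(node)
    op, *args = node
    return f"{op}({', '.join(render_calls(arg) for arg in args)})"


def render_latex(node):
    """Render the same tree as LaTeX math markup (a typeset view).

    Only handles the three operators used here, and assumes the base of
    a power is a single symbol, so no extra grouping is emitted.
    """
    if not isinstance(node, tuple):
        return str(node)
    op, *args = node
    parts = [render_latex(arg) for arg in args]
    if op == "sqrt":
        return rf"\sqrt{{{parts[0]}}}"
    if op == "pow":
        return f"{parts[0]}^{{{parts[1]}}}"
    if op == "add":
        return " + ".join(parts)
    raise ValueError(f"unknown operator: {op}")


print(render_calls(HYPOT))
print(render_latex(HYPOT))
```

Both views come from the same stored tree, so changing how the formula looks never touches what it means.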
This separation already exists. The source code defines the semantics, and the configuration of the editor describes the styling.
I think looking at Smalltalk will answer a lot of your questions. Smalltalk is exactly what you're describing: a language built for an IDE. It can't be used outside the IDE, as it doesn't store its data in plain text, but rather an image format. The IDE provides all sorts of nice features and analysis to the user, and ideas from Smalltalk have influenced many other languages and IDEs since its creation in the 1970s. So why aren't we all using Smalltalk? I think the key lies in interoperability. Smalltalk is a world of its own. It doesn't interoperate well with tools that exist outside of the Smalltalk world. For example, you can't really benefit from git when you can't understand how to merge code in a binary image format. When I need to accomplish a specific task (say, some sort of build task), I need to know how to accomplish everything I need in the Smalltalk world, using its tools. Smalltalk goes against the Unix philosophy: it doesn't do one thing, it does everything because it's a mini virtual machine.
I don't have the authority to say if this is the entire reason we don't code in Smalltalk-like languages, but I think it's part of it. There are plenty of new languages trying to push programming languages in new, interactive directions (ex: Eve), but none of them have gained critical mass or mind share. There must be some intrinsic reasons that languages like this don't take off. Hope this provided some insight!
I'm not an expert on the question, but as far as I know the main reason for Smalltalk's failure was its licensing model - it was very expensive. I consider Ruby the closest reincarnation of the Smalltalk OOP model (which is widely used). Ruby doesn't have a "forced" IDE though.
I've honestly never used Smalltalk, guess I will have to take a look at it.
I haven't used it much either, but what I've seen has been interesting. Pharo (pharo.org) is a pretty modern implementation.
I devoted my entire PhD to the pursuit of a programming environment that goes beyond just text. At the core, I decouple what developers read and write from what's stored on disk. This enables significant enhancements to both the development UI and the program code compared to text-based systems.
I developed a prototype IDE called Envision to explore these ideas. Here is a youtube playlist with 5 short videos highlighting features you might find interesting.
In case you want to dig into the research here is the project page at ETH Zurich that has freely available PDFs of all our publications. I recommend just looking at the final PhD dissertation, as it contains extended versions of all the papers:
Envision: Reinventing the Integrated Development Environment. All the publications (and especially the dissertation) contain lots of screenshots that illustrate the main points.
As of now, Envision is somewhat on hold, as I've finished my PhD, but I hope to get back to developing it more actively soon. You can find Envision's code on GitHub.
Although I think we definitely should look for alternative ways to program, the concept of Envision is IMHO a dead-end. I've seen approaches like this before, one has even made it into a product I had to use at work. Everybody hated it. Here's the problem:
As long as code is text in your thoughts, there's no better visualisation for it than (syntax highlighted) text. Graphical representation of text is utter shit. It makes it hard to write, hard to version/diff, and hard to view anywhere else than in the IDE it has been developed in.
If you want to abstract code, don't try to display code. Slice code into packages/modules, display them as icons and orchestrate them! That's the way to go. I'm pretty sure about it. However you then have added the complexity of another abstraction layer.
Thanks for your comments, Thorsten. I am very curious what product you used that everybody hated, would you mind sharing?
I have myself used a few visual systems at work, such as Labview and Siemens Plant Simulator. It's true that these systems are not easy to use outside of their specific domain. Unlike these systems, Envision has been designed from the ground up to be generally applicable.
Regarding some of your other points:
Software developers do not "think in text". Developers think in abstractions (such as classes, functions, modules), control flow (branches, loops), data flow (steps of algorithms and data transformation), etc. Thinking in text would imply that the syntax of a language (as opposed to its semantics) may somehow influence the design of a system or a function, which is not the case.
Once we have decided on a design we have to create a corresponding program. This is mostly done as text, but doesn't have to be. As long as whatever editor we're using nicely maps to our mental model, things can work out smoothly.
Syntax highlighted text is in fact a basic form of a visually rich presentation of code. One way to think of the visual aspect of Envision is syntax highlighting on steroids.
Graphical representation of "text" might be utter shit, but we're talking about graphical representation of programs. For example:
Graphical representations absolutely include text where it's the best way to communicate something. E.g., in most cases showing expressions as text is a great option.
We have specifically designed Envision to support keyboard-based editing and shown that it is as fast as typing in a text editor.
Again, we have specifically designed a version control system that integrates with Git and provides a number of improvements over standard text-based diffs, both in terms of presentation and diff accuracy. You may want to watch the corresponding video and/or see the paper.
As long as the storage format is open and simple (both of which are the case for Envision), any number of editors can be made for it and show it in any number of ways. Take, for example, png image files. You can open/view/edit them in a number of different programs, each with its own strengths and weaknesses.
I agree. This is part of what Envision does.
This is true, but I see it as a strength, not a weakness. This extra layer allows us to decouple the backend (program structure/code) from the frontend (editor/visualizations/text) and enables both to evolve in ways that are impossible if they are coupled.
Thank you for your detailed reply. The "tool from hell" is SwissRisk's X-Gen, a transformation tool being used at some banks. One might think that it's highly specialised for orders and trades, but it's rather generic and can handle any data as long as you're using XML. But here's the point: the design philosophy of the IDE seems to be based on the assumption that typing (like on a keyboard) is bad. Unfortunately that's the only "innovative" idea, thus the graphical building blocks that you can drag'n'drop in the IDE are in fact just representations of elements of structured programming. What does that mean? Well, in order to program something like this...
...which you can type in a matter of seconds, you will have to complete the following steps in X-Gen:
You see, this tool really represents text as graphics. It does not even try to step up onto the next abstraction layer, it just makes it really hard to write code by disabling typing for all the keywords.
Why did I write "code is text in your thoughts"? After a long day of coding it happens that I dream of code and then I really see text. Syntax highlighted code. But that's probably just because I stared at it for countless hours. It's not what I think when I am working on code. So yes, I was wrong. Developers think in abstractions, I totally agree with you on that.
I guess this all leads to the question: What's a good abstraction layer for graphical representation? I'm pretty sure the answer is "it depends". When documenting/presenting I like to use Visio diagrams (and ASCII diagrams) for giving an overview of the system I'm working on. However, these diagrams have very different grades of detail, depending on the importance of the components for the audience. So a shape can represent a bunch of hosts (not important) as well as a single function or REST call (important). An IDE, on the other hand, should present a consistent level of abstraction with a similar grade of detail for all (technically) equivalent components.
I'll definitely check that dissertation out!
Donald Knuth described Literate Programming in 1979.
One of my coworkers at a previous company was Raymond Chen. As a grad student under Donald Knuth, he got to program using Donald Knuth's Literate Programming.
Raymond recommends against that style of programming.
Why does Raymond recommend against that style of programming?
Tooling is very poor. Debugging is very difficult. Documentation-and-code still become out-of-sync just as comments-and-code become out-of-sync, despite proximity (in both scenarios).
As yet-another-alternative to traditional text-based source code files, there are some potential novel ideas from Bret Victor, some alternative IDEs such as found in Lego Mindstorms that are visually oriented rather than text-oriented, Smalltalk style IDE where the code is in the general environment, old DEC Forté style (pre-JavaScript) IDE where the code was in a database backing store, and novel ways of having text-based source as in Light Table.
So there are people working on the leading edge. Maybe one of those concepts will become mainstream.
Finally, someone else who gets it! "Code as text" as a paradigm feels painfully outdated. It seems so obvious that we can do better. The comments here are a pretty good guide to what pitfalls we'd need to avoid:
don't be Scratch, interop with GitHub, find a way to leverage whatever the hell the vim power-user community is. Don't just be literate programming. It feels doable, though.
Have you ever tried the Lightbox IDE? It lets you put print statements in your code and see what they evaluate to on an example input inline, for multiple test cases, as you edit. It's a big step towards the feel of programming in a spreadsheet while using a real language.
I haven't, sounds interesting!
And here I got the name wrong - it's Light Table, not Lightbox
This sounds a lot like the structure editor Facebook was working on, but extending the idea further so that the limits on what a node (e.g. a variable name) may contain are removed - very intriguing!
Text-format math is harder to read, but it is much easier to edit and write.
Navigating a one-dimensional line of text can be done with two buttons; add two more and you can add line-oriented editing, but that's optional. Editing a multi-dimensional equation, like your version of the distance formula, means you have to come up with an interface for selecting just the radical, or just the equation that you're taking the root of; you can't do that with normal arrows and drag-select.
It's the same set of problems that any kind of WYSIWYG has, now that I think of it. Just because source code is read more often than it's written doesn't mean you can completely neglect the editing experience.
A very good point!
I do think this is solvable. E.g. if you write latex markup, there is a line-based counterpart to the formulas in the compiled pdf. If the relation between markup and formula is isomorphic, navigating with arrow keys in the formula is possible, because it is in the markup.
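For example, the distance formula from the post is a single line of markup, and every cursor position in that line corresponds to a spot in the rendered formula. This is only a small illustration of the idea, not taken from any particular editor:

```latex
c = \sqrt{a^{2} + b^{2}}
```

Moving the cursor left and right between the braces after `\sqrt` is the linear counterpart of moving around under the radical in the compiled output.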
That's what it is to us - obviously to the interpreter/compiler/running process it's something else.
What is that something else that we could use to represent a program that is at the same time a plain text file? Some form of data format that could also be read as the AST of the program...
Say, you've got MSc. in AI - you must've heard of a once-popular AI language called Lisp at some point? You know, the one where the code is the data and the data is the code? Where you can see the AST right in front of you because of the ridiculously simple syntax?
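To make the Lisp point concrete, here is a minimal sketch, written in Python rather than Lisp itself, that reads an S-expression from plain text straight into a nested list. The function names `tokenize` and `parse` are made up for this illustration, and there is no error handling for malformed input.

```python
def tokenize(text):
    """Split S-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one nested-list tree."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    try:
        return int(token)
    except ValueError:
        return token  # a symbol such as "sqrt" or "a"


# The distance formula written as a Lisp-style expression.
source = "(sqrt (+ (expt a 2) (expt b 2)))"
tree = parse(tokenize(source))
print(tree)
```

The parser needs barely twenty lines because the text already has the shape of the tree: every pair of parentheses is one node.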
Image based coding is so last century sadly, despite the most popular IDE in the world being Microsoft Excel. Plain text is... well, plain. No real worries about reading and writing - or forwards compatibility. Even Smalltalk can be represented as a text file.
Try something from Wolfram
I'm not really sure what you're trying to say here...
Fair enough... reading it back I'm not sure either!
Have you tried LabView?
Aren't some parts of MATLAB supposed to help with this sort of thing? (I'm not experienced with MATLAB; it's an assumption based on what I've heard about it.)
But... LabView is awful: if you need to refactor, it's very difficult. If you need to debug, forget it. Plain text code is easy and perfect for standard software development. For scientific development (i.e. mathematics, graphics), which involves complex equations, I would expect there are libraries which allow you to express math formulas as plain code?
Why would you want to use spaces in variable names?
Why is snake vs camel case a problem?
If you have a variable which holds the value for P(a|b), then use a creative name which describes what that value represents. I don't know what that expression is, so assume it's something like ambient_pressure: I don't care how its value is calculated, the name is descriptive of what it is.
A huge problem in code which I deal with on a daily basis is reading stuff like this (Python syntax):
def cv_to_ddv(cf_df):
I mean, what the hell is that? No comments, nothing, and the guy who wrote it left the company!!! I now have to go search where it's used and try to interpret its use to understand this function's purpose.... So it turns out it means:
Convert compensation voltage to derived dispersion field
So the name is totally rubbish. Naming stuff is one of the hardest things in writing software because it describes what you are doing. When you look at some complex equation you will "read" it, so text should also be able to be used to describe it.
I've used MATLAB a little bit. I might have missed something but I think it made the problem worse by just turning everything into non-standard operators to stay within the ASCII characters and monospace/text format.
LabView I know nothing about.
Because we create variable names composed of multiple words.
fooBar is less readable than foo_bar, which is less readable than foo-bar, which is less readable than foo bar. Spaces are also easiest to write. The reason not to use spaces is that they conflict with syntax. There are also some gestalt principles involved (the characters of a variable are close together), but there are other options for that.
P(a|b) is a mathematical notation for "probability of a being true given that b is true". That's a lot of words to write out. This was a real-world problem I've had, especially because I also needed P(a|¬b) and many similar variables. The resulting code was unreadable using full-length variable names.
The meaning of P(a|b) is well understood by people who have a minimal background in Bayesian statistics. So, essentially, it is the right name.
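For illustration, the law of total probability combines both of the quantities mentioned above in one line. This is a textbook identity, not the code from the project in question:

```latex
P(a) = P(a \mid b)\,P(b) + P(a \mid \neg b)\,P(\neg b)
```

Spelled out with full-length identifiers, each of the four factors would become a name several words long, which is where the readability problem comes from.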
More generally speaking though, because variable names are styling for humans, you could have multiple names for the same variable and use whatever suits you most in a certain situation, e.g. short or long. Although both at the same time sounds like a very bad idea :-P.
In terms of readability using camel vs snake I have to disagree as I've never had an issue reading either syntax, but everyone is different, so for you it's a fair point.
The point you make about naming variables is very true; it's very difficult to map mathematical names to human readable without being obtrusively long. So again, I guess if you do a lot of it being able to use reserved chars in a variable name could be useful...
Thanks for explaining what P(a|b) is, I've never come across that before :)
Colors are not really part of the argument.
I'm talking more about what .doc files are to .txt files, but noting that part of the final look should be determined by personal settings. Like in the text vs graph editor example. Or in your case, having or not having syntax highlighting.