This is something I've been thinking about for a while. Why are we creating programming languages for humans, rather than for an IDE, allowing the latter to represent the program in a human-readable format?
If you're doing math, text sucks.
c = sqrt(pow(a, 2) + pow(b, 2))
It gets worse with non-primitives such as vectors in languages that pass structs around by reference. If you want to avoid expensive memory allocations, you're stuck with:
add(c, pow(temp1, a, 2), pow(temp2, b, 2))
These are just the simplest formulas. Shader code I wrote back when I did graphics engine development was a nightmare to debug because it was so obtusely represented.
Then there's variable naming. Taking an example from math again:
P(A|B) has a precise and well-understood meaning, plus it's concise and easy to recognize. Unfortunately, the majority of programming languages don't allow us to name a variable P(A|B). Similarly, camelCase vs snake_case is a debate centered entirely on the inability to use spaces in variable names.
Another problem, posted a while ago: how can we code on a smartphone? Virtual keyboards are not great for code, and graph-based solutions might work, but I'm pretty sure not many people like drawing graphs on their desktop.
Tabs vs spaces is another debate based entirely on the assumption that code is text. Meanwhile, either approach is a severely outdated alignment tool.
In each of these cases, the key problem is that there is a tight coupling between formatting and semantics. I.e. we display the code as plain text, and that's what it is.
Ligatures and custom operators offer some help but are ultimately tacked-on solutions that fail to address the core problem.
So what if we changed our files from being plain text to something richer and more structured? By removing the coupling of formatting and semantics this way, we can also use wildly different formatting (e.g. a graph editor) on different systems, but modify the same underlying semantics.
What do you think?
Top comments (51)
You're missing the point: text files are easy. They can be a little verbose sometimes, but as a 'substance of expression', if you will, they're unmatched. You can create or modify text with the single most common human-interface device metaphor in existence, which children now learn to manipulate in or even before schooling. You can read text with a million different programs and style it, filter it, format it, cut and paste and perform a billion different operations on it. A certain clunkiness in writing out equations is a small price to pay for that kind of flexibility.
If you were to "decouple" formatting and semantics like you propose, you wouldn't be able to avoid coupling the semantics to the editor instead. And there'd only be one editor, until you developed something else that could interact with your structured representation of a syntax tree. This isn't to say it can't or shouldn't be done -- APL had its day, educational tools like Scratch do well, and there are of course a variety of "no programming experience required" flowcharting and modeling languages -- but it's an idea that can only compete against text files in some pretty specific niches.
Not necessarily: CSS/HTML is also a decoupling of formatting and semantics. The main point here is that the semantics on their own are no longer supposed to be read by humans.
HTML and CSS also aren't programming languages as such but markup and styling languages (although HTML5+CSS3 are evidently Turing complete if you're willing to put in the effort). But this is actually an interesting point: WYSIWYG markup editors which truly decoupled formatting or visual layout from the semantics of data binding and interaction were a thing in the 00s before everybody realized they were terrible and concentrated on building better plain-text templating languages instead.
True, but I wasn't suggesting using exactly those. In most languages, a program has a certain structure (rather like an AST, but not entirely). That is the "meaning" of a program. Where HTML has an <img> tag, an imperative program has a while loop.
Indentation, variable names, operators, import statements... those are all styling for humans. You could drop all of them and the language would be just as expressive, but not as readable.
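To make the formatting part of that claim concrete, here is a minimal Python sketch using the standard ast module. It assumes Python as the example language (the thread does not name one), and the helper name structure is made up for illustration. Two differently formatted spellings of the same statement parse to the same tree.

```python
import ast

# Two spellings of the same statement: different spacing, redundant parentheses.
compact = "c=sqrt(a**2+b**2)"
spaced = "c = sqrt( (a ** 2) + (b ** 2) )"


def structure(source: str) -> str:
    """Return a dump of the syntax tree, without line/column positions."""
    return ast.dump(ast.parse(source))


# Whitespace and redundant parentheses leave no trace in the tree.
same_structure = structure(compact) == structure(spaced)
print(same_structure)

# Names are different: Python's tree stores them, so a rename changes the dump.
renamed_differs = structure("c = a") != structure("d = a")
print(renamed_differs)
```

In this particular tree format, names are still part of the structure. Treating them purely as styling would need a storage format that refers to variables by identity rather than by spelling.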
This separation already exists. The source code defines the semantics, and the configuration of the editor describes the styling.
I think looking at Smalltalk will answer a lot of your questions. Smalltalk is exactly what you're describing: a language built for an IDE. It can't be used outside the IDE, as it doesn't store its data in plain text, but rather in an image format. The IDE provides all sorts of nice features and analysis to the user, and ideas from Smalltalk have influenced many other languages and IDEs since its creation in the 1970s. So why aren't we all using Smalltalk? I think the key lies in interoperability. Smalltalk is a world of its own. It doesn't interoperate well with tools that exist outside of the Smalltalk world. For example, you can't really benefit from git when you can't understand how to merge code in a binary image format. When I need to accomplish a specific task (say, some sort of build task), I need to know how to accomplish everything I need in the Smalltalk world, using its tools. Smalltalk goes against the Unix philosophy: it doesn't do one thing, it does everything, because it's a mini virtual machine.
I don't have the authority to say if this is the entire reason we don't code in Smalltalk-like languages, but I think it's part of it. There are plenty of new languages trying to push programming languages in new, interactive directions (e.g. Eve), but none of them have gained critical mass or mind share. There must be some intrinsic reasons that languages like this don't take off. Hope this provided some insight!
I'm not an expert on the question, but as far as I know the main reason for Smalltalk's failure was its licensing model: it was very expensive. I consider Ruby the closest reincarnation of the Smalltalk OOP model (and it is widely used). Ruby doesn't have a "forced" IDE, though.
I've honestly never used Smalltalk, guess I will have to take a look at it.
I haven't used it much either, but what I've seen has been interesting. Pharo is a pretty modern implementation: pharo.org/.
I devoted my entire PhD to the pursuit of a programming environment that goes beyond just text. At the core, I decouple what developers read and write from what's stored on disk. This enables significant enhancements to both the development UI and the program code compared to text-based systems.
I developed a prototype IDE called Envision to explore these ideas. Here is a YouTube playlist with 5 short videos highlighting features you might find interesting.
In case you want to dig into the research here is the project page at ETH Zurich that has freely available PDFs of all our publications. I recommend just looking at the final PhD dissertation, as it contains extended versions of all the papers:
Envision: Reinventing the Integrated Development Environment. All the publications (and especially the dissertation) contain lots of screenshots that illustrate the main points.
As of now, Envision is somewhat on hold, as I've finished my PhD, but I hope to get back to developing it more actively soon. You can find Envision's code on GitHub.
Although I think we definitely should look for alternative ways to program, the concept of Envision is IMHO a dead-end. I've seen approaches like this before, one has even made it into a product I had to use at work. Everybody hated it. Here's the problem:
As long as code is text in your thoughts, there's no better visualisation for it than (syntax highlighted) text. Graphical representation of text is utter shit. It makes it hard to write, hard to version/diff, and hard to view anywhere else than in the IDE it has been developed in.
If you want to abstract code, don't try to display code. Slice code into packages/modules, display them as icons and orchestrate them! That's the way to go. I'm pretty sure about it. However you then have added the complexity of another abstraction layer.
Thanks for your comments, Thorsten. I am very curious what product you used that everybody hated, would you mind sharing?
I have myself used a few visual systems at work, such as LabVIEW and Siemens Plant Simulator. It's true that these systems are not easy to use outside of their specific domain. Unlike these systems, Envision has been designed from the ground up to be generally applicable.
Regarding some of your other points:
Software developers do not "think in text". Developers think in abstractions (such as classes, functions, modules), control flow (branches, loops), data flow (steps of algorithms and data transformation), etc. Thinking in text would imply that the syntax of a language (as opposed to its semantics) may somehow influence the design of a system or a function, which is not the case.
Once we have decided on a design we have to create a corresponding program. This is mostly done as text, but doesn't have to be. As long as whatever editor we're using nicely maps to our mental model, things can work out smoothly.
Syntax highlighted text is in fact a basic form of a visually rich presentation of code. One way to think of the visual aspect of Envision is syntax highlighting on steroids.
Graphical representation of "text" might be utter shit, but we're talking about graphical representation of programs. For example:
Graphical representations absolutely include text where it's the best way to communicate something. E.g., in most cases showing expressions as text is a great option.
We have specifically designed Envision to support keyboard-based editing and shown that it is as fast as typing in a text editor.
Again, we have specifically designed a version control system that integrates with Git and provides a number of improvements over standard text-based diffs, both in terms of presentation and diff accuracy. You may want to watch the corresponding video and/or see the paper.
As long as the storage format is open and simple (both of which are the case for Envision), any number of editors can be made for it and show it in any number of ways. Take, for example, png image files. You can open/view/edit them in a number of different programs, each with its own strengths and weaknesses.
I agree. This is part of what Envision does.
This is true, but I see it as a strength, not a weakness. This extra layer allows us to decouple the backend (program structure/code) from the frontend (editor/visualizations/text) and enables both to evolve in ways that are impossible if they are coupled.
Thank you for your detailed reply. The "tool from hell" is SwissRisk's X-Gen, a transformation tool being used at some banks. One might think that it's highly specialised for orders and trades, but it's rather generic and can handle any data as long as you're using XML. But here's the point: the design philosophy of the IDE seems to be based on the assumption that typing (like on a keyboard) is bad. Unfortunately that's the only "innovative" idea, thus the graphical building blocks that you can drag'n'drop in the IDE are in fact just representations of elements of structured programming. What does that mean? Well, in order to program something like this...
...which you can type in a matter of seconds, you will have to complete the following steps in X-Gen:
You see, this tool really represents text as graphics. It does not even try to step up onto the next abstraction layer, it just makes it really hard to write code by disabling typing for all the keywords.
Why did I write "code is text in your thoughts"? After a long day of coding it happens that I dream of code and then I really see text. Syntax highlighted code. But that's probably just because I stared at it for countless hours. It's not what I think when I am working on code. So yes, I was wrong. Developers think in abstractions, I totally agree with you on that.
I guess this all leads to the question: What's a good abstraction layer for graphical representation? I'm pretty sure the answer is "it depends". When documenting/presenting I like to use Visio diagrams (and ASCII diagrams) for giving an overview of the system I'm working on. However, these diagrams have very different levels of detail, depending on the importance of the components for the audience. So a shape can represent a bunch of hosts (not important) as well as a single function or REST call (important). An IDE, on the other hand, should present a consistent level of abstraction with a similar level of detail for all (technically) equivalent components.
I'll definitely check that dissertation out!
Donald Knuth described Literate Programming in 1979.
One of my coworkers at a previous company was Raymond Chen. As a grad student under Donald Knuth, he got to program using Donald Knuth's Literate Programming.
Raymond recommends against that style of programming.
Why does Raymond recommend against that style of programming?
Tooling is very poor. Debugging is very difficult. Documentation-and-code still become out-of-sync just as comments-and-code become out-of-sync, despite proximity (in both scenarios).
So there are people working on the leading edge. Maybe one of those concepts will become mainstream.
Finally, someone else who gets it! "Code as text" as a paradigm feels painfully outdated. It seems so obvious that we can do better. The comments here are a pretty good guide to what pitfalls we'd need to avoid:
don't be Scratch, interop with GitHub, find a way to leverage whatever the hell the vim power-user community is. Don't just be literate programming. It feels doable, though.
Have you ever tried the Lightbox IDE? It lets you put print statements in your code and see what they evaluate to on an example input inline, for multiple test cases, as you edit. It's a big step towards the feel of programming in a spreadsheet while using a real language.
I haven't, sounds interesting!
And here I got the name wrong - it's Light Table, not Lightbox
This sounds a lot like the structure editor Facebook was working on, but extending the idea further so that limits on the contents of a node (e.g. a variable name) are removed. Very intriguing!
That's what it is to us - obviously to the interpreter/compiler/running process it's something else.
What is that something else that we could use to represent a program that is at the same time a plain text file? Some form of data format that could also be read as the AST of the program...
Say, you've got an MSc in AI, so you must've heard of a once-popular AI language called Lisp at some point? You know, the one where the code is the data and the data is the code? Where you can see the AST right in front of you because of the ridiculously simple syntax?
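A rough sketch of the "code is data" idea, written in Python rather than Lisp so it can sit next to the other examples here. The nested-list mini-language, the OPERATORS table, and the function names evaluate and rewrite_squares are all hypothetical, made up for illustration. The program is an ordinary list, so the same value can be both evaluated and rewritten.

```python
import math
import operator

# Hypothetical mini-language: a program is a nested list [operator, arg1, ...],
# mirroring the shape of a Lisp s-expression. This is not real Lisp.
OPERATORS = {
    "+": operator.add,
    "*": operator.mul,
    "pow": math.pow,
    "sqrt": math.sqrt,
}


def evaluate(expr, env):
    """Evaluate a nested-list expression against a dict of variable values."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    head, *args = expr
    return OPERATORS[head](*(evaluate(arg, env) for arg in args))


def rewrite_squares(expr):
    """Replace every [pow, x, 2] with [*, x, x] using plain list manipulation."""
    if not isinstance(expr, list):
        return expr
    rewritten = [rewrite_squares(part) for part in expr]
    if rewritten[0] == "pow" and rewritten[2] == 2:
        return ["*", rewritten[1], rewritten[1]]
    return rewritten


# The formula from the post: sqrt(pow(a, 2) + pow(b, 2))
program = ["sqrt", ["+", ["pow", "a", 2], ["pow", "b", 2]]]
values = {"a": 3, "b": 4}

print(evaluate(program, values))
print(rewrite_squares(program))
print(evaluate(rewrite_squares(program), values))
```

The tree is visible in the source itself, which is the property being pointed at: no separate parser stands between the text on screen and the structure a tool manipulates.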
Image based coding is so last century sadly, despite the most popular IDE in the world being Microsoft Excel. Plain text is... well, plain. No real worries about reading and writing - or forwards compatibility. Even Smalltalk can be represented as a text file.
Try something from Wolfram
I'm not really sure what you're trying to say here...
Fair enough... reading it back I'm not sure either!
Have you tried LabVIEW?
Aren't some parts of MATLAB supposed to help with this sort of thing? (I'm not experienced with MATLAB, it's an assumption based on what I've heard about it.)
But... LabVIEW is awful: if you need to refactor, it's very difficult. If you need to debug, forget it. Plain text code is easy and perfect for standard software development. For scientific development (i.e. mathematics, graphics), which involves complex equations, I would expect there are libraries which allow you to express math formulas as plain code?
Why would you want to use spaces in variable names?
Why is snake Vs camel case a problem?
If you have a variable which holds the value for P(a|b), then use a creative name which says what that value represents. I don't know what that expression is, so assume it's something like ambient_pressure. I don't care how its value is calculated; the name is descriptive of what it is.
A huge problem in code which I deal with on a daily basis is reading stuff like this (Python syntax):
I mean, what the hell is that? No comments, nothing, and the guy who wrote it left the company!!! I now have to go search where it's used and try to interpret its use to understand this function's purpose... So it turns out it means:
Convert compensation voltage to derived dispersion field
So the name is totally rubbish. Naming stuff is one of the hardest things in writing software because it describes what you are doing. When you look at some complex equation you will "read" it, so text should also be able to be used to describe it.
I've used MATLAB a little bit. I might have missed something, but I think it made the problem worse by just turning everything into non-standard operators to stay within the ASCII characters and monospace/text format.
LabVIEW I know nothing about.
Because we create variable names composed of multiple words.
fooBar is less readable than
foo_bar is less readable than
foo-bar is less readable than
foo bar. Spaces are also the easiest to write. The reason not to use spaces is that they conflict with syntax. There are also some gestalt principles (the characters of a variable are close together), but there are other options for that.
P(a|b) is a mathematical notation for "probability of a being true given that b is true". That's a lot of words to write out. This was a real-world problem I've had, especially because I also needed P(a|¬b) and many similar variables. The resulting code was unreadable using full-length variable names.
The meaning of P(a|b) is well understood by people who have a minimal background in Bayesian statistics. So, essentially, it is the right name.
More generally speaking though, because variable names are styling for humans, you could have multiple names for the same variable and use whatever suits you most in a certain situation, e.g. short or long. Although both at the same time sounds like a very bad idea :-P.
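Here is a minimal sketch of what "multiple names for the same variable" could look like, assuming a hypothetical storage format in which statements refer to variables by opaque ids and a separate table holds the display names. The ids, the NAMES table, and the render function are all invented for illustration.

```python
# Hypothetical storage format: statements refer to variables by opaque ids,
# and a separate table holds several display names per id.
NAMES = {
    "v1": {"short": "P(a|b)", "long": "probability of a given b"},
    "v2": {"short": "P(b)", "long": "probability of b"},
    "v3": {"short": "P(a,b)", "long": "probability of a and b"},
}

# v3 = v1 * v2, stored as (target, operator, left, right).
STATEMENT = ("v3", "*", "v1", "v2")


def render(statement, style):
    """Render the stored statement as text using the chosen naming style."""
    target, op, left, right = statement
    shown = {symbol: NAMES[symbol][style] for symbol in (target, left, right)}
    return f"{shown[target]} = {shown[left]} {op} {shown[right]}"


short_view = render(STATEMENT, "short")
long_view = render(STATEMENT, "long")
print(short_view)
print(long_view)
```

Both views come from the same stored statement, so switching styles changes nothing about the program itself. Names containing parentheses or spaces are unproblematic here because nothing ever has to parse them.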
In terms of readability using camel vs snake I have to disagree as I've never had an issue reading either syntax, but everyone is different, so for you it's a fair point.
The point you make about naming variables is very true; it's very difficult to map mathematical names to human readable without being obtrusively long. So again, I guess if you do a lot of it being able to use reserved chars in a variable name could be useful...
Thanks for explaining what P(a|b) is, I've never come across that before :)
Text-format math is harder to read, but it is much easier to edit and write.
Navigating a one-dimensional line of text can be done with two buttons; add two more and you can add line-oriented editing, but that's optional. Editing a multi-dimensional equation, like your version of the distance formula, means you have to come up with an interface for selecting just the radical, or just the equation that you're taking the root of; you can't do that with normal arrows and drag-select.
It's the same set of problems that any kind of WYSIWYG has, now that I think of it. Just because source code is read more often than it's written doesn't mean you can completely neglect the editing experience.
A very good point!
I do think this is solvable. E.g. if you write LaTeX markup, there is a line-based counterpart to the formulas in the compiled PDF. If the relation between markup and formula is isomorphic, navigating with arrow keys in the formula is possible, because it is in the markup.
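As a concrete instance, the distance formula from the post could be written as a single line of LaTeX markup. Each brace group corresponds to one nested box in the rendered formula (the radicand, each exponent), so moving the cursor into or out of a group in the markup has an obvious counterpart in the formula:

```latex
c = \sqrt{a^{2} + b^{2}}
```

Selecting "just the equation under the root" is then the same as selecting the contents of the outer brace group.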
This exists, in the form of LabVIEW, Simulink, and other such tools. They are even widely used in industry because they're excellent for expressing complicated mathematics.
However, as others have pointed out, they all suffer in some way or another from portability issues. Until we have an equivalent of the ASCII and Unicode standards for these model-based languages, they simply aren't very likely to gain the kind of traction text-based languages have. The tooling will never get to that point without an open and popular standard.
Additionally, for that kind of programming to kick off, something must be able to take up the role that C and C++ currently fill as the backbone of close-to-hardware software. I don't think it's impossible, but I also don't think there's enough incentive to put in that effort right now. Modern tooling has made C/C++ programming highly productive, and getting a non-text language up to feature-parity and portability-parity would be unjustifiably expensive right now.
Most IDEs are bloated enough already. Moving away from raw text to something with a massive amount of overhead is only going to create other problems. A job I had years ago involved programming in a visual system represented by a tree, where you dragged and dropped objects and edited their properties. It was truly the most awful thing ever. Imagine reading a textbook in the format of a pop-up book.
The main argument against designing a language for an IDE is that it ties it to the IDE. If you spot a simple typo on GitHub you can't change it (if you're even able to view it on GitHub). If you're on a new device you have a text editor, but you don't have an IDE. If you're on a phone you can't read it. You can't share code snippets with others unless they're on the same IDE.
While it's an interesting idea (although not an entirely new one), there are too many practical arguments stacked against it.
I think the more useful approach would be to let the IDE offer features to display the code better (such as a formula view for your above example, which could format it more legibly), integrated into the editing process. One could argue that that's a large part of what IDEs are about.
IDE features would definitely solve some of these problems, but I feel this would be more a case of treating a symptom than fixing the underlying problem.
You're absolutely right about practicality, but sometimes you have to take a step back to make a step forward.
Intentional programming and, to some extent, model-driven architecture went there. The idea with intentional programming was that you had intents that were composed of other intents all the way down to machine code. Instead of defining a language with syntax, you had abstract syntax trees that you could translate into less abstract syntax trees using transformations. Editing your intent could be done in a UI, using a DSL, or anything. At the high end you'd have DSLs, complex UIs, etc. transforming stuff to running code. MDA was more or less the same idea but focused around UML and the UML meta language. The former never really got out of the prototype stage. Charles Simonyi (one of the early MS millionaires and inventor of the infamous Hungarian notation) has apparently been running intentsoft.com/ for the last 20 years or so, but they are not currently promoting any products. MDA was briefly popular and I recall some seriously misguided projects that were using it (shudder).
Also, Eclipse is a descendant of VisualAge, which was a Smalltalk, and later, Java IDE that actually stored code in a DB instead of files to facilitate working with the code in a structured way. Smalltalk of course always worked that way.
Eclipse later went back to storing files, but they did do something cool, which was to incrementally maintain an abstract syntax tree of the code base. This was the compromise that allowed them to stop using a database, and this is why it has its own incremental Java compiler: a normal compiler would be way too slow. The Eclipse compiler tends to only lag behind what you type by a few hundred ms. This is also what enables them to do complex refactorings, quick fixes, and other sophisticated AST transformations.
IntelliJ of course does very similar things, except they never really figured out incrementally updating the AST and instead implemented a lot of the same refactorings a bit differently. As a consequence, it is a lot slower at doing things that are essentially happening in real time in Eclipse.
E.g. launching a unit test after a 1 character change in IntelliJ can be painfully slow, whereas it is instant in Eclipse. I regularly end up waiting seconds or even tens of seconds for IntelliJ to catch up. It also loses the plot entirely quite often, meaning that its view of the world gets out of sync with what is actually in the files. At least it lies to me frequently about things being broken, or worse, not broken. Also, you need to frequently do manual refreshes and rebuilds, and I occasionally just rage quit it so that it can figure out reality on startup. Eclipse always felt a lot faster and more robust in this respect.
All languages should be homoiconic! In which case, you can easily (meta)program some representation.
And I know it's not exactly what you're describing, but check out Ballerina. You can auto-generate sequence diagrams from it (it's "cloud-native", so the assumption is that most of your code can be represented as such).
Now, I think a compromise would be a kind of graphical interface that generates "normal" code, say C#. That way, especially for beginners, it is way easier to use this new IDE. However, if you want to, you can still use a traditional IDE like Visual Studio, if you feel more comfortable with that. In order to implement another visual IDE, you also don't require a new standardized save format: you just use the plain code.
For illustrative purposes say you wish to have both text and graph based 'views'. The two views need to be isomorphic. Generally, programming languages aren't made with that requirement in mind.
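A small sketch of where that requirement bites, using Python's standard ast module as a stand-in for a structured view (ast.unparse needs Python 3.9 or later). The choice of Python and the variable names are assumptions made for the example. Going from text to tree and back preserves the structure, but anything the tree does not store, such as comments, is lost on the way.

```python
import ast

source = "c = sqrt(a ** 2 + b ** 2)"

tree = ast.parse(source)          # text view -> structured view
regenerated = ast.unparse(tree)   # structured view -> text view

# The structure survives the round trip.
round_trip_ok = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
print(round_trip_ok)

# Comments exist only in the text view, so the mapping is not one-to-one.
with_comment = "c = a  # hypotenuse"
without_comment = ast.unparse(ast.parse(with_comment))
comment_lost = "#" not in without_comment
print(comment_lost)
```

For the two views to be truly isomorphic, the structured format would also have to carry whatever the text view can express, comments and layout hints included.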
That said, UE4 has something similar that allows interop between graph and text, although I forgot the name.