I wrote my first literate program in November of 2015. Since then, I've been
writing literate programs on an almost daily basis. It has been the
kind of enlightening experience that I haven't had in a long
time. Not only is it a lot of fun to write literate programs, I feel
like I have gained a new "super power" of sorts.
Below are my thoughts on my experiences with literate programming so
far.
Literate programs I've written to date
Project name | Lines of code | Words in literate document | Ratio of words to lines |
---|---|---|---|
Token Cleaner | 46 | 496 | 10.78 |
Okta Sign-In Widget | 120 | 3862 | 32.18 |
Okta SCIM Beta | 360 | 7966 | 22.13 |
Dial-a-cat | 370 | 3620 | 9.78 |
Okta OIDC Beta | 752 | 7731 | 10.28 |
Explanation:
- Lines of code is the number of lines of "code" generated by
the literate program. "Code" is in quotes because it can be
anything in an Org Babel "source" block: Python,
Shell, HTML, etc. This number comes from running this command on
an Org Babel file:
awk '/BEGIN_SRC/,/END_SRC/' $filename
- Words in literate document is the number of words in the
file. Note that it includes the source code as well. This number
comes from running this command on an Org Babel file:
wc -w
- Ratio of words to lines is: Words in literate document divided by Lines of code
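The three columns above can also be computed in one pass. The following is a minimal Python sketch, not the commands used for the table: the function name `literate_stats` and the sample text are hypothetical, and it assumes that the `#+BEGIN_SRC` and `#+END_SRC` marker lines themselves should not count as code (the awk range shown above prints them as well).

```python
import re

# Org Babel source block markers; Org accepts them in either case.
BEGIN_SRC = re.compile(r"^\s*#\+BEGIN_SRC\b", re.IGNORECASE)
END_SRC = re.compile(r"^\s*#\+END_SRC\b", re.IGNORECASE)


def literate_stats(org_text):
    """Return (lines_of_code, words, words_per_line) for an Org Babel document."""
    code_lines = 0
    in_block = False
    for line in org_text.splitlines():
        if in_block:
            if END_SRC.match(line):
                in_block = False
            else:
                code_lines += 1  # a line inside a source block
        elif BEGIN_SRC.match(line):
            in_block = True
    # Whitespace-separated words in the whole file, source code included,
    # which is also what `wc -w` counts.
    words = len(org_text.split())
    ratio = words / code_lines if code_lines else float("inf")
    return code_lines, words, ratio


# Hypothetical sample document, for illustration only.
SAMPLE = (
    "Intro text here.\n"
    "#+BEGIN_SRC python\n"
    "print('hi')\n"
    "x = 1\n"
    "#+END_SRC\n"
    "Done.\n"
)

if __name__ == "__main__":
    print(literate_stats(SAMPLE))
```

Excluding the marker lines is a judgment call; counting them instead would raise the line count by two per source block and lower the ratio accordingly.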
What I like about literate programming
The expressiveness of literate programming
I feel hobbled by in-line code comments now. One of the clearest
benefits of literate programming is being able to weave my code
into large sections of prose, instead of interleaving small
sections of prose into code.
Now I find myself spending time adding detailed references and
explanations to my code. One example of this is including a
curl command, links to RFCs, and a note about "1-indexed" versus
"0-indexed" differences in the "Resource Paging" section of my
Okta SCIM Beta document. Another example is being able to cover a
function line-by-line, as well as include unit tests in-line in
the "Validating an OIDC id_token from Okta" section of my
Okta OIDC Beta document.
Ability to generate data from tables
Take a look at the table listed in the "Dependencies" section of
the Okta SCIM Beta document. Then take a look at the
requirements.txt file in the same repository. The dependencies
table and the requirements.txt file are both generated from a
single table in Org Babel!
This is all made possible by the ability to write in-line code to
generate the requirements.txt file and code to generate the table
markdown for GitHub Pages.
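The idea of one table feeding two outputs can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the Okta SCIM Beta document: the rows in `DEPENDENCIES` are placeholders rather than the real dependency list, and the function names are hypothetical.

```python
# Single source of truth: (package, pinned version, description).
# These rows are placeholders, not the real Okta SCIM Beta dependencies.
DEPENDENCIES = [
    ("example-web-framework", "1.0.0", "Serves the HTTP endpoints"),
    ("example-orm", "2.3.4", "Talks to the database"),
]


def requirements_txt(rows):
    """Render the rows as pip requirement lines, one pinned package per line."""
    return "".join(f"{name}=={version}\n" for name, version, _ in rows)


def markdown_table(rows):
    """Render the same rows as a Markdown table for the published document."""
    lines = ["| Name | Version | Description |", "|---|---|---|"]
    lines += [f"| {name} | {version} | {desc} |" for name, version, desc in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(requirements_txt(DEPENDENCIES))
    print(markdown_table(DEPENDENCIES))
```

Because both renderers read the same rows, adding or bumping a dependency in one place updates the requirements file and the documentation together.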
Being able to include a Copyright notice across multiple files
In the Okta SCIM Beta document, I am able to define a copyright
notice in one place and have that copyright notice written to a
LICENSE.txt file and have the same notice included into the
scim-server.py file.
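As a rough sketch of the same "define once, emit twice" pattern in plain Python: the notice text, the function names, and the choice to render the notice as `#` comments in the source file are all assumptions made for illustration, and nothing here is taken from the actual document.

```python
# The notice is defined exactly once. This text is a placeholder.
COPYRIGHT_NOTICE = (
    "Copyright (c) YEAR Example Author\n"
    "\n"
    "Placeholder license text goes here."
)


def as_license_file(notice):
    """Return the contents of a LICENSE.txt file: the notice, newline-terminated."""
    return notice.rstrip("\n") + "\n"


def as_python_header(notice):
    """Return the notice as a block of '#' comments for the top of a .py file."""
    commented = []
    for line in notice.splitlines():
        # Blank lines become a bare '#' so the header stays one comment block.
        commented.append(f"# {line}".rstrip())
    return "\n".join(commented) + "\n"


if __name__ == "__main__":
    print(as_license_file(COPYRIGHT_NOTICE))
    print(as_python_header(COPYRIGHT_NOTICE))
```

Editing `COPYRIGHT_NOTICE` changes both outputs, so the license file and the source header cannot drift apart.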
Having shell commands, and output, all in one place
When I was writing the Okta OIDC Beta document, I spent a lot of
time finding the arcane openssl incantations needed to generate
x509 certificates that I needed for a unit test. Because I was
writing a literate document, I was able to include all of the
commands that I used, along with a detailed explanation of what
each command did.
Being able to execute these commands and see their output without
needing to switch to a terminal made it really easy to
interactively explore the commands that I needed to generate the
certificates, and to keep track of those commands for the next
time that I need to generate similar certificates or keys.
Tracking time and future work in the same document
Since Org Babel is part of Org Mode, it is trivial to make use
of Org's time tracking and project planning features.
I usually keep track of my time in my usual Org setup, but when
appropriate it's nice to be able to keep track of my time inside
the same literate document that I'm working on.
You can see an example of time tracking and project planning in
the dial-a-cat document. (Look for the lines with CLOCK, DONE,
or TODO.)
Easily keep code and prose consistent
Keeping code and prose synchronized is a problem that I
first encountered when I was writing blog posts for Twilio.
Searching for this "holy grail" is what first got me interested in
literate programming. What I wanted was a tool that could generate
Markdown and code from the same file, which is exactly what Org
Babel can do.
I was sold on Org Babel after I converted my favorite
blog post on the Twilio blog to a literate document, and
found an inconsistency in the process.
Now, because I know that my code and prose will always be
consistent, I find myself making quick changes to code that I
wouldn't have made otherwise.
What is the best way to describe code?
Now that I have a tool which makes it easy to write an "essay"
about my code, I'm faced with new questions that I never had the
luxury of considering before.
I imagine that satisfactorily answering the question "how can I describe this code best?"
is a lifelong endeavor.
Here are the specific questions that I find myself facing at this moment:
Literate documents are harder to refactor
Once a section of code is embedded in prose, it becomes a lot
harder to change that code.
Because of this, I find that the best way to write a literate
program is to start with the code first and wait to turn the code
into a literate document after extensive refactoring.
The structure of a document wants to change as a project grows
Describing a short program can be as easy as describing the
program from "top to bottom", as I do in my
token cleaner project.
However, as a project grows, I'm finding that the structure I used
to describe a small project will change when it becomes a
medium-sized project, and change again when it becomes a large project.
Should I describe every line of every program?
Another thing that I've been struggling with is figuring out the
right balance between describing a program line-by-line and
skipping large sections of code entirely. I feel like the ideal
method will lean more towards the "line-by-line" approach, but I
think I need more hands-on experience in this area.
Closing
Literate programming has been a very worthwhile discipline to learn.
I still have a lot of things to learn, and I look forward to
getting better at writing literate programs in the decades ahead.
I'd love to hear any feedback that you have on this document. Please
reach out to me on Twitter!
- Joël Franusic, May 2016
Top comments (4)
One of my co-workers (long before I knew him) was a grad student under Donald Knuth. He got to experience literate programming right from the tap, over a long time.
Overall, he found it unworkable, with the tooling, toolchain, and debugging being some of the biggest hindrances.
The grad student's name is Raymond Chen. Joel Spolsky said this of Raymond, "The only person in the world who leapt to my defense was, of course, Raymond Chen, who is, by the way, the best programmer in the world, so that has to say something, right?"
Joel Spolsky calling Raymond Chen the best programmer in the world is quite the compliment. (And if you know Raymond, or know of him, it is an accurate description.)
I'll be the first to agree that literate programming has its faults. As the "literate" in the name implies, I think that the practice is best used when writing something to "publish" - I do most of my literate programming for blog posts or documentation. The tooling and toolchain have improved, I think, with Org-Babel - I tried to use web/cweb briefly, but couldn't get them working. Debugging hasn't been much of an issue. What I've found to be difficult, however, is refactoring. When writing a literate document, I've always started with the standard toolchain and setup. I only start writing the literate document after I have code that I'm happy with, breaking apart the code into an Org-Babel file so I can describe it.
I would try it in combination with BDD. They appear to have similar goals. If you generate a user manual from it, then the tests can be examples if you design them carefully. It's just an idea though.
I would do something like this if I wanted to include the code too. I am not fond of the idea of breaking code blocks with text and documenting each line. It makes the code impossible to refactor. Even in this case the "function that returns true or false when ..." might be too much implementation detail.
TODO sample:
We want to describe tasks we intend to do with the description text and the expiration date.
We should be able to decide if the task is beyond the deadline. In order to do that we need a function,
which should return true when the task is beyond deadline,
and it should return false when the task is not beyond deadline.
END
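One way the function described in this sample could look in Python is sketched below. The `Task` class, its field names, and `is_beyond_deadline` are hypothetical names chosen for illustration, and the sketch assumes that a task whose deadline is today is not yet beyond its deadline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """A task with a description text and an expiration date."""
    description: str
    deadline: date


def is_beyond_deadline(task, today):
    """Return True when the task is beyond its deadline, False otherwise.

    Assumption: the deadline day itself still counts as on time.
    """
    return today > task.deadline


if __name__ == "__main__":
    task = Task("Write the literate document", date(2016, 5, 31))
    print(is_beyond_deadline(task, date(2016, 6, 1)))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the example deterministic, so the two cases in the sample can double as tests.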
I think literate programming has its dangers too. I prefer clean code, where the variable and function names carry the meaning, not the comments or texts. If we add text, then we may neglect variable naming, and I am afraid we will end up writing something like math, with variables named x, y, z, and let the text alone explain everything. On the other hand, if we start to think about the proper variable names, we might find out that our initial model is wrong and we need something else. Yet another twist is that the text can help name the variables and functions. E.g. instead of isExpired we could write isBeyondDeadline. I usually do that sort of thinking in GitHub issues or txt files, but you can think it through in UML or without writing a line too. It just depends on how you prefer to think. I usually write down almost everything.
I'm creating a tool for teaching real programming, rather than something like Scratch that is for kids only. It should also be productive (not a substitute for an IDE but a visual complement) and scalable enough for on-the-job training and real code. I am actually experimenting with dogfooding it on 25000 lines of code for a Figma plugin (not ready yet). Below is a tiny illustration. I like the philosophy of literate programming and I'd like to stay as close to it as possible, but I'm not a specialist in it, so what do you think of this illustration: what would be missing? i.imgur.com/sU8aiV6.png