ComputerPionee.rs or: How six hours turned into six months
Andrew Buntine Aug 18 '16
Several months ago I made an off-the-cuff comment to my girlfriend along the lines of:
I want to be able to see a "cloud" of computer pioneers sized relative to their impact in a given sub-field of computer science
At the time it felt like a simple enough project. I'd just grab some pre-packaged data and whip up a little circle-packing algorithm. Or maybe I'd just utilise some existing framework for data visualisation like D3. It'll probably take a few hours... Right?
The first hurdle was the data. I could find many sources of data for achievements in the field, but nothing that was both consumable and sufficiently complete. Of the sources I found, all simply exclaimed that an achievement was exactly that: an achievement. None attempted to purvey the relative impact of each achievement - probably because of the inherent risk in doing so. And by risk I am referring to the fact that achievements can be difficult (or impossible) to truly compare to one another. Was Grace Hopper's FLOW-MATIC compiler more impactful than Robin Milner's ML? Would one have even happened without the other? Oh no, it's all getting too hard! In either case, I decided that I'd need to collect my own data in my own format and hope that the end result was both sufficiently fair and sufficiently accurate.
Three hours in and I was up to pioneer #4. Great, I though, this is going to take a while... Although, I must admit, part of me was silently kind of excited as I knew doing it manually would force me into actually learning a tonne of interesting things about the past achievements of the giants in my field. And as an amateur history buff, such a challenge is not entirely uninteresting to me!
But moving onto the actual circle-packing. And, in case you are unsure, when I say "circle packing" I mean (to quote wikipedia):
The arrangement of circles (of equal or varying sizes) on a given surface such that no overlapping occurs and so that all circles touch one another
Think something like this (spoiler alert!) or this. It turned out to be far more interesting (and challenging) than I had first imagined. I quickly decided against using an off-the-shelf data viz library as I wanted deep control over the eventual effect, didn't want to burden the user with unnecessary dependencies and, heck, I wanted to learn something interesting!
As it turns out, certain forms of circle packing are considered NP-hard. But considering we are dealing with a visualisation we are not trying to find the "best" pack possible, but simply a pack that is visually "good enough" to the Human observer. Lucky us! The eventual algorithm I came up with was one that performs successive refinements by applying an attractive force to each circle pulling them to the center point of the canvas, which is then slightly over-powered by a detractive force when two circles collide. The exact amount of refinement passes we perform is a function of the number of circles we are packing. And to add padding of
N pixels between each circle we simply pretend each circles radius is
N/2 larger than it is in reality. An upside to this approach is that it makes animating the whole thing trivial; it's just a matter of repositioning every circle on each iteration of the algorithm.
As for tools I decided to keep everything simple. The website is implemented in TypeScript and Python. In place of a potentially gratuitous frontend framework I have a simple dispatch pattern (who I am sure the Gang Of Four has a nice name for which I've forgotten). State is handled by a simple dictionary of three items:
operation . I did make use of the (very handy) Snap SVG library in order to wrap the physical generation of SVG elements at runtime. I'll leave my analysis of TypeScript as a viable language to another post.
During this project I realised just how much effort goes into the collection and curation of data. I must have spent at least 50 hours scouring the web for birth dates, suitable pictures and details of achievements. And then, of course, I was struggling with trying to "rank" achievements that I quite simply couldn't truly appreciate. For example, in 1971, Stephen Cook published a landmark paper that now-famously proposed the P vs. NP problem. This is still an unsolved problem in computer science. I know what this problem states as far as the practicing programmer needs to and I also know it is incredibly important. But without recent education in theoretical computer science it's somewhat difficult for me to feel confident in exclaiming just how important it is to the field. Hopefully over time others will help me correct some of my mistakes in this area.
Because of the amount of time it took me to collate the data, I felt it was important that it was presented back to the community in a parsable format. And so I've packaged up the current data set in CSV format and hosted it on the official repository.
Anyway... Without further ado, I present: The Pioneers of Computer Science