A lighter week for this issue.
Where Tcl and Tk went wrong (2014) - an interesting read about an interesting language. Tcl was the first dynamic/scripting language I used in a production setting. It's a very elegant one, similar to Lisp, and it's a shame it hasn't caught on more. The author goes into some detail about the economic and "political" issues which led to the language's demise. What stands out, I think, is the language design team's failure to commit to either the batteries-included approach of Python, Ruby, etc. or the embeddable-language approach of Lua, JavaScript, etc.
The rise of test impact analysis (2017) - talks about advanced ways of running tests. Test Impact Analysis tries to figure out which tests would be affected by a given set of changes, and runs only those. The idea is to speed up the testing phase in cases where there's a large burden of tests. A minimal sketch of the idea is below.
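As a rough illustration (my own toy example, not code from the article), the core mechanism is usually a coverage map recorded on a previous full run, mapping each test to the source files it exercised; on a new commit, only tests whose covered files intersect the changed files are run:

```python
# Hypothetical test impact analysis sketch: coverage_map comes from instrumenting
# a previous full test run; changed_files comes from the diff being tested.

def select_impacted_tests(coverage_map, changed_files):
    """coverage_map: dict of test name -> set of source files it touched."""
    changed = set(changed_files)
    return [test for test, files in coverage_map.items() if files & changed]

coverage_map = {
    "test_login": {"auth.py", "session.py"},
    "test_billing": {"billing.py", "invoice.py"},
    "test_profile": {"profile.py", "session.py"},
}

# A change to session.py only impacts the tests that exercised it.
print(select_impacted_tests(coverage_map, ["session.py"]))
# ['test_login', 'test_profile']
```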
The infrastructure behind Twitter (2017) - a very high-level overview of the infrastructure systems used at Twitter. I found the graphs with the percentages of machines dedicated to certain tasks interesting. Otherwise, there are a lot of pointers to the interesting tools Twitter has put out.
Jeff Dean's lecture for YC AI (2017) - interesting mostly because it's Jeff Dean. Otherwise it's an overview of what the Google Brain team has been up to these last few years. It's quite impressive, but there isn't that much to take away other than that more machines help.
Damn cool algorithms: log structured storage (2009) - an alternative to the B-Tree plus write-ahead log combination used by storage engines. It's also a close relative of log structured merge trees, of BigTable and HBase fame. The idea is similar to that of an in-memory persistent data structure: instead of modifying data in place, you create new versions of it and update the indexing structure to reflect that. On disk, you need to organize the whole process through a log structure, so you have to worry about copies and space consumption a bit more. A toy sketch of the idea follows below.
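To make the "never modify in place" idea concrete, here is a minimal sketch (my own toy example, not from the article, with an in-memory list standing in for the on-disk log): every update appends a new version of the record, the index is moved to point at the latest copy, and superseded copies linger until compaction rewrites the log.

```python
import json

class LogStore:
    def __init__(self):
        self.log = []        # append-only list standing in for the on-disk log
        self.index = {}      # key -> offset of the latest version in the log

    def put(self, key, value):
        offset = len(self.log)
        self.log.append(json.dumps({"key": key, "value": value}))
        self.index[key] = offset            # index now points at the new copy

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        return json.loads(self.log[offset])["value"]

    def compact(self):
        # Rewrite the log, keeping only the latest version of each key.
        live = {key: self.log[off] for key, off in self.index.items()}
        self.log = list(live.values())
        self.index = {key: i for i, key in enumerate(live)}

store = LogStore()
store.put("a", 1)
store.put("a", 2)          # the old version of "a" stays in the log; the index moves on
print(store.get("a"))      # 2
store.compact()            # garbage-collects the superseded copy
```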
Log structured merge trees (2015) - a presentation on how log structured merge trees work. This is an alternative data structure to B-Trees and their friends for use in database systems. BigTable, HBase, MongoDB etc. all use or can use them. Whereas B-Trees organize data as a tree, which requires several writes for every mutation and thereby limits write throughput, LSM trees organize data as a series of files, chronologically arranged. Inside each file, the data is sorted by primary key. Writes first go to an in-memory sorted tree and a secondary WAL; once this gets too big, it is flushed to disk as another of these files. Write throughput is large, as each mutation corresponds to a single sequential write (in the WAL). A read first checks the in-memory tree, then each file in reverse chronological order. The check within a single file is quite fast, as there's just a binary search to do, but since many files could be checked, a read is heavier than a write. Periodic compaction needs to happen in order to keep the file set small. Read throughput is not that great even with compaction, in principle, but large memory caches for reads largely alleviate the problem. The article is well worth a read, and contains extensions and pointers to relevant literature. A toy sketch of the read and write paths follows below.
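Here is a toy sketch of those read and write paths (simplified and hypothetical: the flush threshold is arbitrary, the WAL is a plain list, and compaction is omitted):

```python
import bisect

class LSMTree:
    def __init__(self, memtable_limit=4):
        self.memtable = {}        # stand-in for the in-memory sorted tree
        self.wal = []             # stand-in for the write-ahead log
        self.sstables = []        # "on-disk" files: lists of (key, value) sorted by key
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))          # one sequential write per mutation
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Dump the memtable as a new sorted, immutable file and reset it.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []

    def get(self, key):
        if key in self.memtable:               # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):  # then each file, newest to oldest
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)  # binary search within a file
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None

tree = LSMTree()
for i in range(10):
    tree.put(f"k{i}", i)
print(tree.get("k3"))   # 3, found by searching one of the flushed files
```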