Hard things in computer science

#openedge #progress #coding #datadigger

The DataDigger is an open source database browser for developers in OpenEdge. It enables them to view, update, delete, im- and export the data inside the database. The DataDigger is written fully in OpenEdge 4GL and has over 40,000 lines of code. Inside are some real treasures, so I will dissect the DataDigger, to reveal them. Today: caching.

Years ago, at Netscape, Phil Karlton stated:

There are only two hard things in Computer Science:

Naming things, cache invalidation and off-by-1 errors

And he was right.

The first thing I already tackled in 2008, with the first version of DataDigger. It was a fork of Richard Tardivon’s tool DataHack. The second problem manifested itself not until 2013, when in DataDigger 18 I introduced caching to speed up some things. Because, you know, caching *can* speed up things. Big time.

I remember from that version that I was amazed at the improved startup time of DataDigger. The exact figures escape me at this point, but it was in the leage of 10 times as fast. And boy I was glad that I introduced it. But man, did I regret it later, because time after time old stuff kept creeping up from the cache. It took me a lot of time to get it right. Right now, there are basically two caching systems in DataDigger: one for the table definitions and one for the settings. I’ll walk you through these and explain how I managed to fill them and to invalidate them at the right moment.

File definitions

The file definitions take quite some time to gather. For a small database, this goes unnoticed, but if you have – like some of my users – a set of 8 databases with a couple of thousand (yes, thousand) tables, then analyzing them takes a considerable amount of time. On startup, DataDigger first checks the date and time of the last modification to the the database. This file – if it exists – should reside in the cache folder. Lets look at the contents of my cache folder:

The first two files are database cache files. We can recognize them by their name that starts with ‘db’, followed by the name and the date and time of last modification. The date of last modification can be found this way:

FIND FIRST \_DbStatus NO-LOCK. 
MESSAGE \_DbStatus.\_dbstatus-cachestamp VIEW-AS ALERT-BOX.

You can see two files for my sports database. That is because I changed the schema by adding a field. That changed the date of last modification to today. When I started DataDigger again, it looked at the date in _DbStatus and found a value of 10 may 2018, 19:45:42. It then looked for a file with the name ‘db.sports.Thu_May_10_194542_2018.xml’ but since it wasn’t there, it recreated the file and analyzed the CRC of all tables. This mechanism makes sure that the cache for database definitions is invalidated at the right time.

The database cache file contains a list of all tables in the database with their crc number. This value is important when we want to read the cache for the individual tables.

As you can see, the current crc for my customer table is 6269. If you look in the file list above, you will spot a file called ‘sports.Customer.6269.xml’. This is the file with the settings for this version of the customer table. There is another file for customer with the number 48132. Apparently, this was the CRC before my change.

Because the CRC changed and the CRC is part of the file name, this also makes sure the cache is invalidated. If DataDigger cannot find a cache file, it will read the definitions from the _file table and create a cache file, so at the next start it can read it in.

Since caching gives startup of DataDigger such a boost, it is handy to have them. But if DataDigger should wait until you select the table, you would still have to wait. In order to avoid this, at startup DataDigger does some pre-caching in the background. It uses the list of most recently used tables and checks if the cache file for each of those tables is present. In the settings you can find this in the tab for ‘behavior’:

You can untick the caching settings, but normally this would not be needed. Building the cache is done in the background, by using a scheduling mechanism, as I described in my previous post Turn timers into a scheduler. To make sure the DataDigger remains responsive, it checks only one table per 2 seconds.

That’s it. In a next post I will elaborate more on the caching of settings, since that is totally other beast. Let me know if you have used caching in your own project or if my technique can be improved. For now: have fun!

DEV Community

Hard things in computer science

Top comments (0)