Nivethan

Posted on Apr 23, 2022 • Edited on Apr 28, 2022

Benchmarking Reads in Universe

#node #javascript #performance

I have plans to use my node pick-universe library, but one thing that weighs it down is that reading entire files is a pretty expensive operation. Reading a single record is slow, but I can deal with it. Selecting a file of 100k items and reading in every record, however, is where it really hurts.

I think one solution would be to do all the reads in C rather than in JavaScript, but before I start trying to optimize, it's probably a good idea to measure things first.

So the first step is to see how fast BASIC is. This is most likely the fastest option.

Testing BASIC

The test I'll be running selects a file with about 95k records of 200 fields each, though only 150 of those fields are populated consistently.

      OPEN '','INVENTORY-FILE' TO INVENTORY.FILE ELSE
         PRINT 'Unable to open file: INVENTORY-FILE - Press RETURN':
         INPUT ANYTHING
         STOP
      END
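*
* BUFFER collects every record, one attribute per record
      BUFFER = ''
*
* Build a select list of every record id in the file
      SELECT INVENTORY.FILE
*
* Walk the select list, reading each record until the ids run out
      LOOP
         READNEXT ITEM.ID ELSE ITEM.ID = ''
      UNTIL ITEM.ID = '' DO
         READ INVENTORY.ITEM FROM INVENTORY.FILE, ITEM.ID ELSE INVENTORY.ITEM = ''
* LOWER drops each delimiter one level so the record fits in a single attribute
         BUFFER<-1> = LOWER(INVENTORY.ITEM)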
      REPEAT
*
      PRINT 'Items: ' : DCOUNT(BUFFER, @AM)

This is a pretty simple program. It opens the inventory file, selects it, and then reads every record into a buffer.

To see how long it takes, I run it under time from the Linux command line a few times and take a rough guess at a typical result.

> time uv "RUN BP TEST.READS"

This gives a general result of:

bash-4.2$ time uv "RUN BP TEST.READS"
Items: 94872

real    0m0.522s
user    0m0.285s
sys     0m0.241s
bash-4.2$ time uv "RUN BP TEST.READS"
Items: 94872

real    0m0.510s
user    0m0.284s
sys     0m0.230s

One surprise here is that changing the READ statement to a MATREAD makes the program run longer. I thought reading into a dimensioned array would be faster, but it's actually slower.

This is probably because dimensioning an array really declares 200 variables, and reading a record involves assigning each field to one of them, whereas READ, I assume, uses one big chunk of indexed memory for the fields.

MATREAD runs in about 1.2 seconds whereas READ runs in 0.52. Very interesting, and I'm already glad to have run this performance test.

Addendum

Reading specific values into a buffer took longer than just adding the entire record to the buffer. That sort of makes sense, but I'm curious what's going on. I didn't think the cost would be so high, but reading just the first 2 values was surprisingly expensive. One reason could be that Universe uses string parsing to get at the values: with READ, every extraction has to parse the record, whereas MATREAD would be far quicker for getting individual values but carries the cost of setting up the variables.

This is a fun little trade-off: READ gets the data quickly but makes it expensive to work with, whereas MATREAD is slower to get the data but fast to work with.
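To make that guess concrete, here is a small JavaScript sketch of the two access patterns. It is only an illustration of the hypothesis, not how Universe is implemented internally, and it doesn't explain the timings. The function names are made up. The one thing taken from Pick itself is that fields are separated by the attribute mark, character 254.

```javascript
const AM = String.fromCharCode(254); // attribute mark, the Pick field delimiter

// Dynamic-array style: scan the string for the nth field on every access.
function extractField(record, n) {
    let start = 0;
    for (let i = 1; i < n; i++) {
        const next = record.indexOf(AM, start);
        if (next === -1) return ""; // fewer than n fields
        start = next + 1;
    }
    const end = record.indexOf(AM, start);
    return end === -1 ? record.slice(start) : record.slice(start, end);
}

// Dimensioned-array style: pay the split cost once, then index directly.
function toDimensioned(record) {
    return record.split(AM);
}

const record = ["SKU-1", "Widget", "42"].join(AM);
console.log(extractField(record, 2));   // rescans from the start of the string
console.log(toDimensioned(record)[1]);  // direct index after one split
```

In this model, the string version is cheap to load and pays on every extraction, while the array version pays up front and is cheap afterwards.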

Now the assumption I'm going with is that the best we can do is this BASIC program. The node version definitely takes longer.

Testing Node

The node version has some glaring issues. The first is that I cross from JavaScript to C for every read. This has to be expensive. The next issue is that each read requires going over the RPC port. On localhost, it's probably fine, but on a faraway server the network time would be a killer.

const Universe = require('pick-universe');

const uv = new Universe("localhost", "user", "password", "/path/to/account");

uv.StartSession();

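// Open the file and build a select list of every record id
const INV = uv.Open("INVENTORY-FILE");
uv.Select(INV);

const buffer = [];

// Each ReadNext and Read call crosses from JavaScript into C
while (true) {
    const id = uv.ReadNext();
    if (id === null) break;
    const record = uv.Read(id, INV);
    buffer.push(record);
}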

uv.EndAllSessions();

console.log(`Items: ${buffer.length}`);

I do like that the BASIC and node versions are almost identical and the line counts are in the same range.

Here is the performance test, run on localhost:

bash-4.2$ time node test.js
Items: 94873

real    0m7.528s
user    0m1.610s
sys     0m2.391s
bash-4.2$

It is definitely longer! Roughly 15x longer. This also goes up drastically over the network: I waited almost 15 minutes and the test still hadn't finished when I killed it.
This basically means that using the node library over the network probably makes zero sense, and it would be better to simply call a subroutine on the server to do the work and return the data.
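A back-of-the-envelope estimate shows why the network run never finished. The sketch below assumes every record costs two round trips (one for ReadNext and one for Read) and uses a made-up latency of 20 ms. Both are assumptions for illustration. I didn't measure the latency.

```javascript
// Estimate total time when every record costs a fixed number of network round trips.
function estimateSeconds(records, roundTripsPerRecord, latencyMs) {
    return (records * roundTripsPerRecord * latencyMs) / 1000;
}

const records = 94873;   // count reported by the node test
const roundTrips = 2;    // assumption: one for ReadNext, one for Read
const latencyMs = 20;    // hypothetical round-trip latency, not measured

const seconds = estimateSeconds(records, roundTrips, latencyMs);
console.log(`${(seconds / 60).toFixed(1)} minutes`);
```

With those numbers the estimate is a little over an hour, all of it spent waiting on the network. Plugging in a real ping time to the server would give a better guess.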

One change we can make is to use ReadList to read in all the ids in one shot. This should speed things up, as now we only need to go back to C for the record reads.

const mv = require("pick-mv");
const Universe = require('./index');

const uv = new Universe("localhost", "user", "password", "/path/to/account");

uv.StartSession();

const INV = uv.Open("INVENTORY-FILE");
uv.Select(INV);

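const buffer = [];

// ReadList returns the whole select list at once; MVToArray turns it into an array of ids
const ids = mv.MVToArray(uv.ReadList());

// Only the record reads cross into C now
for (const id of ids) {
    const record = uv.Read(id, INV);
    buffer.push(record);
}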

uv.EndAllSessions();

console.log(`Items: ${buffer.length}`);

This takes:

bash-4.2$ time node  test.js
Items: 94873

real    0m4.818s
user    0m1.267s
sys     0m1.331s

This is a bit better than the 7.5 seconds we had from doing the ReadNext calls in JavaScript, but it's still quite slow.
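To put the three runs side by side, here is a quick calculation of how each one compares to the BASIC baseline. The timings are the "real" values from the runs above. The slowdown helper is just a name for this example.

```javascript
// Wall-clock ("real") seconds copied from the runs in this post.
const timings = {
    basic: 0.522,
    nodeReadNext: 7.528,
    nodeReadList: 4.818,
};

// How many times slower `seconds` is than the BASIC baseline.
function slowdown(seconds, baseline = timings.basic) {
    return seconds / baseline;
}

for (const [name, seconds] of Object.entries(timings)) {
    console.log(`${name}: ${seconds}s, ${slowdown(seconds).toFixed(1)}x BASIC`);
}
```

So ReadList cuts the gap from around 14x to around 9x, which helps but leaves most of the overhead in the per-record reads.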

Now that we have proof, I'm going to take a stab at writing a ReadAll function that stays in C, reads a list of records into an array, and then returns that array to node. This still makes the network calls, so I don't think it'll remove the deeper requirement that the Universe server be running on localhost.
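As a rough picture of what ReadAll is meant to change, here is a toy model that only counts crossings from JavaScript into the native layer. FakeBinding is a stand-in written for this example. It doesn't talk to Universe, and readAll here is hypothetical, not something pick-universe has today. The real version would do its loop in C.

```javascript
// Hypothetical stand-in for the native binding. It only counts how many
// times JavaScript calls into it; it does not talk to Universe.
class FakeBinding {
    constructor(records) {
        this.records = records; // Map of id -> record string
        this.crossings = 0;
    }

    // One crossing per record, like calling Read in a loop.
    read(id) {
        this.crossings++;
        return this.records.has(id) ? this.records.get(id) : "";
    }

    // Hypothetical ReadAll: one crossing for the whole id list.
    readAll(ids) {
        this.crossings++;
        return ids.map((id) => (this.records.has(id) ? this.records.get(id) : ""));
    }
}

const records = new Map([["1", "Widget"], ["2", "Gadget"], ["3", "Gizmo"]]);
const ids = [...records.keys()];

const perRecord = new FakeBinding(records);
const loopResult = ids.map((id) => perRecord.read(id));

const batched = new FakeBinding(records);
const batchResult = batched.readAll(ids);

console.log(`per-record crossings: ${perRecord.crossings}, batched crossings: ${batched.crossings}`);
```

Both approaches return the same records. The difference is one crossing instead of one per record, which is the cost I'm hoping ReadAll removes.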
