<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Christiaan Pretorius</title>
    <description>The latest articles on DEV Community by Christiaan Pretorius (@teejip).</description>
    <link>https://dev.to/teejip</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F269692%2F6f9219a8-9f17-4cc6-a19a-c5fdac218734.jpg</url>
      <title>DEV Community: Christiaan Pretorius</title>
      <link>https://dev.to/teejip</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/teejip"/>
    <language>en</language>
    <item>
      <title>Efficient JSON Data formats for Node.js</title>
      <dc:creator>Christiaan Pretorius</dc:creator>
      <pubDate>Thu, 14 Nov 2019 00:03:41 +0000</pubDate>
      <link>https://dev.to/teejip/efficient-json-data-formats-for-node-js-3n5n</link>
      <guid>https://dev.to/teejip/efficient-json-data-formats-for-node-js-3n5n</guid>
      <description>&lt;h1&gt;
  
  
  Efficient JSON Data formats
&lt;/h1&gt;

&lt;p&gt;JSON has become a ubiquitous format for transferring data between applications and web APIs. For all its popularity, there are quite a few obstacles to using it efficiently; more specifically, the way it is commonly used can be quite inefficient. This perceived inefficiency has led to many binary formats such as MessagePack, BSON, Protobuf, Thrift and many more.&lt;/p&gt;

&lt;p&gt;It would be nice if we could improve the efficiency of JSON without having to include a new library that may introduce unwanted side effects such as protocol version dependency, application version dependency, loss of readability and other issues typically associated with binary formats. We would like to introduce measures that retain the generic and user-friendly nature of JSON itself while improving efficiency by a few factors. Lowering parsing and application processing latency is an additional benefit, which will become clear and is, in large part, due to the reduction in data size. Peculiarly, these measures have a lot in common with database types and the general design of database systems. For the purposes of this article we will refer to the measures, or JSON formats, as Object Oriented, Row Oriented and Column Oriented. At the end we will produce some test data to illustrate the possible advantages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Object Oriented
&lt;/h3&gt;

&lt;p&gt;This is the popular way for web APIs to produce data, and it generally looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
    {
        name1:value_11,
        name2:value_12,
        name3:value_13,
        ...,
        namek:value_1k
    },
    {
        name1:value_21,
        name2:value_22,
        name3:value_23,
        ...,
        namek:value_2k
    },
    {...},
    {
        name1:value_n1,
        name2:value_n2,
        name3:value_n3,
        ...,
        namek:value_nk
    }
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
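A concrete instance of the schematic above (the records and field names here are invented for illustration). Notice that every record repeats every key name, which is where the size overhead of this format comes from:

```javascript
// Object-oriented payload: each record carries its own copy of all key names.
const records = [
  { user: "alice", sessionId: 1, command: "SELECT" },
  { user: "bob", sessionId: 2, command: "INSERT" }
];
const json = JSON.stringify(records);

// The key string "user" appears once per record in the serialized output,
// so key overhead grows with (rows x keys), independent of the values.
const userKeyCount = (json.match(/"user"/g) || []).length;
console.log(userKeyCount); // 2
```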



&lt;p&gt;Some examples of well-known public web APIs that support this style:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.facebook.com/docs/marketing-api/using-the-api" rel="noopener noreferrer"&gt;https://developers.facebook.com/docs/marketing-api/using-the-api&lt;/a&gt; &lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.html&lt;/a&gt; &lt;br&gt;
&lt;a href="https://code.msdn.microsoft.com/REST-in-Bing-Maps-Windows-6d547d69/sourcecode?fileId=82515&amp;amp;pathId=1683851973" rel="noopener noreferrer"&gt;https://code.msdn.microsoft.com/REST-in-Bing-Maps-Windows-6d547d69/sourcecode?fileId=82515&amp;amp;pathId=1683851973&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A more exhaustive list of public APIs can be found here:&lt;br&gt;
&lt;a href="https://github.com/n0shake/Public-APIs" rel="noopener noreferrer"&gt;https://github.com/n0shake/Public-APIs&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Row Oriented
&lt;/h3&gt;

&lt;p&gt;A more efficient way is to represent the data in a row-oriented pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "meta": [ "name1", "name2", "name3", ..., "namek" ],
    "data":
    [
        [ value_11, value_12, value_13, ..., value_1k ],
        [ value_21, value_22, value_23, ..., value_2k ],
        [...],
        [ value_n1, value_n2, value_n3, ..., value_nk ]
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
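A minimal sketch of converting between the object-oriented and row-oriented shapes; the helper names toRows and fromRows are my own, not part of the article. The key names are sent once in meta, and each row carries only values:

```javascript
// Pack an array of objects into the row-oriented {meta, data} shape.
function toRows(objects) {
  const meta = Object.keys(objects[0] || {});
  const data = objects.map(obj => meta.map(key => obj[key]));
  return { meta, data };
}

// Unpack back into plain objects on the consuming side.
function fromRows({ meta, data }) {
  return data.map(row =>
    Object.fromEntries(meta.map((key, i) => [key, row[i]]))
  );
}

const objects = [
  { user: "alice", sessionId: 1 },
  { user: "bob", sessionId: 2 }
];
const packed = toRows(objects);
// packed.meta is ["user", "sessionId"]; packed.data holds only values.
```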



&lt;p&gt;I haven't found any APIs based on this format, but there are some partial examples, especially among mapping APIs where large amounts of data are transferred.&lt;/p&gt;

&lt;p&gt;Partial Example&lt;br&gt;
&lt;a href="https://code.msdn.microsoft.com/REST-in-Bing-Maps-Windows-6d547d69/sourcecode?fileId=82515&amp;amp;pathId=1683851973" rel="noopener noreferrer"&gt;https://code.msdn.microsoft.com/REST-in-Bing-Maps-Windows-6d547d69/sourcecode?fileId=82515&amp;amp;pathId=1683851973&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Column Oriented
&lt;/h3&gt;

&lt;p&gt;An even more efficient way is to represent the data in a column-oriented pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "meta": [ "name1", "name2", "name3", ..., "namek" ],
    "data":
    [
        [ value_11, value_12, value_13, ..., value_1n ],
        [ value_21, value_22, value_23, ..., value_2n ],
        [ value_31, value_32, value_33, ..., value_3n ],
        [...],
        [ value_k1, value_k2, value_k3, ..., value_kn ]
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
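The column-oriented shape is just the row-oriented one transposed. A sketch, with the helper name rowsToCols being illustrative: with k columns and n rows, the result holds k arrays of length n instead of n arrays of length k.

```javascript
// Transpose the row-oriented {meta, data} shape into the column-oriented one.
function rowsToCols({ meta, data }) {
  // One array per column, each gathering that column's value from every row.
  const cols = meta.map((_, c) => data.map(row => row[c]));
  return { meta, data: cols };
}

const rowOriented = {
  meta: ["user", "sessionId"],
  data: [["alice", 1], ["bob", 2], ["carol", 3]]
};
const colOriented = rowsToCols(rowOriented);
// colOriented.data is [["alice", "bob", "carol"], [1, 2, 3]]:
// 2 arrays for 3 rows, instead of 3 arrays of 2 values.
```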



&lt;p&gt;Note that parsing this orientation creates far fewer individual arrays than the row-oriented format (k arrays of n values instead of n arrays of k values), although the total data size may not be much smaller. This will become apparent in the processing latency tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests
&lt;/h3&gt;

&lt;p&gt;For the tests we will be using Node.js and JavaScript as a sandbox. I chose JavaScript for its optimized native implementation of the JSON parser; it is also a very popular environment for both consuming and producing JSON APIs. The test simulates the task of transferring a set of database rows. We will record the row count, JSON creation time, JSON parse time and transmitted data size.&lt;br&gt;
The source code can be found in Appendix A at the end of this document.&lt;br&gt;
And finally, the results.&lt;/p&gt;
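One caveat about the size numbers: the benchmark measures the serialized string's length, which equals the byte count only for ASCII data like the test rows. For general strings, Node's Buffer.byteLength gives the actual UTF-8 transfer size. A small illustration (the sample strings are my own):

```javascript
// For pure-ASCII JSON, string length and UTF-8 byte length coincide.
const ascii = JSON.stringify({ note: "plain" });
console.log(ascii.length === Buffer.byteLength(ascii, "utf8")); // true

// Non-ASCII characters occupy multiple bytes: "é" is 2 bytes in UTF-8,
// so the byte length exceeds the character count by one here.
const accented = JSON.stringify({ note: "café" });
console.log(Buffer.byteLength(accented, "utf8") - accented.length); // 1
```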

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Object&lt;/th&gt;
&lt;th&gt;Row&lt;/th&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Best vs Worst (ratio)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Row Count&lt;/td&gt;
&lt;td&gt;10000&lt;/td&gt;
&lt;td&gt;10000&lt;/td&gt;
&lt;td&gt;10000&lt;/td&gt;
&lt;td&gt;na&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Size (KiB)&lt;/td&gt;
&lt;td&gt;1190&lt;/td&gt;
&lt;td&gt;565&lt;/td&gt;
&lt;td&gt;487&lt;/td&gt;
&lt;td&gt;2.44353&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parsing Time (ms)&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2.66667&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creation Time (ms)&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Row Count&lt;/td&gt;
&lt;td&gt;100000&lt;/td&gt;
&lt;td&gt;100000&lt;/td&gt;
&lt;td&gt;100000&lt;/td&gt;
&lt;td&gt;na&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Size (KiB)&lt;/td&gt;
&lt;td&gt;11316&lt;/td&gt;
&lt;td&gt;5750&lt;/td&gt;
&lt;td&gt;4969&lt;/td&gt;
&lt;td&gt;2.27732&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parsing Time (ms)&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;3.11111&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creation Time (ms)&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;3.13333&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Row Count&lt;/td&gt;
&lt;td&gt;1000000&lt;/td&gt;
&lt;td&gt;1000000&lt;/td&gt;
&lt;td&gt;1000000&lt;/td&gt;
&lt;td&gt;na&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Size (KiB)&lt;/td&gt;
&lt;td&gt;120613&lt;/td&gt;
&lt;td&gt;58485&lt;/td&gt;
&lt;td&gt;50672&lt;/td&gt;
&lt;td&gt;2.38027&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parsing Time (ms)&lt;/td&gt;
&lt;td&gt;1075&lt;/td&gt;
&lt;td&gt;616&lt;/td&gt;
&lt;td&gt;388&lt;/td&gt;
&lt;td&gt;2.77062&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creation Time (ms)&lt;/td&gt;
&lt;td&gt;750&lt;/td&gt;
&lt;td&gt;342&lt;/td&gt;
&lt;td&gt;266&lt;/td&gt;
&lt;td&gt;2.81955&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It becomes clear that the overall efficiency reaches its optimum at about 100,000 rows. I can only guess that the interplay between CPU cache size and final data size causes this effect. It seems the row count can be fine-tuned for optimal efficiency on a case-by-case basis.&lt;/p&gt;
&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This test is purely an indication of the improvements that can be made to well-known JSON formats. If your JSON format already incorporates these kinds of optimizations, it would serve you better to follow the binary path. However, if your application follows the popular JSON Object Oriented pattern for object serialization, there is much to be gained from changing your format pattern first, without having to rewrite large parts of your infrastructure. If gzip is used to minimize transfer size, the more efficient format will produce roughly the same transit size with much lower latency, or an even better transit size when gzip is applied to the efficient format itself.&lt;/p&gt;
&lt;h3&gt;
  
  
  Appendix A
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Test Code&lt;/strong&gt;&lt;br&gt;
The test can be executed using the V8 engine in Node.js.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;//
// Copyright 2019 
//
function createMetaData(){
    console.debug("createMetaData")

    return [
        'user',
        'sessionId',
        'command',
        'statement',
        'transactionId',
        'time'
    ]
}

function createData(count){
    console.debug("createData: %i",count)
    var data = []
    var meta = createMetaData()
    for(var d = 0; d &amp;lt; count; ++d){
        var object = {}
        object[meta[0]] = 'test'
        object[meta[1]] = 1
        object[meta[2]] = 'SELECT'
        object[meta[3]] = 'SELECT * from mydata'
        object[meta[4]] = d
        object[meta[5]] = new Date().getMilliseconds()
        data.push(object)
    }
    return {data:data}
}


function createRowData(count){
    console.debug("createRowData %i",count)
    var meta = createMetaData()
    var data = []
    for(var d = 0; d &amp;lt; count; ++d){
        var row = []
        row.push('test')
        row.push(1)
        row.push('SELECT')
        row.push('SELECT * from mydata')
        row.push(d)
        row.push(new Date().getMilliseconds())
        data.push(row)
    }
    return {data:data, meta:meta}
}

function createColData(count){
    console.debug("createColData: %i",count)
    var meta = createMetaData()
    var cols = {}
    for(var r = 0; r &amp;lt; meta.length; ++r){
        cols[meta[r]] = []
    }

    for(var d = 0; d &amp;lt; count; ++d){
        cols[meta[0]].push('test')
        cols[meta[1]].push(1)
        cols[meta[2]].push('SELECT')
        cols[meta[3]].push('SELECT * from mydata')
        cols[meta[4]].push(d)
        cols[meta[5]].push(new Date().getMilliseconds())
    }

    var data = []
    for(var d = 0; d &amp;lt; meta.length; ++d){
        data.push(cols[meta[d]]);
    }
    return {data:data, meta:meta}
}

function bench(data){
    console.log("bench %i",data.data.length)
    var start = new Date()
    var serialized = JSON.stringify(data)
    var endSerialized = new Date()
    console.info("JSON Size: %f KiB Time to serialize %dms",serialized.length/1024.0,(endSerialized-start))
    start = new Date()
    var deserialized = JSON.parse(serialized)
    var endDeSerialized = new Date()
    console.info("Time to deserialize %dms Deserialized size %i ",(endDeSerialized-start),deserialized.data.length)
}

var counts = [10000, 100000, 1000000]
console.info(" ----------------- Object oriented ----------------------")
for (var c in counts){
    var data = createData(counts[c])
    bench(data)
}
console.info(" -----------------  Row oriented  -----------------------")
for (var c in counts){
    var rowData = createRowData(counts[c])
    bench(rowData)
}
console.info(" -----------------  Col oriented  -----------------------")
for (var c in counts){
    var colData = createColData(counts[c])
    bench(colData)
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>performance</category>
      <category>json</category>
      <category>node</category>
    </item>
  </channel>
</rss>
