oq
A performant and portable jq wrapper that facilitates the consumption and output of formats other than JSON, using jq filters to transform the data.
Background
I've been using jq for a while to transform a master JSON document into partner-dependent structures for their consumption. However, until recently all of the partner structures had also been in JSON. Since jq does not support outputting XML on its own, I began looking for libraries that would let me use jq filters to transform the data but output XML in addition to JSON. I ended up finding a Python library called yq that seemed perfect.
It supports outputting to XML and JSON while using the same jq filter for both. After playing around with it for a while, it became clear that, while quite speedy for smaller files, it really struggled with some of the larger documents I needed to process. The fact that it's written in Python also complicated things: Python needs to be installed to use it, unless you go through an extra process to bundle it into a single binary. Thus, the idea for a more performant and portable option began to take shape.
Introduction
Using the relatively new Crystal language, I created oq with three primary goals: portability, performance, and extending the formats that jq supports.
Usage
oq has three additional arguments: -i sets the input format, -o sets the output format, and --xml-root sets the name of the root element when serializing to XML. All other arguments are passed on to jq.
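The split between oq's own options and jq's arguments can be sketched in Python with argparse's parse_known_args, which returns every unrecognized argument, in order, instead of rejecting it. This is a hypothetical illustration, not oq's Crystal implementation. Only the -i, -o, and --xml-root options from the examples are modeled. The defaults (JSON in, JSON out, a root element named root) are inferred from the example output.

```python
import argparse


def split_args(argv):
    """Separate oq's own options from the arguments forwarded to jq.

    Hypothetical sketch: only -i, -o and --xml-root are modeled, with
    defaults inferred from the examples (JSON in, JSON out, <root>).
    """
    # add_help=False leaves -h for jq; allow_abbrev=False stops argparse from
    # treating a prefix such as --xml as shorthand for --xml-root.
    parser = argparse.ArgumentParser(prog="oq", add_help=False, allow_abbrev=False)
    parser.add_argument("-i", dest="input_format", default="json")
    parser.add_argument("-o", dest="output_format", default="json")
    parser.add_argument("--xml-root", dest="xml_root", default="root")
    # parse_known_args returns unrecognized arguments, in order, instead of erroring.
    options, jq_args = parser.parse_known_args(argv)
    return options, jq_args


if __name__ == "__main__":
    opts, rest = split_args(
        ["-i", "yaml", "-o", "xml", "--xml-root", "items", ".", "invItems.yaml"]
    )
    print(opts.input_format, opts.output_format, opts.xml_root)  # yaml xml items
    print(rest)  # ['.', 'invItems.yaml']
```

Arguments such as -f filter data.json pass through untouched, so the jq side receives exactly what the user typed after oq's own options are removed.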
Examples
Consume JSON and output XML
echo '{"name": "Jim"}' | oq -o xml .
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <name>Jim</name>
</root>
Consume JSON and output YAML
echo '{"name": "Jim"}' | oq -o yaml .
---
name: Jim
Consume YAML from a file and output XML
data.yaml
---
name: Jim
numbers:
- 1
- 2
- 3
oq -i yaml -o xml . data.yaml
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <name>Jim</name>
  <numbers>1</numbers>
  <numbers>2</numbers>
  <numbers>3</numbers>
</root>
Consume JSON, transform it, and output XML
data.json
{
  "guests": [
    {
      "name": "Jim",
      "age": 17,
      "numbers": [
        1,
        2,
        3
      ]
    },
    {
      "name": "Bob",
      "age": 51,
      "numbers": [
        4,
        5,
        6
      ]
    },
    {
      "name": "Susan",
      "age": 85,
      "numbers": [
        7,
        8,
        9
      ]
    }
  ]
}
filter
.guests |
{
  "person": [
    .[] | {
      "age": {
        "@scale": .scale,
        "#text": .age
      },
      "name": .name,
      "favorite_numbers": {
        "number": .numbers
      }
    }
  ]
}
oq -o xml --xml-root people -f filter data.json
<?xml version="1.0" encoding="UTF-8"?>
<people>
  <person>
    <age scale="months">289</age>
    <name>Jim</name>
    <favorite_numbers>
      <number>1</number>
      <number>2</number>
      <number>3</number>
    </favorite_numbers>
  </person>
  <person>
    <age scale="years">51</age>
    <name>Bob</name>
    <favorite_numbers>
      <number>4</number>
      <number>5</number>
      <number>6</number>
    </favorite_numbers>
  </person>
  <person>
    <age scale="days">31025</age>
    <name>Susan</name>
    <favorite_numbers>
      <number>7</number>
      <number>8</number>
      <number>9</number>
    </favorite_numbers>
  </person>
</people>
The approach to handling JSON-to-XML transcoding is based on this article.
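The conventions visible in the examples can be sketched in Python on top of xml.etree.ElementTree. The rules are inferred from the example output in this post rather than taken from oq's source:

- keys prefixed with @ become attributes;
- #text becomes element text;
- an array under a key repeats that key as sibling elements;
- an array with no key (such as a top-level list) produces item children.

Rendering null as an empty element is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def to_xml(data, root_name="root"):
    """Build an XML tree from JSON-like data (dicts, lists, scalars).

    Conventions inferred from the examples in this post, not from oq's source:
      * a key starting with "@" becomes an attribute of the enclosing element
      * the key "#text" becomes the enclosing element's text
      * a list under a key repeats that key as sibling elements
      * a list with no key (e.g. a top-level array) yields <item> children
    """
    root = ET.Element(root_name)
    _fill(root, data)
    return root


def _fill(elem, value):
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                elem.set(key[1:], _scalar(child))
            elif key == "#text":
                elem.text = _scalar(child)
            elif isinstance(child, list):
                # One sibling element per array entry, all named after the key.
                for entry in child:
                    _fill(ET.SubElement(elem, key), entry)
            else:
                _fill(ET.SubElement(elem, key), child)
    elif isinstance(value, list):
        # Assumed name for unkeyed array entries, matching the invItems output.
        for entry in value:
            _fill(ET.SubElement(elem, "item"), entry)
    else:
        elem.text = _scalar(value)


def _scalar(value):
    # Render JSON scalars as they appear in JSON text; null becomes empty (assumption).
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)
```

For example, ET.tostring(to_xml({"name": "Jim", "numbers": [1, 2, 3]}), encoding="unicode") produces the same element structure as the YAML-to-XML example, minus the XML declaration and indentation.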
Benchmarks
I also ran some benchmarks for jq, yq, and oq to show how they compare in various situations.
Setup
OS: #1 SMP Debian 4.9.168-1+deb9u3 (2019-06-16)
CPU: Intel i7-7700k
Memory: 32GB @ 3,000 MHz
SSD: Samsung 850 PRO - 512GB
Benchmarks were measured with the /usr/bin/time -v command.
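The two fields compared throughout these results are the elapsed wall-clock time and the maximum resident set size. GNU time writes its report to stderr, so anyone reproducing the runs needs to capture that stream. A small helper can then pull both values out of the captured text. It is a hypothetical Python helper, not part of jq, yq, or oq.

```python
import re

ELAPSED = re.compile(r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): (\S+)")
MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\): (\d+)")


def parse_time_v(report):
    """Return (wall-clock seconds, max RSS in KB) from `/usr/bin/time -v` output."""
    elapsed = ELAPSED.search(report)
    max_rss = MAX_RSS.search(report)
    if elapsed is None or max_rss is None:
        raise ValueError("not a /usr/bin/time -v report")
    # The clock reads m:ss.ss or h:mm:ss; fold each field into seconds.
    seconds = 0.0
    for field in elapsed.group(1).split(":"):
        seconds = seconds * 60 + float(field)
    return seconds, int(max_rss.group(1))
```

Passing it the stderr of subprocess.run(["/usr/bin/time", "-v", "oq", "length", "jeopardy.json"], capture_output=True, text=True) would yield the two figures for that run.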
Simple
First, I used the data.json file to see how each tool performs when simply parsing the file and outputting it via the . filter.
jq
jq . data.json | wc -l
Command being timed: "jq . data.json"
User time (seconds): 0.02
System time (seconds): 0.01
Percent of CPU this job got: 68%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.06
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16236
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 3860
Voluntary context switches: 224
Involuntary context switches: 8
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
31
yq
yq . data.json | wc -l
Command being timed: "yq . data.json"
User time (seconds): 0.08
System time (seconds): 0.01
Percent of CPU this job got: 77%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.11
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16252
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 7179
Voluntary context switches: 189
Involuntary context switches: 10
Swaps: 0
File system inputs: 1672
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
31
oq
oq . data.json | wc -l
Command being timed: "oq . data.json"
User time (seconds): 0.02
System time (seconds): 0.04
Percent of CPU this job got: 74%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.10
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16140
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 4499
Voluntary context switches: 306
Involuntary context switches: 13
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
31
For this first test, all three are roughly equal: wall-clock times range from 0.06s (jq) to 0.11s (yq), and each peaks at about 16MB of memory.
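The 31 printed by each wc -l is the line count of the pretty-printed document, which shows that all three tools produced output of the same shape. jq pretty-prints with two-space indentation by default. Assuming that, Python's json.dumps with indent=2 should reproduce the same layout, and therefore the same count, for data.json.

```python
import json

# data.json from the examples above, inlined so the sketch is self-contained.
DATA = {
    "guests": [
        {"name": "Jim", "age": 17, "numbers": [1, 2, 3]},
        {"name": "Bob", "age": 51, "numbers": [4, 5, 6]},
        {"name": "Susan", "age": 85, "numbers": [7, 8, 9]},
    ]
}

# Two-space indentation puts every key and every array element on its own line.
pretty = json.dumps(DATA, indent=2)
print(len(pretty.splitlines()))  # 31
```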
Jeopardy.json (#2)
The next benchmark uses the ~56MB jeopardy.json file, as retrieved from jq's benchmark wiki page.
First up, a simple length filter over jeopardy.json.
jq
jq length jeopardy.json
216930
Command being timed: "jq length jeopardy.json"
User time (seconds): 0.64
System time (seconds): 0.10
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.76
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 230080
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 63213
Voluntary context switches: 240
Involuntary context switches: 13
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
yq
yq length jeopardy.json
216930
Command being timed: "yq length jeopardy.json"
User time (seconds): 152.45
System time (seconds): 1.27
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:33.04
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3853532
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1117041
Voluntary context switches: 13708
Involuntary context switches: 3189
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
oq
oq length jeopardy.json
216930
Command being timed: "oq length jeopardy.json"
User time (seconds): 0.67
System time (seconds): 0.17
Percent of CPU this job got: 105%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.80
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 230224
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 63839
Voluntary context switches: 13832
Involuntary context switches: 12
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The big file does not bode well for yq: it takes roughly 190x to 200x longer than either oq or jq (2:33 versus under a second), while also using almost 17x more memory (about 3.9GB versus 230MB peak).
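Those ratios come straight from the timing output above. The arithmetic in Python converts 2:33.04 to 153.04 seconds and takes memory from the maximum resident set size fields.

```python
# Figures copied from the jeopardy.json `length` runs above:
# wall clock in seconds (2:33.04 == 153.04 s) and maximum resident set size in KB.
runs = {
    "jq": {"seconds": 0.76, "max_rss_kb": 230080},
    "yq": {"seconds": 153.04, "max_rss_kb": 3853532},
    "oq": {"seconds": 0.80, "max_rss_kb": 230224},
}

for other in ("jq", "oq"):
    time_ratio = runs["yq"]["seconds"] / runs[other]["seconds"]
    mem_ratio = runs["yq"]["max_rss_kb"] / runs[other]["max_rss_kb"]
    print(f"yq vs {other}: {time_ratio:.0f}x slower, {mem_ratio:.1f}x memory")
```

This prints 201x and 191x for wall-clock time against jq and oq respectively, and 16.7x for memory against both.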
YAML => XML
The last benchmark gave both yq and oq a large YAML file (~57MB) and had them convert it to XML. Since jq can't consume YAML, I excluded it.
The file used: invItems.yaml from the EVE Online SDE Export.
Example Input:
- flagID: 0
  itemID: 0
  locationID: 0
  ownerID: 0
  quantity: -1
  typeID: 0
- flagID: 0
  itemID: 1
  locationID: 0
  ownerID: 0
  quantity: -1
  typeID: 0
...
yq
For yq, I had to give it a filter and some extra arguments for it to produce the correct output:
yq -s -x --xml-root items --xml-dtd '{"item": .[] | .}' invItems.yaml > invItems.yq.xml
Command being timed: "yq -s -x --xml-root items --xml-dtd {"item": .[] | .} invItems.yaml"
User time (seconds): 309.21
System time (seconds): 2.76
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:11.90
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 7817608
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2262904
Voluntary context switches: 32918
Involuntary context switches: 2504
Swaps: 0
File system inputs: 0
File system outputs: 195072
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Example Output
<?xml version="1.0" encoding="utf-8"?>
<items>
  <item>
    <flagID>0</flagID>
    <itemID>0</itemID>
    <locationID>0</locationID>
    <ownerID>0</ownerID>
    <quantity>-1</quantity>
    <typeID>0</typeID>
  </item>
  <item>
    <flagID>0</flagID>
    <itemID>1</itemID>
    <locationID>0</locationID>
    <ownerID>0</ownerID>
    <quantity>-1</quantity>
    <typeID>0</typeID>
  </item>
  ...
</items>
oq
oq -i yaml -o xml --xml-root items . invItems.yaml > invItems.oq.xml
Command being timed: "oq -i yaml -o xml --xml-root items . invItems.yaml"
User time (seconds): 20.08
System time (seconds): 0.48
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:19.13
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1332328
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 522235
Voluntary context switches: 30478
Involuntary context switches: 974
Swaps: 0
File system inputs: 0
File system outputs: 195072
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Example Output
<?xml version="1.0" encoding="UTF-8"?>
<items>
  <item>
    <flagID>0</flagID>
    <itemID>0</itemID>
    <locationID>0</locationID>
    <ownerID>0</ownerID>
    <quantity>-1</quantity>
    <typeID>0</typeID>
  </item>
  <item>
    <flagID>0</flagID>
    <itemID>1</itemID>
    <locationID>0</locationID>
    <ownerID>0</ownerID>
    <quantity>-1</quantity>
    <typeID>0</typeID>
  </item>
  ...
</items>
As with the jeopardy.json benchmark, yq has a hard time with the larger input: this test case took ~16x longer than oq (5:11.90 versus 0:19.13) and used almost 6x the memory (7,817,608 KB versus 1,332,328 KB maximum resident set size).
Road to 1.0.0
Since this project is still early in its development, I put together a roadmap of what I would like to get done before calling it 1.0.0:
- Support XML input format
- Address bugs/issues that arise
- Small feature requests
- Possibly additional formats
Feel free to submit issues/PRs.
Comments
@blacksmoke16 Great - looking forward to a new release including the xml input format :)
Just finished releasing 1.0.0, let me know if you have any trouble :)