loading...

Using JQ to Convert JSON to CSV: Top Level Values

dannylee8 profile image Danny Lee ・12 min read


source: https://www.pexels.com/photo/adventure-background-backpack-backpacker-346707/

In my previous two posts I discussed my journey into using jq to convert JSON to CSV. Well, honestly, in my first post, I went over how I got into this predicament, and then, in my second post I wrote about JSON and jq basic commands.

In this post, I’ll cover how to get JSON values into CSV format. The properties(values) of the JSON file we will work with are “top-level” . These are simple strings and numbers sitting at the top level of the JSON “tree”. Next week, I’ll dive into extracting the array and object values, as well as nested data in our sample JSON files.

Aside: *jq* has a tremendous amount of depth as a tool to manipulate JSON. I highly recommend looking through the official docs.


The Sample Data

We’ll be working with this sample order (gist|raw format) from the Shopify API. I’d recommend saving this to a local file so you can play with it as well as pasting it into Chris Nielson’s JSON Visualization page to output a more readable, block divided format to follow along. This will be useful later when we are running commands and I omit the terminal output for brevity.

Jump right in!

First, let’s run the following command:

$ cat sampleOrder.json | jq '.order | [.id, .email, .phone, .billing_address, .shipping_address, .line_items]'

What are we doing here?

We are con’ cat ’enating the JSON file and piping it into jq . (Check this link for a guide on cat, or here if you want a deeper dive into piping and redirection.)

We’re accessing the order object and specifically pulling out these 6 keys:

  • id (type: number)

  • email (type: string)

  • phone (type: string)

  • billing address (type: object with 15 properties, aka key-value pairs)

  • shipping_address (type: object with 15 properties, aka key-value pairs)

  • line_items (type: array containing 3 embedded objects with 26 properties)

Did you catch it?

Did you notice that we used thejq array construction operator (manual reference)[] to encapsulate our returned values as an array? If not, take a moment to scroll up and take a look. 👀

Cool, huh? Well, jq has many more of these operators and functions, including some interesting ones like:

  • object construction

  • addition, subtraction: arithmetic for numbers, arrays and strings are joined, objects are merged

  • map, select: act like enumerables from other javascript or ruby

  • walk(f), split, join: walk will apply the function f on every element and component of the input. split and join are similar as javascript’s

  • keys, has(key), in, del(ete), any, all: the keys related commands work with object keys. any and all returns true if their boolean inputs are true.

  • tonumber, tostring, flatten, sort, reverse, startswith, endswith: these are methods to work with inputs (whether strings, numbers or arrays)

You can also use conditional expressions, as well as regular expressions (regexp). So,jq is very much its own programming language, but specifically for the manipulation and reformation of JSON data.

Let’s see that command again:

$ cat sampleOrder.json | jq '.order | [.id, .email, .phone, .billing_address, .shipping_address, .line_items]'

So after running the above command,jq has placed these keys into a new array, and returns it to us, which looks thusly:

[
  450789469,
  "[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)",
  "+557734881234",
  {
    "first_name": "Bob",
    "address1": "Chestnut Street 92",
    "phone": "555-625-1199",
    "city": "Louisville",
    "zip": "40202",
    "province": "Kentucky",
    "country": "United States",
    "last_name": "Norman",
    "address2": "",
    "company": null,
    "latitude": 45.41634,
    "longitude": -75.6868,
    "name": "Bob Norman",
    "country_code": "US",
    "province_code": "KY"
  },
  {
    "first_name": "Bob",
    "address1": "Chestnut Street 92",
    "phone": "555-625-1199",
    "city": "Louisville",
    "zip": "40202",
    "province": "Kentucky",
    "country": "United States",
    "last_name": "Norman",
    "address2": "",
    "company": null,
    "latitude": 45.41634,
    "longitude": -75.6868,
    "name": "Bob Norman",
    "country_code": "US",
    "province_code": "KY"
  },
  [
    {
      "id": 466157049,
      "variant_id": 39072856,
      "title": "IPod Nano - 8gb",
      "quantity": 1,
      "sku": "IPOD2008GREEN",
      "variant_title": "green",
      "vendor": null,
      "fulfillment_service": "manual",
      "product_id": 632910392,
      "requires_shipping": true,
      "taxable": true,
      "gift_card": false,
      "name": "IPod Nano - 8gb - green",
      "variant_inventory_management": "shopify",
      "properties": [
        {
          "name": "Custom Engraving Front",
          "value": "Happy Birthday"
        },
        {
          "name": "Custom Engraving Back",
          "value": "Merry Christmas"
        }
      ],
      "product_exists": true,
      "fulfillable_quantity": 1,
      "grams": 200,
      "price": "199.00",
      "total_discount": "0.00",
      "fulfillment_status": null,
      "price_set": {
        "shop_money": {
          "amount": "199.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "199.00",
          "currency_code": "USD"
        }
      },
      "total_discount_set": {
        "shop_money": {
          "amount": "0.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "0.00",
          "currency_code": "USD"
        }
      },
      "discount_allocations": [],
      "admin_graphql_api_id": "gid://shopify/LineItem/466157049",
      "tax_lines": [
        {
          "title": "State Tax",
          "price": "3.98",
          "rate": 0.06,
          "price_set": {
            "shop_money": {
              "amount": "3.98",
              "currency_code": "USD"
            },
            "presentment_money": {
              "amount": "3.98",
              "currency_code": "USD"
            }
          }
        }
      ]
    },
    {
      "id": 518995019,
      "variant_id": 49148385,
      "title": "IPod Nano - 8gb",
      "quantity": 1,
      "sku": "IPOD2008RED",
      "variant_title": "red",
      "vendor": null,
      "fulfillment_service": "manual",
      "product_id": 632910392,
      "requires_shipping": true,
      "taxable": true,
      "gift_card": false,
      "name": "IPod Nano - 8gb - red",
      "variant_inventory_management": "shopify",
      "properties": [],
      "product_exists": true,
      "fulfillable_quantity": 1,
      "grams": 200,
      "price": "199.00",
      "total_discount": "0.00",
      "fulfillment_status": null,
      "price_set": {
        "shop_money": {
          "amount": "199.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "199.00",
          "currency_code": "USD"
        }
      },
      "total_discount_set": {
        "shop_money": {
          "amount": "0.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "0.00",
          "currency_code": "USD"
        }
      },
      "discount_allocations": [],
      "admin_graphql_api_id": "gid://shopify/LineItem/518995019",
      "tax_lines": [
        {
          "title": "State Tax",
          "price": "3.98",
          "rate": 0.06,
          "price_set": {
            "shop_money": {
              "amount": "3.98",
              "currency_code": "USD"
            },
            "presentment_money": {
              "amount": "3.98",
              "currency_code": "USD"
            }
          }
        }
      ]
    },
    {
      "id": 703073504,
      "variant_id": 457924702,
      "title": "IPod Nano - 8gb",
      "quantity": 1,
      "sku": "IPOD2008BLACK",
      "variant_title": "black",
      "vendor": null,
      "fulfillment_service": "manual",
      "product_id": 632910392,
      "requires_shipping": true,
      "taxable": true,
      "gift_card": false,
      "name": "IPod Nano - 8gb - black",
      "variant_inventory_management": "shopify",
      "properties": [],
      "product_exists": true,
      "fulfillable_quantity": 1,
      "grams": 200,
      "price": "199.00",
      "total_discount": "0.00",
      "fulfillment_status": null,
      "price_set": {
        "shop_money": {
          "amount": "199.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "199.00",
          "currency_code": "USD"
        }
      },
      "total_discount_set": {
        "shop_money": {
          "amount": "0.00",
          "currency_code": "USD"
        },
        "presentment_money": {
          "amount": "0.00",
          "currency_code": "USD"
        }
      },
      "discount_allocations": [],
      "admin_graphql_api_id": "gid://shopify/LineItem/703073504",
      "tax_lines": [
        {
          "title": "State Tax",
          "price": "3.98",
          "rate": 0.06,
          "price_set": {
            "shop_money": {
              "amount": "3.98",
              "currency_code": "USD"
            },
            "presentment_money": {
              "amount": "3.98",
              "currency_code": "USD"
            }
          }
        }
      ]
    }
  ]
]

Note: If you’re feeling a bit lost 🤷🏻‍♂️🤷🏻‍♀️, I’d recommend going back to the previous posts (here and here) and reading through them. It was very confusing to me at first and understanding these basic concepts. The way in which jq accesses “order” key’s values, pipes them so I can access them without typing order.id, order.email, order.phone and then encloses them with ‘[‘ and ‘]’ is unusual. Understanding these basics are key 😜to using jq .


A Correction (or addition?) and Apology

https://giphy.com/gifs/abcnetwork-fresh-off-the-boat-xT0xeFzKfeZQ52LoFG

Previously I implied the only way to use jq was to pipe the result of cat into it. Then I went on and pointed out a resource for redirection operators for unix/linux. Well, in the midst of writing this blog I discovered that you DO NOT have to pipe data into jq . It is perfectly happy to accept the normal command syntax as follows:

$ jq '.order | [.id, .email, .phone, .billing_address, .shipping_address, .line_items]' sampleOrder.json

I’m sorry!

I feel a bit silly and naive, but that’s okay. I’m always finding myself corrected (or, as I think of it ‘updated’) either by others or through self-discovery and its all a part of the process we call growth! 🌱

Aside: That’s partly why I love gardening and plants so much, especially durable, vigorous plants. You can prune and shape, sometimes even cut them right above the root and they’ll grow back stronger and more robust than ever.


⚒ Useful tool alert: JQTERM, JQ Play

The jqterm website by Remy Sharp is great for playing around with jq queries. He also has an interesting list of recipes for jq with real world applications and much more complicated scenarios than I have shown you.

Another jq playground is jq play, interfaces are a little different, so take your pick. If you have a favorite or know other jq playgrounds, please let me know! 🤔

Also, this website, Web Geo Data Vore, has some very straightforward recipes for jq that make learning fun and easy. I highly recommend trying out these commands on our sampleOrder.json, or Web Geo Data Vore’s sample data, or even your own JSON data.


Let’s get our CSV!

Okay, lets get our CSV files from our sampleOrder.JSON file!

So, as you’ve seen getting top-level values is really simple. We can get a complete list of top-level values with the keys_unsorted command:

$ cat sampleOrder.json | jq '.order | keys_unsorted'

or with our newly learned command syntax:

$ jq '.order | keys_unsorted ' sampleOrder.json

This command returns all our top-level values, in the order they show up in the JSON file. If you need a list sorted, alphabetically, use ‘keys’:

[
  "id",
  "email",
  "closed_at",
  "created_at",
  "updated_at",
  "number",
  "note",
  "token",
  "gateway",
  "test",
  "total_price",
  "subtotal_price",
  "total_weight",
  "total_tax",
  "taxes_included",
  "currency",
  "financial_status",
  "confirmed",
  "total_discounts",
  "total_line_items_price",
  "cart_token",
  "buyer_accepts_marketing",
  "name",
  "referring_site",
  "landing_site",
  "cancelled_at",
  "cancel_reason",
  "total_price_usd",
  "checkout_token",
  "reference",
  "user_id",
  "location_id",
  "source_identifier",
  "source_url",
  "processed_at",
  "device_id",
  "phone",
  "customer_locale",
  "app_id",
  "browser_ip",
  "landing_site_ref",
  "order_number",
  "discount_applications",
  "discount_codes",
  "note_attributes",
  "payment_gateway_names",
  "processing_method",
  "checkout_id",
  "source_name",
  "fulfillment_status",
  "tax_lines",
  "tags",
  "contact_email",
  "order_status_url",
  "presentment_currency",
  "total_line_items_price_set",
  "total_discounts_set",
  "total_shipping_price_set",
  "subtotal_price_set",
  "total_price_set",
  "total_tax_set",
  "total_tip_received",
  "admin_graphql_api_id",
  "shipping_lines",
  "billing_address",
  "shipping_address",
  "client_details",
  "payment_details",
  "customer",
  "line_items",
  "fulfillments",
  "refunds"
]

CSV from Simple, Top-level Values

To get top-level, non-object and non-array values we can use:

$ cat sampleOrder.json | jq '.order | [.id, .email, .order_number, .total_price] | [@csv](http://twitter.com/csv) '

One thing you’ll notice is that we used the [] Array Construction operator again. This is because jq requires an array input for its @csv formatter. The output of this command is:

"450789469,\\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\\",1001,\\"598.94\\""

This is for a single order. If we had multiple orders inside of our JSON object, then jq would repeat this output for each object. However, those of you who came here to create spreadsheet that requires a header row are probably wondering…

Where are my headers?

I was wondering the same thing, and the first solution I found seemed simple and sufficient. By adding an array of headers as type:string elements of an array, we get:

$ cat sampleOrder.json | jq '.order | ["id", "email","order_number", "total_price"], [.id, .email, .order_number, .total_price] | [@csv](http://twitter.com/csv) '

which returns:

"\"id\",\"email\",\"order_number\",\"total_price\""
"450789469,\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\",1001,\"598.94\""

This is okay for a single order, but the command we wrote would not work properly for 10 orders, packaged as a single JSON object. Currently our JSON file has structure of KEY:VALUE pair of that consists of a KEY of “order” and a VALUE of the order’s data. However, if we received 10 orders it would be delivered with a structure of KEY of “order” and a VALUE of a type:array that would be made up of 10 elements, each of those elements being a single order object.

Here’s what a single order looks like vs multiple orders:

On the left side, there is a single order where the value is a Javascript object containing key-value pairs. On the right side, each of these objects is separated by commas and enclosed by [] making it an array of objects.

Working with Multiple Orders

The Shopify API docs do not actually have data for multiple orders, what they did was change the key to orders and enclose the single order in an array. So for this example, I copy-pasted the same order 4 times into this JSON file (gist|raw format) so you can try out this command. Notice our first argument to jq is now .orders, rather than .order.

$ cat sampleMultipleOrders | jq '**.orders** | ["id", "email","order_number", "total_price"],(to_entries| .[] | [ .value.id, .value.email, .value.order_number, .value.total_price]) | [@csv](http://twitter.com/csv) '

This command will return the CSV output, with our header row printed once. Please note, this is not (technically) repeating the same order over and over again. Its the same information because I copy-pasted the order 4 times into the JSON file.

"\\"id\\",\\"email\\",\\"order_number\\",\\"total_price\\""
"450789469,\\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\\",1001,\\"598.94\\""
"450789469,\\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\\",1001,\\"598.94\\""
"450789469,\\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\\",1001,\\"598.94\\""
"450789469,\\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\\",1001,\\"598.94\\""

Trust me!

https://gph.is/g/ZkRk0Aa

In case you don’t feel like you can trust me just yet or have deep-seeded trust issues (like me!), we can use jq to prove that these are indeed different orders. Let’s use this following command, that will look very similar to the one we used just previously:

$ cat sampleOrders.json | jq '.orders | ["id", "email","order_number", "total_price"],(to_entries| .[] | [ .value.id, .value.email, .value.order_number, .value.total_price, **.key**]) | @csv '

Do you see the difference? Yep, it’s bolded — we added a comma and ‘.key’ (which provides they key output by to_entries). Maybe you’ve been looking at these commands and are a little fuzzy on what’s going on, so let’s walk through it.

First, we are con cat enating the sampleOrders.json file, the output of which goes into (or ‘gets piped into’) jq . As we start our jq command we open up our top-level key named orders and pipe the list of values associated with it into the next command. Here something interesting happens, we aren’t working with the passed in data, we are adding our own type:array of type:string elements (“id”, “email”, etc.). This is our “header row”, and its just simple text. We separate this array with a comma ‘,’ to tell jq we’re not done yet. We still have that data set of values, and now they get funneled into the to_entries function. Notice the open parenthesis ‘(’ before to_entries, do you see where the closing parenthesis ‘)’ is? It’s after the .key that we added, jq uses these parens as a grouping operator.

After that, we pipe all that output which includes the return from those grouped functions and the type:array of type:strings to the @csv format string.

// same command, just copy-pasted so you don't have to scroll up to follow along.

$ cat sampleOrders.json | jq '.orders | ["id", "email","order_number", "total_price"],(to_entries | .[] | [ .value.id, .value.email, .value.order_number, .value.total_price, .key]) | @csv

Let’s just step back into the grouped (enclosed by parens) functions for a second.

The first command we use is the to_entries function which converts between each individual order object and spits out an array of key-value pairs. The command “.[]” accesses that array and pipes the output to the next command which is enclosed in square brackets ‘[]’ which are used by jq for array construction. Whatever functions are inside it execute on the data passed to it and then result is output inside an array (which is the data type that @csv requires as an input).

So, what does this all look like when it comes out?

"\"id\",\"email\",\"order_number\",\"total_price\""
"450789469,\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\",1001,\"598.94\",0"
"450789469,\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\",1001,\"598.94\",1"
"450789469,\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\",1001,\"598.94\",2"
"450789469,\"[bob.norman@hostmail.com](mailto:bob.norman@hostmail.com)\",1001,\"598.94\",3"

There’s one extra CSV column here. It’s at the very end. 0, 1, 2, 3

These are the ids of each of the order objects that reside inside of the array that is the value for the top-level key of orders.

There you go! Proof that I was telling the truth!

“I am a man you can trust.”
anne tyler, “A Patchwork Planet, 1998


Aside: to_entries:

For those curious about to_entries, here’s an example from the jq manual:

Command: jq 'to_entries'Input:  {"a": 1, "b": 2}
Output: [{"key":"a", "value":1}, {"key":"b", "value":2}]

If you’re curious about this I highly recommend pasting some JSON data into jqterm or jq play, two great learning tools I mentioned earlier. Once you paste data you can run through some jq commands and see an instant reload of how small changes in the commands affect the output.

This was super helpful for me.


Until Next Time…

Next time, let’s dive into handling objects and arrays, as well as nested values (string, num, objects and arrays) and pulling out the values we need so that we can output them as CSV.

To tell you the truth 😇 I haven’t figured it out yet. Like I always say, most of what I share is what I’m learning at the moment with some filler stuff that I do already have under my belt. So, if you find my explanations and understanding is confused, naive, or downright wrong I welcome you to let me know! (I really want to know! 👂🏻💖) If you know a better way, or more importantly, an EASIER way, please share!

See you next week! Until then — stay safe, and be well! 🌊🏄🏻‍♂️🍻

Posted on by:

dannylee8 profile

Danny Lee

@dannylee8

a nyc based programming enthusiast, who also loves handywork, propagating african violets, gardening, woodwork, welding, painting, and travelling to bali, to visit my future home.

Discussion

markdown guide