Recently I needed to analyze some JSON data. Now, I'm no data scientist, so I wanted to convert my JSON data to comma separated (CSV) and "flatten" the data... just so I could use the data in Excel.

And Merry Christmas and Happy New Year!

JSON data is usually hierarchical (and described in detail at json.org). A single record may look like this:

{
    "first_name": "James",
    "last_name": "Bond",
    "attributes": {
        "charisma": 9,
        "intelligence": 4
    }
}

CSV, on the other hand, is "flat" i.e. one row represents one record (with an optional header row).

first_name,last_name,charisma,intelligence
James,Bond,9,4

jq

Enter jq, the JSON command-line processor. You can either get the complied version (of course be careful what you download), or just use the on-line playground at jqplay.org.

Enter this filter into jqplay: .first_name, .last_name, .attributes.charisma, .attributes.intelligence and you'll get the desired data extracted (one result per row):

jq: Testing at jqplay.org

A little tweak to the filter will return the output as CSV: [.first_name, .last_name, .attributes.charisma, .attributes.intelligence] | @csv

The square brackets will convert the output to an array and then pass that to @csv. Note I checked Raw Output, which omits the opening and closing quote (") symbols:

jq: Converting JSON to CSV

At the bottom of the playground, you'll find a handy Command Line if you're going to use jq from shell. I think -r is equivalent to --raw-output.

Operators

jq also supports operators, e.g. .first_name + " " + .last_name, .attributes.charisma + .attributes.intelligence will return James Bond and 13.

Arrays

Assuming the input JSON in an array, e.g.

{   
    "agents": [ 
    {
            "first_name": "James",
            "last_name": "Bond",
            "attributes": {
                "charisma": 9,
                "intelligence": 4
            }
        }, {
            "first_name": "Jason",
            "last_name": "Bourne",
            "attributes": {
                "charisma": 5,
                "intelligence": 8
            }
        }
    ]
}

And, if you need to read each array item individually - you can't just use .agents[].first_name, .agents[].last_name, .agents[].attributes.charisma, .agents[].attributes.intelligence - do this and you'll get the data out of order, since jq processing all first_names, then processing all last_names, and so on.

Instead you'll get the array first, then pipe it to the next portion: .agents[] | [.first_name, .last_name, .attributes.charisma, .attributes.intelligence]

Since I wanted the output to be an array for the @csv function later, the second part is enclosed in square brackets.

jq: Handling arrays

So, for CSV: .agents[] | [.first_name, .last_name, .attributes.charisma, .attributes.intelligence] | @csv

Functions

jq has various functions you can use to process your data (apart from @csv, of course), e.g. [.agents[].first_name | ascii_downcase] | sort | reverse convert to lowercase, sorts the array, then reverses the array.

jq: Functions

curl

As a side note, I needed to get the JSON data from a REST service. macOS comes with curl, which is the de facto tool for transferring data from a server over file transfer (S)FTP, web HTTP(S), mail IMAP / SMTP / POP3(S), and more!

curl -G -u "username":"password" \
 -H "header:value" \
 -d "query=param" \
 --data-urlencode "query=encoded param" \
 -o "output.json" \
 "http://service_url/goes/here"

Pretty self-explanatory:

  • -G for HTTP GET
  • -u for basic authentication username and password
  • -H to set a HTTP header, repeating as as needed
  • -d for HTTP GET data passed as a query parameter i.e. it appends ?query=param (repeat as needed)
  • --data-url-encode is similar to -d except it url encodes data i.e. it appends &query=encoded%20param with all those % codes.
  • -o save the returned data (JSON or otherwise) to a file
  • and last but not least, the http url to get data from!

Conclusion

Well, that's how to retrieved JSON data from a remote server, did some basic processing, and then converted the data I needed to CSV.

This post kinda complements my previous post - a Bash text manipulation Cheat Sheet.