Recently I needed to analyze some JSON data. Now, I'm no data scientist, so I wanted to convert my JSON data to comma separated (CSV) and "flatten" the data... just so I could use the data in Excel.
And Merry Christmas and Happy New Year!
JSON data is usually hierarchical (and described in detail at json.org). A single record may look like this:
{
"first_name": "James",
"last_name": "Bond",
"attributes": {
"charisma": 9,
"intelligence": 4
}
}
CSV, on the other hand, is "flat" i.e. one row represents one record (with an optional header row).
first_name,last_name,charisma,intelligence
James,Bond,9,4
jq
Enter jq, the JSON command-line processor. You can either get the complied version (of course be careful what you download), or just use the on-line playground at jqplay.org.
Enter this filter into jqplay: .first_name, .last_name, .attributes.charisma, .attributes.intelligence
and you'll get the desired data extracted (one result per row):
A little tweak to the filter will return the output as CSV: [.first_name, .last_name, .attributes.charisma, .attributes.intelligence] | @csv
The square brackets will convert the output to an array and then pass that to @csv
. Note I checked Raw Output, which omits the opening and closing quote (") symbols:
At the bottom of the playground, you'll find a handy Command Line if you're going to use jq
from shell. I think -r
is equivalent to --raw-output
.
Operators
jq
also supports operators, e.g. .first_name + " " + .last_name, .attributes.charisma + .attributes.intelligence
will return James Bond
and 13
.
Arrays
Assuming the input JSON in an array, e.g.
{
"agents": [
{
"first_name": "James",
"last_name": "Bond",
"attributes": {
"charisma": 9,
"intelligence": 4
}
}, {
"first_name": "Jason",
"last_name": "Bourne",
"attributes": {
"charisma": 5,
"intelligence": 8
}
}
]
}
And, if you need to read each array item individually - you can't just use .agents[].first_name, .agents[].last_name, .agents[].attributes.charisma, .agents[].attributes.intelligence
- do this and you'll get the data out of order, since jq
processing all first_name
s, then processing all last_name
s, and so on.
Instead you'll get the array first, then pipe it to the next portion: .agents[] | [.first_name, .last_name, .attributes.charisma, .attributes.intelligence]
Since I wanted the output to be an array for the @csv
function later, the second part is enclosed in square brackets.
So, for CSV: .agents[] | [.first_name, .last_name, .attributes.charisma, .attributes.intelligence] | @csv
Functions
jq
has various functions you can use to process your data (apart from @csv, of course), e.g. [.agents[].first_name | ascii_downcase] | sort | reverse
convert to lowercase, sorts the array, then reverses the array.
curl
As a side note, I needed to get the JSON data from a REST service. macOS comes with curl, which is the de facto tool for transferring data from a server over file transfer (S)FTP, web HTTP(S), mail IMAP / SMTP / POP3(S), and more!
curl -G -u "username":"password" \
-H "header:value" \
-d "query=param" \
--data-urlencode "query=encoded param" \
-o "output.json" \
"http://service_url/goes/here"
Pretty self-explanatory:
-G
for HTTP GET-u
for basic authentication username and password-H
to set a HTTP header, repeating as as needed-d
for HTTP GET data passed as a query parameter i.e. it appends?query=param
(repeat as needed)--data-url-encode
is similar to -d except it url encodes data i.e. it appends&query=encoded%20param
with all those%
codes.-o
save the returned data (JSON or otherwise) to a file- and last but not least, the http url to get data from!
Conclusion
Well, that's how to retrieved JSON data from a remote server, did some basic processing, and then converted the data I needed to CSV.
This post kinda complements my previous post - a Bash text manipulation Cheat Sheet.