I recently had the "opportunity" to perform some text manipulation to get data from a huge log file into a spreadsheet. I had done this sort of work a long time ago as a developer, but had forgotten over the years. So I decided to compile a text manipulation cheat-sheet for macOS.
Text Manipulation Utilities
First up, a few of the basic shell utilities:
|Print the contents of a plain text file called |
|Prints the first five lines from the file.|
|Prints the last |
which prints out the contents of a plain text file.
The shell in macOS is called "bash" and is access via the Terminal App. If you are confused by this statement, then please stop reading!
grep which prints out lines of text matching a given text.
grep can find lines that match a "pattern" called a regular expression (regex).
If you are not familiar with regex, there are many on-line tools and tutorials to get you started, e.g. Apple's Regular Expressions Unfettered guide, regular-expressions.info, regexr.com and regex101.com.
|Prints the lines after the line containing |
|Prints the lines before the line containing |
|Prints the lines not containing|
|Ignores case when finding |
|Prints lines starting with|
|Prints lines starting with|
|Prints lines matching |
|Prints lines matching |
|Extended regular expression |
-B, a line with two dashes
-- will be used to separate groups of matches.
cut which simply cuts out a portion of the line.
|Print the text cut from the 5th to the 10th character inclusive from |
|For tab-delimited content, print the first and third fields.|
|For space-delimited content, print the 2nd, 3rd, 4th and 5th fields.|
tr which "translates" characters read from standard input (i.e. not from a file like the commands above):
|Replaces every space character with a tab character|
|Replaces newlines (|
|Replaces multiple spaces with a single space character.|
|Replace multiple spaces with a single tab character.|
|Deletes all spaces.|
Another useful command is
pbcopy which copies the content of standard input to the clipboard, ready to paste your data into a word processors or spreadsheet for further analysis (
pbpaste prints out the content of the clipboard).
pbcopy read from "standard input" which means they take text typed in from the terminal, instead of reading from a file. That is,
tr does not read from a file, instead you'll have to "pipe" (that's this symbol
|) input to it, e.g.
echo hello world | tr h H would replace all occurrences of lower case h with uppercase.
And finally, there is
sort, which sorts text or numbers on separate rows i.e. separated by a newline
|Sorts text in ascending order, A-Z then a-z|
|Reverses the sort |
|Sorts numbers, ascending (smallest to largest)|
|Sorts numbers, descending (largest to smallest)|
What if your data is delimited by a space and not a new line? Well, start combining commands above, e.g.
echo "d a c b" | tr " " "\n" | sort.
Stringing Them Together
By stringing various commands together, you can easily manipulate the text output e.g. to extract columns, or transform text. I'm going to be reading from
file only once using the
cat file command to print the file, and then pass the printed output from one command to the next via what is called a "pipe" (
|). You can read more about Pipes and Redirection on Apple's Shell Input and Output guide.
In this example, I have a text file called
iostat.log containing the output of a couple of runs of the
iostat (to get I/O statistics):
disk0 disk2 cpu load average KB/t tps MB/s KB/t tps MB/s us sy id 1m 5m 15m 54.20 4 0.25 9.08 0 0.00 14 8 78 3.20 3.05 3.06 disk0 disk2 cpu load average KB/t tps MB/s KB/t tps MB/s us sy id 1m 5m 15m 54.20 4 0.22 9.08 0 0.00 14 8 78 3.20 3.05 3.06
So, if I just want the third and fourth column (MB/s and KB/t):
cat iostat.log | tr -s " " "\t" | cut -f4,5
Note the output has a tab character between the columns (which does not get displayed nicely in HTML), which is great for copy-and-pasting into a spreadsheet:
cpu load MB/s KB/t 0.25 9.08 cpu load MB/s KB/t 0.22 9.08
Assuming I had lots of other lines in the file, and I just wanted the data lines after the "header" starting with "MB/s" and "KB/t":
cat iostat.log | grep -A1 "MB/s"
But now the output contains both the line with "MB/s" and the delimiter "--". So, to exclude these lines and combining the earlier command to just get the third and fourth columns nicely separated with tabs:
cat iostat.log | grep -A1 "MB/s" | grep -v "MB/s\|--" | tr -s " " "\t" | cut -f4,5
Which prints out:
0.22 9.08 0.22 9.08
Now, I could save this output to a file called
results.txt using "redirection" (
>) which you can read more about on the Apple Pipes and Redirection guide:
- To write to a file called
results.txt(over-writing each time),
cat iostat.log | grep -A1 "MB/s" | grep -v "MB/s\|--" | tr -s " " "\t" | cut -f4,5 > results.text
- Or, to keep appending to the same file use
cat iostat.log | grep -A1 "MB/s" | grep -v "MB/s\|--" | tr -s " " "\t" | cut -f4,5 >> results.text
However, since I want to copy the output data to a spreadsheet, I use
pbcopy instead. I find it easier and faster to then swipe/tab over to my spreadsheet, and paste (Command-V) the data! Compare this to the many steps to import a tet file into a spreadsheet, and then still land up copy-and-pasting to an existing spreadsheet. So:
cat iostat.log | grep -A1 "MB/s" | grep -v "MB/s\|--" | tr -s " " "\t" | cut -f4,5 | pbcopy
There you have it - a cheat-sheet for text manipulation, which hopefully saves someone a lot of Googling. My intent is not to cover every possibility, or provide really optimized all-in-one commands, but rather to keep it simple so that anyone can string together the commands to get the output required.
It's better to have a solution that works as expected and is clear, even if it's less than optimal.
Updated 14 Feb 2019: Added a few more examples and