---
title: "Pipelines and your Unix toolbox"
createdAt: "2019-05-29T00:00:00.000Z"
updatedAt: "2019-05-29T00:00:00.000Z"
tags: ["code","unix"]
draft: false
aliases: ["/code/2019/05/29/unix.html","/posts/2019-05-29-unix"]
atUri: "at://did:plc:mracrip6qu3vw46nbewg44sm/site.standard.document/3mhnqy73kt52c"
readingTime: false
---

Unix commands are great for manipulating data and files. They get even better when used in shell pipelines. The following are a few of my go-tos -- I'll list the commands with an example or two. While many of the commands can be used standalone, I'll provide examples that assume the input is piped in because that's how you'd used these commands in a pipeline. Lastly, most of these commands are pretty simple and that is by design -- the Unix philosophy focuses of simple, modular code, which can be composed to perform more complex operations.

Note: if you're using a Mac, the builtin tools shipped with macOS might behave a little differently than the most recent versions. You can get more recently compiled version of these tools by running `brew install coreutils`. The typically usage is then `ghead` instead of `head`, `gtail` instead of `tail`, `gpaste` instead of `paste`, etc.

Here are the commands and a one sentence description:

Please forgive the "useless use of `cat`". I'm using `cat` to show the pipe-able versions of the piped-to commands.

Print the first 5 lines

```sh
cat somefile.txt | head -n 5
```

Print the last 5 lines:

```sh
cat somefile.json | tail -n 5
```

Print all but the first line

```sh
cat somefile.csv | tail -n +2
```

Send all lines of output as arguments to `echo`

```sh
cat somefile.xml | xargs echo
```

Send each line individually as an argument to `echo`

```sh
cat somefile.yaml | xargs -n 1 echo
```

Send two arguments at a time to `echo`

```sh
cat somefile | xargs -n 2 echo
```

Print only column two of each line (assumes whitespace between columns)

```sh
cat somefile | awk '{print $2}'
```

Print only column two of each line with `,` separators

```sh
cat somefile | awk -F',' '{print $2}'
```

or

```sh
cat somefile | cut -d',' -f 2
```

String replace each line matching 1, 2, or 3 using regex with the letter 'x'

```sh
cat somefile.txt | sed 's/[1-3]/x/'
```

Count the number of times each line appears in the file

```sh
cat somefile.txt | sort | uniq -c
```

Write out the intermediate result to a file in the middle of a pipeline using `tee`

```sh
cat somefile.txt | grep "mystring" | tee newfile.txt | wc -l
```

The above are a bunch of "tools" to file away for when you need them. I like to store them along with a few keywords or a sentence describing what each does for easy searching and recall.

Now, let's consider the following example file `myfile.txt`:

```sh
uuid,name
72e925a8-58fb-11e9-87f4-fbe50933ad95,name1
237fd8a4-58fb-11e9-8bb7-bf5388556288,name2
91834624-58fb-11e9-8c01-2393beecfc80,name3
223c7438-58fc-11e9-a629-7707a76a95c5,name4
a0362ab0-58fb-11e9-8aed-1f7ca90ae318,name5
f3f153aa-58fb-11e9-a37c-3ffed25b518b,name6
09f6cdb0-58fc-11e9-9663-93ede25c7774,name7
21f4c2ec-58fc-11e9-baf6-7792114ee968,name8
```

We want to create files in the current folder using the names in column two, where the uuids in column one start with a "2".

Break down the transformation into parts:

- remove the first line (the header)
- filter for lines starting with `2`
- grab the second column using a `,` separator
- use the result to create the files with `touch`

```sh
tail -n +2 myfile.txt | grep "^2" | awk -F',' '{print $2}' | xargs touch
```

Confirm it worked:

```sh
$ ls name*
name2 name4 name8
```

As you add new tools to your toolbox, you can plug them in to your shell one-liners to manipulate data streams.

Some other useful commands for text processing worth looking into:

[`paste`](http://cheat.sh/paste)
[`tr`](http://cheat.sh/tr)
[`wc`](http://cheat.sh/wc)
[`jq`](https://stedolan.github.io/jq/manual/)