Shell Scripting Essentials
At any given moment I have a few dozen shell scripts I actively use - I am professionally lazy. Scripting is useful, and surprisingly easy to learn, so I thought I'd share what I know. I'll be referring to Bash, but much of this can be adapted to POSIX shell.
Why Care?
Scripting lets you combine the software tools already at your disposal on a Unix-like system into larger, more specific meta-tools. A single typed command can run hundreds of others, with all the usual logical structures of any other programming language, removing the manual effort from complex actions. If you have tens of thousands of e-mail addresses you would like to send death threats to, you can do it.
I think having this kind of power is a strong selling point for Unix-likes in general, but that's an aside.
With this post, I aim to give a reasonably comprehensive overview of scripting for someone unfamiliar with it, as a basic reference point, and to show some of what's available, which may inspire them to write their own scripts to simplify some task.
Helpful Concepts to Know
- 'stdin' - input. This can be typed at the terminal, or piped from another program.
- 'stdout' - output. This can be output to the screen, or piped into another program.
- 'pipe' - directing the output of one program to be used as the input of another. Uses the '|' pipe character.
- 'variables' - keywords referring to stored data. You're probably already familiar enough with this. An example variable 'var' is set with 'var=value' (no spaces around the '=') and accessed with '$var'.
The example below shows some of the concepts in use. Here, echo will output 'Hello world', which becomes awk's input - awk will then output 'world':
echo 'Hello world' | awk '{print $2}'
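To tie variables into the same picture, here is a minimal sketch of my own. The variable names are made up for illustration, and 'tr' is a standard tool I haven't covered here, used only to uppercase the text.

```shell
# Store a string in a variable: no spaces around '='.
greeting='Hello world'

# Command substitution, $(...), captures a program's stdout in a variable.
second_word=$(echo "$greeting" | awk '{print $2}')

# Quote variables when you use them so spaces are preserved.
echo "The second word is: $second_word"

# Pipes can be chained: echo -> tr (uppercase) -> awk (first field).
shout=$(echo "$greeting" | tr '[:lower:]' '[:upper:]' | awk '{print $1}')
echo "$shout"
```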
Many other methods besides just piping are used to control the flow of a script, allowing more complex scripts to be made. Knowing the below is a good starting point.
- ';' - run the next program once the preceding one has ended, whether or not it succeeded.
- '&' - run the preceding program in the background, so the next one starts without waiting for it.
- '&&' - if the preceding program ends successfully or statement is true, execute the next program.
- '||' - if the preceding program fails or statement is false, execute the next program.
- '>' - direct output to a file, overwriting it if it already exists. '>>' - append to the file instead of overwriting it.
Here's another example of this chain in use:
(echo 'Return this' && echo 'and this' && echo 'and this' || echo 'If any fail, return this') >> output_file
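If it helps to see each operator on its own, this is a rough sketch that writes to a throwaway log file. The file name and messages are my own placeholders, and 'mktemp', 'wait' and 'wc' are standard tools not covered above.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
log="$workdir/demo.log"

# ';' runs the second command once the first has finished.
# '>' creates (or overwrites) the file, '>>' appends to it.
echo 'first' > "$log" ; echo 'second' >> "$log"

# '&&' runs the second command only if the first succeeded.
true && echo 'after &&' >> "$log"

# '||' runs the second command only if the first failed.
false || echo 'after ||' >> "$log"

# '&' starts a command in the background; 'wait' pauses until it finishes.
(sleep 1; echo 'background done' >> "$log") &
echo 'foreground continues' >> "$log"
wait

# One '>' plus five '>>' leaves six lines in the file.
line_count=$(wc -l < "$log" | tr -d ' ')
last_line=$(tail -n 1 "$log")
cat "$log"
rm -r "$workdir"
```

The background job sleeps for a second, so 'foreground continues' should land in the log before 'background done'.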
Beyond a certain level of complexity, you may find if-else statements more legible:
if [[ -z $(ls ~) ]]; then
    echo 'Bro someone took your stuff'
else
    echo 'All good'
fi
'while' and 'for' loops, case statements, and much else you may be familiar with are also available here.
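As a quick taste of those, here is a small sketch with one of each. The list items and the 'describe_file' helper are hypothetical, made up purely for illustration.

```shell
# 'for' loop: iterate over a fixed list of words.
for name in alpha beta gamma; do
    echo "for: $name"
done

# 'while' loop: add up 1 to 3 using shell arithmetic.
# '-le' means less than or equal to.
count=1
total=0
while [ "$count" -le 3 ]; do
    total=$((total + count))
    count=$((count + 1))
done
echo "while: total is $total"

# 'case' statement: match a value against patterns.
describe_file() {
    case "$1" in
        *.sh)        echo 'shell script' ;;
        *.txt|*.md)  echo 'text file' ;;
        *)           echo 'unknown' ;;
    esac
}
describe_file 'backup.sh'
describe_file 'notes.md'
```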
You may also notice the use of '-z', which is a test operator - note: not the same as a program argument, though they look similar. '-z' is true when a string is empty, and likewise '-n' is true when it is non-empty. For numbers, '-lt' means less than, '-ge' means greater than or equal to, etc. Running 'man test' on your Unix-like, or reading the manual page for your shell (if you don't know your shell, it's likely 'bash'), will give you a longer list of the other expressions available, which I won't go into in any further detail here.
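A small sketch of those operators in use, assuming the single-bracket '[ ... ]' form that 'man test' documents. The 'classify' helper and its threshold of 10 are invented for this example.

```shell
# Report whether the argument is empty, below 10, or 10 and above.
classify() {
    if [ -z "$1" ]; then
        echo 'empty'
    elif [ "$1" -lt 10 ]; then
        echo 'less than 10'
    else
        echo '10 or more'
    fi
}

classify ''
classify 3
classify 10

# '-n' is the opposite of '-z': true when the string is non-empty.
shell_name='bash'
if [ -n "$shell_name" ]; then
    echo "shell_name is set to $shell_name"
fi
```

Quoting "$1" matters here: without the quotes, an empty argument would vanish and the test would misbehave.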
Now, Some Actual Programs
These get used most frequently in any fairly complex script I'm writing, so I would consider them foundational knowledge. There are plenty more, of course. To learn more about them, just precede their name with 'man'.
- echo / printf: 'echo' prints its arguments, which are either strings (in quotes, or otherwise), or variables. 'printf' is similar, although it supports additional formatting sequences by default, and doesn't automatically add newlines.
- cat: 'cat' is technically used to concatenate two or more files. It can also print the contents of a single file to stdout - which can go to the screen or another program, which it's often used for.
- grep: 'grep' can perform regex pattern matching on its input, and return matches. This also applies to returning the lines of a file which have a match. It can also check for multiple matches, use OR/AND logic for multiple searches, and do inverted matching (returning the lines that don't match).
- head / tail: 'head' and 'tail' output the first or last few lines or bytes of their input.
- curl: 'curl' can connect to URLs and fetch their content, directing their content to files or stdout, among other things.
- cut: 'cut' cuts input based on a given delimiter, and prints a field after cutting.
- awk: 'awk' is something like a far more powerful version of cut, often preferred because it can use multiple characters in a string as its delimiter, print multiple fields and lines algorithmically, even performing conditionals and calculations. It's basically a language in its own right.
- sed: 'sed' (stream editor) is very useful for finding and deleting, or finding and altering, matches and lines in a file, or to filter output before passing it to the next program.
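To make the list above a little more concrete, here is a rough sketch running most of these tools over some made-up records (the names, shells and IDs are invented). I've left 'curl' out since it needs a network connection.

```shell
# Sample data, one "name:shell:uid" record per line.
data='alice:bash:1001
bob:zsh:1002
carol:bash:1003
dave:fish:1004'

# printf only adds a newline when you ask for one with \n.
printf '%s\n' "$data" | head -n 2    # first two lines
printf '%s\n' "$data" | tail -n 1    # last line

# grep keeps lines matching a pattern; -v inverts the match, -c counts lines.
bash_users=$(printf '%s\n' "$data" | grep ':bash:')
not_bash=$(printf '%s\n' "$data" | grep -vc ':bash:')

# cut splits on a single-character delimiter and prints the chosen field.
first_name=$(printf '%s\n' "$data" | head -n 1 | cut -d ':' -f 1)

# awk can split on a delimiter, filter with a condition, and do arithmetic.
uid_sum=$(printf '%s\n' "$data" | awk -F ':' '$2 == "bash" { sum += $3 } END { print sum }')

# sed rewrites text as it streams through; here zsh becomes bash.
renamed=$(printf '%s\n' "$data" | sed 's/zsh/bash/' | grep -c ':bash:')

echo "bash users: $bash_users"
echo "non-bash users: $not_bash"
echo "first name: $first_name"
echo "sum of bash uids: $uid_sum"
echo "bash users after sed: $renamed"
```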
Other Stuff Worth Mentioning
cron - cron jobs are used to automate the running of programs, and I find myself using them often. Their format lets you schedule a script down to the minute, whether that's once on a date months away or on a repeating basis. For example, cron could run a script at a specific time on your server, and even send an alert if something went wrong, none of which requires you to be there at the time.
mutt - mutt is a terminal-based e-mail client, with an ncurses TUI for browsing mailboxes (and it offers a lot in terms of writing macros, which is great). It's important because, after the universally painful set-up, it also allows you to send mail directly from the terminal without needing to open the client. You still want to automate those death threats, right?
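As a sketch of how that might look: the crontab line in the comments uses the standard five time fields, but the script path and the 'run_and_log' helper are hypothetical names of my own.

```shell
# Hypothetical crontab entry (edit yours with 'crontab -e'). The five fields
# are minute, hour, day of month, month, day of week, then the command:
#
#   30 2 * * 1 /home/user/bin/nightly.sh
#
# That would run the (made-up) script at 02:30 every Monday.

# A sketch of what such a script might contain: run a task and record the
# result in a log, so you can find out later whether something went wrong.
run_and_log() {
    logfile=$1
    shift
    if "$@"; then
        echo "OK: $*" >> "$logfile"
    else
        status=$?
        echo "FAILED (exit $status): $*" >> "$logfile"
        return "$status"
    fi
}

# Example call (commented out; the paths are placeholders):
# run_and_log "$HOME/nightly.log" tar -czf /tmp/docs.tar.gz "$HOME/Documents"
```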
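For something more benign, here is a minimal sketch of sending a message without opening the client, assuming mutt is already configured to send mail. The 'send_alert' helper and the address are placeholders of my own. mutt takes the subject with '-s' and reads the body from stdin.

```shell
# send_alert SUBJECT RECIPIENT BODY
send_alert() {
    subject=$1
    recipient=$2
    body=$3
    # The body is piped into mutt on stdin.
    printf '%s\n' "$body" | mutt -s "$subject" "$recipient"
}

# Example call (commented out so that pasting this sends nothing):
# send_alert 'Backup finished' 'you@example.com' 'Nightly backup completed.'
```

Paired with cron, a script could call something like this only when a job fails.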