Most broken shell scripts I read fail the same way: an unquoted variable expands into whitespace, rm -rf "$dir/" runs with $dir empty, and a silent cd failure sends the next command to the wrong directory. None of these are exotic. They are the default behavior of /bin/sh and Bash unless you explicitly opt out. Good shell scripting in 2026 is less about clever one-liners and more about turning off the footguns before you write a single line of logic.
This guide is a working set of conventions I apply to every Bash script that runs unattended — cron jobs, CI steps, systemd units, Docker entrypoints, backup runners. It assumes you already know the basics of pipes, redirection, and variables, and want to stop writing scripts that work on your laptop and explode in production.
Start every script with the unfashionable prelude
The first three lines of any Bash script I write, without exception:
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'
Each of these changes matters. set -e makes the shell exit on the first command that returns non-zero (with caveats I’ll get to). set -u makes unset variables an error instead of silently expanding to the empty string — this alone kills an entire class of “rm -rf on root” disasters. set -o pipefail makes a pipeline fail if any stage fails, not just the last one; without it, curl https://broken | tee out.txt returns success even when curl explodes. The GNU Bash manual’s page on the set builtin documents each of these options in detail and is worth reading once end to end.
Resetting IFS to newline-and-tab is the one most people skip. The default includes a literal space, which means for f in $(ls) will split on every space in every filename. Removing space from IFS doesn’t fix the underlying bug — you should not be parsing ls — but it narrows the blast radius when a teammate’s script does.
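The robust replacement for for f in $(ls) is a glob, which never word-splits filenames at all. A minimal sketch (the .txt pattern is just an example):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Iterate with a glob: each $f arrives as one intact filename, spaces
# and newlines included. nullglob makes the loop simply not run when
# nothing matches, instead of iterating once over the literal pattern.
shopt -s nullglob
for f in ./*.txt; do
    printf 'processing: %s\n' "$f"
done
```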
The cases where set -e lies to you
Bash’s errexit has sharp edges. It does not trigger inside a command whose exit status is being tested (if, while, &&, ||, !), and it does not propagate out of functions called in those contexts. This is the single most common source of “but I set -e, why did it keep going?” confusion. Greg Wooledge’s BashFAQ #105 has the complete list of situations where errexit silently does nothing, and if you are going to rely on it, you need to read that page once.
The practical consequence: do not treat set -e as a substitute for checking return codes on the things that actually matter. For critical operations — database dumps, file moves, remote uploads — check explicitly:
if ! pg_dump -Fc -d "$DB" -f "$OUT"; then
    logger -t backup "pg_dump failed for $DB"
    exit 1
fi
Quote everything, then run ShellCheck
If there is one rule that prevents more production incidents than any other, it is: quote every variable expansion. Always "$var", not $var. Always "$@", not $*, when forwarding arguments. The number of seasoned sysadmins who have shipped a backup script that worked fine until someone created a directory with a space in it is embarrassingly high.
You don’t have to catch this by eye. ShellCheck is a static analyzer for Bash written in Haskell, and it catches roughly 95% of the quoting, subshell, and word-splitting bugs I used to hunt by hand. Install it from your distro repo (apt install shellcheck on Debian/Ubuntu, dnf install ShellCheck on Fedora, pacman -S shellcheck on Arch) and run it on every script before committing. Every warning has a numeric code with a dedicated wiki page; SC2086, for example, is the unquoted-variable warning and explains exactly why cp $src $dst is dangerous.
Wire it into your pre-commit hook. A one-line hook that runs shellcheck -x -S warning scripts/*.sh will stop most bad code from ever reaching main. In CI, run it with -S error first as a hard gate, then tighten gradually.
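A minimal .git/hooks/pre-commit along those lines. The scripts/ path is an assumption; adjust the glob to your repository layout:

```shell
#!/usr/bin/env bash
# Pre-commit hook: block the commit if ShellCheck finds warnings.
set -euo pipefail

if ! command -v shellcheck >/dev/null; then
    echo "pre-commit: shellcheck not installed, skipping lint" >&2
    exit 0
fi

# nullglob: an empty match yields an empty array, not a literal pattern
shopt -s nullglob
files=(scripts/*.sh)
if [ "${#files[@]}" -eq 0 ]; then
    exit 0
fi

# -x follows source'd files; -S warning sets the minimum severity that fails
shellcheck -x -S warning "${files[@]}"
```

Failing open when the linter is missing is a judgment call; flip that branch to exit 1 if you want a hard gate on every machine.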

Portable shebangs and the sh vs bash trap
On Debian and Ubuntu, /bin/sh is dash, not Bash. On Alpine, it is ash from BusyBox. On macOS it is still Bash 3.2 from 2007 because of the GPLv3 switch. A script with #!/bin/sh at the top that uses [[ ... ]], arrays, local, or process substitution will break on some of those systems and work on others, usually at the worst possible moment.
Pick one and stick with it. If you need arrays, associative arrays, [[, or any of the ergonomic Bash features, use #!/usr/bin/env bash and document that Bash 4.0 or newer is required. If you are writing something that must run on an Alpine container with no Bash installed, restrict yourself to POSIX sh — run your script under dash locally and let it find the incompatibilities for you. The Google Shell Style Guide takes a harder line: executables should have no extension, libraries should end in .sh, and if you need anything beyond basic POSIX features you should be writing the whole thing in Bash or switching languages entirely. I agree with that cutoff. The moment a shell script grows past 300 lines or starts parsing structured data, rewrite it in Python.
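A quick illustration of the divergence. The first test uses a Bash-only construct that dash rejects outright; the second is the POSIX-portable equivalent that works under dash and ash too:

```shell
#!/usr/bin/env bash
name="world"

# Bash-only: [[ ]] does glob matching and the right side stays unquoted
if [[ $name == w* ]]; then
    echo "bash match"
fi

# POSIX sh: [ ] has no pattern matching; use case for globs instead
case "$name" in
    w*) echo "posix match" ;;
esac
```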
When shell scripting stops being the right tool
Shell is excellent at orchestration: calling other programs, wiring their stdout into each other, checking exit codes, moving files. It is terrible at arithmetic, string manipulation beyond trivial substitution, data structures, and anything resembling a data format that isn’t line-oriented text. If your script has grown a function named parse_json, that is the signal to stop. Use jq for JSON, call out to Python for YAML or TOML, and never try to write your own config parser in awk. I’ve seen too many multi-hundred-line Bash scripts that became unmaintainable precisely because somebody pushed shell into territory it was never designed for.
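To make the jq point concrete, extracting a field takes one line where a hand-rolled parser would take fifty. The JSON shape here is invented for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical API response; in a real script this would come from curl
response='{"status": "ok", "items": [{"name": "db-backup"}, {"name": "log-rotate"}]}'

# jq does all the parsing; -r emits raw strings instead of JSON-quoted ones
status=$(jq -r '.status' <<< "$response")
first=$(jq -r '.items[0].name' <<< "$response")

echo "status=$status first=$first"
```

This prints status=ok first=db-backup, and unlike an awk-based parser it keeps working when the upstream service reorders keys or adds whitespace.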
Temp files, traps, and cleanup
A script that creates a temporary directory and doesn’t clean up after itself will eventually fill /tmp or, worse, /var. The pattern I use on every script that writes anywhere temporary:
tmpdir=$(mktemp -d -t myscript.XXXXXX)
trap 'rm -rf -- "$tmpdir"' EXIT INT TERM
# ... use "$tmpdir" freely ...
Three things to notice. First, mktemp -d with a template ending in XXXXXX is the only safe way to create a temp directory — never roll your own with /tmp/$$, which is predictable and racy. Second, the trap is set immediately after creation, before anything can fail and leave the directory behind. Third, the -- in the rm protects against a $tmpdir that somehow starts with a dash. Belt and braces, but it costs nothing.
If you need multiple cleanup actions, accumulate them in a function rather than piling them into a single trap string. Traps replace previous traps for the same signal by default, which is the source of another subtle class of bugs where someone adds a second trap later in the script and silently kills the first one.
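One way to structure that, with function and variable names of my own choosing: register a single cleanup function exactly once, and have later code add steps to it rather than installing new traps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The only EXIT/INT/TERM trap in the script. Later code extends this
# function instead of installing a second trap that would clobber it.
cleanup() {
    # ${var:-} keeps set -u happy when a resource was never created
    if [ -n "${tmpdir:-}" ]; then rm -rf -- "$tmpdir"; fi
    if [ -n "${lockfile:-}" ]; then rm -f -- "$lockfile"; fi
}
trap cleanup EXIT INT TERM

tmpdir=$(mktemp -d)
lockfile=$(mktemp)
```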
Arguments, flags, and getopts
Hand-rolled argument parsing is the second-biggest source of shell script bugs I see, after unquoted variables. People write if [ "$1" = "--verbose" ] chains, forget about combined flags, mishandle --, and ship something that cannot accept a filename starting with a dash. Use getopts for short flags; it is a POSIX built-in and it handles the annoying cases correctly:
verbose=0
output=""
while getopts ":vo:h" opt; do
    case "$opt" in
        v)  verbose=1 ;;
        o)  output=$OPTARG ;;
        h)  usage; exit 0 ;;
        \?) echo "Unknown option: -$OPTARG" >&2; exit 2 ;;
        :)  echo "Option -$OPTARG requires an argument" >&2; exit 2 ;;
    esac
done
shift $((OPTIND - 1))
For long options (--verbose), Bash’s built-in getopts does not support them. You have two reasonable choices: call out to getopt from util-linux (which does support long options but has portability caveats between GNU and BSD), or write a simple case loop that handles --foo=bar and --foo bar explicitly. For anything more complicated than half a dozen flags, that is also the point where Python’s argparse starts looking attractive.
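A hand-rolled long-option loop along those lines; the flag names are illustrative:

```shell
#!/usr/bin/env bash
set -euo pipefail

verbose=0
output=""

while [ $# -gt 0 ]; do
    case "$1" in
        --verbose)  verbose=1 ;;
        --output=*) output=${1#--output=} ;;    # --output=FILE form
        --output)   shift
                    output=${1:?--output needs an argument} ;;  # --output FILE form
        --)         shift; break ;;             # explicit end of options
        -*)         echo "Unknown option: $1" >&2; exit 2 ;;
        *)          break ;;                    # first positional argument
    esac
    shift
done

echo "verbose=$verbose output=$output remaining: $*"
```

Note the -- and the final catch-all: together they let a filename starting with a dash through, which is exactly what the hand-rolled if-chains get wrong.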
Logging and observability in shell scripts
A script that runs under cron at 3am and silently fails because its log output went to a file nobody reads is, operationally, not running at all. Two small patterns make this dramatically better.
First, route operational output through logger so it lands in the system journal. On any modern systemd host, logger -t mybackup -p user.info "starting run" will show up in journalctl -t mybackup, with timestamps and log levels, viewable alongside every other service. You get filtering, rotation, and remote forwarding for free, and you no longer need to manage log files yourself.
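A small helper makes that pattern reusable. The log() name is my own, not a standard; it mirrors each message to stderr for interactive runs and to the journal via logger:

```shell
#!/usr/bin/env bash

# log LEVEL MESSAGE...: timestamped copy to stderr, syslog copy via logger.
# The command -v guard is only there so the helper degrades gracefully
# on systems without logger installed.
log() {
    local level=$1; shift
    printf '%s %s: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$level" "$*" >&2
    if command -v logger >/dev/null; then
        logger -t mybackup -p "user.$level" -- "$*"
    fi
}

log info "starting run"
log err "disk is full"
```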
Second, make failure loud. The pattern I use is an ERR trap that reports the failing line and command before exiting:
on_err() {
    local exit_code=$?
    local line=$1
    logger -t mybackup -p user.err \
        "failed at line $line with exit $exit_code: $BASH_COMMAND"
    exit "$exit_code"
}
trap 'on_err $LINENO' ERR
Combined with set -e, this gives you a precise one-line failure message in the journal every time the script dies, which is a huge improvement over the usual “backup didn’t run, no idea why” postmortem. One caveat: ERR traps are not inherited by functions, command substitutions, or subshells unless you also enable set -E (errtrace), so add that to the prelude whenever you rely on this pattern.
The subshell and pipeline gotchas worth knowing
A few behaviors are worth memorizing because they burn everyone at least once.
Variables set inside a pipeline subshell do not survive to the parent. echo foo | read x; echo "$x" prints nothing, because the right side of the pipe runs in a subshell. Use process substitution or a here-string instead: read x < <(echo foo), or read x <<< foo. Alternatively, in Bash 4.2+, enable shopt -s lastpipe, which takes effect only when job control is off, i.e. in non-interactive scripts. Wooledge’s BashPitfalls page catalogues this and several dozen other specific traps — it is probably the single best document on the web for leveling up past intermediate shell.
Command substitution strips trailing newlines. file_contents=$(cat file.txt) will silently drop any trailing newline, which matters if you are round-tripping the contents back out. If you need to preserve them, append a sentinel character and strip it: contents=$(cat file.txt; echo x); contents=${contents%x}. Ugly, but it is the only reliable fix.
Arithmetic with $((...)) does not handle floating point. If you need decimals, pipe through bc -l or, again, stop and use Python. Bash is not the right place to compute a ratio.
Testing shell scripts without losing your mind
Shell scripts get tested far less than other code because the tooling is awkward, but it exists and it is worth the effort for anything business-critical. bats-core is the de facto test runner for Bash — it gives you per-test isolation, assertions, setup and teardown, and plays nicely with CI. A test file looks like this:
#!/usr/bin/env bats

setup() {
    tmpdir=$(mktemp -d)
}

teardown() {
    rm -rf -- "$tmpdir"
}

@test "script creates output file" {
    run ./myscript.sh -o "$tmpdir/out.txt"
    [ "$status" -eq 0 ]
    [ -f "$tmpdir/out.txt" ]
}
Run it with bats tests/. The main advantage over ad-hoc “just run it and see” testing is that you capture regressions — the next time someone changes the argument parsing and breaks the -o flag, CI tells you before you ship. For scripts that shell out to external services, combine bats with a fake PATH prefix containing stub executables, so your tests never hit the real API.
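The stub technique itself is plain shell, independent of bats. Here it is sketched with a fake curl; the URL and response are invented:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a directory of stub executables and put it first in PATH.
# Anything run afterwards sees the stub instead of the real curl.
stubs=$(mktemp -d)
trap 'rm -rf -- "$stubs"' EXIT INT TERM

cat > "$stubs/curl" <<'EOF'
#!/bin/sh
# Fake curl: return a canned response, never touch the network
echo '{"status": "ok"}'
EOF
chmod +x "$stubs/curl"

PATH="$stubs:$PATH"
curl https://api.example.com/health   # runs the stub, prints the canned JSON
```

In a bats suite this setup lives in setup(), so every test gets a fresh stub directory and teardown() removes it.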
Security details most scripts get wrong
A handful of small things make shell scripts noticeably safer without adding real complexity. Set restrictive permissions on anything that might write secrets: umask 077 at the top of the script means every file it creates is only readable by the owner. Never pass secrets on the command line — they show up in ps and in shell history — use environment variables or read from a file with mode 0600. Avoid eval entirely; if you think you need it, you don’t. And if you must run anything as root, drop privileges as early as possible with sudo -u unprivileged_user or a dedicated systemd User= directive rather than writing the whole script as root.
For anything that takes untrusted input — filenames from a directory listing, lines from a user-supplied file, values from an HTTP request — treat that input the way you would treat user input in a web app. Validate it against a narrow whitelist, quote it aggressively, and never interpolate it into a command that will be passed to another shell. The classic mistake is ssh host "do-something $user_input", where $user_input gets re-parsed by the remote shell and becomes a command injection. Use printf '%q' to escape, or better, pass data via stdin instead of the argv.
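The printf '%q' escape looks like this. The do-something command follows the example in the text; the hostile input is invented for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

user_input='foo; rm -rf /'   # hostile input, for illustration only

# %q re-quotes the value so it survives exactly one extra round of
# shell parsing; the remote shell then sees a single inert word.
escaped=$(printf '%q' "$user_input")
echo "remote command would be: do-something $escaped"
```

Without the escaping, the semicolon would end the do-something command on the remote side and the rest would run as a second command.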
The takeaway
Good shell scripting in 2026 is a small number of defensive habits applied consistently: strict mode at the top of every file, ShellCheck in your commit hook, quoted expansions everywhere, traps for cleanup, explicit error handling on the commands that matter, and a hard rule that anything past a few hundred lines gets rewritten in a real language. The payoff is scripts that survive unusual filenames, partial failures, and the 3am cron run on a filesystem you didn’t expect. Adopt the prelude and the linter today; everything else is refinement on top of that foundation.