Notes on Bash arrays, strings, and whatnot

I was told that there is a special place in hell for people who use arrays and do arithmetic in shell.

Unfortunately for me, laziness is a powerful thing, and instead of learning how to use Python or some other sane scripting language, I decided to do exactly that in bash since hey, I already sort of know how to write shell scripts. Or so I thought.

The following are some notes on bash scripting based on the lessons learned, with the caveat that they’ve been tested most heavily on OS X (so I may have compromised portability) and that they assume the reader has seen a shell script before.

The Motivation

These notes are the result of my recent attempts to write scripts for auto-generating reStructuredText (reST) files for a technical documentation project. Specifically, I wanted to see if I could generate a reST table, since it’s far more self-documenting than its HTML counterpart:

+---+---+---+---+
| A | B | C | D |
+===+===+===+===+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
|   | 5   6 | 7 |
+---+-------+---+

The above creates something that looks like this (prettiness limited by how it’s rendered):

A B C D
1 2 3 4
5 6 7

Long story short, it seemed like a good idea at the time.

Arrays

Bash arrays are 0-indexed and one-dimensional (i.e., you can’t have arrays with arrays as elements).

Instantiation/access

There are several ways to declare them. Below, we instantiate array1, array2, and array3 using the various methods:

$ # use the `declare` builtin explicitly
$ declare -a array1=(a b c)
$
$ # assign contents directly with compound assignment
$ array2=(a b c)
$
$ # assign a value to each element explicitly
$ array3[0]=a
$ array3[1]=b
$ array3[2]=c

array1, array2, and array3 all contain three elements, ‘a’, ‘b’, and ‘c’. We can check by printing their contents:

$ printf '%s ' "${array1[@]}"; printf '\n'
a b c
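The one-liner above works because printf reuses its format string until all of its arguments are consumed, and the quoted "${array1[@]}" hands it one argument per element. A small sketch to make that visible (the array name `demo` and the bracket format are made up for this example):

```bash
# 'demo' is a made-up array for illustration
demo=(a "b c" d)

# "${demo[@]}": one argument per element, so the format is applied three times
joined=$(printf '[%s]' "${demo[@]}")
printf '%s\n' "$joined"       # [a][b c][d]

# "${demo[*]}": everything joined into a single argument
# (using the first character of IFS, a space by default)
starred=$(printf '[%s]' "${demo[*]}")
printf '%s\n' "$starred"      # [a b c d]
```

The [@] form is the one you almost always want when the elements should stay separate.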

"${array1[@]}" means “all non-null elements in the array”.

Of course, we can also refer to an index to print what’s there:

$ printf '%s\n' "${array2[0]}"
a

Or pick out a sub-array out of an array. To get just the first two elements out of array3, for example:

$ printf '%s ' "${array3[@]:0:2}"; printf '\n'
a b

The 0:2 can be read as “two elements, starting from index 0”. This idiom can also be used on strings (see “Iteration over strings” further down).
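To make the offset:length reading concrete, here is a small sketch with a made-up five-element array (`letters`); the other variable names are likewise just for illustration:

```bash
letters=(a b c d e)

first_two=("${letters[@]:0:2}")   # two elements, starting at index 0
middle=("${letters[@]:1:3}")      # three elements, starting at index 1
rest=("${letters[@]:3}")          # no length given: everything from index 3 on

printf '%s ' "${first_two[@]}"; printf '\n'   # a b
printf '%s ' "${middle[@]}"; printf '\n'      # b c d
printf '%s ' "${rest[@]}"; printf '\n'        # d e
```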

Iteration

Starting off with an example:

for i in "${array3[@]}"; do
    printf '%s ' "$i"
done
printf '\n'

This prints “a b c” on its own line. We can also use ${!array[@]}, which expands to the indices of the non-null elements:

for i in "${!array3[@]}"; do
    printf '%s ' "${array3[$i]}"
done
printf '\n'

This can be thought of as iterating over an array containing the numbers representing the indices of the contents of array3.
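One thing worth keeping in mind when iterating: without double quotes around the expansion, elements containing whitespace get split into several words. A quick sketch with a hypothetical array `cells` shows the difference:

```bash
# 'cells' is a made-up array whose elements contain spaces
cells=("row one" "row two")

unquoted=0
for c in ${cells[@]}; do        # unquoted: each element is split on whitespace
    unquoted=$(( unquoted + 1 ))
done

quoted=0
for c in "${cells[@]}"; do      # quoted: exactly one iteration per element
    quoted=$(( quoted + 1 ))
done

printf 'unquoted: %d, quoted: %d\n' "$unquoted" "$quoted"
# unquoted: 4, quoted: 2
```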

Arrays and arithmetic

When arrays or lists are involved, sooner or later you want the length. ${#array[@]} returns the number of non-null elements in a given array:

$ printf '%s\n' "${#array2[@]}"
3

We can combine this with bash arithmetic, using $(( )), to get, say, the last element of a non-sparse array:

$ size=${#array2[@]}                     # get the number of elements
$ last=$(( ${size} - 1 ))                # calculate the last index (size-1)
$ printf '%s\n' "${array2[${last}]}"     # access the index and print
c

(I tried to break it down to make it more legible, but this should be an indication that, even before we talk about performance, bash is not the right choice for these types of things.)
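For what it’s worth, the subscript of an indexed array is itself evaluated as an arithmetic expression, so the three steps can be collapsed into one. A sketch, redefining array2 so that it stands on its own (`last_elem` is just a name picked for this example):

```bash
array2=(a b c)

# The subscript is evaluated arithmetically, so no separate $(( )) is needed
last_elem=${array2[${#array2[@]}-1]}
printf '%s\n' "$last_elem"    # c
```

Whether this is more readable than the broken-down version is debatable, and it still assumes a non-sparse array.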

The ‘non-null’ bit…

Note how I mention “non-null” repeatedly. This is because arrays in bash can be sparse (have unassigned indices), and it may not be immediately obvious:

$ sparse[0]=a
$ sparse[2]=b
$ sparse[4]=c
$ printf '%s ' "${sparse[@]}"; printf "\n"
a b c
$ printf '%s\n' "${#sparse[@]}"
3
$ printf '%s ' "${!sparse[@]}"; printf "\n"
0 2 4

This means that ${#array[@]} isn’t a reliable indicator of where an array ends in bash, so the earlier example for getting the last element doesn’t work for sparse arrays. I guess one can take the last element of the array of indices, then use that to reference the actual last element.
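The index-of-indices idea can be sketched roughly as follows; `indices` and `last_index` are names made up for this example:

```bash
sparse=()
sparse[0]=a
sparse[2]=b
sparse[4]=c

# Collect the assigned indices into their own (dense) array: (0 2 4)
indices=("${!sparse[@]}")

# The last entry of that array is the highest assigned index
last_index=${indices[${#indices[@]}-1]}
printf '%s\n' "$last_index"              # 4
printf '%s\n' "${sparse[$last_index]}"   # c
```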

Comparing it to another language (Ruby):

irb(main)> sparse=[]
=> []
irb(main)> sparse[0]='a'
=> "a"
irb(main)> sparse[2]='b'
=> "b"
irb(main)> sparse[4]='c'
=> "c"
irb(main)> sparse
=> ["a", nil, "b", nil, "c"]
irb(main)> sparse.size
=> 5

Strings

Some of the idioms used with arrays also apply to strings, like finding the length:

$ str=test-string
$ printf '%s\n' "${#str}"
11

Or getting a substring:

$ printf '%s\n' "${str:0:4}"
test

Iteration over strings

The above allows us to do things like iterate over each character in a string:

str=test
i=0
while [ "$i" -lt "${#str}" ]; do
    printf '%d: %s\n' "$i" "${str:$i:1}"
    i=$(( i + 1 ))
done

Saving and running the above with bash <filename> prints:

0: t
1: e
2: s
3: t
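The same walk over a string can also be written with bash’s C-style for loop, which keeps the counter bookkeeping in one place. A minimal sketch, wrapped in a helper whose name (`each_char`) is made up for this example:

```bash
# each_char is a hypothetical helper: prints "index: character" per line
each_char() {
    local str=$1 i
    for (( i = 0; i < ${#str}; i++ )); do
        printf '%d: %s\n' "$i" "${str:$i:1}"
    done
}

each_char test
```

This is a bash-specific construct, so it trades away whatever portability the while loop had.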

String comparisons

The characters in a string have numeric values. The classic interpretation is ASCII, so the following comparisons between ‘a’ and ‘b’ compare the ASCII values of the characters, decimal 97 and 98:

$ [ 'a' \< 'b' ] && printf '%s\n' "true" || printf '%s\n' "false"
true
$ [ 'a' \> 'b' ] && printf '%s\n' "true" || printf '%s\n' "false"
false

The above uses bash’s short-circuit && and || operators, which stop evaluating as soon as the result is known.

Note that the operators need to be escaped (i.e. ‘\>’ not ‘>’), since an unescaped ‘<’ or ‘>’ would be treated by the shell as a redirection, and that the comparison will return an error if you try to use -lt, -gt, etc., since those expect integers.
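If portability to plain sh isn’t a concern, bash’s [[ ]] keyword sidesteps the escaping, since < and > aren’t treated as redirections inside it. A minimal sketch; note that [[ ]] compares using the current locale’s collation rather than raw ASCII values, so LC_ALL=C is set here to keep the ordering byte-based (the variable names are made up for this example):

```bash
# Force byte-wise (ASCII) ordering for the comparisons below
LC_ALL=C

# No backslash needed in front of '<' inside [[ ]]
if [[ 'a' < 'b' ]]; then
    letters_ordered=true
else
    letters_ordered=false
fi
printf '%s\n' "$letters_ordered"    # true

# '+' is ASCII 43 and '-' is ASCII 45, so the first string sorts lower
if [[ '+--+' < '----' ]]; then
    border_first=true
else
    border_first=false
fi
printf '%s\n' "$border_first"       # true
```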

The comparison doesn’t have to stop at single (or alphanumeric) characters:

$ [ '+--+' \< '----' ] && printf '%s\n' "true" || printf '%s\n' "false"
true

The above is true because ‘+’ is ASCII 43, and ‘-’ is 45. The ‘-’ actually brings me to:

printf(1) and special characters

I tend to use printf instead of echo because of its consistency across different platforms. If you’re printing something simple, you can pass the string directly as the format, without any format specifiers:

$ printf "Hello\n"
Hello

However, printf interprets a format string starting with ‘-’ as an (invalid) option:

$ printf "--Hello\n"
-bash: printf: --: invalid option
printf: usage: printf [-v var] format [arguments]

Hence it’s good to stick with the printf 'format' 'strings'... form:

$ printf '%s\n' "--Hello"
--Hello

Another method would have been to use printf -- "--Hello\n", to tell it that whatever comes after the ‘--’ shouldn’t be interpreted as an option (just like with rm, for removing files whose names start with ‘-’).

Passing arrays to functions

The general function looks like this (some people omit the ‘function’ bit):

function foo() {
   # things go here....
}

The arguments to the function basically follow the same format as those of the bash script itself, i.e. $@ is the argument vector (all arguments), $1 is the first argument, $2 the second, and so on.

When an array is passed to a function, it gets expanded out into its contents, and essentially gets merged into the argument vector. This means that if you want to pass arrays to your functions, you need to do some book-keeping yourself.

A simple way to do this would be to pass each array’s length ahead of its elements, but another way that could work is by using string representations of arrays as arguments, and converting them into arrays in the function:

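function arr-ifs() {
    # $1 is left unquoted on purpose: word splitting on $IFS turns the
    # string "a b c" into three elements (quoting it would give a single
    # element; note that glob characters in $1 would also be expanded)
    arr1=( $1 )

    # print your newly-made array and a newline
    printf '%s ' "${arr1[@]}"
    printf '\n'
}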

Pasting that into the shell and invoking it as arr-ifs "a b c" "1 2 3" prints “a b c” (the first 3-element array); the second argument is simply ignored here.
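The other approach, passing each array’s length ahead of its elements, could look something like the sketch below. The function name `show_two` and the array names are made up for illustration, and it assumes the caller always supplies correct lengths:

```bash
# show_two is a hypothetical function; each array is passed as
# its length followed by its elements
function show_two() {
    local n1=$1; shift                      # length of the first array
    local -a first=("${@:1:$n1}"); shift "$n1"

    local n2=$1; shift                      # length of the second array
    local -a second=("${@:1:$n2}")

    printf 'first:  %s\n' "${first[*]}"
    printf 'second: %s\n' "${second[*]}"
}

letters=(a "b c" d)
numbers=(1 2)
show_two "${#letters[@]}" "${letters[@]}" "${#numbers[@]}" "${numbers[@]}"
# first:  a b c d
# second: 1 2
```

Unlike the string trick, this keeps elements containing spaces (like "b c") intact, at the cost of more bookkeeping at the call site.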

Bash “strict” mode

Although it does get in the way at first, true to its claims, bash strict mode is a sanity saver when it comes to improving the debuggability of your shell script.
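The exact settings aren’t spelled out above, so take the following as an assumption: “strict mode” usually refers to the commonly circulated preamble below, placed at the top of a script. A minimal sketch (the `line` and `words` variables are only there to show the effect of the narrower IFS):

```bash
#!/usr/bin/env bash
set -e            # exit when a command fails (with some exceptions, e.g. in conditions)
set -u            # treat references to unset variables as errors
set -o pipefail   # a pipeline fails if any command in it fails
IFS=$'\n\t'       # split words on newlines and tabs only, not on spaces

# With the narrower IFS, an unquoted expansion no longer splits on spaces
line="a b c"
words=( $line )
printf '%s\n' "${#words[@]}"    # 1
```

One consequence worth noting: anything that relies on unquoted strings being split on spaces will behave differently under this IFS setting.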