How to define hash tables in Bash?

B

Benjamin W.

Bash 4

Bash 4 natively supports this feature. Make sure your script's hashbang is #!/usr/bin/env bash or #!/bin/bash so you don't end up using sh. Make sure you're either executing your script directly, or execute script with bash script. (Not actually executing a Bash script with Bash does happen, and will be really confusing!)

You declare an associative array by doing:

declare -A animals

You can fill it up with elements using the normal array assignment operator. For example, if you want to have a map of animal[sound(key)] = animal(value):

animals=( ["moo"]="cow" ["woof"]="dog")

Or declare and instantiate in one line:

declare -A animals=( ["moo"]="cow" ["woof"]="dog")

Then use them just like normal arrays. Use

animals['key']='value' to set value

"${animals[@]}" to expand the values

"${!animals[@]}" (notice the !) to expand the keys

Don't forget to quote them:

echo "${animals[moo]}"
for sound in "${!animals[@]}"; do echo "$sound - ${animals[$sound]}"; done

Bash 3

Before bash 4, you don't have associative arrays. Do not use eval to emulate them. Avoid eval like the plague, because it is the plague of shell scripting. The most important reason is that eval treats your data as executable code (there are many other reasons too).

First and foremost: Consider upgrading to bash 4. This will make the whole process much easier for you.

If there's a reason you can't upgrade, declare is a far safer option. It does not evaluate data as bash code like eval does, and as such does not allow arbitrary code injection quite so easily.

Let's prepare the answer by introducing the concepts:

First, indirection.

$ animals_moo=cow; sound=moo; i="animals_$sound"; echo "${!i}"
cow

Secondly, declare:

$ sound=moo; animal=cow; declare "animals_$sound=$animal"; echo "$animals_moo"
cow

Bring them together:

# Set a value:
declare "array_$index=$value"

# Get a value:
arrayGet() { 
    local array=$1 index=$2
    local i="${array}_$index"
    printf '%s' "${!i}"
}

Let's use it:

$ sound=moo
$ animal=cow
$ declare "animals_$sound=$animal"
$ arrayGet animals "$sound"
cow

Note: declare cannot be put in a function. Any use of declare inside a bash function turns the variable it creates local to the scope of that function, meaning we can't access or modify global arrays with it. (In bash 4 you can use declare -g to declare global variables - but in bash 4, you can use associative arrays in the first place, avoiding this workaround.)

Summary:

Upgrade to bash 4 and use declare -A for associative arrays.

Use the declare option if you can't upgrade.

Consider using awk instead and avoid the issue altogether.

Can't upgrade: the only reason I write scripts in Bash is for "run anywhere" portability. So relying on a non-universal feature of Bash rules this approach out. Which is a shame, because otherwise it would have been an excellent solution for me!

It's a shame that OSX defaults to Bash 3 still as this represents the "default" for a lot of people. I thought the ShellShock scare might have been the push they needed but apparently not.

@ken it's a licensing issue. Bash on OSX is stuck at the latest non-GPLv3 licensed build.

@jww Apple will not upgrade GNU bash beyond 3 due to its ill will against the GPLv3. But that should not be a deterrent. brew install bash brew.sh

...or sudo port install bash, for those (wisely, IMHO) unwilling to make directories in the PATH for all users writable without explicit per-process privilege escalation.

B

Bubnoff

There's parameter substitution, though it may be un-PC as well ...like indirection.

#!/bin/bash

# Array pretending to be a Pythonic dictionary
ARRAY=( "cow:moo"
        "dinosaur:roar"
        "bird:chirp"
        "bash:rock" )

for animal in "${ARRAY[@]}" ; do
    KEY="${animal%%:*}"
    VALUE="${animal##*:}"
    printf "%s likes to %s.\n" "$KEY" "$VALUE"
done

printf "%s is an extinct animal which likes to %s\n" "${ARRAY[1]%%:*}" "${ARRAY[1]##*:}"

The BASH 4 way is better of course, but if you need a hack ...only a hack will do. You could search the array/hash with similar techniques.

I would change that to VALUE=${animal#*:} to protect the case where ARRAY[$x]="caesar:come:see:conquer"

It's also useful to put double quotes around the ${ARRAY[@]} in case there are spaces in the keys or values, as in for animal in "${ARRAY[@]}"; do

But isn't the efficiency quite poor? I'm thinking O(n*m) if you want to compare to another list of keys, instead of O(n) with proper hashmaps (constant time lookup, O(1) for a single key).

The idea is less about efficiency, more about understand/read-ability for those with a background in perl, python or even bash 4. Allows you to write in a similar fashion.

@CoDEmanX: this is a hack, a clever and elegant but still rudimentary workaround to help the poor souls still stuck in 2007 with Bash 3.x. You cannot expect "proper hashmaps" or efficiency considerations in such a simple code.

r

rubo77

This is what I was looking for here:

declare -A hashmap
hashmap["key"]="value"
hashmap["key2"]="value2"
echo "${hashmap["key"]}"
for key in ${!hashmap[@]}; do echo $key; done
for value in ${hashmap[@]}; do echo $value; done
echo hashmap has ${#hashmap[@]} elements

This did not work for me with bash 4.1.5:

animals=( ["moo"]="cow" )

Note, that the value may not contain spaces, otherwise you adde more elements at once

Upvote for the hashmap["key"]="value" syntax which I, too, found missing from the otherwise fantastic accepted answer.

@rubo77 key neither, it adds multiple keys. Any way to workaround this?

l

lovasoa

Just use the file system

The file system is a tree structure that can be used as a hash map. Your hash table will be a temporary directory, your keys will be filenames, and your values will be file contents. The advantage is that it can handle huge hashmaps, and doesn't require a specific shell.

Hashtable creation

hashtable=$(mktemp -d)

Add an element

echo $value > "$hashtable/$key"

Read an element

value=$(< "$hashtable/$key")

Performance

Of course, its slow, but not that slow. I tested it on my machine, with an SSD and btrfs, and it does around 3000 element read/write per second.

Which version of bash supports mkdir -d? (Not 4.3, on Ubuntu 14. I'd resort to mkdir /run/shm/foo, or if that filled up RAM, mkdir /tmp/foo.)

Perhaps mktemp -d was meant instead?

Curious what is the difference between $value=$(< $hashtable/$key) and value=$(< $hashtable/$key)? Thanks!

"tested it on my machine" This sounds like a great way to burn a hole through your SSD. Not all Linux distros use tmpfs by default.

This won't work with values that has "/" slashes in them

R

Roger Lipscombe

You can further modify the hput()/hget() interface so that you have named hashes as follows:

hput() {
    eval "$1""$2"='$3'
}

hget() {
    eval echo '${'"$1$2"'#hash}'
}

and then

hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid
echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

This lets you define other maps that don't conflict (e.g., 'rcapitals' which does country lookup by capital city). But, either way, I think you'll find that this is all pretty terrible, performance-wise.

If you really want fast hash lookup, there's a terrible, terrible hack that actually works really well. It is this: write your key/values out to a temporary file, one-per line, then use 'grep "^$key"' to get them out, using pipes with cut or awk or sed or whatever to retrieve the values.

Like I said, it sounds terrible, and it sounds like it ought to be slow and do all sorts of unnecessary IO, but in practice it is very fast (disk cache is awesome, ain't it?), even for very large hash tables. You have to enforce key uniqueness yourself, etc. Even if you only have a few hundred entries, the output file/grep combo is going to be quite a bit faster - in my experience several times faster. It also eats less memory.

Here's one way to do it:

hinit() {
    rm -f /tmp/hashmap.$1
}

hput() {
    echo "$2 $3" >> /tmp/hashmap.$1
}

hget() {
    grep "^$2 " /tmp/hashmap.$1 | awk '{ print $2 };'
}

hinit capitals
hput capitals France Paris
hput capitals Netherlands Amsterdam
hput capitals Spain Madrid

echo `hget capitals France` and `hget capitals Netherlands` and `hget capitals Spain`

Great! you can even iterate it: for i in $(compgen -A variable capitols); do hget "$i" "" done

A

AsymLabs

Consider a solution using the bash builtin read as illustrated within the code snippet from a ufw firewall script that follows. This approach has the advantage of using as many delimited field sets (not just 2) as are desired. We have used the | delimiter because port range specifiers may require a colon, ie 6001:6010.

#!/usr/bin/env bash

readonly connections=(       
                            '192.168.1.4/24|tcp|22'
                            '192.168.1.4/24|tcp|53'
                            '192.168.1.4/24|tcp|80'
                            '192.168.1.4/24|tcp|139'
                            '192.168.1.4/24|tcp|443'
                            '192.168.1.4/24|tcp|445'
                            '192.168.1.4/24|tcp|631'
                            '192.168.1.4/24|tcp|5901'
                            '192.168.1.4/24|tcp|6566'
)

function set_connections(){
    local range proto port
    for fields in ${connections[@]}
    do
            IFS=$'|' read -r range proto port <<< "$fields"
            ufw allow from "$range" proto "$proto" to any port "$port"
    done
}

set_connections

@CharlieMartin : read is a very powerful feature and is under-utilized by many bash programmers. It allows compact forms of lisp-like list processing. For example, in the above example we can strip off just the first element and retain the rest (ie a similar concept to first and rest in lisp) by doing: IFS=$'|' read -r first rest <<< "$fields"

D

DigitalRoss

hput () {
  eval hash"$1"='$2'
}

hget () {
  eval echo '${hash'"$1"'#hash}'
}
hput France Paris
hput Netherlands Amsterdam
hput Spain Madrid
echo `hget France` and `hget Netherlands` and `hget Spain`

$ sh hash.sh
Paris and Amsterdam and Madrid

Sigh, that seems unnecessarily insulting and it's inaccurate anyway. One would not put input validation, escaping, or encoding (see, I actually do know) in the guts of the hash table, but rather in a wrapper and as soon as possible after input.

@DigitalRoss can you explain what is the use of #hash in eval echo '${hash'"$1"'#hash}'. for me it seems me as a comment not more then that. does #hash have any special meaning here?

@Sanjay ${var#start} removes the text start from the beginning of the value stored in the variable var.

m

marco

I agree with @lhunath and others that the associative array are the way to go with Bash 4. If you are stuck to Bash 3 (OSX, old distros that you cannot update) you can use also expr, which should be everywhere, a string and regular expressions. I like it especially when the dictionary is not too big.

Choose 2 separators that you will not use in keys and values (e.g. ',' and ':' ) Write your map as a string (note the separator ',' also at beginning and end) animals=",moo:cow,woof:dog," Use a regex to extract the values get_animal { echo "$(expr "$animals" : ".*,$1:$[^,]*$,.*")" } Split the string to list the items get_animal_items { arr=$(echo "${animals:1:${#animals}-2}" | tr "," "\n") for i in $arr do value="${i##*:}" key="${i%%:*}" echo "${value} likes to $key" done }

Now you can use it:

$ animal = get_animal "moo"
cow
$ get_animal_items
cow likes to moo
dog likes to woof

C

Cole Stanfield

I really liked Al P's answer but wanted uniqueness enforced cheaply so I took it one step further - use a directory. There are some obvious limitations (directory file limits, invalid file names) but it should work for most cases.

hinit() {
    rm -rf /tmp/hashmap.$1
    mkdir -p /tmp/hashmap.$1
}

hput() {
    printf "$3" > /tmp/hashmap.$1/$2
}

hget() {
    cat /tmp/hashmap.$1/$2
}

hkeys() {
    ls -1 /tmp/hashmap.$1
}

hdestroy() {
    rm -rf /tmp/hashmap.$1
}

hinit ids

for (( i = 0; i < 10000; i++ )); do
    hput ids "key$i" "value$i"
done

for (( i = 0; i < 10000; i++ )); do
    printf '%s\n' $(hget ids "key$i") > /dev/null
done

hdestroy ids

It also performs a tad bit better in my tests.

$ time bash hash.sh 
real    0m46.500s
user    0m16.767s
sys     0m51.473s

$ time bash dirhash.sh 
real    0m35.875s
user    0m8.002s
sys     0m24.666s

Just thought I'd pitch in. Cheers!

Edit: Adding hdestroy()

A

Adam Katz

A coworker just mentioned this thread. I've independently implemented hash tables within bash, and it's not dependent on version 4. From a blog post of mine in March 2010 (before some of the answers here...) entitled Hash tables in bash:

I previously used cksum to hash but have since translated Java's string hashCode to native bash/zsh.

# Here's the hashing function
ht() {
  local h=0 i
  for (( i=0; i < ${#1}; i++ )); do
    let "h=( (h<<5) - h ) + $(printf %d \'${1:$i:1})"
    let "h |= h"
  done
  printf "$h"
}

# Example:

myhash[`ht foo bar`]="a value"
myhash[`ht baz baf`]="b value"

echo ${myhash[`ht baz baf`]} # "b value"
echo ${myhash[@]} # "a value b value" though perhaps reversed
echo ${#myhash[@]} # "2" - there are two values (note, zsh doesn't count right)

It's not bidirectional, and the built-in way is a lot better, but neither should really be used anyway. Bash is for quick one-offs, and such things should quite rarely involve complexity that might require hashes, except perhaps in your ~/.bashrc and friends.

The link in the answer is scary! If you click it, you are stuck in a redirection loop. Please update.

@MohammadRakibAmin – Yeah, my website is down and I doubt I'll be resurrecting my blog. I have updated the above link to an archived version. Thanks for your interest!

It doesn't look like this will handle hash collisions.

@neuralmer – True. This was designed to be an actual hash implementation of a hash structure. If you want to handle hash collisions, I recommend a real hash implementation rather than a hack like this. Adapting this to manage collisions would remove all of its elegance.

j

jrichard

Two things, you can use memory instead of /tmp in any kernel 2.6 by using /dev/shm (Redhat) other distros may vary. Also hget can be reimplemented using read as follows:

function hget {

  while read key idx
  do
    if [ $key = $2 ]
    then
      echo $idx
      return
    fi
  done < /dev/shm/hashmap.$1
}

In addition by assuming that all keys are unique, the return short circuits the read loop and prevents having to read through all entries. If your implementation can have duplicate keys, then simply leave out the return. This saves the expense of reading and forking both grep and awk. Using /dev/shm for both implementations yielded the following using time hget on a 3 entry hash searching for the last entry :

Grep/Awk:

hget() {
    grep "^$2 " /dev/shm/hashmap.$1 | awk '{ print $2 };'
}

$ time echo $(hget FD oracle)
3

real    0m0.011s
user    0m0.002s
sys     0m0.013s

Read/echo:

$ time echo $(hget FD oracle)
3

real    0m0.004s
user    0m0.000s
sys     0m0.004s

on multiple invocations I never saw less then a 50% improvement. This can all be attributed to fork over head, due to the use of /dev/shm.

k

kojiro

Prior to bash 4 there is no good way to use associative arrays in bash. Your best bet is to use an interpreted language that actually has support for such things, like awk. On the other hand, bash 4 does support them.

As for less good ways in bash 3, here is a reference than might help: http://mywiki.wooledge.org/BashFAQ/006

M

Milan Adamovsky

Bash 3 solution:

In reading some of the answers I put together a quick little function I would like to contribute back that might help others.

# Define a hash like this
MYHASH=("firstName:Milan"
        "lastName:Adamovsky")

# Function to get value by key
getHashKey()
 {
  declare -a hash=("${!1}")
  local key
  local lookup=$2

  for key in "${hash[@]}" ; do
   KEY=${key%%:*}
   VALUE=${key#*:}
   if [[ $KEY == $lookup ]]
   then
    echo $VALUE
   fi
  done
 }

# Function to get a list of all keys
getHashKeys()
 {
  declare -a hash=("${!1}")
  local KEY
  local VALUE
  local key
  local lookup=$2

  for key in "${hash[@]}" ; do
   KEY=${key%%:*}
   VALUE=${key#*:}
   keys+="${KEY} "
  done

  echo $keys
 }

# Here we want to get the value of 'lastName'
echo $(getHashKey MYHASH[@] "lastName")


# Here we want to get all keys
echo $(getHashKeys MYHASH[@])

I think this is a pretty neat snippet. It could use a little cleanup (not much, though). In my version, I've renamed 'key' to 'pair' and made KEY and VALUE lowercase (because I use uppercase when variables are exported). I also renamed getHashKey to getHashValue and made both key and value local (sometimes you would want them not to be local, though). In getHashKeys, I do not assign anything to value. I use semicolon for separation, since my values are URLs.

A

Alex

I also used the bash4 way but I find and annoying bug.

I needed to update dynamically the associative array content so i used this way:

for instanceId in $instanceList
do
   aws cloudwatch describe-alarms --output json --alarm-name-prefix $instanceId| jq '.["MetricAlarms"][].StateValue'| xargs | grep -E 'ALARM|INSUFFICIENT_DATA'
   [ $? -eq 0 ] && statusCheck+=([$instanceId]="checkKO") || statusCheck+=([$instanceId]="allCheckOk"
done

I find out that with bash 4.3.11 appending to an existing key in the dict resulted in appending the value if already present. So for example after some repetion the content of the value was "checkKOcheckKOallCheckOK" and this was not good.

No problem with bash 4.3.39 where appenging an existent key means to substisture the actuale value if already present.

I solved this just cleaning/declaring the statusCheck associative array before the cicle:

unset statusCheck; declare -A statusCheck

C

Community

I create HashMaps in bash 3 using dynamic variables. I explained how that works in my answer to: Associative arrays in Shell scripts

Also you can take a look in shell_map, which is a HashMap implementation made in bash 3.

How to define hash tables in Bash?

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Links

Contact US