An associative array (also map, hash, dictionary, finite map, lookup table, and in query-processing an index or index file) is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key
7, we say that our array maps
7. Associative arrays are very closely related to the mathematical concept of a function with a finite domain. As a consequence, a common and important use of associative arrays is in memoization.
From the perspective of a programmer using an associative array, it can be viewed as a generalization of an array: While a regular array maps integers to arbitrarily typed objects (integers, strings, pointers, and, in an OO sense, objects), an associative array maps arbitrarily typed objects to arbitrarily typed objects. (Implementations of the two data structures, though, may be considerably different.)
The operations that are usually defined for an associative array are:
Add: Bind a new key to a new value
Reassign: Bind an old key to a new value
Remove: Unbind a key from a value and remove the key from the key set
Lookup: Find the value (if any) that is bound to a key Examples
Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).
Data structures for associative arrays
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:
Hash tables have faster average lookup and insertion time (O(1)), while some kinds of binary search tree have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in realtime systems, but trees can be useful in high-security realtime systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
Hash tables can have more compact storage for small value types, especially when the values are bits.
There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range. Efficient representations
A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match.
Strong advantages of association lists include:
No knowledge is needed about the keys, such as an order or a hash function.
For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
Insertions are done in constant time by cons'ing the new association to the head of the list. Association lists
If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables.
String-keyed maps can avoid extra comparisons during lookups by using tries.