Monday, August 27, 2007

An associative array (also map, hash, dictionary, finite map, lookup table, and in query-processing an index or index file) is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key "bob" is 7, we say that our array maps "bob" to 7. Associative arrays are very closely related to the mathematical concept of a function with a finite domain. As a consequence, a common and important use of associative arrays is in memoization.
From the perspective of a programmer using an associative array, it can be viewed as a generalization of an array: While a regular array maps integers to arbitrarily typed objects (integers, strings, pointers, and, in an OO sense, objects), an associative array maps arbitrarily typed objects to arbitrarily typed objects. (Implementations of the two data structures, though, may be considerably different.)
The operations that are usually defined for an associative array are:

Add: Bind a new key to a new value
Reassign: Bind an old key to a new value
Remove: Unbind a key from a value and remove the key from the key set
Lookup: Find the value (if any) that is bound to a key
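As a rough illustration, the four operations above map onto Python's built-in `dict` as follows. The key "bob" comes from the earlier example; the variable names are assumptions made for this sketch.

```python
ages = {}                      # an empty associative array

ages["bob"] = 7                # Add: bind a new key to a new value
ages["bob"] = 8                # Reassign: bind an old key to a new value

value = ages.get("bob")        # Lookup: the value bound to "bob"
missing = ages.get("alice")    # Lookup of an unbound key yields None

del ages["bob"]                # Remove: unbind the key and drop it from the key set
```

Note that Python uses the same assignment syntax for Add and Reassign; the distinction is only whether the key was already bound.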

Data structures for associative arrays
Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).

Efficient representations
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:

Hash tables have faster average lookup and insertion time (O(1)), while some kinds of binary search tree have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in realtime systems, but trees can be useful in high-security realtime systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
Hash tables can have more compact storage for small value types, especially when the values are bits.
There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
Balanced binary trees and skip lists preserve ordering, allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.

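The order-based operations mentioned in the list (in-order iteration, nearest-key search, counting keys in a range) can be sketched in Python. This is not a balanced tree: the hypothetical `SortedMap` class below keeps keys in a sorted list and uses the standard `bisect` module, so lookups are O(log n) but insertions are O(n). It also assumes numeric keys so that "nearest" is well defined.

```python
import bisect


class SortedMap:
    """Toy ordered map backed by two parallel sorted lists (not a tree)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value           # reassign an existing key
        else:
            self._keys.insert(i, key)         # O(n) here; O(log n) in a balanced tree
            self._values.insert(i, value)

    def items_in_order(self):
        """All (key, value) pairs in ascending key order."""
        return list(zip(self._keys, self._values))

    def nearest(self, key):
        """Return the (key, value) pair whose key is closest to `key`."""
        if not self._keys:
            raise KeyError(key)
        i = bisect.bisect_left(self._keys, key)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._keys)]
        best = min(candidates, key=lambda j: abs(self._keys[j] - key))
        return self._keys[best], self._values[best]

    def count_in_range(self, low, high):
        """Number of keys k with low <= k <= high."""
        return bisect.bisect_right(self._keys, high) - bisect.bisect_left(self._keys, low)
```

A plain hash table would need to scan or sort all of its keys to answer the same questions.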
Association lists
A simple but generally inefficient type of associative array is an association list, often called an alist for short, which simply stores a linked list of key-value pairs. Each lookup does a linear search through the list looking for a key match.
Strong advantages of association lists include:

No knowledge is needed about the keys beyond how to compare them for equality; neither an ordering nor a hash function is required.
For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
Insertions are done in constant time by cons'ing the new association to the head of the list.

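A minimal sketch of an association list in Python, assuming cons cells are modelled as nested tuples and the empty list as `None`. The function names `acons` and `assoc` are borrowed from Lisp usage; everything else is illustrative.

```python
EMPTY = None  # the empty association list


def acons(key, value, alist):
    """Cons a new (key, value) pair onto the head of the list: O(1)."""
    return ((key, value), alist)


def assoc(key, alist):
    """Linear search for the first pair whose key equals `key`: O(n)."""
    while alist is not None:
        (k, v), alist = alist      # split the cell into its pair and the rest
        if k == key:
            return (k, v)
    return None
```

Because lookup stops at the first match, consing a pair with an existing key onto the head shadows the older binding without modifying the original list.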
Specialized representations
If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, it is particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix, with only a few exceptions, such as in routing tables.
String-keyed maps can avoid extra comparisons during lookups by using tries.

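Longest-prefix matching can be sketched with a simple binary trie in Python. This is an uncompressed trie rather than a Patricia tree or Judy array, keys are assumed to be strings of the characters "0" and "1", and the class name `PrefixTrie` and the stored values are hypothetical.

```python
class PrefixTrie:
    """Binary trie mapping bit-string prefixes to values."""

    def __init__(self):
        # Each node is a dict with optional children "0" and "1"
        # and an optional "value" entry for a prefix ending here.
        self._root = {}

    def insert(self, prefix, value):
        node = self._root
        for bit in prefix:
            node = node.setdefault(bit, {})
        node["value"] = value

    def longest_prefix_match(self, key):
        """Return the value of the longest stored prefix of `key`, or None."""
        node = self._root
        best = node.get("value")
        for bit in key:
            node = node.get(bit)
            if node is None:
                break                  # no stored prefix extends this far
            if "value" in node:
                best = node["value"]   # a longer matching prefix was found
        return best
```

Storing one value under a short prefix and a handful of exceptions under longer prefixes covers a large key range with very few entries, which is the routing-table situation described above.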

Multimap
Main article: Multimap (data structure)

Language support
Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often array-like subscripting.
Built-in syntactic support for associative arrays was introduced by Snobol4, under the name "table". MUMPS made multi-dimensional associative arrays, optionally persistent, its key data structure. SETL supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with awk and including Perl, Tcl, JavaScript, Python, and Ruby, support associative arrays as their primary array type.
In many more languages, they are available as library functions without special syntax.
Associative arrays have a variety of names. In Smalltalk, Objective-C and Python they are called dictionaries; in Perl and Ruby they are called hashes; in C++ and Java they are called maps; and in Common Lisp and Windows PowerShell they are called hashtables (since both typically use this implementation). In PHP all arrays can be associative, although keys are limited to integers and strings and each array has only a single level of subscripts.
In the scripting language Lua, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays. Likewise, in JavaScript, all objects are associative arrays. In MUMPS, the associative arrays are typically stored as B-trees.

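Awk has built-in, language-level support for associative arrays.
An element is created by assigning to a subscripted name, for example phonebook["Sally Smart"] = "555-9999".
You can loop through an associative array with for (name in phonebook), which sets name to each key in turn.
You can also check whether an element is in the associative array with the in operator, as in if ("John Doe" in phonebook), and delete an element with delete phonebook["John Doe"].
Multi-dimensional associative arrays can be implemented in standard Awk using concatenation with the built-in separator SUBSEP: the subscript [i, j] is stored under the single key i SUBSEP j.
Thompson AWK [1] provides built-in multi-dimensional associative arrays.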

There is no standard implementation of an associative array in C, but third-party libraries are available, including one under a BSD license. POSIX 1003.1-2001 describes the functions hcreate(), hdestroy() and hsearch(), declared in <search.h>, which manage a single process-wide hash table.
Another third-party library, uthash, also creates associative arrays from C structures. A structure represents a value, and one of the structure fields acts as the key.
Finally, the GLib library also supports associative arrays, along with many other advanced data types, and is the recommended implementation of the GNU Project.

In ColdFusion, a structure can serve as an associative array.

In Visual Basic there is no standard implementation common to all dialects. Visual Basic can use the Dictionary class from the Microsoft Scripting Runtime (which is shipped with Visual Basic 6).
Visual Basic .NET relies on the collection classes provided by the .NET Framework.

In C# 1.x, the non-generic .NET Hashtable class stores both keys and values as Object. Because in .NET all types (save pointers) ultimately derive from Object, anything can be put into a Hashtable, even data of different types. This could lead to errors if consuming code expects data to be of a single type. Casting is required to convert the Object values back to their original type. Additionally, converting value types (structures such as integers) to Object to put them into the Hashtable and converting them back requires boxing/unboxing, which incurs a slight performance penalty and pollutes the heap with garbage. This changes in C# 2.0 with generic hash tables called dictionaries. These strongly typed collections offer significant performance and reliability gains because they do not require boxing/unboxing or explicit type casts, and they introduce compile-time type checks.

C++ also has a form of associative array called std::map, one of the Standard Template Library containers. A phone book can be declared as std::map<std::string, std::string> phone_book; and filled by subscripting, as in phone_book["Sally Smart"] = "555-9999";.
You can iterate through the map with an iterator running from phone_book.begin() to phone_book.end(); each element is a pair whose first member is the key and whose second member is the value.
In C++, the std::map class is templated, which allows the data types of keys and values to be different for different map instances. For a given instance of the map class the keys must be of the same base type. The same must be true for all of the values. Although std::map is typically implemented using a self-balancing binary search tree, the SGI STL also provides a hash_map, which has the algorithmic benefits of a hash table.

Cocoa/GNUstep (Objective-C)
Cocoa and GNUstep handle associative arrays using the NSMutableDictionary (a mutable version of NSDictionary) class cluster. This class allows assignments between any two objects to be made. A copy of the key object is made before it is inserted into an NSMutableDictionary, so keys must conform to the NSCopying protocol. When inserted into a dictionary, the value object receives a retain message to increase its reference count. The value object receives a release message when it is removed from the dictionary (either explicitly or by adding a different object with the same key).
Assigned objects are accessed by sending the objectForKey: message to the dictionary.
All keys or values can be enumerated using an NSEnumerator, obtained from keyEnumerator or objectEnumerator.
More practically, structured data graphs can be easily built in Cocoa from nested NSDictionary (NSMutableDictionary) objects, and relevant fields can then be quickly accessed using key paths (valueForKeyPath:).

D offers direct support for associative arrays in the core language; they are implemented as trees. An associative array is declared by writing the key type inside the brackets, in the form ValueType[KeyType] name.
Keys and values can be any types, but all the keys in an associative array must be of the same type, and the same for values.
You can also loop through all keys and associated values with a foreach statement of the form foreach (key, value; phone_book).
A key can be removed with phone_book.remove(key).

Delphi does not offer direct support for associative arrays. However, you can simulate associative arrays using a TStrings object, which can hold name=value pairs and look them up by name through its Values property.

In Java associative arrays are implemented as "maps"; they are part of the Java Collections Framework. Since J2SE 5.0 and the introduction of generics into Java, collections can have a type specified; for example, an associative array mapping strings to strings might be declared as Map<String, String> phoneBook = new HashMap<String, String>(); and filled with calls such as phoneBook.put("Sally Smart", "555-9999").
The get method is used to look up the value for a key; for example, the value of the expression phoneBook.get("Sally Smart") is "555-9999".
This code uses a hash map to store the associative array, by calling the constructor of the HashMap class; however, since the code only uses methods common to the interface Map, one could also use a self-balancing binary tree by calling the constructor of the TreeMap class (which implements the subinterface SortedMap), without changing the definition of the phoneBook variable or the rest of the code, or use any of a number of other underlying data structures that implement the Map interface.
The hash function in Java is provided by the method Object.hashCode(). Since every class in Java inherits from Object, every object has a hash function. A class can override the default implementation of hashCode() to provide a custom hash function based on the properties of the object.
The Object class also contains the method equals(Object) that tests the object for equality with another object. Maps in Java rely on objects maintaining the following contract between their hashCode() and equals() methods: for two objects a and b, if a.equals(b) is true, then a.hashCode() == b.hashCode() must also hold.
In order to maintain this contract, a class that overrides equals() must also override hashCode(), and vice versa, so that hashCode() is based on the same properties (or a subset of the properties) as equals().
A further contract that the map has with the object is that the results of the hashCode() and equals() methods will not change once the object has been inserted into the map. For this reason, it is generally a good practice to base the hash function on immutable properties of the object.
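The same two rules can be illustrated in Python, whose dict relies on an analogous pairing of `__eq__` and `__hash__`. This is a sketch by analogy rather than Java code; the `Point` class and the labels are hypothetical.

```python
class Point:
    """Hypothetical key type whose equality and hash use the same fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Based on exactly the properties that __eq__ compares.
        return hash((self.x, self.y))


labels = {Point(1, 2): "start"}
found = labels.get(Point(1, 2))      # an equal key with an equal hash finds the entry

p = Point(3, 4)
labels[p] = "end"
p.x = 99                             # mutating a key after insertion breaks the contract
lost = labels.get(Point(3, 4))       # the stored key no longer equals Point(3, 4)
```

After the mutation the entry is still stored in the dictionary, but a lookup with Point(3, 4) no longer finds it, which is why hash functions are best based on immutable properties.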

JavaScript (and its standardized version: ECMAScript) is a prototype-based object-oriented language. In JavaScript an object is a mapping from property names to values -- that is, an associative array with one caveat: since property names are strings, only string keys are allowed. Other than that difference, objects also include one feature unrelated to associative arrays: a prototype link to the object they inherit from. Doing a lookup for a property will forward the lookup to the prototype if the object does not define the property itself.
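An object literal is written as { property1 : value1, property2 : value2, ... }. For example, var myObject = { "Sally Smart" : "555-9999", "John Doe" : "555-1212" };.
If the property name is a valid identifier, the quotes can be omitted, e.g. var myOtherObject = { foo : 42, bar : false };.
Lookup is written using property access notation, either square brackets, which always work, or dot notation, which only works for identifier keys: myObject["John Doe"] and myOtherObject.foo.
You can also loop through all properties and associated values with a loop of the form for (var property in myObject), reading each value as myObject[property].
A property can be removed with the delete operator, as in delete myObject["Sally Smart"];.
As mentioned before, properties are strings. However, since every native object and primitive can be implicitly converted to a string, you can write myObject[1], which is equivalent to myObject["1"], or use an object as a key, in which case its string conversion is used.
Any object, including built-in objects such as Array, can be dynamically extended with new properties, for example by assigning a new function to a property of Array.prototype.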
In modern JavaScript it's considered bad form to use the Array type as an associative array. Consensus is that the Object type is best for this purpose. The reasoning behind this is that if Array is extended via prototype and Object is kept pristine, 'for(in)' loops will work as expected on associative 'arrays'. This issue has been drawn into focus by the popularity of JavaScript frameworks that make heavy and sometimes indiscriminate use of prototype to extend JavaScript's inbuilt types.
See JavaScript Array And Object Prototype Awareness Day for more information on the issue.

KornShell 93 (and compliant shells: ksh93, zsh, ...)
Note: As of this writing (August 2007), the latest version of bash, 3.2, does not yet support associative arrays properly.

Lisp was originally conceived as a "LISt Processing" language, and one of its most important data types is the linked list, which can be treated as an association list ("alist").
The syntax (x . y) is used to indicate a consed pair. Keys and values need not be the same type within an alist. Lisp and Scheme provide operators such as assoc to manipulate alists in ways similar to associative arrays.
Because of their linear nature, alists are used for relatively small sets of data. Common Lisp also supports a hash table data type, and for Scheme they are implemented in SRFI 69. Hash tables have greater overhead than alists, but provide much faster access when there are many elements.
It is easy to construct composite abstract data types in Lisp, using structures and/or the object-oriented programming features, in conjunction with lists, arrays, and hash tables.

In Lua, table is a fundamental type that can be used either as an array (numerical index, fast) or as an associative array. The keys and values can be of any type, except nil. The following focuses on non-numerical indexes.
A table literal is written as { value, key = value, [index] = value, ["non id string"] = value }. For example, phone_book = { ["Sally Smart"] = "555-9999", ["John Doe"] = "555-1212" }.
If the key is a valid identifier (not a keyword), the quotes and brackets can be omitted, as in { name = "Sally" }. Keys are case sensitive.
Lookup is written using either square brackets, which always work, or dot notation, which only works for identifier keys: phone_book["John Doe"] or aTable.name.
You can also loop through all keys and associated values with iterators or for loops, for example for key, value in pairs(phone_book) do print(key, value) end.
An entry can be removed by setting it to nil, as in phone_book["John Doe"] = nil.
Likewise, you can overwrite values or add them by assigning to a key.

In MUMPS every array is an associative array. The built-in, language-level, direct support for associative arrays applies to private, process-specific arrays stored in memory, called "locals", as well as to the permanent, shared arrays stored on disk, called "globals", which are available concurrently to multiple jobs. The name of a global is preceded by the circumflex "^" to distinguish it from local variable names.
Accessing the value of an element simply requires using the name with the subscript, for example phonebook("Sally Smart") for a local or ^phonebook("Sally Smart") for a global.
You can also loop through an associative array using the $ORDER function, which returns the next subscript in collating order.

The OCaml programming language provides three different associative containers. The simplest is a list of pairs, searched with List.assoc.
The second is a polymorphic hash table, provided by the Hashtbl module.
Finally, there are functional maps (represented as immutable balanced binary trees), provided by the Map module.
Lists of pairs and functional maps both provide a purely functional interface. In contrast, hash tables provide an imperative interface. For many operations, hash tables are significantly faster than lists of pairs and functional maps.

Perl has built-in, language-level support for associative arrays. Modern Perl vernacular refers to associative arrays as hashes; the term associative array is found in older documentation, but is considered somewhat archaic. Perl hashes are flat: keys are strings and values are scalars. However, values may be references to arrays or other hashes.
A hash variable is marked by a % sigil, to distinguish it from scalar, array and other data types. A hash can be initialized from a key-value list, as in %phone_book = ("Sally Smart", "555-9999", "John Doe", "555-1212");.
Perl offers the => syntax, semantically (almost) equivalent to the comma, to make the key-value association more visible, as in %phone_book = ("Sally Smart" => "555-9999", "John Doe" => "555-1212");.
Accessing a hash element uses the syntax $hash_name{$key}: the key is surrounded by curly braces and the hash name is prefixed by a $, indicating that the hash element itself is a scalar value, even though it is part of a hash. The value of $phone_book{"John Doe"} is "555-1212". The % sigil is only used when referring to the hash as a whole, such as when asking for keys %phone_book.
The list of keys and values can be extracted using the built-in functions keys and values, respectively. So, for example, all the keys of a hash can be printed with foreach $name (keys %phone_book) { print $name, "\n"; }.
One can iterate through (key, value) pairs using the each function, as in while (($name, $number) = each %phone_book) { print "$name: $number\n"; }.

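PHP's built-in array type is in reality an associative array. Even when using numerical indexes, PHP internally stores it as an associative array. This is why PHP arrays can have non-consecutive numerical indexes.
An associative array can be formed in one of two ways: with the array() construct, as in $phonebook = array("Sally Smart" => "555-9999"), or by assigning to individual keys, as in $phonebook["John Doe"] = "555-1212".
You can also loop through an associative array with foreach ($phonebook as $name => $number).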
PHP has an extensive set of functions to operate on arrays.

Pike has built-in support for associative arrays, which are referred to as mappings. A mapping literal is written between ([ and ]), as in mapping(string:string) phonebook = ([ "Sally Smart" : "555-9999", "John Doe" : "555-1212" ]);.
Accessing and testing for presence in mappings is done using the indexing operator. So phonebook["Sally Smart"] would return the string "555-9999", and phonebook["John Smith"] would return 0.
Iterating through a mapping can be done using either foreach, as in foreach (phonebook; string key; string value), or an iterator object.
Elements of a mapping can be removed using m_delete, which returns the value of the removed index, as in m_delete(phonebook, "Sally Smart").

In Python, associative arrays are called dictionaries. Dictionary literals are marked with curly braces, as in phonebook = {'Sally Smart': '555-9999', 'John Doe': '555-1212'}.
To access an entry in Python simply use the array indexing operator. For example, the expression phonebook['Sally Smart'] would return '555-9999'.
A for loop over a dictionary iterates through all of its keys.
Iterating over phonebook.items() yields (key, value) tuples.
Dictionaries can also be constructed with the dict builtin, which takes a sequence of key-value pairs and is most commonly used together with list comprehensions and generator expressions.
Dictionary keys can be individually deleted using the del statement. Alternatively, the pop method of dict types removes the key-value pair and returns the corresponding value.
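The dictionary operations described in this section can be put together in one short sketch. The phone numbers for Sally Smart and John Doe are the ones used elsewhere in this article; the variable names are chosen for illustration.

```python
# Dictionary literal with curly braces.
phonebook = {'Sally Smart': '555-9999', 'John Doe': '555-1212'}

# Access an entry with the indexing operator.
number = phonebook['Sally Smart']

# Iterating over a dictionary visits its keys.
names = [name for name in phonebook]

# items() yields (key, value) tuples.
entries = [(name, num) for name, num in phonebook.items()]

# dict() accepts a list of key-value pairs or a generator expression.
from_pairs = dict([('Sally Smart', '555-9999'), ('John Doe', '555-1212')])
name_lengths = dict((name, len(name)) for name in phonebook)

# del removes a key; pop removes a key and returns its value.
del phonebook['John Doe']
removed = phonebook.pop('Sally Smart')
```

Indexing a missing key raises KeyError; the get method can be used instead when a default value is preferred.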

In REXX, associative arrays are called Stem variables or Compound variables.
Stem variables with numeric keys typically start at 1 and go up from there. The 0 key stem variable is used (by convention) as the count of items in the whole stem.
REXX has no easy way of automatically accessing the keys for a stem variable and typically the keys are stored in a separate associative array with numeric keys.

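In Ruby a Hash is written with curly braces and => pairs, as in phonebook = { 'Sally Smart' => '555-9999', 'John Doe' => '555-1212' }.
phonebook['John Doe'] produces '555-1212'.
To iterate over the hash, use something like phonebook.each { |key, value| puts key + " => " + value }.
Additionally, each key may be shown individually with phonebook.each_key { |key| puts key }.
Each value may, of course, also be shown, with phonebook.each_value { |value| puts value }.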

In Smalltalk a Dictionary is used. Entries are stored by sending the message #at:put: to the dictionary object.
To access an entry the message #at: is sent to the dictionary object.

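Unlike many other command line interpreters, PowerShell has built-in, language-level support for defining associative arrays.
For example, $phonebook = @{ 'Sally Smart' = '555-9999'; 'John Doe' = '555-1212' }.
Like in JavaScript, if the property name is a valid identifier, the quotes can be omitted, e.g. $myOtherObject = @{ foo = 42; bar = $false }.
It is also possible to create an empty associative array with @{} and add single entries or even other associative arrays to it later on.
New entries can also be added by using the array index operator, the property operator or the Add() method of the underlying .NET object: $phonebook['key'] = value, $phonebook.key = value, or $phonebook.Add('key', value).
To dereference assigned objects the array index operator, the property operator or the parameterized property Item() of the .NET object can be used: $phonebook['John Doe'], $phonebook.'John Doe', or $phonebook.Item('John Doe').
You can loop through an associative array with foreach ($entry in $phonebook.GetEnumerator()), reading $entry.Key and $entry.Value.
An entry can be removed using the Remove() method of the underlying .NET object, as in $phonebook.Remove('Sally Smart').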
