
Array (data structure)


From Wikipedia, the free encyclopedia

Type of data structure

This article is about the byte-layout-level structure. For the abstract data type, see Array (data type).


In computer science, an array is a data structure consisting of a collection of elements (values or variables), each of the same memory size and each identified by at least one array index or key; a collection of such indices may form a tuple, known as an index tuple. In general, an array is a mutable and linear collection of elements with the same data type. An array is stored such that the position (memory address) of each element can be computed from its index tuple by a mathematical formula.^[1]^[2]^[3] The simplest type of data structure is a linear array, also called a one-dimensional array.

For example, an array of ten 32-bit (4-byte) integer variables, with indices 0 through 9, may be stored as ten words at memory addresses 2000, 2004, 2008, ..., 2036 (in hexadecimal: 0x7D0, 0x7D4, 0x7D8, ..., 0x7F4), so that the element with index i has the address 2000 + (i × 4).^[4] The memory address of the first element of an array is called the first address, foundation address, or base address.
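The address arithmetic in this example can be sketched in C as follows. The function name element_address and its parameters are illustrative assumptions, not part of any standard library; addresses are modelled as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of the element with index i, given the
   base address and the size of one element in bytes. */
uintptr_t element_address(uintptr_t base, size_t i, size_t elem_size)
{
    assert(elem_size > 0);
    return base + (uintptr_t)(i * elem_size);
}
```

With a base address of 2000 and 4-byte elements, index 9 maps to address 2036 (0x7F4), matching the figures in the example.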

Because the mathematical concept of a matrix can be represented as a two-dimensional grid, two-dimensional arrays are also sometimes called "matrices". In some cases the term "vector" is used in computing to refer to an array, although tuples rather than vectors are the more mathematically correct equivalent. Tables are often implemented in the form of arrays, especially lookup tables; the word "table" is sometimes used as a synonym of array.

Arrays are among the oldest and most important data structures, and are used by almost every program. They are also used to implement many other data structures, such as lists and strings. They effectively exploit the addressing logic of computers. In most modern computers and many external storage devices, the memory is a one-dimensional array of words, whose indices are their addresses. Processors, especially vector processors, are often optimized for array operations.

Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation. The set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually,^[3]^[5] but not always,^[2] fixed while the array is in use.

The term "array" may also refer to an array data type, a kind of data type provided by most high-level programming languages that consists of a collection of values or variables that can be selected by one or more indices computed at run time. Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures.

The term is also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.

History


The first digital computers used machine-language programming to set up and access array structures for data tables, vector and matrix computations, and for many other purposes. John von Neumann wrote the first array-sorting program (merge sort) in 1945, during the building of the first stored-program computer.^[6] Array indexing was originally done by self-modifying code, and later using index registers and indirect addressing. Some mainframes designed in the 1960s, such as the Burroughs B5000 and its successors, used memory segmentation to perform index-bounds checking in hardware.^[7]

Assembly languages generally have no special support for arrays, other than what the machine itself provides. The earliest high-level programming languages, including FORTRAN (1957), Lisp (1958), COBOL (1960), and ALGOL 60 (1960), had support for multi-dimensional arrays, and so has C (1972). In C++ (1983), class templates exist for multi-dimensional arrays whose dimension is fixed at runtime^[3]^[5] as well as for runtime-flexible arrays.^[2]

Applications


Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.

Arrays are used to implement other data structures, such as lists, heaps, hash tables, deques, queues, stacks, strings, and VLists. Array-based implementations of other data structures are frequently simple and space-efficient (implicit data structures), requiring little space overhead, but may have poor space complexity, particularly when modified, compared to tree-based data structures (compare a sorted array to a search tree).

One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to (otherwise repetitive) multiple IF statements. In this context, they are known as control tables and are used in conjunction with a purpose-built interpreter whose control flow is altered according to values contained in the array. The array may contain subroutine pointers (or relative subroutine numbers that can be acted upon by SWITCH statements) that direct the path of the execution of the program.
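A control table of subroutine pointers might look like the following minimal C sketch. The opcodes, the handler functions and the run_program interpreter are all invented for illustration; a real control table would be shaped by the application it drives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, invented for this illustration. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_COUNT };

typedef int (*handler_fn)(int);

int op_inc(int x)    { return x + 1; }
int op_dec(int x)    { return x - 1; }
int op_double(int x) { return x * 2; }

/* The control table: the opcode is the array index and the element is
   the subroutine to call. */
const handler_fn control_table[OP_COUNT] = {
    [OP_INC]    = op_inc,
    [OP_DEC]    = op_dec,
    [OP_DOUBLE] = op_double,
};

/* A tiny interpreter: one indexed lookup replaces a chain of IF tests. */
int run_program(const int *opcodes, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
        assert(opcodes[i] >= 0 && opcodes[i] < OP_COUNT);
        value = control_table[opcodes[i]](value);
    }
    return value;
}
```

Adding a new operation then means adding one table entry rather than another branch in the interpreter.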

Element identifier and addressing formulas


When data objects are stored in an array, individual objects are selected by an index that is usually a non-negative scalar integer. Indexes are also called subscripts. An index maps the array value to a stored object.

There are three ways in which the elements of an array can be indexed:

0 (zero-based indexing): The first element of the array is indexed by a subscript of 0.^[8]
1 (one-based indexing): The first element of the array is indexed by a subscript of 1.
n (n-based indexing): The base index of an array can be freely chosen. Programming languages that allow n-based indexing usually also allow negative index values, and other scalar data types such as enumerations or characters may be used as an array index.

Zero-based indexing is the design choice of many influential programming languages, including C, Java and Lisp. This leads to a simpler implementation, where the subscript refers to an offset from the starting position of the array, so the first element has an offset of zero.

Arrays can have multiple dimensions, so it is not uncommon to access an array using multiple indices. For example, a two-dimensional array A with three rows and four columns might provide access to the element at the 2nd row and 4th column by the expression A[1][3] in the case of a zero-based indexing system. Thus two indices are used for a two-dimensional array, three for a three-dimensional array, and n for an n-dimensional array.

The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array.

In standard arrays, each index is restricted to a certain range of consecutive integers (or consecutive values of some enumerated type), and the address of an element is computed by a "linear" formula on the indices.

One-dimensional arrays


A one-dimensional array (or single-dimension array) is a type of linear array. Accessing its elements involves a single subscript, which can represent either a row or a column index.

As an example, consider the C declaration int a[10]; which declares a one-dimensional array named a of ten integers. Here, the array can store ten elements of type int. This array has indices starting from zero through nine. For example, the expressions a[0] and a[9] are the first and last elements respectively.

For a vector with linear addressing, the element with index i is located at the address B + c · i, where B is a fixed base address and c a fixed constant, sometimes called the address increment or stride.

If the valid element indices begin at 0, the constant B is simply the address of the first element of the array. For this reason, the C programming language specifies that array indices always begin at 0; many programmers will call that element "zeroth" rather than "first".

However, one can choose the index of the first element by an appropriate choice of the base address B. For example, if the array has five elements, indexed 1 through 5, and the base address B is replaced by B − 30c, then the indices of those same elements will be 31 to 35, since each element's address B + c · i is unchanged only if i grows by 30. If the numbering does not start at 0, the constant B may not be the address of any element.
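As a rough illustration of an array whose first index is not 0, the sketch below stores the lower bound next to the data. The type based_array and the function based_at are hypothetical names. The shift is applied to the index rather than to the pointer, because forming a pointer before the start of the storage is not valid C, but the address computed is the same as B + c · i with a shifted base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-dimensional array of doubles whose first valid
   index is lo rather than 0. */
typedef struct {
    double *data;   /* storage; data[0] holds the element with index lo */
    long lo, hi;    /* inclusive index bounds */
} based_array;

double *based_at(const based_array *a, long i)
{
    assert(i >= a->lo && i <= a->hi);
    /* Equivalent to B + c*i with the virtual base B = &data[0] - c*lo;
       the subtraction is done on the index to stay inside the storage. */
    return &a->data[i - a->lo];
}
```

For five elements with bounds 31 through 35, index 31 selects the first stored element and index 35 the last.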

Multidimensional arrays


For a two-dimensional array, the element with indices i, j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively.

More generally, in a k-dimensional array, the address of an element with indices i₁, i₂, ..., iₖ is

B + c₁ · i₁ + c₂ · i₂ + … + cₖ · iₖ.

For example: int a[2][3];

This declares an array a of integer type with 2 rows and 3 columns. It holds 6 elements, which are stored linearly, the first row followed by the second row: a₁₁, a₁₂, a₁₃, a₂₁, a₂₂, a₂₃.
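The storage order of such a 2 × 3 array can be checked directly in C by measuring how far each element lies from the start of the array. The helper offset_in_2x3 below is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of a[i][j] from the start of the whole 2x3 array. */
size_t offset_in_2x3(int (*a)[3], int i, int j)
{
    assert(i >= 0 && i < 2 && j >= 0 && j < 3);
    return (size_t)((const char *)&a[i][j] - (const char *)a);
}
```

The element a[1][0] lies three int-sized steps from the start, immediately after the three elements of the first row.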

The linear addressing formula requires only k multiplications and k additions, for any array that can fit in memory. Moreover, if any coefficient is a fixed power of 2, the multiplication can be replaced by bit shifting.

The coefficients c₁, ..., cₖ must be chosen so that every valid index tuple maps to the address of a distinct element.
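The general formula can be sketched as a short loop. In the C code below, linear_offset evaluates the sum of the products of increments and indices, and row_major_increments shows one possible choice of increments that maps every index tuple to a distinct element, namely a contiguous layout in which the last index varies fastest. Both function names are assumptions made for this illustration, and offsets are counted in elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (in elements) of the element with index tuple idx[0..k-1]:
   c[0]*idx[0] + ... + c[k-1]*idx[k-1]. Adding the result to the base
   address B gives the element's address. */
ptrdiff_t linear_offset(size_t k, const ptrdiff_t *c, const ptrdiff_t *idx)
{
    ptrdiff_t off = 0;
    for (size_t d = 0; d < k; d++)
        off += c[d] * idx[d];   /* one multiplication, one addition */
    return off;
}

/* One valid choice of increments: contiguous storage in which the last
   index varies fastest. */
void row_major_increments(size_t k, const size_t *extent, ptrdiff_t *c)
{
    assert(k > 0);
    ptrdiff_t step = 1;
    for (size_t d = k; d-- > 0; ) {
        c[d] = step;
        step *= (ptrdiff_t)extent[d];
    }
}
```

For an array with extents 2 × 3 × 4 this yields the increments 12, 4 and 1, so the index tuple (1, 2, 3) maps to offset 23, the last of the 24 elements.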

If the minimum legal value for every index is 0, then B is the address of the element whose indices are all zero. As in the one-dimensional case, the element indices may be changed by changing the base address B. Thus, if a two-dimensional array has rows and columns indexed from 1 to 10 and 1 to 20, respectively, then replacing B by B + c₁ − 3c₂ will cause them to be renumbered from 0 through 9 and 4 through 23, respectively. Taking advantage of this feature, some languages (like FORTRAN 77) specify that array indices begin at 1, as in mathematical tradition, while other languages (like Fortran 90, Pascal and Algol) let the user choose the minimum value for each index.

Dope vectors


Main article: Dope vector

The addressing formula is completely defined by the dimension k, the base address B, and the increments c₁, c₂, ..., cₖ. It is often useful to pack these parameters into a record called the array's descriptor, stride vector, or dope vector.^[2]^[3] The size of each element, and the minimum and maximum values allowed for each index, may also be included in the dope vector. The dope vector is a complete handle for the array, and is a convenient way to pass arrays as arguments to procedures. Many useful array slicing operations (such as selecting a sub-array, swapping indices, or reversing the direction of the indices) can be performed very efficiently by manipulating the dope vector.^[2]
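A two-dimensional dope vector and the three slicing operations mentioned above might be sketched in C as follows. The struct dope2 and all function names are hypothetical, and increments are counted in elements rather than bytes. None of the operations moves or copies an element; each only rewrites the descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-dimensional dope vector; field names are illustrative. */
typedef struct {
    double   *base;       /* address of the element with indices (0, 0) */
    size_t    extent[2];  /* number of valid indices per dimension */
    ptrdiff_t stride[2];  /* address increments c1, c2, in elements */
} dope2;

double *dope2_at(const dope2 *a, size_t i, size_t j)
{
    assert(i < a->extent[0] && j < a->extent[1]);
    return a->base + ((ptrdiff_t)i * a->stride[0] + (ptrdiff_t)j * a->stride[1]);
}

/* Swapping the two indices (a transposed view). */
dope2 dope2_transpose(dope2 a)
{
    dope2 t = { a.base, { a.extent[1], a.extent[0] },
                        { a.stride[1], a.stride[0] } };
    return t;
}

/* Reversing the direction of the second index. */
dope2 dope2_reverse_cols(dope2 a)
{
    assert(a.extent[1] > 0);
    a.base += (ptrdiff_t)(a.extent[1] - 1) * a.stride[1];
    a.stride[1] = -a.stride[1];
    return a;
}

/* Selecting the sub-array made of rows r0 .. r0 + nrows - 1. */
dope2 dope2_rows(dope2 a, size_t r0, size_t nrows)
{
    assert(r0 + nrows <= a.extent[0]);
    a.base += (ptrdiff_t)r0 * a.stride[0];
    a.extent[0] = nrows;
    return a;
}
```

Because every view shares the same storage, a write through one view is visible through all the others.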

Compact layouts


Main article: Row- and column-major order

Often the coefficients are chosen so that the elements occupy a contiguous area of memory. However, that is not necessary. Even if arrays are always created with contiguous elements, some array slicing operations may create non-contiguous sub-arrays from them.

Illustration of row- and column-major order

There are two systematic compact layouts for a two-dimensional array. For example, consider the matrix

A={\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}}.

In the row-major order layout (adopted by C for statically declared arrays), the elements in each row are stored in consecutive positions and all of the elements of a row have a lower address than any of the elements of a consecutive row:

1, 2, 3, 4, 5, 6, 7, 8, 9

In column-major order (traditionally used by Fortran), the elements in each column are consecutive in memory and all of the elements of a column have a lower address than any of the elements of a consecutive column:

1, 4, 7, 2, 5, 8, 3, 6, 9

For arrays with three or more indices, "row major order" puts in consecutive positions any two elements whose index tuples differ only by one in the last index. "Column major order" is analogous with respect to the first index.
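For the two-dimensional case, the two layouts differ only in which index is multiplied by which extent. The following C sketch uses zero-based indices; the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a rows x cols array, row-major layout:
   a whole row of cols elements is skipped for each step in i. */
size_t row_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}

/* Same element in column-major layout: a whole column of rows elements
   is skipped for each step in j. */
size_t col_major_offset(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return j * rows + i;
}
```

For the 3 × 3 matrix A above, the entry 6 in the second row and third column (zero-based indices 1, 2) lies at offset 5 in row-major order and at offset 7 in column-major order.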

In systems which use processor cache or virtual memory, scanning an array is much faster if successive elements are stored in consecutive positions in memory, rather than sparsely scattered. This is known as spatial locality, which is a type of locality of reference. Many algorithms that use multidimensional arrays will scan them in a predictable order. A programmer (or a sophisticated compiler) may use this information to choose between row- or column-major layout for each array. For example, when computing the product A·B of two matrices, it would be best to have A stored in row-major order, and B in column-major order.

Resizing


Main article: Dynamic array

Static arrays have a size that is fixed when they are created and consequently do not allow elements to be inserted or removed. However, by allocating a new array and copying the contents of the old array to it, it is possible to effectively implement a dynamic version of an array; see dynamic array. If this operation is done infrequently, insertions at the end of the array require only amortized constant time.
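A minimal sketch of this approach in C is given below. The type int_vec, its functions and the doubling growth policy are assumptions chosen for the example; realloc is used to obtain the larger block and may copy the old contents into it. Overflow of the capacity computation is not handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
typedef struct {
    int   *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
} int_vec;

/* Appends x; returns 0 on success, -1 if memory could not be allocated. */
int int_vec_push(int_vec *v, int x)
{
    assert(v != NULL);
    if (v->size == v->capacity) {
        /* Doubling is an assumed growth policy: it keeps reallocations
           infrequent enough for appends to take amortized constant time. */
        size_t new_cap = v->capacity ? v->capacity * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->size++] = x;
    return 0;
}

void int_vec_free(int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->size = v->capacity = 0;
}
```

Starting from an empty vector, one hundred appends trigger only six reallocations under this policy (capacities 4, 8, 16, 32, 64 and 128).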

Some array data structures do not reallocate storage, but do store a count of the number of elements of the array in use, called the count or size. This effectively makes the array a dynamic array with a fixed maximum size or capacity; Pascal strings are examples of this.

Non-linear formulas


More complicated (non-linear) formulas are occasionally used. For a compact two-dimensional triangular array, for instance, the addressing formula is a polynomial of degree 2.
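As one possible instance, a lower-triangular array that stores only the elements with j ≤ i (zero-based), row after row, can use the offset i(i + 1)/2 + j, which is quadratic in i. The function names in this C sketch are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), j <= i, in a packed lower-triangular array.
   Rows 0 .. i-1 hold 1 + 2 + ... + i = i*(i+1)/2 elements in total. */
size_t tri_offset(size_t i, size_t j)
{
    assert(j <= i);
    return i * (i + 1) / 2 + j;
}

/* Number of elements needed for n rows. */
size_t tri_size(size_t n)
{
    return n * (n + 1) / 2;
}
```

A triangular array with 4 rows therefore needs 10 elements instead of the 16 of a full 4 × 4 array.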

Efficiency


Both store and select take (deterministic worst case) constant time. Arrays take linear (O(n)) space in the number of elements n that they hold.

In an array with element size k and on a machine with a cache line size of B bytes, iterating through an array of n elements requires a minimum of ceiling(nk/B) cache misses, because its elements occupy contiguous memory locations. This is roughly a factor of B/k better than the number of cache misses needed to access n elements at random memory locations. As a consequence, sequential iteration over an array is noticeably faster in practice than iteration over many other data structures, a property called locality of reference (this does not mean, however, that using a perfect hash or trivial hash within the same (local) array will not be even faster, and achievable in constant time). Libraries provide low-level optimized facilities for copying ranges of memory (such as memcpy) which can be used to move contiguous blocks of array elements significantly faster than can be achieved through individual element access. The speedup of such optimized routines varies by array element size, architecture, and implementation.

Memory-wise, arrays are compact data structures with no per-element overhead. There may be a per-array overhead (e.g., to store index bounds) but this is language-dependent. It can also happen that elements stored in an array require less memory than the same elements stored in individual variables, because several array elements can be stored in a single word; such arrays are often called packed arrays. An extreme (but commonly used) case is the bit array, where every bit represents a single element. A single octet can thus hold up to 256 different combinations of up to 8 different conditions, in the most compact form.
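A bit array can be sketched in C by addressing individual bits inside an array of bytes. The struct bit_array and the three functions below are hypothetical names; the byte holding bit i is found by dividing by CHAR_BIT, and the position inside that byte by taking the remainder.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical bit array: nbits one-bit elements packed into bytes. */
typedef struct {
    unsigned char *bytes;
    size_t nbits;
} bit_array;

void bit_set(bit_array b, size_t i)
{
    assert(i < b.nbits);
    b.bytes[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

void bit_clear(bit_array b, size_t i)
{
    assert(i < b.nbits);
    b.bytes[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

int bit_test(bit_array b, size_t i)
{
    assert(i < b.nbits);
    return (b.bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}
```

On a machine with 8-bit bytes, sixteen such elements fit in two bytes, whereas sixteen separate int variables would typically occupy 64 bytes.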

Array accesses with statically predictable access patterns are a major source of data parallelism.

Comparison with other data structures


Comparison of list data structures
Structure	Peek (index)	Insert or delete at beginning	Insert or delete at end	Insert or delete in middle	Excess space, average
Linked list	Θ(n)	Θ(1)	Θ(1), known end element; Θ(n), unknown end element	Θ(n)	Θ(n)
Array	Θ(1)	N/A	N/A	N/A	0
Dynamic array	Θ(1)	Θ(n)	Θ(1) amortized	Θ(n)	Θ(n)^[9]
Balanced tree	Θ(log n)	Θ(log n)	Θ(log n)	Θ(log n)	Θ(n)
Random-access list	Θ(log n)^[10]	Θ(1)	N/A^[10]	N/A^[10]	Θ(n)
Hashed array tree	Θ(1)	Θ(n)	Θ(1) amortized	Θ(n)	Θ(√n)

Dynamic arrays or growable arrays are similar to arrays but add the ability to insert and delete elements; adding and deleting at the end is particularly efficient. However, they reserve linear (Θ(n)) additional storage, whereas arrays do not reserve additional storage.

Associative arrays provide a mechanism for array-like functionality without huge storage overheads when the index values are sparse. For example, an array that contains values only at indexes 1 and 2 billion may benefit from using such a structure. Specialized associative arrays with integer keys include Patricia tries, Judy arrays, and van Emde Boas trees.

Balanced trees require O(log n) time for indexed access, but also permit inserting or deleting elements in O(log n) time,^[11] whereas growable arrays require linear (Θ(n)) time to insert or delete elements at an arbitrary position.

Linked lists allow constant-time removal and insertion in the middle but take linear time for indexed access. Their memory use is typically worse than arrays, but is still linear.

An Iliffe vector is an alternative to a multidimensional array structure. It uses a one-dimensional array of references to arrays of one dimension less. For two dimensions, in particular, this alternative structure would be a vector of pointers to vectors, one for each row (pointers, in C or C++). Thus an element in row i and column j of an array A would be accessed by double indexing (A[i][j] in typical notation). This alternative structure allows jagged arrays, where each row may have a different size or, in general, where the valid range of each index depends on the values of all preceding indices. It also saves one multiplication (by the column address increment), replacing it by a bit shift (to index the vector of row pointers) and one extra memory access (fetching the row address), which may be worthwhile in some architectures.
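In C, an Iliffe vector for two dimensions is simply an array of row pointers. The sketch below adds a parallel array of row lengths so that access to a jagged array can be bounds-checked; the function jagged_at and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address of element (i, j) of a jagged array described by
   rows (the Iliffe vector, one pointer per row) and len (the number of
   elements in each row). */
int *jagged_at(int *const *rows, const size_t *len, size_t nrows,
               size_t i, size_t j)
{
    assert(i < nrows && j < len[i]);
    /* First memory access fetches the row pointer, the second indexes
       within that row; no multiplication by a row length is needed. */
    return &rows[i][j];
}
```

Each row can be allocated and sized independently, at the cost of the extra pointer fetch on every access.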

Dimension


The dimension of an array is the number of indices needed to select an element. Thus, if the array is seen as a function on a set of possible index combinations, it is the dimension of the space of which its domain is a discrete subset. Thus a one-dimensional array is a list of data, a two-dimensional array is a rectangle of data,^[12] a three-dimensional array a block of data, etc.

This should not be confused with the dimension of the set of all matrices with a given domain, that is, the number of elements in the array. For example, an array with 5 rows and 4 columns is two-dimensional, but such matrices form a 20-dimensional space. Similarly, a three-dimensional vector can be represented by a one-dimensional array of size three.

References


1. Black, Paul E. (13 November 2008). "array". Dictionary of Algorithms and Data Structures. National Institute of Standards and Technology. Retrieved 22 August 2010.
2. Bjoern Andres; Ullrich Koethe; Thorben Kroeger; Hamprecht (2010). "Runtime-Flexible Multi-dimensional Arrays and Views for C++98 and C++0x". arXiv:1008.2909 [cs.DS].
3. Garcia, Ronald; Lumsdaine, Andrew (2005). "MultiArray: a C++ library for generic programming with arrays". Software: Practice and Experience. 35 (2): 159–188. doi:10.1002/spe.630. ISSN 0038-0644. S2CID 10890293.
4. David R. Richardson (2002), The Book on Data Structures. iUniverse, 1112 pages. ISBN 0-595-24039-9, ISBN 978-0-595-24039-5.
5. Veldhuizen, Todd L. (December 1998). Arrays in Blitz++. Computing in Object-Oriented Parallel Environments. Lecture Notes in Computer Science. Vol. 1505. Berlin: Springer. pp. 223–230. doi:10.1007/3-540-49372-7_24. ISBN 978-3-540-65387-5.
6. Knuth, Donald (1998). Sorting and Searching. The Art of Computer Programming. Vol. 3. Reading, MA: Addison-Wesley Professional. p. 159.
7. Levy, Henry M. (1984), Capability-based Computer Systems, Digital Press, p. 22, ISBN 9780932376220.
8. "Array Code Examples - PHP Array Functions - PHP code". Computer Programming Web programming Tips. Archived from the original on 13 April 2011. Retrieved 8 April 2011. "In most computer languages array index (counting) starts from 0, not from 1. Index of the first element of the array is 0, index of the second element of the array is 1, and so on. In array of names below you can see indexes and values."
9. Brodnik, Andrej; Carlsson, Svante; Sedgewick, Robert; Munro, JI; Demaine, ED (1999), Resizable Arrays in Optimal Time and Space (Technical Report CS-99-09), Department of Computer Science, University of Waterloo.
10. Chris Okasaki (1995). "Purely Functional Random-Access Lists". Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture: 86–95. doi:10.1145/224164.224187.
11. "Counted B-Trees".
12. "Two-Dimensional Arrays \ Processing.org". processing.org. Retrieved 1 May 2020.
