Set (abstract data type)

From Wikipedia, the free encyclopedia

Abstract data type for storing unique values

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and deletion of elements.

A multiset is a special kind of set in which an element can appear multiple times.

Type theory

In type theory, sets are generally identified with their indicator function (characteristic function): accordingly, a set of values of type $A$ may be denoted by $2^{A}$ or ${\mathcal {P}}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.) The characteristic function $F$ of a set $S$ is defined as:

$$F(x)={\begin{cases}1,&{\text{if }}x\in S\\0,&{\text{if }}x\notin S\end{cases}}$$
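To make the identification concrete, here is a minimal Python sketch that represents a set by its characteristic function, a predicate returning 1 or 0 exactly as in the definition of $F$. The helper names `indicator`, `union_fn`, and `intersection_fn` are hypothetical and chosen only for illustration; internally the sketch still uses a `frozenset` for the lookup.

```python
from typing import Callable, Hashable, Iterable

# A set of values of type A viewed as a function A -> {0, 1}.
Indicator = Callable[[Hashable], int]


def indicator(elements: Iterable[Hashable]) -> Indicator:
    """Build the characteristic function F of the finite set S = elements."""
    members = frozenset(elements)
    return lambda x: 1 if x in members else 0


def union_fn(f: Indicator, g: Indicator) -> Indicator:
    # x is in the union when either characteristic function returns 1.
    return lambda x: max(f(x), g(x))


def intersection_fn(f: Indicator, g: Indicator) -> Indicator:
    # x is in the intersection only when both characteristic functions return 1.
    return lambda x: min(f(x), g(x))


F = indicator({1, 2, 3})
G = indicator({3, 4})
print(F(2), F(5))                  # 1 0
print(union_fn(F, G)(4))           # 1
print(intersection_fn(F, G)(1))    # 0
```

Taking the maximum and minimum of two 0/1 values is the pointwise form of union and intersection on indicator functions.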

In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min(S) operation that returns the element of smallest value.
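As an illustration of the heap-as-set view, the following Python sketch pairs a hash set (for uniqueness and membership) with a binary heap from the standard `heapq` module (for `min`). The class name `MinSet` is hypothetical. The sketch assumes elements are hashable and mutually comparable, and it supports only insertion, since deletion would need extra bookkeeping in the heap.

```python
import heapq
from typing import Generic, List, Set, TypeVar

T = TypeVar("T")


class MinSet(Generic[T]):
    """A set that also supports min(S), in the spirit of an abstract heap.

    A hash set tracks membership (so duplicates are ignored) and a binary
    heap keeps the smallest element at index 0.
    """

    def __init__(self) -> None:
        self._members: Set[T] = set()
        self._heap: List[T] = []

    def add(self, x: T) -> None:
        if x not in self._members:        # keep elements unique
            self._members.add(x)
            heapq.heappush(self._heap, x)

    def __contains__(self, x: object) -> bool:
        return x in self._members

    def min(self) -> T:
        # Raises IndexError on an empty set.
        return self._heap[0]


s: MinSet[int] = MinSet()
for v in (5, 3, 8, 3):
    s.add(v)
print(s.min())   # 3
```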

Operations

Core set-theoretical operations

One may define the operations of the algebra of sets:

union(S, T): returns the union of sets S and T.
intersection(S, T): returns the intersection of sets S and T.
difference(S, T): returns the difference of sets S and T.
subset(S, T): a predicate that tests whether the set S is a subset of set T.
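Python's built-in `set` type provides all four of these operations, both as methods and as operators. The sketch below maps each abstract operation onto them. Note that `<=` tests for a subset, while `<` tests for a proper subset.

```python
S = {1, 2, 3, 4}
T = {3, 4, 5}

union = S | T              # {1, 2, 3, 4, 5}; same as S.union(T)
intersection = S & T       # {3, 4}; same as S.intersection(T)
difference = S - T         # {1, 2}: elements of S that are not in T
is_subset = {3, 4} <= T    # True; same as {3, 4}.issubset(T)
not_subset = S <= T        # False, because 1 and 2 are not in T
```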

Static sets

Typical operations that may be provided by a static set structure S are:

is_element_of(x, S): checks whether the value x is in the set S.
is_empty(S): checks whether the set S is empty.
size(S) or cardinality(S): returns the number of elements in S.
iterate(S): returns a function that returns one more value of S at each call, in some arbitrary order.
enumerate(S): returns a list containing the elements of S in some arbitrary order.
build(x₁, x₂, …, xₙ): creates a set structure with values x₁, x₂, …, xₙ.
create_from(collection): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
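In Python, the immutable `frozenset` type is a natural fit for a static set. The following sketch shows how each static operation corresponds to it. Python's `iter` returns an iterator object rather than a function, but each `next()` call plays the role of the abstract `iterate`. Python's own built-in `enumerate` does something unrelated (it pairs items with indices), so `list(S)` stands in for the abstract `enumerate` here.

```python
from typing import Iterator

# build(x1, x2, x3): frozenset is Python's immutable (static) set type
S = frozenset({"red", "green", "blue"})

# create_from(collection): duplicates in the source collapse into one element
U = frozenset(["a", "b", "a"])           # frozenset({'a', 'b'})

member = "red" in S                      # is_element_of("red", S) -> True
empty = not S                            # is_empty(S) -> False (empty sets are falsy)
size = len(U)                            # size(U) / cardinality(U) -> 2

# iterate(S): each next() call yields one more element, in an arbitrary order
it: Iterator[str] = iter(S)
first = next(it)

# enumerate(S): a list of the elements, in an arbitrary order
as_list = list(S)
```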

Dynamic sets

Dynamic set structures typically add:

create(): creates a new, initially empty set structure.
create_with_capacity(n): creates a new set structure, initially empty but capable of holding up to n elements.
add(S, x): adds the element x to S, if it is not present already.
remove(S, x): removes the element x from S, if it is present.
capacity(S): returns the maximum number of values that S can hold.

Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
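Python's built-in `set` grows automatically and has no capacity query, so the following sketch enforces a limit by hand. `BoundedSet` and `CapacityError` are hypothetical names used only for illustration. `set.discard` is used for `remove` because, like the abstract operation, it does nothing when the element is absent, whereas `set.remove` raises `KeyError`.

```python
from typing import Generic, Hashable, Set, TypeVar

T = TypeVar("T", bound=Hashable)


class CapacityError(Exception):
    """Raised when adding a new element to a full BoundedSet (hypothetical)."""


class BoundedSet(Generic[T]):
    """Hypothetical dynamic set with a fixed capacity, wrapping Python's set."""

    def __init__(self, capacity: int) -> None:    # create_with_capacity(n)
        self._capacity = capacity
        self._items: Set[T] = set()

    def capacity(self) -> int:                     # capacity(S)
        return self._capacity

    def add(self, x: T) -> None:                   # add(S, x)
        if x in self._items:
            return                                 # already present: no-op
        if len(self._items) >= self._capacity:
            raise CapacityError("set is full")
        self._items.add(x)

    def remove(self, x: T) -> None:                # remove(S, x)
        self._items.discard(x)                     # no error if x is absent

    def __contains__(self, x: object) -> bool:
        return x in self._items

    def __len__(self) -> int:
        return len(self._items)


s: BoundedSet[int] = BoundedSet(2)   # create_with_capacity(2)
s.add(1)
s.add(1)        # duplicate, ignored
s.add(2)
s.remove(7)     # absent, ignored
print(len(s), s.capacity())   # 2 2
```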

Additional operations

There are many other operations that can (in principle) be defined in terms of the above, such as:

pop(S): returns an arbitrary element of S, deleting it from S.[1]
pick(S): returns an arbitrary element of S.[2][3][4] Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element.[5] pick can also be interpreted in terms of iterate.[a]
map(F, S): returns the set of distinct values resulting from applying function F to each element of S.
filter(P, S): returns the subset containing all elements of S that satisfy a given predicate P.
fold(A₀, F, S): returns the value $A_{|S|}$ after applying $A_{i+1} := F(A_i, e)$ for each element e of S, for some binary operation F. F must be associative and commutative for this to be well-defined.
clear(S): deletes all elements of S.
equal(S₁, S₂): checks whether the two given sets are equal (i.e. contain all and only the same elements).
hash(S): returns a hash value for the static set S such that if equal(S₁, S₂) then hash(S₁) = hash(S₂).
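Most of these operations have direct counterparts in Python. The following sketch uses comprehensions for map and filter, and `functools.reduce` for fold. Only the immutable `frozenset` is hashable, so `hash` is shown on frozensets.

```python
from functools import reduce

S = {1, 2, 3, 4}

# pick(S): an arbitrary element; S is left unchanged
picked = next(iter(S))

# pop(S): an arbitrary element, removed (set.pop raises KeyError if the set is empty)
T = set(S)                      # work on a copy so S stays intact
popped = T.pop()                # T now has 3 elements and no longer contains popped

# map(F, S): only distinct results are kept, so the result may be smaller than S
parities = {v % 2 for v in S}                   # {0, 1}

# filter(P, S)
evens = {v for v in S if v % 2 == 0}            # {2, 4}

# fold(A0, F, S): F = + is associative and commutative, so the order does not matter
total = reduce(lambda acc, e: acc + e, S, 0)    # 10

# clear(S)
T.clear()                                       # T == set()

# equal(S1, S2): same elements, regardless of how the sets were built
same = {1, 2, 3} == {3, 2, 1}                   # True

# hash(S): equal frozensets hash equally
same_hash = hash(frozenset({1, 2})) == hash(frozenset({2, 1}))   # True
```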

Other operations can be defined for sets with elements of a special type:

sum(S): returns the sum of all elements of S for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, S).
collapse(S): given a set of sets, returns the union.[6] For example, collapse({{1}, {2, 3}}) == {1, 2, 3}. May be considered a kind of sum.
flatten(S): given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like collapse but allowing atoms. This can be done a single time, or recursively to obtain a set of only atomic elements.[7] For example, flatten({1, {2, 3}}) == {1, 2, 3}.
nearest(S, x): returns the element of S that is closest in value to x (by some metric).
min(S), max(S): returns the minimum/maximum element of S.
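A short Python sketch of these operations follows. Python sets can contain other sets only as `frozenset` values, so the nested examples use them. The helper names `flatten` and `nearest` are hypothetical. `nearest` assumes numeric elements and uses absolute difference as the metric. `flatten` removes a single level of nesting.

```python
from typing import AbstractSet, Any, FrozenSet

S = {3, 8, 15}

# sum(S): over integers this is fold(0, add, S)
total = sum(S)                                   # 26

# collapse(S): union of a set of sets (inner sets are frozensets, which are hashable)
nested = {frozenset({1}), frozenset({2, 3})}
collapsed = frozenset().union(*nested)           # frozenset({1, 2, 3})


def flatten(s: AbstractSet[Any]) -> FrozenSet[Any]:
    """Remove one level of nesting; atomic (non-set) elements are kept as-is."""
    out = set()
    for e in s:
        if isinstance(e, frozenset):
            out |= e
        else:
            out.add(e)
    return frozenset(out)


flat = flatten({1, frozenset({2, 3})})           # frozenset({1, 2, 3})


def nearest(s: AbstractSet[int], x: int) -> int:
    # Metric |e - x|; on ties, min() returns whichever tied element it meets first.
    return min(s, key=lambda e: abs(e - x))


print(nearest(S, 10))        # 8
print(min(S), max(S))        # 3 15
```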

Implementations

Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the is_element_of, add, and remove operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are O(n) because they require scanning the entire list.[b] Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
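The simple list-based implementation can be sketched in Python as follows. `ListSet` is a hypothetical name. Membership, insertion (because of the duplicate check), and deletion each scan the list, so they are all O(n). Because order does not matter, deletion swaps the found element with the last one and pops, which avoids shifting the remaining items.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class ListSet(Generic[T]):
    """Set stored as an unordered Python list; simple but O(n) per operation."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def is_element_of(self, x: T) -> bool:
        return x in self._items              # linear scan: O(n)

    def add(self, x: T) -> None:
        # Appending is O(1), but the duplicate check makes add O(n).
        if not self.is_element_of(x):
            self._items.append(x)

    def remove(self, x: T) -> None:
        # Find the element (O(n)), then overwrite it with the last item and pop.
        try:
            i = self._items.index(x)
        except ValueError:
            return                           # not present: nothing to do
        self._items[i] = self._items[-1]
        self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


s: ListSet[str] = ListSet()
for word in ["a", "b", "a", "c"]:
    s.add(word)
s.remove("b")
print(len(s), s.is_element_of("b"))   # 2 False
```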

As sets can be interpreted as a kind of map (by the indicator function), sets are commonly implemented in the same way as (partial) maps (associative arrays), in which the value of each key-value pair has the unit type or a sentinel value (like 1). Typically this means a self-balancing binary search tree for sorted sets (which has O(log n) for most operations), or a hash table for unsorted sets (which has O(1) average-case, but O(n) worst-case, for most operations). A sorted linear hash table[8] may be used to provide deterministically ordered sets.

Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:

my %elements = map { $_ => 1 } @elements;
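Python has a built-in set type, so this idiom is rarely needed there. Still, a Python dictionary can play the role of the Perl hash, which shows the map-with-sentinel-values view of a set. A minimal sketch:

```python
elements = ["apple", "pear", "apple", "fig"]

# Keys are the set's members; every value is the sentinel 1, as in the Perl idiom.
as_map = {e: 1 for e in elements}        # equivalently dict.fromkeys(elements, 1)

# Membership is a key lookup (average O(1), since dict is a hash table).
print("pear" in as_map)                  # True
print(len(as_map))                       # 3: the duplicate "apple" was collapsed

# Adding and removing members are ordinary map updates.
as_map["kiwi"] = 1
as_map.pop("fig", None)                  # the default None avoids a KeyError if absent
```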

Other popular methods include arrays. In particular, a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
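Both ideas can be sketched in Python. For the bit array, a Python `int` stands in for the bit vector. This is an assumption of the sketch, since Python integers are arbitrary-precision rather than a fixed n bits, and bit i is set exactly when i is a member. The Bloom filter is a minimal illustration in which `BloomFilter` and its parameters are hypothetical. It derives its k bit positions by salting SHA-256 (from the standard `hashlib` module) with the index i, which is one simple choice among many.

```python
import hashlib

# Part 1: a subset of the integers 1..n as an n-bit bit array.
n = 5


def to_bits(members):
    bits = 0
    for i in members:
        bits |= 1 << i          # set bit i
    return bits


def from_bits(bits):
    return [i for i in range(1, n + 1) if (bits >> i) & 1]


A = to_bits({1, 3, 5})
B = to_bits({3, 4, 5})
union_members = from_bits(A | B)          # [1, 3, 4, 5]: one bitwise OR
intersection_members = from_bits(A & B)   # [3, 5]: one bitwise AND


# Part 2: a minimal Bloom filter (no false negatives, occasional false positives).
class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # True for every added item; may also be True for some items never added.
        return all((self.bits >> p) & 1 for p in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("apple")
print(bf.might_contain("apple"))   # True
```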

The Boolean set operations can be implemented in terms of more elementary operations (pop, clear, and add), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union(S, T) will take time proportional to the length m of S times the length n of T, whereas a variant of the list merging algorithm will do the job in time proportional to m + n. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
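The merge-based union on sorted lists can be sketched in Python as follows. The function name `sorted_union` is hypothetical. It assumes both inputs are sorted and free of duplicates. Each step advances at least one index, so the loop runs at most m + n times.

```python
from typing import List


def sorted_union(s: List[int], t: List[int]) -> List[int]:
    """Union of two sorted, duplicate-free lists in O(m + n) time."""
    i, j = 0, 0
    out: List[int] = []
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            out.append(s[i])
            i += 1
        elif s[i] > t[j]:
            out.append(t[j])
            j += 1
        else:                      # same element in both lists: emit it once
            out.append(s[i])
            i += 1
            j += 1
    # At most one of these tails is non-empty; both are already sorted.
    out.extend(s[i:])
    out.extend(t[j:])
    return out


print(sorted_union([1, 3, 5], [2, 3, 6]))   # [1, 2, 3, 5, 6]
```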

Language support

One of the earliest languages to support sets was Pascal; many languages now include them, whether in the core language or in a standard library.

In C++, the Standard Template Library (STL) provides the set template class, which is typically implemented using a binary search tree (e.g. red–black tree); SGI's STL also provides the hash_set template class, which implements a set using a hash table. C++11 has support for the unordered_set template class, which is implemented using a hash table. In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their (relative or absolute) position. Elements of set (but not unordered_set) must be comparable under a strict weak ordering.
The Rust standard library provides the generic HashSet and BTreeSet types.
Java offers the Set interface to support sets (with the HashSet class implementing it using a hash table), and the SortedSet sub-interface to support sorted sets (with the TreeSet class implementing it using a binary search tree).
Apple's Foundation framework (part of Cocoa) provides the Objective-C classes NSSet, NSMutableSet, NSCountedSet, NSOrderedSet, and NSMutableOrderedSet. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in C.
Python has had built-in set and frozenset types since 2.4, and since Python 3.0 and 2.7 supports non-empty set literals using a curly-bracket syntax, e.g. {x, y, z}; empty sets must be created using set(), because Python uses {} to represent the empty dictionary.
The .NET Framework provides the generic HashSet and SortedSet classes that implement the generic ISet interface.
Smalltalk's class library includes Set and IdentitySet, using equality and identity for inclusion test respectively. Many dialects provide variations for compressed storage (NumberSet, CharacterSet), for ordering (OrderedSet, SortedSet, etc.) or for weak references (WeakIdentitySet).
Ruby's standard library includes a set library, which provided Set and SortedSet classes implemented with hash tables, the latter allowing iteration in sorted order; SortedSet was removed from it in Ruby 3.0.
OCaml's standard library contains a Set module, which implements a functional set data structure using binary search trees.
The GHC implementation of Haskell provides a Data.Set module, which implements immutable sets using binary search trees.[9]
The Tcl Tcllib package provides a set module which implements a set data structure based upon Tcl lists.
The Swift standard library contains a Set type, since Swift 1.2.
JavaScript introduced Set as a standard built-in object with the ECMAScript 2015[10] standard.
Erlang's standard library has a sets module.
Clojure has literal syntax for hashed sets, and also implements sorted sets.
LabVIEW has native support for sets, from version 2019.
Ada provides the Ada.Containers.Hashed_Sets and Ada.Containers.Ordered_Sets packages.
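The Python behaviour described in this list can be checked directly. The short sketch below contrasts a set literal, the empty set, the empty-dictionary literal `{}`, and the immutable `frozenset`.

```python
s = {1, 2, 3}            # set literal (Python 2.7 / 3.0 and later)
empty = set()            # the only way to write an empty set
print(type({}))          # <class 'dict'>: {} is an empty dictionary, not a set
frozen = frozenset(s)    # immutable and hashable, so usable as a dict key or set element
s.add(4)                 # set is mutable; frozen is unaffected
```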

As noted in the previous section, in languages which do not directly support sets but do support associative arrays, sets can be emulated using associative arrays, by using the elements as keys and a dummy value, which is ignored, as the values.

Multiset

Main article: Multiset

A generalization of the notion of a set is that of a multiset or bag, which is similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered identical, and are simply counted, or equal values are considered equivalent, and are stored as distinct items. For example, given a list of people (by name) and ages (in years), one could construct a multiset of ages, which simply counts the number of people of a given age. Alternatively, one can construct a multiset of people, where two people are considered equivalent if their ages are the same (but may be different people and have different names), in which case each pair (name, age) must be stored, and selecting on a given age gives all the people of a given age.

Formally, it is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure, while others will collapse them down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.
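The two senses can be illustrated in Python with the people-and-ages example. In this sketch the sample data is made up. `collections.Counter` keeps only counts (equal values identical), while a `defaultdict` of lists keeps every (name, age) pair grouped by age (equal values equivalent but distinct).

```python
from collections import Counter, defaultdict

people = [("Alice", 30), ("Bob", 25), ("Carol", 30)]

# Sense 1: equal values are identical, so only counts are kept.
ages = Counter(age for _, age in people)
print(ages[30])                      # 2

# Sense 2: people with the same age are equivalent but still distinct items,
# so every (name, age) pair is stored, grouped by age.
by_age = defaultdict(list)
for name, age in people:
    by_age[age].append((name, age))
print(by_age[30])                    # [('Alice', 30), ('Carol', 30)]
```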

As with sets, multisets can naturally be implemented using hash tables or trees, which yield different performance characteristics.

The set of all bags over type T is given by the expression bag T. If by multiset one considers equal items identical and simply counts them, then a multiset can be interpreted as a function from the input domain to the non-negative integers (natural numbers), generalizing the identification of a set with its indicator function. In some cases a multiset in this counting sense may be generalized to allow negative values, as in Python's collections.Counter.
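The counting view, including negative counts, can be seen directly with `collections.Counter`. In this sketch a Counter behaves like a function from elements to integers that returns 0 for anything absent. `subtract` may drive counts below zero, and unary `+` keeps only the positive counts.

```python
from collections import Counter

bag = Counter(["a", "b", "a"])   # the function a -> 2, b -> 1, anything else -> 0
print(bag["a"], bag["z"])        # 2 0   (missing keys count as 0)

bag.subtract({"b": 3})           # counts may go negative
print(bag["b"])                  # -2
print(+bag)                      # Counter({'a': 2}): unary + drops non-positive counts
```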

C++'s Standard Template Library implements both sorted and unsorted multisets. It provides the multiset class for the sorted multiset, as a kind of associative container, which implements this multiset using a self-balancing binary search tree. It provides the unordered_multiset class for the unsorted multiset, as a kind of unordered associative container, which implements this multiset using a hash table. The unsorted multiset is standard as of C++11; previously SGI's STL provided the hash_multiset class, which was copied and eventually standardized.
For Java, third-party libraries provide multiset functionality:
- Apache Commons Collections provides the Bag and SortedBag interfaces, with implementing classes like HashBag and TreeBag.
- Google Guava provides the Multiset interface, with implementing classes like HashMultiset and TreeMultiset.
Apple provides the NSCountedSet class as part of Cocoa, and the CFBag and CFMutableBag types as part of CoreFoundation.
Python's standard library includes collections.Counter, which is similar to a multiset.
Smalltalk includes the Bag class, which can be instantiated to use either identity or equality as predicate for inclusion test.

Where a multiset data structure is not available, a workaround is to use a regular set but override the equality predicate of its items to always return "not equal" on distinct objects (however, such a set will still not be able to store multiple occurrences of the same object), or to use an associative array mapping the values to their integer multiplicities (this will not be able to distinguish between equal elements at all).

Typical operations on bags:

contains(B, x): checks whether the element x is present (at least once) in the bag B.
is_sub_bag(B₁, B₂): checks whether each element occurs in the bag B₁ no more often than it occurs in the bag B₂; sometimes denoted as B₁ ⊑ B₂.
count(B, x): returns the number of times that the element x occurs in the bag B; sometimes denoted as B # x.
scaled_by(B, n): given a natural number n, returns a bag which contains the same elements as the bag B, except that every element that occurs m times in B occurs n * m times in the resulting bag; sometimes denoted as n ⊗ B.
union(B₁, B₂): returns a bag containing just those values that occur in either the bag B₁ or the bag B₂, except that the number of times a value x occurs in the resulting bag is equal to (B₁ # x) + (B₂ # x); sometimes denoted as B₁ ⊎ B₂.
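These bag operations can be sketched on top of a plain dictionary that maps each element to its positive multiplicity, which is the associative-array workaround mentioned earlier. The function names mirror the abstract operations and are otherwise hypothetical. Python's `collections.Counter` offers similar behaviour, for example its `+` operator adds multiplicities.

```python
from typing import Dict, Hashable

Bag = Dict[Hashable, int]   # element -> multiplicity (only positive counts stored)


def contains(b: Bag, x: Hashable) -> bool:
    return b.get(x, 0) > 0


def count(b: Bag, x: Hashable) -> int:           # B # x
    return b.get(x, 0)


def is_sub_bag(b1: Bag, b2: Bag) -> bool:        # B1 ⊑ B2
    return all(m <= count(b2, x) for x, m in b1.items())


def scaled_by(b: Bag, n: int) -> Bag:            # n ⊗ B
    return {x: n * m for x, m in b.items() if n * m > 0}


def union(b1: Bag, b2: Bag) -> Bag:              # B1 ⊎ B2: multiplicities add
    out = dict(b1)
    for x, m in b2.items():
        out[x] = out.get(x, 0) + m
    return out


b1 = {"x": 1, "y": 2}
b2 = {"y": 1, "z": 4}
print(union(b1, b2))                                            # {'x': 1, 'y': 3, 'z': 4}
print(is_sub_bag({"y": 1}, b1), count(scaled_by(b1, 3), "y"))   # True 6
```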

Multisets in SQL

In relational databases, a table can be a (mathematical) set or a multiset, depending on the presence of uniqueness constraints on some columns (which turn those columns into a candidate key).

SQL allows the selection of rows from a relational table: this operation will in general yield a multiset, unless the keyword DISTINCT is used to force the rows to be all different, or the selection includes the primary (or a candidate) key.
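The difference between a plain SELECT and SELECT DISTINCT can be observed with Python's built-in `sqlite3` module and an in-memory database. The table `visits` and its data are made up for this sketch. SQLite is used only as a convenient engine, and this example is not meant to illustrate the MULTISET keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No UNIQUE or PRIMARY KEY constraint, so the table itself is a multiset.
conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany("INSERT INTO visits VALUES (?)",
                 [("Paris",), ("Oslo",), ("Paris",)])

rows = conn.execute("SELECT city FROM visits ORDER BY city").fetchall()
print(rows)        # [('Oslo',), ('Paris',), ('Paris',)]: duplicates kept

distinct = conn.execute(
    "SELECT DISTINCT city FROM visits ORDER BY city").fetchall()
print(distinct)    # [('Oslo',), ('Paris',)]
conn.close()
```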

In ANSI SQL the MULTISET keyword can be used to transform a subquery into a collection expression:

SELECT expression1, expression2... FROM table_name...

is a general select that can be used as a subquery expression of another more general query, while

MULTISET(SELECT expression1, expression2... FROM table_name...)

transforms the subquery into a collection expression that can be used in another query, or in assignment to a column of appropriate collection type.

Notes

a. For example, in Python pick can be implemented on a derived class of the built-in set as follows:
```python
class Set(set):
    def pick(self):
        # Return an arbitrary element without removing it (raises StopIteration if empty).
        return next(iter(self))
```
b. Element insertion can be done in O(1) time by simply inserting at an end, but if one avoids duplicates this takes O(n) time.

References

1. Python: pop()
2. Management and Processing of Complex Data Structures: Third Workshop on Information Systems and Artificial Intelligence, Hamburg, Germany, February 28 - March 2, 1994. Proceedings, ed. Kai v. Luck, Heinz Marburger, p. 76
3. Python Issue 7212: Retrieve an arbitrary element from a set without removing it; see msg106593 regarding standard name
4. Ruby Feature #4553: Add Set#pick and Set#pop
5. Inductive Synthesis of Functional Programs: Universal Planning, Folding of Finite Programs, and Schema Abstraction by Analogical Reasoning, Ute Schmid, Springer, Aug 21, 2003, p. 240
6. Recent Trends in Data Type Specification: 10th Workshop on Specification of Abstract Data Types Joint with the 5th COMPASS Workshop, S. Margherita, Italy, May 30 - June 3, 1994. Selected Papers, Volume 10, ed. Egidio Astesiano, Gianna Reggio, Andrzej Tarlecki, p. 38
7. Ruby: flatten()
8. Wang, Thomas (1997), Sorted Linear Hash Table, archived from the original on 2006-01-12
9. Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993. Retrieved on 2015-03-11.
10. "ECMAScript 2015 Language Specification – ECMA-262 6th Edition". www.ecma-international.org. Retrieved 2017-07-11.
