8.7.sets
— Unordered collections of unique elements¶
New in version 2.3.
Thesets
module provides classes for constructing and manipulatingunordered collections of unique elements. Common uses include membershiptesting, removing duplicates from a sequence, and computing standard mathoperations on sets such as intersection, union, difference, and symmetricdifference.
Like other collections, sets supportxinset
,len(set)
, andforxinset
. Being an unordered collection, sets do not record element position ororder of insertion. Accordingly, sets do not support indexing, slicing, orother sequence-like behavior.
Most set applications use theSet
class which provides every set methodexcept for__hash__()
. For advanced applications requiring a hash method,theImmutableSet
class adds a__hash__()
method but omits methodswhich alter the contents of the set. BothSet
andImmutableSet
derive fromBaseSet
, an abstract class useful for determining whethersomething is a set:isinstance(obj,BaseSet)
.
The set classes are implemented using dictionaries. Accordingly, therequirements for set elements are the same as those for dictionary keys; namely,that the element defines both__eq__()
and__hash__()
. As a result,sets cannot contain mutable elements such as lists or dictionaries. However,they can contain immutable collections such as tuples or instances ofImmutableSet
. For convenience in implementing sets of sets, inner setsare automatically converted to immutable form, for example,Set([Set(['dog'])])
is transformed toSet([ImmutableSet(['dog'])])
.
- class
sets.
Set
([iterable])¶ Constructs a new empty
Set
object. If the optionaliterableparameter is supplied, updates the set with elements obtained from iteration.All of the elements initerable should be immutable or be transformable to animmutable using the protocol described in sectionProtocol for automatic conversion to immutable.
- class
sets.
ImmutableSet
([iterable])¶ Constructs a new empty
ImmutableSet
object. If the optionaliterableparameter is supplied, updates the set with elements obtained from iteration.All of the elements initerable should be immutable or be transformable to animmutable using the protocol described in sectionProtocol for automatic conversion to immutable.Because
ImmutableSet
objects provide a__hash__()
method, theycan be used as set elements or as dictionary keys.ImmutableSet
objects do not have methods for adding or removing elements, so all of theelements must be known when the constructor is called.
8.7.1.Set Objects¶
Instances ofSet
andImmutableSet
both provide the followingoperations:
Operation | Equivalent | Result |
---|---|---|
| number of elements in sets(cardinality) | |
| testx for membership ins | |
| testx for non-membership ins | |
|
| test whether every element ins is int |
|
| test whether every element int is ins |
|
| new set with elements from boths andt |
|
| new set with elements common tos andt |
|
| new set with elements insbut not int |
|
| new set with elements in eithers ort but not both |
| new set with a shallow copy ofs |
Note, the non-operator versions ofunion()
,intersection()
,difference()
, andsymmetric_difference()
will accept any iterable asan argument. In contrast, their operator based counterparts require theirarguments to be sets. This precludes error-prone constructions likeSet('abc')&'cbs'
in favor of the more readableSet('abc').intersection('cbs')
.
Changed in version 2.3.1:Formerly all arguments were required to be sets.
In addition, bothSet
andImmutableSet
support set to setcomparisons. Two sets are equal if and only if every element of each set iscontained in the other (each is a subset of the other). A set is less thananother set if and only if the first set is a proper subset of the second set(is a subset, but is not equal). A set is greater than another set if and onlyif the first set is a proper superset of the second set (is a superset, but isnot equal).
The subset and equality comparisons do not generalize to a complete orderingfunction. For example, any two disjoint sets are not equal and are not subsetsof each other, soall of the following returnFalse
:a<b
,a==b
,ora>b
. Accordingly, sets do not implement the__cmp__()
method.
Since sets only define partial ordering (subset relationships), the output ofthelist.sort()
method is undefined for lists of sets.
The following table lists operations available inImmutableSet
but notfound inSet
:
Operation | Result |
---|---|
| returns a hash value fors |
The following table lists operations available inSet
but not found inImmutableSet
:
Operation | Equivalent | Result |
---|---|---|
| s |=t | return sets with elementsadded fromt |
| s &=t | return sets keeping onlyelements also found int |
| s -=t | return sets after removingelements found int |
| s ^=t | return sets with elementsfroms ort but not both |
| add elementx to sets | |
| removex from sets; raises | |
| removesx from sets ifpresent | |
| remove and return an arbitraryelement froms; raises | |
| remove all elements from sets |
Note, the non-operator versions ofupdate()
,intersection_update()
,difference_update()
, andsymmetric_difference_update()
will acceptany iterable as an argument.
Changed in version 2.3.1:Formerly all arguments were required to be sets.
Also note, the module also includes aunion_update()
method which is analias forupdate()
. The method is included for backwards compatibility.Programmers should prefer theupdate()
method because it is supported bythe built-inset()
andfrozenset()
types.
8.7.2.Example¶
>>>fromsetsimportSet>>>engineers=Set(['John','Jane','Jack','Janice'])>>>programmers=Set(['Jack','Sam','Susan','Janice'])>>>managers=Set(['Jane','Jack','Susan','Zack'])>>>employees=engineers|programmers|managers# union>>>engineering_management=engineers&managers# intersection>>>fulltime_management=managers-engineers-programmers# difference>>>engineers.add('Marvin')# add element>>>printengineersSet(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])>>>employees.issuperset(engineers)# superset testFalse>>>employees.update(engineers)# update from another set>>>employees.issuperset(engineers)True>>>forgroupin[engineers,programmers,managers,employees]:...group.discard('Susan')# unconditionally remove element...printgroup...Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])Set(['Janice', 'Jack', 'Sam'])Set(['Jane', 'Zack', 'Jack'])Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
8.7.3.Protocol for automatic conversion to immutable¶
Sets can only contain immutable elements. For convenience, mutableSet
objects are automatically copied to anImmutableSet
before being addedas a set element.
The mechanism is to always add ahashable element, or if it is nothashable, the element is checked to see if it has an__as_immutable__()
method which returns an immutable equivalent.
SinceSet
objects have a__as_immutable__()
method returning aninstance ofImmutableSet
, it is possible to construct sets of sets.
A similar mechanism is needed by the__contains__()
andremove()
methods which need to hash an element to check for membership in a set. Thosemethods check an element for hashability and, if not, check for a__as_temporarily_immutable__()
method which returns the element wrapped bya class that provides temporary methods for__hash__()
,__eq__()
,and__ne__()
.
The alternate mechanism spares the need to build a separate copy of the originalmutable object.
Set
objects implement the__as_temporarily_immutable__()
methodwhich returns theSet
object wrapped by a new class_TemporarilyImmutableSet
.
The two mechanisms for adding hashability are normally invisible to the user;however, a conflict can arise in a multi-threaded environment where one threadis updating a set while another has temporarily wrapped it in_TemporarilyImmutableSet
. In other words, sets of mutable sets are notthread-safe.
8.7.4.Comparison to the built-inset
types¶
The built-inset
andfrozenset
types were designed based onlessons learned from thesets
module. The key differences are:
Set
andImmutableSet
were renamed toset
andfrozenset
.There is no equivalent to
BaseSet
. Instead, useisinstance(x,(set,frozenset))
.The hash algorithm for the built-ins performs significantly better (fewercollisions) for most datasets.
The built-in versions have more space efficient pickles.
The built-in versions do not have a
union_update()
method. Instead, usetheupdate()
method which is equivalent.The built-in versions do not have a
_repr(sorted=True)
method.Instead, use the built-inrepr()
andsorted()
functions:repr(sorted(s))
.The built-in version does not have a protocol for automatic conversion toimmutable. Many found this feature to be confusing and no one in the communityreported having found real uses for it.