Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

Tagged union

From Wikipedia, the free encyclopedia
(Redirected fromVariant type)
Data structure used to hold a value that could take on several different, but fixed, types

Incomputer science, atagged union, also called avariant,variant record,choice type,discriminated union,disjoint union,sum type, orcoproduct, is adata structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and atag field explicitly indicates which type is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. This is critical in defining recursive datatypes, in which some component of a value may have the same type as that value, for example in defining a type for representingtrees, where it is necessary to distinguish multi-node subtrees and leaves. Like ordinaryunions, tagged unions can save storage by overlapping storage areas for each type, since only one is in use at a time.

Description

[edit]

Tagged unions are most important infunctional programming languages such asML andHaskell, where they are calleddatatypes (seealgebraic data type) and thecompiler can verify that all cases of a tagged union are always handled, avoiding many types of errors. Compile-time checked sum types are also extensively used inRust, where they are calledenum. They can, however, be constructed in nearly anyprogramming language, and are much safer than untagged unions, often simply called unions, which are similar but do not explicitly track which member of a union is currently in use.

Tagged unions are often accompanied by the concept of aconstructor, which is similar but not the same as aconstructor for aclass. A constructor is a function or an expression that produces a value of the tagged union type, given a tag and a value of the corresponding type.

Mathematically, tagged unions correspond todisjoint ordiscriminated unions, usually written using +. Given an element of a disjoint unionA +B, it is possible to determine whether it came fromA orB. If an element lies in both, there will be two effectively distinct copies of the value inA +B, one fromA and one fromB.

Intype theory, a tagged union is called asum type. Sum types are thedual ofproduct types. Notations vary, but usually the sum typeA +B comes with two introduction forms (injections)inj1:AA +B andinj2:BA +B. The elimination form is case analysis, known aspattern matching inML-style languages: ife has typeA +B ande1 ande2 have typeτ{\displaystyle \tau } under the assumptionsx:A andy:B respectively, then the termcase e of xe1ye2{\displaystyle {\mathsf {case}}\ e\ {\mathsf {of}}\ x\Rightarrow e_{1}\mid y\Rightarrow e_{2}} has typeτ{\displaystyle \tau }. The sum type corresponds tointuitionisticlogical disjunction under theCurry–Howard correspondence.

Anenumerated type can be seen as a degenerate case: a tagged union ofunit types. It corresponds to a set of nullary constructors and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag.

Many programming techniques and data structures, includingrope,lazy evaluation,class hierarchy (see below),arbitrary-precision arithmetic,CDR coding, theindirection bit, and other kinds oftagged pointers, are usually implemented using some sort of tagged union.

A tagged union can be seen as the simplest kind ofself-describingdata format.The tag of the tagged union can be seen as the simplest kind ofmetadata.

In languages withflow-sensitive typing, tagged unions can be implemented by a combination ofunion types andrecord types.[1]

Advantages and disadvantages

[edit]

The primary advantage of a tagged union over an untagged union is that all accesses are safe, and the compiler can even check that all cases are handled. Untagged unions depend on program logic to correctly identify the currently active field, which may result in strange behavior and hard-to-find bugs if that logic fails.

The primary advantage of a tagged union over a simplerecord containing a field for each type is that it saves storage by overlapping storage for all the types. Some implementations reserve enough storage for the largest type, while others dynamically adjust the size of a tagged union value as needed. When the value isimmutable, it is simple to allocate just as much storage as is needed.

The main disadvantage of tagged unions is that the tag occupies space. Since there are usually a small number of alternatives, the tag can often be squeezed into 2 or 3 bits wherever space can be found, but sometimes even these bits are not available. In this case, a helpful alternative may befolded,computed orencoded tags, where the tag value is dynamically computed from the contents of the union field. Common examples are the use ofreserved values, where, for example, a function returning a positive number may return -1 to indicate failure, andsentinel values, most often used intagged pointers.

Sometimes, untagged unions are used to perform bit-level conversions between types, called reinterpret casts in C++. Tagged unions are not intended for this purpose; typically a new value is assigned whenever the tag is changed.

Many languages support, to some extent, auniversal data type, which is a type that includes every value of every other type, and often a way is provided to test the actual type of a value of the universal type. These are sometimes referred to asvariants. While universal data types are comparable to tagged unions in their formal definition, typical tagged unions include a relatively small number of cases, and these cases form different ways of expressing a single coherent concept, such as a data structure node or instruction. Also, there is an expectation that every possible case of a tagged union will be dealt with when it is used. The values of a universal data type are not related and there is no feasible way to deal with them all.

Likeoption types andexception handling, tagged unions are sometimes used to handle the occurrence of exceptional results. Often these tags are folded into the type asreserved values, and their occurrence is not consistently checked: this is a fairly common source of programming errors. This use of tagged unions can be formalized as amonad with the following functions:

return:A(A+E)=avaluea{\displaystyle {\text{return}}\colon A\to \left(A+E\right)=a\mapsto {\text{value}}\,a}
bind:(A+E)(A(B+E))(B+E)=af{erreif a=errefaif a=valuea{\displaystyle {\text{bind}}\colon \left(A+E\right)\to \left(A\to \left(B+E\right)\right)\to \left(B+E\right)=a\mapsto f\mapsto {\begin{cases}{\text{err}}\,e&{\text{if}}\ a={\text{err}}\,e\\f\,a'&{\text{if}}\ a={\text{value}}\,a'\end{cases}}}

where "value" and "err" are the constructors of the union type,A andB are valid result types andE is the type of error conditions. Alternately, the same monad may be described byreturn and two additional functions,fmap andjoin:

fmap:(AB)((A+E)(B+E))=fa{erreif a=errevalue(fa)if a=valuea{\displaystyle {\text{fmap}}\colon (A\to B)\to \left(\left(A+E\right)\to \left(B+E\right)\right)=f\mapsto a\mapsto {\begin{cases}{\text{err}}\,e&{\text{if}}\ a={\text{err}}\,e\\{\text{value}}\,{\text{(}}\,f\,a'\,{\text{)}}&{\text{if}}\ a={\text{value}}\,a'\end{cases}}}
join:((A+E)+E)(A+E)=a{erreif a=erreerreif a=value(erre)valueaif a=value(valuea){\displaystyle {\text{join}}\colon ((A+E)+E)\to (A+E)=a\mapsto {\begin{cases}{\text{err}}\,e&{\mbox{if}}\ a={\text{err}}\,e\\{\text{err}}\,e&{\text{if}}\ a={\text{value}}\,{\text{(err}}\,e\,{\text{)}}\\{\text{value}}\,a'&{\text{if}}\ a={\text{value}}\,{\text{(value}}\,a'\,{\text{)}}\end{cases}}}

Examples

[edit]

Say we wanted to build abinary tree of integers. In ML, we would do this by creating a datatype like this:

datatypetree=Leaf|Nodeof(int*tree*tree)

This is a tagged union with two cases: one, the leaf, is used to terminate a path of the tree, and functions much like a null value would in imperative languages. The other branch holds a node, which contains an integer and a left and right subtree. Leaf and Node are the constructors, which enable us to actually produce a particular tree, such as:

Node(5,Node(1,Leaf,Leaf),Node(3,Leaf,Node(4,Leaf,Leaf)))

which corresponds to this tree:

The tree produced by the above constructors
The tree produced by the above constructors

Now we can easily write a typesafe function that, for example, counts the number of nodes in the tree:

funcountNodes(Leaf)=0|countNodes(Node(int,left,right))=1+countNodes(left)+countNodes(right)

Timeline of language support

[edit]

1960s

[edit]

InALGOL 68, tagged unions are calledunited modes, the tag is implicit, and thecase construct is used to determine which field is tagged:

modenode =union (real,int,compl,string);

Usage example forunioncase ofnode:

node n := "1234"; case nin  (real r):   print(("real:", r)),  (int i):    print(("int:", i)),  (compl c):  print(("compl:", c)),  (string s): print(("string:", s))out         print(("?:", n))esac

In ALGOL 68, a union can be automatically coerced into a wider union, for example if all its constituents can be handled by the union parameter ofprint, a union can simply be passed to print as in theout case above.

1970s & 1980s

[edit]

Functional programming languages such asML (from the 1970s) andHaskell (from the 1990s) give a central role to tagged unions and have the power to check that all cases are handled. Some other languages also support tagged unions.

Pascal,Ada, andModula-2 call themvariant records (formallydiscriminated type in Ada), and require the tag field to be manually created and the tag values specified, as in this Pascal example:

typeshapeKind=(square,rectangle,circle);shape=recordcenterx:integer;centery:integer;casekind:shapeKindofsquare:(side:integer);rectangle:(width,height:integer);circle:(radius:integer);end;

and this Ada equivalent:

typeShape_Kindis(Square,Rectangle,Circle);typeShape(Kind:Shape_Kind)isrecordCenter_X:Integer;Center_Y:Integer;caseKindiswhenSquare=>Side:Integer;whenRectangle=>Width,Height:Integer;whenCircle=>Radius:Integer;endcase;end record;-- Any attempt to access a member which existence depends-- on a certain value of the discriminant, while the-- discriminant is not the expected one, raises an error.


InC andC++, a tagged union can be created from untagged unions using a strict access discipline where the tag is always checked:

#include<assert.h>typedefenum{Invalid,Square,Rectangle,Circle,}ShapeKind;typedefstruct{union{struct{intside;};// Squarestruct{intwidth,height;};// Rectanglestruct{intradius;};// Circle};ShapeKindkind;intcenter_x;intcenter_y;}Shape;[[nodiscard]]intsquare_get_side(constShape*s){assert(s->kind==Square&&"get_square_side: Shape is not a Square!");returns->side;}voidsquare_set_side(Shape*s,intside){s->kind=Square;s->side=side;}/* and so on with Rectangle and Circle */

As long as the union fields are only accessed through the functions, the accesses will be safe and correct. The same approach can be used for encoded tags; we simply decode the tag and then check it on each access. If the inefficiency of these tag checks is a concern, they may be automatically removed in the final version.

C and C++ also have language support for one particular tagged union: the possibly-nullpointer. This may be compared to theoption type in ML or theMaybe type in Haskell, and can be seen as atagged pointer: a tagged union (with an encoded tag) of two types:

  • Valid pointers,
  • Anull pointer type with only one value,null, indicating an exceptional condition.

Unfortunately, C compilers do not verify that the null case is always handled. This is a particularly common source of errors in C code, since there is a tendency to ignore exceptional cases.

However, using both untagged unions and pointers, another way to accomplish a tagged union in C and C++ is through the use ofopaque pointer types with functionality exposed only through an associated interfaceheader file:

Shape.h

#pragma once// assert function#include<assert.h>#ifdef __cplusplusextern"C"{#endiftypedefenum{Invalid,Square,Rectangle,Circle,}ShapeKind;// forward declare the Shape struct.structShape;// functions to support the opaque pointer.Shape*shape_new(intcenter_x,intcenter_y);voidshape_free(Shape**s_ref);ShapeKindshape_get_kind(constShapeconst*s);[[nodiscard]]intsquare_get_side(constShape*s);voidsquare_set_side(Shape*s,intside);// so on for Rectangle and Circle#ifdef __cplusplus}#endif

The benefits of using an opaque pointers to access tagged unions restricts all data access functionality to that provided by the opaque pointer's interfacing API. By restricting data access through an API, this approach significantly reduces the likelihood of unsafe and/or incorrect union data accesses.

The downsides to this approach however is that dynamic memory allocation cannot be avoided as a pointer or other type ofhandle must be provided to access the internal data and allocating implicitly places the burden of memory management of the opaque pointer onto the end-using developer as well as

2000s

[edit]

One advanced dialect of C, calledCyclone, has extensive built-in support for tagged unions.[2]

The enum types in theRust,Haxe, andSwift languages also work as tagged unions.

The variant library from theBoost C++ Libraries demonstrated it was possible to implement a safe tagged union as a library in C++, visitable using function objects.boost::variant was later introduced to C++17 asstd::variant.

#include<boost/variant.hpp>importstd;usingstd::string;usingboost::static_visitor;usingboost::variant;classDisplay:publicstatic_visitor<void>{voidoperator()(inti){std::println("An int, with value {}",i);}voidoperator()(conststring&s){std::println("A string, with value {}",s);}};intmain(intargc,char*argv[]){variant<int,string>v1=42;boost::apply_visitor(display(),v1);variant<int,string>v2="Hello, world!";boost::apply_visitor(display(),v2);}

Scala has case classes:

sealedabstractclassTreecaseobjectLeafextendsTreecaseclassNode(value:Int,left:Tree,right:Tree)extendsTreevaltree=Node(5,Node(1,Leaf,Leaf),Node(3,Leaf,Node(4,Leaf,Leaf)))

Because the class hierarchy is sealed, the compiler can check that all cases are handled in a pattern match:

treematch{caseNode(x,_,_)=>println("top level node value: "+x)caseLeaf=>println("top level node is a leaf")}

Scala's case classes also permit reuse through subtyping:

sealedabstractclassShape(centerX:Int,centerY:Int)caseclassSquare(side:Int,centerX:Int,centerY:Int)extendsShape(centerX,centerY)caseclassRectangle(length:Int,height:Int,centerX:Int,centerY:Int)extendsShape(centerX,centerY)caseclassCircle(radius:Int,centerX:Int,centerY:Int)extendsShape(centerX,centerY)

F# has discriminated unions:

typeTree=|Leaf|Nodeofvalue:int*left:Tree*right:Treelettree=Node(5,Node(1,Leaf,Leaf),Node(3,Leaf,Node(4,Leaf,Leaf)))

Because the defined cases are exhaustive, the compiler can check that all cases are handled in a pattern match:

matchtreewith|Node(x,_,_)->printfn"top level node value: %i"x|Leaf->printfn"top level node is a leaf"

Haxe's enums also work as tagged unions:[3]

enumColor{Red;Green;Blue;Rgb(r:Int,g:Int,b:Int);}

These can be matched using a switch expression:

switch(color){caseRed:trace("Color was red");caseGreen:trace("Color was green");caseBlue:trace("Color was blue");caseRgb(r,g,b):trace("Color had a red value of "+r);}

Nim has object variants[4] similar in declaration to those in Pascal and Ada:

typeShapeKind=enumskSquare,skRectangle,skCircleShape=objectcenterX,centerY:intcasekind:ShapeKindofskSquare:side:intofskRectangle:length,height:intofskCircle:radius:int

Macros can be used to emulate pattern matching or to create syntactic sugar for declaring object variants, seen here as implemented by the packagepatty:

importpattyproc`~`[A](a:A):refA=new(result)result[]=avariantList[A]:NilCons(x:A,xs:refList[A])proclistHelper[A](xs:seq[A]):List[A]=ifxs.len==0:Nil[A]()else:Cons(xs[0],~listHelper(xs[1..xs.high]))proclist[A](xs:varargs[A]):List[A]=listHelper(@xs)procsum(xs:List[int]):int=(block:matchxs:Nil:0Cons(y,ys):y+sum(ys[]))echosum(list(1,2,3,4,5))

2010s

[edit]

Enums are added in Scala 3,[5] allowing us to rewrite the earlier Scala examples more concisely:

enumTree[+T]:caseLeafcaseNode(x:Int,left:Tree[T],right:Tree[T])enumShape(centerX:Int,centerY:Int):caseSquare(side:Int,centerX:Int,centerY:Int)extendsShape(centerY,centerX)caseRectangle(length:Int,height:Int,centerX:Int,centerY:Int)extendsShape(centerX,centerY)caseCircle(radius:Int,centerX:Int,centerY:Int)extendsShape(centerX,centerY)

TheRust language has extensive support for tagged unions, called enums.[6] For example:

enumTree{Leaf,Node(i64,Box<Tree>,Box<Tree>)}

It also allows matching on unions:

lettree:Tree=Tree::Node(2,Box::new(Tree::Node(0,Box::new(Tree::Leaf),Box::new(Tree::Leaf))),Box::new(Tree::Node(3,Box::new(Tree::Leaf),Box::new(Tree::Node(4,Box::new(Tree::Leaf),Box::new(Tree::Leaf))))));fnadd_values(tree:Tree)->i64{matchtree{Tree::Node(v,a,b)=>v+add_values(*a)+add_values(*b),Tree::Leaf=>0}}fnmain(){assert_eq!(add_values(tree),9);}

Rust's error handling model relies extensively on these tagged unions, especially theOption<T> type, which is eitherNone orSome(T), and theResult<T, E> type, which is eitherOk(T) orErr(E).[7]

Swift also has substantial support for tagged unions via enumerations.[8] For example:

enumTree{caseleafindirectcasenode(Int,Tree,Tree)}lettree=Tree.node(2,.node(0,.leaf,.leaf),.node(3,.leaf,.node(4,.leaf,.leaf)))funcadd_values(_tree:Tree)->Int{switchtree{caselet.node(v,a,b):returnv+add_values(a)+add_values(b)case.leaf:return0}}assert(add_values(tree)==9)

WithTypeScript it is also possible to create tagged unions. For example:

interfaceLeaf{kind:"leaf";}interfaceNode{kind:"node";value:number;left:Tree;right:Tree;}typeTree=Leaf|Nodeconstroot:Tree={kind:"node",value:5,left:{kind:"node",value:1,left:{kind:"leaf"},right:{kind:"leaf"}},right:{kind:"node",value:3,left:{kind:"leaf"},right:{kind:"node",value:4,left:{kind:"leaf"},right:{kind:"leaf"}}}}functionvisit(tree:Tree){switch(tree.kind){case"leaf":breakcase"node":console.log(tree.value)visit(tree.left)visit(tree.right)break}}

Python 3.9 introduces support for typing annotations that can be used to define a tagged union type (PEP-593[9]):

Currency=Annotated[TypedDict('Currency',{'dollars':float,'pounds':float},total=False),TaggedUnion,]

C++17 introduces std::variant andconstexpr if:

importstd;template<typenameT>usingTree<T>=std::variant<Leaf<T>,Node<T>>;usingstd::is_same_v;usingstd::unique_ptr;template<typenameT>classLeaf{private:Tvalue;public:explicitLeaf(Tval):value{std::move(val)}{}[[nodiscard]]constT&getValue()constnoexcept{returnvalue;}};template<typenameT>classNode{private:unique_ptr<Tree<T>>left=nullptr;unique_ptr<Tree<T>>right=nullptr;public:Node(unique_ptr<Tree<T>>leftTree,unique_ptr<Tree<T>>rightTree):left{std::move(leftTree)},right{std::move(rightTree)}{}[[nodiscard]]constTree<T>*getLeft()constnoexcept{returnleft.get();}[[nodiscard]]constTree<T>*getRight()constnoexcept{returnright.get();}};template<typenameT>structTransverser{template<typenameU>voidoperator()(U&&v){ifconstexpr(is_same_v<U,Leaf<T>&>){std::println("{}",v.getValue());}elseifconstexpr(is_same_v<U,Node<T>&>){if(v.left){std::visit(Transverser<U>{},*v.getLeft());}if(v.right){std::visit(Transverser<U>{},*v.getRight());}}else{// The !sizeof(T) expression is always falsestatic_assert(!sizeof(T),"Non-exhaustive visitor!");};}};intmain(intargc,char*argv[]){Tree<int>forest=Node<int>(std::make_unique<Tree<int>>(Node<int>(std::make_unique<Tree<int>>(Leaf<int>(1)),std::make_unique<Tree<int>>(Leaf<int>(2))),std::make_unique<Tree<int>>(Leaf<int>(3))));std::visit(Transverser<int>{},forest);}

Class hierarchies as tagged unions

[edit]

In a typicalclass hierarchy inobject-oriented programming, each subclass can encapsulate data unique to that class. The metadata used to performvirtual method lookup (for example, the object'svtable pointer in most C++ implementations) identifies the subclass and so effectively acts as a tag identifying the data stored by the instance (seeRTTI).An object'sconstructor sets this tag, and it remains constant throughout the object's lifetime.

Nevertheless, a class hierarchy involves truesubtype polymorphism. It can be extended by creating further subclasses of the same base type, which could not be handled correctly under a tag/dispatch model. Hence, it is usually not possible to do case analysis or dispatch on a subobject's 'tag' as one would for tagged unions. Some languages such asScala allow base classes to be "sealed", and unify tagged unions with sealed base classes.

See also

[edit]

References

[edit]
  1. ^https://arxiv.org/pdf/2111.03354 p. 8
  2. ^"Cyclone: Tagged Unions".
  3. ^"Using Enums - Haxe - The Cross-platform Toolkit". Haxe Foundation.
  4. ^"Nim Manual".nim-lang.org. Retrieved2020-01-23.
  5. ^"Scala 3 Language Reference: Enumerations". The Scala Team.
  6. ^"The Rust Programming Language". Mozilla.
  7. ^"Rust By Example". Mozilla.
  8. ^"Enumerations — The Swift Programming Language (Swift 5.4)".docs.swift.org. Retrieved2021-04-28.
  9. ^"PEP 593 -- Flexible function and variable annotations".Python.org. Retrieved2021-06-20.

External links

[edit]
Uninterpreted
Numeric
Reference
Text
Composite
Other
Related
topics
Retrieved from "https://en.wikipedia.org/w/index.php?title=Tagged_union&oldid=1317280067"
Categories:
Hidden categories:

[8]ページ先頭

©2009-2026 Movatter.jp