Movatterモバイル変換

[0]ホーム

Jump to content

Algebraic data type

Edit links

From Wikipedia, the free encyclopedia

Data type defined by combining other types

Not to be confused withAbstract data type.

This articleneeds additional citations forverification. Please helpimprove this article byadding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Algebraic data type" – news ·newspapers ·books ·scholar ·JSTOR(October 2015) (Learn how and when to remove this message)

Incomputer programming, especially infunctional programming andtype theory, analgebraic data type (ADT) is acomposite data type—a type formed by combining other types.

An algebraic data type is defined by two key constructions: asum and aproduct. These are sometimes referred to as "OR" and "AND" types.

Asum type is a choice between possibilities. The value of a sum type can match one of several definedvariants. For example, a type representing the state of a traffic light could be eitherRed,Amber, orGreen. A shape type could be either aCircle (which stores a radius)or aSquare (which stores a width). In formal terms, these variants are known astagged unions ordisjoint unions. Each variant has a name, called aconstructor, which can also carry data.Enumerated types are a simple form of sum type where the constructors carry no data.

Aproduct type combines types together. A value of a product type will contain a value for each of its component types. For example, aPoint type might be defined to contain anx coordinate (an integer)and ay coordinate (also an integer). Formal examples of product types includetuples andrecords. The set of all possible values of a product type is theCartesian product of the sets of its component types.

Values of algebraic data types are typically handled usingpattern matching. This feature allows a programmer to check which constructor a value was made with and extract the data it contains in a convenient and type-safe way.

History

[edit]

Algebraic data types were introduced inHope, a smallfunctional programming language developed in the 1970s at theUniversity of Edinburgh.^[1]

Examples

[edit]

Singly linked list

[edit]

One of the most common examples of an algebraic data type is thesingly linked list. A list type is a sum type with two variants,Nil for an empty list andConsxxs for the combination of a new elementx with a listxs to create a new list. Here is an example of how a singly linked list would be declared inHaskell:

dataLista=Nil|Consa(Lista)

data[]a=[]|a:[a]

Cons is an abbreviation ofconstruct. Many languages have special syntax for lists defined in this way. For example, Haskell andML use[] forNil,: or:: forCons, respectively, and square brackets for entire lists. SoCons 1 (Cons 2 (Cons 3 Nil)) would normally be written as1:2:3:[] or[1,2,3] in Haskell, or as1::2::3::[] or[1,2,3] in ML.

Binary tree

[edit]

For a slightly more complex example,binary trees may be implemented in Haskell as follows:

dataTree=Empty|LeafInt|NodeIntTreeTree

dataBinaryTreea=BTNil|BTNodea(BinaryTreea)(BinaryTreea)

Here,Empty represents an empty tree,Leaf represents a leaf node, andNode organizes the data into branches.

In most languages that support algebraic data types, it is possible to defineparametric types. Examples are given later in this article.

Somewhat similar to a function, a data constructor is applied to arguments of an appropriate type, yielding an instance of the data type to which the type constructor belongs. For example, the data constructorLeaf is logically a functionInt -> Tree, meaning that giving an integer as an argument toLeaf produces a value of the typeTree. AsNode takes two arguments of the typeTree itself, the datatype isrecursive.

Operations on algebraic data types can be defined by usingpattern matching to retrieve the arguments. For example, consider a function to find the depth of aTree, given here in Haskell:

depth::Tree->IntdepthEmpty=0depth(Leafn)=1depth(Nodenlr)=1+max(depthl)(depthr)

Thus, aTree given todepth can be constructed using any ofEmpty,Leaf, orNode and must be matched for any of them respectively to deal with all cases. In case ofNode, the pattern extracts the subtreesl andr for further processing.

Abstract syntax

[edit]

Algebraic data types are highly suited to implementingabstract syntax. For example, the following algebraic data type describes a simple language representing numerical expressions:

dataExpression=NumberInt|AddExpressionExpression|MinusExpressionExpression|MultExpressionExpression|DivideExpressionExpression

An element of such a data type would have a form such asMult (Add (Number 4) (Minus (Number 0) (Number 1))) (Number 2).

Writing an evaluation function for this language is a simple exercise; however, more complex transformations also become feasible. For example, an optimization pass in a compiler might be written as a function taking an abstract expression as input and returning an optimized form.

Pattern matching

[edit]

Further information:Pattern matching

Algebraic data types are used to represent values that can be one of severaltypes of things. Each type of thing is associated with an identifier called aconstructor, which can be considered a tag for that kind of data. Each constructor can carry with it a different type of data.

For example, considering the binaryTree example shown above, a constructor could carry no data (e.g.,Empty), or one piece of data (e.g.,Leaf has one Int value), or multiple pieces of data (e.g.,Node has oneInt value and twoTree values).

To do something with a value of thisTree algebraic data type, it isdeconstructed using a process calledpattern matching. This involves matching the data with a series ofpatterns. The example functiondepth above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern.

Each pattern above has a form that resembles the structure of some possible value of this datatype. The first pattern simply matches values of the constructorEmpty. The second pattern matches values of the constructorLeaf. Patterns are recursive, so then the data that is associated with that constructor is matched with the pattern "n". In this case, a lowercase identifier represents a pattern that matches any value, which then is bound to a variable of that name — in this case, a variable "n" is bound to the integer value stored in the data type — to be used in the expression to evaluate.

The recursion in patterns in this example are trivial, but a possible more complex recursive pattern would be something like:

Nodei(Nodej(Leaf4)x)(Nodeky(NodeEmptyz))

Recursive patterns several layers deep are used for example in balancingred–black trees, which involve cases that require looking at colors several layers deep.

The example above is operationally equivalent to the followingpseudocode:

switchon(data.constructor)caseEmpty:return0caseLeaf:letn=data.field1return1caseNode:letl=data.field2letr=data.field3return1+max(depthl)(depthr)

The advantages of algebraic data types can be highlighted by comparison of the above pseudocode with a pattern matching equivalent.

Firstly, there istype safety. In the pseudocode example above, programmer diligence is required to not accessfield2 when the constructor is aLeaf. The type system would have difficulties assigning a static type in a safe way for traditionalrecord data structures. However, in pattern matching such problems are not faced. The type of each extracted value is based on the types declared by the relevant constructor. The number of values that can be extracted is known based on the constructor.

Secondly, in pattern matching, the compiler performs exhaustiveness checking to ensure all cases are handled. If one of the cases of thedepth function above were missing, the compiler would issue a warning. Exhaustiveness checking may seem easy for simple patterns, but with many complex recursive patterns, the task soon becomes difficult for the average human (or compiler, if it must check arbitrary nested if-else constructs). Similarly, there may be patterns which never match (i.e., are already covered by prior patterns). The compiler can also check and issue warnings for these, as they may indicate an error in reasoning.

Algebraic data type pattern matching should not be confused withregular expression string pattern matching. The purpose of both is similar (to extract parts from a piece of data matching certain constraints) however, the implementation is very different. Pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.

Theory

[edit]

Further information:Recursive data type

A general algebraic data type is a possibly recursivesum type ofproduct types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to theempty product. If a datatype is recursive, the entire sum of products is wrapped in arecursive type, and each constructor also rolls the datatype into the recursive type.

For example, the Haskell datatype:

dataLista=Nil|Consa(Lista)

is represented intype theory as $\lambda \alpha .\mu \beta .1+\alpha \times \beta$ with constructors $\mathrm {nil} _{\alpha }=\mathrm {roll} \ (\mathrm {inl} \ \langle \rangle )$ and $\mathrm {cons} _{\alpha }\ x\ l=\mathrm {roll} \ (\mathrm {inr} \ \langle x,l\rangle )$ .

The Haskell List datatype can also be represented in type theory in a slightly different form, thus: $\mu \phi .\lambda \alpha .1+\alpha \times \phi \ \alpha$ .(Note how the $\mu$ and $\lambda$ constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type. The revised version specifies a recursive function on types. (The type variable $\phi$ is used to suggest a function rather than abase type like $\beta$ , since $\phi$ is like a Greekf.) The function must also now be applied $\phi$ to its argument type $\alpha$ in the body of the type.

For the purposes of the List example, these two formulations are not significantly different; but the second form allows expressing so-callednested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works ofRichard Bird,Lambert Meertens, and Ross Paterson.)

Inset theory the equivalent of a sum type is adisjoint union, a set whose elements are pairs consisting of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).^[2]

Programming languages with algebraic data types

[edit]

Further information:Comparison of programming languages (algebraic data type)

Many programming languages incorporate algebraic data types as a first class notion, including:

ATS
Ceylon
Clean
C++^[3]
Elm
Dart^[4]
Flow
F#
F*
Free Pascal
Haskell^[5]
Haxe^[6]
Hope
Idris
Java (16 for product types,^[7] 17 for sum types^[8])
Kotlin^[9]
Limbo
Language Of Temporal Ordering Specification (LOTOS)
Mercury
Miranda
Nemerle
Nim
OCaml
Opa
OpenCog
Perl
PureScript
Python
Racket
Reason^[10]
ReScript^[11]
Rocq^[12]
Rust^[13]
Scala
Standard ML
Swift
Tom
TypeScript
Visual Prolog

References

[edit]

^Hudak, Paul;Hughes, John;Peyton Jones, Simon;Wadler, Philip (9 June 2007). "A history of Haskell: being lazy with class".Proceedings of the third ACM SIGPLAN conference on History of programming languages.ISBN 978-1-59593-766-7.Presentations included Rod Burstall, Dave MacQueen, and Don Sannella on Hope, the language that introduced algebraic data types
^This article is based on material taken fromAlgebraic+data+type at theFree On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of theGFDL, version 1.3 or later.
^"CppCon 2016: Ben Deane "Using Types Effectively"".Archived from the original on 2021-12-12 – via www.youtube.com.
^"Sealed class modifier".Dart.
^"Algebraic Data Types in Haskell".Serokell.
^"Enum Instance".Haxe - The Cross-platform Toolkit.
^"JEP 395: Records".OpenJDK.
^"JEP 409: Sealed Classes".OpenJDK.
^"Sealed Classes".Kotlin Language.
^"Variants".Reason Programming Language. Retrieved2021-11-30.
^"Variant | ReScript Language Manual".ReScript Documentation. Retrieved2025-12-21.
^Calculus of Inductive Constructions, and basic standard libraries:DatatypesArchived 2020-06-10 at theWayback Machine andLogicArchived 2020-06-10 at theWayback Machine.
^"Enums and Pattern Matching - The Rust Programming Language".doc.rust-lang.org. Retrieved31 August 2021.

v t e Data types
Uninterpreted	Bit Byte Trit Tryte Word Bit array
Numeric	Arbitrary-precision or bignum Complex Decimal Fixed point Block floating point Floating point Reduced precision Minifloat Half precision bfloat16 Single precision Double precision Quadruple precision Octuple precision Extended precision Long double Integer signedness Interval Rational
Reference	Address physical virtual Pointer
Text	Character String null-terminated
Composite	Algebraic data type generalized Array Associative array Class Dependent Equality Inductive Intersection List Object metaobject Option type Product Record or Struct Refinement Set Union tagged
Other	Any type Boolean Bottom type Collection Enumerated type Exception Function type Opaque data type Recursive data type Semaphore Stream Strongly typed identifier Type class Empty type Unit type Void
Related topics	Value Abstract data type Boxing Data structure Generic Kind metaclass Parametric polymorphism Primitive data type Interface Subtyping Type constructor Type conversion Type system Type theory Variable