- Notifications
You must be signed in to change notification settings - Fork26
Algebraic data types for C99
License
hirrolot/datatype99
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation

Safe, intuitivealgebraic data types with exhaustive pattern matching & compile-time introspection facilities. No external tools required, pure C99.
Type-safe. Such things as improperly typed variants, non-exhaustive pattern matching, and invalid field access are caught at compile-time.
Portable. Everything you need is a standard-conforming C99 compiler; neither the standard library, nor compiler/platform-specific functionality or VLA are required.
Predictable. Datatype99 comes with formalcode generation semantics, meaning that the generated data layout is guaranteed to always be the same.
Comprehensible errors. Datatype99 isresilient to bad code.
Battle-tested. Datatype99 is used atOpenIPC to develop real-time streaming software for IP cameras; this includes anRTSP 1.0 implementation along with ~50k lines of private code.
Datatype99 consists of one header filedatatype99.h and one dependencyMetalang99. To use it in your project, you need to:
- Add
datatype99andmetalang99/includeto your include directories. - Specify
-ftrack-macro-expansion=0(GCC) or-fmacro-backtrace-limit=1(Clang) to avoid useless macro expansion errors.
If you use CMake, the recommended way isFetchContent:
include(FetchContent)FetchContent_Declare( datatype99 URL https://github.com/hirrolot/datatype99/archive/refs/tags/vx.y.z.tar.gz# vx.y.z)FetchContent_MakeAvailable(datatype99)target_link_libraries(MyProject datatype99)# Disable full macro expansion backtraces for Metalang99.if(CMAKE_C_COMPILER_IDSTREQUAL"Clang") target_compile_options(MyProjectPRIVATE -fmacro-backtrace-limit=1)elseif(CMAKE_C_COMPILER_IDSTREQUAL"GNU") target_compile_options(MyProjectPRIVATE -ftrack-macro-expansion=0)endif()
(By default,datatype99/CMakeLists.txt downloads Metalang99v1.13.5 from the GitHub releases; if you want to override this behaviour, you can do so by invokingFetchContent_Declare earlier.)
Optionally, you canprecompile headers in your project that rely on Datatype99. This will decrease compilation time, because the headers will not be compiled each time they are included.
Happy hacking!
Put simply, Datatype99 is just a syntax sugar overtagged unions; the only difference is that it is more safe and concise. For example, to represent a binary tree, you would normally write something like this:
typedefstruct {structBinaryTree*lhs;intx;structBinaryTree*rhs;}BinaryTreeNode;typedefstruct {enum {Leaf,Node }tag;union {intleaf;BinaryTreeNodenode; }data;}BinaryTree;
To avoid this boilerplate, you can use Datatype99:
datatype(BinaryTree, (Leaf,int), (Node,BinaryTree*,int,BinaryTree*));
Say you want to sum all nodes and leafs in your binary tree. Then you may write something like this:
intsum(constBinaryTree*tree) {switch (tree->tag) {caseLeaf:returntree->data.leaf;caseNode:returnsum(tree->data.node.lhs)+tree->data.node.x+sum(tree->data.node.rhs); }// Invalid input (no such variant).return-1;}
... but what if you accidentally accesstree->data.node aftercase Leaf:? Your compiler would not warn you, thus resulting in a business logic bug.
With Datatype99, you can rewritesum as follows, using a technique calledpattern matching:
intsum(constBinaryTree*tree) {match(*tree) {of(Leaf,x)return*x;of(Node,lhs,x,rhs)returnsum(*lhs)+*x+sum(*rhs); }// Invalid input (no such variant).return-1;}
of gives you variables calledbindings:x,lhs, orrhs. This design has a few neat aspects:
- Compile-time safety. The bindings of
Nodeare invisible afterof(Leaf, x)and vice versa, so compilation will fail to proceed if you access them inappropriately. - Flexibility. Bindings have pointer types so that you can mutate them, thereby mutating the whole
tree; in order to obtain a value, you can dereference them, as shown in the example:return *x;.
The last thing unmentioned is how you construct variants. Internally, Datatype99 generatesinline static functions calledvalue constructors; you can use them as follows:
BinaryTreeleaf5=Leaf(5);BinaryTreeleaf7=Leaf(7);BinaryTreenode=Node(&leaf5,123,&leaf7);
Finally, just a few brief notes about pattern matching:
- To match the default case, write
otherwise { ... }at the end ofmatch. - To ignore a binding, write
_:of(Foo, a, b, _, d). - Please,do not use top-level
break/continueinside statements provided toofandifLet; usegotolabels instead.
Congratulations, this is all you need to know to write most of the stuff! If you feel fancy, you can also introspect your types at compile-time; seeexamples/derive/ for the examples.
Having a well-defined semantics of the macros, you can write an FFI which is quite common in C.
<datatype>::="datatype(" [ <derive-clause>"," ] <datatype-name> {"," <variant> }+")" ;<record>::="record(" [ <derive-clause>"," ] <record-name> {"," <field> }*")" ;<datatype-name>::= <ident> ;<record-name>::= <ident> ;<variant>::="(" <variant-name> {"," <type> }*")" ;<field>::="(" <type>"," <field-name>")" ;<variant-name>::= <ident> ;<field-name>::= <ident> ;<derive-clause>::="derive(" <deriver-name> {"," <deriver-name> }*")" ;<deriver-name>::= <ident> ;<match>::="match(" <lvalue>") {" { <of> }* [ <otherwise> ]"}" ;<matches>::="MATCHES(" <expr>"," <ident>")" ;<if-let>::="ifLet(" <lvalue>"," <variant-name>"," <ident> {"," <ident> }*")" <stmt> ;<of>::="of(" <variant-name> {"," <ident> }*")" <stmt> ;<otherwise>::="otherwise" <stmt> ;
Note: shortened vs. postfixed versions
Each listed identifier in the above grammar corresponds to a macro name defined by default -- these are calledshortened versions. On the other hand, there are alsopostfixed versions (match99,of99,derive99, etc.), which are defined unconditionally. If you want to avoid name clashes caused by shortened versions, defineDATATYPE99_NO_ALIASES before includingdatatype99.h. Library headers are strongly advised to use the postfixed macros, but without resorting toDATATYPE99_NO_ALIASES.
(It might be helpful to look at thegenerated data layout ofexamples/binary_tree.c.)
- Before everything, the following type definition is generated:
typedef struct <datatype-name> <datatype-name>;- For each non-empty variant, the following type definition is generated (the metavariable
<type>ranges over a corresponding variant's types):
typedef struct <datatype-name><variant-name> { <type>0 _0; ... <type>N _N;} <datatype-name><variant-name>;- For each non-empty variant, the following type definitions to types of each field of
<datatype-name><variant-name>are generated:
typedef <type>0 <variant-name>_0;...typedef <type>N <variant-name>_N;- For each variant, the following type definition to a corresponding sum type is generated:
typedef struct <datatype-name> <variant-name>SumT;- For each sum type, the following tagged union is generated (inside the union, only fields to structures of non-empty variants are generated):
typedef enum <datatype-name>Tag { <variant-name>0Tag, ..., <variant-name>NTag} <datatype-name>Tag;typedef union <datatype-name>Variants { char dummy; <datatype-name><variant-name>0 <variant-name>0; ... <datatype-name><variant-name>N <variant-name>N;} <datatype-name>Variants;struct <datatype-name> { <datatype-name>Tag tag; <datatype-name>Variants data;};Note on char dummy;
char dummy; is needed to make the union contain at least one item, according to the standard, even if all variants are empty. Such adatatype would enforce strict type checking unlike plain Cenums.
- For each variant, the following function called avalue constructor is generated:
inline static <datatype-name> <variant-name>(/* ... */) { /* ... */ }If the variant has no parameters, this function will takevoid and initialise.data.dummy to'\0'; otherwise, it will take the corresponding variant parameters and initialise the result value as expected.
- Now, when a sum type is fully generated, the derivation process takes place. Each deriver taken from
derive(...)is invoked sequentially, from left to right, as
ML99_call(DATATYPE99_DERIVE_##<deriver-name>I, v(<datatype-name>), variants...)where
<deriver-name>Icorresponds to aMetalang99-compliant macro of the form#define DATATYPE99_DERIVE_##<deriver-name>I_IMPL(name, variants) /* ... */.variants...is alist of variants represented as two-placetuples:(<variant-name>, types...), wheretypes...is alist of types of the corresponding variant.
Put simply, a deriver is meant to generate something global for a sum type, like interface implementations or almost any other stuff. In terms of Rust, you can think of it as of thederive attribute.
record represents arecord type: it is simply astruct for which the derivation process is defined.
- The following structure is generated:
typedef struct <record-name> { // Only if <record-name> has no fields: char dummy; <type>0 <field-name>0; ... <type>N <field-name>N;} <record-name>;Note on char dummy;
char dummy; is needed to make the structure contain at least one item, according to the standard. Suchrecord(Foo) can be used to implement interfaces for it (seeInterface99).
- Each deriver taken from
derive(...)is invoked sequentially, from left to right, as
ML99_call(DATATYPE99_RECORD_DERIVE_##<deriver-name>I, v(<record-name>), fields...)where
<deriver-name>Icorresponds to aMetalang99-compliant macro of the form#define DATATYPE99_RECORD_DERIVE_##<deriver-name>I_IMPL(name, fields) /* ... */.fields...is alist of fields represented as two-placetuples:(<type>, <field-name>). If a record contains no fields, the list would consist only of(char, dummy).
match has the expected semantics: it sequentially tries to match the given instance of a sum type against the given variants, and, if a match has succeeded, it executes the corresponding statement and moves down to the next instruction (match(val) { ... } next-instruction;). If all the matches have failed, it executes the statement afterotherwise and moves down to the next instruction.
A completematch construct results in a single C statement.
of accepts a matched variant name as a first argument and the rest of arguments comprise a comma-separated list of bindings.
- A binding equal to
_is ignored. - A bindingnot equal to
_stands for a pointer to a corresponding data of the variant (e.g., let there be(Foo, T1, T2)andof(Foo, x, y), thenxhas the typeT1 *andyisT2 *).
There can be more than one_ binding, however, non-_ bindings must be distinct.
To match an empty variant, writeof(Bar).
MATCHES just tests an instance of a sum type for a given variant. If the given instance corresponds to the given variant, it expands to truthfulness, otherwise it expands to falsehood.
DEPRECATED: useMATCHES instead.
ifLet tries to match the given instance of a sum type against the given variant, and, if a match has succeeded, it executes the corresponding statement.
Think ofifLet(<expr>, <variant-name>, vars...) { /* ... */ } as of an abbreviation of
match(<expr>) { of(<variant-name>, vars...) { /* ... */ } otherwise {}}A completeifLet construct results in a single C statement.
The unit typeUnitT99 represents the type of a single value,unit_v99 (it should not be assigned to anything else). These are defined as follows:
typedefcharUnitT99;staticconstUnitT99unit_v99='\0';
IfDATATYPE99_NO_ALIASES remains undefined prior to#include <datatype99.h>,UnitT99 andunit_v99 are also accessible through object-like macrosUnitT &unit_v.
You can pass named arguments to a deriver; these are calledderive helper attributes. They must be specified as object-like macros of the form:
#define <variant-name>_<namespace>_<attribute-name> attr(/* attribute value */)where<namespace> is either<datatype-name>/<record-name> or<variant-name>/<field-name> fordatatype/record-specific and variant/field-specific attributes, respectively.
To manipulate derive helper attributes, there are a few predefined macros:
DATATYPE99_attrIsPresent/DATATYPE99_ATTR_IS_PRESENTAccepts an attribute name and checks if it is present or not. It can be used to check the presence of an optional attribute.
DATATYPE99_attrValue/DATATYPE99_ATTR_VALUEAccepts an attribute name extracts its value. A provided attributemust be present.
DATATYPE99_assertAttrIsPresentAccepts an attribute name and emits a fatal error if the attribute is not present, otherwise results in emptiness. It can be used for mandatory attributes.
(The naming convention here is the sameas of Metalang99.)
The macros
DATATYPE99_MAJOR,DATATYPE99_MINOR,DATATYPE99_PATCH,DATATYPE99_VERSION_COMPATIBLE(x, y, z), andDATATYPE99_VERSION_EQ(x, y, z)have thesame semantics as of Metalang99.For each macro using
ML99_EVAL, Datatype99 provides itsMetalang99-compliant counterpart which can be used inside derivers and other Metalang99-compliant macros:
| Macro | Metalang99-compliant counterpart |
|---|---|
datatype | DATATYPE99_datatype |
record | DATATYPE99_record |
of | DATATYPE99_of |
ifLet | DATATYPE99_ifLet |
(Anarity specifier anddesugaring macro are provided for each of the above macros.)
- There is a built-in deriver
dummywhich generates nothing. It is defined both for record and sum types.
If you useClang-Format, cancel formatting for adatatype definition using// clang-format off &// clang-format on to make it look prettier, as in the examples.
Always#undef derive helper attributes after a correspondingdatatype definition not to pollute your namespace.
If the meaning of variant parameters is not clear from the context, give them descriptive names. This can be achieved in several ways:
// 1. Define type aliases to variant parameters.typedefdoubleXCoordinate;typedefdoubleYCoordinate;typedefdoubleWidth;typedefdoubleHeight;datatype(Shape, (Point,XCoordinate,YCoordinate), (Rectangle,Width,Height));// 2. Define separate structures.typedefstruct {doublex,y;}Point;typedefstruct {doublewidth,height;}Rectangle;datatype(Shape, (MkPoint,Point), (MkRectangle,Rectangle));
Comparison:
- The former option has more concise syntax:
MkPoint(x, y)instead ofMkPoint((Point){x, y}). - The latter option is more appropriate when the structures are to be used separately from the containing sum type.
- The latter option allows for more graduate control over the data layout: you can accompain the structures with compiler-specific attributes, alignment properties like
__attribute__ ((__packed__)), etc.
Donot usebreak/continue inside a statement provided toof/ifLet but outside of anyfor/while loops in that statement. For example, this code is fine:
match(x) {of(Foo,a,b,c) {for (inti=0;i<10;i++) {continue; } }}
But this code isnot fine:
for (inti=0;i<10;i++) {match(x) {of(Foo,a,b,c) {if (a==7) {break; }continue; } }}
To make it valid, you can rewrite it as follows:
for (inti=0;i<10;i++) {match(x) {of(Foo,a,b,c) {if (a==7) { gotomy_break; } gotomy_continue; } }// Datatype99 prohibits top-level `break`/`continue`.my_continue:;}my_break:;
To specify an array as a variant parameter, you must put it into a separatestruct; seeexamples/array_in_variant.c.
Bindings introduced byof arealways mutable, so make sure you donot mutate them if the value passed tomatch is qualified asconst.
Thanks to Rust and ML for their implementations of sum types.
- Pretty-Printable Enumerations in Pure C
- What’s the Point of the C Preprocessor, Actually?
- Macros on Steroids, Or: How Can Pure C Benefit From Metaprogramming
- Extend Your Language, Don’t Alter It
- Compiling Algebraic Data Types in Pure C99
- Comparing Rust and Datatype99
- Compile-Time Introspection of Sum Types in Pure C99
- Unleashing Sum Types in Pure C99
- Update
DATATYPE99_MAJOR,DATATYPE99_MINOR, andDATATYPE99_PATCHindatatype99.h. - Update
CHANGELOG.md. - Release the project inGitHub Releases.
A: There is a lot of software written in plain C that can benefit from Datatype99; C is #1 programming language as of 2020,according to TIOBE. People use C due to technical and social reasons:
Datatype99 can be seamlessly integrated into existing codebases written in pure C -- just
#include <datatype99.h>and you are ready to go. On the other hand, other languages force you to separate native C files from their sources, which is clearly less convenient.In some environments, developers strick to pure C for historical reasons (e.g., embedded devices, Linux and other operating systems).
C has a stable ABI which is vital for some projects (e.g., plugin systems such asMetaCall).
C is a mature language with a complete specification and a plenitude of libraries. Rust has no complete specification, andZig is not yet production-ready. I know a few stories when these two languages were rejected for new projects, and I can understand this decision.
Historically, C has been targeting nearly all platforms. This is not the case with Rust, which depends on LLVM as for now.
Your company obligates you to use C.
Etc.
See also:
- "Rust is not a good C replacement" by Drew DeVault.
Overall, if you can afford a more modern/high-level language, I encourage you to do so instead of using old C. However, many people do not have this possibility (or it would be too costly).
A: SeeMetalang99's README >>.
A: In short,datatype expands to a tagged union with value constructors;match expands to a switch statement. To generate all this stuff,Metalang99 is used, a preprocessor metaprogramming library.
More on it in"Compiling Algebraic Data Types in Pure C99".
A: Yes, C++11 and onwards is supported.
A:Metalang99 is a functional language for metaprogramming, whereas Datatype99 is an implementation of algebraic data types written in this language.
A: Some kinds of syntactic errors are detected by the library itself:
[playground.c]
datatype(A, (Foo,int),Bar(int));
[/bin/sh]
$ gcc playground.c -Imetalang99/include -Idatatype99 -ftrack-macro-expansion=0playground.c:3:1: error: static assertion failed: "ML99_assertIsTuple: Bar(int) must be (x1, ..., xN)" 3 | datatype(A, (Foo, int), Bar(int)); | ^~~~~~~~[playground.c]
datatype(A, (Foo,int) (Bar,int));
[/bin/sh]
$ gcc playground.c -Imetalang99/include -Idatatype99 -ftrack-macro-expansion=0playground.c:3:1: error: static assertion failed: "ML99_assertIsTuple: (Foo, int) (Bar, int) must be (x1, ..., xN), did you miss a comma?" 3 | datatype(A, (Foo, int) (Bar, int)); | ^~~~~~~~[playground.c]
datatype(A, (Foo,int), (Bar,int),/* trailing comma is prohibited */);
[/bin/sh]
$ gcc playground.c -Imetalang99/include -Idatatype99 -ftrack-macro-expansion=0playground.c:3:1: error: static assertion failed: "ML99_assertIsTuple: must be (x1, ..., xN)" 3 | datatype(A, (Foo, int), (Bar, int), /* trailing comma is prohibited */); | ^~~~~~~~(For better diagnostics, use the latest Metalang99.)
The others are understandable as well:
[playground.c]
datatype(Foo, (FooA,NonExistingType));
[/bin/sh]
playground.c:3:1: error: unknown type name ‘NonExistingType’ 3 | datatype( | ^~~~~~~~playground.c:3:1: error: unknown type name ‘NonExistingType’playground.c:3:1: error: unknown type name ‘NonExistingType’[playground.c]
match(*tree) {of(Leaf,x)return*x;// of(Node, lhs, x, rhs) return sum(*lhs) + *x + sum(*rhs);}
[/bin/sh]
playground.c: In function ‘sum’:playground.c:6:5: warning: enumeration value ‘NodeTag’ not handled in switch [-Wswitch] 6 | match(*tree) { | ^~~~~[playground.c]
match(*tree) {of(Leaf,x,excess)return*x;of(Node,lhs,x,rhs)returnsum(*lhs)+*x+sum(*rhs);}
[/bin/sh]
playground.c: In function ‘sum’:playground.c:15:9: error: unknown type name ‘Leaf_1’; did you mean ‘Leaf_0’? 15 | of(Leaf, x, excess) return *x; | ^~ | Leaf_0playground.c:15:9: error: ‘BinaryTreeLeaf’ has no member named ‘_1’; did you mean ‘_0’? 15 | of(Leaf, x, excess) return *x; | ^~ | _0[playground.c]
BinaryTreetree=Leaf("hello world");
[/bin/sh]
playground.c: In function ‘main’:playground.c:18:28: warning: passing argument 1 of ‘Leaf’ makes integer from pointer without a cast [-Wint-conversion] 18 | BinaryTree tree = Leaf("hello world"); | ^~~~~~~~~~~~~ | | | char *playground.c:6:1: note: expected ‘int’ but argument is of type ‘char *’ 6 | datatype( | ^~~~~~~~[playground.c]
intsum(constBinaryTree*tree) {match(*tree) {of(Leaf,x)returnx;// x is int *of(Node,lhs,x,rhs)returnsum(*lhs)+*x+sum(*rhs); }}
[/bin/sh]
playground.c: In function ‘sum’:playground.c:17:28: warning: returning ‘Leaf_0 *’ {aka ‘int *’} from a function with return type ‘int’ makes integer from pointer without a cast [-Wint-conversion] 17 | of(Leaf, x) return x; // x is int * | ^From my experience, nearly 95% of errors make sense.
If an error is not comprehensible at all, try to look at generated code (-E). Hopefully, thecode generation semantics is formally defined so normally you will not see something unexpected.
A: VS Code automatically enables suggestions of generated types but, of course, it does not support macro syntax highlighting.
A: Datatype99 is known to work on these compilers:
- GCC
- Clang
- MSVC
- TCC
This warning happens when you try to return control from within amatch statement, and your compiler thinks that not all hypothetical variants are handled. For example:
datatype(MyType, (Foo), (Bar));inthandle(MyTypeval) {match(val) {of(Foo)return5;of(Bar)return7; }}
The above code may seem perfect at first glance, but in fact, it is not. The reason is this:match(val) boils down toswitch(val.tag) under the hood, withval.tag being an ordinary C enumeration consisting of the variantsFoo andBar. But what if a caller provides us with neitherFoo norBar, but with something like42 (not a valid variant)? Sinceenum is merely another way to give integers names, a compiler would not complain on thecaller site. However, on thecallee site, we would have the warning:
test.c: In function ‘handle’:test.c:10:1: warning: control reaches end of non-void function [-Wreturn-type] 10 | } | ^The solution is to either panic or return some error-signaling code, like this:
inthandle(MyTypeval) {match(val) {of(Foo)return5;of(Bar)return7; }// Invalid input (no such variant).return-1;}
Seeissue #9.
About
Algebraic data types for C99
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
