
PHP RFC: Union Types 2.0

Date: 2019-09-02
Author: Nikita Popov (nikic@php.net)
Status: Implemented
Proposed Version: PHP 8.0
Pull request discussion: php/php-rfcs#0001
Mailing list thread: https://externals.io/message/106844
Implementation: php/php-src#4838

This proposal was originally introduced and discussed on GitHub, as part of an RFC workflow experiment. This wiki page contains the final version of the proposal.

Introduction

A “union type” accepts values of multiple different types, rather than a single one. PHP already supports two special union types:

Type or null, using the special ?Type syntax.
array or Traversable, using the special iterable type.

However, arbitrary union types are currently not supported by the language. Instead, phpdoc annotations have to be used, such as in the following example:

class Number {
    /**
     * @var int|float $number
     */
    private $number;

    /**
     * @param int|float $number
     */
    public function setNumber($number) {
        $this->number = $number;
    }

    /**
     * @return int|float
     */
    public function getNumber() {
        return $this->number;
    }
}

The statistics section shows that the use of union types is indeed pervasive in the open-source ecosystem, as well as in PHP's own standard library.

Supporting union types in the language allows us to move more type information from phpdoc into function signatures, with the usual advantages this brings:

Types are actually enforced, so mistakes can be caught early.
Because they are enforced, type information is less likely to become outdated or miss edge-cases.
Types are checked during inheritance, enforcing the Liskov Substitution Principle.
Types are available through Reflection.
The syntax is a lot less boilerplate-y than phpdoc.

After generics, union types are currently the largest “hole” in our type declaration system.

Proposal

Union types are specified using the syntax T1|T2|... and can be used in all positions where types are currently accepted:

class Number {
    private int|float $number;

    public function setNumber(int|float $number): void {
        $this->number = $number;
    }

    public function getNumber(): int|float {
        return $this->number;
    }
}
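A short usage sketch of the class above, assuming PHP 8.0 or later and strict_types enabled. A default value of 0 is added to the property here (an assumption, not part of the RFC example) so that it is always initialized. Both int and float are accepted, while a string is rejected with a TypeError instead of being silently stored, as the phpdoc-only version would allow.

```php
<?php
declare(strict_types=1);

class Number
{
    // Default added for this sketch so the typed property is initialized.
    private int|float $number = 0;

    public function setNumber(int|float $number): void
    {
        $this->number = $number;
    }

    public function getNumber(): int|float
    {
        return $this->number;
    }
}

$n = new Number();

$n->setNumber(5);      // int is a member of the union: accepted
$first = $n->getNumber();

$n->setNumber(2.5);    // float is a member of the union: accepted
$second = $n->getNumber();

try {
    $n->setNumber('5'); // string is neither int nor float: rejected in strict mode
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```

The failed call leaves the previously stored value (2.5) untouched.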

Supported Types

Union types support all types currently supported by PHP, with some caveats outlined in the following.

void type

The void type can never be part of a union. As such, types like T|void are illegal in all positions, including return types.

The void type indicates that the function has no return value, and enforces that an argument-less return; is used to return from the function. It is fundamentally incompatible with non-void return types.

What is likely intended instead is ?T, which allows returning either T or null.

Nullable union types

The null type is supported as part of unions, such that T1|T2|null can be used to create a nullable union. The existing ?T notation is considered a shorthand for the common case of T|null.

An earlier version of this RFC proposed to use ?(T1|T2) for nullable union types instead, to avoid having two ways of expressing nullability in PHP. However, this notation is both rather awkward syntactically and differs from the well-established T1|T2|null syntax used by phpdoc comments. The discussion feedback was overwhelmingly in favor of supporting the T1|T2|null notation.

?T remains valid syntax that denotes the same type as T|null. It is neither discouraged nor deprecated, and there are no plans to deprecate it in the future. It is merely a shorthand alias for a particularly common union type.

The null type is only allowed as part of a union, and cannot be used as a standalone type. Allowing it as a standalone type would make both function foo(): void and function foo(): null legal function signatures, with similar but not identical semantics. This would negatively impact teachability for an unclear benefit.
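A minimal sketch, assuming PHP 8.0 or later. The hypothetical function describe() takes a nullable union written in the T1|T2|null form, and the two hypothetical functions withShorthand() and withUnion() show through Reflection that ?int and int|null are the same type.

```php
<?php
declare(strict_types=1);

// Hypothetical helper using the T1|T2|null form of a nullable union.
function describe(int|string|null $key): string
{
    return $key === null ? 'none' : gettype($key);
}

// ?int is only a shorthand for int|null.
function withShorthand(?int $x): void {}
function withUnion(int|null $x): void {}

$shorthandType = (string) (new ReflectionFunction('withShorthand'))->getParameters()[0]->getType();
$unionType = (string) (new ReflectionFunction('withUnion'))->getParameters()[0]->getType();
```

Both parameter types are reported identically, because a union of null and a single other type is represented as a ReflectionNamedType.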

false pseudo-type

While we nowadays encourage the use of null over false as an error or absence return value, for historical reasons many internal functions continue to use false instead. As shown in the statistics section, the vast majority of union return types for internal functions include false.

A classical example is the strpos() family of functions, which returns int|false.

While it would be possible to model this less accurately as int|bool, this gives the false impression that the function can also return a true value, which makes this type information significantly less useful to both humans and static analyzers.

For this reason, support for the false pseudo-type is included in this proposal. A true pseudo-type is not part of the proposal, because similar historical reasons for its necessity do not exist.

The false pseudo-type cannot be used as a standalone type (including as a nullable standalone type). As such, none of false, false|null and ?false are permitted.
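A minimal sketch of an int|false return type, assuming PHP 8.0 or later. The function names firstIndexOf() and brokenLookup() are hypothetical. strict_types is enabled so that the returned true is not coerced to int, which makes the difference between false and bool visible.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper mirroring strpos(): int on success, false if not found.
function firstIndexOf(string $haystack, string $needle): int|false
{
    return strpos($haystack, $needle);
}

// int|false does not admit true: false is not a shorthand for bool.
function brokenLookup(): int|false
{
    return true;
}

$found = firstIndexOf('union', 'i');   // offset of "i"
$missing = firstIndexOf('union', 'x'); // not found

try {
    brokenLookup();
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}
```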

Duplicate and redundant types

To catch some simple bugs in union type declarations, redundant types that can be detected without performing class loading will result in a compile-time error. This includes:

Each name-resolved type may only occur once. Types like int|string|INT result in an error.
If bool is used, false cannot be used additionally.
If object is used, class types cannot be used additionally.
If iterable is used, array and Traversable cannot be used additionally.

This does not guarantee that the type is “minimal”, because doing so would require loading all used class types.

For example, if A and B are class aliases, then A|B remains a legal union type, even though it could be reduced to either A or B. Similarly, if class B extends A {}, then A|B is also a legal union type, even though it could be reduced to just A.

function foo(): int|INT {}    // Disallowed
function foo(): bool|false {} // Disallowed

use A as B;
function foo(): A|B {}        // Disallowed ("use" is part of name resolution)

class_alias('X', 'Y');
function foo(): X|Y {}        // Allowed (redundancy is only known at runtime)

Type grammar

Excluding the special void type, PHP's type syntax may now be described by the following grammar:

type: simple_type
    | "?" simple_type
    | union_type
    ;

union_type: simple_type "|" simple_type
          | union_type "|" simple_type
          ;

simple_type: "false"          # only legal in unions
           | "null"           # only legal in unions
           | "bool"
           | "int"
           | "float"
           | "string"
           | "array"
           | "object"
           | "iterable"
           | "callable"       # not legal in property types
           | "self"
           | "parent"
           | namespaced_name
           ;

Variance

Union types follow the existing variance rules:

Return types are covariant (child must be subtype).
Parameter types are contravariant (child must be supertype).
Property types are invariant (child must be subtype and supertype).

The only change is in how union types interact with subtyping, with three additional rules:

A union U_1|...|U_n is a subtype of V_1|...|V_m if for each U_i there exists a V_j such that U_i is a subtype of V_j.
The iterable type is considered to be the same (i.e. both subtype and supertype) as array|Traversable.
The false pseudo-type is considered a subtype of bool.
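The three rules above can be sketched as a small checker. This is a hypothetical, deliberately simplified model written in PHP, not the engine's implementation: types are plain name strings, and object, mixed, self/parent and ?T are not handled. It expands iterable to array|Traversable, treats false as a subtype of bool, uses ordinary inheritance (is_a() with class-name strings) for classes, and then applies the "every U_i has some V_j" rule.

```php
<?php
// Hypothetical helper: is the single (non-union) type $a a subtype of $b?
function isAtomicSubtype(string $a, string $b): bool
{
    // Type and class names are case-insensitive in PHP.
    if (strcasecmp($a, $b) === 0) {
        return true;
    }
    // Rule 3: the false pseudo-type is a subtype of bool.
    if (strtolower($a) === 'false' && strtolower($b) === 'bool') {
        return true;
    }
    // Class and interface types: ordinary inheritance.
    $aIsClass = class_exists($a) || interface_exists($a);
    $bIsClass = class_exists($b) || interface_exists($b);
    if ($aIsClass && $bIsClass) {
        return is_a($a, $b, true);
    }
    return false;
}

// Rule 2: iterable is treated as array|Traversable.
function expandIterable(array $types): array
{
    $out = [];
    foreach ($types as $t) {
        if (strtolower($t) === 'iterable') {
            $out[] = 'array';
            $out[] = 'Traversable';
        } else {
            $out[] = $t;
        }
    }
    return $out;
}

// Rule 1: U_1|...|U_n <: V_1|...|V_m iff every U_i is a subtype of some V_j.
function isUnionSubtype(array $u, array $v): bool
{
    $u = expandIterable($u);
    $v = expandIterable($v);
    foreach ($u as $ui) {
        $found = false;
        foreach ($v as $vj) {
            if (isAtomicSubtype($ui, $vj)) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            return false;
        }
    }
    return true;
}

class A {}
class B extends A {}
```

For example, isUnionSubtype(['int'], ['int', 'float']) holds while the reverse does not, which is why a child may narrow an int|float return type to int but not the other way around.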

In the following, some examples of what is allowed and what isn't are given.

Property types

Property types are invariant, which means that types must stay the same during inheritance. However, the “same” type may be expressed in different ways. Prior to union types, one such possibility was to have two aliased classes A and B, in which case a property type may legally change from A to B or vice versa.

Union types expand the possibilities in this area: for example, int|string and string|int represent the same type. The following example shows a more complex case:

class A {}
class B extends A {}

class Test {
    public A|B $prop;
}
class Test2 extends Test {
    public A $prop;
}

In this example, the union A|B actually represents the same type as just A, and this inheritance is legal, despite the type not being syntactically the same.

Formally, we arrive at this result as follows: First, A is a subtype of A|B, because it is a subtype of A. Second, A|B is a subtype of A, because A is a subtype of A and B is a subtype of A.
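A runnable version of the example above, assuming PHP 8.0 or later. If the inheritance were rejected, declaring Test2 would fail at compile or link time; here the script runs, and Reflection reports the child property's type as plain A. The order in which the parent's union is printed is not guaranteed, so both orders are accepted in the check.

```php
<?php
class A {}
class B extends A {}

class Test
{
    public A|B $prop;
}

// Legal: A|B and A denote the same type, so property invariance holds.
class Test2 extends Test
{
    public A $prop;
}

$childType = (string) (new ReflectionProperty(Test2::class, 'prop'))->getType();
$parentType = (string) (new ReflectionProperty(Test::class, 'prop'))->getType();

$t = new Test2();
$t->prop = new B(); // a B instance is still an A
```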

Adding and removing union types

It is legal to remove union types in return position and add union types in parameter position:

class Test {
    public function param1(int $param) {}
    public function param2(int|float $param) {}

    public function return1(): int|float {}
    public function return2(): int {}
}

class Test2 extends Test {
    public function param1(int|float $param) {} // Allowed: Adding extra param type
    public function param2(int $param) {}       // FORBIDDEN: Removing param type

    public function return1(): int {}           // Allowed: Removing return type
    public function return2(): int|float {}     // FORBIDDEN: Adding extra return type
}

Variance of individual union members

Similarly, it is possible to restrict a union member in return position, or widen a union member in parameter position:

class A {}
class B extends A {}

class Test {
    public function param1(B|string $param) {}
    public function param2(A|string $param) {}

    public function return1(): A|string {}
    public function return2(): B|string {}
}

class Test2 extends Test {
    public function param1(A|string $param) {} // Allowed: Widening union member B -> A
    public function param2(B|string $param) {} // FORBIDDEN: Restricting union member A -> B

    public function return1(): B|string {}     // Allowed: Restricting union member A -> B
    public function return2(): A|string {}     // FORBIDDEN: Widening union member B -> A
}

Of course, the same can also be done with multiple union members at a time, and be combined with the addition/removal of types mentioned previously.

Coercive typing mode

When strict_types is not enabled, scalar type declarations are subject to limited implicit type coercions. These are problematic in conjunction with union types, because it is not always obvious which type the input should be converted to. For example, when passing a boolean to an int|string argument, both 0 and "" would be viable coercion candidates.

If the exact type of the value is not part of the union, then the target type is chosen in the following order of preference:

int
float
string
bool

If the type both exists in the union and the value can be coerced to the type under PHP's existing type checking semantics, then the type is chosen. Otherwise, the next type is tried.

As an exception, if the value is a string and both int and float are part of the union, the preferred type is determined by the existing “numeric string” semantics. For example, for "42" we choose int, while for "42.0" we choose float.

Types that are not part of the above preference list are not eligible targets for implicit coercion. In particular, no implicit coercions to the null and false types occur.
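A small sketch of these rules in coercive mode, assuming PHP 8.0 or later; the function names intOrString() and intOrFalse() are hypothetical. It resolves the boolean example from the start of this section (int is tried first, so false becomes 0 rather than ""), shows that an exact type match wins, and shows that false is never an implicit coercion target.

```php
<?php
// No declare(strict_types=1): this sketch relies on coercive typing mode.

function intOrString(int|string $value): int|string
{
    return $value;
}

function intOrFalse(int|false $value): int|false
{
    return $value;
}

// bool is not in int|string and int is preferred: true -> 1, false -> 0 (not "").
$fromTrue = intOrString(true);
$fromFalse = intOrString(false);

// The exact type is used when it is part of the union.
$exact = intOrString('abc');

// "" is not numeric, so int fails; false is not a coercion target,
// so nothing in int|false accepts it and a TypeError is thrown.
try {
    intOrFalse('');
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}
```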

Conversion Table

The following table shows how the above order of preference plays out for different input types, assuming that the exact type is not part of the union:

Original type   1st try     2nd try   3rd try
bool            int         float     string
int             float       string    bool
float           int         string    bool
string          int/float   bool
object          string

Examples

// int|string
42                      --> 42          // exact type
"42"                    --> "42"        // exact type
new ObjectWithToString  --> "Result of __toString()"
                                        // object never compatible with int, fall back to string
42.0                    --> 42          // float compatible with int
42.1                    --> 42          // float compatible with int
1e100                   --> "1.0E+100"  // float too large for int type, fall back to string
INF                     --> "INF"       // float too large for int type, fall back to string
true                    --> 1           // bool compatible with int
[]                      --> TypeError   // array not compatible with int or string

// int|float|bool
"45"                    --> 45          // int numeric string
"45.0"                  --> 45.0        // float numeric string
"45X"                   --> 45 + Notice: Non well formed numeric string
                                        // int numeric string
""                      --> false       // not numeric string, fall back to bool
"X"                     --> true        // not numeric string, fall back to bool
[]                      --> TypeError   // array not compatible with int, float or bool

Alternatives

There are two main alternatives to the preference-based approach used by this proposal:

The first is to specify that union types always use strict typing, thus avoiding any complicated coercion semantics altogether. Apart from the inconsistency this introduces in the language, this has two main disadvantages: First, going from a type like float to int|float would actually reduce the number of valid inputs, which is highly unintuitive. Second, it breaks the variance model for union types, because we can no longer say that float is a subtype of int|float.

The second is to perform the coercions based on the order of types. This would mean that int|string and string|int are distinct types, where the former would favor integers and the latter strings. If exact type matches were not prioritized, the string type would always be used in the latter case. Once again, this is unintuitive and has very unclear implications for the subtyping relationship on which variance is based.

Property types and references

References to typed properties with union types follow the semantics outlined in the typed properties RFC:

If typed properties are part of the reference set, then the value is checked against each property type. If a type check fails, a TypeError is generated and the value of the reference remains unchanged.
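A minimal sketch of the failure case, assuming PHP 8.0 or later; the class Holder is hypothetical. Once $r and a typed property are in the same reference set, an assignment through $r that matches none of the union members throws, and the reference keeps its old value.

```php
<?php
class Holder
{
    public int|string $x = 0;
}

$h = new Holder();
$r = 'start';
$h->x = &$r; // $r and $h->x now form one reference set

try {
    $r = [];  // an array satisfies neither int nor string
    $threw = false;
} catch (TypeError $e) {
    $threw = true;
}
```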

There is one additional caveat: If a type check requires a coercion of the assigned value, it may happen that all type checks succeed, but result in different coerced values. As a reference can only have a single value, this situation also leads to a TypeError.

The interaction with union types was already considered at the time, because it impacts the detailed reference semantics. Repeating the example given there:

class Test {
    public int|string $x;
    public float|string $y;
}
$test = new Test;
$r = "foobar";
$test->x =& $r;
$test->y =& $r;

// Reference set: { $r, $test->x, $test->y }
// Types: { mixed, int|string, float|string }

$r = 42; // TypeError

The basic issue is that the final assigned value (after type coercions have been performed) must be compatible with all types that are part of the reference set. However, in this case the coerced value will be int(42) for property Test::$x, while it will be float(42.0) for property Test::$y. Because these values are not the same, this is considered illegal and a TypeError is thrown.

An alternative approach would be to cast the value to the only common type string instead, with the major disadvantage that this matches neither of the values you would get from a direct property assignment.

Reflection

To support union types, a new class ReflectionUnionType is added:

class ReflectionUnionType extends ReflectionType {
    /** @return ReflectionType[] */
    public function getTypes();

    /* Inherited from ReflectionType */
    /** @return bool */
    public function allowsNull();

    /* Inherited from ReflectionType */
    /** @return string */
    public function __toString();
}

The getTypes() method returns an array of ReflectionType objects that are part of the union. The types may be returned in an arbitrary order that does not match the original type declaration. The types may also be subject to equivalence transformations.

For example, the type int|string may return types in the order ["string", "int"] instead. The type iterable|array|string might be canonicalized to iterable|string or Traversable|array|string. The only requirement on the Reflection API is that the ultimately represented type is equivalent.

The allowsNull() method returns whether the union contains the type null.

The __toString() method returns a string representation of the type that constitutes a valid code representation of the type in a non-namespaced context. It is not necessarily the same as what was used in the original code.

For backwards-compatibility reasons, union types that only include null and one other type (written as ?T, T|null, or through implicit parameter nullability) will instead use ReflectionNamedType.
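Code that inspects declared types now has to handle both classes. The following is a hypothetical helper, typeNames(), sketched for PHP 8.0 or later: it lists the member type names whether Reflection returns a ReflectionUnionType or a ReflectionNamedType, and adds null explicitly for the ?T case. Because getTypes() gives no ordering guarantee, each list is sorted before use.

```php
<?php
// Hypothetical helper: member type names of a declared type.
function typeNames(?ReflectionType $type): array
{
    if ($type === null) {
        return []; // no type declared
    }
    if ($type instanceof ReflectionUnionType) {
        return array_map(
            static fn (ReflectionType $t): string =>
                $t instanceof ReflectionNamedType ? $t->getName() : (string) $t,
            $type->getTypes()
        );
    }
    if ($type instanceof ReflectionNamedType) {
        $names = [$type->getName()];
        // ?T, T|null and implicit nullability all arrive here as a named type.
        if ($type->allowsNull() && !in_array($type->getName(), ['null', 'mixed'], true)) {
            $names[] = 'null';
        }
        return $names;
    }
    return [(string) $type];
}

function sample(int|string $a, ?float $b, float|int|null $c, $d): void {}

$names = [];
foreach ((new ReflectionFunction('sample'))->getParameters() as $p) {
    $list = typeNames($p->getType());
    sort($list); // getTypes() order is not guaranteed
    $names[$p->getName()] = $list;
}
```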

Examples

// This is one possible output, getTypes() and __toString() could
// also provide the types in the reverse order instead.
function test(): float|int {}
$rt = (new ReflectionFunction('test'))->getReturnType();
var_dump(get_class($rt));    // "ReflectionUnionType"
var_dump($rt->allowsNull()); // false
var_dump($rt->getTypes());   // [ReflectionType("int"), ReflectionType("float")]
var_dump((string)$rt);       // "int|float"

function test2(): float|int|null {}
$rt = (new ReflectionFunction('test2'))->getReturnType();
var_dump(get_class($rt));    // "ReflectionUnionType"
var_dump($rt->allowsNull()); // true
var_dump($rt->getTypes());   // [ReflectionType("int"), ReflectionType("float"),
                             //  ReflectionType("null")]
var_dump((string)$rt);       // "int|float|null"

function test3(): int|null {}
$rt = (new ReflectionFunction('test3'))->getReturnType();
var_dump(get_class($rt));    // "ReflectionNamedType"
var_dump($rt->allowsNull()); // true
var_dump($rt->getName());    // "int"
var_dump((string)$rt);       // "?int"

Backwards Incompatible Changes

This RFC does not contain any backwards incompatible changes. However, existing ReflectionType-based code will have to be adjusted in order to support processing of code that uses union types.

Vote

Voting started 2019-10-25 and ends 2019-11-08.

Add union types as proposed?
Real name	Yes	No
ajf
alcaeus
asgrim
ashnazg
beberlei
bishop
bwoebi
carusogabriel
colinodell
cpriest
dams
danack
derick
duncan3dc
eliw
galvao
gasolwu
girgias
guilhermeblanco
hirokawa
hywan
jasny
jbnahan
jhdxr
jmikola
jwage
kalle
kguest
kocsismate
krakjoe
laruence
lcobucci
levim
lex
malukenho
marandall
mariano
mbeccati
mcmic
mike
narf
nikic
ocramius
pajoye
patrickallaert
peehaa
pollita
ralphschindler
ramsey
rdohms
reywob
rjhdby
salathe
sammyk
santiagolizardo
sebastian
sergey
ssb
stas
subjective
svpernova09
thekid
thorstenr
trowski
wyrihaximus
zimt
Final result: 61 yes, 5 no

Future Scope

The features discussed in the following are not part of this proposal.

Literal Types

type ArrayFilterFlags = 0|ARRAY_FILTER_USE_KEY|ARRAY_FILTER_USE_BOTH;
array_filter(array $array, callable $callback, ArrayFilterFlags $flag): array;

A benefit of using a union of literal types instead of an enum is that it works directly with values of the underlying type, rather than an opaque enum value. As such, it is easier to retrofit without breaking backwards-compatibility.

This RFC intentionally supports the false type in a maximally restricted form, which is enough to model internal function return values, but avoids unnecessarily constraining a future proposal for introducing first-class literal types. In particular:

No values implicitly coerce to false, although it would also be possible to follow bool parameter coercion semantics, restricted to input values that coerce to false. Both approaches have advantages, but here we pick the conservative option, which permits future extension.
Only false is supported, but not true. Once both are supported, the subtyping relationship between false|true and bool needs to be defined (which is also tightly related to the question of implicit coercions).

Type Aliases

As types become increasingly complex, it may be worthwhile to allow reusing type declarations. There are two general ways in which this could work. One is a local alias, such as:

use int|float as number;

function foo(number $x) {}

In this case number is a symbol that is only visible locally and will be resolved to the original int|float type during compilation.

The second possibility is an exported typedef:

namespace Foo;
type number = int|float;

// Usable as \Foo\number from elsewhere

Statistics

To illustrate the use of union types in the wild, the use of union types in @param and @return annotations in phpdoc comments has been analyzed.

In the top two thousand Composer packages there are:

25k parameter union types: Full JSON data
14k return union types: Full JSON data

In the PHP stubs for internal functions (these are incomplete right now, so the actual numbers should be at least twice as large) there are:

336 union return types
of which 312 include false as a value

This illustrates that the false pseudo-type in unions is necessary to express the return type of many existing internal functions.
