Tentative Extra Data Structures for php
TysonAndre/pecl-teds
Teds (Tentative Extra Data Structures) is another collection of data structures for PHP.
This extension requires php 8.0 or newer.

```bash
phpize
./configure
make install
```

On Windows, see https://wiki.php.net/internals/windows/stepbystepbuild_sdk_2 instead.
Teds contains the following types of Teds\Collection instances as well as various iterable functionality:

Teds\Sequence is an interface representing a Collection of values with keys 0..n-1 and no gaps (also known as a list in other languages). It is implemented by the following classes:
- Teds\Vector - a memory-efficient representation of a list of values that can easily grow/shrink.
- Teds\Deque - a memory-efficient representation of a list of values with amortized constant time push/pop/pushFront/popFront.

Teds also includes several specializations of Teds\Sequence for memory-efficiency:
- Teds\IntVector - a memory-efficient specialization of lists of integers, typically with faster serialization requiring less memory.
- Teds\LowMemoryVector - exposes the same API as Teds\Vector but uses less memory for specializations (exclusively bool/null, exclusively integers, exclusively floats).
- Teds\BitVector - exposes the same API as Teds\Vector but uses significantly less memory for booleans. Provides ways to read/write raw bytes/words.

And immutable versions:
- Teds\ImmutableSequence - represents an immutable sequence.
- Teds\EmptySequence::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty sequence.
The Teds\Set interface is implemented by the following classes:
- Teds\StrictHashSet - a hash table based set of unique values providing efficient average-time lookup/insertion/removal. Uses Teds\strict_hash.
- Teds\StrictTreeSet - a binary tree-based set of sorted unique values providing good worst-case time. Uses Teds\stable_compare for stable ordering.
- Teds\StrictSortedVectorSet - provides a similar API to Teds\StrictTreeSet but is represented internally as a Vector. This has reduced memory usage and faster construction time, and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying).

Some specializations are provided for reduced memory usage:
- Teds\SortedIntVectorSet - a mutable sorted set of integers, represented internally as a Vector. This has reduced memory usage and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying).

Teds also provides immutable specializations of common data types offering faster serialization/unserialization:
- Teds\ImmutableSortedStringSet - a sorted set of strings sorted by strcmp. Note that internally, this is backed by a single string with the data that would be used for __unserialize/__serialize. As a result, this is faster to unserialize and uses less memory than an array (see benchmark), but when iterating over the set, temporary copies of the strings that are members of the set are returned.
- Teds\ImmutableSortedIntSet - a sorted set of integers.
- Teds\EmptySet::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty set.
The Teds\Map interface is implemented by the following classes:
- Teds\StrictHashMap - a hash table based map of unique keys to values providing efficient average-time lookup/insertion/removal. Uses Teds\strict_hash.
- Teds\StrictTreeMap - a binary tree-based map of sorted unique keys to values providing slower average-case time but good worst-case time. Uses Teds\stable_compare for stable ordering.
- Teds\StrictSortedVectorMap - provides a similar API to Teds\StrictTreeMap but is represented internally like a Vector. This has reduced memory usage and faster construction time and can be useful in cases where modification is infrequent (e.g. more common to unserialize without modifying).

Teds also provides the following map types:
- Teds\EmptyMap::INSTANCE (PHP 8.1+, uses enum) - represents an immutable empty map.
Teds\StrictMinHeap and Teds\StrictMaxHeap are heaps where elements can be added (allowing duplicates) and are removed in order (or reverse order) of Teds\stable_compare.

Teds\CachedIterable is a Collection that lazily evaluates iterables such as Generators and stores the exact keys and values to allow future retrieval and iteration.

Teds\ImmutableIterable is a Collection that eagerly evaluates iterables such as Generators and stores the exact keys and values to allow future retrieval and iteration.

Teds\MutableIterable is a Collection that allows construction and modification of iterables, setting keys and values and allowing duplicate keys in any order.

Currently, PHP does not provide a built-in way to store the state of an arbitrary iterable for reuse later (when the iterable has arbitrary keys, or when keys might be repeated). There are use cases for this, such as:
- Creating a rewindable copy of a non-rewindable Traversable (e.g. a Generator) before passing that copy to a function that consumes an iterable/Traversable. (new ImmutableIterable(my_generator()))
- Generating an IteratorAggregate from a class still implementing Iterator (e.g. SplObjectStorage) so that code can independently iterate over the key-value sequences. (e.g. foreach ($immutableImmutableIterable as $k1 => $v1) { foreach ($immutableImmutableIterable as $k2 => $v2) { /* process pairs */ } })
- Providing helpers such as iterable_flip(iterable $iterable), iterable_take(iterable $iterable, int $limit), iterable_chunk(iterable $iterable, int $chunk_size) that act on iterables with arbitrary key/value sequences and have return values including iterables with arbitrary key/value sequences.
- Providing constant time access to both keys and values of arbitrary key-value sequences at any offset (for binary searching on keys and/or values, etc.)
Having Teds\ImmutableIterable implemented as a native class allows it to be much more efficient than a userland solution (in terms of time to create, time to iterate over the result, and total memory usage, depending on the representation).
Objects within this data structure or references in arrays in this data structure can still be mutated.
Traversables are eagerly iterated over in the constructor.
Teds\CachedIterable is similar to Teds\ImmutableIterable but lazily evaluates Traversables instead of eagerly evaluating them. (E.g. Generators will only run until the last offset used from a CachedIterable. See tests/CachedIterable/lazy.phpt and tests/CachedIterable/selfReferential.phpt for examples.)
This can be used to cache results of generators without fetching more elements than needed (e.g. Generators that repeatedly call databases or services to paginate).
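To illustrate the lazy caching idea described above, here is a minimal userland sketch in plain PHP. It is not the extension's implementation: the class name LazyCacheSketch and its fill() helper are hypothetical, and the sketch only supports iteration, not offset lookups.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch (not part of Teds): caches the keys and values
 * of an Iterator the first time they are requested, so the source only
 * advances as far as the furthest offset that was ever read.
 */
final class LazyCacheSketch implements IteratorAggregate
{
    /** @var list<array{0: mixed, 1: mixed}> cached [key, value] pairs */
    private array $entries = [];
    private bool $started = false;
    private bool $done = false;

    public function __construct(private Iterator $source) {}

    /** Fetches entries until $offset is cached; returns false if the source ends first. */
    private function fill(int $offset): bool
    {
        while (count($this->entries) <= $offset) {
            if ($this->done) {
                return false;
            }
            if ($this->started) {
                $this->source->next();
            } else {
                $this->source->rewind();
                $this->started = true;
            }
            if (!$this->source->valid()) {
                $this->done = true;
                return false;
            }
            $this->entries[] = [$this->source->key(), $this->source->current()];
        }
        return true;
    }

    /** Each call replays the cached entries first, then pulls new ones on demand. */
    public function getIterator(): Generator
    {
        for ($i = 0; $this->fill($i); $i++) {
            yield $this->entries[$i][0] => $this->entries[$i][1];
        }
    }
}
```

Because entries are stored as [key, value] pairs rather than as array keys, repeated or non-scalar keys from the source survive, which is the property the text above asks for.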
Teds\ImmutableSequence is similar to SplFixedArray or Ds\Sequence, but immutable. It stores a sequence of values with the keys 0, 1, 2....
Teds\LowMemoryVector exposes the same API as a Vector, but with a more memory-efficient representation if the LowMemoryVector has only ever included a type that it could optimize.
Benefits:
- For collections of exclusively int, exclusively floats, or exclusively null/true/false, this uses less memory when serialized compared to vectors/arrays. Note that adding other types will make this use as much memory as a Vector, e.g. adding any non-float (including int) to a collection of floats.
- Has faster checks for contains/indexOf (if values can have an optimized representation).
- Has faster garbage collection (if values can have an optimized representation), because int/float/bool/null do not need reference counting.
- Interchangeable with Vector or other collections without configuration - this will silently fall back to the default mixed representation if a more efficient representation is not supported.

Drawbacks:
- Slightly more overhead when types aren't specialized.
- Adding a different type to the collection will permanently make it use the less efficient mixed representation.

In 64-bit builds, the following types are supported, with the following amounts of memory (plus constant overhead to represent the LowMemoryVector itself, and extra capacity for growing the LowMemoryVector):
- null/bool: 1 byte per value. (In serialize(), this is even less, adding 1 to 2 bits per value (2 bits if null is included)).
- signed 8-bit int (-128..127): 1 byte per value. Adding a larger int to any of these n-bit types will convert them to the collection of that larger int type.
- signed 16-bit int: 2 bytes per value.
- signed 32-bit int: 4 bytes per value.
- signed 64-bit int: 8 bytes per value. (64-bit php builds only)
- signed PHP float: 8 bytes per value. (C double)
- mixed or combinations of the above: 16 bytes per value. (Same as Vector)

In comparison, in 64-bit builds of PHP, PHP's arrays take at least 16 bytes per value in php 8.2, and at least 32 bytes per value before php 8.1, at the time of writing.
Example benchmarks: benchmarks/benchmark_vector_bool.php and benchmarks/benchmark_vector_unserialize.phpt.
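The integer widths listed above can be made concrete with a small sketch. The helper int_width_sketch() is hypothetical and only mirrors the documented rule that one larger int widens the whole collection; it assumes a 64-bit PHP build and says nothing about how the extension tracks this internally.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Teds): returns the bytes per element
 * that the narrowest signed integer type able to hold every value would need.
 * Assumes a 64-bit PHP build.
 */
function int_width_sketch(array $ints): int
{
    $width = 1;
    foreach ($ints as $i) {
        if (!is_int($i)) {
            throw new TypeError('this sketch only handles integers');
        }
        $needed = match (true) {
            $i >= -128 && $i <= 127 => 1,
            $i >= -32768 && $i <= 32767 => 2,
            $i >= -2147483648 && $i <= 2147483647 => 4,
            default => 8,
        };
        // A single larger value widens the whole collection.
        $width = max($width, $needed);
    }
    return $width;
}

// Estimated payload for a million small counters versus 16 bytes per value
// for the mixed representation: 1,000,000 bytes versus 16,000,000 bytes.
$estimate = 1_000_000 * int_width_sketch([0, 5, 127]);
```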
Teds\IntVector is similar to Teds\LowMemoryVector but throws a TypeError on attempts to add non-integers.

Teds\BitVector is similar to Teds\LowMemoryVector/Teds\IntVector but throws a TypeError on attempts to add non-booleans. This can be used as a memory-efficient vector of booleans.
This uses only a single bit per value for large bit sets in memory and when serializing (around 128 times less memory than arrays, for large arrays of booleans).
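The factor of 128 follows from the figures quoted earlier: an array value takes at least 16 bytes, which is 128 bits, against 1 bit per packed boolean. The sketch below shows the general bit-packing technique in plain PHP. The functions pack_bits_sketch() and get_bit_sketch() are hypothetical, and the bit order (lowest bit first) is an assumption that may not match the layout the extension uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch (not part of Teds): packs a list of booleans into a
 * binary string, 8 values per byte, lowest bit first.
 */
function pack_bits_sketch(array $bools): string
{
    $bytes = str_repeat("\0", intdiv(count($bools) + 7, 8));
    foreach (array_values($bools) as $i => $bit) {
        if ($bit) {
            $index = intdiv($i, 8);
            $bytes[$index] = chr(ord($bytes[$index]) | (1 << ($i % 8)));
        }
    }
    return $bytes;
}

/** Reads the boolean stored at $offset in a string produced by pack_bits_sketch(). */
function get_bit_sketch(string $bytes, int $offset): bool
{
    return ((ord($bytes[intdiv($offset, 8)]) >> ($offset % 8)) & 1) === 1;
}

$packed = pack_bits_sketch(array_fill(0, 1000, true));
// 1000 booleans fit in 125 bytes here.
$packedLength = strlen($packed);
```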
Teds\Vector is similar to SplFixedArray or Ds\Vector. It stores a mutable sequence of values with the keys 0, 1, 2... It can be appended to with push(), and elements can be removed from the end with pop().
This is implemented based on SplFixedArray/ImmutableSequence. There are plans to add more methods.

Teds\MutableIterable is similar to Teds\Vector and Teds\ImmutableIterable. It stores a mutable vector of keys and values with the keys 0, 1, 2... It can be resized with setSize().

Teds\Deque is similar to SplDoublyLinkedList but backed by an array instead of a linked list. It is much more efficient in memory usage and random access than SplDoublyLinkedList. (Also similar to Ds\Deque)

Teds\StrictTreeMap is a map where entries for keys of any type can be inserted if Teds\stable_compare !== 0 when compared against the existing keys.
This currently uses a balanced red-black tree to ensure logarithmic time is needed for insertions/removals/lookups.
Removing a Teds\StrictTreeMap/Teds\StrictTreeSet entry will move iterators pointing to that entry to the entry before the removed entry (as of 1.2.1).
This uses Teds\stable_compare internally.
The Teds\StrictTreeSet API implementation is similar, but does not associate values with keys. Also, StrictTreeSet does not implement ArrayAccess and uses different method names.

Teds\StrictHashMap is a map where entries for keys of any type can be inserted if they are !== to other keys. This uses Teds\strict_hash internally.
The Teds\StrictHashSet API implementation is similar, but does not associate values with keys, does not implement ArrayAccess, and uses different method names.
NOTE: The floats 0.0 and -0.0 (negative zero) have the same hashes and are treated as the same entries, because 0.0 === -0.0 in php.
NOTE: The float NAN (Not a Number) is deliberately treated as equivalent to itself by Teds\strict_hash and StrictHashSet/StrictHashMap, despite having NAN !== $x in php for any $x, including NAN. This is done to avoid duplicate or unremovable entries.
Removing an entry from a hash map/set will move iterators pointing to that entry to the entry prior to the removed entry.
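The two float notes above can be checked in plain PHP, and the following sketch shows one way a userland lookup key could reproduce the same equivalences. The function float_identity_key() is hypothetical and is not how Teds\strict_hash is computed. It only demonstrates why NAN needs special handling: a strict comparison can never find it again.

```php
<?php
declare(strict_types=1);

// Plain PHP behaviour that the notes above rely on.
$zerosAreIdentical = (0.0 === -0.0);           // true
$nanIsIdentical = (NAN === NAN);               // false
$nanFoundByInArray = in_array(NAN, [NAN], true); // false: NAN could never be found or removed

/**
 * Hypothetical sketch (not part of Teds): builds a lookup key for a float so
 * that NAN matches NAN and -0.0 matches 0.0.
 */
function float_identity_key(float $value): string
{
    if (is_nan($value)) {
        return 'float:NAN';
    }
    if ($value == 0.0) {
        // Folds negative zero into positive zero before encoding the bytes.
        $value = 0.0;
    }
    // 'E' encodes a double in big-endian byte order.
    return 'float:' . bin2hex(pack('E', $value));
}
```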
Teds\StrictMinHeap and Teds\StrictMaxHeap use Teds\stable_compare instead of PHP's unstable default comparisons. Sorting logic can be customized by inserting [$priority, $value] instead of $value. (Or by subclassing SplMinHeap/SplMaxHeap and overriding compare manually).
```php
php > $x = new SplMinHeap();
php > foreach (['19', '9', '2b', '2'] as $v) { $x->insert($v); }
php > foreach ($x as $value) { echo json_encode($value).","; } echo "\n"; // unpredictable order
"2","19","2b","9",
php > $x = new Teds\StrictMinHeap();
php > foreach (['19', '9', '2b', '2'] as $v) { $x->add($v); }
php > foreach ($x as $value) { echo json_encode($value).","; } echo "\n"; // lexicographically sorted
"19","2","2b","9",
```
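The [$priority, $value] approach mentioned above can be sketched with the built-in SplMinHeap, used here only as a stand-in that exists in every PHP build. The task names are made up for illustration. With distinct integer priorities, arrays compare element by element, so the priority decides the order.

```php
<?php
declare(strict_types=1);

// SplMinHeap stands in for a heap type here; the entries are illustrative.
$heap = new SplMinHeap();
foreach ([[2, 'write tests'], [1, 'fix bug'], [3, 'release']] as $entry) {
    $heap->insert($entry); // each entry is [$priority, $value]
}

$order = [];
foreach ($heap as [$priority, $value]) {
    // Iterating a heap removes elements, smallest priority first.
    $order[] = $value;
}
// $order is now ['fix bug', 'write tests', 'release']
```

When two entries share a priority, the values themselves are compared, which is where a stable comparison matters.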
Teds\EmptySequence, Teds\EmptySet, and Teds\EmptyMap provide empty immutable collections for php 8.1+ based on single-case enums.
Empty Immutable Collection API
NOTE: This is currently being revised, and new methods may be added to these interfaces in 0.x releases or new major releases. More methods are currently being added.
These provide common interfaces for accessing the lists, sorted/hash sets, sorted/hash maps, sequences, and key value sequences that are provided by Teds\.
```php
<?php
namespace Teds;

/**
 * Collection is a common interface for an object with values that can be iterated over and counted,
 * but not addressed with ArrayAccess (e.g. Sets, Heaps, objects that yield values in an order in their iterators, etc.)
 */
interface Collection extends \Traversable, \Countable {
    /** @return list<values> the list of values in the collection */
    public function values(): array;

    /**
     * Returns a list or associative array,
     * typically attempting to cast keys to array key types (int/string) to insert elements of the collection into the array, or throwing.
     *
     * When this is impossible for the class in general,
     * the behavior depends on the class implementation (e.g. throws \Teds\UnsupportedOperationException, returns an array with representations of key/value entries of the Collection)
     */
    public function toArray(): array;

    /** Returns true if count() would be 0 */
    public function isEmpty(): bool;

    /** Returns true if this contains a value identical to $value. */
    public function contains(mixed $value): bool;
}

/**
 * This represents a Collection that can be used like a list without gaps.
 * E.g. get()/set() will work for is_int($offset) && 0 <= $offset and $offset < $list->count().
 */
interface Sequence extends Collection, \ArrayAccess {
    public function get(int $offset): mixed;
    public function set(int $offset, mixed $value): void;
    public function push(mixed ...$values): void;
    public function pop(): mixed;
}

/**
 * A Map is a type of Collection mapping unique keys to values.
 *
 * Implementations should either coerce unsupported key types or throw TypeError when using keys.
 *
 * Implementations include
 *
 * 1. Teds\StrictHashMap, a hash table with amortized constant time operations
 * 2. Teds\StrictTreeMap, a sorted binary tree
 */
interface Map extends Collection, \ArrayAccess {
    /**
     * Returns true if there exists a key identical to $key according to the semantics of the implementing collection.
     * Typically, this is close to the definition of `===`, but may be stricter or weaker in some implementations, e.g. for NAN, negative zero, etc.
     *
     * containsKey differs from offsetExists, where implementations of offsetExists usually return false if the key was found but the corresponding value was null.
     * (This is analogous to the difference between array_key_exists($key, $array) and isset($array[$key]))
     */
    public function containsKey(mixed $value): bool;
}

/**
 * A Set is a type of Collection representing a set of unique values.
 * Implementations include Teds\StrictHashSet and Teds\StrictTreeSet.
 *
 * 1. Teds\StrictHashSet, a hash table with amortized constant time operations
 * 2. Teds\StrictTreeSet, a sorted binary tree
 */
interface Set extends Collection {
    /**
     * Returns true if $value was added to this Set and was not previously in this Set.
     */
    public function add(mixed $value): bool;

    /**
     * Returns true if $value was found in this Set before being removed from this Set.
     */
    public function remove(mixed $value): bool;
}
```
This PECL contains a library of native implementations of various functions acting on iterables. See teds.stub.php for function signatures.
The behavior is equivalent to the following polyfill (similarly to array_filter, the native implementation is likely faster than the polyfill with no callback, and slower with a callback):
```php
namespace Teds;

/**
 * Determines whether any element of the iterable satisfies the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function any(iterable $iterable, ?callable $callback = null): bool {
    foreach ($iterable as $v) {
        if ($callback !== null ? $callback($v) : $v) {
            return true;
        }
    }
    return false;
}

/**
 * Determines whether all elements of the iterable satisfy the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function all(iterable $iterable, ?callable $callback = null): bool {
    foreach ($iterable as $v) {
        if (!($callback !== null ? $callback($v) : $v)) {
            return false;
        }
    }
    return true;
}

/**
 * Determines whether no element of the iterable satisfies the predicate.
 *
 * If the value returned by the callback is truthy
 * (e.g. true, non-zero number, non-empty array, truthy object, etc.),
 * this is treated as satisfying the predicate.
 *
 * @param iterable $iterable
 * @param null|callable(mixed):mixed $callback
 */
function none(iterable $iterable, ?callable $callback = null): bool {
    return !any($iterable, $callback);
}

// Analogous to array_reduce but with mandatory default
function fold(iterable $iterable, callable $callback, mixed $default): mixed {
    foreach ($iterable as $value) {
        $default = $callback($default, $value);
    }
    return $default;
}

/**
 * Returns the first value for which $callback($value) is truthy.
 */
function find(iterable $iterable, callable $callback, mixed $default = null): mixed {
    foreach ($iterable as $value) {
        if ($callback($value)) {
            return $value;
        }
    }
    return $default;
}

/**
 * Similar to in_array($value, $array, true) but also works on Traversables.
 */
function includes_value(iterable $iterable, mixed $value): bool {
    foreach ($iterable as $other) {
        if ($other === $value) {
            return true;
        }
    }
    return false;
}

/**
 * Returns a list of unique values in order of occurrence,
 * using a hash table with `Teds\strict_hash` to deduplicate values.
 */
function unique_values(iterable $iterable): array {
    // Without Teds installed, this takes quadratic time instead of linear time.
    $result = [];
    foreach ($iterable as $value) {
        if (!in_array($value, $result, true)) {
            $result[] = $value;
        }
    }
    return $result;
}
```
Teds\stable_compare is a function that can be used to compare arbitrary values in a stable order.
This exists because php's < operator is not stable: '0a' < '10' < '1b' < '9' < '10'.
Teds\stable_compare fixes that by strictly ordering:
- null < false < true < int,float < string < array < object < resource.
- objects are compared by class name with strcmp, then by spl_object_id.
- resources are compared by id.
- arrays are compared recursively. Smaller arrays are less than larger arrays.
- int/float are compared numerically. If an int is equal to a float, then the int is first.
- strings are compared with strcmp.
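The ordering rules above can be sketched in userland for the scalar cases. The function scalar_stable_compare() is hypothetical, covers only null/bool/int/float/string, and is not the extension's implementation. The first lines also show a comparison cycle in plain PHP, where '9' < '10' is decided numerically while the other two pairs are compared as strings.

```php
<?php
declare(strict_types=1);

// With PHP's default comparison, all three of these are true, forming a cycle.
$cycle = ['9' < '10', '10' < '1b', '1b' < '9'];

/**
 * Hypothetical userland sketch (not part of Teds) of a stable ordering for
 * null < false < true < int,float < string. Other types are rejected.
 */
function scalar_stable_compare(mixed $a, mixed $b): int
{
    $rank = static fn (mixed $v): int => match (true) {
        $v === null => 0,
        $v === false => 1,
        $v === true => 2,
        is_int($v), is_float($v) => 3,
        is_string($v) => 4,
        default => throw new InvalidArgumentException('unsupported type in this sketch'),
    };
    $ra = $rank($a);
    $rb = $rank($b);
    if ($ra !== $rb) {
        return $ra <=> $rb;
    }
    if ($ra === 4) {
        // Strings are always compared byte by byte, never numerically.
        return strcmp($a, $b) <=> 0;
    }
    if ($ra === 3) {
        // Numeric comparison; on a tie the int sorts before the float.
        return ($a <=> $b) ?: (is_float($a) <=> is_float($b));
    }
    return 0;
}

$list = ['19', '9', '2b', '2'];
usort($list, 'scalar_stable_compare');
// $list is now ['19', '2', '2b', '9']
```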
Teds\strict_hash provides a hash based on value identity. Before a final step to reduce accidental hash collisions:
- Objects are hashed based only on spl_object_id. Different objects will have different hashes for the lifetime of the hash.
- Resources are hashed based on get_resource_id.
- Strings are hashed.
- References are dereferenced and hashed the same way as the value.
- Integers are used directly.
- Floats are hashed in a possibly platform-specific way.
- Arrays are hashed recursively. If $a1 === $a2 then they will have the same hash.

This may vary based on php release, OS, CPU architecture, or Teds release and should not be saved/loaded outside of a given php process (and spl_object_id/get_resource_id are unpredictable).
Teds\binary_search(array $values, mixed $target, callable $comparer = null, bool $useKey=false) can be used to binary search on arrays that are sorted by key (ksort, uksort) or value (sort, usort, uasort), even if keys were unset.
This will have unpredictable results if the array is out of order. See Teds\stable_sort for ways to sort even arbitrary values in a stable order.
This is faster for very large sorted arrays. See benchmarks.
This returns the key and value of the last entry <= $target according to the comparer, and whether an entry comparing equal was found. By default, php's default comparison behavior (<=>) is used.
```php
php > $values = [1 => 100, 3 => 200, 4 => 1000];
php > echo json_encode(Teds\binary_search($values, 1));
{"found":false,"key":null,"value":null}
php > echo json_encode(Teds\binary_search($values, 100));
{"found":true,"key":1,"value":100}
php > echo json_encode(Teds\binary_search($values, 201));
{"found":false,"key":3,"value":200}
php > echo json_encode(Teds\binary_search($values, 99));
{"found":false,"key":null,"value":null}
php > echo json_encode(Teds\binary_search($values, 1, useKey: true));
{"found":true,"key":1,"value":100}
```
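The results in the session above can be reproduced with a small userland sketch that searches by value using <=>. The function binary_search_sketch() is hypothetical. It copies the keys with array_keys() first, which costs linear time and so gives up the speed advantage of the native function. It is only meant to show which entry is reported when no exact match exists.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch (not part of Teds) of searching an array that
 * is sorted by value. Reports an equal entry if one exists, otherwise the
 * last entry that compares less than $target.
 *
 * @return array{found: bool, key: int|string|null, value: mixed}
 */
function binary_search_sketch(array $values, mixed $target): array
{
    $keys = array_keys($values); // positions 0..n-1, even if the keys have gaps
    $low = 0;
    $high = count($keys) - 1;
    $result = ['found' => false, 'key' => null, 'value' => null];
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $key = $keys[$mid];
        $cmp = $values[$key] <=> $target;
        if ($cmp === 0) {
            return ['found' => true, 'key' => $key, 'value' => $values[$key]];
        }
        if ($cmp < 0) {
            // Remember this candidate, then look for a larger entry still below the target.
            $result = ['found' => false, 'key' => $key, 'value' => $values[$key]];
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }
    return $result;
}

$values = [1 => 100, 3 => 200, 4 => 1000];
$below = binary_search_sketch($values, 201);
// $below is ['found' => false, 'key' => 3, 'value' => 200]
```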
- Teds\is_same_array_handle(array $array1, array $array2) - check if two arrays have the same handle, for infinite recursion detection.
- Teds\array_value_first(array $array), Teds\array_value_last(array $array) - Return the first/last value of an array without creating references or moving the internal array pointer. Similar to $array[array_key_first($array)] ?? null.
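For comparison, the userland equivalent mentioned above can be written as a pair of small helpers. The names array_value_first_sketch() and array_value_last_sketch() are hypothetical stand-ins. They check for an empty array explicitly instead of relying on ?? so that a null key is never used as an offset.

```php
<?php
declare(strict_types=1);

/** Hypothetical userland stand-in: first value of an array, or null if it is empty. */
function array_value_first_sketch(array $array): mixed
{
    $key = array_key_first($array);
    return $key === null ? null : $array[$key];
}

/** Hypothetical userland stand-in: last value of an array, or null if it is empty. */
function array_value_last_sketch(array $array): mixed
{
    $key = array_key_last($array);
    return $key === null ? null : $array[$key];
}

$scores = ['x' => 10, 'y' => 20, 'z' => 30];
next($scores); // move the internal pointer to 'y'
$first = array_value_first_sketch($scores);
$last = array_value_last_sketch($scores);
$pointerValue = current($scores); // still 20: the helpers did not move the pointer
```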
This contains functionality and data structures that may be proposed for inclusion into PHP itself (under a different namespace) at a future date, reimplemented using SPL's source code as a starting point.
Providing this as a PECL first makes this functionality easier to validate for correctness, and makes it more practical to change APIs before proposing including them in PHP if needed.
License: see COPYING
Changelog: see package.xml
Related projects:
- https://www.php.net/spl is built into php
- https://www.php.net/manual/en/book.ds.php