tzaeschke/phtree-cppPublic

forked fromimprobable-eng/phtree-cpp

NotificationsYou must be signed in to change notification settings
Fork11
Star34

PH-Tree C++ implementation

License

Apache-2.0 license

34 stars 17 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.github/workflows		.github/workflows
benchmark		benchmark
ci		ci
cmake		cmake
examples		examples
fuzzer		fuzzer
include/phtree		include/phtree
test		test
third_party		third_party
tools		tools
.bazelignore		.bazelignore
.bazelrc		.bazelrc
.bazelversion		.bazelversion
.clang-format		.clang-format
.gitignore		.gitignore
BUILD		BUILD
CHANGELOG.md		CHANGELOG.md
CMakeLists.txt		CMakeLists.txt
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
TODO.txt		TODO.txt
WORKSPACE		WORKSPACE

Repository files navigation

This is a fork ofImprobable's (currently unmaintained) PH-tree.

Multi-dimensional / spatial index with very fast insert/erase/relocate operations and scalability with large datasets.This library is C++ / header only.

PH-Tree C++

ThePH-Tree is an ordered index on an n-dimensional space(quad-/oct-/2^n-tree) where each dimension is (by default)indexed by a 64bit integer. The index order follows z-order / Morton order. The default implementation is effectivelya 'map', i.e.each key is associated with at most one value. For convenience there is also a multimap implementationsthat supports multiple entries with identical keys.Keys are points or boxes in n-dimensional space.

Two strengths of PH-Trees are fast insert/removal operations and scalability with large datasets. It also provides fastwindow queries andk-nearest neighbor queries, and it scales well with higher dimensions. The default implementationis limited to 63 dimensions.

The API ist mostly analogous to STL'sstd::map andstd::multimap, see function descriptions for details.

Theoretical background is listedhere.

More information about PH-Trees (including a Java implementation) is availablehere.

User Guide

API Usage

Performance

Compiling / Building

Further Resources

Theory

API Usage

Key Types

ThePH-Tree Map has five predefined tree types:

PhTreeD usesPhPointD keys, which are vectors/points of 64 bitdouble.
PhTreeF usesPhPointF keys, which are vectors/points of 32 bitfloat.
PhTreeBoxD usesPhBoxD keys, which consist of twoPhPointD that define an axis-aligned rectangle/box.
PhTreeBoxF usesPhBoxF keys, which consist of twoPhPointF that define an axis-aligned rectangle/box.
PhTree usesPhPoint keys, which are vectors/points ofstd::int64

ThePH-Tree MultiMap has three predefined tree types:

PhTreeMultiMapD usesPhPointD keys, which are vectors/points of 64 bitdouble.
PhTreeMultiMapF usesPhPointF keys, which are vectors/points of 32 bitfloat.
PhTreeMultiMapBoxD usesPhBoxD keys, which consist of twoPhPointD that define an axis-aligned rectangle/box.
PhTreeMultiMapBoxF usesPhBoxF keys, which consist of twoPhPointF that define an axis-aligned rectangle/box.
PhTreeMultiMap usesPhPoint keys, which are vectors/points ofstd::int64

Additional key types and tree types can be defined easily analogous to the types above, please refer to the declaration of thetypes for an example. Support for custom key classes (points and boxes) as well as custom coordinate mappings can beimplemented using customConverter classes, see below. ThePhTreeMultiMap is by default backedbystd::unordered_set but this can be changed via a template parameter.

ThePhTree andPhTreeMultiMap types are declared inphtree.h andphtree_multimap.h.

Basic Operations

classMyData { ... };MyData my_data;// Create a 3D point tree with floating point coordinates and a value type of `MyData`.auto tree = PhTreeD<3, MyData>();// Create coordinatePhPointD<3> p{1.1,1.0,10.};// Some operationstree.relocate(p1, p2);// Move an entry from point 1 to point 2tree.relocate_if(p1, p2, predicate);// Conditionally move an entry from point 1 to point 2tree.emplace(p, my_data);tree.emplace_hint(hint, p, my_data);tree.try_emplace(p, my_data);tree.try_emplace(hint, p, my_data);tree.insert(p, my_data);tree[p] = my_data;tree.count(p);tree.find(p);tree.lower_bounds(p);// Find a key or the next higher key following Morton order (except positive/negative swapped)tree.erase(p);tree.erase(iterator);tree.size();tree.empty();tree.clear();// Multi-map onlytree.estimate_count(query);

Queries

For-each over all elements:tree.for_each(callback);Note thatfor_each tends to be 10%-20% faster than using an iterator.
Iterator over all elements:auto iterator = tree.begin();
For-each with box shaped window queries:tree.for_each(PhBoxD(min, max), callback);
Iterator for box shaped window queries:auto q = tree.begin_query(PhBoxD(min, max));
Iterator fork nearest neighbor queries:auto q = tree.begin_knn_query(k, center_point, distance_function);
Custom query shapes, such as spheres:tree.for_each(callback, FilterSphere(center, radius, tree.converter()));

For-each example

// Callback for counting entriesstructCounter {voidoperator()(PhPointD<3> key, T& t) {        ++n_;    }size_t n_ =0;};// Count entries inside an axis aligned box defined by the two points (1,1,1) and (3,3,3)Counter callback;tree.for_each({{1,1,1}, {3,3,3}}, callback);// callback.n_ is now the number of entries in the box.

Iterator examples

// Iterate over all entriesfor (auto it : tree) {    ...}// Iterate over all entries inside an axis aligned box defined by the two points (1,1,1) and (3,3,3)for (auto it = tree.begin_query({{1,1,1}, {3,3,3}}); it != tree.end(); ++it) {    ...}// Find 5 nearest neighbors of (1,1,1)for (auto it = tree.begin_knn_query(5, {1,1,1}, DistanceEuclidean<3>())); it != tree.end(); ++it) {    ...}

Filters

All queries allow specifying an additional filter. The filter is called for every key/value pair that would normally bereturned (subject to query constraints) and to every node in the tree that the query decides to traverse (also subjectto query constraints). Returningtrue in the filter does not change query behaviour, returningfalse means that thecurrent value or child node is not returned or traversed. An example of a geometric filter can be foundinphtree/common/filter.h inFilterAABB orFilterSphere (for examples with box keys seeFilterBoxAABB orFilterBoxSphere).

template<dimension_t DIM,typename T>structFilterByValueId {    [[nodiscard]]constexprboolIsEntryValid(const PhPoint<DIM>& key,const T& value)const {// Arbitrary example: Only allow values with even values of id_return value.id_ %2 ==0;    }    [[nodiscard]]constexprboolIsNodeValid(const PhPoint<DIM>& prefix,int bits_to_ignore)const {// Allow all nodesreturntrue;    }};// Iterate over all entries inside an axis aligned box defined by the two points (1,1,1) and (3,3,3).// Return only entries that suffice the filter condition.for (auto it = tree.begin_query({1,1,1}, {3,3,3}, FilterByValueId<3, T>())); it != tree.end(); ++it) {    ...}

Note: The filter example works only for the 'map' version of the PH-Tree, such asPhTree,PhTreeD, ... . Filters forthePhTreeMultiMap are discussed in the next section.

Filters for MultiMaps

ThePhTreeMultiMap requires a different type of filter. In order to function as a multimap, it uses a collections("buckets") as entries for each occupied coordinate. The buckets allow it to store several values per coordinate. Whenusing a filter, the PH-Tree will checkIsEntryValid for everybucket (this is different from version 1.x.x where itcalledIsEntryValid for every entry in a bucket but never for the bucket itself). Since 2.0.0 there is a new functionrequired in every multimap filter:IsBucketEntryValid. It is called once for every entry in a bucket if the bucketpassedIsEntryValid. An example of a geometric filter can be found inphtree/common/filter.h inFilterMultiMapAABB.

template<dimension_t DIM,typename T>structFilterMultiMapByValueId {template<typename BucketT>    [[nodiscard]]constexprboolIsEntryValid(const PhPoint<DIM>& key,const BucketT& bucket)const {// Arbitrary example: Only allow keys/buckets with a certain property, e.g. keys that lie within a given sphere.returncheck_some_geometric_propert_of_key(key);    }    [[nodiscard]]constexprboolIsBucketEntryValid(const PhPoint<DIM>& key,const T& value)const {// Arbitrary example: Only allow values with even values of id_return value.id_ %2 ==0;    }    [[nodiscard]]constexprboolIsNodeValid(const PhPoint<DIM>& prefix,int bits_to_ignore)const {// Allow all nodesreturntrue;    }};

Distance function

Nearest neighbor queries can also use other distance metrics, such as L1 or Chebyshev distance. Note that the queryreturns a special iterator that provides a function to get the distance of the current entry:

#include"phtree/phtree.h"// Find 5 nearest neighbors of (1,1,1) using L1 distancefor (auto it = tree.begin_knn_query(5, {1,1,1}, DistanceL1<3>())); it != tree.end(); ++it) {    std::cout <<"distance =" << it.distance() << std::endl;    ...}

Converters

The PH-Tree can internally only process integer keys. In order to use floating point coordinates, the floating pointcoordinates must be converted to integer coordinates. ThePhTreeD andPhTreeBoxD use by default thePreprocessIEEE &PostProcessIEEE functions. TheIEEE processor is a loss-less converter (in terms of numericprecision) that simply takes the 64bits of a double value and treats them as if they were a 64bit integer(it is slightly more complicated than that, see discussion in the papers referenced above). In other words, it treatsthe IEEE 754 representation of the double value as integer, hence the nameIEEE converter.

TheIEEE conversion is fast and reversible without loss of precision. However, it has been shown that other converterscan result in indexes that are up to 20% faster. One useful alternative is aMultiply converter that convert floatingpoint to integer by multiplication and casting:

double my_float = ...;// Convert to intstd::int64_t my_int = (std::int64_t) my_float *1000000.;// Convert backdouble resultung_float = ((double)my_int) /1000000.;

It is obvious that this approach leads to a loss of numerical precision. Moreover, the loss of precision depends on theactual range of the double values and the constant. The chosen constant should probably be as large as possible butsmall enough such that converted values do not exceed the 64bit limit ofstd::int64_t. Note that the PH-Tree providesseveralConverterMultiply implementations for point/box and double/float. For example:

// Multiply converter that multiplies by 1'000'000 (and divides by 1).auto tree = PhTreeD<DIM, T, ConverterMultiply<DIM,1000000,1>>();

You can also write your own converter. For example:

template<dimension_t DIM>structMyConverterMultiply :publicConverterPointBase<DIM,double,scalar_64_t> {explicitMyConverterMultiply(double multiplier)    : multiplier_{multiplier}, divider_{1. / multiplier} {}    [[nodiscard]] PhPoint<DIM>pre(const PhPointD<DIM>& point)const {        PhPoint<DIM> out;for (dimension_t i =0; i < DIM; ++i) {            out[i] = point[i] * multiplier_;        }return out;    }    [[nodiscard]] PhPointD<DIM>post(const PhPoint<DIM>& in)const {        PhPointD<DIM> out;for (dimension_t i =0; i < DIM; ++i) {            out[i] = ((double)in[i]) * divider_;        }return out;    }    [[nodiscard]]autopre_query(const PhBoxD<DIM>& query_box)const {return PhBox{pre(query_box.min()),pre(query_box.max())};    }constdouble multiplier_;constdouble divider_;};template<dimension_t DIM,typename T>using MyTree = PhTreeD<DIM, T, MyConverterMultiply<DIM>>;voidtest() {    MyConverterMultiply<3> converter{1000000};    MyTree<3, MyData>tree(converter);    ...// use the tree}

It is also worth trying out constants that are 1 or 2 orders of magnitude smaller or larger than this maximum value.Experience shows that this may affect query performance by up to 10%. This is due to a more compact structure of theresulting index tree.

Custom key types

With custom converters it is also possible to use your own custom classes as keys (instead ofPhPointD orPhBoxF).The following example defined customMyPoint andMyBox types and a converter that allows using them with aPhTree:

structMyPoint {double x_;double y_;double z_;};using MyBox = std::pair<MyPoint, MyPoint>;classMyConverterMultiply :publicConverterBase<3,3,double,scalar_64_t, MyPoint, MyBox> {using BASE = ConverterPointBase<3,double,scalar_64_t>;using PointInternal =typename BASE::KeyInternal;using QueryBoxInternal =typename BASE::QueryBoxInternal;public:explicitMyConverterMultiply(double multiplier =1000000)    : multiplier_{multiplier}, divider_{1. / multiplier} {}    [[nodiscard]] PointInternalpre(const MyPoint& point)const {return {static_cast<long>(point.x_ * multiplier_),static_cast<long>(point.y_ * multiplier_),static_cast<long>(point.z_ * multiplier_)};    }    [[nodiscard]] MyPointpost(const PointInternal& in)const {return {in[0] * divider_, in[1] * divider_, in[2] * divider_};    }    [[nodiscard]] QueryBoxInternalpre_query(const MyBox& box)const {return {pre(box.first),pre(box.second)};    }private:constdouble multiplier_;constdouble divider_;};voidtest() {    MyConverterMultiply tm;    PhTree<3, Id, MyConverterMultiply>tree(tm);    ...// use the tree}

Restrictions

C++: Supports value types ofT andT*, but notT&
C++: Return types offind(),emplace(), ... differ slightly fromstd::map, they have functionfirst(),second() instead of fields of the same name.
General: PH-Trees aremaps, i.e. each coordinate can hold onlyone entry. In order to hold multiple valuesper coordinate please use thePhTreeMultiMap implementations.
General: PH-Trees order entries internally in z-order (Morton order). However, the order is based on the (unsigned) bit representation of keys, so negative coordinates are returnedafter positive coordinates.
General: The current implementation support between 2 and 63 dimensions.
Differences to std::map: There are several differences tostd::map. Most notably for the iterators:
- begin()/end() are not comparable with< or>. Onlyit == tree.end() andit != tree.end() is supported.
- Value ofend(): The tree has no linear memory layout, so there is no useful definition of a pointer pointing _after_ the last entry or any entry. This should be irrelevant for normal usage.

Troubleshooting / FAQ

Problem: The PH-Tree appears to be losing updates/insertions.

Solution: Remember that the PH-Tree is amap, keys will not be inserted if an identical key already exists. Theeasiest solution is to use one of thePhTreeMultiMap implementations. Alternatively, this can be solved by turning aPhTree into a multi-map, for example by using something likestd::map orstd::set as member type:PhTree<3, T, CONVERTER, std::set<MyDataClass>>. Theset instances can then be used to handle key conflicts bystoring multiple entries for the same key. The logic to handle conflicts must currently be implemented manually.

Performance

When to use a PH-Tree

The PH-Tree is a multi-dimensional index or spatial index. This section gives a rough overview how the PH-Tree comparesto other spatial indexes, such askD-trees, R-trees/BV-hierarchies or quadtrees.

Disclaimer: This overview cannot be comprehensive (there are 100s of spatial indexes out there) and performance dependsheavily on the actual dataset, usage patterns, hardware, ... .

Generally, the PH-Tree tends to have the following advantages:

Fast insertion/removal times. While some indexes, such ask-D-trees, trees can be build from scratch very fast, theytend to be much slower when removing entries or when indexing large datasets. Also, most indexes requirerebalancing which may result in unpredictable latency (R-trees) or may result in index degradation if delayed(kD-trees).
Competitive query performance. Query performance is generally comparable to other index structures. The PH-Tree isfast at looking up coordinates but requires more traversal than other indexes. This means it is especially efficientif the query results are 'small', e.g. up to 100 results per query.
Scalability with large datasets. The PH-Tree's insert/remove/query performance tends to scale well to large datasetswith millions of entries.
Scalability with the number of dimensions. The PH-Tree has been shown to deal "well" with high dimensional data (1000k+ dimensions). What does "well" mean?
- It works very well for up to 30 (sometimes 50) dimensions.Please note that the C++ implementation has not beenoptimized nearly as much as the Java implementation.
- For more dimensions (Java was tested with 1000+ dimensions) the PH-Tree still has excellent insertion/deletionperformance. However, the query performance cannot compete with specialised high-dim indexes such as cover-treesor pyramid-trees (these tend to bevery slow on insertion/deletion though).
Modification operations (insert/delete) in a PH-Tree are guaranteed to modify only one Node (potentiallycreating/deleting a second one). This guarantee can have advantages for concurrent implementations or when serializingthe index. Please note that this advantage is somewhat theoretical because this guarantee is not exploited by thecurrent implementation (it doesn't support concurrency or serialization).

PH-Tree disadvantages:

A PH-Tree is amap, not amulti-map. This project also providesPhTreeMultiMap implementations that store ahash-set at each coordinate. In practice, the overhead of storing sets appears to be usually small enough to notmatter much.
PH-Trees are not very efficient in scenarios where queries tend to return large result sets in the order of 1000 ormore.

Optimizing Performance

There are numerous ways to improve performance. The following list gives an overview over the possibilities.

Use-O3 -mavx, -mbmi2 compiler flags. Ensure that vectorization and count trailing zeros (CTZ/TZCNT) areenabled.
Usefor_each instead of iterators. This should improve performance of queries by 10%-20%.
Userelocate() /relocate_if() if possible. When updating the position of an entry, the naive way isto useerase() /emplace(). Withrelocate /relocate_if(), insertion can avoid a lot of duplicatenavigation in the tree if the new coordinate is close to the old coordinate.
```
relocate(old_position, new_position);relocate_if(old_position, new_position, [](const T& value) {return [true/false]; });
```
The multi-map version relocates all values unless a 'value' is specified to identify the value to be relocated:
```
relocate(old_position, new_position, value);
```
Store pointers instead of large data objects. For example, usePhTree<3, MyLargeClass*> instead ofPhTree<3, MyLargeClass> ifMyLargeClass is large.
- This prevents the PH-Tree from storing the values inside the tree. This should improve cache-locality and thusperformance when operating on the tree.
- Using pointers is also useful if construction/destruction of values is expensive. The reason is that the tree hasto construct and destruct objects internally. This may be avoidable but is currently still happening.
Use non-box query shapes. Depending on the use case it may be more suitable to use a custom filter for queries.For example:
tree.for_each(callback, FilterSphere(center, radius, tree.converter()));
Use a different data converter. The default converter of the PH-Tree results in a reasonably fast index. Itsbiggest advantage is that it provides lossless conversion from floating point coordinates to PH-Tree coordinates(integers) and back to floating point coordinates.
- TheConverterMultiply is a lossy converter but it tends to improve performance by 10% or more. This is notcaused by faster operation in the converter itself but by a more compact tree shape. The example shows how to usea converter that multiplies coordinates by 100'000, thus preserving roughly 5 fractional digits:
  PhTreeD<DIM, T, ConverterMultiply<3, 100 * 1000, 1>>()
Use custom key types. By default, the PH-Tree accepts only coordinates in the form of its own key types, suchasPhPointD,PhBoxF or similar. To avoid conversion from custom types to PH-Tree key types, custom classes canoften be adapted to be accepted directly by the PH-Tree without conversion. This requires implementing a customconverter as described in the section aboutCustom Key Types.
Advanced:Adapt internal Node representation. Depending on the dimensionalityDIM, the PH-Tree usesinternally inNodes different container types to hold entries.By default, it uses an array forDIM<=3, a vector forDIM<=8 and an ordered map forDIM>8.Adapting these thresholds can have strong effects on performance as well as memory usage.One example: Changing the threshold to use vector forDIM==3 reduced performance of theupdate_d benchmarkby 40%-50% but improved performance ofquery_d by 15%-20%. The threshold is currently hardcoded.
The effects are not always easy to predict but here are some guidelines:
- "array" is the fastest solution for insert/update/remove type operations. Query performance is "ok". Memoryconsumption isO(DIM^2) for every node regardless of number of entries in the node.
- "vector" is the fastest for queries but has for large nodesworst case O(DIM^2) insert/update/removeperformance.
- "map" scales well withDIM but is for low values ofDIM generally slower than "array" or "vector".

Compiling the PH-Tree

The PH-Tree index itself is aheader only library, it can be used by simply copying everything in theinclude/phtree folder.The examples, tests and benchmarks can be build with bazel or cmake.

Build system & dependencies

PH-tree can be built withBazel (primary build system) or withcmake3.14.All code is written in C++ targeting the C++17 standard.The code has been verified to compile on Ubuntu Linux with Clang 14 and GCC 12,on Windows with Visual Studio 2022 (cmake only, and except benchmarks, which don't work with VS)and on MacOS with AppleClang 14 (bazel only).

The PH-tree makes use of vectorization and CountTrailingZeros/CTZ/TZCNT, so suggested compilation options forclang/gcc are:

-O3 -mavx -mbmi2

For Windows, when using the Windows SDK, you may need to defineNOMINMAX either using the preprocessor/D orwith#define NOMINMAX before importing any windows files. Alternatively, you can import PH-tree dependenciesbefore importing any Windows SDK dependencies.

Bazel

WORKSPACE file:

http_archive(    name = "phtree",    strip_prefix = "phtree-cpp-v1.6.2",    url = "https://github.com/tzaeschke/phtree-cpp",)

BUILD file:

cc_binary(    ...    deps = [        "@phtree//:phtree",    ],)

Once you have set up your dependencies, you should be able to build the PH-Tree repository by running:

bazel build ...

Similarly, you can run all unit tests with:

bazel test ...

Benchmarks:

bazel run //benchmark:update_mm_d_benchmark --config=benchmark  -- --benchmark_counters_tabular=true

cmake dependency

The library supports three types of cmake dependency management,FetchContent,find_package() andadd_subfolder().All three approaches are used inthis example project.

FetchContent

WithFetchContent_...():

include(FetchContent)FetchContent_Declare(        phtree        GIT_REPOSITORY https://github.com/tzaeschke/phtree-cpp.git        GIT_TAG v1.6.2)FetchContent_MakeAvailable(phtree)

find_package()

You need to build the library with:

mkdir out && cd outcmake .. -DPHTREE_INSTALL=onsudo cmake --build . --config Release --target install -- -j <number of cores>

Note that the optionCMAKE_INSTALL_PREFIX:PATH=... doesnot work.The library can then be included with:

find_package(phtree CONFIG REQUIRED)add_executable(ExampleProject example.cc)target_link_libraries(ExampleProject phtree::phtree)

add_subfolder()

For this you can simply copy the PH-Tree source code into your project (you can skipbenchmark andtest) andthen include the folder withadd_subdirectory(phtree-cpp).

cmake build

cmake usesccache when available.

mkdir buildcd buildcmake ..cmake --build .

Run example:

cmake .. -DPHTREE_BUILD_EXAMPLES=ONcmake --build ../example/Example

Run tests:

cmake .. -DPHTREE_BUILD_TESTS=ONcmake --build .ctest

Next to example (PHTREE_BUILD_EXAMPLES) there are also tests (PHTREE_BUILD_TESTS) andbenchmarks (PHTREE_BUILD_BENCHMARKS). To build all, usePHTREE_BUILD_ALL.Note that the benchmarks currently don't work on Windows.

Further Resources

Theory

The PH-Tree is discussed in the following publications and reports:

T. Zaeschke, C. Zimmerli, M.C. Norrie:"The PH-Tree -- A Space-Efficient Storage Structure and Multi-Dimensional Index", (SIGMOD 2014)
T. Zaeschke: "The PH-Tree Revisited", (2015)
T. Zaeschke, M.C. Norrie: "Efficient Z-Ordered Traversal of Hypercube Indexes" (BTW 2017).

About

PH-Tree C++ implementation

Code of conduct

Contributing

Activity

Stars

34 stars

Watchers

4 watching

Forks

11 forks

Report repository

Releases7

v1.6.2 Latest

Mar 8, 2025

+ 6 releases

Packages

No packages published

Languages

C++95.8%
Starlark2.1%
CMake1.5%
Shell0.6%

Movatterモバイル変換

License

tzaeschke/phtree-cpp

Folders and files

Latest commit

History

Repository files navigation

PH-Tree C++

User Guide

API Usage

Performance

Compiling / Building

Further Resources

API Usage

Key Types

Basic Operations

Queries

For-each example

Iterator examples

Filters

Filters for MultiMaps

Distance function

Converters

Custom key types

Restrictions

Troubleshooting / FAQ

Performance

When to use a PH-Tree

Optimizing Performance

Compiling the PH-Tree

Build system & dependencies

Bazel

cmake dependency

FetchContent

find_package()

add_subfolder()

cmake build

Further Resources

Theory

About

Topics

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases7

Packages0

Uh oh!

Languages

Packages