Introduction
This is a follow-up to myprevious question. This code rocks new ground-up implementation of core functionality, which I made from combining @vnp’s and @JDługosz's answers. The core functionality got ~100 LOC decrease and insert gained 15-25% performance gain. It turns out that my remove tests didn't test anything at all in the previous version (now fixed), so I don't know how much performance was gained in remove, but I'm pretty sure the gain is paramount.
Though there are some changes I didn't make. I state the reasons closer to the end of post.
Changes:
- Uniform node location
There is a single function to find the desired node with value or yet to be value. This is achieved through double pointers. Yes, double pointers.
Using boolean conversion instead of
== nullptrUsing default initialization of members
New functionality:
- SFINAE
try_insert is now uniform for both rvalue and lvalue, and activated only if type is either copyable or moveable. Copy constructor checks for feature too.
Copy constructor
emplace
This one was a little bit tricky to pull out. It had some exception issues, but should be okay. The only thing that makes me worried is alignment, though I guess doing SSE or some other hardware trick on BSTs is pretty much useless.
- size
Now the tree knows its size and correctly maintains it.
Code
#ifndef ALGORITHMS_BINARY_SEARCH_TREE_HPP#define ALGORITHMS_BINARY_SEARCH_TREE_HPP#include "binary_tree_iterator.hpp"#include <ostream>#include <utility>#include <type_traits>#include <memory>namespace shino{ template <typename ValueType> class binary_search_tree { constexpr static bool v_copiable = std::is_copy_constructible_v<ValueType>; constexpr static bool v_moveable = std::is_move_constructible_v<ValueType>; struct node { //DO NOT MOVE ELEMENTS AROUND, emplace relies on this order const ValueType value; node* left = nullptr; node* right = nullptr; }; node* root = nullptr; std::size_t element_count = 0; public: using iterator = binary_tree_iterator<node>; binary_search_tree(binary_search_tree&& other) noexcept: root(std::exchange(other.root, nullptr)), element_count(std::exchange(other.element_count, 0)) {} binary_search_tree& operator=(binary_search_tree&& other) noexcept { std::swap(root, other.root); std::swap(element_count, other.element_count); return *this; } template <typename = std::enable_if_t<v_copiable>> explicit binary_search_tree(const binary_search_tree& other) { if (other.element_count == 0) return; root = new node(other.root->value); deep_copy(root->left, other.root->left); deep_copy(root->right, other.root->right); element_count = other.element_count; } template <typename AnotherType, typename = std::enable_if_t<std::is_same_v<ValueType, std::decay_t<AnotherType>> and (v_copiable || v_moveable)>> bool try_insert(AnotherType&& value) { auto insertion_point = find_node(value); if (*insertion_point) return false; *insertion_point = new node{std::forward<AnotherType>(value)}; ++element_count; return true; } template <typename ... Args> bool emplace(Args&& ... args) { std::unique_ptr<char[]> buffer = std::make_unique<char[]>(sizeof(node)); new (buffer.get()) node(std::forward<Args>(args)...); auto possible_node = reinterpret_cast<node*>(buffer.get()); auto insertion_point = find_node(possible_node->value); if (*insertion_point) { std::destroy_at(possible_node->value); return false; } possible_node->left = nullptr; possible_node->right = nullptr; *insertion_point = possible_node; buffer.release(); ++element_count; return true; } bool exists(const ValueType& value) const { return *find_node(value) != nullptr; } bool delete_if_exists(const ValueType& value) { if (element_count == 0) return false; auto child_ptr = find_node(value); if (!*child_ptr) return false; *child_ptr = find_replacement(*child_ptr); --element_count; return true; } std::size_t size() const { return element_count; } void clear() { clear_helper(root); } iterator begin() { return iterator{root}; } iterator end() { return {}; } ~binary_search_tree() { clear(); } private: void deep_copy(node* dest, node* source) { if (!source) return; if (source->left) { dest->left = new node(source->left->value); deep_copy(dest->left, source->left); } if (source->right) { dest->right = new node(source->right->value); deep_copy(dest->right, source->right); } } node* find_replacement(node* start_pos) { if (!start_pos->left) { auto replacement = start_pos->right; delete start_pos; return replacement; } auto descendant = start_pos->left; while (descendant->right) descendant = descendant->right; descendant->right = start_pos->right; delete start_pos; return start_pos->left; } void clear_helper(node* start_position) { if (!start_position) return; clear_helper(start_position->left); clear_helper(start_position->right); delete start_position; } node** find_node(const ValueType& value) { auto* current = &root; while (*current and (*current)->value != value) if (value < (*current)->value) current = &(*current)->left; else current = &(*current)->right; return current; } };}#endif //ALGORITHMS_BINARY_SEARCH_TREE_HPPTests:
#include "binary_search_tree.hpp"#include <random>#include <unordered_set>#include <algorithm>#include <iostream>std::vector<int> generate_unique_numbers(std::size_t size){ std::vector<int> result; if (size == 0) return {}; static std::mt19937_64 twister; std::uniform_int_distribution<> distribution{0, static_cast<int>(size - 1)}; std::unordered_set<int> numbers; while (numbers.size() != size) { numbers.insert(distribution(twister)); } return {numbers.begin(), numbers.end()};}void run_randomized_insert_tests(){ for (std::size_t i = 0; i <= 5'000; ++i) { std::cout << "running binary_search_tree insert test on size " << i << '\n'; auto numbers = generate_unique_numbers(i); shino::binary_search_tree<int> tree; for (auto x: numbers) tree.try_insert(x); std::sort(numbers.begin(), numbers.end()); std::size_t numbers_index = 0; for (auto x: tree) { if (x != numbers[numbers_index++]) throw std::logic_error{"tree binary_tree_iterator is broken on size " + std::to_string(i)}; } }}void remove_value(std::vector<int>& vec, int x){ vec.erase(std::remove(vec.begin(), vec.end(), x), vec.end());}void run_randomized_remove_tests(){ static std::mt19937_64 twister; for (std::size_t i = 0; i <= 1'000; ++i) { shino::binary_search_tree<int> tree; auto numbers = generate_unique_numbers(i); for (auto x: numbers) tree.try_insert(x); std::sort(numbers.begin(), numbers.end()); std::cout << "running remove test on tree of size " << i << '\n'; for (std::size_t j = 0; j < i; ++j) { std::bernoulli_distribution dist; if (dist(twister)) { tree.delete_if_exists(static_cast<int>(j)); remove_value(numbers, static_cast<int>(j)); } } std::size_t values_index = 0; for (auto x: tree) { if (numbers[values_index] != x) throw std::logic_error{"remove doesn't work correctly on " + std::to_string(i)}; ++values_index; } }}int main(){ std::cout << "running randomized insert tests...\n"; run_randomized_insert_tests(); std::cout << "randomized insert tests passed successfully\n"; std::cout << "running randomized remove tests...\n"; run_randomized_remove_tests(); std::cout << "randomized remove tests passed successfully\n";}Not applied changes
- Return iterator along with insert
I don't think I have suitable iterator interface to implement at the moment.shino::binary_tree_iterator mentioned inthis post might hinder the performance of those operations.
- Still using
!= nullptrinexists
Linter was arguing that implicit conversion in that context is confusing. I gave in.
Concerns
Mostly I'd like to improve simplicity and readability of the code on this iteration, and improve usage of standard library. Though if you see any low hanging performance fruit, please let me know. As always, any other comments are welcome too.
- \$\begingroup\$I'd like this to reach boost level of quality. I wanted to apply for GSoC, but couldn't pass their coding test. I'm starting preparations for the next year already :)\$\endgroup\$Incomputable– Incomputable2018-06-26 23:00:17 +00:00CommentedJun 26, 2018 at 23:00
2 Answers2
Memory management
clearanddeep_copymight overflow the callstack for large number of nodes if the tree is highly unbalanced.- Prefer
std::unique_ptrover raw owning pointers.binary_search_tree::root,node::leftandnode::rightcan easily be replaced withstd::unique_ptr<node>without impeding performance. Doing so easily documents the ownership ofnodes, thus increasing readability. emplacejust tramples all over alignment requirements, asalignof(node)(depending onalignof(ValueType)) might be much greater thanalignof(char). What prevents a forwarding contructor innode, like this one:template<typename... Args, typename = std::enable_if_t<std::is_constructible_v<ValueType, Args...>>>node(Args&&... args) : value{std::forward<Args>(args)...} {}Using this constructor would simplify the code in
emplace.find_replacementaccessesstart_pos->leftafter deletingstart_pos.
Correctness
emplacedoesn't check whetherValueTypewould be constructible from the passedArgs.try_insertwill accept an lvalue reference to a move-only type, or an rvalue reference to a copy-only type.
Performance
- Balancing the tree would decrease the worst-case insertion, lookup and deletion costs from \$\mathcal{O}(n)\$ to \$\mathcal{O}(\log n)\$. However, it might increase the average runtime by a constant factor. As always with performance: Measure, measure, measure!
- Trees are notoriously bad regarding cache misses because of all the cache misses due to pointer chasing. Maybe using an arena allocator (aka "object pool" or "region allocator") could help to reduce those cache misses.
Interface
- Which traversal order is used by the iterators? The naming doesn't clarify whether it's using in-order, pre-order, post-order, level-order or some different traversal order.
- Rule of 5 violation: copy assignment operator is missing and not declared as deleted.
- Many member function return a
boolto indicate whether an operation did some specific work. This usually doesn't matter, as the result would be the same. If it does matter in some case, the relevant information can usually be checked in another way, - Many member function names deviate from the usual standard library ones. This might lead to confusion, and requires users of the library to add adaptors if they want to use standard library functionality, e.g. algorithms.
find_nodewill never return anullptr, so the implementation could be adjusted to return anode*&(basically, exchange the loop for recursion). This would allow clearer code, as it removes the need for the obscure(*pos)->valuesyntax (replacement would bepos->value).
template restrictions
- Prefer error messages with
static_assertover SFINAE if a (member) function has no overloads. This helps with debugging and gives better error messages. (SFINAE is still fine if other overloads might be a better fit, as it only removes that candidate from the overload set.) - Non-templated functions are always a better match than templated function, so the default copy constructor will always be a better match than the templated one. To get a conditional copy constructor, the condition needs to be evaluated at class level (e.g. by inheriting from a dependent base class).
- \$\begingroup\$I once tried implementing it with smart pointers, but it quickly became irritating. Though may be I’m used them incorrectly.\$\endgroup\$Incomputable– Incomputable2018-06-26 23:54:01 +00:00CommentedJun 26, 2018 at 23:54
- \$\begingroup\$@Incomputable: Well, just worked on it a bit while finding all those issues ;) Also, forgot to note: For copy construction and copy assignment operators to work with
std::for_eachthe iterators would need to go level-order or pre-order. Otherwise the resulting internal representation might differ.\$\endgroup\$hoffmale– hoffmale2018-06-27 08:42:40 +00:00CommentedJun 27, 2018 at 8:42
Why:
std::unique_ptr<char[]> buffer = std::make_unique<char[]>(sizeof(node));new (buffer.get()) node(std::forward<Args>(args)...);auto possible_node = reinterpret_cast<node*>(buffer.get());Is this not easier to write?
std::unique_ptr<node> node = std::make_unique<node>(std::forward<Args>(args)...);auto possible_node = node.get();- \$\begingroup\$I don't know to be honest. My mind jammed for a bit, as I started writing the code late at night. I guess it is another indication to stop the bad habit.\$\endgroup\$Incomputable– Incomputable2018-06-27 16:34:27 +00:00CommentedJun 27, 2018 at 16:34
- \$\begingroup\$That would require a
nodeconstructor accepting those forwarded arguments (see my answer). Without that, your next-best choice would bestd::make_unique<node>(ValueType{std::forward<Args>(args)...}), but that won't work ifValueTypehas no copy constructor and no move constructor (which, sadly, is one of the reasonsemplaceexists in the first place).\$\endgroup\$hoffmale– hoffmale2018-06-27 17:12:17 +00:00CommentedJun 27, 2018 at 17:12 - \$\begingroup\$@hoffmale: We already know he has a constructor that works. Because the second line is a new statement passing exactly those parameters to the constructor!\$\endgroup\$Loki Astari– Loki Astari2018-06-27 17:14:44 +00:00CommentedJun 27, 2018 at 17:14
- \$\begingroup\$@MartinYork: You mean
new (buffer.get()) node(std::forward<Args>(args)...);? That doesn't call any existing constructor, it tries to use aggregate initialization, which will fail unlessArgs...somehow is convertible toValueType,ValueType, node*orValueType, node*, node*.\$\endgroup\$hoffmale– hoffmale2018-06-27 17:19:31 +00:00CommentedJun 27, 2018 at 17:19 - \$\begingroup\$Eh, wait, I misread those parentheses as braces. Sry for that. It calls a non-existing constructor unless
Argsis empty. So it might or might not compile, depending on what you're callingemplacewith.\$\endgroup\$hoffmale– hoffmale2018-06-27 17:22:39 +00:00CommentedJun 27, 2018 at 17:22
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

