Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Comments

Add all-nearest-neighbors (a1NN) iterator for tree-to-tree lookup#41

Draft
aschampion wants to merge 1 commit intogeorust:masterfrom
aschampion:tree-to-tree-aknn-squash
Draft

Add all-nearest-neighbors (a1NN) iterator for tree-to-tree lookup#41
aschampion wants to merge 1 commit intogeorust:masterfrom
aschampion:tree-to-tree-aknn-squash

Conversation

@aschampion
Copy link
Contributor

Add the ability to efficiently find the nearest neighbor in a target
tree for each point in a query tree. This works by traversing the query
tree depth-first and at each query tree node pruning nodes from the set
of candidate subtrees of the target tree that can not potentially hold
the nearest neighbors for any point in the query tree node.

This results in speedups on the order of 1.3x versus individual lookup
of the query points, and this speedup increases with the size and
dimensionality of the trees.

This all-nearest-neighbors or "a1NN" lookup is common in point cloud comparison algorithms, and is our primary use ofrstar.

The goods

$ cargo bench allall to all tree lookup  time:   [372.09 us 375.74 us 379.91 us]                                   ...all to all point lookup time:   [494.12 us 496.84 us 500.37 us]

Note that this difference in performance grows as the trees get larger.

Details of the algorithm

The algorithm works by traversing the query tree in DFS and keeping a stack of pruned subtrees of the the target tree for each depth of that traversal. These subtrees cover the potential nearest neighbors for any point in the query tree node at that depth. This allows points in the query tree to effectively reuse computation by only needing to search for their nearest neighbor in these subtrees.

The pruning works similarly to the existing pruning for single point lookups withmin_max_dist_2, only instead of doing a point-to-envelope tight upper bound, we need to find an envelope-to-envelop tight upper bound. This is implemented inEnvelope::max_min_max_dist_2 forAABB. Conceptually it is the maximum overmin_max_dist_2 for any point in the envelope, and is naively implemented that way by iterating over the extrema (corners) of the envelope.

Here's an illustrative diagram:

image

Herep isA.max_min_max_dist_2(&B) andq isB.max_min_max_dist_2(&A).m is the maximum distance between the envelope, so that the diagram can prime your intuition that this distance can do better than that naive bound.

So for each query tree node we consider, we look at the candidate target subtrees of the parent (or children of those subtrees), and find the minimummax_min_max_dist_2 of that set of subtrees. This means that forany point in the query node, theremust bea nearest neighbor within that distance in some subtree. Thus we can prune any subtree whoseminimum distance to the query node is greater than that distance.

For more details see the implementation. I tried to provide plenty of comments.

Notes

  • This depends onAabb: fix min_max_dist_2 consistency with distance_2 #40 for consistency in the distances used by the pruning, and tests will not pass until that PR is merged.
  • There may be a clever optimization formax_min_max_dist_2 like the one formin_max_dist_2 inOptimizations to nearest neighbor #35, but I haven't thought of it and am somewhat skeptical because of the combinatorial aspect.
  • After coming up with this I did a quick lit survey and haven't found an exact match. The closest I've found so far isthis paper, but it differs in that:
    • It's not an R-tree
    • It's doing akNN rather than a1NN (more complicated, although this PR's approach could be generalized in a similar way)
    • It uses the less optimal maximum distance between envelopes
  • Thanks to@clbarnes for feedback.

Stoeoef reacted with thumbs up emoji
Add the ability to efficiently find the nearest neighbor in a targettree for each point in a query tree. This works by traversing the querytree depth-first and at each query tree node pruning nodes from the setof candidate subtrees of the target tree that can not potentially holdthe nearest neighbors for any point in the query tree node.This results in speedups on the order of 1.3x versus individual lookupof the query points, and this speedup increases with the size anddimensionality of the trees.
@jefferis
Copy link

Dear Andrew, I realised I dropped the ball on responding to an earlier query on this. Your implementation looks neat and is along the lines of what I was thinking (that reference you linked is one of the all knn refs I had looked at). However I have to say that I am rather disappointed about the final speed-up. I had honestly expected something in the range of an order of magnitude given consistent locality for many points. Is it possible in the end that the density of points in neuron data is just a bit low? Do you know where the algorithm is spending time right now? Is it very different from the naive implementation of scanning all points with regular knn?

@aschampion
Copy link
ContributorAuthor

Hi Greg, I think you may have meant to respond atclbarnes/nblast-rs#20, so I'll respond there.

@aschampionaschampion marked this pull request as draftJune 19, 2020 14:28
@aschampion
Copy link
ContributorAuthor

Reverting this to draft status as there are performance pitfalls for 3d data that will need to be looked into.

@Stoeoef
Copy link
Collaborator

Thank you very much for this contribution!
This looks really promising and the feature would fit in well with the rstar library.

I'm still trying to understand how all of this works. Thanks a lot for the detailed description and comments, those make this much easier.
One thing I'm wondering about is the failing unit test. The distance between the expected / the actual point seems to be a bit too large to be explainedjust by rounding errors. OTOH,#40 fixes the issue, which may be either a coincidence or the actual fix. I'll try to investigate a bit in that direction.

@aschampion
Copy link
ContributorAuthor

There are a few points to consider wrt#40 and precision issues. Most important is that the size of the discrepancy caused by rounding errors, etc., can be a much smaller magnitude than the discrepancy of the distance of the returned neighbor. A discrepancy near the epsilon level in the wrong direction will causedistance_2_if_less_or_equal to returnNone. This can cause the correct neighbor to be disregarded, and other neighbors to be returned instead, which may be arbitrarily far from the correct neighbor. The same is true for pruning in this PR.

I'm running into this right now after making several fixes and refactors to this PR, in that the all nearest neighbors iterator and the single nearest neighbor function return different results rarely. This isn't because one is wrong, but because both are: sometimes one yields the correct neighbor, sometimes the other. This only shows up consistently when exercised with large (> 100K node) 3D trees, but of course may be happening undetected with one or both methods more frequently.

As a real example, here's a node being ignored by the existing single nearest neighbor lookup viadistance_2_if_less_or_equal due to a small difference (0.00000000000000109):

[rstar/src/object.rs:226] distance_2 = 0.008569106766881232[rstar/src/object.rs:226] max_distance_2 = 0.00856910676688123

which results in this being ignored and thus a much larger difference (4%!) in the returned neighbor vs the correct one given byall_nearest_neighbors:

[rstar/src/algorithm/nearest_neighbor.rs:538] single_neighbor.unwrap().distance_2(neighbors.query) = 0.008943686353908992[rstar/src/algorithm/nearest_neighbor.rs:538] neighbors.target.distance_2(neighbors.query) = 0.008569106766881232thread 'algorithm::nearest_neighbor::test::test_all_nearest_neighbors' panicked at 'assertion failed: `(left == right)`  left: `Some([0.016142428043963264, 0.3308523413774229, 0.08940400251818637])`, right: `Some([0.12212996229489037, 0.18978760533844952, 0.07978876040511107])`', rstar/src/algorithm/nearest_neighbor.rs:540:13

Whilemin_max_dist_2 could be further changed to something even more identical to the results oflength_2 (with#40 it differs fromlength_2 only in the order the final additions occur, but this is already enough for small discrepancy due to non-associativity), the underlying issue remains that distances generated different ways (such as throughEnvelope::distance_2 vsPointExt::distance_2) are compared. While returning differentneighbors that differ only by an epsilon is not a problem, comparisons elsewhere will be.

One solution could be adding anepsilon associated const or method toRTreeNum, whichdistance_2_if_less_or_equal and other hard thresholding comparisons can use. For int types this would beZero::zero and for floats it could be their associated::EPSILON. How does that sound?

@aschampion
Copy link
ContributorAuthor

For now a small change to#40 makes it identical to itslength_2 results before#35 while retaining most of the performance there, so I will push that amended change to#40 later this weekend. That prevents most of the discrepancy issues between different nearest neighbor methods.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@aschampion@jefferis@Stoeoef

[8]ページ先頭

©2009-2026 Movatter.jp