`<algorithm>`: Implement worst-case linear-time `nth_element` #5100
Conversation
davidmrdavid left a comment
Thanks a lot for this PR ⭐.
Left some comments, mostly nits that I think would improve readability, and some questions.
muellerj2 commented Mar 9, 2025 (edited)
New benchmarks with new adversary2 and minor changes to benchmark code (AMD Ryzen 7 7840HS):

Before:

After:
StephanTLavavej commented Apr 23, 2025
Thanks! 💚 And apologies for the significant delay in getting around to reviewing this! I pushed minor cleanups to the product code and moderate refactorings to the test/benchmark code. Please meow if I messed anything up. I reviewed @davidmrdavid's comments and I believe we've addressed them all.

5950X benchmark results:
StephanTLavavej commented Apr 23, 2025
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Merged b5df16a into microsoft:main
StephanTLavavej commented Apr 23, 2025
Thanks for fixing this major bug that went unfixed for decades by some of the world's most experienced C++ Standard Library implementers! 😻 🛠️ 💚
BillyONeal commented Apr 24, 2025
I wanted to do this for ages, and I'm so happy someone finally has! Thank you so much!
Fixes #856 for `std::nth_element` and `std::ranges::nth_element`. This implements a fallback to the median-of-medians-of-five algorithm when the quickselect algorithm seems to be making too little progress.

The median-of-medians algorithm is mostly the textbook version, with two minor tweaks:
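Setting those tweaks aside, the textbook median-of-medians-of-five idea can be sketched as follows. This is a hypothetical illustration (the function name `mom_select` and the copy-based three-way partition are mine), not the PR's implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical textbook sketch of median-of-medians-of-five selection:
// returns the k-th smallest element (0-based) of v in worst-case linear
// time. NOT the code from this PR; illustration only. Precondition: k < v.size().
int mom_select(std::vector<int> v, std::size_t k) {
    const std::size_t n = v.size();
    if (n <= 5) {
        std::sort(v.begin(), v.end());
        return v[k];
    }

    // 1. Take the median of each group of five (the last group may be short).
    std::vector<int> medians;
    for (std::size_t i = 0; i < n; i += 5) {
        const std::size_t len = std::min<std::size_t>(5, n - i);
        std::sort(v.begin() + i, v.begin() + i + len);
        medians.push_back(v[i + (len - 1) / 2]);
    }

    // 2. The pivot is the median of those group medians, found recursively.
    const int pivot = mom_select(medians, (medians.size() - 1) / 2);

    // 3. Three-way partition around the pivot (copies, for simplicity).
    std::vector<int> lo, eq, hi;
    for (const int x : v) {
        (x < pivot ? lo : x == pivot ? eq : hi).push_back(x);
    }

    // 4. Recurse only into the part that contains the k-th element.
    if (k < lo.size()) {
        return mom_select(std::move(lo), k);
    }
    if (k < lo.size() + eq.size()) {
        return pivot;
    }
    return mom_select(std::move(hi), k - lo.size() - eq.size());
}
```

The key property is that the recursively chosen pivot is guaranteed to have a constant fraction of the elements on each side, so the recurrence T(n) ≤ T(n/5) + T(7n/10) + O(n) solves to O(n).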
Benchmark results
`bm_uniform` just applies `nth_element` to an integer array of the given length. The integer array is uniformly sampled from a fixed seed. This is to check that the worst-case fallback does not noticeably worsen the processing time on such a sequence. `bm_tunkey_adversary` applies `nth_element` to a sequence on which the implemented quickselect algorithm performs terribly.

Before:

After:
As expected, the fallback greatly improves the running time for `bm_tunkey_adversary`. The timings for `bm_uniform` are about on par, or more precisely even a bit better with this PR on my machine.

The fallback heuristic
`std::sort` switches to its fallback when the recursion depth exceeds some logarithmic threshold. We could use the same heuristic as well; however, this would not guarantee linear time in the worst case, but "only" an O(n log n) bound.

So I opted for an adaptive depth limit: Like the heuristic for `std::sort`, it assumes that each iteration should reduce the range of inspected elements by 25 %. But while `std::sort` derives a maximum recursion depth from this assumption, this heuristic falls back to the median-of-medians algorithm when the actual size of the processed sequence exceeds the desired size by some constant tolerance factor (currently about 2) during some iteration. Thus, the total number of processed elements over all quickselect iterations is bounded by a multiple of the sequence length times a geometric sum, ensuring worst-case linear time overall. At the same time, the tolerance factor introduces some leeway so that one or two bad iterations (especially at the beginning) don't trigger the fallback immediately.

Obviously, there are many possible choices for the desired percentage reduction per iteration and the tolerance factor. But the benchmarks seem to suggest that the chosen values aren't too bad; a smaller percentage reduction or a larger margin factor noticeably worsens the `bm_tunkey_adversary` benchmark but results in little difference for `bm_uniform`. Besides, the implementation of `std::sort` already sets a precedent for a desired reduction of 25 % per iteration.

Test
The newly added test applies `nth_element` to the same worst-case sequence as the benchmark. This makes sure that the fallback is actually exercised by the test. (I think it's also the first test that exercises the quickselect algorithm and not just the insertion sort fallback.)
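The "ideal size vs. actual size" idea from the fallback heuristic above can be made concrete with a toy quickselect loop. All names, the small-range threshold, and the use of `std::sort` as a stand-in for median-of-medians are my own illustrative assumptions; this is not the PR's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy illustration of the adaptive depth limit (NOT this PR's
// code): quickselect that expects each iteration to shrink the range by
// 25 %, and falls back once the range exceeds twice its "ideal" size.
// Precondition: k < v.size(). Returns the k-th smallest element (0-based).
int quickselect_adaptive(std::vector<int> v, std::size_t k) {
    std::size_t first = 0;
    std::size_t last  = v.size();
    // How large the range "should" be if every iteration removed >= 25 %.
    std::size_t ideal = v.size();

    while (last - first > 32) { // small ranges are sorted at the end
        if (last - first > 2 * ideal) {
            // Too little progress: a real implementation would switch to
            // median-of-medians here; std::sort is a simple stand-in.
            std::sort(v.begin() + first, v.begin() + last);
            return v[k];
        }
        ideal -= ideal / 4; // expect a 25 % reduction per iteration

        // Median-of-three pivot selection.
        const int a = v[first];
        const int b = v[first + (last - first) / 2];
        const int c = v[last - 1];
        const int pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

        // Partition: [first, p) < pivot, [p, q) == pivot, [q, last) > pivot.
        const auto lt = std::partition(v.begin() + first, v.begin() + last,
                                       [pivot](int x) { return x < pivot; });
        const auto le = std::partition(lt, v.begin() + last,
                                       [pivot](int x) { return !(pivot < x); });
        const auto p = static_cast<std::size_t>(lt - v.begin());
        const auto q = static_cast<std::size_t>(le - v.begin());

        if (k < p) {
            last = p;
        } else if (k < q) {
            return pivot; // the k-th element equals the pivot
        } else {
            first = q;
        }
    }
    std::sort(v.begin() + first, v.begin() + last);
    return v[k];
}
```

Because `ideal` shrinks geometrically while the fallback caps the actual range at `2 * ideal`, the total work across all iterations is bounded by a geometric series times n, which is the linear-time argument sketched in the description above.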