- Notifications
You must be signed in to change notification settings - Fork1.4k
[DRAFT] Use Rust'ssort_by instead oftimsort::try_sort_by_gt#6096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
## Time improvements**rust-timsort**(Timing from Linux laptop - I didn't repeat as 16 mins long)```sh✦ ❯ time cargo run --release -- -c "from random import random;sorted([random() for i in range(1_000_000)]); print('DONE');" Finished `release` profile [optimized] target(s) in 0.16s Running `target/release/rustpython -c 'from random import random;sorted([random() for i in range(1_000_000)]); print('\''DONE'\'');'`DONEreal 16m52.217suser 16m51.926ssys 0m0.174s```**Built-in**(Timing from MacOS laptop)```sh❯ time cargo run --release -- -c "from random import random;sorted([random() for i in range(1_000_000)]); print('DONE');" Finished `release` profile [optimized] target(s) in 0.89s Running `target/release/rustpython -c 'from random import random;sorted([random() for i in range(1_000_000)]); print('\''DONE'\'');'`DONEcargo run --release -- -c 0.96s user 0.23s system 54% cpu 2.196 total```## Verify list is sorted```sh❯ cargo run --release -- -c "from random import random; l =sorted([random() for i in range(1_000_000)]); print(all(l[i] <= l[i+1]for i in range(len(l) - 1)) ─╯);"TRUE```## TODO- This uses `.unwrap` because I don't know how to handle the error case- This doesn't use Ordering::Equal so it needs to either be confirmedthat this is a stable sorting algorithm, or else make it stableImportant Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat withCodeRabbit:
SupportNeed help? Create a ticket on oursupport page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
vm/src/builtins/list.rs Outdated
| }; | ||
| let cmp = |a:&PyObjectRef,b:&PyObjectRef| a.rich_compare_bool(b, op, vm); | ||
| let cmp = |a:&PyObjectRef,b:&PyObjectRef|{ | ||
| let res = a.rich_compare_bool(b, op, vm).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@coderabbitai can you think of a good way I can unwrap this and correctly handle thePyRef<PyBaseException> in the error case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
🧩 Analysis chain
I need to examine how similar error handling is done in the RustPython codebase, particularly around sorting operations that can fail.
🏁 Script executed:
#!/bin/bash# Look for similar sorting patterns in the codebase that handle PyResult errorsrg -n -A5 -B5 "sort.*rich_compare" --type=rust# Also look for other uses of rich_compare_bool to see how errors are typically handledrg -n -A3 -B3 "rich_compare_bool.*unwrap\|unwrap.*rich_compare_bool" --type=rust# Check for any existing fallible sorting utilities or patternsrg -n -A5 -B5 "sort.*PyResult\|PyResult.*sort" --type=rustLength of output: 192
🏁 Script executed:
#!/bin/bash# Let's look for any existing fallible comparison patterns in the codebaserg -n -A5 -B5 "rich_compare_bool" --type=rust# Look for error handling patterns in comparison operationsrg -n -A3 -B3 "PyResult.*bool" --type=rust# Check for any try_sort or fallible sorting patternsrg -n "try_sort\|fallible.*sort\|sort.*error" --type=rustLength of output: 73743
🏁 Script executed:
#!/bin/bash# Let's look at the specific sort function implementation and see how other similar functions handle errorsrg -n -A10 -B5 "sort_by" vm/src/builtins/list.rs# Look for other comparison functions that handle PyResult errors properlyrg -n -A5 -B5 "PyResult.*Ordering\|Ordering.*PyResult" --type=rust# Check if there are any existing error-handling patterns for fallible comparisons in sortingast-grep --pattern 'fn $_($$_) -> PyResult<$_> { $$$ sort_by($$$) $$$}'Length of output: 971
@jackoconnordev Looking at the code, I can see the issue clearly. Therich_compare_bool method returnsPyResult<bool>, butsort_by expects an infallible comparison function that returnsstd::cmp::Ordering.
From the search results, I notice there are commented lines showingtimsort::try_sort_by_gt was previously used, which suggests there was a fallible sorting implementation before. Here are the best approaches to handle this properly:
Option 1: Error-Propagating Sort with Early Exit
fndo_sort(vm:&VirtualMachine,values:&mutVec<PyObjectRef>,key_func:Option<PyObjectRef>,reverse:bool,) ->PyResult<()>{let op =if reverse{PyComparisonOp::Lt}else{PyComparisonOp::Gt};// Create a fallible comparison functionlet cmp_result = |a:&PyObjectRef,b:&PyObjectRef| ->PyResult<Ordering>{let res = a.rich_compare_bool(b, op, vm)?;Ok(if res{Ordering::Greater}else{Ordering::Less})};ifletSome(ref key_func) = key_func{letmut items = values.iter().map(|x|Ok((x.clone(), key_func.call((x.clone(),), vm)?))).collect::<Result<Vec<_>,_>>()?;// Use try_sort pattern items.try_sort_by(|a, b|cmp_result(&a.1,&b.1))?;*values = items.into_iter().map(|(val, _)| val).collect();}else{ values.try_sort_by(|a, b|cmp_result(a, b))?;}Ok(())}
However, Rust's standard library doesn't havetry_sort_by. So here's the practical solution:
Option 2: Error Collection Pattern (Recommended)
use std::cell::RefCell;fndo_sort(vm:&VirtualMachine,values:&mutVec<PyObjectRef>,key_func:Option<PyObjectRef>,reverse:bool,) ->PyResult<()>{let op =if reverse{PyComparisonOp::Lt}else{PyComparisonOp::Gt};// Use RefCell to capture comparison errorslet comparison_error:RefCell<Option<PyRef<PyBaseException>>> =RefCell::new(None);let cmp = |a:&PyObjectRef,b:&PyObjectRef|{// If we already have an error, just return some consistent orderingif comparison_error.borrow().is_some(){returnOrdering::Equal;}match a.rich_compare_bool(b, op, vm){Ok(res) =>{if res{Ordering::Greater}else{Ordering::Less}}Err(err) =>{*comparison_error.borrow_mut() =Some(err);Ordering::Equal// Return something to continue sorting}}};ifletSome(ref key_func) = key_func{letmut items = values.iter().map(|x|Ok((x.clone(), key_func.call((x.clone(),), vm)?))).collect::<Result<Vec<_>,_>>()?; items.sort_by(|a, b|cmp(&a.1,&b.1));// Check if any comparison failedifletSome(err) = comparison_error.into_inner(){returnErr(err);}*values = items.into_iter().map(|(val, _)| val).collect();}else{ values.sort_by(|a, b|cmp(a, b));// Check if any comparison failedifletSome(err) = comparison_error.into_inner(){returnErr(err);}}Ok(())}
Option 3: Simple Early-Exit Pattern
If you prefer a simpler approach, you could also pre-validate that all comparisons will work, but this would be less efficient for large lists.
TheOption 2 approach is what I'd recommend as it properly handles thePyRef<PyBaseException> error case while maintaining the performance characteristics of the sort operation.
vm/src/builtins/list.rs Outdated
| .collect::<Result<Vec<_>,_>>()?; | ||
| timsort::try_sort_by_gt(&mut items, |a, b|cmp(&a.1,&b.1))?; | ||
| // timsort::try_sort_by_gt(&mut items, |a, b| cmp(&a.1, &b.1))?; | ||
| items.sort_by(|a, b|cmp(&a.1,&b.1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@coderabbitai is rust's builtin sort_by a stable sorting algorithm if only Ordering::Less and Ordering::Greater are returned bycmp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
🧩 Analysis chain
@jackoconnordev No, Rust's builtinsort_by isnot a stable sorting algorithm, regardless of what the comparison function returns.
Here are the key points:
Inherently Unstable: Rust's
sort_byuses an unstable sorting algorithm (pattern-defeating quicksort/pdqsort) by design. If you need stability, you should usesort_by_stableinstead.Impact of Never Returning
Ordering::Equal: When your comparison function only returnsOrdering::LessandOrdering::Greater, you're effectively telling the algorithm that no two elements are equal. This has several implications:- The sort will still be unstable
- Elements that are actually equal (in terms of the sorting key) may not maintain their relative order
- The algorithm cannot optimize for equal elements, potentially leading to more comparisons than necessary
Performance Considerations: By never returning
Ordering::Equal, you're preventing the sorting algorithm from taking advantage of optimizations for equal elements, which could result in suboptimal performance.
In the context of your code, if you need stable sorting for Python'slist.sort() (which should be stable according to Python's specification), you should consider:
- Using
sort_by_stableinstead ofsort_by - Allowing the comparison function to return
Ordering::Equalwhen elements are truly equal
The current implementation may not match Python's expected stable sorting behavior.
🌐 Web query:
Rust sort_by algorithm stability when comparison function only returns Ordering::Less and Ordering::Greater never Ordering::Equal💡 Result:
Short answer: sort_by is stable, but stability only matters for elements that compare equal. If your comparator never returns Ordering::Equal for any two distinct elements, then there are no “ties” to preserve — stability has no observable effect. However, your comparator still must implement a total order (in particular compare(x, x) must return Equal) or sorting may panic / produce unspecified results. (doc.rust-lang.org,rust-lang.github.io)
Details and notes
- The std::slice::sort_by API is described as a stable sort (it "preserves initial order of equal elements"). (doc.rust-lang.org)
- “Stable” means: if compare(a, b) == Equal, the element that was earlier in the slice will remain earlier after sort. If your compare never returns Equal for any pair that you care about, there is nothing for the stability guarantee to preserve. (rust-lang.github.io)
- Important correctness constraint: the comparator must implement a total order (transitive, antisymmetric/reflexive). In particular compare(x, x) is expected to return Equal. If the comparator violates this (for example returns Less or Greater when called with the same element), the sort may panic or produce unspecified results. (doc.rust-lang.org)
- If you only want to ensure a deterministic tie-break (preserve original order for “equal” keys) and you use an unstable sort, a common pattern is to pair values with their original index (or use sort_by and then .then_with/then for tie-breakers).
Example patterns
- Stable sort keeping original order for equal keys:
- v.sort_by_key(|k| k.some_key); // stable
- If you use unstable sort but want stability you can tag with index:
- let mut v: Vec<(K, usize)> = v.into_iter().map(|x| (x, idx)).collect();
- v.sort_unstable_by_key(|(k, idx)| (k.clone(), *idx));
If you want, paste the comparator you’re using and I’ll check whether it properly implements a total order and whether you need to worry about stability or correctness.
Citations:
- 1:https://doc.rust-lang.org/std/primitive.slice.html?utm_source=chatgpt.com
- 2:https://rust-lang.github.io/rfcs/1884-unstable-sort.html?utm_source=chatgpt.com
- 3:https://doc.rust-lang.org/std/primitive.slice.html?utm_source=chatgpt.com
- 4:https://rust-lang.github.io/rfcs/1884-unstable-sort.html?utm_source=chatgpt.com
- 5:https://doc.rust-lang.org/std/primitive.slice.html?utm_source=chatgpt.com
🌐 Web query:
Rust std::slice::sort_by algorithm pdqsort stable unstable behavior documentation💡 Result:
Short answer
- slice::sort_by is a stable sort (preserves order of equal elements). Its current implementation is a stable algorithm (driftsort variant) and it may allocate auxiliary memory. (doc.rust-lang.org)
- slice::sort_unstable_by (and the other sort_unstable APIs) are unstable (may reorder equal elements), in‑place (no allocation), and implemented using pattern‑defeating quicksort (pdqsort). It has O(n log n) worst case, fast practical performance, and uses a fixed/randomization strategy for degeneracy cases (deterministic seed). (doc.rust-lang.org,docs.rs,rust.docs.kernel.org)
Details and practical notes
- Stability: sort_by / sort / sort_by_key are the stable family — they preserve the relative order of elements that compare equal. Use these when you rely on stability (e.g., multi‑key sorts performed as repeated stable sorts). (doc.rust-lang.org)
- Unstable sorts: sort_unstable / sort_unstable_by / sort_unstable_by_key are the in‑place, unstable family. They are typically faster and use no extra allocation — good for large primitive data or when you don't need stability. (doc.rust-lang.org,rust.docs.kernel.org)
- Algorithms:
- sort_unstable_by uses pattern‑defeating quicksort (pdqsort) by Orson Peters — a tuned quicksort variant that falls back to worst‑case safe behavior and is optimized for many real‑world patterns. (docs.rs)
- sort_by (stable) is currently based on a stable algorithm often referenced as driftsort (a mergesort/partial‑run detection + quicksort style hybrid) tuned for stability and certain input patterns. It typically allocates auxiliary memory. (doc.rust-lang.org)
- Complexity and behavior: both APIs offer O(n log n) worst‑case guarantees. pdqsort gives O(n) best case on some inputs and is much faster in many benchmarks; stable sort optimizes comparisons and can be faster on partially sorted data or when comparisons are expensive. (docs.rs,doc.rust-lang.org)
- Panics / ordering requirements: both require the comparator (or key's Ord) to implement a total order; otherwise they may panic or produce unspecified ordering. (doc.rust-lang.org)
- Allocation / no_std: stable sort (slice::sort / sort_by) allocates (so it requires the global allocator / is gated by cfg on OOM handling). sort_unstable is in‑place and works in no_std contexts. (doc.rust-lang.org,docs.rs)
If you want
- Exact doc links or the relevant lines in the Rust source (core/alloc slice implementations) — I can paste the official docs/source snippets.
- Benchmarks comparing pdqsort vs stable sort on particular data types or workloads, or small example code showing when equal elements get reordered (to illustrate stability vs instability). Which would you prefer?
Citations:
- 1:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 2:https://doc.rust-lang.org/std/primitive.slice.html?utm_source=chatgpt.com
- 3:https://docs.rs/pdqsort/latest/pdqsort/fn.sort_by.html?utm_source=chatgpt.com
- 4:https://rust.docs.kernel.org/6.12/src/core/slice/mod.rs.html?utm_source=chatgpt.com
- 5:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 6:https://doc.rust-lang.org/std/primitive.slice.html?utm_source=chatgpt.com
- 7:https://rust.docs.kernel.org/6.12/src/core/slice/mod.rs.html?utm_source=chatgpt.com
- 8:https://docs.rs/pdqsort/latest/pdqsort/fn.sort_by.html?utm_source=chatgpt.com
- 9:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 10:https://docs.rs/crate/pdqsort/latest?utm_source=chatgpt.com
- 11:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 12:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 13:https://doc.rust-lang.org/stable/src/alloc/slice.rs.html?utm_source=chatgpt.com
- 14:https://docs.rs/crate/pdqsort/latest?utm_source=chatgpt.com
@jackoconnordev I need to correct my previous response - I made an error about Rust's sorting algorithms.
Rust'ssort_by IS stable. Here's the accurate information:
sort_byis stable: It uses a stable sorting algorithm (currently driftsort) and preserves the relative order of equal elements.sort_unstable_byis unstable: This is the one that uses pdqsort and may reorder equal elements.Impact of never returning
Ordering::Equal: When your comparison function only returnsLessandGreaterfor distinct elements, stability has no observable effect because there are no "equal" elements to preserve the order of.Critical correctness issue: However, your comparatormust still implement a total order. Specifically,
compare(x, x)must returnOrdering::Equal, or the sort may panic or produce unspecified results.
Looking at your code:
let cmp = |a:&PyObjectRef,b:&PyObjectRef|{let res = a.rich_compare_bool(b, op, vm).unwrap();if res{Ordering::Greater}else{Ordering::Less}};
This comparatornever returnsOrdering::Equal, even when comparing an element to itself. This violates the total order requirement and could cause the sort to panic or behave incorrectly.
Since Python'slist.sort() should be stable, usingsort_by is the right choice, but the comparison function should properly handle equal elements by returningOrdering::Equal when appropriate.
🧠 Learnings used
Learnt from: jackoconnordevPR: RustPython/RustPython#6086File: benches/microbenchmarks/sort.py:7-7Timestamp: 2025-08-09T22:55:10.512ZLearning: In RustPython benchmarks, when testing sorting performance, benchmarking only `sorted()` is sufficient since it internally uses the same sorting logic as `list.sort()`. The only difference is that `sorted()` creates a copy first, so testing both would be redundant for evaluating the sorting algorithm's performance.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Do you know what the True and False return values ofrich_compare_bool correspond to semantically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash# Search for the definition of rich_compare_bool to understand its semanticsrg -n "rich_compare_bool" --type=rust -A 5 -B 5Length of output: 12093
@jackoconnordev Yes! Based on the codebase analysis,rich_compare_bool returns:
trueif the comparison operation is true (e.g., if you calla.rich_compare_bool(b, PyComparisonOp::Gt, vm), it returnstruewhena > b)falseif the comparison operation is false (e.g.,falsewhena <= b)
Looking at the current sorting code:
let op =if reverse{PyComparisonOp::Lt// When reverse=true, we use Lt}else{PyComparisonOp::Gt// When reverse=false, we use Gt};let cmp = |a:&PyObjectRef,b:&PyObjectRef|{let res = a.rich_compare_bool(b, op, vm).unwrap();if res{Ordering::Greater// If comparison is true}else{Ordering::Less// If comparison is false}};
This logic has a critical flaw: whenrich_compare_bool returnsfalse, it doesn't distinguish between "less than" and "equal to". For example:
- When
reverse=false,op=Gt, anda.rich_compare_bool(b, Gt, vm)returnsfalse, it could mean eithera < bORa == b - The code incorrectly maps both cases to
Ordering::Less
For a correct total order, you need to handle equality properly. The fix should be something like:
let cmp = |a:&PyObjectRef,b:&PyObjectRef|{if a.rich_compare_bool(b,PyComparisonOp::Gt, vm).unwrap(){Ordering::Greater}elseif a.rich_compare_bool(b,PyComparisonOp::Lt, vm).unwrap(){Ordering::Less}else{Ordering::Equal}};
This ensures the comparator implements a proper total order wherecompare(x, x) == Ordering::Equal.
🧬 Code Graph Analysis Results
- File: vm/src/builtins/list.rs
Snippet 1: do_sort function (uses rich_compare_bool to implement sorting comparator)
Content:
fn do_sort(
vm: &VirtualMachine,
values: &mut Vec,
key_func: Option,
reverse: bool,
) -> PyResult<()> {
let op = if reverse {
PyComparisonOp::Lt
} else {
PyComparisonOp::Gt
};
let cmp = |a: &PyObjectRef, b: &PyObjectRef| {
let res = a.rich_compare_bool(b, op, vm).unwrap();
if res {
Ordering::Greater
} else {
Ordering::Less
}
};if let Some(ref key_func) = key_func { let mut items = values .iter() .map(|x| Ok((x.clone(), key_func.call((x.clone(),), vm)?))) .collect::<Result<Vec<_>, _>>()?; // timsort::try_sort_by_gt(&mut items, |a, b| cmp(&a.1, &b.1))?; items.sort_by(|a, b| cmp(&a.1, &b.1)); *values = items.into_iter().map(|(val, _)| val).collect();} else { // timsort::try_sort_by_gt(values, cmp)?; values.sort_by(|a, b| cmp(a, b));}Ok(())}
Snippet 2: PyList cmp implementation (uses rich comparison across elements)
Content:
impl Comparable for PyList {
fn cmp(
zelf: &Py,
other: &PyObject,
op: PyComparisonOp,
vm: &VirtualMachine,
) -> PyResult {
if let Some(res) = op.identical_optimization(zelf, other) {
return Ok(res.into());
}
let other = class_or_notimplemented!(Self, other);
let a = &*zelf.borrow_vec();
let b = &*other.borrow_vec();
a.iter()
.richcompare(b.iter(), op, vm)
.map(PyComparisonValue::Implemented)
}
}
🧠 Learnings used
Learnt from: jackoconnordevPR: RustPython/RustPython#6086File: benches/microbenchmarks/sort.py:7-7Timestamp: 2025-08-09T22:55:10.512ZLearning: In RustPython benchmarks, when testing sorting performance, benchmarking only `sorted()` is sufficient since it internally uses the same sorting logic as `list.sort()`. The only difference is that `sorted()` creates a copy first, so testing both would be redundant for evaluating the sorting algorithm's performance.vm/src/builtins/list.rs Outdated
| }; | ||
| let cmp = |a:&PyObjectRef,b:&PyObjectRef| a.rich_compare_bool(b, op, vm); | ||
| let cmp = |a:&PyObjectRef,b:&PyObjectRef|{ | ||
| let res = a.rich_compare_bool(b, op, vm).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
| let res = a.rich_compare_bool(b, op, vm).unwrap(); | |
| let res = a.rich_compare_bool(b, op, vm)?; |
Does this do the trick?
if not you can do it with:
let res =match a.rich_compare_bool(b, op, vm){OK(v)=> v,Err(e) =>return e,};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There were compiler errors for both versions to do with returning a value that isn't anOrdering :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There were compiler errors for both versions to do with returning a value that isn't an
Ordering:-/
Not so sure why it fails and what's the issue but you can try with:
| let res = a.rich_compare_bool(b, op, vm).unwrap(); | |
| let res = a.rich_compare_bool(b, op, vm).unwrap_or(Ordering::Equal); |
And hope that the CI passes
ShaharNaveh commentedAug 17, 2025
@jackoconnordev I've dug a bit more and it seems that we need a failable sorting mechanism. maybe try to use something likehttps://docs.rs/try-partialord/latest/try_partialord/ |
Also use sort_unstable_by since we only get Less/Greater from the richcompare bool function anyway.Returning Equal in the error case will allow the sort to not panic.
jackoconnordev commentedAug 18, 2025
I think if this was Rust < 1.81.0 this might actually passhttps://users.rust-lang.org/t/panic-on-sort-of-f32-s-with-f32-total-cmp-in-1-81-0/117675 |
| if res{ | ||
| Ordering::Greater | ||
| }else{ | ||
| Ordering::Less | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The meaning of true/false will be different byop
Uh oh!
There was an error while loading.Please reload this page.
Relates#6093
Time improvements
rust-timsort
(Timing from Linux laptop - I didn't repeat this experiment on my Mac as it is 16 mins long)
Built-in
(Timing from MacOS laptop)
Verify list is sorted
TODO
.unwrapbecause I don't know how to handle the error case