MNT Refactor `_average_weighted_percentile` to avoid double sort #31775
base: main
Conversation
github-actions bot commented Jul 17, 2025 (edited)
result = xp.where(
    is_fraction_above,
    array[percentile_in_sorted, col_indices],
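The quoted hunk selects a per-column value with `xp.where` plus integer indexing. A NumPy analogue of that pattern is sketched below; the variable values and the false branch (`percentile_plus_one`) are illustrative guesses for this sketch, not the PR's actual logic:

```python
import numpy as np

# Shape (n_rows, n_cols) data, one candidate row index per column.
array = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
col_indices = np.arange(array.shape[1])
percentile_in_sorted = np.array([1, 0])      # row picked for each column
percentile_plus_one = np.array([2, 1])       # hypothetical alternative row
is_fraction_above = np.array([True, False])  # g > 0 per column

# Per-column choice between the two candidate rows.
result = np.where(
    is_fraction_above,
    array[percentile_in_sorted, col_indices],
    array[percentile_plus_one, col_indices],
)
# result -> array([ 2., 20.])
```

The `array[row_indices, col_indices]` form pairs the i-th row index with the i-th column, yielding one value per column, which `np.where` then selects between element-wise.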
lucyleeow commented Jul 17, 2025 (edited)
I initially thought this should be `percentile_plus_one_in_sorted`, as in the paper when g > 0, but `searchsorted` defaults to left (the equality is on the right side of the inequality), whereas the paper defines j <= pn < j+1.

`searchsorted` effectively gives i-1 < pn <= i, whereas the paper has j <= pn < j+1. This means that when pn is strictly greater than the left-hand side, `searchsorted`'s i equals the paper's j+1. When the quantile exactly matches an index, `searchsorted`'s i equals the paper's j (since the equality is on opposite sides in the paper vs `searchsorted`).
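The boundary behaviour described above can be checked directly with NumPy (a small standalone illustration, not part of the patch):

```python
import numpy as np

a = np.array([10, 20, 30, 40])

# side="left" (the default): insertion index i satisfies a[i-1] < v <= a[i],
# so a value equal to an element lands at that element's own index.
np.searchsorted(a, 20)                 # -> 1
np.searchsorted(a, 25)                 # -> 2

# side="right": insertion index i satisfies a[i-1] <= v < a[i],
# matching the paper's j <= pn < j+1 convention.
np.searchsorted(a, 20, side="right")   # -> 2
```

The exact-match case (`v == 20`) is where the two conventions differ by one, which is the off-by-one discussed in the comment above.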
Reference Issues/PRs
Supersedes #30945
What does this implement/fix? Explain your changes.
Refactor `_average_weighted_percentile` so that we are not simply running `_weighted_percentile` twice, which avoids sorting and computing the cumulative sum twice.

#30945 essentially reuses the sorted indices and calculates `_weighted_percentile(-array, 100 - percentile_rank)`. This was verbose and required computing the cumulative sum again on the negated array (symmetry could have been used to avoid recomputing the cumulative sum when the fraction above is greater than 0, i.e., g > 0 in Hyndman and Fan).

I've followed the Hyndman and Fan computation more closely: calculate g and just use j+1 (since we already know j). This did make handling the case where j+1 has a sample weight of 0 (or where there are sample weights of 0 at the end of the array) more complex.

Any other comments?
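As an illustration of the Hyndman and Fan "averaging" scheme referenced above, here is a toy unweighted sketch. It is not scikit-learn's implementation (the PR's code handles sample weights, zero weights, and the array API), but it shows the j / g bookkeeping and why only one sort is needed:

```python
import numpy as np

def averaged_percentile(a, q):
    """Toy sketch of Hyndman & Fan's averaged inverse-CDF definition
    (unweighted, for intuition only)."""
    a = np.sort(np.asarray(a, dtype=float))  # single sort, reused throughout
    n = a.size
    pn = q / 100.0 * n
    j = int(pn)                              # j <= pn < j + 1 (paper's j)
    g = pn - j                               # fractional part
    if g > 0:
        return a[j]                          # 0-based a[j] is the paper's x_{j+1}
    # g == 0: average the two neighbouring order statistics,
    # guarding the boundaries.
    if j == 0:
        return a[0]
    if j == n:
        return a[-1]
    return 0.5 * (a[j - 1] + a[j])
```

For example, `averaged_percentile([1, 2, 3, 4], 50)` hits the g == 0 branch and averages the two middle values, while `averaged_percentile([1, 2, 3], 50)` has g > 0 and returns a single order statistic. The weighted version replaces `pn` with a target on the cumulative sample weights, which is where the zero-weight edge cases mentioned above arise.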