ENH: add `quantile` function with weights support #494
base: main
Conversation
lucyleeow commented Oct 21, 2025

> quantile function - method="linear", no weights

I am assuming all delegate packages do what numpy does in terms of broadcasting? If that is the case, we should probably follow, unless we think scipy has better rules?
> quantile function with weights support

```python
    raise ValueError(msg)
else:
    if method not in {"inverted_cdf", "averaged_inverted_cdf"}:
        msg = f"`method` '{method}' not supported with weights."
```
only method x/ y support weights?
```python
methods = {"linear", "inverted_cdf", "averaged_inverted_cdf"}
if method not in methods:
    msg = f"`method` must be one of {methods}"
```
Sort methods to get a deterministic output?
I think it's deterministic already. But do you mean declaring `methods` in sorted order? Like this: `methods = {"averaged_inverted_cdf", "inverted_cdf", "linear"}`
I thought this was not deterministic - but maybe this was in older python versions or between different OS, or maybe I just misremembered? In any case - sorry about the noise.
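For what it's worth, both observations can be reconciled: within one interpreter run, set iteration order is stable, but across runs hash randomization (`PYTHONHASHSEED`) can reorder a set of strings, so the rendered message may differ between processes. A small sketch (mine, not from the PR) of the sorted-message idea:

```python
# Sets of strings can iterate in a different order in different interpreter
# runs (hash randomization), so sorting gives a reproducible error message.
methods = {"linear", "inverted_cdf", "averaged_inverted_cdf"}
msg = f"`method` must be one of {sorted(methods)}"
print(msg)
```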
```python
    raise ValueError(msg)
nan_policies = {"propagate", "omit"}
if nan_policy not in nan_policies:
    msg = f"`nan_policy` must be one of {nan_policies}"
```
ditto
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
lucascolley commented Oct 25, 2025
@cakedev0 would you be able to fill out the PR description a little more to justify API decisions, primarily comparing how the implementation compares to what is available in NumPy and SciPy respectively? Is there a link to a PR which would use this in sklearn, or are we not at that stage yet?
cakedev0 commented Oct 25, 2025
Added some comparisons with numpy/scipy in the PR description.
We are not at this stage yet, but I would be happy to open an "experiment PR" where I replace most calls to
lucascolley commented Oct 25, 2025
Thanks, yeah, that sounds like a good idea.
cakedev0 commented Oct 26, 2025
Updates from working on the "experiment PR" (you can see the diff here: cakedev0/scikit-learn#3):
cakedev0 commented Oct 27, 2025
(putting this in draft mode while I'm working on supporting nan_policy="omit" for the non-weighted case)
lucyleeow commented Oct 28, 2025
FYI @cakedev0, there is currently a discussion about nan API in numpy: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/P4EEL5P2PNDASWLI24CPHOW2KG2LZ3YY/
Can you expand?
cakedev0 commented Oct 28, 2025 (edited)
Sorry, not the (but I guess it's fine not to support all the methods in the array API version)
mdhaber commented Nov 4, 2025 (edited)
Perhaps I can't be entirely objective, but I think there's also an objective argument against a split between
It sounds like the pain points are the broadcasting behavior and weights. Let's see if I can address those.
NumPy is self-inconsistent on this one. NEP 5 specifies what generalized universal functions should look like. I don't know the history - maybe
Now, this is not to say that either of the NumPy precedents should be the final answer for the array API or
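To illustrate what gufunc-style (NEP 5) semantics would look like for quantile, here is a sketch of mine using `np.vectorize` (not code from either library): the probability broadcasts element-wise against the loop dimensions of the data, whereas NumPy's actual `np.quantile` prepends `p`'s shape to the output (a Cartesian product).

```python
import numpy as np

# Gufunc-style quantile with signature "(n),()->()": x has one core
# dimension; each output element pairs one slice of x with one p.
gu_quantile = np.vectorize(
    lambda xi, pi: np.quantile(xi, pi), signature="(n),()->()"
)

x = np.arange(200.0).reshape(4, 5, 10)
p = np.full((4, 5), 0.5)    # one probability per (4, 5) slice
out = gu_quantile(x, p)     # loop dims broadcast element-wise: shape (4, 5)

# NumPy's np.quantile instead prepends p.shape to the result:
out_np = np.quantile(x, np.array([0.25, 0.5, 0.75]), axis=-1)  # shape (3, 4, 5)
```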
If I can get some feedback about scipy/scipy#22644 - which is about
lorentzenchr commented Nov 5, 2025
After getting weights into numpy quantile/percentile, I was too exhausted to continue pushing it elsewhere. At least now, there is one reference, numpy. I would love to see it adopted elsewhere. Quantiles are just very important in many fields.
mdhaber commented Nov 5, 2025
I drafted it for
The same code works for all H&F methods. I was a little concerned about non-integer weights before, but I don't think there's a real problem. For integer weights, it does the same thing as repeated samples, so we can document that these are frequency weights. It's just that for most methods, the interpolation actually *does* depend on the total amount of data / total weight.

Details:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(328983495438219)
p = np.linspace(0, 1, 300)
res = np.quantile([0, 1], p, method='linear')
res2 = np.quantile([0, 0, 1, 1], p, method='linear')
res3 = np.quantile([0, 0, 0, 1, 1, 1], p, method='linear')
plt.plot(p, res, p, res2, p, res3)
```

(plot omitted: the three linear-interpolation curves differ as the sample is duplicated)
mdhaber commented Nov 6, 2025 (edited)
PR is up at scipy/scipy#23941. Here's a wrapper that hamstrings

```python
import numpy as np
from scipy import stats

def quantile_cartesian_product(x, p, method='linear', axis=0):
    p_shape = p.shape
    p = np.ravel(p)
    x = np.moveaxis(x, axis, -1)
    res = stats.quantile(x, p, method=method, axis=-1, keepdims=True)
    if len(p_shape):
        res = np.moveaxis(res, -1, 0)
        res = np.reshape(res, p_shape + res.shape[1:])
    else:
        res = np.squeeze(res, axis=-1)
    return res

# test
rng = np.random.default_rng(234892398458345982)
x = rng.random((6, 7, 8))
for axis in [-1, 0, 1]:
    for p_shape in [(), (1,), (2,), (2, 3)]:
        p = rng.random(p_shape)
        kwargs = dict(axis=axis, method='weibull')
        res = quantile_cartesian_product(x, p, **kwargs)
        ref = np.quantile(x, p, **kwargs)
        np.testing.assert_allclose(res, ref)
```
lucascolley commented Dec 1, 2025
where does this stand now that scipy/scipy#23941 has merged?
cakedev0 commented Dec 8, 2025
Personally, I don't really agree with some decisions that were made in
Also, I'm more busy lately and I don't want to spend time porting the quantile function from scipy here: it's really big now and relies on a lot of internal scipy helpers, so it would be a lot of work.
Being very new to the ecosystem, I'm not sure my opinion is well informed; still, here are my disagreements with the choices that were made in scipy:
About the broadcasting behavior: I agree that the behavior of
So for me, it's not clear which option is the best. IMO, here are the pros and cons of the scipy approach:
About the weights: personally, I don't understand the interest of frequency weights when you have a valid interpretation only for integers. Those should be called
For methods
lucascolley commented Dec 8, 2025
I don't have a horse in the race w.r.t. the nicest user interface, so I will leave that discussion to those already involved. What I will say is that from an ecosystem perspective, now that SciPy has committed to the public API, I would in principle be happy with a situation where we:
The downside of having two functions with the same name providing different semantics is real. But if the consensus from the technical discussion is that there are real tradeoffs such that each set of semantics has situations in which it is preferable, that might be acceptable.
mdhaber commented Dec 8, 2025 (edited)
Naturalness is subjective, so we valued consistency with the rest of
Of course they are. But like every other
Now, I don't necessarily like the existence of
Every other independent two-sample statistic in SciPy (e.g.
If you mean "they don't broadcast the arrays exactly like
After that, the only difference is that the length of the output core dimension of
About a dozen functions in SciPy accept frequency weights.
cakedev0 commented Dec 9, 2025 (edited)
About NEP 5: can you be more precise about how scipy's quantile respects it and numpy's quantile doesn't? I've read it a couple of times, and honestly I don't fully understand. One of the main examples is
So to your question:
The answer is not as obvious as you make it sound. If the Cartesian product is valid for
Edit:
Both exist. So maybe it's fine for both versions of quantile to exist in the array-API space? 😄
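For readers following along, the Cartesian-product behavior under discussion is what NumPy does today: every probability is evaluated against every reduced slice, and `p`'s shape is prepended to the output. A small demo of mine:

```python
import numpy as np

x = np.ones((4, 5))
p = np.array([0.25, 0.5, 0.75])

# axis=-1 reduces the length-5 dimension; p.shape is prepended to the
# result, so every p is paired with every row of x (Cartesian product).
out = np.quantile(x, p, axis=-1)
print(out.shape)  # (3, 4)
```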
cakedev0 commented Dec 9, 2025
To me, this shows that quantile is different from all other functions in
Also, if the default of
cakedev0 commented Dec 9, 2025 (edited)
I'm not objective because I invested time in some code that will be completely thrown away if scipy.stats choices are adopted here. I argued for my choices in detail, or at least clarified that other choices come with trade-offs too. Now, I'll let people who have no horse in the race decide, and:
mdhaber commented Dec 9, 2025 (edited)
"which is now
Which is different from what NEP 5 says, that
```python
import numpy as np

N = 10
a = np.ones((3, 5, N))
b = np.ones((5, N))
np.inner(a, b).shape  # (3, 5, 5)
```

I'd suggest that
Which states:
I say "correct" because presumably the discussions that led to the inclusion of
The Cartesian product is OG NumPy behavior. NEP 5 broadcasting superseded it. Let's not go backwards.
cakedev0 commented Dec 9, 2025
... Well that is very convincing. Immense thanks for the detailed explanation, I definitely learned a lot there (and sorry for the time you had to invest; I hope I'll be able to make good use of the understanding I gained here in future contributions).
Indeed no. Sorry, I assumed that based on the very similar names, my bad!
cakedev0 commented Dec 9, 2025
I still think that this behavior isn't right:
But this is more of a detail, and I do understand why it's like this. I'll try to see if I can propose something I would find better, but I'm not sure I'll be able to. Maybe no
You said you don't like
mdhaber commented Dec 10, 2025 (edited)
You're welcome to argue against the existence of
OK, let's think through the implications of that signature with
But

```python
from scipy.stats import expectile
import numpy as np

expectile(np.arange(10), np.asarray([0.5]))  # np.float64(4.5)
expectile(np.arange(10), 0.5)                # np.float64(4.5)
```

The documented expectation is that
So really, we have this situation where we want to support
So you want to get the best of both worlds - efficiency benefits of
Resolves #340
Specs:
- methods: `"linear"`, `"inverted_cdf"`, `"averaged_inverted_cdf"`
- methods supported with weights: `"inverted_cdf"`, `"averaged_inverted_cdf"`

Comparison with NumPy:

Those specs match NumPy's `quantile` (for `nan_policy="propagate"`) and `nanquantile` (for `nan_policy="omit"`). Differences are:

- a single function with `nan_policy="propagate"|"omit"` (inspired by scipy) instead of two functions like numpy (`quantile` for `nan_policy="propagate"` and `nanquantile`). Why? Less code/doc duplication, and clearer behavior IMO.
- the `"averaged_inverted_cdf"` method (only the `"inverted_cdf"` method is supported by numpy): we need this in scikit-learn.
- `nan_policy="omit"`: there are a few calls to `nanpercentile` in scikit-learn. Would be easy to implement, following what has been done in scipy.
- `xpx.partition` relies on sorting when not delegating. This could be worked
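The `quantile`/`nanquantile` split referenced above can be seen directly in NumPy (a small sketch of mine):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# np.quantile propagates NaNs (the nan_policy="propagate" analogue) ...
print(np.quantile(x, 0.5))     # nan

# ... while np.nanquantile ignores them (the nan_policy="omit" analogue).
print(np.nanquantile(x, 0.5))  # 2.0
```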
Main difference is the broadcasting/reduction behavior. We aligned on numpy for this one.
Also, SciPy doesn't support weights.
Implementation for the non-weighted case is more or less copy-pasted from SciPy, except that I've only kept support for 3 methods.
Future use of `xpx.quantile` in scikit-learn:

Those specs should be enough to replace `sklearn.utils.stats._weighted_percentile` and to replace most uses of `np.quantile`/`percentile` when rewriting numpy-based functions to support the Array API standard.
sklearn.utils.stats._weighted_percentileand to replace most uses ofnp.quantile/percentilewhen rewriting numpy-based functions to support Array API standard.Note that the implementation for the weighted case differs a bit from sklearn's
_weighted_percentile: it's mostly the same approach (sortweightsbased onaand compute cumulative sum), but they differ in the way edge cases are handled (null weights mostly). I believe my implementation to be easier to read and to get right, and equivalent in terms of performance (dominated by argsort for big inputs).Still TODO:
- `_weighted_percentile`: to see if there is something I missed in my tests
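For readers unfamiliar with the approach described above ("sort `weights` based on `a` and compute cumulative sum"), here is a minimal sketch of mine for the `inverted_cdf` case; the function name is hypothetical and edge cases (zero weights, NaNs) are deliberately ignored:

```python
import numpy as np

def weighted_quantile_inverted_cdf(a, q, weights):
    # Hypothetical sketch: sort the weights by the sample values, build the
    # weighted empirical CDF via a cumulative sum, then invert it at q.
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(a)                  # dominant cost for large inputs
    a_sorted = a[order]
    cdf = np.cumsum(w[order]) / np.sum(w)  # weighted empirical CDF
    # "inverted_cdf": smallest sample whose CDF value reaches q
    return a_sorted[np.searchsorted(cdf, q, side='left')]

print(weighted_quantile_inverted_cdf([1.0, 2.0, 3.0], 0.5, [1, 2, 3]))  # 2.0
```

With integer weights this matches the repeated-samples interpretation discussed earlier in the thread.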