Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork8.1k
Make hexbin mincnt consistently inclusive of lower bound#21381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
jklymak commentedOct 19, 2021
Hi@RebeccaWPerry, maybe I'm mistaken, but I think the problem in#20877 is that if |
RebeccaWPerry commentedOct 19, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@jklymak it was super interesting to look into! Check out the example I added -- where C is all ones. Surely if I'm "scaling" by 1 there should be no difference? if C is none: hexagons with less than mincount are set to nan (code) if C is provided: hexagons with more than mincount are set to the value (code) And I found no evidence that the scaling of C should impact the count -- count is simply count as implemented and as the bug reporter expected. |
QuLogic commentedOct 19, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I don't know about the scaling, but the condition on This would need a test though. |
jklymak commentedOct 20, 2021
Agreed, I originally misdiagnosed the problem. This change is fine, but as@QuLogic suggests a test would be nice. You can probably just |
RebeccaWPerry commentedOct 20, 2021
Sounds good - I'll add a test! |
tacaswell commentedOct 20, 2021
This PR is affected by a re-writing of our history to remove a large number of accidentally committed filessee discourse for details. To recover this PR it will need be rebased onto the new default branch (main). There are several ways to accomplish this, but we recommend (assuming that you call the matplotlib/matplotlib remote git remote updategit checkout maingit merge --ff-only upstream/maingit checkout YOUR_BRANCHgit rebase --onto=main upstream/old_master# git rebase -i main # if you prefergit push --force-with-lease# assuming you are tracking your branch If you do not feel comfortable doing this or need any help please reach out to any of the Matplotlib developers. We can either help you with the process or do it for you. Thank you for your contributions to Matplotlib and sorry for the inconvenience. |
RebeccaWPerry commentedOct 22, 2021
I think I have addressed the feedback -- added a test and rebased to main. Please let me know if there is anything else to address! |
anntzer commentedOct 25, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
While I agree with the change (which probably needs an API change note), there is an additional issue to consider (#16865 (comment)): with this PR, |
anntzer left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Per#21381 (comment); anyone can dismiss once handled.
timhoffm commentedOct 25, 2021
A quick thought (I've not followed the situation in detail, so might be wrong):
Or shouldwe wrap every |
anntzer commentedOct 25, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
But I think(?) the problem also comes from the fact that the default reducer is actually a bit weird; e.g. I would rather have expected it to be sum() rather than mean(). For example likely makes more sense (I know, that's pretty vague...) than and sum() is just fine with empty inputs (and I'd rather have sum() return 0 rather than nan). (Not that the current situation is really better, I guess... but my point is that auto-wrapping is not a complete solution either.) |
jklymak commentedOct 26, 2021
Yes, I agree, this probably has to be fixed before this can go in. |
fundamental issue with accumulation for 0-sized bins
anntzer commentedOct 26, 2021
re-posting my message on#16865 (let's keep the discussion here instead)
|
RebeccaWPerry commentedOct 30, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm parsing my way through the various options here - in response to this:
I think it's important for sum() to return nan when where are no points in a cell because no points is different than having points that sum to 0 (either all points are 0, or there are negative and positive values that balance each other out). |
RebeccaWPerry commentedOct 30, 2021
@anntzer thanks for flagging this interconnected issue. Please bear with me since this is my first PR that has gotten this far.
Weights C ARE specified
Here are the paths forward I can see in the thread above and from thinking about it some more:
Am I missing any viable options? Any recommendations on which path forward to choose? I like 1 or 2, but I'm not sure how to weigh the importance of maintaining the same default behavior of a function from one code release to the next. Thanks in advance for the guidance! I'm happy to make the changes if I can get a clear sense of what the best direction to take this in is! |
anntzer commentedOct 31, 2021
You ran into a fairly complicated issue to start with, and you are doing well in keeping all relevant points in your head :-) I agree with your description of the current behavior. We also try very hard not to change behavior,especially if there's no deprecation warning in between (silently changing the plots of end users is not nice). Here, I think the agreement is that we want mincnt to behave consistently regardless of whether C is given, so there will need to be a backcompat break at some point, and therefore a deprecation warning. I also agree that the magic of (3) is not really a good option (we can't know what warnings we need to suppress, and what warnings are legitimate ones that need to be propagated out). In my view, defaulting reduce_C_function to mean() doesn't make sense, just like hist2d defaults to summing weights that fall into a bin, rather than averaging them. Given that we're already going to have a deprecation period for the change in mincnt semantics, we may just as well do both of them at once, and both switch to sum() and to inclusive mincnt. (This is something where I'd like other devs to weigh in, in particular@dstansby, who has recently played with hexbin().) Therefore, I guess the deprecation would be something like (unchecked...) and in the future (after the deprecation period), even though the signature will now be still add a special-case for |
jklymak commentedNov 22, 2021
I don't think its as complicated as all that. Currently if C is set, and a bin is empty, the bin is set to NaN. The suggested change would simply be: iflen(vals)>=mincntandlen(vals)>0:lattice1[i,j]=reduce_C_function(vals)else:lattice1[i,j]=np.NaN That doesn't require a deprecation, and is consistent with the docs. It does require an API note, since it is fixing a bug? |
jklymak commentedFeb 1, 2022
Popping to draft until the review comments can be dealt with. Feel free to mark active when ready... |
rcomer commentedDec 12, 2023
Hi@RebeccaWPerry, thank you for all your work digging into this. It looks like the issue you were targeting has recently been fixed by another pull request (by basically the same approach). The fix was#26113, with some follow up in#27179 - a pretty gnarly problem! So I'm going to close this one. |
Uh oh!
There was an error while loading.Please reload this page.
PR Summary
This addresses#20877. I dug into the root cause and found that "mincnt" is documented as beingexclusive, sometimes behaves asinclusive (default when you don't provide C), and sometimes behaves asexclusive (when you do provide C). In this PR, I make three small changes to standardize on mincnt beinginclusive.
Prior to the fix:
After the fix:
Expected behavior is for both of these plots to be identical (adapted from example in#20877). The number of points reported by the print statements should be identical as well.:
PR Checklist
pytestpasses).flake8on changed files to check).flake8-docstringsand runflake8 --docstring-convention=all).doc/users/next_whats_new/(follow instructions in README.rst there).doc/api/next_api_changes/(follow instructions in README.rst there).