Description
This is revisiting #4487, in which @jakevdp suggested changing the default of bins to 'auto'.
Since automatic determination is now supported in matplotlib via numpy, I think it would be great to make it the default.
The main reason for wanting the change is that many people use this for data analysis, and the behavior of bins=10 is pretty terrible in many cases (see Jake's example), yet many people still use the defaults.
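To make the difference concrete, here is a minimal sketch using only numpy. The data is made up for illustration (two narrow peaks plus two far outliers) and is not Jake's example. It compares the bin edges produced by bins=10 with those produced by bins='auto' via `np.histogram_bin_edges`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two narrow peaks near 0 and 1, plus two far outliers
# that stretch the overall range to roughly [-50, 50].
data = np.concatenate([
    rng.normal(0.0, 0.1, 5000),
    rng.normal(1.0, 0.1, 5000),
    [-50.0, 50.0],
])

# Fixed bin count: always 10 equal-width bins across the data range.
edges_default = np.histogram_bin_edges(data, bins=10)

# Data-driven bin count chosen by numpy's 'auto' estimator.
edges_auto = np.histogram_bin_edges(data, bins="auto")

n_default = len(edges_default) - 1
n_auto = len(edges_auto) - 1

counts_default, _ = np.histogram(data, bins=edges_default)

print("bins=10     ->", n_default, "bins, width", edges_default[1] - edges_default[0])
print("bins='auto' ->", n_auto, "bins, width", edges_auto[1] - edges_auto[0])
print("counts with bins=10:", counts_default)
```

With ten bins each bin is about 10 units wide, so both peaks end up in the same one or two bins and the bimodal shape is invisible. The 'auto' rule picks much narrower bins for this data. The exact count it chooses may vary between numpy versions.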
Good defaults matter. I'd love to keep educating people but no amount of educating will prevent people from using the defaults (we found this true in sklearn when mining github).
Many people use this from pandas and the actual implementation is in numpy, and @jklymak makes the case that matplotlib ideally delegates as much to numpy as possible. I am very sympathetic to this position.
My main claim is that somewhere the default should change.
Currently my position is that matplotlib is the best place for that. I don't think having pandas change the default would be as good, since it would lead to inconsistencies between pandas and matplotlib. I would be happy with numpy changing the default, but the use cases of numpy are not necessarily related to visualization or even data analysis at all, so it's less clear to me that 'auto' is a good default there.
Also, from my perspective (and yours might be different), changing the default in numpy is more likely to break people's code and might require code changes, so the case for changing there needs to be really strong, and I think it's weaker than for matplotlib.
If you have good reasons to suggest changing the defaults in numpy, I'm happy for us all to figure this out together (data science user + numpy + matplotlib). But right now, the default behavior leads to people making bad inferences.