DOC: normalizing histograms #27426

Merged

jklymak merged 1 commit into matplotlib:main from jklymak:doc-histogram-normalizations

Dec 7, 2023

Conversation

Member

jklymak commented Dec 2, 2023

People often seem confused by the density kwarg of hist. I don't think we should change it, but we could document better.

This was similar enough to histogram_features that I removed that and added a redirect.

jklymak added the Documentation: examples (files in galleries/examples) label

Dec 2, 2023

story645 reviewed

Dec 2, 2023

Member

story645 left a comment (edited)

There's a lot of good information here, but honestly it feels a bit overwhelming. I think adding some headings in the normalizing bins sections highlighting what you're trying to show in each subsection may help anchor the reader & honestly make it more likely they won't just skim over the thing but actually look at the section that's relevant for 'em. ETA: same w. the code honestly -> separating the plotting from the labeling code a bit more might make it easier to single out what functionality is trying to be highlighted here.

galleries/examples/statistics/histogram_normalization.py Outdated

Comment on lines 106 to 110

		# to make the point very obvious, consider bins that do not have the same
		# spacing. By normalizing by density, we preserve the shape of the
		# distribution, whereas if we do not, then the wider bins have much higher
		# values than the thin bins:

Member

story645, Dec 2, 2023

If this is supposed to be about appropriate bin choice, then separate that out into its own example w/ a side by side of equally and irregularly spaced bins? Basically I'm reading this trying to visualize the point you're trying to make, so you may as well just visualize it?

Member

story645, Dec 4, 2023

At minimum, I had to read this a couple of times to understand what was going on because of the way that the top sentence was breaking. I'm honestly still not sure if this is what I'm supposed to parse out of this:

By normalizing by density, we preserve the shape of the distribution. We emphasize this point using irregularly spaced bins to show that in the unnormalized example the frequencies are much more sensitive to bin width.
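To make the point concrete, here is a minimal sketch (not the code from the PR) that bins normally distributed samples into deliberately unequal bins. It uses `np.histogram`, which takes `bins` and `density` arguments like `hist` does; the seed, sample size, and bin edges are assumptions chosen for illustration.

```python
import numpy as np

# Assumed for illustration: seed, sample size and bin edges are not from the PR.
rng = np.random.default_rng(0)
xdata = rng.normal(size=100_000)

# Two wide outer bins (width 2) around eight narrow inner bins (width 0.25).
edges = np.concatenate([[-3.0], np.arange(-1.0, 1.25, 0.25), [3.0]])
widths = np.diff(edges)

# Raw counts depend on how wide each bin is; density divides that out.
counts, _ = np.histogram(xdata, bins=edges)
density, _ = np.histogram(xdata, bins=edges, density=True)

for left, width, count, dens in zip(edges[:-1], widths, counts, density):
    print(f"left edge {left:+.2f}  width {width:.2f}  "
          f"count {count:6d}  density {dens:.3f}")
```

With unnormalized counts, the two wide tail bins come out taller than the narrow bins next to them, even though the underlying distribution is lower there. The density values rise toward the center, as the normal distribution does.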

Member, Author

jklymak, Dec 4, 2023

I've added a new section before this, as I agree this was too big a leap. Sorry - I was still playing with it, and should have marked as draft.

Member

story645, Dec 4, 2023

No, you did mark it draft - I should have asked if draft meant ready for feedback. I mark draft as an "I don't think it's ready to merge, but ready for reviews."

Member, Author

jklymak, Dec 4, 2023

I consider Draft to mean not ready for review:

Member

story645, Dec 4, 2023 (edited)

Sorry, and OK, got that for next time. It's a limit of GitHub that there isn't a third "ready for feedback but still nascent" option :/ ETA: meaning I take "ready for review" very literally as ready for a final, can-be-merged-on-approval review.

Member, Author

jklymak, Dec 4, 2023

I think it's fine to solicit review for a draft PR, but I just use Draft to abuse CI. Maybe a bad habit, but nicer for a server farm to build the docs for me than doing it locally.

Member

story645, Dec 4, 2023

That's fair - Codespaces might also be really good for your use case and faster.

I'm thinking about proposing adding a field to the PR template asking "if draft, would you like feedback? [ ], anything in particular?" b/c I think this might be a fairly common expectation mismatch (half of us think drafts are for early-round feedback while messing around, and encourage folks to use them as such; half of us think drafts are for messing around before feedback, please don't touch) and it's one of those things I don't think folks necessarily think to communicate or ask about.

galleries/examples/statistics/histogram_normalization.py Outdated (resolved)

jklymak force-pushed the doc-histogram-normalizations branch from 981021d to 3e98181

December 3, 2023 16:54

jklymak marked this pull request as draft

December 3, 2023 18:20

jklymak force-pushed the doc-histogram-normalizations branch 2 times, most recently from e00fe2e to aa52a1d

December 4, 2023 03:52

oscargus reviewed

Dec 4, 2023

galleries/examples/statistics/histogram_normalization.py Outdated (resolved)

oscargus reviewed

Dec 4, 2023

galleries/examples/statistics/histogram_normalization.py (resolved)

Member

oscargus commented Dec 4, 2023

I spent some time reading up on this the other day, so I think this is a valuable addition! (I wanted the probability mass function it turned out.)
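For readers who actually want a probability mass function (bar heights that sum to one), one common approach is to pass weights of 1/N. The sketch below checks the idea with `np.histogram`; the data, seed, and bin count are assumptions, and the bins are chosen to span the data so that no point is dropped.

```python
import numpy as np

# Assumed for illustration: seed, sample size and number of bins.
rng = np.random.default_rng(1)
xdata = rng.normal(size=1000)
edges = np.linspace(xdata.min(), xdata.max(), 17)  # 16 bins covering all data

# Each point contributes 1/N, so every bar is the fraction of points in its bin.
weights = np.full(xdata.shape, 1 / xdata.size)
pmf, _ = np.histogram(xdata, bins=edges, weights=weights)

# density=True also divides by the bin width, so the two differ by that factor.
density, _ = np.histogram(xdata, bins=edges, density=True)

print("sum of pmf bars:", pmf.sum())
print("pmf equals density * width:", np.allclose(pmf, density * np.diff(edges)))
```

The bar heights from the weighted histogram sum to one, whereas the density values only do so after being multiplied by the bin widths.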

jklymak force-pushed the doc-histogram-normalizations branch 2 times, most recently from a0905d4 to b1afe65

December 4, 2023 23:34

Member, Author

jklymak commented Dec 5, 2023

https://output.circle-artifacts.com/output/job/bd0c26ea-c7d9-4cb5-8f9b-f2c8923574c7/artifacts/0/doc/build/html/gallery/statistics/histogram_normalization.html#sphx-glr-gallery-statistics-histogram-normalization-py

jklymak marked this pull request as ready for review

December 5, 2023 00:29

tacaswell added this to the v3.9.0 milestone

Dec 5, 2023

tacaswell reviewed

Dec 5, 2023

galleries/examples/statistics/histogram_normalization.py Outdated

		# %%
		# This normalization can be a little hard to interpret when just exploring the
		# data. The value attached to each bar is divided by the total number of data
		# points _and_ the width of the bin, and thus the values _integrate_ to one

Member

tacaswell, Dec 5, 2023

Suggested change

	# points _and_ the width of the bin, and thus the values _integrate_ to one
	# points *and* the width of the bin, and thus the values *integrate* to one

we are in rst not md here

tacaswell reviewed

Dec 5, 2023

galleries/examples/statistics/histogram_normalization.py Outdated

Comment on lines 97 to 98

		# e.g. (``density = counts / (sum(counts) * np.diff(bins))``),
		# and (``np.sum(density * np.diff(bins)) == 1``).

Member

tacaswell, Dec 5, 2023

Suggested change

	# e.g. (``density = counts / (sum(counts) * np.diff(bins))``),
	# and (``np.sum(density * np.diff(bins)) == 1``).
	# e.g. ::
	#
	# density = counts / (sum(counts) * np.diff(bins))
	# np.sum(density * np.diff(bins)) == 1

The in-line code snippets are hard to read the way they got wrapped, do as a code block?
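The two relations in the suggested code block can be checked numerically. The following is a small sketch using `np.histogram` on made-up data (the seed, sample size, and bin edges are assumptions), comparing the hand-computed density with what `density=True` returns.

```python
import numpy as np

# Assumed for illustration: seed, sample size and bin edges.
rng = np.random.default_rng(19680801)
xdata = rng.normal(size=1000)
bins = np.arange(-4, 4.5, 0.5)

# Plain counts per bin, then the normalization written out by hand.
counts, edges = np.histogram(xdata, bins=bins)
density_manual = counts / (counts.sum() * np.diff(edges))

# The same normalization as computed by NumPy.
density_numpy, _ = np.histogram(xdata, bins=bins, density=True)

print(np.allclose(density_manual, density_numpy))
print(np.sum(density_manual * np.diff(edges)))
```

Comparing with `np.isclose` or `np.allclose` rather than `==` is the safer check in practice, since the sum is a floating-point result.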

tacaswell approved these changes

Dec 5, 2023

Member

tacaswell left a comment

I left two small style comments, but this is a significant improvement even without them.

@jklymak can self-merge with or without my suggestions.

Member

story645 commented Dec 5, 2023 (edited)

The reason I keep hammering on scoping/chunking/headings is the same reason I'm reorganizing the contributing docs so that information is better binned -> otherwise my brain will just gloss over the content even when I'm trying really hard to figure it out, because issues with working memory are really common for folks with dyslexia, burnout, or ADHD (👋).

A very, very low-hanging-fruit way to make the docs more accessible is some anchoring through titles and headings. Nothing complicated, just tell folks what they should be focusing on/picking out from a subsection/example. Smaller/cleaner examples would be great too, but I don't think subheadings are a big enough ask to be out of scope, & they're much easier to put in now than on a new cycle. ETA: And I'm specifically making this ask of Jody b/c he's an experienced educator and technical writer - I wouldn't necessarily make this ask in other contexts.

story645 requested changes

Dec 5, 2023

Member

story645 left a comment

I'm requesting changes b/c the .rst up top is over indented and that needs to be fixed.

galleries/examples/statistics/histogram_normalization.py Outdated

Comment on lines 12 to 16

		- bin the data as you want, either with an automatically chosen number of
		bins, or with fixed bin edges,
		- normalize the histogram so that its integral is one,
		- and assign weights to the data points, so that each data point affects the
		count in its bin differently.

Member

story645, Dec 5, 2023

Suggested change

	- bin the data as you want, either with an automatically chosen number of
	bins, or with fixed bin edges,
	- normalize the histogram so that its integral is one,
	- and assign weights to the data points, so that each data point affects the
	count in its bin differently.

	- bin the data as you want, either with an automatically chosen number of
	bins, or with fixed bin edges,
	- normalize the histogram so that its integral is one,
	- and assign weights to the data points, so that each data point affects the
	count in its bin differently.

galleries/examples/statistics/histogram_normalization.py

		fig, ax = plt.subplot_mosaic([['False', 'True']], layout='constrained')
		dx = 0.1
		xbins = np.arange(-4, 4, dx)
		ax['False'].hist(xdata, bins=xbins, density=False, histtype='step', label='Counts')

Member

story645, Dec 5, 2023

For all the bin width comparisons, it's kind of hard to tell the bin widths from the 'step' type - is there a way to actually show the bins?

Member, Author

jklymak, Dec 6, 2023

With so many bins, vertical lines end up almost merging. Fewer bins, it's hard to see the normal distribution.

Member

story645, Dec 6, 2023

What about stacking as rows instead of columns so you have more horizontal space to work in?

galleries/examples/statistics/histogram_normalization.py

Comment on lines +172 to +180

		xbins = np.arange(-4, 4, dx)
		# expected histogram:
		ax['False'].plot(xpdf, pdf1000dx, '--', color=f'C{nn}')
		ax['False'].hist(xdata, bins=xbins, density=False, histtype='step')

		ax['True'].hist(xdata, bins=xbins, density=True, histtype='step', label=dx)

Member

story645, Dec 5, 2023

Do you need both here and all three? only because the busyness makes it feel very cluttered in a way where it's hard to read off the lesson

Member, Author

jklymak, Dec 6, 2023

This is the main point of why you want to normalize, so comparing and contrasting with and without normalizing is the goal. Multiple bin sizes is to better give the reader an idea of how the bin size affects their results. Sure, it's busy, but I don't think incomprehensibly so.
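A rough numerical version of this comparison, assuming standard-normal data and three bin widths picked for illustration (the values used in the PR are not all shown in this thread): the peak count grows roughly in proportion to the bin width, while the peak density stays near the peak of the normal PDF (about 0.4).

```python
import numpy as np

# Assumed for illustration: seed, sample size and bin widths.
rng = np.random.default_rng(2)
n_points = 100_000
xdata = rng.normal(size=n_points)

peak_counts = {}
peak_density = {}
for dx in (0.1, 0.5, 1.0):
    n_bins = int(round(8 / dx))
    xbins = np.linspace(-4, 4, n_bins + 1)  # equal-width bins of width dx
    counts, _ = np.histogram(xdata, bins=xbins)
    density, _ = np.histogram(xdata, bins=xbins, density=True)
    peak_counts[dx] = counts.max()
    peak_density[dx] = density.max()
    print(f"dx={dx:.1f}  peak count={counts.max():6d}  "
          f"peak density={density.max():.3f}")
```

The counts are only comparable between histograms that share a bin width, whereas the density values can be overlaid directly, which is the contrast the side-by-side panels are meant to show.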

Member

story645, Dec 6, 2023

Maybe same as above, stacking horizontally ? or maybe thinner lines + making the histogram colors paler so that it's easier to visually distinguish?

Member, Author

jklymak commented Dec 6, 2023

The reason I keep hammering on scoping/chunking/headings is the same reason I'm reorganizing the contributing docs so that information is better binned -> otherwise my brain will just gloss over the content even when I'm trying really hard to figure it out because issues with working memory are really common for folks with dyslexia, burnout, or ADHD (👋).

The goal of this example is not to provide a random-access reference. That is already in the API docs. The goal is to explain carefully what the normalization options do so that we can point folks who ask why we normalize histograms the way we do towards this reference for an explanation. I consider this a relatively advanced topic, and I think breaking it into more sections or subsections would be more distracting than helpful.

Member

story645 commented Dec 6, 2023 (edited)

I consider this a relatively advanced topic

That's mutually exclusive from its accessibility in the universal design context?

I'm not saying make the text accessible to folks who don't have the mathematical background or Python knowledge to parse it, I'm asking that you make it more accessible to folks whose brains may be wired a bit differently.

Plenty of writing on advanced topics (and most academic papers) break things out - a great example is 7 Stages in Compositionality, which is an intro to category theory

I think breaking it into more sections or subsections would be more distracting than helpful.

Is it distracting for you? Because then it's competing needs and we should figure out if we can find something that works for both of us. The lack of sections is super distracting for me, b/c I don't know where to focus or where one part ends or the other begins. And as someone who frequently links folks to our docs, it's nice when I have a specific place to do so.

Reading through it again, I'm guessing the following, which if I'm correct I don't see the harm in putting this in the document & if I'm wrong it would be helpful to have that table setting in the document.

Choosing Bins
1. passing in bin edges
2. passing in number of bins
Normalizing Histograms
1. density = True and scaling
  1. explanation of integration
  2. use case: preserve shape
  3. use case: compare histograms with different bin sizes
2. using weights
  1. explanation of pmf
  2. use case: compare histograms with different populations:

ETA: Also I could reverse outline it b/c I've read this document a bunch of times now and reverse outlined it out of frustration b/c I was trying to understand the structure. And it's clearly well structured, and my hypothesis is folks would be more enticed to read it if they could see at a scan from the outline that it's well structured.

story645 reviewed

Dec 6, 2023

galleries/examples/statistics/histogram_normalization.py Outdated

Comment on lines 99 to 100

		# (``density = counts / (sum(counts) * np.diff(bins))``)
		# (``np.sum(density * np.diff(bins)) == 1``).

Member

story645, Dec 6, 2023

this isn't compiling correctly.

DOC: normalizing histograms

f2da1f0

jklymak force-pushed the doc-histogram-normalizations branch from a305177 to f2da1f0

December 6, 2023 02:11

Member

story645 commented Dec 6, 2023 (edited)

Also I think this is a conceptual tutorial on normalizing histograms far more than a gallery example on how to use the function.

Basically going by the rough distinction on the index page of:

Demo -> plot type and example galleries
Usage -> user guide and tutorials

I think this is far more usage than demo.

Member, Author

jklymak commented Dec 7, 2023

Thanks for the discussion. I'm going to decline to modify this example further at this time. I think the current contribution crosses the bar of being a helpful addition. I'd suggest further changes could be follow-up PRs.

Member

story645 commented Dec 7, 2023 (edited)

I'd suggest further changes could be follow-up PRs.

If that's true, then why not do it here? There's no urgency to get this in.

ETA: Also why not move it to tutorials, where it'll be easier to find?

story645 previously approved these changes

Dec 7, 2023

story645 dismissed their stale review

December 7, 2023 06:19

it compiles so not gonna block

story645 force-pushed the doc-histogram-normalizations branch from 56b4b24 to b58498d

December 7, 2023 06:54

story645 mentioned this pull request

Dec 7, 2023

Doc: follow up on histogram normalization example #27459

Draft

5 tasks

Member, Author

jklymak commented Dec 7, 2023

@story645 I do not agree with your changes. Is there a reason you are force pushing onto my branch?

jklymak force-pushed the doc-histogram-normalizations branch from b58498d to f2da1f0

December 7, 2023 14:35

Member

story645 commented Dec 7, 2023 (edited)

Accident so I undid it?

You can rebase to drop me from the commit history but I had to rebase to drop - I was using the gh cli & didn't realize it would just push back to yours.

Member, Author

jklymak commented Dec 7, 2023

I'll self merge based on #27426 (review)

jklymak merged commit 01fe735 into matplotlib:main

Dec 7, 2023

jklymak deleted the doc-histogram-normalizations branch

December 7, 2023 15:00

Member

tacaswell commented Dec 7, 2023

Reviewing this was on my todo list for yesterday and this morning, sorry I did not get to it in time.

Member, Author

jklymak commented Dec 7, 2023

@tacaswell, it didn't change from your previous review except for a typo.

jklymak mentioned this pull request

Jan 5, 2024

[DOC]: usage docs content guidelines #26389

Draft

Labels

Documentation: examples

files in galleries/examples
