Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Change density defaultbw = "nrd0" tobw = "sj"#5854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
teunbrand wants to merge4 commits intotidyverse:main
base:main
Choose a base branch
Loading
fromteunbrand:density_bw_to_SJ

Conversation

@teunbrand
Copy link
Collaborator

This PR aims tofix#3825.

Briefly,?density recommends the"sj" method over the"nrd0" default. This change propagates that recommendation to ggplot2's density calculations.

Some details:

  • Understandably, plenty of snapshots have changed.
  • I realised thatcalc_bw() andprecompute_bw() were doing the exact same task, so I merged these.
  • Whenvar(x) == 0, most of thestats::bw.*() functions throw an error, so we keepbw.nrd0() for these degenerate cases because it will not throw an error.

@teunbrandteunbrand added the breaking change ☠️API change likely to affect existing code labelApr 22, 2024
@thomasp85
Copy link
Member

I'm not sure I think this is worth the breaking change to be honest... I probably miss a compelling example of when it would improve a visualisation

@teunbrand
Copy link
CollaboratorAuthor

I don't know enough about these bandwidth estimators to make a qualified judgement.

However,this post shares some thoughts.
Consider having multimodal data.
The current defaultbw = "nrd0" smooths over everything.

library(ggplot2)set.seed(0)df<-data.frame(x= c(  runif(20,9,11),  runif(10,19,21),  runif(20,29,31),  runif(30,39,41)))ggplot(df, aes(x))+  geom_density(bw="nrd0")+  geom_rug()+  ggtitle("bw = 'nrd0'")

Whereasbw = "sj" preserves the multimodality.

ggplot(df, aes(x))+  geom_density(bw="sj")+  geom_rug()+  ggtitle("bw = 'sj'")

Counter-argument might be that if you have some exponential data,bw = "nrd0" gives a nice smooth density.

df<-data.frame(x= rexp(1000))ggplot(df, aes(x))+  geom_density(bw="nrd0",bounds= c(0,Inf))+  geom_rug()+  ggtitle("bw = 'nrd0'")

Whereasbw = "sj" gives a more 'wobbly' curve.

ggplot(df, aes(x))+  geom_density(bw="sj",bounds= c(0,Inf))+  geom_rug()+  ggtitle("bw = 'sj'")

Created on 2024-05-20 withreprex v2.1.0

It is probably a trade-off that I cannot motivate very well.
Maybe we can bat-signal@mjskay to see if we can get him to share his opinion.

@mjskay
Copy link
Contributor

I could perhaps dig up some examples if there is interest. I changed the ggdist default to the equivalent ofbw.SJ(..., method = "dpi") around the time I added automatic bounds detection, because I found it tended to do better with some bounded distributions (e.g. log normal, where nrd0 with the reflection method doesn't always have a small enough bandwidth to get the mode right). I started a more extensive evaluation of bandwidth estimators at some point but haven't had a chance to finish it.

@mjskay
Copy link
Contributor

For example

set.seed(20240522)data.frame(x= rlnorm(10000))|>    ggplot()+    geom_density(aes(x,color="sj"),bw="sj",bounds= c(0,Inf),n=1000)+    geom_density(aes(x,color="nrd0"),bw="nrd0",bounds= c(0,Inf),n=1000)+    geom_function(fun=dlnorm,n=1000,linetype="22")+    coord_cartesian(xlim= c(0,5))

image

It is worth pointing out thatnrd0 is an easier-to-use default because the other bandwidth estimators may fail in some cases, particularly data with many duplicates. e.g.:

density(rep(1,10),bw="SJ")## Error in bw.SJ(x, method = "ste") : sample is too sparse to find TD

ggdist handles this by falling back tonrd0 with a warning that the user probably shouldn't be using a kde on discrete data anyway (because, well, they shouldn't be...).

@teunbrand
Copy link
CollaboratorAuthor

Thanks@mjskay, I appreciate your thoughts on the matter. I tend to agree with you and the authors of?density that it probably is a better default thannrd0. Maybe this PR is worth reviving, but instead of changing the default bandwidth estimator, which might cause too much of a kerfuffle with users, ggplot2 could set a global option for it so it is easier to changes across density and violin layers simultaneously.

@teunbrandteunbrand reopened thisMay 22, 2024
@teunbrandteunbrand marked this pull request as draftMay 22, 2024 19:28
@teunbrandteunbrand added this to theEditioning milestoneDec 16, 2024
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

breaking change ☠️API change likely to affect existing code

Projects

None yet

Milestone

Editioning

Development

Successfully merging this pull request may close these issues.

Change default for bw arg of stat_density() and stat_ydensity()

3 participants

@teunbrand@thomasp85@mjskay

[8]ページ先頭

©2009-2025 Movatter.jp