Movatterモバイル変換


[0]ホーム

URL:


Example multiverse implementation: Femalehurricanes are deadlier than male hurricanes

Abhraneel Sarma, Northwestern University

Alex Kale, University of Washington

2024-10-07

Multiverse case study #4

In this document we re-implement the specification curve analysis bySimonsohn et al. [http://dx.doi.org/10.2139/ssrn.2694998] using theMultiverse library.

Introduction

The specification curve analysis is in principle similar to amultiverse analysis, where all alternate specifications of a particularanalysis asking the same research question are explored. In their study,Simonsohn et al. explore the robustness of the analysis by Jung etal. [https://doi.org/10.1073/pnas.1402786111], whichinvestigated whether hurricanes with female sounding names are moredeadlier than hurricanes with more male sounding names. We first beginby loading the dataset which is provided by the library. We then renamesome of the variables and perform some data transformations whichstandardises some of the variables (mean = 0 and standard deviation =1).

data("hurricane")# read and process datahurricane_data<- hurricane|># rename some variablesrename(year = Year,name = Name,dam = NDAM,death = alldeaths,female = Gender_MF,masfem = MasFem,category = Category,pressure = Minpressure_Updated_2014,wind = HighestWindSpeed    )|># create new variablesmutate(post =ifelse(year>1979,1,0),zcat =as.numeric(scale(category)),zpressure =-scale(pressure),zwind =as.numeric(scale(wind)),z3 =as.numeric((zpressure+ zcat+ zwind)/3)    )

Original analysis

We then illustrate an implementation of the original analysis by Junget al. [https://doi.org/10.1073/pnas.1402786111]. The originalanalysis used a negative binomial model, which is suitable foroverdispersed count data. Due to some issues with model fit with theMASS::glm.nb function (see Note 3:https://github.com/uwdata/boba/tree/master/example/hurricane),we instead use the simpler poisson regression model which will ensurethat none of the models fail while fitting.

In the original analysis, Jung et al. exclude two hurricanes whichcaused the highest number of deaths (Katrina and Audrey) as outliers.They transform the variable used the interactions between the 11-pointfemininity rating and both damages and zpressure respectively, as seenbelow:

df<- hurricane_data|>filter( name!="Katrina"& name!="Audrey" )fit<-glm(death~ masfem* dam+ masfem* zpressure,data = df,family ="poisson")

Multiverse Analysis

To implement a multiverse analysis, we first need to create themultiverse object:

M<-multiverse()

Excluding outliers

In their implementation, Simonsohn et al. describe a principledmethod of excluding outliers based on extreme observations of death anddamages. The consider it reasonable to exclude up two most extremehurricanes in terms of death, and upto three most extreme hurricanes interms of damages. This space of decisions is implemented usingmultiverse as follows:

Note

In this vignette, we make use ofmultiversecode chunks, a custom engine designed to work with themultiverse package, to implement the multiverse analyses. Please referto the vignette (vignette("multiverse-in-rmd")) for moredetails. Users could instead make use of the function which is moresuited for a script-style implementation. Please refer to the vignettes(vignette("complete-multiverse-analysis") andvignette("basic-multiverse")) for more details.

```{multiverse default-m-1, inside = M}df <- hurricane_data |>    filter(branch(death_outliers,         "no_exclusion" ~ TRUE,        "most_extreme_deaths" ~ name != "Katrina",        "most_extreme_two_deaths" ~ ! (name %in% c("Katrina", "Audrey"))    )) |>    filter(branch(damage_outliers,        "no_exclusion" ~ TRUE,        "most_extreme_one_damage" ~ ! (name %in% c("Sandy")),        "most_extreme_two_damage" ~ ! (name %in% c("Sandy", "Andrew")),        "most_extreme_three_damage" ~ ! (name %in% c("Sandy", "Andrew", "Donna"))    ))```

Identifying independent variables

The next decision involves identifying the appropriate independentvariable for the primary effect — how do we operationalise femininity ofthe name of a hurricane. Simonsohn et al. identify two distinct ways.First, using the 11 point scale that was used in the original analysis;or second using a binary scale.

The other decision involved is whether or not to transformdamages, another independent variable. damages follow along tailed, positive only valued distribution.

We implement these two decisions in our multiverse as follows:

```{multiverse label = default-m-2, inside = M}df <- df |>    mutate(        femininity = branch(femininity_calculation,          "masfem" ~ masfem,          "female" ~ female        ),        damage = branch(damage_transform,          "no_transform" ~ identity(dam),          "log_transform" ~ log(dam)        )    )```

Declaring alternative specifications of regression model

The next step is to fit the model. We can use either a log-linearmodel or a poisson model for the step. Both are reasonable alternativesfor this dataset. We also have to make a choice on whether we want toinclude an interaction betweenfemininity anddamage. This results in the following specification:

```{multiverse label = default-m-3, inside = M}fit <- glm(branch(model, "linear" ~ log(death + 1), "poisson" ~ death) ~           branch(main_interaction,              "no" ~ femininity + damage,              "yes" ~ femininity * damage          ) + branch(other_predictors,              "none" ~ NULL,              "pressure" %when% (main_interaction == "yes") ~ femininity * zpressure,              "wind" %when% (main_interaction == "yes") ~ femininity * zwind,              "category" %when% (main_interaction == "yes") ~ femininity * zcat,              "all" %when% (main_interaction == "yes") ~ femininity * z3,              "all_no_interaction" %when% (main_interaction == "no") ~ z3          ) + branch(covariates, "1" ~ NULL, "2" ~ year:damage, "3" ~ post:damage),           family = branch(model, "linear" ~ "gaussian", "poisson" ~ "poisson"),            data = df)```

Once we have implemented the analysis model in our multiverse, thecorresponding step will be applied to each analysis path. To interpretthe results, we first estimate a prediction interval corresponding toeach analysis path.

```{multiverse label = default-m-4, inside = M}pred <- predict(fit, se.fit = TRUE, type = "response")pred2expectation <- function(mu, sigma) {    branch(model, "linear" ~ exp(mu + sigma^2/2) - 1, "poisson" ~ mu)}disagg_fit <- df  |>    mutate(        fitted = pred$fit,                                # add fitted predictions and standard errors to dataframe        se.fit = pred$se.fit,        deg_f = df.residual(fit),                         # get degrees of freedom        sigma = sigma(fit),                               # get residual standard deviation        se.residual = sqrt(sum(residuals(fit)^2) / deg_f) # get residual standard errors    )# aggregate fitted effect of female storm nameexpectation <- disagg_fit |>    mutate(expected_deaths = pred2expectation(fitted, sigma)) |>     group_by(female) |>    summarise(mean_deaths = mean(expected_deaths), .groups = "drop_last") |>     compare_levels(mean_deaths, by = female)```

Execution and Results

After we’ve specified our multiverse analysis, we would like toexecute the entire multiverse, and view the results. Below, we plot themean difference point estimate for expected deaths when a hurricane hasa more feminine name, for each unique analysis path. We find that basedon these arbitrary specifications of the multiverse, there is perhaps norelation betweenfemininity of the name of a hurricane andthe number of deaths that it causes, as some models predict a lowernumber of deaths, and some predict much higher.

execute_multiverse(M)mean_deaths<- multiverse::expand(M)|>extract_variables(expectation)|>unnest(expectation)mean_deaths|>arrange(mean_deaths)|>mutate(.id =row_number())|>ggplot(aes(y = mean_deaths,x = .id))+geom_point()+theme_minimal()+labs(x ="universe",y ="Mean difference in expected deaths")


[8]ページ先頭

©2009-2025 Movatter.jp