
Causal Conditional Distance Correlation

Eric W. Bridgeford

2025-01-07

```r
require(causalBatch)
require(ggplot2)
require(tidyr)
n = 200
```

To start, we will work through a simulation example similar to the ones in the simulations vignette, which you can access from:

```r
vignette("cb.simulations", package = "causalBatch")
```

Let's regenerate our working example data, along with some plotting code:

```r
# a function for plotting a scatter plot of the data
plot.sim <- function(Ys, Ts, Xs, title = "",
                     xlabel = "Covariate",
                     ylabel = "Outcome (1st dimension)") {
  data = data.frame(Y1 = Ys[, 1], Y2 = Ys[, 2],
                    Group = factor(Ts, levels = c(0, 1), ordered = TRUE),
                    Covariates = Xs)

  data %>%
    ggplot(aes(x = Covariates, y = Y1, color = Group)) +
    geom_point() +
    labs(title = title, x = xlabel, y = ylabel) +
    scale_x_continuous(limits = c(-1, 1)) +
    scale_color_manual(values = c(`0` = "#bb0000", `1` = "#0000bb"),
                       name = "Group/Batch") +
    theme_bw()
}
```

Next, we will generate a simulation:

```r
sim = cb.sims.sim_sigmoid(n = n, eff_sz = 1, unbalancedness = 1.5)
plot.sim(sim$Ys, sim$Ts, sim$Xs, title = "Sigmoidal Simulation")
```
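Alongside the plot, it can help to summarize the covariate within each group/batch numerically. The sketch below uses only base R on top of the simulation; it regenerates `sim` so it runs on its own, and it assumes that `sim$Xs` holds a single covariate per sample that can be flattened to a numeric vector of length `n`.

```r
require(causalBatch)

# regenerate the sigmoidal simulation with the same settings as above
sim <- cb.sims.sim_sigmoid(n = 200, eff_sz = 1, unbalancedness = 1.5)

# assumption: one covariate per sample, flattened to a numeric vector
x <- as.numeric(sim$Xs)

# quartiles of the covariate within each group/batch (names "0" and "1");
# when unbalancedness is not 1, we would expect these quartiles to differ
cov_summary <- tapply(x, sim$Ts, quantile, probs = c(0.25, 0.5, 0.75))
print(cov_summary)
```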

The covariate distributions for the two groups/batches do not overlap perfectly (note that `unbalancedness` is not \(1\)), yet the two batches still appear to differ slightly in their outcomes. We can test this using the causal conditional distance correlation, like so:

```r
result <- cb.detect.caus_cdcorr(sim$Ys, sim$Ts, sim$Xs, R = 100)
```

Here, we set the number of null replicates `R` to \(100\) so that the test runs quickly, but in practice you should typically use at least \(1000\) null replicates. To speed up the test, we suggest setting `num.threads` close to the number of cores available on your machine, which you can find with `parallel::detectCores()`.
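A minimal sketch of these recommendations: pick a thread count with `parallel::detectCores()` and rerun the test with \(1000\) null replicates. Leaving one core free is our own heuristic rather than a package requirement, and we assume `num.threads` is passed directly to `cb.detect.caus_cdcorr()` as described above. The simulation is regenerated so the block runs on its own; expect it to take noticeably longer than the \(R = 100\) run.

```r
require(causalBatch)

sim <- cb.sims.sim_sigmoid(n = 200, eff_sz = 1, unbalancedness = 1.5)

# detectCores() can return NA on some platforms, so fall back to 1 core
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# heuristic (not a package requirement): leave one core for the rest of the system
n_threads <- max(1L, n_cores - 1L)

# 1000 null replicates, spread across the chosen number of threads
result_full <- cb.detect.caus_cdcorr(sim$Ys, sim$Ts, sim$Xs,
                                     R = 1000, num.threads = n_threads)
print(sprintf("p-value: %.4f", result_full$Test$p.value))
```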

Using a significance level of \(\alpha = 0.05\), the \(p\)-value is:

```r
print(sprintf("p-value: %.4f", result$Test$p.value))
#> [1] "p-value: 0.0099"
```

Since the \(p\)-value is \(< \alpha\), we reject the null hypothesis in favor of the alternative: that the group/batch causes a difference in the outcome variable.
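The reported value of \(0.0099\) equals \(1/101\) to four decimal places. This is consistent with the common convention for permutation-test \(p\)-values, \((1 + \#\{\text{null} \geq \text{observed}\}) / (1 + R)\), although we have not confirmed that `cb.detect.caus_cdcorr()` uses exactly this formula. If it does, \(0.0099\) is the smallest \(p\)-value attainable with \(R = 100\), which is one more reason to use more null replicates. The sketch below only illustrates the arithmetic; `perm_pvalue()` is a hypothetical helper, and the null statistics are placeholders rather than package output.

```r
# Permutation-test p-value with the "+1" correction:
# (1 + number of null statistics >= observed) / (1 + number of null replicates)
perm_pvalue <- function(observed, null_stats) {
  (1 + sum(null_stats >= observed)) / (1 + length(null_stats))
}

# placeholder case: the observed statistic exceeds every one of 100 null replicates
n_null <- 100
p_min <- perm_pvalue(observed = 1, null_stats = rep(0, n_null))
print(sprintf("p-value: %.4f", p_min))
#> [1] "p-value: 0.0099"

# with 1000 null replicates, the smallest attainable p-value drops to 1/1001
print(sprintf("p-value: %.4f", perm_pvalue(1, rep(0, 1000))))
#> [1] "p-value: 0.0010"
```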

We could optionally have pre-computed a distance matrix for the outcomes, like so:

```r
# compute distance matrix for outcomes
DY = dist(sim$Ys)
```

In your own use cases, you can replace `dist()` with any distance function of your choosing and pass the resulting distance matrix directly to the detection algorithm by setting `distance = TRUE`:

```r
result <- cb.detect.caus_cdcorr(DY, sim$Ts, sim$Xs, distance = TRUE, R = 100)
```
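For example, the following sketch swaps in Manhattan (\(L_1\)) distance, which base R's `dist()` supports through `method = "manhattan"`. Manhattan distance is our choice for illustration only; we assume any `dist` object computed over the samples can be passed in the same way as the Euclidean `DY` above. The simulation is regenerated so the block runs on its own.

```r
require(causalBatch)

sim <- cb.sims.sim_sigmoid(n = 200, eff_sz = 1, unbalancedness = 1.5)

# Manhattan (L1) distances between the samples' outcome vectors, instead of Euclidean
DY_l1 <- dist(sim$Ys, method = "manhattan")

# pass the pre-computed distances directly, flagging them with distance = TRUE
result_l1 <- cb.detect.caus_cdcorr(DY_l1, sim$Ts, sim$Xs, distance = TRUE, R = 100)
print(sprintf("p-value: %.4f", result_l1$Test$p.value))
```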
