osofr/simcausal


Simulating Longitudinal and Network Data with Causal Inference Applications



simcausal

The simcausal R package is a tool for specifying and simulating complex longitudinal data structures based on structural equation models (SEMs). The emphasis is on the types of simulations frequently encountered in causal inference problems, such as observational data with time-dependent confounding, selection bias, and random monitoring processes. The interface allows for quick expression of dependencies between a large number of time-varying nodes.

Installation

To install the CRAN release version of simcausal:

install.packages('simcausal')

To install the development version (requires the devtools package):

devtools::install_github('osofr/simcausal', build_vignettes = FALSE)

Documentation

Once the package is installed, see the vignette and consult the internal package documentation and examples.

To see the vignette in R:

vignette("simcausal_vignette",package="simcausal")

To see all available package documentation:

?simcausal
help(package = 'simcausal')

To see the latest updates for the currently installed version of the package:

news(package="simcausal")

Brief overview

Below is an example simulating data with 4 covariates specified by 4 structural equations (nodes). New equations are added with successive calls to the + node() function, and data are simulated by calling the sim function:

library("simcausal")
# Each node() call adds one structural equation; later nodes may depend on earlier ones
D <- DAG.empty() +
  node("CVD", distr = "rcat.b1", probs = c(0.5, 0.25, 0.25)) +
  node("A1C", distr = "rnorm", mean = 5 + (CVD > 1) * 10 + (CVD > 2) * 5) +
  node("TI", distr = "rbern", prob = plogis(-0.5 - 0.3 * CVD + 0.2 * A1C)) +
  node("Y", distr = "rbern", prob = plogis(-3 + 1.2 * TI + 0.1 * CVD + 0.3 * A1C))
D <- set.DAG(D)
# Simulate 200 observations from the SEM
dat <- sim(D, n = 200)
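To make the four structural equations concrete, the following base R sketch draws the same variables by hand, one equation at a time. It only illustrates what the node() calls describe and is not how simcausal evaluates them. It assumes that "rcat.b1" yields categories coded 1, 2, 3 and that "rbern" is a Bernoulli draw; the name dat_manual is hypothetical.

```r
set.seed(1)
n <- 200

# CVD: categorical with levels 1, 2, 3 (assumed coding of "rcat.b1")
CVD <- sample(1:3, size = n, replace = TRUE, prob = c(0.5, 0.25, 0.25))

# A1C: normal with a mean that depends on CVD
A1C <- rnorm(n, mean = 5 + (CVD > 1) * 10 + (CVD > 2) * 5)

# TI: Bernoulli with a logit-linear probability in CVD and A1C
TI <- rbinom(n, size = 1, prob = plogis(-0.5 - 0.3 * CVD + 0.2 * A1C))

# Y: Bernoulli outcome depending on TI, CVD and A1C
Y <- rbinom(n, size = 1, prob = plogis(-3 + 1.2 * TI + 0.1 * CVD + 0.3 * A1C))

dat_manual <- data.frame(CVD, A1C, TI, Y)
head(dat_manual)
```

Each variable is drawn only from variables defined before it, which is the ordering that the successive + node() calls encode.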

To display the above SEM object as a directed acyclic graph:

plotDAG(D)

To allow the nodes A1C, TI and Y above to change over time, for time points t = 0,...,7, while keeping CVD the same, simply add the t argument to the node function and use square bracket [...] vector indexing to reference time-varying nodes inside the node function expressions:

library("simcausal")
D <- DAG.empty() +
  # Baseline (time-invariant) node
  node("CVD", distr = "rcat.b1", probs = c(0.5, 0.25, 0.25)) +
  # Nodes at t = 0
  node("A1C", t = 0, distr = "rnorm", mean = 5 + (CVD > 1) * 10 + (CVD > 2) * 5) +
  node("TI", t = 0, distr = "rbern", prob = plogis(-5 - 0.3 * CVD + 0.5 * A1C[t])) +
  # Nodes at t = 1,...,7 depend on the previous time point via [t-1]
  node("A1C", t = 1:7, distr = "rnorm", mean = -TI[t-1] * 10 + 5 + (CVD > 1) * 10 + (CVD > 2) * 5) +
  node("TI", t = 1:7, distr = "rbern", prob = plogis(-5 - 0.3 * CVD + 0.5 * A1C[t] + 1.5 * TI[t-1])) +
  node("Y", t = 0:7, distr = "rbern", prob = plogis(-6 - 1.2 * TI[t] + 0.1 * CVD + 0.3 * A1C[t]), EFU = TRUE)
D <- set.DAG(D)
dat.long <- sim(D, n = 200)

The + action function allows defining counterfactual data under various interventions (e.g., static, dynamic, deterministic, or stochastic), which can then be simulated by calling the sim function. In particular, the interventions may represent exposures to treatment regimens, the occurrence or non-occurrence of right-censoring events, or of clinical monitoring events.

In addition, the functions set.targetE, set.targetMSM and eval.target provide tools for defining and computing a few selected features of the distribution of the counterfactual data that represent common causal quantities of interest, such as treatment-specific means, average treatment effects and coefficients from working marginal structural models.
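As a minimal sketch of how these pieces might fit together, the example below adds two static interventions on TI to the point-treatment SEM from the overview, simulates counterfactual data under each, and evaluates an average treatment effect on Y. The action names "TI1" and "TI0", the object names cf and ate, the sample size and the seed are illustrative choices, not taken from the package documentation; see ?action, ?set.targetE and ?eval.target for the authoritative argument descriptions.

```r
library("simcausal")

# Point-treatment SEM from the overview
D <- DAG.empty() +
  node("CVD", distr = "rcat.b1", probs = c(0.5, 0.25, 0.25)) +
  node("A1C", distr = "rnorm", mean = 5 + (CVD > 1) * 10 + (CVD > 2) * 5) +
  node("TI", distr = "rbern", prob = plogis(-0.5 - 0.3 * CVD + 0.2 * A1C)) +
  node("Y", distr = "rbern", prob = plogis(-3 + 1.2 * TI + 0.1 * CVD + 0.3 * A1C))
D <- set.DAG(D)

# Static interventions: set TI to 1 for everyone, or to 0 for everyone
# (the action names "TI1" and "TI0" are hypothetical labels)
D <- D + action("TI1", nodes = node("TI", distr = "rbern", prob = 1))
D <- D + action("TI0", nodes = node("TI", distr = "rbern", prob = 0))

# Counterfactual data: a named list with one data frame per action
cf <- sim(D, actions = c("TI1", "TI0"), n = 1000, rndseed = 1)
names(cf)

# Target: difference in the mean of Y between the two interventions
D <- set.targetE(D, outcome = "Y", param = "TI1-TI0")
ate <- eval.target(D, data = cf)$res
ate
```

Because the estimate is computed from simulated counterfactual data, its precision depends on n; a larger n would give a closer approximation of the true parameter value.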

Using networks in SEMs

The network function provides support for network simulations; in particular, it enables defining and simulating SEMs for dependent data. For example, a network sampling function like rnet.gnm (provided by the package, see ?rnet.gnm) can be used to specify and simulate dependent data from a network-based SEM. Start by defining a SEM that uses this network, with the + network syntax, providing "rnet.gnm" as the "netfun" argument to the network function:

library("simcausal")
library("magrittr")
D <- DAG.empty() + network("ER.net", netfun = "rnet.gnm", m_pn = 50)

First define two IID nodes W1 (categorical) and W2 (Bernoulli):

D <- D +
  node("W1", distr = "rcat.b1", probs = c(0.0494, 0.1823, 0.2806, 0.2680, 0.1651, 0.0546)) +
  node("W2", distr = "rbern", prob = plogis(-0.2 + W1 / 3))

New nodes (structural equations) can now be specified conditional on the past node values of observations connected to each unit i (friends of i). The friends are defined by the network matrix that is returned by the above network generator rnet.gnm. Double square bracket syntax "[[...]]" allows referencing the node values of connected friends.

Two special variables, "Kmax" and "nF", can be used alongside the "[[...]]" indexing. Kmax defines the maximal number of friends (maximal friend index) over all observations. When the kth friend referenced in "Var[[k]]" doesn't exist, the default is to set that value to "NA". Adding the argument "replaceNAw0=TRUE" to the node function changes such values from NA to 0. nF is another special variable: a vector of length n in which each nF[i] is equal to the current number of friends for unit i.

Any kind of summary function that can be applied to multiple time-varying nodes can be similarly applied to network-indexed nodes. For additional details, see the package documentation for the network function (?network) and the package vignette on conducting network simulations.
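The quantities described above can be mimicked in base R on a toy example. The sketch below is only an illustration of the semantics (a friend matrix padded with NA, Kmax, nF, and friend summaries with missing friends replaced by 0); it does not use simcausal and is not the package's implementation. All object names and values are hypothetical.

```r
# Toy network for 4 units: row i lists the friend IDs of unit i, padded with NA
NetInd <- rbind(c(2, 3),
                c(1, NA),
                c(1, NA),
                c(NA, NA))
W1 <- c(2, 0, 1, 3)

Kmax <- ncol(NetInd)             # maximal number of friends
nF <- rowSums(!is.na(NetInd))    # number of friends of each unit

# Column k holds the W1 value of friend k (NA when friend k does not exist),
# analogous to what W1[[k]] refers to
W1_friends <- matrix(W1[as.vector(NetInd)], nrow = nrow(NetInd))

# Analogue of replaceNAw0 = TRUE: treat missing friends as 0 before summing
W1_friends0 <- W1_friends
W1_friends0[is.na(W1_friends0)] <- 0

sum_W1 <- rowSums(W1_friends0)               # sum over friends 1..Kmax
mean_W1 <- ifelse(nF > 0, sum_W1 / nF, 0)    # mean over existing friends
```

Here unit 1 has friends 2 and 3, so its friend sum is 0 + 1 = 1 and its friend mean is 0.5, while unit 4 has no friends and gets 0 for both. Dividing by nF rather than Kmax is what makes the summary a mean over existing friends only.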

Define the network variable "netW1.F1" as the W1 value of the first friend, and define a binary exposure "A" so that the probability of success for each unit i is a logit-linear function of:

W1[i],
Sum of W1 values among all friends of i,
Mean value of W2 among all friends of i.

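dat.net <- {
  D +
    # W1 value of the first friend
    node("netW1.F1", distr = "rconst", const = W1[[1]]) +
    # Exposure depends on own W1, the friends' W1 sum and the friends' W2 mean
    node("A", distr = "rbern",
         prob = plogis(2 + -0.5 * W1 +
                       -0.1 * sum(W1[[1:Kmax]]) +
                       -0.7 * ifelse(nF > 0, sum(W2[[1:Kmax]]) / nF, 0)),
         replaceNAw0 = TRUE)
} %>%
  set.DAG() %>%
  sim(n = 1000)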

The simulated data frame returned by sim() also contains the simulated network object, saved as a separate attribute. The network is saved as an R6 object of class NetIndClass, under the attribute called "netind_cl". The field "NetInd" contains the network matrix, the field "Kmax" contains the maximum number of friends (number of columns in NetInd) and the field "nF" contains the vector with the total number of friends for each observation (see ?NetIndClass for more information).

(Kmax <- attributes(dat.net)$netind_cl$Kmax)
NetInd_mat <- attributes(dat.net)$netind_cl$NetInd
head(NetInd_mat)
nF <- attributes(dat.net)$netind_cl$nF
head(nF)

Citation

To cite simcausal in publications, please use:

Sofrygin O, van der Laan MJ, Neugebauer R (2015). simcausal: Simulating Longitudinal Data with Causal Inference Applications. R package version 0.5.

Funding

The development of this package was partially funded through internal operational funds provided by the Kaiser Permanente Center for Effectiveness & Safety Research (CESR). This work was also partially supported through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1403-12506) and an NIH grant (R01 AI074345-07).

Copyright

This software is distributed under the GPL-2 license.

About

Releases: 4. Latest: "Release with official JSS CITATION" (Oct 9, 2017).
