psc-vignette

Richard Jackson

2025-11-13

Introduction

The psc.R package implements the methods for applying Personalised Synthetic Controls, which allow patients receiving some experimental treatment to be compared against a model which predicts their response to some control. This is a form of causal inference which differs from other approaches in that:

Data are only required on a single treatment - all counterfactual evidence is supplied by a parametric model

Causal inference, in theory at least, is estimated at a patient level - as opposed to estimating average effects over a population

The causal estimand obtained is the Average Treatment Effect on the Treated (ATT), which differs from the Average Treatment Effect (ATE) obtained in other settings and addresses the question of whether treatments are effective in the population of patients who are treated. This estimand therefore targets efficacy over effectiveness.

In its basic form, this method creates a likelihood to compare a cohort of data to a parametric model. See (X) for discussion of its use as a causal inference tool. To use this package, two basic pieces of information are required: a dataset and a model against which it can be compared.

In this vignette, we will detail how the psc.r package is constructed and give some examples of its application in practice.

Methodology

The pscfit function compares a dataset (‘DC’) against a parametric model. This is done by selecting a likelihood which is identified by the type of counterfactual model (CFM) that is supplied. At present, two types of model are supported: a flexible parametric survival model of type ‘flexsurvreg’ and a generalised linear model of type ‘glm’.

Where the CFM is of type ‘flexsurvreg’ the likelihood supplied is of the form:

\[L(D \mid \Lambda, \Gamma_i) = \prod_{i=1}^{n} f(t_i \mid \Lambda, \Gamma_i)^{c_i} S(t_i \mid \Lambda, \Gamma_i)^{(1 - c_i)}\]

Where \(\Lambda\) defines the cumulative baseline hazard function, \(\Gamma_i\) is the linear predictor and \(t_i\) and \(c_i\) are the event time and event indicator variables.
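As a rough illustration of the survival likelihood above, the following sketch evaluates the censored log-likelihood on simulated data. It assumes a Weibull proportional-hazards baseline in place of the flexible spline baseline of a ‘flexsurvreg’ CFM, and the names `surv_loglik`, `shape`, `scale` and `lp` are hypothetical; none of this is the package's own code.

```r
# Censored log-likelihood: sum over i of c_i * log f(t_i) + (1 - c_i) * log S(t_i).
# Assumption: Weibull baseline with cumulative hazard H(t) = (t / scale)^shape * exp(lp),
# standing in for the spline baseline used by a 'flexsurvreg' CFM.
surv_loglik <- function(time, cen, lp, shape, scale) {
  H <- (time / scale)^shape * exp(lp)                                  # cumulative hazard
  log_h <- log(shape / scale) + (shape - 1) * log(time / scale) + lp   # log hazard
  # log f = log h - H and log S = -H, so each contribution is c * log h - H
  sum(cen * log_h - H)
}

set.seed(1)
n <- 200
lp <- rep(0.3, n)                                      # hypothetical linear predictor
event_time <- 10 * (-log(runif(n)) / exp(lp))^(1 / 1.5) # inverse-CDF simulation
cens_time <- runif(n, 5, 30)                           # independent censoring times
time <- pmin(event_time, cens_time)
cen <- as.numeric(event_time <= cens_time)             # 1 = event observed, 0 = censored

ll_true <- surv_loglik(time, cen, lp, shape = 1.5, scale = 10)
ll_shift <- surv_loglik(time, cen, lp + 1, shape = 1.5, scale = 10)
c(ll_true = ll_true, ll_shift = ll_shift)
```

The log-likelihood is higher at the linear predictor used to simulate the data than at a shifted one, which is the contrast that the efficacy parameter is designed to pick up.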

Where the CFM is of the type ‘glm’ the likelihood supplied is of the form:

\[L(x \mid \Gamma_i) = \prod_{i=1}^{n} b(x \mid \Gamma_i) \exp\{\Gamma_i t(x) - c(\Gamma_i)\}\]

Where \(b(.)\), \(t(.)\) and \(c(.)\) represent the functions of the exponential family. In both cases, \(\Gamma\) is defined as:

\[ \Gamma_i = \gamma x_i+\beta\]

Where \(\gamma\) are the model coefficients supplied by the CFM and \(\beta\) is the parameter set to measure the difference between the CFM and the DC.
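To make the role of \(\beta\) concrete, here is a minimal sketch for a binary outcome. It treats the CFM coefficients \(\gamma\) as fixed (so their uncertainty is ignored), simulates a hypothetical cohort, and estimates \(\beta\) by maximum likelihood rather than by the Bayesian procedure the package uses. All values and names are illustrative assumptions.

```r
set.seed(42)
n <- 500
x <- cbind(1, rnorm(n))          # hypothetical design matrix (intercept + one covariate)
gamma <- c(-0.5, 0.8)            # hypothetical CFM coefficients, held fixed
beta_true <- 0.4                 # hypothetical shift used only to simulate the data
eta_cfm <- drop(x %*% gamma)     # gamma * x_i, the CFM part of Gamma_i
y <- rbinom(n, 1, plogis(eta_cfm + beta_true))

# Bernoulli log-likelihood viewed as a function of beta alone
glm_loglik <- function(beta) {
  sum(dbinom(y, 1, plogis(eta_cfm + beta), log = TRUE))
}
beta_hat <- optimize(glm_loglik, interval = c(-5, 5), maximum = TRUE)$maximum

# The same estimate from glm(), passing the CFM linear predictor as an offset
fit <- glm(y ~ 1 + offset(eta_cfm), family = binomial)
c(optimize = beta_hat, glm = unname(coef(fit)[1]))
```

Holding \(\gamma x_i\) fixed as an offset leaves \(\beta\) as the only free parameter, which is why it can be read as the difference between the data cohort and the model's prediction.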

Estimation is performed using a Bayesian MCMC procedure. Prior distributions for \(\Gamma\) (& \(\Lambda\)) are derived directly from the model coefficients (mean and variance-covariance matrix) of the CFM. A bespoke MCMC routine is performed to estimate \(\beta\). Please see ‘?mcmc’ for more details.

For the standard example where the DC contains information from only a single treatment, trt need not be specified. Where comparisons between the CFM and multiple treatments are required, a covariate of treatment allocations must be specified separately (using the ‘trt’ option).
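The general idea of sampling \(\beta\) by MCMC can be sketched with a random-walk Metropolis sampler in base R. This is not the package's bespoke routine: it assumes a binary outcome, fixes the CFM linear predictors instead of drawing them from their prior, and uses an assumed vague normal prior for \(\beta\).

```r
set.seed(2025)
n <- 300
eta_cfm <- rnorm(n, -0.2, 0.7)                    # hypothetical fixed CFM linear predictors
y <- rbinom(n, 1, plogis(eta_cfm + 0.5))          # data simulated with beta = 0.5

# Log posterior = Bernoulli log-likelihood + log prior
log_post <- function(beta) {
  sum(dbinom(y, 1, plogis(eta_cfm + beta), log = TRUE)) +
    dnorm(beta, mean = 0, sd = 10, log = TRUE)    # assumed vague normal prior
}

n_iter <- 4000
draws <- numeric(n_iter)
current <- 0
current_lp <- log_post(current)
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = 0.3)        # symmetric random-walk proposal
  proposal_lp <- log_post(proposal)
  if (log(runif(1)) < proposal_lp - current_lp) { # Metropolis acceptance step
    current <- proposal
    current_lp <- proposal_lp
  }
  draws[i] <- current
}
post <- draws[-(1:1000)]                          # discard burn-in
c(mean = mean(post), sd = sd(post))
```

A fuller version would draw the CFM coefficients from their multivariate normal prior at each iteration, so that uncertainty in the model carries through to the posterior for \(\beta\).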

Package Structure

The main function for applying Personalised Synthetic Controls is the pscfit() function, which has two inputs: a Counter-Factual Model (CFM) and a data cohort (DC). Further arguments include

psc object

The output of the “pscfit()” function is an object of class ‘psc’. This class contains the following attributes

Postestimation functions

Basic post-estimation functions have been developed to work with the psc object, namely “print()”, “coef()”, “summary()” and “plot()”. The first three of these provide basic summaries of the efficacy parameter obtained from the posterior distribution.

Motivating Example

The psc.r package includes as an example a dataset “e4_data”, which is derived from patients with pancreatic ductal adenocarcinoma (PDAC) who have all received some experimental treatment, in this case GemCap. Aside from this we also provide a Counter Factual Model (CFM) for patients in the same setting (named ‘gemCFM’) who receive a therapy called ‘Gem’. The aim here is to produce a ‘GemCap Vs Gem’ comparison. We start by loading the package and from there obtaining the data and the model for analysis.

remove.packages("psc")
#> Removing package from '/private/var/folders/48/9gl133v90cbgsmgj429bb7hr0000gn/T/RtmpBq9p8p/Rinst17753296a43f7'
#> (as 'lib' is unspecified)
#rm(list=ls())
library(devtools)
#> Loading required package: usethis
install_github("richjjackson/psc")
#> Using GitHub PAT from the git credential store.
#> Downloading GitHub repo richjjackson/psc@HEAD
#> rstpm2 (1.7.0 -> 1.7.1) [CRAN]
#> Installing 1 packages: rstpm2
#> Installing package into '/private/var/folders/48/9gl133v90cbgsmgj429bb7hr0000gn/T/RtmpBq9p8p/Rinst17753296a43f7'
#> (as 'lib' is unspecified)
#>
#> The downloaded binary packages are in
#>  /var/folders/48/9gl133v90cbgsmgj429bb7hr0000gn/T//RtmpWIqaNK/downloaded_packages
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/private/var/folders/48/9gl133v90cbgsmgj429bb7hr0000gn/T/RtmpWIqaNK/remotes1777d3b8563ff/richJJackson-psc-94482ab/DESCRIPTION’ ... OK
#> * preparing ‘psc’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘psc_2.0.0.tar.gz’
#> Installing package into '/private/var/folders/48/9gl133v90cbgsmgj429bb7hr0000gn/T/RtmpBq9p8p/Rinst17753296a43f7'
#> (as 'lib' is unspecified)
library(psc)
#> Loading required package: survival
#> Loading required package: ggplot2
e4_data <- psc::e4_data
gemCFM <- psc::gemCFM


Starting with the model, we can inspect the model terms included in the counter factual model using

gemCFM$terms
#> [1] "LymphN"      "ResecM"      "Diff_Status" "PostOpCA199" "(weights)"

Included is a list of prognostic covariates:

Similarly we can observe the outcome terms. As the gemCFM object is a survival model this includes terms named “time” and “cen”. NB the pscfit function will search for these terms and so it is important that outcomes included in the data cohort (DC) are labelled in the same way.

gemCFM$out.nm
#> [1] "time" "cen"

Importantly, the covariates included in the DC must have names which match these.
Prior to comparing the DC to the CFM, it is advisable to get a good understanding of the data included in the CFM. Within the CFM object there are a series of plots to visualise the covariate values, which can be extracted using the ‘plotCFM’ function

plotCFM(gemCFM)
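The naming requirement described above can be checked before fitting with a few lines of base R. The sketch below copies the term and outcome names from the output shown earlier; the data frame `dc` is a hypothetical cohort with one deliberately mislabelled outcome column, and dropping “(weights)” from the terms is an assumption that it is not a covariate.

```r
# Names copied from gemCFM$terms (without "(weights)") and gemCFM$out.nm
cfm_terms <- c("LymphN", "ResecM", "Diff_Status", "PostOpCA199")
cfm_out <- c("time", "cen")

# Hypothetical data cohort: the censoring indicator is called "status", not "cen"
dc <- data.frame(
  time = c(12.1, 30.4), status = c(1, 0),
  LymphN = factor(c(1, 0)), ResecM = factor(c(0, 1)),
  Diff_Status = factor(c(1, 2)), PostOpCA199 = c(2.3, 3.1)
)

# Columns required by the CFM that are absent from the cohort
missing_cols <- setdiff(c(cfm_terms, cfm_out), names(dc))
missing_cols
#> [1] "cen"

# Rename so that the cohort matches the labels the CFM expects
names(dc)[names(dc) == "status"] <- "cen"
stopifnot(all(c(cfm_terms, cfm_out) %in% names(dc)))
```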

We give examples of how the ‘pscfit()’ function can be used to compare data against models with survival outcomes (with a ‘flexsurvreg’ model). Examples of how to perform analyses using GLM model objects are available from the GitHub repo https://github.com/richJJackson/psc

Survival Example

For an example with a survival outcome, a model must be supplied which is constructed on the basis of flexible parametric splines. This is constructed using the “flexsurvreg” function within the “flexsurv” package. An example is included within the ‘psc.r’ package named ‘surv.mod’ and is loaded using the “data()” function:

The ‘gemCFM’ is an object of class pscCFM, which means it contains all of the structures required for analysis but has stripped the model object of any patient-level data. Please note that the psc package can also be used by providing standard ‘glm’ or ‘flexsurvreg’ models. Here the procedures will convert the model into a ‘pscCFM’ object using the pscCFM() function.

In this example the ‘gemCFM’ model is constructed with 5 internal knots and hence 7 parameters to describe the baseline cumulative hazard function:

gemCFM$haz_co
#>      gamma0      gamma1      gamma2      gamma3      gamma4      gamma5
#> -11.3808020   3.8359818   1.2911623  -1.3176560   1.1190182  -0.8876277
#>      gamma6
#>   0.3372484
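The count of 7 parameters follows from the spline construction: an intercept, a linear term in log time, and one basis function per internal knot. The sketch below builds the natural cubic spline basis of Royston and Parmar, which is assumed here to be the parameterisation behind the ‘flexsurvreg’ spline model. The knot locations are hypothetical because the fitted knots are not shown, so the resulting values are not the fitted hazard.

```r
# Royston-Parmar natural cubic spline basis in x = log(t).
# knots: all knot locations on the log-time scale, boundary knots included.
rp_basis <- function(x, knots) {
  k_min <- min(knots)
  k_max <- max(knots)
  internal <- sort(knots[knots > k_min & knots < k_max])
  pos3 <- function(z) pmax(z, 0)^3                 # truncated cubic (z)_+^3
  v <- sapply(internal, function(k) {
    lambda <- (k_max - k) / (k_max - k_min)
    pos3(x - k) - lambda * pos3(x - k_min) - (1 - lambda) * pos3(x - k_max)
  })
  cbind(1, x, matrix(v, nrow = length(x)))         # columns pair with gamma0, gamma1, ...
}

knots <- c(0.5, 1.5, 2.2, 2.8, 3.3, 3.8, 4.5)      # hypothetical: 2 boundary + 5 internal
haz_co <- c(-11.3808020, 3.8359818, 1.2911623, -1.3176560,
            1.1190182, -0.8876277, 0.3372484)      # coefficients printed above

log_t <- log(c(6, 12, 24, 48))
B <- rp_basis(log_t, knots)
ncol(B) == length(haz_co)                          # 7 basis columns for 7 coefficients

# Log cumulative baseline hazard; illustrative only because the knots are made up
log_cum_haz <- drop(B %*% haz_co)
```

The basis is constrained to be linear beyond the boundary knots, which keeps the extrapolated cumulative hazard well behaved.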

There are also prognostic covariates which match the prognostic covariates in the data cohort:

gemCFM$cov_co
#>      LymphN1      ResecM1 Diff_Status1 Diff_Status2  PostOpCA199
#>    0.4876152    0.1805322   -0.4160534   -0.5897823    0.2671471

The process of comparing the DC to the CFM occurs across 4 steps:

Each of these steps updates a ‘pscOb’ results object, which is returned to the user. This process is wrapped up in a single ‘pscfit()’ function.
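These coefficients combine with a patient's covariates to give the CFM part of the linear predictor, \(\gamma x_i\). A minimal sketch for two hypothetical patients follows; the factor codings (0/1 and 0/1/2) are assumptions inferred from the coefficient names, not taken from the package data.

```r
# Coefficients copied from the gemCFM$cov_co output above
cov_co <- c(LymphN1 = 0.4876152, ResecM1 = 0.1805322,
            Diff_Status1 = -0.4160534, Diff_Status2 = -0.5897823,
            PostOpCA199 = 0.2671471)

# Two hypothetical patients; factor levels are assumed, not taken from e4_data
dc <- data.frame(
  LymphN = factor(c(1, 0), levels = c(0, 1)),
  ResecM = factor(c(0, 1), levels = c(0, 1)),
  Diff_Status = factor(c(2, 0), levels = c(0, 1, 2)),
  PostOpCA199 = c(2.5, 4.0)
)

# Dummy-coded design matrix without the intercept column
X <- model.matrix(~ LymphN + ResecM + Diff_Status + PostOpCA199, data = dc)[, -1]
stopifnot(identical(colnames(X), names(cov_co)))   # columns line up with the coefficients

lp <- drop(X %*% cov_co)                           # gamma * x_i for each patient
lp
```

Adding \(\beta\) to each element of `lp` gives \(\Gamma_i\) as defined in the Methodology section.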

NB. A warning is supplied here just to note that data have been removed from the DC prior to analysis due to missing data.

We can view the attributes of the psc object that is created. This includes details on all components, including the CFM, the DC, the likelihood applied, the starting values and the posterior distribution

attributes(surv.psc)
#> $names
#>  [1] "mod_class" "terms"     "out.nm"    "cov_class" "cov_lev"   "co"
#>  [7] "cov_co"    "sig"       "haz_co"    "k"         "kn"        "lam"
#> [13] "formula"   "datasumm"  "datavis"   "DC"        "lik"       "start.mu"
#> [19] "start.sd"  "cfmPost"   "target"    "betaPrior" "ncores"    "draws"
#> [25] "postFit"   "postEst"
#>
#> $class
#> [1] "psc"
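To see in advance how many patients such a step would drop, the cohort can be screened for incomplete records on the model variables. The sketch below uses a small hypothetical data frame and base R's complete.cases(); it illustrates the idea of a complete-case filter and is not the package's internal data handling.

```r
# Hypothetical cohort with missing values in an outcome and in a covariate
dc <- data.frame(
  time = c(12.1, 30.4, 8.7, 22.0),
  cen = c(1, 0, 1, NA),
  LymphN = factor(c(1, 0, NA, 1)),
  PostOpCA199 = c(2.3, 3.1, 2.9, 3.4)
)

# Flag rows that are complete on every variable the model needs
model_vars <- c("time", "cen", "LymphN", "PostOpCA199")
keep <- complete.cases(dc[, model_vars])

n_dropped <- sum(!keep)        # number of patients that would be removed
dc_complete <- dc[keep, ]      # complete-case cohort
```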

It is worth noting that, as part of the ‘pscData()’ function, which ensures the DC and CFM are compatible, the ‘datavis’ object has been updated. This will now produce a useful figure to allow evaluation of how comparable the data from the CFM and DC are, using the ‘plotCFM()’ function

plotCFM(surv.psc)

We make use of the ‘posterior’ package to summarise the posterior distributions (saved within the ‘draws’ object). Both ‘print()’ and ‘coef()’ will show these, although the ‘summary()’ function provides more information:

This includes information on the underlying CFM as well as the expected response for patients in the DC along with a 95% CI (obtained using bootstrapping). A summary of the MCMC fit is supplied along with the overall summary of the posterior distribution.

summary(surv.psc)
#> Counterfactual Model (CFM):
#> A model of class 'flexsurvreg'
#>  Fit with 5 internal knots
#>
#> CFM Formula:
#> Surv(time, cen) ~ LymphN + ResecM + Diff_Status + PostOpCA199
#> <environment: 0x116900d30>
#>
#> CFM Summary:
#> Expected response for the outcome under the CFM:
#>     S     lo     hi
#> 30.30  25.51  36.12
#>
#> Observed outcome from the Data Cohort:
#>          [,1]
#> median   26.33
#> 0.95LCL  22.85
#> 0.95UCL  31.03
#>
#> MCMC Fit:
#> Posterior Distribution obtaine with fit summary:
#>       variable     rhat         ess_bulk     ess_tail     mcse_mean
#> [1,]  beta_1       1.001946     1000.802     1144.81      0.002742731
#>
#> Summary:
#> Posterior Distribution for beta:Call:
#>  CFM model + beta
#>
#> Coefficients:
#>            variable     mean         sd           median       q5
#> posterior  beta_1       0.04360839   0.08719634   0.04053274   -0.09466748
#>            q95
#> posterior  0.1848537

Lastly, to visualise the original model and the fit of the data, the plot function has been included

plot(surv.psc)
#>            variable     mean         sd           median       q5
#> posterior  beta_1       0.04360839   0.08719634   0.04053274   -0.09466748
#>            q95
#> posterior  0.1848537
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> ℹ The deprecated feature was likely used in the ggpubr package.
#>   Please report the issue at <https://github.com/kassambara/ggpubr/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> Ignoring unknown labels:
#> • colour : "Strata"
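One way to read the posterior summary above is to move \(\beta\) to the ratio scale. The sketch below assumes that \(\beta\) acts additively on the log (cumulative) hazard scale, so that \(\exp(\beta)\) can be read as a hazard ratio for the data cohort relative to the CFM; that interpretation is an assumption here rather than something the output states.

```r
# Posterior summary of beta_1 copied from the summary() output above
beta_summ <- c(mean = 0.04360839, q5 = -0.09466748, q95 = 0.1848537)

# Assumption: beta is a log hazard ratio (DC relative to the CFM prediction)
hr <- exp(beta_summ)
round(hr, 2)

# Does the 90% interval (q5 to q95) include a hazard ratio of 1?
includes_one <- unname(hr["q5"] < 1 && hr["q95"] > 1)
includes_one
```

Under that assumption the posterior mean corresponds to a hazard ratio of roughly 1.04, with a 90% interval of about 0.91 to 1.20. The interval spans 1, so this summary alone does not indicate a clear difference between GemCap and the Gem counterfactual.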

