This vignette elaborates and demonstrates the asymmetric and causal Shapley value frameworks introduced by Frye, Rowat, and Feige (2020) and Heskes et al. (2020), respectively. We also consider the marginal and conditional Shapley value frameworks, see Lundberg and Lee (2017) and Aas, Jullum, and Løland (2021), respectively. We demonstrate the frameworks on the bike sharing dataset from the UCI Machine Learning Repository. The setup is based on the CauSHAPley package, which is the code supplement to the Heskes et al. (2020) paper. The CauSHAPley package was based on an old version of shapr and was restricted to the `gaussian` approach (see Section 6 in Heskes et al. (2020) for more details).
We have extended the causal Shapley value framework to work for all Monte Carlo-based approaches (`independence` (not recommended), `empirical`, `gaussian`, `copula`, `ctree`, `vaeac`, and `categorical`), while the extension of the asymmetric Shapley value framework works for all the Monte Carlo and regression-based approaches. Our generalization is of utmost importance, as many real-world data sets are far from the Gaussian distribution, and, compared to CauSHAPley, our implementation can utilize all of shapr's new features, such as batch computation, parallelization, and iterative computation for both feature-wise and group-wise Shapley values.
The main differences between the marginal, conditional, and causal Shapley value frameworks are that they sample/generate the Monte Carlo samples from the marginal distribution, the (conventional) observational conditional distribution, and the interventional conditional distribution, respectively. Asymmetric means that we do not consider all possible coalitions, but rather only the coalitions that respect a causal ordering.
Asymmetric (conditional) Shapley values were proposed by Frye, Rowat, and Feige (2020) as a way to incorporate causal knowledge in the real world by computing the Shapley value explanations using only the feature combinations/coalitions consistent with a (partial) causal ordering. See the figure below for a schematic overview of the causal ordering we are going to use in the examples in this vignette. In the figure, we see that our causal ordering consists of three components: \(\tau_1 = \{X_1\}\), \(\tau_2 = \{X_2, X_3\}\), and \(\tau_3 = \{X_4, X_5, X_6, X_7\}\). See the code section for what the features represent.
To elaborate, instead of considering the \(2^M\) possible coalitions, where \(M\) is the number of features, asymmetric Shapley values only consider the subset of coalitions which respect the causal ordering. For our causal ordering, this means that the asymmetric Shapley value explanation framework skips the coalitions where \(X_2\) is included but not \(X_1\), as \(X_1\) is an ancestor of \(X_2\). This will skew the explanations towards distal/root causes; see Section 3.2 in Frye, Rowat, and Feige (2020).
We can use all approaches in shapr, both Monte Carlo-based and regression-based methods, to compute the asymmetric Shapley values. This is because the asymmetric Shapley value explanation framework does not change how we compute the contribution functions \(v(S)\), but rather which of the coalitions \(S\) are used to compute the Shapley value explanations. This means that the number of coalitions is no longer \(O(2^M)\), but rather \(O(2^{\tau_0})\), where \(\tau_0 = \operatorname{max}_i |\tau_i|\) is the number of features (\(|\tau_i|\)) in the largest component of the causal ordering.
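To see why the number of valid coalitions is so much smaller, note that a coalition respects the causal ordering exactly when it is the union of the first \(k\) complete components plus an arbitrary non-empty subset of component \(k+1\) (or is the empty coalition), giving \(1 + \sum_i (2^{|\tau_i|} - 1)\) coalitions in total. The following is our own small sketch (the helper function `n_valid_coalitions` is not part of shapr's API) verifying this count for the causal ordering used in this vignette:

```r
# Count the coalitions that respect a (partial) causal ordering.
# A valid coalition is the union of the first k complete components plus
# any non-empty subset of component k+1, or the empty coalition, so the
# total is 1 + sum_i (2^{|tau_i|} - 1).
n_valid_coalitions <- function(causal_ordering) {
  sizes <- lengths(causal_ordering) # |tau_i| for each component
  1 + sum(2^sizes - 1)
}

# The causal ordering used in this vignette: 1 + 1 + 3 + 15 = 20 coalitions
causal_ordering <- list(
  "trend", c("cosyear", "sinyear"), c("temp", "atemp", "windspeed", "hum")
)
n_valid_coalitions(causal_ordering) # 20, down from 2^7 = 128

# A single component containing all features recovers the symmetric case
n_valid_coalitions(list(1:7)) # 128 = 2^7
```

This agrees with the 20 asymmetric coalitions reported by `explain()` later in the vignette.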
Furthermore, asymmetric Shapley values support groups of features, but then the causal ordering must be given on the group level instead of on the feature level. The asymmetric Shapley value framework also supports sampling of coalitions, where the sampling is done from the set of coalitions that respect the causal ordering.
Finally, we remark that asymmetric conditional Shapley values are equivalent to asymmetric causal Shapley values (see below) when we only use the coalitions respecting the causal ordering and assume that all dependencies within chain components are induced by mutual interactions.
Schematic overview of the causal ordering used in this vignette.
Causal Shapley values were proposed by Heskes et al. (2020) as a way to explain the total effect of features on the prediction by taking into account their causal relationships and adapting the sampling procedure in shapr. More precisely, they propose to employ Pearl's do-calculus to circumvent the independence assumption, made by Lundberg and Lee (2017), without sacrificing any of the desirable properties of the Shapley value framework. The causal Shapley value explanation framework can also separate the contribution of direct and indirect effects, which makes it principally different from marginal and conditional Shapley values. The framework also provides a more direct and robust way to incorporate causal knowledge, compared to the asymmetric Shapley value explanation framework.
To compute causal Shapley values, we have to specify a (partial) causal ordering and make an assumption about the confounding in each component. Together, they form a causal chain graph which contains directed and undirected edges. All features that are treated on an equal footing are linked together with undirected edges and become part of the same chain component. Edges between chain components are directed and represent causal relationships. In the figure below, we have the same causal ordering as above, but we have in addition made the assumption that we have confounding in the second component, but no confounding in the first and third components. This allows us to correctly distinguish between dependencies that are due to confounding and mutual interactions. That is, in the figure, the dependencies in chain component \(\tau_2\) are assumed to be the result of a common confounder, and those in \(\tau_3\) of mutual interactions, while we have no mutual interactions in \(\tau_1\) as it is a singleton.
Computing the effect of an intervention depends on how we interpret the generative process that leads to the feature dependencies within each component. If they are the result of marginalizing out a common confounder, then intervention on a particular feature will break the dependency with the other features, and we denote the set of these chain components by \(\mathcal{T}_{\,\text{confounding}}\). For the components with mutual feature interactions, setting the value of a feature affects the distribution of the variables within the same component. We denote the set of these components by \(\mathcal{T}_{\,\overline{\text{confounding}}}\).
Heskes et al. (2020) described how any expectation by intervention needed to compute the causal Shapley values can be translated to an expectation by observation, by using the interventional formula for causal chain graphs:

\[\begin{align}
\label{eq:do}
P(X_{\bar{\mathcal{S}}} \mid do(X_\mathcal{S} = x_\mathcal{S}))
= &\prod_{\tau \in \mathcal{T}_{\,\text{confounding}}}
P(X_{\tau \cap \bar{\mathcal{S}}} \mid X_{\text{pa}(\tau) \cap \bar{\mathcal{S}}}, x_{\text{pa}(\tau) \cap \mathcal{S}}) \times \tag{1} \\
& \quad \prod_{\tau \in \mathcal{T}_{\,\overline{\text{confounding}}}}
P(X_{\tau \cap \bar{\mathcal{S}}} \mid X_{\text{pa}(\tau) \cap \bar{\mathcal{S}}}, x_{\text{pa}(\tau) \cap \mathcal{S}}, x_{\tau \cap \mathcal{S}}).
\end{align}\]

Here, any of the Monte Carlo-based approaches in shapr can be used to compute the conditional distributions/observational expectations. The marginals are estimated from the training data for all approaches except `gaussian`, for which we use the marginals of the Gaussian distribution instead.
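As a worked instance of Equation (1) (our own illustration, using the chain graph above with confounding in \(\tau_2\) but not in \(\tau_1\) and \(\tau_3\)), consider intervening on \(\mathcal{S} = \{X_1, X_2\}\), so that \(\bar{\mathcal{S}} = \{X_3, \ldots, X_7\}\). Since \(\text{pa}(\tau_2) = \{X_1\}\) and \(\text{pa}(\tau_3) = \{X_2, X_3\}\), the formula reduces to

\[
P(X_3, X_4, \ldots, X_7 \mid do(X_1 = x_1, X_2 = x_2))
= P(X_3 \mid x_1)\, P(X_4, X_5, X_6, X_7 \mid X_3, x_2).
\]

Because the dependencies in \(\tau_2\) are due to confounding, intervening on \(X_2\) breaks its dependency with \(X_3\), so \(X_3\) is conditioned only on its parent value \(x_1\). Had the dependencies in \(\tau_2\) instead been mutual interactions, the first factor would have been \(P(X_3 \mid x_1, x_2)\).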
For specific causal chain graphs, the causal Shapley value framework simplifies to symmetric conditional, asymmetric conditional, and marginal Shapley values; see Corollaries 1 to 3 in the supplement of Heskes et al. (2020).
Schematic overview of the causal chain graph used in this vignette.
Causal Shapley values are equivalent to marginal Shapley values when all \(M\) features are combined into a single component \(\tau = \mathcal{M} = \{1, 2, \ldots, M\}\) and all dependencies are induced by confounding. Then \(\text{pa}(\tau) = \emptyset\), and \(P(X_{\bar{\mathcal{S}}} \mid do(X_\mathcal{S} = x_\mathcal{S}))\) in Equation (\(\ref{eq:do}\)) simplifies to \(P(X_{\bar{\mathcal{S}}} \mid do(X_\mathcal{S} = x_\mathcal{S})) = P(X_{\bar{\mathcal{S}}})\), as specified in Lundberg and Lee (2017).
The Monte Carlo samples for the marginals are generated by sampling from the training data, except for the `gaussian` approach, where we use the marginals of the estimated multivariate Gaussian distribution. This means that for all other approaches, this is the same as using the `independence` approach in the conditional Shapley value explanation framework.
Causal Shapley values are equivalent to symmetric conditional Shapley values when all \(M\) features are combined into a single component \(\tau = \mathcal{M} = \{1, 2, \ldots, M\}\) and all dependencies are induced by mutual interaction. Then \(\text{pa}(\tau) = \emptyset\), and \(P(X_{\bar{\mathcal{S}}} \mid do(X_\mathcal{S} = x_\mathcal{S}))\) in Equation (\(\ref{eq:do}\)) simplifies to \(P(X_{\bar{\mathcal{S}}} \mid do(X_\mathcal{S} = x_\mathcal{S})) = P(X_{\bar{\mathcal{S}}} \mid X_\mathcal{S} = x_\mathcal{S})\), as specified in Aas, Jullum, and Løland (2021). Symmetric means that we consider all coalitions.
We demonstrate the frameworks on the bike sharing dataset from the UCI Machine Learning Repository. We let the features be the number of days since January 2011 (`trend`), two cyclical variables representing the season (`cosyear`, `sinyear`), temperature (`temp`), feeling temperature (`atemp`), wind speed (`windspeed`), and humidity (`hum`). The first three features are considered to be a potential cause of the four weather-related features. The bike rental is strongly seasonal and shows an upward trend, as illustrated in the figure below. The bike data is split randomly into a training (80%) and test/explicand (20%) set. We train an XGBoost model for 100 rounds with default hyperparameters to act as the model we want to explain.
In the table below, we highlight the Shapley value explanation frameworks introduced above and how to access them by changing the arguments `asymmetric`, `causal_ordering`, and `confounding` in `shapr::explain()`. Note that symmetric conditional Shapley values are the default version; i.e., by default `asymmetric = FALSE`, `causal_ordering = NULL`, and `confounding = NULL`.
| Framework | Sampling | Approaches | `asymmetric` | `causal_ordering` | `confounding` |
|---|---|---|---|---|---|
| Sym. Conditional | \(P(X_{\bar{\mathcal{S}}}\mid X_\mathcal{S} = x_\mathcal{S})\) | All | FALSE | NULL | NULL |
| Asym. Conditional | \(P(X_{\bar{\mathcal{S}}}\mid X_\mathcal{S} = x_\mathcal{S})\) | All | TRUE | list(...) | NULL |
| Sym. Causal | \(P(X_{\bar{\mathcal{S}}}\mid do(X_\mathcal{S} = x_\mathcal{S}))\) | All MC-based | FALSE | list(...) | c(...) |
| Asym. Causal | \(P(X_{\bar{\mathcal{S}}}\mid do(X_\mathcal{S} = x_\mathcal{S}))\) | All MC-based | TRUE | list(...) | c(...) |
| Sym. Marginal | \(P(X_{\bar{\mathcal{S}}})\) | indep., gaussian | FALSE | NULL | TRUE |
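To connect the table's causal rows to code, symmetric causal Shapley values for the chain graph above would be requested roughly as follows. This is a sketch, not run here; it assumes the `model`, `x_train`, `x_explain`, `phi0`, and `causal_ordering` objects set up later in this vignette, and `explanation_sym_cau` is just an illustrative name.

```r
# Sketch (not run): symmetric causal Shapley values.
# Assumes `model`, `x_train`, `x_explain`, `phi0`, and `causal_ordering`
# as defined later in this vignette.
explanation_sym_cau <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian", # any Monte Carlo-based approach
  phi0 = phi0,
  asymmetric = FALSE, # consider all coalitions
  causal_ordering = causal_ordering,
  confounding = c(FALSE, TRUE, FALSE) # confounding only in the second component
)
```

Setting `asymmetric = TRUE` with the same `causal_ordering` and `confounding` arguments gives the asymmetric causal version from the last causal row of the table.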
First, we load the needed libraries, set up the training/explicand data, plot the data, and train an `xgboost` model.
```r
library(ggplot2)
library(xgboost)
library(data.table)
library(shapr)

# Additional packages which are only used for plotting in this vignette.
# These are not listed as dependencies in shapr
library(GGally)
library(ggpubr)
library(gridExtra)

# Ensure that shapr's functions are prioritized; otherwise, we need to use the `shapr::`
# prefix when calling explain(). The `conflicted` package is imported by `tidymodels`.
conflicted::conflicts_prefer(shapr::explain, shapr::prepare_data)

# Set up the data
# Can also download the data set from the source https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset
# temp <- tempfile()
# download.file("https://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip", temp)
# bike <- read.csv(unz(temp, "day.csv"))
# unlink(temp)
bike <- read.csv("../inst/extdata/day.csv")

# Difference in days, which takes DST into account
bike$trend <- as.numeric(difftime(bike$dteday, bike$dteday[1], units = "days"))
bike$cosyear <- cospi(bike$trend / 365 * 2)
bike$sinyear <- sinpi(bike$trend / 365 * 2)

# Unnormalize variables (see data set information in link above)
bike$temp <- bike$temp * (39 - (-8)) + (-8)
bike$atemp <- bike$atemp * (50 - (-16)) + (-16)
bike$windspeed <- 67 * bike$windspeed
bike$hum <- 100 * bike$hum

# Plot the data
ggplot(bike, aes(x = trend, y = cnt, color = temp)) +
  geom_point(size = 0.75) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(colour = "temp") +
  xlab("Days since 1 January 2011") +
  ylab("Number of bikes rented") +
  theme_minimal() +
  theme(legend.position = "right", legend.title = element_text(size = 10))

# Define the features and the response variable
x_var <- c("trend", "cosyear", "sinyear", "temp", "atemp", "windspeed", "hum")
y_var <- "cnt"

# NOTE: To avoid RNG reproducibility issues across different systems, we
# load the training-test split from a file. 80% training and 20% test
train_index <- readRDS("../inst/extdata/train_index.rds")

# Training data
x_train <- as.matrix(bike[train_index, x_var])
y_train_nc <- as.matrix(bike[train_index, y_var]) # not centered
y_train <- y_train_nc - mean(y_train_nc)

# Plot pairs plot
GGally::ggpairs(x_train)

# Test/explicand data
x_explain <- as.matrix(bike[-train_index, x_var])
y_explain_nc <- as.matrix(bike[-train_index, y_var]) # not centered
y_explain <- y_explain_nc - mean(y_train_nc)

# Get 6 explicands to plot the Shapley values for, with a wide spread in their predicted outcomes
n_index_x_explain <- 6
index_x_explain <- order(y_explain)[seq(1, length(y_explain), length.out = n_index_x_explain)]
y_explain[index_x_explain]
#> [1] -3900.03 -1872.03  -377.03   411.97  1690.97  3889.97

# Fit an XGBoost model to the training data
model <- xgboost::xgboost(data = x_train, label = y_train, nround = 100, verbose = FALSE)

# Save the phi0
phi0 <- mean(y_train)

# Look at the root mean squared error
sqrt(mean((predict(model, x_explain) - y_explain)^2))
#> [1] 798.71

ggplot(
  data.table("response" = y_explain[, 1], "predicted_response" = predict(model, x_explain)),
  aes(response, predicted_response)
) +
  geom_point()
```

We are going to use the `causal_ordering` and `confounding` illustrated in the figures above. For `causal_ordering`, we can either provide the indices of the features or the feature names. Thus, the following two versions of `causal_ordering` will produce equivalent results. Furthermore, we assume that we have confounding for the second component (i.e., the season has an effect on the weather) and no confounding for the third component (i.e., we do not know how to model the intricate relations between the weather features).
```r
causal_ordering <- list(1, c(2, 3), c(4:7))
causal_ordering <- list("trend", c("cosyear", "sinyear"), c("temp", "atemp", "windspeed", "hum"))
confounding <- c(FALSE, TRUE, FALSE)
```

To make the rest of the vignette easier to follow, we create some helper functions that plot and summarize the results of the explanation methods. This code block is optional to understand and can be skipped.
```r
# Extract the MSEv criterion scores and elapsed times
print_MSEv_scores_and_time <- function(explanation_list) {
  res <- as.data.frame(t(sapply(
    explanation_list,
    function(explanation) {
      round(c(
        explanation$MSEv$MSEv$MSEv,
        explanation$MSEv$MSEv$MSEv_sd,
        explanation$timing$summary$total_time_secs
      ), 2)
    }
  )))
  colnames(res) <- c("MSEv", "MSEv_sd", "Time (secs)")
  return(res)
}

# Print the full time information
print_time <- function(explanation_list) {
  t(sapply(explanation_list, function(explanation) explanation$timing$summary$total_time_secs))
}

# Make beeswarm plots
plot_beeswarms <- function(explanation_list, title = "", ...) {
  # Make the beeswarm plots
  grobs <- lapply(seq(length(explanation_list)), function(explanation_idx) {
    gg <- plot(explanation_list[[explanation_idx]],
      plot_type = "beeswarm", print_ggplot = FALSE, ...
    ) +
      ggplot2::ggtitle(tools::toTitleCase(gsub("_", " ", names(explanation_list)[[explanation_idx]])))

    # Flip the order such that the features come in the right order
    gg <- gg +
      ggplot2::scale_x_discrete(
        limits = rev(levels(gg$data$variable)[levels(gg$data$variable) != "none"])
      )
  })

  # Get the limits
  ylim <- sapply(grobs, function(grob) {
    ggplot2::ggplot_build(grob)$layout$panel_scales_y[[1]]$range$range
  })
  ylim <- c(min(ylim), max(ylim))

  # Update the limits
  grobs <- suppressMessages(lapply(grobs, function(grob) grob + ggplot2::coord_flip(ylim = ylim)))

  # Make the combined plot
  gridExtra::grid.arrange(
    grobs = grobs, ncol = 1,
    top = grid::textGrob(title, gp = grid::gpar(fontsize = 18, font = 8))
  )
}
```

We start by demonstrating how to compute symmetric conditional Shapley values. This is the default version in shapr and there is no need to specify the arguments below. However, we have specified them for the sake of clarity. We use the `gaussian`, `ctree`, and `regression_separate` (`xgboost` with default hyperparameters) approaches, but any other approach can also be used.
```r
# list to store the results
explanation_sym_con <- list()

explanation_sym_con[["gaussian"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  asymmetric = FALSE, # Default value (TRUE will give the same since `causal_ordering = NULL`)
  causal_ordering = NULL, # Default value
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:14:33 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#> and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec72e4056.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────
#> ℹ Using 68 of 128 coalitions, 22 new.

explanation_sym_con[["ctree"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "ctree",
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  asymmetric = FALSE, # Default value (TRUE will give the same since `causal_ordering = NULL`)
  causal_ordering = NULL, # Default value
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:14:42 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#> and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: ctree
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec712ce877.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────
#> ℹ Using 70 of 128 coalitions, 24 new.
#>
#> ── Iteration 5 ──────────────────────────────────────────────────────
#> ℹ Using 94 of 128 coalitions, 24 new.
#>
#> ── Iteration 6 ──────────────────────────────────────────────────────
#> ℹ Using 110 of 128 coalitions, 16 new.

explanation_sym_con[["xgboost"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  approach = "regression_separate",
  regression.model = parsnip::boost_tree(engine = "xgboost", mode = "regression"),
  asymmetric = FALSE, # Default value (TRUE will give the same as `causal_ordering = NULL`)
  causal_ordering = NULL, # Default value
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:15:42 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#> and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Regression
#> • Approach: regression_separate
#> • Procedure: Iterative
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec478565d8.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────
#> ℹ Using 70 of 128 coalitions, 24 new.
#>
#> ── Iteration 5 ──────────────────────────────────────────────────────
#> ℹ Using 94 of 128 coalitions, 24 new.
#>
#> ── Iteration 6 ──────────────────────────────────────────────────────
#> ℹ Using 110 of 128 coalitions, 16 new.
#>
#> ── Iteration 7 ──────────────────────────────────────────────────────
#> ℹ Using 118 of 128 coalitions, 8 new.
```

We can then look at the \(\operatorname{MSE}_v\) evaluation scores to compare the approaches. All approaches are comparable, but `xgboost` is clearly the fastest approach.
```r
print_MSEv_scores_and_time(explanation_sym_con)
#>             MSEv MSEv_sd Time (secs)
#> gaussian 1076528   75021        9.53
#> ctree    1044396   66196       60.03
#> xgboost  1086792   62794       21.11
```

We can then plot the Shapley values for the six explicands chosen above.
```r
plot_SV_several_approaches(explanation_sym_con, index_x_explain, print_ggplot = FALSE) +
  theme(legend.position = "bottom")
```

We can also make beeswarm plots of the Shapley values to look at the structure of the Shapley values for all explicands. The figures are quite similar, but with minor differences. E.g., the `gaussian` approach produces almost no Shapley values around \(500\) for the `trend` feature.
Then we look at the asymmetric conditional Shapley values. To obtain these types of Shapley values, we have to specify that `asymmetric = TRUE` and a `causal_ordering`. We use `causal_ordering = list(1, c(2, 3), c(4:7))`.
```r
explanation_asym_con <- list()

explanation_asym_con[["gaussian"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "gaussian",
  asymmetric = TRUE,
  causal_ordering = causal_ordering,
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:05 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#> coalitions respecting the causal ordering (20), and is therefore
#> set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp,
#> windspeed, hum}
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec52e99a32.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 13 of 20 coalitions, 13 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 20 of 20 coalitions, 7 new.

explanation_asym_con[["gaussian_non_iterative"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "gaussian",
  asymmetric = TRUE,
  causal_ordering = causal_ordering,
  confounding = NULL, # Default value
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:09 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#> coalitions respecting the causal ordering (20), and is therefore
#> set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp,
#> windspeed, hum}
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec3cc23fca.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 20 of 20 coalitions.

explanation_asym_con[["ctree"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "ctree",
  asymmetric = TRUE,
  causal_ordering = causal_ordering,
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:11 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#> coalitions respecting the causal ordering (20), and is therefore
#> set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: ctree
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp,
#> windspeed, hum}
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec17ae365a.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 13 of 20 coalitions, 13 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 20 of 20 coalitions, 7 new.

explanation_asym_con[["xgboost"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  approach = "regression_separate",
  regression.model = parsnip::boost_tree(engine = "xgboost", mode = "regression"),
  asymmetric = TRUE,
  causal_ordering = causal_ordering,
  confounding = NULL # Default value
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:24 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#> Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#> coalitions respecting the causal ordering (20), and is therefore
#> set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Regression
#> • Approach: regression_separate
#> • Procedure: Iterative
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp,
#> windspeed, hum}
#> • Computations (temporary) saved at:
#> '/tmp/RtmptIm6LB/shapr_obj_60eec7338f218.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 13 of 20 coalitions, 13 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 20 of 20 coalitions, 7 new.
```

The asymmetric conditional Shapley value framework is faster, as we only consider \(20\) coalitions (including the empty and grand coalitions) instead of all \(128\) coalitions (see code below).
```r
print_MSEv_scores_and_time(explanation_asym_con)
#>                          MSEv MSEv_sd Time (secs)
#> gaussian               306849   35460        3.88
#> gaussian_non_iterative 306458   35412        1.88
#> ctree                  270172   31527       12.74
#> xgboost                304617   35310        3.96

# Look at the number of coalitions considered. Decreased from 128 to 20.
explanation_sym_con$gaussian$internal$parameters$max_n_coalitions
#> [1] 128
explanation_asym_con$gaussian$internal$parameters$max_n_coalitions
#> [1] 20

# Here we can see the 20 coalitions that respect the causal ordering
explanation_asym_con$gaussian$internal$objects$dt_valid_causal_coalitions[["coalitions"]]
#> [[1]]
#> integer(0)
#>
#> [[2]]
#> [1] 1
#>
#> [[3]]
#> [1] 1 2
#>
#> [[4]]
#> [1] 1 3
#>
#> [[5]]
#> [1] 1 2 3
#>
#> [[6]]
#> [1] 1 2 3 4
#>
#> [[7]]
#> [1] 1 2 3 5
#>
#> [[8]]
#> [1] 1 2 3 6
#>
#> [[9]]
#> [1] 1 2 3 7
#>
#> [[10]]
#> [1] 1 2 3 4 5
#>
#> [[11]]
#> [1] 1 2 3 4 6
#>
#> [[12]]
#> [1] 1 2 3 4 7
#>
#> [[13]]
#> [1] 1 2 3 5 6
#>
#> [[14]]
#> [1] 1 2 3 5 7
#>
#> [[15]]
#> [1] 1 2 3 6 7
#>
#> [[16]]
#> [1] 1 2 3 4 5 6
#>
#> [[17]]
#> [1] 1 2 3 4 5 7
#>
#> [[18]]
#> [1] 1 2 3 4 6 7
#>
#> [[19]]
#> [1] 1 2 3 5 6 7
#>
#> [[20]]
#> [1] 1 2 3 4 5 6 7
```

We can then look at the beeswarm plots of the asymmetric conditional Shapley values. The `ctree` and `xgboost` approaches produce similar figures, while the `gaussian` approach shrinks and groups the Shapley values for the `trend` feature, and it produces more negative values for the `cosyear` feature.
When going from symmetric to asymmetric Shapley values, we see that many of the features' Shapley values have now shrunk closer to zero, especially `temp` and `atemp`.
We can also compare the obtained symmetric and asymmetric conditional Shapley values for the 6 explicands. We often see that the asymmetric version gives larger Shapley values to the distal/root causes, i.e., `trend` and `cosyear`, than the symmetric version. This is in line with Section 3.2 in Frye, Rowat, and Feige (2020).
```r
# Order the symmetric and asymmetric conditional explanations into a joint list
explanation_sym_con_tmp <- copy(explanation_sym_con)
names(explanation_sym_con_tmp) <- paste0(names(explanation_sym_con_tmp), "_sym")
explanation_asym_con_tmp <- copy(explanation_asym_con)
names(explanation_asym_con_tmp) <- paste0(names(explanation_asym_con_tmp), "_asym")
explanation_asym_sym_con <- c(explanation_sym_con_tmp, explanation_asym_con_tmp)[c(1, 4, 2, 5, 3, 6)]

plot_SV_several_approaches(explanation_asym_sym_con, index_x_explain,
  brewer_palette = "Paired", print_ggplot = FALSE
) +
  theme(legend.position = "bottom")
```

For marginal Shapley values, we can only consider the symmetric version, as we must set `causal_ordering = list(1:7)` (or `NULL`) and `confounding = TRUE`. Setting `asymmetric = TRUE` will have no effect, as the causal ordering consists of only a single component containing all features, i.e., all coalitions respect the causal ordering. As stated above, shapr generates the marginal Monte Carlo samples from the Gaussian marginals if `approach = "gaussian"`, while for all other Monte Carlo approaches the marginals are estimated from the training data, i.e., assuming feature independence. Thus, it does not matter if we set `approach = "independence"` or any other of the Monte Carlo-based approaches. We use `approach = "independence"` for clarity. Furthermore, we can also obtain marginal Shapley values by using the conditional Shapley value framework with the `independence` approach. However, note that there will be a minuscule difference in the produced Shapley values due to different sampling setups/orders.
```r
explanation_sym_marg <- list()

# Here we sample from the estimated Gaussian marginals
explanation_sym_marg[["gaussian"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "gaussian",
  asymmetric = FALSE,
  causal_ordering = list(1:7),
  confounding = TRUE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:30 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#>   and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Causal ordering: {trend, cosyear, sinyear, temp, atemp, windspeed, hum}
#> • Components with confounding: {trend, cosyear, sinyear, temp, atemp, windspeed, hum}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec65ff5855.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.

# Here we sample from the marginals of the training data
explanation_sym_marg[["independence_marg"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "independence",
  asymmetric = FALSE,
  causal_ordering = list(1:7),
  confounding = TRUE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:41 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#>   and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: independence
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Causal ordering: {trend, cosyear, sinyear, temp, atemp, windspeed, hum}
#> • Components with confounding: {trend, cosyear, sinyear, temp, atemp, windspeed, hum}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec21a68837.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.

# Here we use the conditional Shapley value framework with the `independence` approach
explanation_sym_marg[["independence_con"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "independence"
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:16:50 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#>   and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: independence
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec2b61c515.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.
```

We can then look at the evaluation scores, the elapsed times, and the beeswarm plots.
```r
print_MSEv_scores_and_time(explanation_sym_marg)
#>                      MSEv MSEv_sd Time (secs)
#> gaussian          1376367  109262       10.85
#> independence_marg 1377106  108898        9.11
#> independence_con  1376651  108870       10.56

plot_beeswarms(explanation_sym_marg, title = "Symmetric marginal Shapley values")
```

To compute (symmetric/asymmetric) causal Shapley values, we have to provide the `causal_ordering` and `confounding` objects. We set them to `causal_ordering = list(1, 2:3, 4:7)` and `confounding = c(FALSE, TRUE, FALSE)`, as explained above.
The causal framework takes longer than the other frameworks, as generating the Monte Carlo samples often consists of a chain of sampling steps. For example, for \(\mathcal{S} = \{2\}\), we must generate \(X_1, X_3, X_4, X_5, X_6, X_7 \mid X_2\). However, we cannot do this directly due to the `causal_ordering` and `confounding` specified above. Instead, we have to follow a chain of sampling steps. More precisely, we first generate \(X_1\) from its marginal distribution, then \(X_3 \mid X_1\), and finally \(X_4, X_5, X_6, X_7 \mid X_1, X_2, X_3\). The latter two steps use the provided `approach` to model the conditional distributions. The `S_causal_steps_strings` object in the explanation object's `internal` list contains the sampling steps needed for the different feature combinations/coalitions \(\mathcal{S}\).
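To illustrate what one such chained sampling step involves under the `gaussian` approach, here is a minimal, self-contained sketch (not `shapr`'s internal code) using the standard conditional-Gaussian formulas on a hypothetical three-feature example: \(X_1\) is drawn from its marginal, then \(X_3 \mid X_1\) from the conditional distribution.

```r
# Conditional distribution of a Gaussian: X_A | X_B = x_B has
# mean  mu_A + Sigma_AB Sigma_BB^{-1} (x_B - mu_B)  and
# var   Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA
cond_gaussian <- function(mu, Sigma, A, B, x_B) {
  Sigma_AB <- Sigma[A, B, drop = FALSE]
  Sigma_BB_inv <- solve(Sigma[B, B, drop = FALSE])
  mu_cond <- mu[A] + Sigma_AB %*% Sigma_BB_inv %*% (x_B - mu[B])
  Sigma_cond <- Sigma[A, A, drop = FALSE] - Sigma_AB %*% Sigma_BB_inv %*% t(Sigma_AB)
  list(mean = drop(mu_cond), var = Sigma_cond)
}

# Toy three-feature Gaussian model (hypothetical parameters)
mu <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3)

set.seed(1)
x1 <- rnorm(1, mu[1], sqrt(Sigma[1, 1]))       # step 1: X1 from its marginal
step2 <- cond_gaussian(mu, Sigma, A = 3, B = 1, x_B = x1)
x3 <- rnorm(1, step2$mean, sqrt(drop(step2$var)))  # step 2: X3 | X1
```

The full chain for \(\mathcal{S} = \{2\}\) would add a third step sampling the remaining features conditional on \(X_1, X_2, X_3\) with the same formula applied to a larger index set \(A\).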
For causal Shapley values, only the Monte Carlo-based approaches areapplicable.
```r
explanation_sym_cau <- list()

explanation_sym_cau[["gaussian"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "gaussian",
  asymmetric = FALSE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE),
  iterative = FALSE, # Set to FALSE to get a single iteration to illustrate sampling steps below
  exact = TRUE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:17:02 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#>   and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eecf48cd48.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 128 of 128 coalitions.

# Look at the sampling steps for the third coalition (S = {2})
explanation_sym_cau$gaussian$internal$iter_list[[1]]$S_causal_steps_strings$id_coalition_3
#> [1] "1|" "3|1" "4,5,6,7|1,2,3"

# Use the copula approach
explanation_sym_cau[["copula"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "copula",
  asymmetric = FALSE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE)
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:17:29 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_features = 128`,
#>   and is therefore set to `2^n_features = 128`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: copula
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec79c017b.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.

print_MSEv_scores_and_time(explanation_sym_cau)
#>             MSEv MSEv_sd Time (secs)
#> gaussian 1112304   85462       27.09
#> copula   1131125   87179       22.55

plot_beeswarms(explanation_sym_cau, title = "Symmetric causal Shapley values")
```

We now turn to asymmetric causal Shapley values. That is, we only use the coalitions that respect the causal ordering. The computations are thus faster, as the number of coalitions is reduced.
```r
explanation_asym_cau <- list()

explanation_asym_cau[["gaussian"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "gaussian",
  asymmetric = TRUE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE)
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:17:52 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#>   coalitions respecting the causal ordering (20), and is therefore
#>   set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec7e5c010d.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 13 of 20 coalitions, 13 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 20 of 20 coalitions, 7 new.

# Use the copula approach
explanation_asym_cau[["copula"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "copula",
  asymmetric = TRUE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE)
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:17:57 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#>   coalitions respecting the causal ordering (20), and is therefore
#>   set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: copula
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec155c538f.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 13 of 20 coalitions, 13 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 20 of 20 coalitions, 7 new.

# Use the vaeac approach
explanation_asym_cau[["vaeac"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  phi0 = phi0,
  seed = 1,
  n_MC_samples = 1000,
  approach = "vaeac",
  vaeac.epochs = 20,
  asymmetric = TRUE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE)
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:18:01 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#>   coalitions respecting the causal ordering (20), and is therefore
#>   set to 20.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: vaeac
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec3800a568.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 20 of 20 coalitions.
```

We can look at the elapsed time; see the implementation details section for an explanation.
We can then create the beeswarm plots and plot the Shapley values for the selected explicands.
```r
# Plot the beeswarm plots
plot_beeswarms(explanation_asym_cau, title = "Asymmetric causal Shapley values")

# Plot the Shapley values
plot_SV_several_approaches(explanation_asym_cau, index_x_explain, print_ggplot = FALSE) +
  theme(legend.position = "bottom")
```

We can use the other Monte Carlo-based approaches (`independence` and `empirical`), too.
Here we plot the obtained Shapley values for the six explicands when using the `gaussian` approach in the different Shapley value explanation frameworks, and we see that the different frameworks provide different explanations. The largest difference is between the symmetric and asymmetric versions. To summarize: asymmetric conditional/causal Shapley values focus on the root cause, marginal Shapley values on the more direct effect, and symmetric conditional/causal Shapley values consider both, yielding a more natural explanation.
```r
explanation_gaussian <- list(
  symmetric_marginal = explanation_sym_marg$gaussian,
  symmetric_conditional = explanation_sym_con$gaussian,
  symmetric_causal = explanation_sym_cau$gaussian,
  asymmetric_conditional = explanation_asym_con$gaussian,
  asymmetric_causal = explanation_asym_cau$gaussian
)

plot_SV_several_approaches(explanation_gaussian, index_x_explain, print_ggplot = FALSE) +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2)) +
  ggtitle("Shapley value prediction explanation (approach = 'gaussian')") +
  guides(color = guide_legend(title = "Framework"))
```

In this section, we produce scatter plots comparing the symmetric marginal and symmetric causal Shapley values for the temperature feature `temp` and the seasonal feature `cosyear` for all explicands. The plots show that the marginal Shapley values explain the predictions almost purely based on temperature, while the causal Shapley values also give credit to season. We can change the features and frameworks in the code below, but we chose these values to replicate Figure 3 in Heskes et al. (2020).
```r
# The color of the points
color <- "temp"

# The features we want to compare
feature_1 <- "cosyear"
feature_2 <- "temp"

# The Shapley value frameworks we want to compare
sv_framework_1 <- explanation_sym_marg[["gaussian"]]
sv_framework_1_str <- "Marginal SV"
sv_framework_2 <- explanation_sym_cau[["gaussian"]]
sv_framework_2_str <- "Causal SV"

# Set up the data.frame we are going to plot
sv_correlation_df <- data.frame(
  color = x_explain[, color],
  sv_framework_1_feature_1 = sv_framework_1$shapley_values_est[[feature_1]],
  sv_framework_2_feature_1 = sv_framework_2$shapley_values_est[[feature_1]],
  sv_framework_1_feature_2 = sv_framework_1$shapley_values_est[[feature_2]],
  sv_framework_2_feature_2 = sv_framework_2$shapley_values_est[[feature_2]]
)

# Make the plots
scatterplot_topleft <- ggplot(
  sv_correlation_df,
  aes(x = sv_framework_1_feature_2, y = sv_framework_1_feature_1, color = color)
) +
  geom_point(size = 1) +
  xlab(paste(sv_framework_1_str, feature_2)) +
  ylab(paste(sv_framework_1_str, feature_1)) +
  scale_x_continuous(limits = c(-1500, 1000), breaks = c(-1000, 0, 1000)) +
  scale_y_continuous(limits = c(-500, 500), breaks = c(-500, 0, 500)) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    axis.text.x = element_blank(),
    axis.text.y = element_text(size = 12),
    axis.ticks.x = element_blank(),
    axis.title.x = element_blank()
  )

scatterplot_topright <- ggplot(
  sv_correlation_df,
  aes(x = sv_framework_2_feature_1, y = sv_framework_1_feature_1, color = color)
) +
  geom_point(size = 1) +
  scale_color_gradient(low = "blue", high = "red") +
  xlab(paste(sv_framework_2_str, feature_1)) +
  ylab(paste(sv_framework_1_str, feature_1)) +
  scale_x_continuous(limits = c(-1500, 1000), breaks = c(-1000, 0, 1000)) +
  scale_y_continuous(limits = c(-500, 500), breaks = c(-500, 0, 500)) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  )

scatterplot_bottomleft <- ggplot(
  sv_correlation_df,
  aes(x = sv_framework_1_feature_2, y = sv_framework_2_feature_2, color = color)
) +
  geom_point(size = 1) +
  scale_color_gradient(low = "blue", high = "red") +
  xlab(paste(sv_framework_1_str, feature_2)) +
  ylab(paste(sv_framework_2_str, feature_2)) +
  scale_x_continuous(limits = c(-1500, 1000), breaks = c(-1000, 0, 1000)) +
  scale_y_continuous(limits = c(-1000, 1000), breaks = c(-500, 0, 500)) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12)
  )

scatterplot_bottomright <- ggplot(
  sv_correlation_df,
  aes(x = sv_framework_2_feature_1, y = sv_framework_2_feature_2, color = color)
) +
  geom_point(size = 1) +
  xlab(paste(sv_framework_2_str, feature_1)) +
  ylab(paste(sv_framework_2_str, feature_2)) +
  scale_x_continuous(limits = c(-1500, 1000), breaks = c(-1000, 0, 1000)) +
  scale_y_continuous(limits = c(-1000, 1000), breaks = c(-500, 0, 500)) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  )

# Plot of the trend of the data
bike_plot_new <- ggplot(bike, aes(x = trend, y = cnt, color = get(color))) +
  geom_point(size = 0.75) +
  scale_color_gradient(low = "blue", high = "red") +
  labs(color = color) +
  xlab("Days since 1 January 2011") +
  ylab("Number of bikes rented") +
  theme_minimal() +
  theme(legend.position = "right", legend.title = element_text(size = 10))

# Combine the plots
ggpubr::ggarrange(
  bike_plot_new,
  ggpubr::ggarrange(
    scatterplot_topleft,
    scatterplot_topright,
    scatterplot_bottomleft,
    scatterplot_bottomright,
    legend = "none"
  ),
  nrow = 2,
  heights = c(1, 2)
)
```

We investigate the difference between symmetric/asymmetric conditional, symmetric/asymmetric causal, and marginal Shapley values for two days: October 9 and December 3, 2012. They have more or less the same temperature (13 and 13.27 degrees Celsius) and predicted bike counts of 6117 and 6241, respectively.
The figure below is an extension of Figure 4 in Heskes et al. (2020), as they only included asymmetric conditional, symmetric causal, and marginal Shapley values.
We plot the various Shapley values for the `cosyear` and `temp` features below. We obtain the same results as Heskes et al. (2020): the marginal Shapley value explanation framework provides similar explanations for both days, i.e., it only considers the direct effect of `temp`. The asymmetric conditional and causal Shapley values are almost indistinguishable and put the most weight on the 'root' cause `cosyear`. Heskes et al. (2020) state that the symmetric causal Shapley values provide a sensible balance between the two extremes, giving credit to both season and temperature while still providing different explanations for the two days.
However, as we also include symmetric conditional Shapley values, we see that they are extremely similar to the symmetric causal Shapley values. That is, the conditional Shapley value explanation framework also provides a sensible balance between marginal and asymmetric Shapley values. To summarize: as concluded by Heskes et al. (2020) in their Figure 4, the asymmetric conditional/causal Shapley values focus on the root cause, marginal Shapley values on the more direct effect, and symmetric conditional/causal Shapley values consider both, yielding a more natural explanation.
```r
# Features of interest
features <- c("cosyear", "temp")

# Get explicands with similar temperature: 2012-10-09 (October) and 2012-12-03 (December)
dates <- c("2012-10-09", "2012-12-03")
dates_idx <- sapply(dates, function(date) {
  which(as.integer(row.names(x_explain)) == which(bike$dteday == date))
})
# predict(model, x_explain)[dates_idx] + mean(y_train_nc) # predicted values for the two points

# List of the Shapley value explanations
explanations <- list(
  "Sym. Mar." = explanation_sym_marg[["gaussian"]],
  "Sym. Con." = explanation_sym_con[["gaussian"]],
  "Sym. Cau." = explanation_sym_cau[["gaussian"]],
  "Asym. Con." = explanation_asym_con[["gaussian"]],
  "Asym. Cau." = explanation_asym_cau[["gaussian"]]
)

# Extract the relevant Shapley values
explanations_extracted <- data.table::rbindlist(lapply(seq_along(explanations), function(idx) {
  explanations[[idx]]$shapley_values_est[
    dates_idx, ..features
  ][, `:=`(Date = dates, type = names(explanations)[idx])]
}))

# Set type to be an ordered factor
explanations_extracted[, type := factor(type, levels = names(explanations), ordered = TRUE)]

# Convert from wide to long data table
dt_all <- data.table::melt(explanations_extracted,
  id.vars = c("Date", "type"),
  variable.name = "feature"
)

# Make the plot
ggplot(dt_all, aes(
  x = feature, y = value, group = interaction(Date, feature),
  fill = Date, label = round(value, 2)
)) +
  geom_col(position = "dodge") +
  theme_classic() +
  ylab("Shapley value") +
  facet_wrap(vars(type)) +
  theme(axis.title.x = element_blank()) +
  scale_fill_manual(values = c("indianred4", "ivory4")) +
  theme(
    legend.position.inside = c(0.75, 0.25),
    axis.title = element_text(size = 20),
    legend.title = element_text(size = 16),
    legend.text = element_text(size = 14),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    strip.text.x = element_text(size = 14)
  )
```

We can also make a similar plot using the `plot_SV_several_approaches` function in `shapr`, but then we get a facet for each explicand instead of a facet for each framework.
```r
# Here 2012-10-09 is the left facet and 2012-12-03 the right facet
plot_SV_several_approaches(explanations,
  index_explicands = dates_idx,
  only_these_features = features, # Can include more features.
  facet_scales = "free_x",
  horizontal_bars = FALSE,
  axis_labels_n_dodge = 1,
  print_ggplot = FALSE
) +
  theme(legend.position = "bottom")
```

Furthermore, instead of only considering the `cosyear` and `temp` features as Heskes et al. (2020) did, we can plot all features to get a more complete overview.
```r
# Here 2012-10-09 is the left facet and 2012-12-03 the right facet
plot_SV_several_approaches(explanations,
  index_explicands = dates_idx,
  facet_scales = "free_x",
  horizontal_bars = FALSE,
  axis_labels_rotate_angle = 45,
  digits = 2,
  print_ggplot = FALSE
) +
  theme(legend.position = "bottom")
```

We can use `max_n_coalitions` to specify/reduce the number of coalitions used when computing the Shapley value explanations. This applies to marginal, conditional, and causal Shapley values, in both the symmetric and asymmetric versions. However, recall that the asymmetric versions already have fewer valid coalitions due to the causal ordering.
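The coalition counts reported by `shapr` (20 valid asymmetric coalitions at the feature level, 12 at the group level, and 128 in the symmetric case) follow from a simple combinatorial argument, sketched below as a hypothetical helper (not a `shapr` function). For a chain of causal components, a coalition respects the ordering exactly when it consists of the first components in full plus an arbitrary non-empty subset of one further component (or is empty).

```r
# Count the coalitions that respect a chain of causal components:
# a coalition is valid if, whenever it contains a feature from component j,
# it contains all features from components 1, ..., j-1 (its causal ancestors).
# Every valid coalition is either empty or "full prefix + non-empty subset of
# one component", so the count is 1 + sum over components of (2^size - 1).
n_valid_coalitions <- function(component_sizes) {
  1 + sum(2^component_sizes - 1)
}

n_valid_coalitions(c(1, 2, 4)) # feature-level ordering {X1}, {X2, X3}, {X4, ..., X7}: 20
n_valid_coalitions(c(1, 2, 3)) # group-level ordering used later in this vignette: 12
n_valid_coalitions(7)          # a single component: all 2^7 = 128 coalitions are valid
```

The last call also explains why `asymmetric = TRUE` has no effect when the causal ordering consists of a single component containing all features: every coalition respects such an ordering.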
In the example below, we demonstrate the sampling of coalitions for the asymmetric and symmetric causal Shapley value explanation frameworks. We halve the number of coalitions for both versions and see that the elapsed times are approximately halved, too.
```r
explanation_n_coal <- list()

explanation_n_coal[["sym_cau_gaussian_64"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = FALSE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE),
  max_n_coalitions = 64 # Instead of 128
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:03 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec33d6f5a7.rds'
#>
#> ── Iterative computation started ──
#>
#> ── Iteration 1 ──────────────────────────────────────────────────────
#> ℹ Using 14 of 128 coalitions, 14 new.
#>
#> ── Iteration 2 ──────────────────────────────────────────────────────
#> ℹ Using 26 of 128 coalitions, 12 new.
#>
#> ── Iteration 3 ──────────────────────────────────────────────────────
#> ℹ Using 46 of 128 coalitions, 20 new.
#>
#> ── Iteration 4 ──────────────────────────────────────────────────────
#> ℹ Using 50 of 128 coalitions, 4 new.

explanation_n_coal[["asym_cau_gaussian_10"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = TRUE,
  causal_ordering = list(1, 2:3, 4:7),
  confounding = c(FALSE, TRUE, FALSE),
  verbose = c("basic", "convergence", "shapley"),
  max_n_coalitions = 10, # Instead of 20
  iterative = FALSE # Due to small number of coalitions
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:16 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of feature-wise Shapley values: 7
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 20
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp, atemp, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec6962fb3c.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 10 of 10 coalitions.
#>
#> ── Convergence info
#> ✔ Iterative Shapley value estimation stopped at 10 coalitions after 1 iterations, due to:
#>   Maximum number of iterations (1) reached!
#>   Maximum number of coalitions (10) reached!
#>
#> Final estimated Shapley values (sd)
#>      explain_id   none             trend           cosyear          sinyear
#>           <int> <char>            <char>            <char>           <char>
#>   1:          1  0 (0) -2181.91 (389.03)  -825.54 (377.21) -240.38 (257.68)
#>   2:          2  0 (0) -2174.36 (384.98)  -846.61 (381.99) -187.05 (273.99)
#>   3:          3  0 (0) -2088.96 (373.84)  -793.63 (365.63) -187.85 (248.84)
#>   4:          4  0 (0) -2103.36 (379.95)  -798.13 (377.31) -109.12 (270.58)
#>   5:          5  0 (0) -2003.88 (362.92)  -723.94 (346.66) -234.82 (228.12)
#>  ---
#> 140:        140  0 (0)  1575.95 (631.53) -1014.08 (569.93)  248.47 (371.16)
#> 141:        141  0 (0)  1588.69 (650.49) -1057.22 (573.59)   33.39 (242.37)
#> 142:        142  0 (0)  1466.75 (635.17) -1109.15 (558.07)  -97.32 (243.07)
#> 143:        143  0 (0)  1003.94 (665.91) -1780.47 (623.77) -104.82 (347.18)
#> 144:        144  0 (0)   711.14 (787.04) -2635.90 (784.97) -184.27 (536.88)
#>                 temp           atemp        windspeed              hum
#>               <char>          <char>           <char>           <char>
#>   1:  -34.34 ( 63.89)   0.21 ( 55.92)  116.77 ( 70.18)   13.81 ( 88.83)
#>   2:  -41.25 ( 54.60)  34.98 ( 23.07)   13.85 ( 36.81)  181.12 ( 77.88)
#>   3:  -94.02 ( 50.84) -17.90 ( 43.43)  244.64 ( 61.81) -132.54 ( 59.19)
#>   4:  191.34 (105.19)  44.35 ( 61.89) -183.83 ( 66.85) -228.80 (108.61)
#>   5:   42.28 ( 43.88)   4.16 ( 31.93)  203.34 ( 47.27)  -32.18 ( 55.58)
#>  ---
#> 140:  -41.16 (210.29)  16.01 (195.60)  361.82 (171.47)  589.55 (262.98)
#> 141:   11.21 ( 34.71)   6.74 ( 23.45)  216.48 ( 30.02)  -80.28 ( 48.53)
#> 142:  -35.61 ( 69.65) 129.07 ( 27.12)   79.35 ( 39.85)  265.54 ( 93.28)
#> 143:   20.56 ( 64.24)  -4.87 ( 28.60)   47.65 ( 34.25) -233.05 ( 82.77)
#> 144:   47.65 (137.29) -66.93 ( 74.15) -469.80 ( 83.42)  558.27 (204.47)

# Look at the times
explanation_n_coal[["sym_cau_gaussian_all_128"]] <- explanation_sym_cau$gaussian
explanation_n_coal[["asym_cau_gaussian_all_20"]] <- explanation_asym_cau$gaussian
explanation_n_coal <- explanation_n_coal[c(1, 3, 2, 4)]
print_time(explanation_n_coal)
#>      sym_cau_gaussian_64 sym_cau_gaussian_all_128 asym_cau_gaussian_10
#> [1,]              12.681                   27.086               1.8498
#>      asym_cau_gaussian_all_20
#> [1,]                   4.5187
```

We can then plot the beeswarm plots and the Shapley values for the six selected explicands. We see that there are only minuscule differences between the Shapley values obtained using all the coalitions and those obtained using half of the valid coalitions.
```r
plot_SV_several_approaches(explanation_n_coal, index_x_explain, print_ggplot = FALSE) +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2))
```

In this section, we demonstrate that we can compute marginal, asymmetric conditional, and symmetric/asymmetric causal Shapley values for groups of features, too. For group Shapley values, we need to specify the causal ordering on the group level instead of the feature level. We demonstrate with the `gaussian` approach, but the other approaches are applicable, too.
In the pairs plot above (and below), we see that it can be natural to group the features `temp` and `atemp` due to their (conceptual) similarity and high correlation.
We set up the groups and update the causal ordering to be on thegroup level.
```r
group_list <- list(
  trend = "trend",
  cosyear = "cosyear",
  sinyear = "sinyear",
  temp_group = c("temp", "atemp"),
  windspeed = "windspeed",
  hum = "hum"
)

causal_ordering_group <- list(
  "trend",
  c("cosyear", "sinyear"),
  c("temp_group", "windspeed", "hum")
)
confounding <- c(FALSE, TRUE, FALSE)
```

We can then compute the (group) Shapley values using the different Shapley value frameworks.
```r
explanation_group_gaussian <- list()

explanation_group_gaussian[["symmetric_marginal"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = FALSE,
  causal_ordering = list(seq(length(group_list))), # or `NULL`
  confounding = TRUE,
  n_MC_samples = 1000,
  group = group_list,
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:20 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_groups = 64`, and
#>   is therefore set to `2^n_groups = 64`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of group-wise Shapley values: 6
#> • Feature groups: trend: {"trend"}; cosyear: {"cosyear"}; sinyear:
#>   {"sinyear"}; temp_group: {"temp", "atemp"}; windspeed:
#>   {"windspeed"}; hum: {"hum"}
#> • Number of observations to explain: 144
#> • Causal ordering: {trend, cosyear, sinyear, temp_group, windspeed, hum}
#> • Components with confounding: {trend, cosyear, sinyear, temp_group, windspeed, hum}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec2df6a56c.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 64 of 64 coalitions.

explanation_group_gaussian[["symmetric_conditional"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = FALSE,
  causal_ordering = list(seq(length(group_list))), # or `NULL`
  confounding = NULL,
  n_MC_samples = 1000,
  group = group_list,
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:27 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_groups = 64`, and
#>   is therefore set to `2^n_groups = 64`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of group-wise Shapley values: 6
#> • Feature groups: trend: {"trend"}; cosyear: {"cosyear"}; sinyear:
#>   {"sinyear"}; temp_group: {"temp", "atemp"}; windspeed:
#>   {"windspeed"}; hum: {"hum"}
#> • Number of observations to explain: 144
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec1db99b0b.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 64 of 64 coalitions.

explanation_group_gaussian[["asymmetric_conditional"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = TRUE,
  causal_ordering = causal_ordering_group,
  confounding = NULL,
  n_MC_samples = 1000,
  group = group_list,
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:31 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#>   coalitions respecting the causal ordering (12), and is therefore
#>   set to 12.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of group-wise Shapley values: 6
#> • Feature groups: trend: {"trend"}; cosyear: {"cosyear"}; sinyear:
#>   {"sinyear"}; temp_group: {"temp", "atemp"}; windspeed:
#>   {"windspeed"}; hum: {"hum"}
#> • Number of observations to explain: 144
#> • Number of asymmetric coalitions: 12
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp_group, windspeed, hum}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eecf2a80de.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 12 of 12 coalitions.

explanation_group_gaussian[["symmetric_causal"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = FALSE,
  causal_ordering = causal_ordering_group,
  confounding = confounding,
  n_MC_samples = 1000,
  group = group_list,
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:32 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than `2^n_groups = 64`, and
#>   is therefore set to `2^n_groups = 64`.
#>
#> ── Explanation overview ──
#>
#> • Model class: <xgb.Booster>
#> • v(S) estimation class: Monte Carlo integration
#> • Approach: gaussian
#> • Procedure: Non-iterative
#> • Number of Monte Carlo integration samples: 1000
#> • Number of group-wise Shapley values: 6
#> • Feature groups: trend: {"trend"}; cosyear: {"cosyear"}; sinyear:
#>   {"sinyear"}; temp_group: {"temp", "atemp"}; windspeed:
#>   {"windspeed"}; hum: {"hum"}
#> • Number of observations to explain: 144
#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp_group, windspeed, hum}
#> • Components with confounding: {cosyear, sinyear}
#> • Computations (temporary) saved at: '/tmp/RtmptIm6LB/shapr_obj_60eec37b13e60.rds'
#>
#> ── Main computation started ──
#>
#> ℹ Using 64 of 64 coalitions.

explanation_group_gaussian[["asymmetric_causal"]] <- explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  seed = 1,
  asymmetric = TRUE,
  causal_ordering = causal_ordering_group,
  confounding = confounding,
  n_MC_samples = 1000,
  group = group_list,
  iterative = FALSE
)
#> ── Starting `shapr::explain()` at 2025-11-11 15:24:44 ───────────────
#> ℹ Feature classes extracted from the model contain `NA`.
#>   Assuming feature classes from the data are correct.
#> ℹ `max_n_coalitions` is `NULL` or larger than the number of
#>   coalitions respecting the causal ordering (12), and is therefore
#>   set to 12.
```
── Explanation overview ──#>#> • Model class: <xgb.Booster>#> • v(S) estimation class: Monte Carlo integration#> • Approach: gaussian#> • Procedure: Non-iterative#> • Number of Monte Carlo integration samples: 1000#> • Number of group-wise Shapley values: 6#> • Feature groups: trend: {"trend"}; cosyear: {"cosyear"}; sinyear:#> {"sinyear"}; temp_group: {"temp", "atemp"}; windspeed:#> {"windspeed"}; hum: {"hum"}#> • Number of observations to explain: 144#> • Number of asymmetric coalitions: 12#> • Causal ordering: {trend}, {cosyear, sinyear}, {temp_group,#> windspeed, hum}#> • Components with confounding: {cosyear, sinyear}#> • Computations (temporary) saved at:#> '/tmp/RtmptIm6LB/shapr_obj_60eec4086d395.rds'#>#> ── Main computation started ──#>#> ℹ Using 12 of 12 coalitions.# Look at the elapsed times (symmetric takes the longest time)print_time(explanation_group_gaussian)#> symmetric_marginal symmetric_conditional asymmetric_conditional#> [1,] 6.9303 3.7603 1.1603#> symmetric_causal asymmetric_causal#> [1,] 11.884 2.1233We can then make the beeswarm plots and Shapley value plots for thesix selected explicands. For the beeswarm plots, we setinclude_group_feature_means = TRUE to make the plots. Thismeans that the plot function uses the mean of thetemp andatemp features as the feature value. This only makes sensedue to the high correlation between the two features.
The main difference between the feature-wise and group-wise Shapley values is that we now see a much wider spread in the Shapley values for `temp_group` than we did for `temp` and `atemp`. For example, for the symmetric causal framework, we saw above that `temp` and `atemp` obtained Shapley values between (around) \(-500\) and \(500\), while the grouped version `temp_group` obtains Shapley values between \(-1000\) and \(1000\).
```r
plot_beeswarms(explanation_group_gaussian,
               title = "Group Shapley values (gaussian)",
               include_group_feature_means = TRUE)

plot_SV_several_approaches(explanation_group_gaussian, index_x_explain,
                           print_ggplot = FALSE) +
  ggtitle("Shapley value prediction explanation (gaussian)") +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend(nrow = 2))
```

The `shapr` package is built to estimate conditional Shapley values and therefore parallelizes over the coalitions. This makes perfect sense for that framework, as each batch of coalitions is independent of the other batches, which makes parallelization straightforward. Furthermore, by using many batches, we drastically reduce the memory usage, as `shapr` does not need to store the Monte Carlo samples for all coalitions at once.
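The batching idea can be sketched as follows. This is illustrative code only, not `shapr`'s actual internals: the coalition indices, batch sizes, and placeholder \(v(S)\) values are made up.

```r
# Illustrative sketch (not shapr internals): coalition batches are
# independent, so each batch can be processed (in parallel) on its own,
# and its Monte Carlo samples can be discarded once v(S) is estimated.
coalition_ids <- seq_len(64)                     # stand-ins for 64 coalitions
batches <- split(coalition_ids, rep(seq_len(8), each = 8))

v_S <- unlist(lapply(batches, function(batch) {
  # Here one would generate the Monte Carlo samples for this batch,
  # estimate v(S) for each coalition in it, and then drop the samples.
  vapply(batch, function(S) 0, numeric(1))       # placeholder v(S) values
}), use.names = FALSE)

length(v_S)  # one contribution function value per coalition
```

Replacing `lapply()` with, e.g., `future.apply::future_lapply()` is one common way to run such independent batches in parallel.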
This setup is not optimal for the causal Shapley value framework, as the chains of sampling steps for two coalitions \(\mathcal{S}\) and \(\mathcal{S}^*\) can contain many of the same steps. Ideally, each unique sampling step should be modeled only once to save computation time, but some sampling steps occur in many of the chains. We would then have to store the Monte Carlo samples for every coalition in which such a sampling step is included, which can lead to memory consumption problems. Thus, in the current implementation, we treat each coalition \(\mathcal{S}\) independently and remodel the needed sampling steps for each coalition.
Furthermore, in the conditional Shapley value framework, we have that \(\bar{\mathcal{S}} = \mathcal{M} \backslash \mathcal{S}\); thus, `shapr` will by default generate Monte Carlo samples for all features not in \(\mathcal{S}\). For the causal Shapley value framework, this is not the case, i.e., \(\bar{\mathcal{S}} \neq \mathcal{M} \backslash \mathcal{S}\) in general. To reuse the code, we generate Monte Carlo samples for all features not in \(\mathcal{S}\), but keep only the samples for the features in \(\bar{\mathcal{S}}\). To speed up `shapr` further, one could rewrite all the approaches to support cases where \(\bar{\mathcal{S}}\) is not the complement of \(\mathcal{S}\).
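A tiny illustration of this difference, using hypothetical index sets rather than `shapr` code: with seven features, the conditional framework keeps the full complement of \(\mathcal{S}\), whereas the causal framework may need only a strict subset of it.

```r
M <- seq_len(7)   # all feature indices (illustrative)
S <- c(1, 2, 3)   # the coalition we condition on

# Conditional framework: Sbar is exactly the complement of S
Sbar_conditional <- setdiff(M, S)   # 4 5 6 7

# Causal framework: Sbar can be a strict subset of the complement
Sbar_causal <- c(4, 5)              # hypothetical example

# Current implementation: generate Monte Carlo samples for the full
# complement, then keep only the columns corresponding to Sbar_causal
generated <- setdiff(M, S)
kept <- intersect(generated, Sbar_causal)
```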
In the code below, we list the unique sets of features to condition on when generating the Monte Carlo samples for all coalitions, together with the number of times each set is needed in the symmetric causal Shapley value framework for the setup above. We see that most of the conditional distributions will be remodeled eight times. For the `gaussian` approach, which estimates the conditional distributions very quickly, this does not have a major impact on the run time. However, for, e.g., the much slower `ctree` approach, this will take a significant amount of extra time. The `vaeac` approach trains only on these relevant coalitions.
```r
S_causal_steps <- explanation_sym_cau$gaussian$internal$iter_list[[1]]$S_causal_steps
S_causal_unlist <- do.call(c, unlist(S_causal_steps, recursive = FALSE))
S_causal_steps_freq <- S_causal_unlist[grepl("\\.S(?!bar)", names(S_causal_unlist), perl = TRUE)]
S_causal_steps_freq <- S_causal_steps_freq[!sapply(S_causal_steps_freq, is.null)] # Remove NULLs
S_causal_steps_freq <- S_causal_steps_freq[sapply(S_causal_steps_freq, length) > 0] # Remove extra integer(0)
table(sapply(S_causal_steps_freq, paste0, collapse = ","))
#>
#>           1       1,2,3     1,2,3,4   1,2,3,4,5 1,2,3,4,5,6 1,2,3,4,5,7
#>          95           7           8           8           8           8
#>   1,2,3,4,6 1,2,3,4,6,7   1,2,3,4,7     1,2,3,5   1,2,3,5,6 1,2,3,5,6,7
#>           8           8           8           8           8           8
#>   1,2,3,5,7     1,2,3,6   1,2,3,6,7     1,2,3,7
#>           8           8           8           8
```

The `independence`, `empirical`, `ctree`, and `categorical` approaches produce weighted Monte Carlo samples. That is, they do not necessarily generate `n_MC_samples` samples. To ensure `n_MC_samples` samples, we sample `n_MC_samples` of them using weighted sampling with replacement, where the weights are those returned by the approaches.
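This weighted resampling step can be sketched with base R's `sample()`. The code below is illustrative only: the weighted samples and their weights are made up, not produced by any `shapr` approach.

```r
set.seed(1)
n_MC_samples <- 1000

# Hypothetical weighted Monte Carlo samples (e.g., as a ctree-type
# approach might return fewer than n_MC_samples unique rows):
mc_samples <- data.frame(x = rnorm(25))
weights <- runif(25)   # made-up sample weights

# Weighted sampling with replacement guarantees exactly n_MC_samples rows,
# with selection probabilities proportional to the weights
idx <- sample(nrow(mc_samples), size = n_MC_samples,
              replace = TRUE, prob = weights)
mc_resampled <- mc_samples[idx, , drop = FALSE]

nrow(mc_resampled)  # exactly n_MC_samples rows
```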
The marginal Shapley value explanation framework can be extended to support modeling the marginal distributions using the `copula` and `vaeac` approaches, as both of these methods support unconditional sampling.