elbersb/shapley

Compute Shapley-Shorrocks value decompositions

License: MIT (see LICENSE.md)

The Shapley value is a concept from game theory that quantifies how much each player contributes to the game outcome (Shapley 1953). The concept, however, has many more use cases: it provides a method to quantify the importance of predictors in regression analysis or machine learning models, and it can be used in a wide variety of decomposition problems (Shorrocks 2013). Most implementations focus on one narrow use case, although the algorithm for the Shapley value decomposition is always the same; only the concrete value function varies. This package provides a simple algorithm for the Shapley value decomposition, and also supports hierarchical decomposition using the Owen value.

The key advantage of the Shapley decomposition framework is the connection with counterfactuals: once appropriate counterfactuals for each combination of factors have been identified, the method will produce an appropriate decomposition.
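For reference, here is the standard textbook definition (Shapley 1953). It is not a description of the package's internals. It shows why only the value function changes between applications. Take a set N of n factors and a value function v that assigns an outcome to every subset S of N. The contribution of factor i is then a weighted average of its marginal contributions, and the contributions of all factors add up to the difference between the outcome with all factors and the outcome with none:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr], \qquad \sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset)$$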

Installation

devtools::install_github("elbersb/shapley")

Usage

The package provides a shapley function that takes two main arguments: the value function and a vector of factor names. The value function needs to be an R function that takes one or more arguments, where the first argument defines the factors that are included in the calculation of the outcome value. The shapley function will call the value function repeatedly, each time with a different set of factors. As the examples below show, further named arguments to shapley (such as data) are passed on to the value function.
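To see which sets of factors a value function may receive, here is a small base-R sketch. all_subsets() is a hypothetical helper, not part of the package. It lists all 2^n subsets of the factors, from the empty set to the full set. An exact Shapley decomposition needs the value function evaluated at every one of them:

```r
# Hypothetical helper (not part of the shapley package): list all 2^n
# subsets of `factors`, encoding each subset as the bits of an integer mask.
all_subsets <- function(factors) {
  n <- length(factors)
  lapply(0:(2^n - 1), function(mask) {
    # bit j of `mask` decides whether factor j is included
    factors[(mask %/% 2^(seq_len(n) - 1)) %% 2 == 1]
  })
}

subsets <- all_subsets(c("A", "B", "C"))
length(subsets)  # 8 sets, from character(0) to c("A", "B", "C")
```

The number of sets doubles with every additional factor, which is why avoiding repeated evaluations matters.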

For a very simple example, consider an outcome that is determined by two factors “A” and “B”, which contribute 1 and 2, respectively. (The factors are linearly additive, which makes the use of Shapley value decomposition unnecessary, but it works as an illustration.) The value function is thus defined as:

simple <- function(factors = c()) {
  value <- 0
  if ("A" %in% factors) value <- value + 1
  if ("B" %in% factors) value <- value + 2
  return(value)
}

We now supply the value function to shapley, along with the factor names:

shapley(simple, c("A", "B"), silent = TRUE)
#>   factor value
#> 1      A     1
#> 2      B     2

As expected, the marginal contributions of the two factors are 1 and 2, respectively. For the two factors, we can manually compute the contribution as follows:

# A:
1/2 * (simple("A") - simple()) + 1/2 * (simple(c("A", "B")) - simple("B"))
#> [1] 1
# B:
1/2 * (simple("B") - simple()) + 1/2 * (simple(c("B", "A")) - simple("A"))
#> [1] 2

Across the two computations, most terms occur twice. Also note that simple(c("A", "B")) == simple(c("B", "A")). The shapley function calculates each term only once and then caches the result, which leads to substantial speed improvements as the number of factors grows.
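The caching idea can be illustrated with a short base-R sketch. This is not the package's implementation: shapley_sketch() and all_perms() are hypothetical names. The sketch averages each factor's marginal contribution over all n! orderings. Every value-function result is stored under a key built from the sorted factor names, so c("A", "B") and c("B", "A") share one cache entry:

```r
# All orderings of x (n! of them); fine for a handful of factors.
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in all_perms(x[-i])) out[[length(out) + 1]] <- c(x[i], rest)
  }
  out
}

# Hypothetical sketch: Shapley values by averaging marginal contributions
# over all orderings, caching value-function results per unordered set.
shapley_sketch <- function(vfun, factors, ...) {
  cache <- new.env()
  n_calls <- 0
  v <- function(s) {
    # order-independent key; the "set:" prefix keeps the empty-set key non-empty
    key <- paste0("set:", paste(sort(as.character(s)), collapse = "|"))
    if (!exists(key, envir = cache, inherits = FALSE)) {
      n_calls <<- n_calls + 1
      assign(key, vfun(s, ...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
  perms <- all_perms(factors)
  total <- setNames(numeric(length(factors)), as.character(factors))
  for (p in perms) {
    for (i in seq_along(p)) {
      before <- p[seq_len(i - 1)]
      id <- as.character(p[i])
      total[id] <- total[id] + v(c(before, p[i])) - v(before)
    }
  }
  res <- data.frame(factor = factors, value = unname(total) / length(perms))
  attr(res, "n_calls") <- n_calls  # number of distinct sets actually evaluated
  res
}

# Usage: an additive toy value function with three factors
additive <- function(factors) sum(c(A = 1, B = 2, C = 3)[factors])
res <- shapley_sketch(additive, c("A", "B", "C"))
res$value             # 1 2 3
attr(res, "n_calls")  # 8 distinct sets instead of 36 lookups
```

With three factors, the 6 orderings require 36 value lookups. Only the 8 distinct subsets are actually computed.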

Example 1: Game theory

For this example (taken from Wikipedia), consider three players. Players 1 and 2 supply right-hand gloves, while Player 3 supplies a left-hand glove. The game is only successful if players with both types of gloves enter into a coalition. We thus define the value function as 1 if the coalitions {1,3}, {2,3}, or {1,2,3} are formed, and 0 otherwise. In R code:

glove <- function(factors) {
  if (length(factors) > 1 && 3 %in% factors) return(1)
  return(0)
}

To compute the marginal contributions of each player, use:

shapley(glove, c(1, 2, 3), silent = TRUE)
#>   factor     value
#> 1      1 0.1666667
#> 2      2 0.1666667
#> 3      3 0.6666667
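These values can be checked by hand with the permutation form of the Shapley value. A player's value is the share of the 3! = 6 orderings in which that player turns a losing coalition into a winning one. Player 3 is pivotal whenever it does not come first (4 orderings). Player 1 is pivotal only in the ordering (3, 1, 2), and Player 2 only in (3, 2, 1):

$$\phi_3 = \tfrac{4}{6} = \tfrac{2}{3}, \qquad \phi_1 = \phi_2 = \tfrac{1}{6}, \qquad \phi_1 + \phi_2 + \phi_3 = 1 = v(\{1,2,3\})$$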

Example 2: Relative importance of predictors

Consider this simple regression model and its R²:

model <- lm(mpg ~ wt + qsec + am, data = mtcars)
summary(model)$r.squared
#> [1] 0.8496636

The Shapley value decomposition allows us to determine how much each predictor contributes to the R². To do this, we need to define the value function so that it runs the regression with the appropriate subset of predictors. It should return 0 when there are no predictors:

reg_mtcars <- function(factors) {
  if (length(factors) == 0) return(0)
  formula <- paste("mpg ~", paste(factors, collapse = "+"))
  m <- lm(formula, data = mtcars)
  summary(m)$r.squared
}
# test - should be the same as above:
reg_mtcars(c("wt", "qsec", "am"))
#> [1] 0.8496636
shapley(reg_mtcars, c("wt", "qsec", "am"), silent = TRUE)
#>   factor     value
#> 1     wt 0.4792448
#> 2   qsec 0.1574791
#> 3     am 0.2129397
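As a cross-check that does not depend on the package, the same kind of decomposition can be computed with the subset-weight form of the Shapley formula instead of permutations. shapley_subsets() and r2_mtcars() are hypothetical names for this sketch. r2_mtcars() mirrors reg_mtcars() above but builds the formula with stats::reformulate():

```r
# Hypothetical sketch: exact Shapley values from the subset-weight formula
#   phi_j = sum over S without j of |S|! (n - |S| - 1)! / n! * (v(S + j) - v(S))
shapley_subsets <- function(vfun, factors, ...) {
  n <- length(factors)
  masks <- 0:(2^n - 1)
  # member[r, j] is TRUE if factor j belongs to the subset encoded by mask r - 1
  member <- sapply(seq_len(n), function(j) (masks %/% 2^(j - 1)) %% 2 == 1)
  member <- matrix(member, ncol = n)  # keep a matrix even when n == 1
  # evaluate the value function exactly once per subset
  values <- vapply(seq_along(masks),
                   function(r) vfun(factors[member[r, ]], ...), numeric(1))
  size <- rowSums(member)
  phi <- numeric(n)
  for (j in seq_len(n)) {
    without_j <- which(!member[, j])   # rows of subsets S that exclude j
    with_j <- without_j + 2^(j - 1)    # rows of the matching sets S plus j
    s <- size[without_j]
    w <- factorial(s) * factorial(n - s - 1) / factorial(n)
    phi[j] <- sum(w * (values[with_j] - values[without_j]))
  }
  data.frame(factor = factors, value = phi)
}

# Hypothetical value function, equivalent to reg_mtcars() above
r2_mtcars <- function(factors) {
  if (length(factors) == 0) return(0)
  summary(lm(reformulate(factors, response = "mpg"), data = mtcars))$r.squared
}

res <- shapley_subsets(r2_mtcars, c("wt", "qsec", "am"))
res
sum(res$value)  # equals the R-squared of the full model
```

Because the value function returns 0 for the empty set, the three contributions add up to the full-model R².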

We can also generalize the value function to apply to any dataset and dependent variable:

reg <- function(factors, dv, data) {
  if (length(factors) == 0) return(0)
  formula <- paste(dv, "~", paste(factors, collapse = "+"))
  m <- lm(formula, data = data)
  summary(m)$r.squared
}
shapley(reg, c("cyl", "hp", "am"), silent = TRUE, dv = "wt", data = mtcars)
#>   factor     value
#> 1    cyl 0.2791418
#> 2     hp 0.1960524
#> 3     am 0.2740727

Note that there are many packages (e.g., relaimpo) that provide this functionality specifically for regression analysis.

Example 3: Effects of taxes and transfers on inequality

Another classic use case for the Shapley value is the decomposition of inequality indices (see Shorrocks 2013 among others). Enami et al. (2018) provide a simple example to show such a decomposition in the context of measuring the impact of taxes and transfers on income inequality.

Consider the following dataset income, showing the market incomes, taxes paid, transfers received, and the resulting final incomes for five individuals:

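| MarketIncome | Tax | Transfer | FinalIncome |
|---:|---:|---:|---:|
| 1 | -5 | 9 | 5 |
| 20 | -5 | 7 | 22 |
| 30 | -5 | 5 | 30 |
| 40 | -5 | 3 | 38 |
| 50 | -5 | 1 | 46 |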
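The code in this section assumes that income is a data frame with exactly these column names, but the README does not show how it is created. One way to build it from the table is sketched below, with the values copied from the table:

```r
# Build the five-person example from the table (column names as used
# by the code in this section)
income <- data.frame(
  MarketIncome = c(1, 20, 30, 40, 50),
  Tax          = c(-5, -5, -5, -5, -5),
  Transfer     = c(9, 7, 5, 3, 1)
)
income$FinalIncome <- income$MarketIncome + income$Tax + income$Transfer
income$FinalIncome  # 5 22 30 38 46, matching the table
```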

The Gini indices of the market and final incomes are:

gini_market <- ineq::Gini(income[["MarketIncome"]])
gini_final <- ineq::Gini(income[["FinalIncome"]])
round(c(gini_market, gini_final, gini_final - gini_market), 3)
#> [1]  0.335  0.278 -0.057
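For readers without the ineq package, the index can be computed directly. The function below is a hypothetical stand-in. It implements the common closed form G = 2 Σ i·x(i) / (n Σ x) − (n + 1)/n for incomes sorted in ascending order, which we assume matches the default (uncorrected) ineq::Gini. On this dataset it reproduces the three rounded values above:

```r
# Hypothetical stand-in for ineq::Gini(x) (uncorrected Gini index)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

market <- c(1, 20, 30, 40, 50)
final <- c(5, 22, 30, 38, 46)
round(c(gini(market), gini(final), gini(final) - gini(market)), 3)
# 0.335 0.278 -0.057
```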

Taxes and transfers combined thus reduced inequality by about 0.057. There are two different approaches to dividing this difference between the two factors (i.e., taxes and transfers). In what Sastre and Trannoy (2002) call the “zero income decomposition” (ZID), sources not under consideration are set to zero. In the alternative scenario, the “equalized income decomposition” (EID), those sources are distributed evenly among the population. Both scenarios are easily implemented using different value functions:

zid <- function(factors, data) {
  cntf <- data[["MarketIncome"]] # baseline for counterfactual income
  for (f in factors) cntf <- cntf + data[[f]]
  ineq::Gini(cntf)
}

eid <- function(factors, data) {
  cntf <- data[["MarketIncome"]]
  if ("Tax" %in% factors)
    cntf <- cntf + data[["Tax"]]
  else
    cntf <- cntf + mean(data[["Tax"]])
  if ("Transfer" %in% factors)
    cntf <- cntf + data[["Transfer"]]
  else
    cntf <- cntf + mean(data[["Transfer"]])
  ineq::Gini(cntf)
}
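Written out, the two value functions above evaluate the following. Here m denotes market income, x_f the income source f ∈ {Tax, Transfer}, \bar{x}_f its population mean (added to every individual's income), S the set of included sources, and G the Gini index:

$$v_{\text{ZID}}(S) = G\Bigl(m + \sum_{f \in S} x_f\Bigr), \qquad v_{\text{EID}}(S) = G\Bigl(m + \sum_{f \in S} x_f + \sum_{f \notin S} \bar{x}_f\Bigr)$$

In this dataset the two means are -5 and 5, so v_EID(∅) = G(m).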

These equalities hold in both scenarios:

zid(c(), income) == gini_market
#> [1] TRUE
zid(c("Tax", "Transfer"), income) == gini_final
#> [1] TRUE
eid(c(), income) == gini_market
#> [1] TRUE
eid(c("Tax", "Transfer"), income) == gini_final
#> [1] TRUE

Note that for EID, the first equality only holds because the sum of taxes and transfers is zero, i.e., those two sources cancel each other out. Once this is no longer the case, the EID method runs into problems (see Enami et al. 2018 for a detailed discussion). In any case, ZID and EID give different answers when only one factor is included:

zid("Tax", income)
#> [1] 0.4068966
eid("Tax", income)
#> [1] 0.3347518

This is because in the zero income scenario, transfers are set to zero when only taxes are considered, while in the equalized income scenario, transfers are distributed equally among the individuals. The Shapley values of the two scenarios are the following:

shapley(zid, c("Tax", "Transfer"), silent = TRUE, data = income)
#>     factor       value
#> 1      Tax  0.05700719
#> 2 Transfer -0.11374478
shapley(eid, c("Tax", "Transfer"), silent = TRUE, data = income)
#>     factor       value
#> 1      Tax  0.00000000
#> 2 Transfer -0.05673759
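With only two factors, the Shapley value reduces to the average over the two orderings, as in the manual calculation for simple() earlier. The base-R sketch below reproduces both decompositions without any packages. In it, gini(), zid2() and eid2() are hypothetical stand-ins for ineq::Gini(), zid() and eid(), and the data are typed in from the table:

```r
# Hypothetical stand-in for ineq::Gini (uncorrected Gini index)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

market   <- c(1, 20, 30, 40, 50)
tax      <- c(-5, -5, -5, -5, -5)
transfer <- c(9, 7, 5, 3, 1)

# Counterfactual Gini indices for a set `s` of included sources
zid2 <- function(s) gini(market + ("Tax" %in% s) * tax + ("Transfer" %in% s) * transfer)
eid2 <- function(s) {
  gini(market +
    (if ("Tax" %in% s) tax else mean(tax)) +
    (if ("Transfer" %in% s) transfer else mean(transfer)))
}

# Two-factor Shapley value: average the marginal contribution over both orderings
shapley2 <- function(v, a, b) {
  c(1/2 * (v(a) - v(c())) + 1/2 * (v(c(a, b)) - v(b)),
    1/2 * (v(b) - v(c())) + 1/2 * (v(c(a, b)) - v(a)))
}

round(shapley2(zid2, "Tax", "Transfer"), 8)  # 0.05700719 -0.11374478
round(shapley2(eid2, "Tax", "Transfer"), 8)  # 0.00000000 -0.05673759
```

In the EID case the tax share is exactly zero. Replacing the tax by its mean leaves every income unchanged here, because all individuals pay the same tax.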

Whether ZID or EID is appropriate depends on the context. Sastre and Trannoy (2002) and Enami et al. (2018) address this question in further detail.

Example 4: Hierarchical Shapley decomposition (Owen values)

Continuing from the previous example (and again borrowing from Enami et al.), consider the case that the tax shown above is actually composed of two different taxes, Tax1 and Tax2 (these two columns sum to the column Tax in the previous example):

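| MarketIncome | Tax1 | Tax2 | Transfer | FinalIncome |
|---:|---:|---:|---:|---:|
| 1 | 0 | -5 | 9 | 5 |
| 20 | -1 | -4 | 7 | 22 |
| 30 | -2 | -3 | 5 | 30 |
| 40 | -3 | -2 | 3 | 38 |
| 50 | -4 | -1 | 1 | 46 |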

When we now decompose this dataset (income2) by three factors, we get the following results:

# we can reuse the `zid` function from above,
# while the `eid` function would need to be adapted
owen(zid, c("Tax1", "Tax2", "Transfer"), silent = TRUE, data = income2)
#>     factor        value
#> 1     Tax1 -0.006012472
#> 2     Tax2  0.062502480
#> 3 Transfer -0.113227597

Note that the sum of the contributions of the two taxes does not equal the contribution of the tax above, even though that tax is just the sum of the two separate taxes. Furthermore, the size of the transfer component is affected. As Enami et al. (2018, p. 108) write:

Given that no new tax has been added and that the only change is that some additional information about the sources of taxes has been included in the analysis, it is inconvenient that the Shapley value for transfers has also changed.

This is an unfortunate property of the Shapley decomposition, but it can be partially remedied by using a hierarchical procedure, the Owen value decomposition (Owen 1977). (An alternative is the Nested Shapley decomposition recommended by Sastre and Trannoy (2002), which, however, introduces a new set of problems.) The shapley package allows the computation of Owen values by specifying the group structure using a list of vectors:

owen(zid, list(c("Tax1", "Tax2"), c("Transfer")), silent = TRUE, data = income2)
#>   group   factor       value
#> 1     1     Tax1 -0.00575388
#> 2     1     Tax2  0.06276107
#> 3     2 Transfer -0.11374478

Using this notation, we have grouped Tax1 and Tax2 together in one group, while Transfer is a group in itself. The results now line up with the results of the Shapley decomposition above, where the taxes were jointly entered as a single factor. Tax1 and Tax2 sum to 0.05700719, the ZID value for Tax, and the Transfer value is again -0.11374478.
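The Owen value can be computed like the Shapley value, except that it averages only over orderings in which the members of each group appear next to each other (Owen 1977). The sketch below is a hypothetical base-R illustration of that idea, not the owen() implementation. owen_sketch(), all_perms(), gini() and zid2() are illustrative names, and income2 is typed in from the table:

```r
# All orderings of x (n! of them)
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in all_perms(x[-i])) out[[length(out) + 1]] <- c(x[i], rest)
  }
  out
}

# Hypothetical stand-in for ineq::Gini (uncorrected Gini index)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

income2 <- data.frame(
  MarketIncome = c(1, 20, 30, 40, 50),
  Tax1         = c(0, -1, -2, -3, -4),
  Tax2         = c(-5, -4, -3, -2, -1),
  Transfer     = c(9, 7, 5, 3, 1)
)

# ZID value function as in the README, with the Gini stand-in
zid2 <- function(factors, data) {
  cntf <- data[["MarketIncome"]]
  for (f in factors) cntf <- cntf + data[[f]]
  gini(cntf)
}

# Hypothetical sketch: average marginal contributions over all orderings in
# which each group's members are contiguous (order groups, then members)
owen_sketch <- function(vfun, groups, ...) {
  factors <- unlist(groups)
  total <- setNames(numeric(length(factors)), factors)
  n_orders <- 0
  for (g_order in all_perms(seq_along(groups))) {
    member_orders <- lapply(groups[g_order], all_perms)
    combos <- expand.grid(lapply(member_orders, seq_along))
    for (r in seq_len(nrow(combos))) {
      p <- unlist(lapply(seq_along(member_orders),
                         function(k) member_orders[[k]][[combos[r, k]]]))
      n_orders <- n_orders + 1
      for (i in seq_along(p)) {
        before <- p[seq_len(i - 1)]
        total[p[i]] <- total[p[i]] + vfun(c(before, p[i]), ...) - vfun(before, ...)
      }
    }
  }
  total / n_orders
}

ow <- owen_sketch(zid2, list(c("Tax1", "Tax2"), "Transfer"), data = income2)
ow
ow[["Tax1"]] + ow[["Tax2"]]  # equals the ZID Shapley value of the combined Tax
```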

Note that the hierarchical procedure can also be used as an effective tool to increase the speed of computation when a large number of factors is included, at the cost of computing Owen values rather than Shapley values. For instance, when 8 factors are considered, 8! = 40320 permutations need to be calculated for each factor. Once the 8 factors are grouped into two groups with 4 factors each, the number of permutations that need to be calculated for each factor is only 2! * 4! * 4! = 1152.
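The counts in this example follow a general pattern. With n factors split into m groups of sizes n_1 to n_m, only orderings that keep each group together are averaged. This gives m! orderings of the groups times the orderings within each group:

$$\#\text{Shapley orderings} = n!, \qquad \#\text{Owen orderings} = m!\,\prod_{k=1}^{m} n_k!$$

For n = 8, m = 2 and n_1 = n_2 = 4, this gives 40320 versus 2 · 24 · 24 = 1152.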

References

Enami, A., N. Lustig, and R. Aranda. 2018. Analytic Foundations: Measuring the Redistributive Impact of Taxes and Transfers. In: N. Lustig (Ed.), Commitment to Equity Handbook. Estimating the Impact of Fiscal Policy on Inequality and Poverty, Washington, D.C.: Brookings, 56-115.

Owen, G. 1977. Values of Games with a Priori Unions. In: R. Henn and O. Moeschlin (Eds.), Mathematical Economics and Game Theory, Berlin and Heidelberg: Springer, 76-88.

Sastre, M. and A. Trannoy. 2002. Shapley Inequality Decomposition by Factor Components: Some Methodological Issues. Journal of Economics 77(1): 51-89. https://doi.org/10.1007/BF03052500

Shapley, L. S. 1953. A value for n-person games. In: A. W. Tucker and H. W. Kuhn (Eds.), Contributions to the theory of games (Vol. II), Princeton: Princeton University Press, 307-317.

Shorrocks, A. F. 2013. Decomposition procedures for distributional analysis: a unified framework based on the Shapley value. Journal of Economic Inequality 11: 1-28. https://doi.org/10.1007/s10888-011-9214-z
