Methodology


In the following we present the methodology in surveysd by applying the workflow described in vignette("surveysd") to multiple consecutive years of EU-SILC data for one country. The methodology consists of the following steps, in this order:

1. Draw $B$ bootstrap replicates from EU-SILC data for each year $y_t$, $t=1,\ldots,n_y$, separately. Since EU-SILC has a rotating panel design, the bootstrap replicate of a household is carried forward through the years. That is, the bootstrap replicate of a household in the follow-up years is set equal to the bootstrap replicate of the same household when it first enters EU-SILC.
2. Multiply each set of bootstrap replicates by the sampling weights to obtain uncalibrated bootstrap weights, and calibrate each of the uncalibrated bootstrap weights using iterative proportional fitting.
3. Estimate the point estimate of interest $\theta$ for each year and each calibrated bootstrap weight to obtain $\theta^{(i,y_t)}$, $t=1,\ldots,n_y$, $i=1,\ldots,B$.
4. For fixed $y_t$ apply a filter with equal weights for each $i$ on $\theta^{(i,y^*)}$, $y^*\in \{y_{t-1},y_{t},y_{t+1}\}$, to obtain $\tilde{\theta}^{(i,y_t)}$.
5. Estimate the variance of $\theta$ using the distribution of $\tilde{\theta}^{(i,y_t)}$.

Bootstrapping

Bootstrapping has long been used widely to estimate confidence intervals and standard errors of point estimates (Efron 1979). Given a random sample $(X_1,\ldots,X_n)$ drawn from an unknown distribution $F$, the distribution of a point estimate $\theta(X_1,\ldots,X_n;F)$ can in many cases not be determined analytically. With bootstrapping, however, one can simulate the distribution of $\theta$.

Let $s_{(.)}$ be a bootstrap sample, e.g. obtained by drawing $n$ observations with replacement from the sample $(X_1,\ldots,X_n)$. Then one can estimate the standard deviation of $\theta$ using $B$ bootstrap samples through

$$sd(\theta) = \sqrt{\frac{1}{B-1}\sum\limits_{i=1}^B (\theta(s_i)-\overline{\theta})^2} \quad,$$

with $\overline{\theta}:=\frac{1}{B}\sum\limits_{i=1}^B\theta(s_i)$ as the sample mean over all bootstrap samples.
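The estimator above can be illustrated with a minimal Python sketch. It is not part of surveysd; the function name `bootstrap_sd`, its parameters and the toy data are assumptions made for illustration only, and the resampling is the naive with-replacement scheme described in this section.

```python
import random
import statistics


def bootstrap_sd(sample, estimator, n_boot=200, seed=1):
    """Estimate sd(theta) from n_boot with-replacement resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # theta(s_i) for each bootstrap sample s_i of size n
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(n_boot)]
    # statistics.stdev divides by B - 1, as in the formula for sd(theta)
    return statistics.stdev(estimates)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]
print(bootstrap_sd(data, statistics.mean))
```

Any function of the resampled values can be passed as `estimator`, for example `statistics.median`.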

In the context of sample surveys with sampling weights one can use bootstrapping to calculate so-called bootstrap weights. These are computed via the bootstrap samples $s_i$, $i=1,\ldots,B$, where for each $s_i$ every unit of the original sample can appear $0$ to $m$ times. With $f_j^{i}$ as the frequency of occurrence of observation $j$ in bootstrap sample $s_i$, the uncalibrated bootstrap weights $\tilde{b}_{j}^{i}$ are defined as:

$$\tilde{b}_{j}^{i} = f_j^{i} w_j \quad,$$

with $w_j$ as the calibrated sampling weight of the original sample. Using iterative proportional fitting procedures one can recalibrate the bootstrap weights $\tilde{b}_{j}^{i}$, $i=1,\ldots,B$, to get the adapted or calibrated bootstrap weights $b_j^i$, $i=1,\ldots,B$.
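As a rough illustration of the definition $\tilde{b}_{j}^{i} = f_j^{i} w_j$, the following Python sketch counts how often each unit occurs in a naive bootstrap sample and multiplies that frequency by the sampling weight. The function name and inputs are hypothetical, and the recalibration step is deliberately left out.

```python
import random
from collections import Counter


def uncalibrated_bootstrap_weights(weights, n_boot, seed=1):
    """Return n_boot lists of uncalibrated bootstrap weights f_j^i * w_j."""
    rng = random.Random(seed)
    n = len(weights)
    replicates = []
    for _ in range(n_boot):
        # f_j^i: how often unit j occurs in bootstrap sample s_i
        freq = Counter(rng.choices(range(n), k=n))
        replicates.append([freq[j] * weights[j] for j in range(n)])
    return replicates


reps = uncalibrated_bootstrap_weights([10.0, 20.0, 30.0], n_boot=3)
print(reps)
```

Units that are not drawn in a given bootstrap sample receive the weight 0 in that replicate.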

Rescaled Bootstrap

Since EU-SILC is a stratified sample drawn without replacement from a finite population, the naive bootstrap procedure described above does not take into account the heterogeneous inclusion probabilities of the sample units. Thus it will not yield satisfactory results. Therefore we use the so-called rescaled bootstrap procedure introduced and investigated by Rao and Wu (1988). The bootstrap samples are selected without replacement and incorporate the stratification as well as clustering on multiple stages (see Chipperfield and Preston 2007; Preston 2009).

For simplicity we only describe the rescaled bootstrap procedure for a two-stage stratified sampling design. For more details on a general formulation please see Preston (2009).

Sampling design

Consider the finite population $U$, which is divided into $H$ non-overlapping strata with $\bigcup\limits_{h=1,\ldots,H} U_h = U$, where each stratum $h$ consists of $N_h$ clusters. In each stratum $h$, $n_h$ clusters $C_{hc}$, $c=1,\ldots,n_h$, are drawn, each containing $N_{hc}$ households. Furthermore, in each cluster $C_{hc}$ of each stratum $h$ simple random sampling is performed to select a set of households $Y_{hcj}$, $j=1,\ldots,n_{hc}$.

Bootstrap procedure

In contrast to the naive bootstrap procedure, where for a stage containing $n$ sampling units the bootstrap replicate is obtained by drawing $n$ sampling units with replacement, for the rescaled bootstrap procedure $n^*=\left\lfloor\frac{n}{2}\right\rfloor$ sampling units are drawn without replacement. Given a value $x$, $\lfloor x\rfloor$ denotes the largest integer less than or equal to $x$, whereas $\lceil x\rceil$ denotes the smallest integer greater than or equal to $x$. Chipperfield and Preston (2007) have shown that the choice of either $\left\lfloor\frac{n}{2}\right\rfloor$ or $\left\lceil\frac{n}{2}\right\rceil$ is optimal for bootstrap samples without replacement, although $\left\lfloor\frac{n}{2}\right\rfloor$ has the desirable property that the resulting uncalibrated bootstrap weights will never be negative.

At the first stage the $i$-th bootstrap replicate, $f^{i,1}_{hc}$, for each cluster $C_{hc}$, $c=1,\ldots,n_h$, belonging to stratum $h$, is defined by

$$f^{i,1}_{hc} = 1-\lambda_h+\lambda_h\frac{n_h}{n_h^*}\delta_{hc} \quad\quad \forall c \in \{1,\ldots,n_h\}$$

with

$$n_h^* = \left\lfloor\frac{n_h}{2}\right\rfloor$$

$$\lambda_h = \sqrt{\frac{n_h^*(1-\frac{n_h}{N_h})}{n_h-n_h^*}} \quad ,$$

where $\delta_{hc}=1$ if cluster $c$ is selected in the sub-sample of size $n_h^*$ and 0 otherwise.
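The first-stage replicate can be sketched in a few lines of Python. This is only an illustration of the formula for $f^{i,1}_{hc}$ within a single stratum, not the implementation used in surveysd; the function name `first_stage_replicate` and its arguments are hypothetical, and strata with fewer than two sampled clusters are rejected because $n_h^*$ would be 0.

```python
import math
import random


def first_stage_replicate(n_h, N_h, rng):
    """Return f_hc^{i,1} for the n_h sampled clusters of one stratum."""
    if n_h < 2:
        raise ValueError("need at least two sampled clusters (single PSU)")
    n_star = n_h // 2  # floor(n_h / 2)
    lam = math.sqrt(n_star * (1 - n_h / N_h) / (n_h - n_star))
    # delta_hc = 1 for the n_star clusters drawn without replacement
    selected = set(rng.sample(range(n_h), n_star))
    return [
        1 - lam + lam * (n_h / n_star) * (1 if c in selected else 0)
        for c in range(n_h)
    ]


factors = first_stage_replicate(n_h=10, N_h=100, rng=random.Random(1))
print(factors)
```

By construction the factors average to 1 within the stratum, since exactly $n_h^*$ of the $n_h$ clusters have $\delta_{hc}=1$.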

The $i$-th bootstrap replicate at the second stage, $f^{i,2}_{hcj}$, for each household $Y_{hcj}$, $j=1,\ldots,n_{hc}$, belonging to cluster $c$ in stratum $h$, is defined by

$$f^{i,2}_{hcj} = f^{i,1}_{hc} - \lambda_{hc}\sqrt{\frac{n_h}{n_h^*}}\delta_{hc}\left[\frac{n_{hc}}{n_{hc}^*}\delta_{hcj}-1\right] \quad\quad \forall c \in \{1,\ldots,n_h\}$$

with

$$n_{hc}^* = \left\lfloor\frac{n_{hc}}{2}\right\rfloor$$

$$\lambda_{hc} = \sqrt{\frac{n_{hc}^*N_h(1-\frac{n_{hc}}{N_{hc}})}{n_{hc}-n_{hc}^*}} \quad ,$$

where $\delta_{hcj}=1$ if household $j$ is selected in the sub-sample of size $n_{hc}^*$ and 0 otherwise.

Single PSUs

When dealing with multistage sampling designs the issue of single PSUs, i.e. only a single response unit being present at a stage or in a stratum, can occur. When applying bootstrapping procedures these single PSUs can lead to a variety of issues. For the methodology proposed in this work we combined single PSUs at each stage with the next smallest stratum or cluster before applying the bootstrap procedure.

Taking bootstrap replicates forward

The bootstrap procedure above is applied to the EU-SILC data for each year $y_t$, $t=1,\ldots,n_y$, separately. Since EU-SILC is a yearly survey with a rotating panel design, the $i$-th bootstrap replicate at the second stage, $f^{i,2}_{hcj}$, for a household $Y_{hcj}$ is taken forward until the household $Y_{hcj}$ drops out of the sample. That is, for a household $Y_{hcj}$ which enters EU-SILC in year $y_1$ and drops out in year $y_{\tilde{t}}$, the bootstrap replicates for the years $y_2,\ldots,y_{\tilde{t}}$ are set to the bootstrap replicate of the year $y_1$.

Split households

Due to the rotating panel design so-called split households can occur. For a household participating in the EU-SILC survey it is possible that one or more residents move to a new, so-called split household, which is followed up on in the next wave. To take this dynamic into account we extended the procedure of taking forward the bootstrap replicate of a household for consecutive waves of EU-SILC by also taking forward the bootstrap replicate to the split household. This means that any new individuals in the split household also inherit this bootstrap replicate.

Taking bootstrap replicates forward as well as considering split households ensures that the bootstrap replicates are more comparable in structure with the actual design of EU-SILC.
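The carry-forward rule, including split households, can be sketched as follows. The data layout is an assumption made for illustration: `fresh_draws` maps each year to the replicates drawn independently for that year, and the hypothetical mapping `split_from` links a split household to the household it originated from.

```python
def carry_forward(fresh_draws, split_from=None):
    """Fix each household's replicate at the value from its first year."""
    split_from = split_from or {}
    carried = {}     # year -> {household id: replicate}
    first_seen = {}  # household id -> replicate fixed at entry
    for year in sorted(fresh_draws):
        carried[year] = {}
        for hid, draw in fresh_draws[year].items():
            if hid not in first_seen:
                parent = split_from.get(hid)
                # a split household inherits the replicate of its parent
                first_seen[hid] = first_seen.get(parent, draw)
            carried[year][hid] = first_seen[hid]
    return carried


fresh = {
    2010: {"A": 0.1, "B": 1.9},
    2011: {"A": 1.9, "B": 0.1, "C": 1.9, "A2": 1.9},
}
result = carry_forward(fresh, split_from={"A2": "A"})
print(result[2011])
```

In this toy example households A and B keep their 2010 replicates in 2011, the new household C keeps its own 2011 draw, and the split household A2 inherits the replicate of A.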

Uncalibrated bootstrap weights

Using the $i$-th bootstrap replicates at the second stage one can calculate the $i$-th uncalibrated bootstrap weights $\tilde{b}_{hcj}^{i}$ for each household $Y_{hcj}$ in cluster $c$ contained in stratum $h$ by

$$\tilde{b}_{hcj}^{i} = f^{i,2}_{hcj} w_{hcj} \quad,$$

where $w_{hcj}$ corresponds to the original household weight contained in the sample.

For ease of readability we will drop the subindices regarding stratum $h$ and cluster $c$ in the following sections, meaning that the $j$-th household in cluster $c$ contained in stratum $h$, $Y_{hcj}$, will now be denoted as the $j$-th household, $Y_{j}$, where $j$ is the position of the household in the data. Accordingly, the $i$-th uncalibrated bootstrap weights for household $j$ are denoted as $\tilde{b}_j^{i}$ and the original household weight as $w_j$.

Iterative proportional fitting (IPF)

The uncalibrated bootstrap weights $\tilde{b}_j^{i}$ computed through the rescaled bootstrap procedure yield population statistics that differ from the known population margins of specified sociodemographic variables for which the base weights $w_j$ have been calibrated. To adjust for this, the bootstrap weights $\tilde{b}_{j}^{i}$ can be recalibrated using iterative proportional fitting as described in Meraner, Gumprecht, and Kowarik (2016).

Let the original weight $w_{j}$ be calibrated for $n=n_P+n_H$ sociodemographic variables, which are divided into the sets $\mathcal{P}:=\{p_{c}, c=1,\ldots,n_P\}$ and $\mathcal{H}:=\{h_{c}, c=1,\ldots,n_H\}$. $\mathcal{P}$ and $\mathcal{H}$ correspond to personal variables, for example gender or age, and household variables, like region or household size, respectively. Each variable in $\mathcal{P}$ or $\mathcal{H}$ can take on $P_{c}$ or $H_{c}$ values, respectively, with $N^{p_c}_v$, $v=1,\ldots,P_c$, or $N^{h_c}_v$, $v=1,\ldots,H_c$, as the corresponding population margins. Starting with $k=0$ the iterative proportional fitting procedure is applied on each $\tilde{b}_j^{i}$, $i=1,\ldots,B$, separately. The weights are first updated for personal and afterwards for household variables. If the constraints regarding the population margins are not met, $k$ is raised by 1 and the procedure starts from the beginning. In the following, denote the starting weight as $\tilde{b}_j^{[0]}:=\tilde{b}_j^{i}$ for fixed $i$.

Adjustment and trimming for $\mathcal{P}$

The uncalibrated bootstrap weight $\tilde{b}_j^{[(n+1)k+c-1]}$ for the $j$-th observation is iteratively multiplied by a factor so that the projected distribution of the population matches the respective calibration specification $N^{p_c}_v$, $c=1,\ldots,n_P$. For each $c \in \left\{1,\ldots,n_P\right\}$ the weights calibrated against $N^{p_c}_v$ are computed as

$$\tilde{b}_j^{[(n+1)k+c]} = \tilde{b}_j^{[(n+1)k+c-1]}\frac{N^{p_c}_v}{\sum\limits_l \tilde{b}_l^{[(n+1)k+c-1]}},$$

where the summation in the denominator extends over all observations which have the same value $v$ as observation $j$ for the sociodemographic variable $p_c$. If any weights $\tilde{b}_j^{[(n+1)k+c]}$ fall outside the range $\left[\frac{w_j}{4};4w_j\right]$ they are recoded to the nearest of the two boundaries. The choice of the boundaries results from expert-based opinions and restricts the variance of the weights, which has a positive effect on the sampling error. This procedure represents a common form of weight trimming where very large or small weights are trimmed in order to reduce variance in exchange for a possible increase in bias (Potter 1990; Potter 1993).
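A single adjustment and trimming step for one personal variable might look like the following Python sketch. It is an illustration under simplifying assumptions, not the surveysd implementation: the function name, the plain-list inputs and the `bound` argument (set to 4 to mirror the interval $\left[\frac{w_j}{4};4w_j\right]$) are all hypothetical.

```python
def adjust_and_trim(weights, categories, margins, base_weights, bound=4.0):
    """Scale weights to the margins of one variable, then trim to [w/4, 4w]."""
    # weighted total per category of the calibration variable
    totals = {}
    for w, v in zip(weights, categories):
        totals[v] = totals.get(v, 0.0) + w
    adjusted = []
    for w, v, w0 in zip(weights, categories, base_weights):
        new_w = w * margins[v] / totals[v]
        # recode to the nearest boundary if outside [w0 / bound, w0 * bound]
        adjusted.append(min(max(new_w, w0 / bound), w0 * bound))
    return adjusted


w = adjust_and_trim(
    weights=[1.0, 1.0, 2.0],
    categories=["m", "f", "f"],
    margins={"m": 2.0, "f": 6.0},
    base_weights=[1.0, 1.0, 2.0],
)
print(w)
```

Note that trimming can pull the weighted totals away from the margins again, which is one reason the procedure has to be iterated.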

Averaging weights within households

Since the sociodemographic variables $p_1,\ldots,p_{n_P}$ include person-specific variables, the weights $\tilde{b}_j^{[(n+1)k+n_P]}$ resulting from the iterative multiplication can be unequal for members of the same household. This can lead to inconsistencies between results projected with household and person weights. To avoid such inconsistencies each household member is assigned the mean of the weights within the household. That is, for each person $j$ in household $a$ with $h_a$ household members, the weights are defined by

$$\tilde{b}_j^{[(n+1)k+n_P+1]} = \frac{\sum\limits_{l\in a} \tilde{b}_l^{[(n+1)k+n_P]}}{h_a}$$

This averaging can distort the population structure established by the adjustment in the previous subsection.
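The averaging step is simple enough to state as a short Python sketch. The function name and the list-based inputs are assumptions for illustration; each person is identified only by position and by a household id.

```python
from collections import defaultdict


def average_within_households(weights, household_ids):
    """Assign every person the mean weight of their household."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for w, h in zip(weights, household_ids):
        sums[h] += w
        counts[h] += 1
    return [sums[h] / counts[h] for h in household_ids]


print(average_within_households([1.0, 3.0, 5.0], ["a", "a", "b"]))
```

The sum of weights within each household is unchanged by this step, but the totals per category of a personal variable generally are not.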

Adjustment and trimming for $\mathcal{H}$

After adjustment for individual variables the weights $\tilde{b}_j^{[(n+1)k+n_P+1]}$ are updated for the set of household variables $\mathcal{H}$ according to a household convergence constraint parameter $\epsilon_h$. The parameter $\epsilon_h$ represents the allowed deviation of the population margins projected with the current weights from $N^{h_c}_v$, $c=1,\ldots,n_H$, $v=1,\ldots,H_c$. For each $c \in \{1,\ldots,n_H\}$ the updated weights are computed as

$$\tilde{b}_j^{[(n+1)k+n_P+c+1]} = \begin{cases} \tilde{b}_j^{[(n+1)k+n_P+c]}\frac{N^{h_c}_v}{\sum\limits_{l} \tilde{b}_l^{[(n+1)k+n_P+c]}} & \text{if } \sum\limits_{l} \tilde{b}_l^{[(n+1)k+n_P+c]} \notin ((1-0.9\epsilon_h)N^{h_c}_v,(1+0.9\epsilon_h)N^{h_c}_v) \\ \tilde{b}_j^{[(n+1)k+n_P+c]} & \text{otherwise} \end{cases}$$

with the summation ranging over all households $l$ which take on the same value for $h_c$ as observation $j$. As described in the previous subsection, the new weights are recoded if they fall outside the interval $\left[\frac{w_j}{4};4w_j\right]$ and set to the lower or upper bound, depending on whether $\tilde{b}_j^{[(n+1)k+n_P+c+1]}$ falls below or above the interval, respectively.

Convergence

For each adjustment and trimming step the weighted total in the denominator of the factor $\frac{N^{(.)}_v}{\sum\limits_{l} \tilde{b}_l^{[(n+1)k+j]}}$, $j\in \{1,\ldots,n+1\}\backslash \{n_P+1\}$, is checked against the convergence constraints for household variables, $\epsilon_h$, or personal variables, $\epsilon_p$, where $(.)$ corresponds to either a household or a personal variable. To be more precise, for variables in $\mathcal{P}$ the constraints

$$\sum\limits_l \tilde{b}_l^{[(n+1)k+j]} \in ((1-\epsilon_p)N^{p_c}_v,(1+\epsilon_p)N^{p_c}_v)$$

and for variables in $\mathcal{H}$ the constraints

$$\sum\limits_l \tilde{b}_l^{[(n+1)k+j]} \in ((1-\epsilon_h)N^{h_c}_v,(1+\epsilon_h)N^{h_c}_v)$$

are verified, where the sum extends over all observations which have the same value $v$ for the variable $p_c$ or $h_c$. If these constraints hold true the algorithm has reached convergence, otherwise $k$ is raised by 1 and the procedure repeats itself.
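A convergence check for one calibration variable could be sketched as below. The sketch assumes that convergence is judged by whether each weighted category total lies within a relative tolerance `eps` of its population margin; this reading, the function name and the inputs are assumptions for illustration and may differ from the check implemented in surveysd.

```python
def has_converged(weights, categories, margins, eps):
    """True if every weighted total is within (1 - eps, 1 + eps) of its margin."""
    totals = {}
    for w, v in zip(weights, categories):
        totals[v] = totals.get(v, 0.0) + w
    return all(
        (1 - eps) * margins[v] < totals.get(v, 0.0) < (1 + eps) * margins[v]
        for v in margins
    )


ok = has_converged([2.0, 2.0, 4.0], ["m", "f", "f"], {"m": 2.0, "f": 6.0}, eps=0.01)
print(ok)
```

In a full calibration loop this check would be run for every variable in $\mathcal{P}$ and $\mathcal{H}$, with $k$ increased and all steps repeated as long as any check fails.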

The calibration procedure described above is applied to each year $y_t$ of EU-SILC separately, $t=1,\ldots,n_y$, thus resulting in so-called calibrated bootstrap sample weights $b_{j}^{(i,{y_t})}$, $i=1,\ldots,B$, for each year $y_t$ and each household $j$.

Variance estimation

Applying the previously described algorithms to EU-SILC data for multiple consecutive years $y_t$, $t=1,\ldots,n_y$, yields calibrated bootstrap sample weights $b_{j}^{(i,{y_t})}$ for each year $y_t$. Using the calibrated bootstrap sample weights it is straightforward to compute the standard error of a point estimate $\theta(\textbf{X}^{y_t},\textbf{w}^{y_t})$ for year $y_t$, with $\textbf{X}^{y_t}=(X_1^{y_t},\ldots,X_n^{y_t})$ as the vector of observations for the variable of interest in the survey and $\textbf{w}^{y_t}=(w_1^{y_t},\ldots,w_n^{y_t})$ as the corresponding weight vector, with

$$sd(\theta) = \sqrt{\frac{1}{B-1}\sum\limits_{i=1}^B (\theta^{(i,y_t)}-\overline{\theta^{(.,y_t)}})^2}$$

with

$$\overline{\theta^{(.,y_t)}} = \frac{1}{B}\sum\limits_{i=1}^B\theta^{(i,y_t)} \quad,$$

where $\theta^{(i,y_t)}:=\theta(\textbf{X}^{y_t},\textbf{b}^{(i,{y_t})})$ is the estimate of $\theta$ in the year $y_t$ using the $i$-th vector of calibrated bootstrap weights.

The standard error estimation for indicators in EU-SILC yields high-quality results at NUTS1 or country level. When estimating indicators on regional or other sub-aggregate levels, however, one is confronted with point estimates that have high variance.

To overcome this issue we propose to estimate $\theta$ for three consecutive years using the calibrated bootstrap weights, thus calculating $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$, $i=1,\ldots,B$. For fixed $i$ one can apply a filter with equal filter weights on the time series $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$ to create $\tilde{\theta}^{(i,y_t)}$:

$$\tilde{\theta}^{(i,y_t)} = \frac{1}{3}\left[\theta^{(i,y_{t-1})}+\theta^{(i,y_t)}+\theta^{(i,y_{t+1})}\right] \quad .$$

Doing this for all $i$, $i=1,\ldots,B$, yields $\tilde{\theta}^{(i,y_t)}$, $i=1,\ldots,B$. The standard error of $\theta$ can then be estimated with

$$sd(\theta) = \sqrt{\frac{1}{B-1}\sum\limits_{i=1}^B (\tilde{\theta}^{(i,y_t)}-\overline{\tilde{\theta}^{(.,y_t)}})^2}$$

with

$$\overline{\tilde{\theta}^{(.,y_t)}}=\frac{1}{B}\sum\limits_{i=1}^B\tilde{\theta}^{(i,y_t)} \quad.$$
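The filtering and the subsequent standard error estimate can be sketched in Python as follows. The layout of `estimates_by_year` (one list of $B$ replicate estimates per year, in chronological order) and the function name are assumptions for illustration; the year index must have a neighbouring year on both sides.

```python
import statistics


def filtered_sd(estimates_by_year, year_index):
    """Standard error of theta after a 3-year equal-weight filter per replicate."""
    if not 1 <= year_index <= len(estimates_by_year) - 2:
        raise ValueError("year_index needs a previous and a following year")
    prev, cur, nxt = estimates_by_year[year_index - 1:year_index + 2]
    # theta-tilde^(i, y_t): mean over the three years for fixed replicate i
    smoothed = [(a + b + c) / 3 for a, b, c in zip(prev, cur, nxt)]
    # stdev divides by B - 1, as in the formula for sd(theta)
    return statistics.stdev(smoothed)


estimates = [
    [1.0, 2.0, 3.0, 4.0],  # year t-1, replicates i = 1..4
    [4.0, 3.0, 2.0, 1.0],  # year t
    [1.0, 1.0, 1.0, 1.0],  # year t+1
]
print(filtered_sd(estimates, 1), statistics.stdev(estimates[1]))
```

The toy numbers are chosen so that the year-to-year noise cancels within each replicate; with real data the reduction is less extreme.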

Applying the filter over the time series of estimated $\theta^{(i,y_t)}$ leads to a reduction of variance for $\theta$, since the filter reduces the noise in $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$ and thus leads to a narrower distribution for $\tilde{\theta}^{(i,y_t)}$.

It should also be noted that estimating indicators from a survey with rotating panel design is in general not straightforward because of the high correlation between consecutive years. However, with our approach of using bootstrap weights, which are independent of each other, we can bypass the cumbersome calculation of various correlations and apply the weights directly to estimate the standard error. Bauer et al. (2013) showed that, using the proposed method on EU-SILC data for Austria, the reduction in the resulting standard errors corresponds to a theoretical increase in sample size of about 25%. Furthermore, this study compared this method to the use of small area estimation techniques, and on average the use of bootstrap sample weights yielded more stable results.

References

Bauer, Martin, Matthias Till, Richard Heuberger, Marcel Bilgili, Thomas Glaser, Elisabeth Kafka, Johannes Klotz, et al. 2013. “Studie Zu Armut Und Sozialer Eingliederung in Den Bundesländern.” Statistik Austria [in German].

Chipperfield, James, and John Preston. 2007. “Efficient Bootstrap for Business Surveys.” Survey Methodology 33 (December): 167–72. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200700210494.

Efron, B. 1979. “Bootstrap Methods: Another Look at the Jackknife.” Ann. Statist. 7 (1): 1–26. https://doi.org/10.1214/aos/1176344552.

Meraner, Angelika, Daniela Gumprecht, and Alexander Kowarik. 2016. “Weighting Procedure of the Austrian Microcensus Using Administrative Data.” Austrian Journal of Statistics 45 (June): 3. https://doi.org/10.17713/ajs.v45i3.120.

Potter, Frank J. 1990. “A Study of Procedures to Identify and Trim Extreme Sampling Weights.” Proceedings of the American Statistical Association, Section on Survey Research Methods, 225–30. http://www.asasrms.org/Proceedings/papers/1990_034.pdf.

———. 1993. “The Effect of Weight Trimming on Nonlinear Survey Estimates.” Proceedings of the American Statistical Association, Section on Survey Research Methods 2: 758–63. http://www.asasrms.org/Proceedings/papers/1993_127.pdf.

Preston, J. 2009. “Rescaled Bootstrap for Stratified Multistage Sampling.” Survey Methodology 35 (December): 227–34. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200900211044.

Rao, J. N. K., and C. F. J. Wu. 1988. “Resampling Inference with Complex Survey Data.” Journal of the American Statistical Association 83 (401): 231–41.
