In the following we present the methodology in surveysd by applying the workflow described in vignette("surveysd") to multiple consecutive years of EU-SILC data for one country. The methodology consists of the following steps, in this order:
- Draw $B$ bootstrap replicates from the EU-SILC data for each year $y_t$, $t=1,\ldots,n_y$, separately. Since EU-SILC has a rotating panel design, the bootstrap replicate of a household is carried forward through the years. That is, the bootstrap replicate of a household in the follow-up years is set equal to the bootstrap replicate of the same household when it first enters EU-SILC.
- Multiply each set of bootstrap replicates by the sampling weights to obtain uncalibrated bootstrap weights, and calibrate each set of uncalibrated bootstrap weights using iterative proportional fitting.
- Estimate the point estimate of interest $\theta$ for each year and each set of calibrated bootstrap weights to obtain $\theta^{(i,y_t)}$, $t=1,\ldots,n_y$, $i=1,\ldots,B$. For fixed $y_t$ and for each $i$, apply a filter with equal weights to $\theta^{(i,y^*)}$, $y^*\in\{y_{t-1},y_t,y_{t+1}\}$, to obtain $\tilde{\theta}^{(i,y_t)}$. Estimate the variance of $\theta$ using the distribution of $\tilde{\theta}^{(i,y_t)}$.
Bootstrapping
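Bootstrapping has long been used to estimate confidence intervals and standard errors of point estimates (Efron 1979). Given a random sample $(X_1,\ldots,X_n)$ drawn from an unknown distribution $F$, the distribution of a point estimate $\hat{\theta}(X_1,\ldots,X_n;F)$ can in many cases not be determined analytically. With bootstrapping, however, one can simulate the distribution of $\hat{\theta}$.
Let $s_i$ be a bootstrap sample, e.g. obtained by drawing $n$ observations with replacement from the sample $(X_1,\ldots,X_n)$. One can then estimate the standard deviation of $\hat{\theta}$ using $B$ bootstrap samples $s_1,\ldots,s_B$ through
$$sd(\hat{\theta}) = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\theta^*(s_i)-\overline{\theta^*}\right)^2},$$
with $\overline{\theta^*} := \frac{1}{B}\sum_{i=1}^{B}\theta^*(s_i)$ as the sample mean over all bootstrap samples, where $\theta^*(s_i)$ denotes the estimate computed on bootstrap sample $s_i$.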
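The naive bootstrap estimate of a standard deviation can be sketched in a few lines. This is an illustration only, not surveysd code: the function name `bootstrap_sd`, the toy data and the default of 500 replicates are assumptions made for the example.

```python
import random
import statistics

def bootstrap_sd(sample, estimator, B=500, seed=1):
    """Standard deviation of `estimator` over B with-replacement resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(B):
        # Bootstrap sample s_i: n draws with replacement from the sample.
        s_i = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(s_i))
    # statistics.stdev uses the 1 / (B - 1) normalisation.
    return statistics.stdev(estimates)

# Hypothetical toy data.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9]
sd_mean = bootstrap_sd(sample, statistics.mean)
```

Any function of a sample can be passed as `estimator`, for example `statistics.median`.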
In the context of sample surveys with sampling weights, one can use bootstrapping to calculate so-called bootstrap weights. These are computed via the bootstrap samples $s_i$, $i=1,\ldots,B$, where for each $s_i$ every unit of the original sample can appear $0$ to $m$ times. With $f_j^{i}$ as the frequency of occurrence of observation $j$ in bootstrap sample $s_i$, the uncalibrated bootstrap weights $\tilde{b}_j^{i}$ are defined as
$$\tilde{b}_j^{i} = f_j^{i}\, w_j,$$
with $w_j$ as the calibrated sampling weight of the original sample. Using iterative proportional fitting procedures, one can recalibrate the bootstrap weights $\tilde{b}_j^{i}$, $i=1,\ldots,B$, to get the adapted or calibrated bootstrap weights $b_j^{i}$, $i=1,\ldots,B$.
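As a small illustration of this definition, the sketch below counts how often each unit is drawn in a with-replacement bootstrap sample and multiplies that frequency by the unit's sampling weight. The function name and the toy weights are assumptions made for the example, and no calibration is performed.

```python
import random
from collections import Counter

def uncalibrated_bootstrap_weights(weights, B, seed=1):
    """Return B lists of uncalibrated bootstrap weights (frequency * weight)."""
    rng = random.Random(seed)
    n = len(weights)
    replicates = []
    for _ in range(B):
        # Frequency of each unit j in one with-replacement sample of size n.
        freq = Counter(rng.randrange(n) for _ in range(n))
        replicates.append([freq[j] * weights[j] for j in range(n)])
    return replicates

# Hypothetical sampling weights of four units.
weights = [10.0, 12.5, 8.0, 20.0]
replicates = uncalibrated_bootstrap_weights(weights, B=3)
```

Units that are not drawn in a given bootstrap sample receive a weight of zero in that replicate.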
Rescaled Bootstrap
Since EU-SILC is a stratified sample drawn without replacement from a finite population, the naive bootstrap procedure described above does not take into account the heterogeneous inclusion probabilities of the sample units and therefore does not yield satisfactory results. Instead we use the so-called rescaled bootstrap procedure introduced and investigated by Rao and Wu (1988). The bootstrap samples are selected without replacement and incorporate the stratification as well as clustering on multiple stages (see Chipperfield and Preston 2007; Preston 2009).
For simplicity we only describe the rescaled bootstrap procedure for a two-stage stratified sampling design. For details on the general formulation, see Preston (2009).
Sampling design
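Consider the finite population $U$, which is divided into $H$ non-overlapping strata, $\bigcup_{h=1,\ldots,H} U_h = U$, where each stratum $h$ contains $N_h$ clusters. In each stratum $h$, $n_h$ clusters $C_{hc}$, $c=1,\ldots,n_h$, are drawn, with cluster $C_{hc}$ containing $N_{hc}$ households. Furthermore, in each cluster $C_{hc}$ of each stratum $h$, simple random sampling is performed to select a set of households $Y_{hcj}$, $j=1,\ldots,n_{hc}$.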
Bootstrap procedure
In contrast to the naive bootstrap procedure, where for a stage containing $n$ sampling units the bootstrap replicate is obtained by drawing $n$ sampling units with replacement, the rescaled bootstrap procedure draws $n^*=\left\lfloor\frac{n}{2}\right\rfloor$ sampling units without replacement. Given a value $x$, $\lfloor x\rfloor$ denotes the largest integer less than or equal to $x$, whereas $\lceil x\rceil$ denotes the smallest integer greater than or equal to $x$. Chipperfield and Preston (2007) have shown that the choice of either $\left\lfloor\frac{n}{2}\right\rfloor$ or $\left\lceil\frac{n}{2}\right\rceil$ is optimal for bootstrap samples without replacement, although $\left\lfloor\frac{n}{2}\right\rfloor$ has the desirable property that the resulting uncalibrated bootstrap weights will never be negative.
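At the first stage the $i$-th bootstrap replicate, $f_{hc}^{i,1}$, for each cluster $C_{hc}$, $c=1,\ldots,n_h$, belonging to stratum $h$, is defined by
$$f_{hc}^{i,1} = 1-\lambda_h+\lambda_h\frac{n_h}{n_h^*}\delta_{hc}\quad\forall c\in\{1,\ldots,n_h\}$$
with
$$n_h^*=\left\lfloor\frac{n_h}{2}\right\rfloor,\qquad \lambda_h=\sqrt{\frac{n_h^*\left(1-\frac{n_h}{N_h}\right)}{n_h-n_h^*}},$$
where $\delta_{hc}=1$ if cluster $c$ is selected in the sub-sample of size $n_h^*$ and 0 otherwise.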
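The first-stage replicate for a single stratum can be sketched as follows. The sketch assumes the usual rescaled bootstrap factor $1-\lambda+\lambda\frac{n}{n^*}\delta$ with $\lambda=\sqrt{n^*(1-n/N)/(n-n^*)}$ and $n^*=\lfloor n/2\rfloor$. The function name and the example sizes are hypothetical, and this is not the surveysd implementation.

```python
import math
import random

def first_stage_replicate(n_h, N_h, rng):
    """Rescaled bootstrap factors for the n_h sampled clusters of one stratum.

    n_h: number of sampled clusters (at least 2), N_h: clusters in the population.
    """
    n_star = n_h // 2  # sub-sample size floor(n_h / 2)
    lam = math.sqrt(n_star * (1 - n_h / N_h) / (n_h - n_star))
    # Draw n_star clusters without replacement; delta = 1 for selected clusters.
    selected = set(rng.sample(range(n_h), n_star))
    return [1 - lam + lam * (n_h / n_star) * (c in selected) for c in range(n_h)]

# Hypothetical stratum: 7 sampled clusters out of 50.
f = first_stage_replicate(7, 50, random.Random(3))
```

With this choice of sub-sample size the factors are never negative, and within a stratum they always sum to $n_h$.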
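The $i$-th bootstrap replicate at the second stage, $f_{hcj}^{i,2}$, for each household $j$, $j=1,\ldots,n_{hc}$, belonging to cluster $c$ in stratum $h$ is defined, following Preston (2009), by
$$f_{hcj}^{i,2} = f_{hc}^{i,1} + \lambda_{hc}\sqrt{\frac{n_h}{n_h^*}}\,\delta_{hc}\left[\frac{n_{hc}}{n_{hc}^*}\delta_{hcj}-1\right]\quad\forall j\in\{1,\ldots,n_{hc}\}$$
with
$$n_{hc}^*=\left\lfloor\frac{n_{hc}}{2}\right\rfloor,\qquad \lambda_{hc}=\sqrt{\frac{n_{hc}^*\,\frac{n_h}{N_h}\left(1-\frac{n_{hc}}{N_{hc}}\right)}{n_{hc}-n_{hc}^*}},$$
where $f_{hc}^{i,1}$, $\delta_{hc}$ and $n_h^*=\left\lfloor\frac{n_h}{2}\right\rfloor$ are the first-stage replicate, selection indicator and sub-sample size, and $\delta_{hcj}=1$ if household $j$ is selected in the sub-sample of size $n_{hc}^*$ and 0 otherwise.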
Single PSUs
When dealing with multistage sampling designs, the issue of single PSUs can occur, i.e. only a single responding unit is present at a stage or in a stratum. When applying bootstrapping procedures these single PSUs can lead to a variety of issues. For the methodology proposed in this work we combine single PSUs at each stage with the next smallest stratum or cluster before applying the bootstrap procedure.
Taking bootstrap replicates forward
The bootstrap procedure above is applied to the EU-SILC data for each year $y_t$, $t=1,\ldots,n_y$, separately. Since EU-SILC is a yearly survey with a rotating panel design, the $i$-th bootstrap replicate at the second stage, $f_{hcj}^{i,2}$, for a household $Y_{hcj}$ is taken forward until the household drops out of the sample. That is, for a household $Y_{hcj}$ that enters EU-SILC in year $y_1$ and drops out in year $y_{\tilde{t}}$, the bootstrap replicates for the years $y_2,\ldots,y_{\tilde{t}}$ are set to the bootstrap replicate of year $y_1$.
Split households
Due to the rotating panel design, so-called split households can occur. For a household participating in the EU-SILC survey it is possible that one or more residents move to a new, so-called split household, which is followed up in the next wave. To take this dynamic into account we extended the procedure of taking forward the bootstrap replicate of a household for consecutive waves of EU-SILC by also taking the bootstrap replicate forward to the split household. This means that any new individuals in the split household inherit this bootstrap replicate as well.
Taking bootstrap replicates forward and accounting for split households ensures that the bootstrap replicates are more comparable in structure to the actual design of EU-SILC.
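The carry-forward rule, including split households, can be sketched as a lookup of the replicate drawn in the entry year. All names and the data layout below (`entry_replicate`, `panel`, `split_parent`) are assumptions made for this illustration and do not mirror surveysd internals.

```python
def carry_forward(entry_replicate, panel, split_parent=None):
    """Assign each household in each year the replicate from its entry year.

    entry_replicate: {household id: replicate drawn when the household entered}
    panel:           {year: [household ids present in that year]}
    split_parent:    {split household id: id of the household it split from}
    """
    split_parent = split_parent or {}
    result = {}
    for year in sorted(panel):
        result[year] = {}
        for hh in panel[year]:
            # A split household inherits the replicate of its original household.
            root = hh
            while root in split_parent:
                root = split_parent[root]
            result[year][hh] = entry_replicate[root]
    return result

# Hypothetical panel: household "A" spawns the split household "A2" in 2015.
entry_replicate = {"A": 2.07, "B": 0.20, "C": 0.20}
panel = {2014: ["A", "B"], 2015: ["A", "A2", "B", "C"], 2016: ["A2", "C"]}
replicates = carry_forward(entry_replicate, panel, split_parent={"A2": "A"})
```

In this toy example "A2" carries the replicate of "A" in 2015 and 2016, even after "A" itself has left the panel.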
Uncalibrated bootstrap weights
Using the $i$-th bootstrap replicates at the second stage one can calculate the $i$-th uncalibrated bootstrap weights $\tilde{b}_{hcj}^{i}$ for each household $j$ in cluster $c$ contained in stratum $h$ by
$$\tilde{b}_{hcj}^{i} = f_{hcj}^{i,2}\, w_{hcj},$$
where $w_{hcj}$ corresponds to the original household weight contained in the sample.
For ease of readability we drop the subindices for stratum $h$ and cluster $c$ in the following sections, meaning that the $j$-th household in cluster $c$ contained in stratum $h$, $Y_{hcj}$, will now be denoted as the $j$-th household, $Y_j$, where $j$ is the position of the household in the data. Accordingly, the $i$-th uncalibrated bootstrap weight of household $j$ is denoted as $\tilde{b}_j^{i}$ and the original household weight as $w_j$.
Iterative proportional fitting (IPF)
The uncalibrated bootstrap weights $\tilde{b}_j^{i}$ computed through the rescaled bootstrap procedure yield population statistics that differ from the known population margins of the sociodemographic variables for which the base weights $w_j$ have been calibrated. To adjust for this, the bootstrap weights $\tilde{b}_j^{i}$ can be recalibrated using iterative proportional fitting as described in Meraner, Gumprecht, and Kowarik (2016).
Let the original weight $w_j$ be calibrated for $n=n_P+n_H$ sociodemographic variables, which are divided into the sets $\mathcal{P}:=\{p_c, c=1,\ldots,n_P\}$ and $\mathcal{H}:=\{h_c, c=1,\ldots,n_H\}$. $\mathcal{P}$ contains personal variables, for example gender or age, and $\mathcal{H}$ contains household variables, like region or household size. Each variable in $\mathcal{P}$ or $\mathcal{H}$ can take on $P_c$ or $H_c$ values, with $N_v^{p_c}$, $v=1,\ldots,P_c$, or $N_v^{h_c}$, $v=1,\ldots,H_c$, as the corresponding population margins. Starting with $k=0$, the iterative proportional fitting procedure is applied to each set of weights $\tilde{b}_j^{i}$, $i=1,\ldots,B$, separately. The weights are first updated for the personal variables and afterwards for the household variables. If the constraints regarding the population margins are not met, $k$ is raised by 1 and the procedure starts from the beginning. In the following, the starting weight for fixed $i$ is denoted as $\tilde{b}_j^{[0]}:=\tilde{b}_j^{i}$.
Adjustment and trimming for $\mathcal{P}$
The uncalibrated bootstrap weight $\tilde{b}_j^{[0]}$ for the $j$-th observation is iteratively multiplied by a factor so that the projected distribution of the population matches the respective calibration specification $N^{p_c}$, $c=1,\ldots,n_P$. For each $c\in\{1,\ldots,n_P\}$ the weights calibrated against $N_v^{p_c}$ are computed as
$$\tilde{b}_j^{[(n+1)k+c]} = \tilde{b}_j^{[(n+1)k+c-1]}\,\frac{N_v^{p_c}}{\sum_l \tilde{b}_l^{[(n+1)k+c-1]}},$$
where the summation in the denominator extends over all observations $l$ which have the same value $v$ as observation $j$ for the sociodemographic variable $p_c$. If any weights $\tilde{b}_j^{[(n+1)k+c]}$ fall outside the range $\left[\frac{w_j}{4};4w_j\right]$, they are recoded to the nearest of the two boundaries. The choice of the boundaries results from expert-based opinions and restricts the variance of the weights, which has a positive effect on the sampling error. This procedure represents a common form of weight trimming, where very large or small weights are trimmed in order to reduce variance in exchange for a possible increase in bias (Potter 1990; Potter 1993).
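One adjustment and trimming step for a single categorical variable can be sketched as below. The function name, the toy margins and the default trimming bounds of one quarter and four times the original weight are assumptions made for this illustration; it is not the calibration routine used by the package.

```python
def rake_and_trim(b, w, categories, margins, lower=0.25, upper=4.0):
    """One raking step against one variable, followed by weight trimming.

    b: current bootstrap weights, w: original weights,
    categories: value of the variable per observation,
    margins: {value: known population total}.
    """
    # Weighted total of each category under the current weights.
    totals = {}
    for b_l, v in zip(b, categories):
        totals[v] = totals.get(v, 0.0) + b_l
    updated = []
    for b_j, w_j, v in zip(b, w, categories):
        new = b_j * margins[v] / totals[v]  # multiply by margin / current total
        # Recode weights outside [lower * w_j, upper * w_j] to the nearest bound.
        updated.append(min(max(new, lower * w_j), upper * w_j))
    return updated

# Hypothetical example with two categories.
b = [10.0, 20.0, 30.0, 40.0]
w = [12.0, 18.0, 28.0, 42.0]
sex = ["m", "m", "f", "f"]
adjusted = rake_and_trim(b, w, sex, {"m": 36.0, "f": 63.0})
```

When no weight is trimmed, the adjusted weights reproduce the margins exactly. Trimming can break this, which is one reason the procedure has to be iterated.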
Averaging weights within households
Since the sociodemographic variables $p_1,\ldots,p_{n_P}$ are person-specific, the weights $\tilde{b}_j^{[(n+1)k+n_P]}$ resulting from the iterative multiplication can be unequal for members of the same household. This can lead to inconsistencies between results projected with household and person weights. To avoid such inconsistencies each household member is assigned the mean of the weights within the household. That is, for each person $j$ in household $a$ with $|a|$ household members, the weights are defined by
$$\tilde{b}_j^{[(n+1)k+n_P+1]} = \frac{1}{|a|}\sum_{l\in a}\tilde{b}_l^{[(n+1)k+n_P]}.$$
Averaging can, however, undo part of the adjustment to the personal population margins achieved in the previous subsection.
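The averaging step can be illustrated with a short sketch. The function name and the toy data are assumptions made for the example.

```python
from collections import defaultdict

def average_within_households(b, household_ids):
    """Give every person the mean weight of the members of their household."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for b_j, hh in zip(b, household_ids):
        sums[hh] += b_j
        counts[hh] += 1
    return [sums[hh] / counts[hh] for hh in household_ids]

# Hypothetical example: two persons in household "a", one person in "b".
person_weights = [10.0, 14.0, 30.0]
averaged = average_within_households(person_weights, ["a", "a", "b"])
```

The total weight within each household is unchanged by this step, but totals by person-level categories such as gender or age group generally are not.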
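Adjustment and trimming for $\mathcal{H}$
After the adjustment for personal variables, the weights $\tilde{b}_j^{[(n+1)k+n_P+1]}$ are updated for the set of household variables $\mathcal{H}$ according to a household convergence constraint parameter $\epsilon_h$. The parameter $\epsilon_h$ represents the allowed deviation of the margins projected with the weights $\tilde{b}_j^{[(n+1)k+n_P+1]}$ from the population margins $N_v^{h_c}$, $c=1,\ldots,n_H$, $v=1,\ldots,H_c$. For each $c\in\{1,\ldots,n_H\}$ the updated weights are computed as
$$\tilde{b}_j^{[(n+1)k+n_P+1+c]} = \tilde{b}_j^{[(n+1)k+n_P+c]}\,\frac{N_v^{h_c}}{\sum_l \tilde{b}_l^{[(n+1)k+n_P+c]}},$$
with the summation in the denominator ranging over all households $l$ which take on the same value $v$ for $h_c$ as observation $j$. As described in the previous subsection, the new weights are recoded if they fall outside the interval $\left[\frac{w_j}{4};4w_j\right]$: they are set to the lower or upper bound, depending on whether $\tilde{b}_j^{[(n+1)k+n_P+1+c]}$ falls below or above the interval, respectively.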
Convergence
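For each adjustment and trimming step $s\in\{1,\ldots,n+1\}\setminus\{n_P+1\}$ the factor $\frac{N_v^{(.)}}{\sum_l \tilde{b}_l^{[(n+1)k+s-1]}}$ is checked against the convergence constraints for household variables, $\epsilon_h$, or personal variables, $\epsilon_p$, where $(.)$ corresponds to either a household or a personal variable. To be more precise, for variables in $\mathcal{P}$ the constraints
$$\frac{N_v^{p_c}}{\sum_l \tilde{b}_l^{[(n+1)k+s-1]}}\in\left((1+\epsilon_p)^{-1},\,1+\epsilon_p\right)$$
and for variables in $\mathcal{H}$ the constraints
$$\frac{N_v^{h_c}}{\sum_l \tilde{b}_l^{[(n+1)k+s-1]}}\in\left((1+\epsilon_h)^{-1},\,1+\epsilon_h\right)$$
are verified, where the sum in the denominator extends over all observations $l$ which have the same value $v$ for the variable $p_c$ or $h_c$. If these constraints hold, the algorithm has converged; otherwise $k$ is raised by 1 and the procedure is repeated.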
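A convergence check for one calibration variable can be sketched as follows. The sketch assumes that the constraint takes the form of an interval around 1 for the ratio of the population margin to the current weighted total, namely $((1+\epsilon)^{-1}, 1+\epsilon)$. The function name and the toy data are hypothetical.

```python
def margins_converged(b, categories, margins, eps):
    """True if margin / weighted total lies in (1/(1+eps), 1+eps) for all values.

    Assumes every value listed in `margins` occurs at least once in `categories`.
    """
    totals = {}
    for b_l, v in zip(b, categories):
        totals[v] = totals.get(v, 0.0) + b_l
    return all(
        1 / (1 + eps) < margins[v] / totals[v] < 1 + eps
        for v in margins
    )

# Hypothetical example: the ratios are 36/30 = 1.2 and 63/70 = 0.9.
b = [10.0, 20.0, 30.0, 40.0]
sex = ["m", "m", "f", "f"]
margins = {"m": 36.0, "f": 63.0}
strict = margins_converged(b, sex, margins, eps=0.05)
loose = margins_converged(b, sex, margins, eps=0.25)
```

In a full procedure this check would be evaluated for every personal and household variable, with the respective tolerance, before deciding whether another iteration is needed.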
The calibration procedure described above is applied to each year $y_t$, $t=1,\ldots,n_y$, of EU-SILC separately, thus resulting in so-called calibrated bootstrap sample weights $b_j^{(i,y_t)}$, $i=1,\ldots,B$, for each year $y_t$ and each household $j$.
Variance estimation
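Applying the previously described algorithms to EU-SILC data for multiple consecutive years $y_t$, $t=1,\ldots,n_y$, yields calibrated bootstrap sample weights $b_j^{(i,y_t)}$ for each year $y_t$. Using the calibrated bootstrap sample weights it is straightforward to compute the standard error of a point estimate $\theta(\mathbf{X}^{y_t},\mathbf{w}^{y_t})$ for year $y_t$, with $\mathbf{X}^{y_t}$ as the vector of observations for the variable of interest in the survey and $\mathbf{w}^{y_t}$ as the corresponding weight vector, with
$$sd(\theta) = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\theta^{(i,y_t)}-\overline{\theta^{(.,y_t)}}\right)^2}$$
with
$$\overline{\theta^{(.,y_t)}} = \frac{1}{B}\sum_{i=1}^{B}\theta^{(i,y_t)},$$
where $\theta^{(i,y_t)}:=\theta(\mathbf{X}^{y_t},\mathbf{b}^{(i,y_t)})$ is the estimate of $\theta$ in the year $y_t$ using the $i$-th vector of calibrated bootstrap weights.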
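As a minimal sketch of this computation, the example below evaluates a weighted mean once per vector of bootstrap weights and takes the standard deviation of the resulting estimates. The estimator, the function names and the toy weights are assumptions made for the illustration.

```python
import statistics

def weighted_mean(x, weights):
    """Weighted mean of x; stands in for an arbitrary point estimator."""
    return sum(x_j * w_j for x_j, w_j in zip(x, weights)) / sum(weights)

def bootstrap_se(x, replicate_weights, estimator=weighted_mean):
    """Standard error of `estimator` from B vectors of bootstrap weights."""
    estimates = [estimator(x, b_i) for b_i in replicate_weights]
    return statistics.stdev(estimates)  # uses the 1 / (B - 1) normalisation

# Hypothetical indicator variable and three vectors of bootstrap weights.
x = [1, 0, 1, 0]
replicate_weights = [
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [2.0, 0.0, 2.0, 0.0],
]
se = bootstrap_se(x, replicate_weights)
```

The three estimates here are 0.5, 0.5 and 1.0. In practice $B$ would be much larger than three.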
As already mentioned, the standard error estimation for indicators in EU-SILC yields high-quality results at the NUTS1 or country level. When estimating indicators on regional or other sub-aggregate levels, one is confronted with point estimates that have high variance.
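To overcome this issue we propose to estimate $\theta$ for three consecutive years using the calibrated bootstrap weights, thus calculating $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$, $i=1,\ldots,B$. For fixed $i$ one can apply a filter with equal filter weights to the time series $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$ to create
$$\tilde{\theta}^{(i,y_t)} = \frac{1}{3}\left[\theta^{(i,y_{t-1})}+\theta^{(i,y_t)}+\theta^{(i,y_{t+1})}\right].$$
Doing this for all $i$, $i=1,\ldots,B$, yields $\tilde{\theta}^{(i,y_t)}$, $i=1,\ldots,B$. The standard error of $\theta$ can then be estimated with
$$sd(\theta) = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\tilde{\theta}^{(i,y_t)}-\overline{\tilde{\theta}^{(.,y_t)}}\right)^2}$$
with
$$\overline{\tilde{\theta}^{(.,y_t)}} = \frac{1}{B}\sum_{i=1}^{B}\tilde{\theta}^{(i,y_t)}.$$
Applying the filter over the time series of estimated $\theta^{(i,y_t)}$ reduces the variance for $\theta$, since the filter reduces the noise in $\{\theta^{(i,y_{t-1})},\theta^{(i,y_t)},\theta^{(i,y_{t+1})}\}$ and thus leads to a narrower distribution for $\tilde{\theta}^{(i,y_t)}$.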
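The filtering step can be sketched as a three-year moving average applied replicate by replicate, followed by the usual standard deviation over the replicates. The function name, the data layout (`theta[t][i]` holds the estimate for year index `t` and replicate `i`) and the toy numbers are assumptions made for this illustration.

```python
import statistics

def filtered_se(theta, t):
    """Standard error for year index t after an equal-weight 3-year filter.

    theta[t][i] is the estimate for year index t using bootstrap replicate i.
    """
    if not 1 <= t <= len(theta) - 2:
        raise ValueError("year t needs a neighbouring year on both sides")
    # Average each replicate's estimates over the years t-1, t and t+1.
    smoothed = [
        (prev + cur + nxt) / 3
        for prev, cur, nxt in zip(theta[t - 1], theta[t], theta[t + 1])
    ]
    return statistics.stdev(smoothed)

# Hypothetical estimates for three years and two replicates.
theta = [[0.0, 3.0], [3.0, 0.0], [0.0, 6.0]]
se_filtered = filtered_se(theta, 1)
se_unfiltered = statistics.stdev(theta[1])
```

Because the same replicate index is used in all three years, the dependence between years that arises from the carried-forward replicates is retained in the distribution of the smoothed estimates.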
It should also be noted that estimating indicators from a survey with a rotating panel design is in general not straightforward because of the high correlation between consecutive years. With our approach, however, the bootstrap weights, which are independent of each other, allow us to bypass the cumbersome calculation of various correlations and can be applied directly to estimate the standard error. Bauer et al. (2013) showed that, when using the proposed method on EU-SILC data for Austria, the reduction in the resulting standard errors corresponds to a theoretical increase in sample size of about 25%. Furthermore, this study compared the method to small area estimation techniques, and on average the use of bootstrap sample weights yielded more stable results.
