- Notifications
You must be signed in to change notification settings - Fork3
Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation
InseeFr/gustave
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Gustave (Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation) is an R package that provides atoolkit for analytical variance estimation in survey sampling.
Apart from the implementation of standard variance estimators (Sen-Yates-Grundy, Deville-Tillé), its main feature is tohelp he methodologist produce easy-to-use variance estimationwrappers, where systematic operations (statistic linearization, domain estimation) are handled in a consistent and transparent way.
Theready-to-use variance estimation wrapperqvar(), adapted for common cases (e.g. stratified simple random sampling, non-response correction through reweighting in homogeneous response groups, calibration), is also included. The core functions of the package (e.g.define_variance_wrapper()) are to be used for more complex cases.
gustave is available on CRAN and can therefore be installed with theinstall.packages() function:
install.packages("gustave")However, if you wish to install the latest version of gustave, you can usedevtools::install_github() to install it directly from thegithub.com repository:
install.packages("devtools")devtools::install_github("martinchevalier/gustave")In this example, we aim at estimating the variance of estimators computed using simulated data inspired from the Information and communication technology (ICT) survey. This survey has the following characteristics:
- stratified one-stage sampling design;
- non-response correction through reweighting in homogeneous response groups based on economic sub-sector and turnover;
- calibration on margins (number of firms and turnover broken down by economic sub-sector).
The ICT simulated data files are shipped with the gustave package:
library(gustave)data(package = "gustave")? ict_surveyA variance estimation can be perform in a single call ofqvar():
qvar( # Sample file data = ict_sample, # Dissemination and identification information dissemination_dummy = "dissemination", dissemination_weight = "w_calib", id = "firm_id", # Scope scope_dummy = "scope", # Sampling design sampling_weight = "w_sample", strata = "strata", # Non-response correction nrc_weight = "w_nrc", response_dummy = "resp", hrg = "hrg", # Calibration calibration_weight = "w_calib", calibration_var = c(paste0("N_", 58:63), paste0("turnover_", 58:63)), # Statistic(s) and variable(s) of interest mean(employees) )The survey methodology description is however cumbersome when several variance estimations are to be conducted. As it does not change from one estimation to another, it could be defined once and for all and then re-used for all variance estimations.qvar() allows for this by defining a so-called variancewrapper, that is an easy-to-use function where the variance estimation methodology for the given survey is implemented and all the technical data used to do so included.
# Definition of the variance estimation wrapper precision_ictprecision_ict <- qvar( # As before data = ict_sample, dissemination_dummy = "dissemination", dissemination_weight = "w_calib", id = "firm_id", scope_dummy = "scope", sampling_weight = "w_sample", strata = "strata", nrc_weight = "w_nrc", response_dummy = "resp", hrg = "hrg", calibration_weight = "w_calib", calibration_var = c(paste0("N_", 58:63), paste0("turnover_", 58:63)), # Replacing the variables of interest by define = TRUE define = TRUE )# Use of the variance estimation wrapperprecision_ict(ict_sample, mean(employees))# The variance estimation wrapper can also be used on the survey fileprecision_ict(ict_survey, mean(speed_quanti))The variance estimationwrapper is much easier-to-use than a standard variance estimation function:
several statistics in one call (with optional labels):
precision_ict(ict_survey, "Mean internet speed in Mbps" = mean(speed_quanti), "Turnover per employee" = ratio(turnover, employees))domain estimation with where and by arguments
precision_ict(ict_survey, mean(speed_quanti), where = employees >= 50)precision_ict(ict_survey, mean(speed_quanti), by = division)# Domain may differ from one estimator to anotherprecision_ict(ict_survey, "Mean turnover, firms with 50 employees or more" = mean(turnover, where = employees >= 50), "Mean turnover, firms with 100 employees or more" = mean(turnover, where = employees >= 100))handy variable evaluation
# On-the-fly evaluation (e.g. discretization)precision_ict(ict_survey, mean(speed_quanti > 100))# Automatic discretization for qualitative (character or factor) variablesprecision_ict(ict_survey, mean(speed_quali))# Standard evaluation capabilitiesvariables_of_interest <- c("speed_quanti", "speed_quali")precision_ict(ict_survey, mean(variables_of_interest))Integration with %>% and dplyr
library(dplyr)ict_survey %>% precision_ict("Internet speed above 100 Mbps" = mean(speed_quanti > 100)) %>% select(label, est, lower, upper)
This software is anR package developed with theRStudio IDE and thedevtools,roxygen2 andtestthat packages. Much help was found inR packages andAdvanced R both written byHadley Wickham.
From the methodological point of view, this package is related to thePoulpe SAS macro (in French) developed at the French statistical institute. From the implementation point of view, some inspiration was found in theggplot2 package. The idea of developing an R package on this specific topic was stimulated by theicarus package and its author.
About
Gustave: a User-oriented Statistical Toolkit for Analytical Variance Estimation
Topics
Resources
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Uh oh!
There was an error while loading.Please reload this page.
Contributors2
Uh oh!
There was an error while loading.Please reload this page.