
Type:

Package

Title:

Calculate Consumer Expenditure Survey (CE) Annual Estimates

Version:

2.1.0

Description:

Provides functions and data files to help CE Public-Use Microdata (PUMD) users calculate annual estimated expenditure means, standard errors, and quantiles according to the methods used by the CE with PUMD. For more information on the CE, please visit https://www.bls.gov/cex. For further reading on CE estimate calculations, please see the CE Calculation section of the U.S. Bureau of Labor Statistics (BLS) Handbook of Methods at https://www.bls.gov/opub/hom/cex/calculation.htm. For further information about CE PUMD, please visit https://www.bls.gov/cex/pumd.htm.

License:

GPL (≥ 3)

URL:

https://arcenis-r.github.io/cepumd/, https://github.com/arcenis-r/cepumd

BugReports:

https://github.com/arcenis-r/cepumd/issues

Depends:

R (≥ 3.5.0)

Imports:

dplyr (≥ 1.0.0), janitor, purrr, readr, readxl, rlang, stringr, tidyr, tidyselect (≥ 1.2.0), utils

Suggests:

covr, knitr, rmarkdown, spelling, testthat (≥ 2.1.0)

VignetteBuilder:

knitr

Encoding:

UTF-8

Language:

en-US

RoxygenNote:

7.3.1

NeedsCompilation:

Packaged:

2024-03-17 14:55:20 UTC; arcen

Author:

Arcenis Rojas [aut, cre, cph]

Maintainer:

Arcenis Rojas <arcenis.rojas@gmail.com>

Repository:

CRAN

Date/Publication:

2024-03-18 17:50:02 UTC

cepumd: Calculate Consumer Expenditure Survey (CE) Annual Estimates

Description

logo

Author(s)

Maintainer: Arcenis Rojas <arcenis.rojas@gmail.com> [copyright holder]

Convert a CE hierarchical grouping file to a data frame

Description

A CE hierarchical grouping ('HG') file shows the levels of aggregation for expenditure categories used to produce official CE expenditure estimates. This function reads in a CE HG file for the given year and HG type as a data frame.

Usage

ce_hg(year, survey, hg_zip_path = NULL, hg_file_path = NULL)

Arguments

year

A year between 1996 and the last year of available CE PUMD.

survey

The type of HG file; one of "interview", "diary", or "integrated". Accepted as a character or symbol.

hg_zip_path

The path to a zip file containing HG files downloaded from the CE website. The structure of the zip file must be exactly as it is when downloaded to be useful to this function.

hg_file_path

The path to a single HG file that has already been extracted. If this argument is given, 'hg_zip_path' is ignored.

Details

Interview and Diary HG files are available starting in 1997 and integrated files start in 1996. For consistency, this function and other cepumd functions only work with data starting in 1997.

The output will contain only expenditure UCCs and not UCCs related to household characteristics, income, assets, or liabilities. The scope of the functions in this package is limited to expenditures. Income, for example, is imputed, and the calculation of income means goes through a different process than the calculation of expenditure means. Please see the User's Guide to Income Imputation in the CE.

Value

A data frame containing the following columns:

level - hierarchical level of the expenditure category
title - the title of the expenditure category
ucc - the Universal Classification Code (UCC) for the expenditure category
survey - the survey instrument from which the data for a given UCC are sourced. This is most helpful when data for a type of expenditure are collected in both the Interview and the Diary.
factor - the factor by which to multiply the expenditure in the calculation of estimated means / medians

Examples

## Not run: 
# 'survey' can be entered as a string
ce_hg(2016, "integrated", "hg-files.zip")

# 'survey' can also be entered as a symbol
ce_hg(2016, integrated, "hg-files.zip")

## End(Not run)

Calculate a CE weighted mean

Description

Calculate a weighted mean using the method used to produce official CE estimates.

Usage

ce_mean(ce_data)

Arguments

ce_data

A data frame containing at least a finlwt21 column, 44 replicate weight columns (wtrep01-44), a cost column, and a survey indicator column. All but the survey column must be numeric.

Value

A 1-row data frame containing the following columns:

agg_exp - The estimated aggregate expenditure
mean_exp - The estimated mean expenditure
se - The estimated standard error of the estimated mean expenditure
cv - The coefficient of variation of the estimated mean expenditure
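The four output columns can be illustrated with a minimal base R sketch on made-up data. This is not the package's implementation: the helper name weighted_mean_sketch is hypothetical, only 2 replicate weights are used instead of 44, the months-in-scope adjustment is ignored, and the replicate-variance form (the average squared deviation of the replicate means from the full-sample mean) is an assumption.

```r
# Toy data: 4 consumer units, 1 full-sample weight, 2 replicate weights
# (real CE data carry 44 replicate weights, wtrep01 to wtrep44).
toy <- data.frame(
  finlwt21 = c(100, 200, 150, 50),
  wtrep01  = c(120, 180, 160, 40),
  wtrep02  = c(80, 220, 140, 60),
  cost     = c(10, 20, 30, 40)
)

# Hypothetical helper: weighted mean of `cost` under one weight column
weighted_mean_sketch <- function(df, weight_col) {
  sum(df[[weight_col]] * df$cost) / sum(df[[weight_col]])
}

rep_cols  <- grep("^wtrep", names(toy), value = TRUE)
agg_exp   <- sum(toy$finlwt21 * toy$cost)
mean_exp  <- weighted_mean_sketch(toy, "finlwt21")
rep_means <- vapply(
  rep_cols,
  function(w) weighted_mean_sketch(toy, w),
  numeric(1)
)

# Assumed variance form: mean squared deviation of the replicate means
# from the full-sample mean
se <- sqrt(sum((rep_means - mean_exp)^2) / length(rep_cols))
cv <- 100 * se / mean_exp

data.frame(agg_exp = agg_exp, mean_exp = mean_exp, se = se, cv = cv)
```

With these toy numbers the full-sample mean is 23 and the two replicate means are 22.4 and 23.6, so the sketch returns a standard error of 0.6.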

Note

Estimates produced using PUMD, which is topcoded by the CE and has some records suppressed to protect respondent confidentiality, will not match the published estimates released by the CE in most cases. The CE's published estimates are based on confidential data that are neither topcoded nor subject to record suppression. You can learn more at CE Protection of Respondent Confidentiality.

Examples

## Not run: 
# Download the HG file keeping the section for expenditures on utilities
utils_hg <- ce_hg(2017, interview) |>
  ce_uccs("Utilities, fuels, and public services", uccs_only = FALSE)

# Download and prepare interview data
utils_interview <- ce_prepdata(
  2017,
  interview,
  uccs = ce_uccs(utils_hg, "Utilities, fuels, and public services"),
  int_zp = NULL,
  hg = utils_hg,
  bls_urbn
)

# Calculate the mean expenditure on utilities
ce_mean(utils_interview)

# Calculate the mean expenditure on utilities by urbanicity
utils_interview |>
  tidyr::nest(data = -bls_urbn) |>
  dplyr::mutate(mean_utils = purrr::map(data, ce_mean)) |>
  dplyr::select(-data) |>
  tidyr::unnest(mean_utils)

## End(Not run)

Prepare CE data for calculating an estimated mean or median

Description

Reads in the family characteristics (FMLI/-D) and expenditure tabulation (MTBI/EXPD) files and merges the relevant data for calculating a weighted mean or median.

Usage

ce_prepdata(
  year,
  survey,
  hg,
  uccs,
  ...,
  int_zp = NULL,
  dia_zp = NULL,
  recode_variables = FALSE,
  dict_path = NULL,
  own_codebook = NULL
)

Arguments

year

A year between 1997 and the last year of available CE PUMD.

survey

One of either interview, diary, or integrated as a character or symbol.

hg

A data frame that has, at least, the title, level, ucc, and factor columns of a CE HG file. Calling ce_hg() will generate a valid HG file.

uccs

A character vector of UCCs corresponding to expenditure categories in the hierarchical grouping (HG) for a given year and survey.

...

Variables to include in the dataset from the family characteristics file. This is intended to allow the user to calculate estimates for subsets of the data.

int_zp

String indicating the path of the Interview data zip file(s) if already stored. If a file does not exist, its corresponding zip file will be stored in that path. The default is NULL, which causes the zip file to be stored in temporary memory during function operation.

dia_zp

Same as int_zp above, but for Diary data.

recode_variables

A logical indicating whether to recode all coded variables except 'UCC' using the codes in the CE's Excel dictionary, which can be downloaded from the CE Documentation Page.

dict_path

A string indicating the path where the CE PUMD dictionary is stored if already stored. If the file does not exist and recode_variables = TRUE, the dictionary will be stored in this path. The default is NULL, which causes the zip file to be stored in temporary memory during function operation. Automatically changed to NULL if a valid input for own_codebook is given.

own_codebook

An optional data frame containing a user-defined codebook containing the same columns as the CE Dictionary "Codes " sheet. If the input is not a data frame or does not have all of the required columns, the function will give an error message. See details for the required columns.

Details

CE microdata include 45 weights. The primary weight that is used for calculating estimated means and medians is finlwt21. The 44 replicate weights are computed using Balanced Repeated Replication (BRR) and are used for calculating weighted standard errors.

"Months in scope" refers to the proportion of the data collection quarter for which a CU reported expenditures. For the Diary survey the months in scope is always 3 because the expenditure data collected are meant to be reported for the quarter in which they are collected. The Interview Survey, on the other hand, is a quarterly, rolling, recall survey in which CUs report expenditures for the 3 months previous to the month in which the data are collected. For example, if a CU was interviewed in February 2017, then they would be providing data for November 2016, December 2016, and January 2017. If one is calculating a weighted estimated mean for the 2017 calendar year, then only the January 2017 data would be "in scope."
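The Interview recall window described above can be sketched in base R. The helper months_in_scope is hypothetical and is not part of cepumd; it simply counts how many of the 3 months before the interview month fall in the target calendar year.

```r
# Hypothetical helper: number of months in the 3-month recall window
# that fall in `target_year`, given the interview month and year.
months_in_scope <- function(interview_month, interview_year, target_year) {
  # Running month index so that windows can cross a year boundary
  interview_idx <- interview_year * 12 + interview_month
  recall_idx <- interview_idx - 1:3        # the three months before the interview
  recall_year <- (recall_idx - 1) %/% 12   # calendar year of each recalled month
  sum(recall_year == target_year)
}

# Interview in February 2017: recall covers Nov 2016, Dec 2016, Jan 2017
months_in_scope(2, 2017, target_year = 2017)  # 1
months_in_scope(2, 2017, target_year = 2016)  # 2

# Interview in May 2017: recall covers Feb, Mar, Apr 2017
months_in_scope(5, 2017, target_year = 2017)  # 3
```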

CE data are reported quarterly, but the sum of the weights (finlwt21) for all CUs is meant to represent the total number of U.S. CUs for a given year. Since calculating a calendar year estimate requires the use of 4 quarters of data and the sum of the weights in each quarter equals the number of households in the U.S. for a given year, adding up the sums of the weights in the 4 quarters of data would yield a total number of households that is approximately 4 times larger than the actual number of households in the U.S. in the corresponding year.

Since some UCCs can appear in both surveys, for the purposes of integration, the CE has a source selection procedure by which to choose which source data will be taken from for a given UCC. For example, of the 4 UCCs in the "Pets" category in 2017, two were sourced for publication from the Diary and two from the Interview. Please download the CE Source Selection Document for a complete listing: https://www.bls.gov/cex/ce_source_integrate.xlsx.

Family characteristic variables added through "..." will be read in as character data type.
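The fourfold overcount can be seen on toy numbers. The sketch below assumes an adjusted weight of the form finlwt21 * (mo_scope / 3) / 4; the exact formula the package uses for popwt is not stated in this documentation, so both the formula and the helper name adjusted_weight are illustrative assumptions.

```r
# Toy data: the same 1,000 households represented in each of 4 quarters
toy <- data.frame(
  quarter  = c(1, 2, 3, 4),
  finlwt21 = c(1000, 1000, 1000, 1000),
  mo_scope = c(3, 3, 3, 3)
)

# Assumed adjustment: scale by the share of the quarter in scope, then
# divide by 4 so that four quarters sum to one year's population
adjusted_weight <- function(finlwt21, mo_scope) {
  finlwt21 * (mo_scope / 3) / 4
}

toy$popwt <- adjusted_weight(toy$finlwt21, toy$mo_scope)

sum(toy$finlwt21)  # 4000, four times the population
sum(toy$popwt)     # 1000, the population itself
```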

Value

A data frame containing the following columns:

newid - A consumer unit (CU), or household, identifier
finlwt21 - CU weight variable
wtrep01 through wtrep44 - CU replicate weight variables (see details)
... - Any family characteristics variables that were kept
mo_scope - Months in scope (see details)
popwt - An adjusted weight meant to account for the fact that a CU's value of finlwt21 is meant to be representative of only 1 quarter of data (see details)
ucc - The UCC for a given expenditure
ref_yr - The year in which the corresponding expenditure occurred
ref_mo - The month in which the corresponding expenditure occurred
cost - The value of the expenditure (in U.S. Dollars)
survey - An indicator of which survey the data come from: "I" for Interview and "D" for Diary.

Examples

## Not run: 
# The following workflow will prepare a dataset for calculating integrated
# pet expenditures for 2021 and keep the "sex_ref" variable in the data to
# potentially calculate means by sex of the reference person.

# First generate an HG file
my_hg <- ce_hg(2021, integrated, hg_file_path = "CE-HG-Inter-2021.txt")

# Store a vector of UCC's in the "Pets" category
pet_uccs <- ce_uccs(my_hg, "Pets")

# Store the diary data (not run)
pets_dia <- ce_prepdata(
  year = 2021,
  survey = integrated,
  uccs = pet_uccs,
  hg = my_hg,
  dia_zp = "diary21.zip",
  sex_ref
)

## End(Not run)

Calculate CE weighted quantiles

Description

Calculate CE weighted quantiles

Usage

ce_quantiles(ce_data, probs = 0.5)

Arguments

ce_data

A data frame containing at least a finlwt21 column and a cost column. Both columns must be numeric.

probs

A numeric vector of probabilities between 0 and 1 for which to compute quantiles. Default is 0.5 (median).

Value

A two-column data frame in which the first column contains the probabilities for which quantiles were calculated and the second column contains the corresponding quantiles.
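A weighted quantile and the two-column return shape can be sketched in base R. This uses one common definition (the smallest cost whose cumulative weight share reaches the requested probability); the package may interpolate or break ties differently, and the function name weighted_quantile_sketch and the column names of the result are hypothetical.

```r
# Hypothetical helper: weighted quantiles of `cost` under `weight`
weighted_quantile_sketch <- function(cost, weight, probs = 0.5) {
  ord <- order(cost)
  cost <- cost[ord]
  # Cumulative share of total weight at or below each sorted cost
  cum_share <- cumsum(weight[ord]) / sum(weight)
  quantiles <- vapply(
    probs,
    function(p) cost[which(cum_share >= p)[1]],
    numeric(1)
  )
  data.frame(probs = probs, quantile = quantiles)
}

# Toy data: made-up costs and weights
toy <- data.frame(cost = c(40, 10, 30, 20), finlwt21 = c(2, 1, 2, 3))
res <- weighted_quantile_sketch(toy$cost, toy$finlwt21, c(0.25, 0.5, 0.75))
res
```

In this toy data the cost of 20 carries the largest weight, so both the 25% and 50% quantiles land on 20 and the 75% quantile lands on 30.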

Examples

## Not run: 
# Download the HG file keeping the section for expenditures on utilities
utils_hg <- ce_hg(2017, interview) |>
  ce_uccs("Utilities, fuels, and public services", uccs_only = FALSE)

# Download and prepare interview data
utils_interview <- ce_prepdata(
  2017,
  interview,
  uccs = ce_uccs(utils_hg, "Utilities, fuels, and public services"),
  int_zp = NULL,
  hg = utils_hg,
  bls_urbn
)

# Calculate the 25%, 50%, and 75% utilities expenditure quantiles
ce_quantiles(utils_interview, c(0.25, 0.5, 0.75))

# Calculate the 25%, 50%, and 75% utilities expenditure quantiles by
# urbanicity
utils_interview |>
  tidyr::nest(data = -bls_urbn) |>
  dplyr::mutate(
    quant_utils = purrr::map(data, ce_quantiles, c(0.25, 0.5, 0.75))
  ) |>
  dplyr::select(-data) |>
  tidyr::unnest(quant_utils)

## End(Not run)

Find UCCs for expenditure categories

Description

Find UCCs for expenditure categories

Usage

ce_uccs(hg, expenditure = NULL, ucc_group = NULL, uccs_only = TRUE)

Arguments

hg

A data frame that has, at least, the title, level, and ucc columns of a CE HG file.

expenditure

A string that is an expenditure category contained in a CE HG file (exact match required). Either expenditure or ucc_group is required. The default is NULL.

ucc_group

A string indicating an expenditure category by UCC group in a CE HG file (exact match required). Either expenditure or ucc_group is required. The default is NULL.

uccs_only

A logical indicating whether to return only the expenditure category's component UCCs. If TRUE (default), a vector of UCCs will be returned. If FALSE, a data frame will be returned containing the section of the HG file containing the expenditure category and its component sub-categories.

Details

If both a valid expenditure and valid ucc_group are input, ucc_group will be used.

Value

A vector of Universal Classification Codes (UCCs) corresponding to the lowest hierarchical level for that category.
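How a category's section and its lowest-level UCCs can be pulled from an HG data frame is sketched below in base R. The toy HG rows, the UCC codes, and the helper section_sketch are all made up for illustration, and the rule that lowest-level rows carry all-digit codes while aggregate rows carry alphabetic codes is an assumption about the HG file layout, not something this documentation states.

```r
# Toy HG data frame with made-up titles and codes
toy_hg <- data.frame(
  level = c(1, 2, 3, 3, 2, 3),
  title = c("Entertainment", "Pets", "Pet food", "Vet services",
            "Toys", "Toys and games"),
  ucc   = c("ENTRTAIN", "PETS", "000001", "000002", "TOYS", "000003"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented arguments of ce_uccs()
section_sketch <- function(hg, expenditure, uccs_only = TRUE) {
  start <- match(expenditure, hg$title)  # exact match on the title
  if (is.na(start)) stop("No exact title match for: ", expenditure)
  # The section ends just before the next row whose level number is
  # less than or equal to the level number of the matched row
  later <- seq_len(nrow(hg)) > start & hg$level <= hg$level[start]
  end <- if (any(later)) which(later)[1] - 1 else nrow(hg)
  section <- hg[start:end, ]
  if (!uccs_only) return(section)
  # Assumption: lowest-level rows carry all-digit codes
  section$ucc[grepl("^[0-9]+$", section$ucc)]
}

section_sketch(toy_hg, "Pets")
section_sketch(toy_hg, "Pets", uccs_only = FALSE)
```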

Examples

## Not run: 
# First generate an HG file
my_hg <- ce_hg(2021, interview, hg_file_path = "CE-HG-Inter-2021.txt")

# Store a vector of UCC's in the "Pets" category
pet_uccs <- ce_uccs(my_hg, "Pets")
pet_uccs
# [1] "610320" "620410" "620420"

## End(Not run)

Generate tables of the necessary survey data files

Description

Generate tables of the necessary survey data files

Usage

get_survey_files(year, survey, file_yrs, qtrs, zp_file)

Arguments

year

A year between 1996 and the last year of available CE PUMD.

survey

One of either interview, diary, or integrated as a character or symbol.

file_yrs

The substrings of years for which to pull data, i.e., for some years files have to be pulled from across different files.

qtrs

The quarters to be included in the analysis for a given year.

zp_file

Character indicating the zip file containing the CE PUMD for a given year.

Details

This is an internal function called only by exported package functions.

Read in and modify EXPD files

Description

Read in and modify EXPD files

Usage

read.expd(fp, zp, year, uccs, integrate_data, hg)

Arguments

fp

File to extract from zip file

zp

Zip file path

year

Year

uccs

Vector of UCC's to filter for

integrate_data

Whether to prepare data for integrated estimates

hg

Hierarchical grouping data

Details

This is an internal function called only by exported package functions.

Read in and modify FMLD files

Description

Read in and modify FMLD files

Usage

read.fmld(fp, zp, ...)

Arguments

fp

File to extract from zip file

zp

Zip file path

...

<dynamic-dots> Additional variables to keep (intended for grouping)

Details

This is an internal function called only by exported package functions.

Read in and modify FMLI files

Description

Read in and modify FMLI files

Usage

read.fmli(fp, zp, year, ...)

Arguments

fp

File to extract from zip file

zp

Zip file path within ce_dir

year

Year

...

<dynamic-dots> Additional variables to keep (intended for grouping)

Details

This is an internal function called only by exported package functions.

Read in and modify MTBI files

Description

Read in and modify MTBI files

Usage

read.mtbi(fp, zp, year, uccs, integrate_data, hg)

Arguments

fp

File to extract from zip file

zp

Zip file path

year

Year

uccs

Vector of UCC's to filter for

integrate_data

Whether to prepare data for integrated estimates

hg

Hierarchical grouping data

Details

This is an internal function called only by exported package functions.

Recode variables in interview and diary data

Description

Recode variables in interview and diary data

Usage

recode_ce_variables(srvy_data, code_file, srvy)

Arguments

srvy_data

A data frame containing either Interview or Diary data that has been prepped

code_file

A data frame containing variable names, codes, code descriptions, and other required columns for recoding variables

srvy

The survey instrument to be recoded (this is for filtering the codebook)

Details

This is an internal function called only by exported package functions.
