
Title:

Analyzing the Survey of Consumer Finances

Version:

1.0.5

Description:

Analyze public-use micro data from the Survey of Consumer Finances. Provides tools to download prepared data files, construct replicate-weighted multiply imputed survey designs, compute descriptive statistics and model estimates, and produce plots and tables. Methods follow design-based inference for complex surveys and pooling across multiple imputations. See the package website and the code book for background.

License:

MIT + file LICENSE

URL:

https://github.com/jncohen/scf

BugReports:

https://github.com/jncohen/scf/issues

Depends:

R (≥ 3.6)

Imports:

ggplot2, haven, httr, rlang, stats, survey, utils

Suggests:

dplyr, hexbin, kableExtra, knitr, mitools, rmarkdown, markdown, testthat (≥ 3.0.0)

VignetteBuilder:

knitr, markdown, rmarkdown

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.3.3

NeedsCompilation:

Packaged:

2025-11-20 20:45:08 UTC; jcohen

Author:

Joseph Cohen [aut, cre]

Maintainer:

Joseph Cohen <joseph.cohen@qc.cuny.edu>

Repository:

CRAN

Date/Publication:

2025-11-20 21:20:02 UTC

Analyzing Survey of Consumer Finances Public-Use Microdata

Description

This package provides functions to analyze the U.S. Federal Reserve's Survey of Consumer Finances (SCF) public-use microdata. It encapsulates the SCF’s multiply-imputed, replicate-weighted structure in a custom object class (scf_mi_survey) and supports estimation of population-level statistics, including univariate and bivariate distributions, hypothesis tests, data visualizations, and regression models.

Designed for generalist analysts, the package assumes familiarity with standard statistical methods but not with the complexities of multiply-imputed or survey-weighted data. All functions prioritize transparency, reproducibility, and pedagogical clarity.

Methodological Background

The SCF is one of the most detailed and methodologically rigorous sources of data on U.S. household finances. It is nationally representative, includes an oversample of high-wealth households and households in predominantly Black communities, and provides multiply-imputed estimates for item nonresponse. These features increase the analytical value of the data set but also introduce methodological complexity. Valid inference requires attention to:

Survey Weights: The SCF employs a dual-frame, stratified, and clustered probability sample. Analysts must apply the provided sampling weights to produce population-representative estimates.
Replicate Weights: Each observation is associated with 999 replicate weights, generated using a custom replication method developed by the Federal Reserve. These are used to estimate sampling variance.
Multiple Imputation: The SCF uses multiple imputation to address item nonresponse, providing five implicates per household. Estimates must be pooled across implicates to obtain valid point estimates and standard errors.

The scf package provides a structured, user-friendly interface for handling these design complexities, enabling applied researchers and generalist analysts to conduct principled and reproducible analysis of SCF microdata using familiar statistical workflows.
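As a rough illustration of how sampling weights and replicate weights enter an estimate, the sketch below computes a weighted mean and a replicate-weight standard error in base R. The simulated data, the resampling scheme used to build the replicate weights, and the 1/R scaling of the variance are all assumptions made for illustration; they are not the Federal Reserve's replication method or the package's internals.

```r
set.seed(1)
n <- 50   # simulated households
R <- 20   # simulated replicates (the SCF provides 999)

income <- rlnorm(n, meanlog = 10, sdlog = 1)
wgt    <- runif(n, min = 500, max = 1500)   # main sampling weight

# Hypothetical replicate weights: main weight times bootstrap resampling counts
repw <- sapply(seq_len(R), function(r) {
  wgt * tabulate(sample.int(n, n, replace = TRUE), nbins = n)
})

# The point estimate uses the main weight
theta <- weighted.mean(income, wgt)

# Re-estimate the same statistic under each replicate weight
theta_r <- apply(repw, 2, function(w) weighted.mean(income, w))

# Spread of the replicate estimates around the full-sample estimate
# (the 1/R scaling is an assumption of this sketch)
var_rep <- mean((theta_r - theta)^2)
se_rep  <- sqrt(var_rep)

c(estimate = theta, se = se_rep)
```

In the package itself this bookkeeping is delegated to survey::svrepdesign() objects, one per implicate.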

Package Architecture and Workflow

This section recommends a sequence of operations enacted through the package's functions. For an in-depth discussion of the methodological considerations involved in these functions' formulation, see Cohen (2025).

Data Acquisition: Download the data from Federal Reserve servers to your working directory using scf_download().
Data Loading: Load the data into R using scf_load(). This function returns an scf_mi_survey object (described below).
Data Wrangling: Use scf_update() to modify the data, or scf_subset() to filter it. These functions return new scf_mi_survey objects.
Descriptive Statistics: Compute univariate and bivariate statistics using functions like scf_mean(), scf_median(), scf_percentile(), scf_freq(), scf_xtab(), and scf_corr().
Basic Inferential Tests: Conduct hypothesis tests using scf_ttest() for means and scf_prop_test() for proportions.
Regression Modeling: Fit regression models using scf_ols() for linear regression, scf_logit() for logistic regression, and scf_glm() for generalized linear models.
Data Visualization: Create informative visualizations using scf_plot_dist() for distributions, scf_plot_cbar() and scf_plot_bbar() for categorical data, scf_plot_smooth() for smoothers, and scf_plot_hex() for hexbin plots.
Diagnostics and Infrastructure: Use scf_MIcombine() to pool results across implicates.
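The steps above can be strung together roughly as follows. This is a minimal sketch that runs on the bundled mock data rather than a real download, and it only uses calls that appear in this manual's own examples. The wrapper name run_scf_workflow() is hypothetical, and the code does nothing unless the scf package is installed.

```r
# Hypothetical wrapper: load the mock data, then run a few documented steps
run_scf_workflow <- function() {
  td <- tempfile("workflow_")
  dir.create(td)
  on.exit(unlink(td, recursive = TRUE, force = TRUE))

  # Stand-in for scf_download(): copy the bundled mock file into place
  src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
  file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)

  # Data loading
  scf2022 <- scf::scf_load(2022, data_directory = td)

  # Descriptive statistics and regression models
  list(
    mean_income = scf::scf_mean(scf2022, ~income),
    own_freq    = scf::scf_freq(scf2022, ~own),
    corr        = scf::scf_corr(scf2022, ~income, ~networth),
    ols         = scf::scf_ols(scf2022, networth ~ age + income),
    logit       = scf::scf_logit(scf2022, own ~ age)
  )
}

# Only run when the package is available
if (requireNamespace("scf", quietly = TRUE)) {
  out <- run_scf_workflow()
  print(names(out))
}
```

With real data, the file-copy step would be replaced by a call to scf_download() for the years of interest. Results from the mock data are not suitable for inference.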

Core Data Object and Its Structure

This suite of functions operates on a custom object class, scf_mi_survey, which is created by scf_design() via scf_load(). Specifically, the object is a structured list containing the elements:

mi_design: A list of five survey::svrepdesign() objects (one per implicate)
year: Year of survey
n_households: Estimated number of U.S. households in that year, per data from the Federal Reserve Economic Data (FRED) series TTLHH, accessed 6/17/2025.

Imputed Missing Data

The SCF addresses item nonresponse using multiple imputation (see Kennickell 1998). This procedure generates five completed data sets, each containing distinct but plausible values for the missing entries. The method applies a predictive model to the observed data, simulates variation in both model parameters and residuals, and generates five independent estimates for each missing value. These completed data sets, called implicates, reflect both observed relationships and the uncertainty in estimating them. See scf_MIcombine() for details.
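The general idea of drawing both parameters and residuals can be sketched with a toy regression imputation in base R. This is a deliberately simplified stand-in with one predictor, a normal linear model, and simulated data; it is not the SCF's actual imputation procedure, and all object names are hypothetical.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)
y[sample(n, 40)] <- NA                 # simulate item nonresponse

obs <- !is.na(y)
fit <- lm(y ~ x, subset = obs)         # predictive model on observed data

beta_hat <- coef(fit)
V        <- vcov(fit)
df_res   <- fit$df.residual
s2       <- summary(fit)$sigma^2

impute_once <- function() {
  # Draw the residual variance, then the coefficients (parameter uncertainty)
  sigma2 <- s2 * df_res / rchisq(1, df_res)
  L      <- chol(V * sigma2 / s2)
  beta   <- beta_hat + drop(t(L) %*% rnorm(length(beta_hat)))

  # Fill each missing value with a prediction plus a residual draw
  miss  <- which(!obs)
  y_imp <- y
  y_imp[miss] <- beta[1] + beta[2] * x[miss] +
    rnorm(length(miss), sd = sqrt(sigma2))
  data.frame(x = x, y = y_imp)
}

# Five completed data sets, analogous to the SCF's five implicates
implicates <- lapply(1:5, function(i) impute_once())
sapply(implicates, function(d) mean(d$y))
```

Observed values are identical in every implicate; only the imputed entries differ, which is what produces between-implicate variation in the estimates.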

Mock Data for Testing

A mock SCF dataset (scf2022_mock_raw.rds) is bundled in inst/extdata/ for internal testing purposes. It is a structurally valid scf_mi_survey object created by retaining only the first ~200 rows per implicate and only variables used in examples and tests.

This object is intended solely for package development and documentation rendering. It is not suitable for analytical use or valid statistical inference.

Theming and Visual Style

All built-in graphics follow a common aesthetic set by scf_theme(). Users may modify the default theme by calling scf_theme() explicitly within their scripts. See the scf_theme() documentation for customization options.

Pedagogical Design

The package is designed to support instruction in advanced methods courses on complex survey analysis and missing data. It promotes pedagogical transparency through several features:

Each implicate’s design object is accessible via scf_mi_survey$mi_design[[i]]
Raw implicate-level data can be viewed directly through scf_mi_survey$mi_design[[i]]$variables
Users can execute analyses on individual implicates or combine them using Rubin’s Rules
Key functions implement design-based estimation strategies explicitly, such as replicate-weight variance estimation
Minimal abstraction is used, so each step remains visible and tractable

These features allow instructors to demonstrate how survey weights, replicate designs, and multiple imputation contribute to final results. Students can follow the full analytic path from raw inputs to pooled estimates using transparent, inspectable code and data structures.

Author(s)

Joseph N. Cohen, CUNY Queens College

References

Barnard J, Rubin DB. Small-sample degrees of freedom with multiple imputation. doi:10.1093/biomet/86.4.948.

Bricker J, Henriques AM, Moore KB. Updates to the sampling of wealthy families in the Survey of Consumer Finances. Finance and Economics Discussion Series 2017-114. https://www.federalreserve.gov/econres/scfindex.htm

Kennickell AB, McManus DA, Woodburn RL. Weighting design for the 1992 Survey of Consumer Finances. U.S. Federal Reserve. https://www.federalreserve.gov/Pubs/OSS/oss2/papers/weight92.pdf

Kennickell AB. Multiple imputation in the Survey of Consumer Finances. Statistical Journal of the IAOS 33(1):143-151. doi:10.3233/SJI-160278.

Little RJA, Rubin DB. Statistical analysis with missing data. ISBN: 9780470526798.

Lumley T. survey: Analysis of complex survey samples. R package version 4.1-1. https://CRAN.R-project.org/package=survey

Lumley T. Analysis of complex survey samples. doi:10.18637/jss.v009.i08.

Lumley T. Complex surveys: A guide to analysis using R. ISBN: 9781118210932.

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Generic S3 Method: AIC.scf_model_result

Description

Extracts the mean of the Akaike Information Criterion (AIC) across implicates.

Usage

## S3 method for class 'scf_model_result'
AIC(object, k = 2, ...)

Arguments

object

An object of class 'scf_model_result'.

k

The penalty term (2 for AIC, log(n) for BIC). Defaults to 2.

...

Not used.

Value

The numeric mean AIC pooled across implicates.

Extract Standard Errors from Pooled SCF Model Results

Description

Generic extractor for pooled standard errors from objects of class "scf_MIresult".

Usage

SE(object, ...)

## S3 method for class 'scf_MIresult'
SE(object, ...)

Arguments

object

A pooled result object of class "scf_MIresult".

...

Not used.

Value

A numeric vector of standard errors.

Generic S3 Method: coef.scf_model_result

Description

Extracts the pooled coefficient estimates from the model result.

Usage

## S3 method for class 'scf_model_result'
coef(object, ...)

Arguments

object

An object of class 'scf_model_result'.

...

Not used.

Value

A named numeric vector of pooled coefficient estimates.

Generic S3 Method: formula.scf_model_result

Description

Extracts the formula used to fit a multiply-imputed SCF regression model.

Usage

## S3 method for class 'scf_model_result'
formula(x, ...)

Arguments

x

An object of class 'scf_model_result'.

...

Not used.

Value

A model formula object.

Generic S3 Method: predict.scf_model_result

Description

Calculates predicted values for a new data set (or the original data) by pooling predictions across all multiply-imputed models.

Usage

## S3 method for class 'scf_model_result'
predict(object, newdata, type = "link", ...)

Arguments

object

An object of class 'scf_model_result'.

newdata

A data frame containing variables for which to predict. If missing, predictions are made on the original data (from the first implicate).

type

Character string specifying the type of prediction. Options are "link" (default, linear predictor) or "response" (fitted values on the outcome scale).

...

Additional arguments passed to predict.glm.

Value

A numeric vector of pooled predicted values (mean prediction across implicates).

Generic S3 Method: residuals.scf_model_result

Description

Extracts the residuals vector from the first underlying implicate model.

Usage

## S3 method for class 'scf_model_result'
residuals(object, ...)

Arguments

object

An object of class 'scf_model_result'.

...

Not used.

Details

In multiply-imputed data, pooled residuals are typically not calculated. This returns the residuals from the primary (first) implicate model, which is suitable for simple diagnostics.

Value

A numeric vector of residuals.

Combine Estimates Across SCF Implicates Using Rubin's Rules

Description

This function implements Rubin’s Rules for combining multiply-imputed survey model results in the scf package. It pools point estimates, variance-covariance matrices, and degrees of freedom across the SCF’s five implicates.

Usage

scf_MIcombine(results, variances, call = sys.call(), df.complete = Inf)

Arguments

results

A list of implicate-level model outputs. Each element must be a named numeric vector or an object with methods for coef() and vcov(). Typically generated internally by modeling functions.

variances

Optional list of variance-covariance matrices. If omitted, extracted using vcov().

call

Optional. The originating function call. Defaults to sys.call().

df.complete

Optional degrees of freedom for the complete-data model. Used for small-sample corrections. Defaults to Inf, assuming large-sample asymptotics.

Value

An object of class "scf_MIresult" with components:

coefficients: Pooled point estimates across implicates.
variance: Pooled variance-covariance matrix.
df: Degrees of freedom for each parameter, adjusted using Barnard-Rubin formula.
missinfo: Estimated fraction of missing information for each parameter.
nimp: Number of implicates used in pooling.
call: Function call recorded for reproducibility.

Supports coef(), SE(), confint(), and summary() methods.

Scope

scf_MIcombine() is used for model-based analyses such as scf_ols(), scf_glm(), and scf_logit(), where each implicate’s model output includes both parameter estimates and replicate-weighted sampling variances.

Descriptive estimators (functions such as scf_mean(), scf_percentile(), and scf_median()) do not apply Rubin’s Rules. They follow the Survey of Consumer Finances convention used in the Federal Reserve Board’s SAS macro, combining (i) the replicate-weight sampling variance from implicate 1 with (ii) the between-implicate variance scaled by (m + 1)/m.

This separation is intentional: descriptive statistics in scf aim to reproduce the Survey of Consumer Finances' published standard errors, whereas model-based functions use Rubin's Rules.
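To make the descriptive-statistics convention concrete, here is a small base R sketch of the combination described above: the sampling variance from implicate 1 plus the between-implicate variance scaled by (m + 1)/m. The function name pool_scf_descriptive() and the input numbers are hypothetical, and the package's own code may differ in detail.

```r
# Hypothetical helper implementing the convention described above
pool_scf_descriptive <- function(q, u1) {
  m <- length(q)               # number of implicates
  b <- var(q)                  # between-implicate variance (divisor m - 1)
  total_var <- u1 + (m + 1) / m * b
  list(estimate = mean(q), variance = total_var, se = sqrt(total_var))
}

# Made-up implicate-level estimates and sampling variances
q <- c(10, 11, 9, 10.5, 9.5)
u <- c(0.30, 0.25, 0.20, 0.25, 0.25)

scf_style <- pool_scf_descriptive(q, u1 = u[1])

# For comparison: the total variance with the average within-implicate
# variance in place of implicate 1's, as in Rubin's Rules
rubin_var <- mean(u) + (1 + 1 / length(q)) * var(q)

c(scf_convention = scf_style$variance, rubin = rubin_var)
```

The two totals differ only in the sampling-variance term, so they coincide whenever implicate 1's variance equals the average across implicates.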

Implementation

scf_MIcombine() pools a set of implicate-level estimates and their associated variance-covariance matrices using Rubin’s Rules.

This includes:

Calculation of pooled point estimates
Total variance from within- and between-imputation components
Degrees of freedom via Barnard-Rubin method
Fraction of missing information

Inputs are typically produced by modeling functions such as scf_ols(), scf_glm(), or scf_logit(), which return implicate-level coefficient vectors and variance-covariance matrices.

This function is primarily used internally, but may be called directly by advanced users constructing custom estimation routines from implicate-level results.

Details

The SCF provides five implicates per survey wave, each a plausible version of the population under a specific missing-data model. Analysts conduct the same statistical procedure on each implicate, producing a set of five estimates Q_1, Q_2, ..., Q_5. These are then combined using Rubin’s Rules, which pool results across implicates while accounting for two sources of uncertainty:

Within-imputation variance: Uncertainty from complex sample design
Between-imputation variance: Uncertainty due to missing data

For a scalar quantity Q, the pooled estimate and total variance are calculated as:

\bar{Q} = \frac{1}{M} \sum Q_m

\bar{U} = \frac{1}{M} \sum U_m

B = \frac{1}{M - 1} \sum (Q_m - \bar{Q})^2

T = \bar{U} + \left(1 + \frac{1}{M} \right) B

Where:

M is the number of implicates (typically 5 for SCF)
Q_m is the estimate from implicate m
U_m is the sampling variance of Q_m, accounting for replicate weights and design

The total variance T reflects both within-imputation uncertainty (sampling error) and between-imputation uncertainty (missing-data imputation).

The standard error of the pooled estimate is \sqrt{T}. Degrees of freedom are adjusted using the Barnard-Rubin method. With df.complete = Inf (the default), that adjustment reduces to the large-sample form:

\nu = (M - 1) \left(1 + \frac{\bar{U}}{(1 + \frac{1}{M}) B} \right)^2

The fraction of missing information (FMI) is also reported: it reflects the proportion of total variance attributable to imputation uncertainty.
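The formulas above translate directly into a few lines of base R. The sketch below is a scalar-only illustration with made-up inputs. The function name rubin_pool() is hypothetical, the degrees of freedom use the large-sample form shown above, and the missing-information measure is the simple ratio (1 + 1/M) B / T, which is an assumption here and may differ from the exact FMI definition the package uses.

```r
# Hypothetical scalar implementation of the pooling formulas above
rubin_pool <- function(q, u) {
  M      <- length(q)                   # number of implicates
  q_bar  <- mean(q)                     # pooled point estimate
  u_bar  <- mean(u)                     # within-imputation variance
  b      <- var(q)                      # between-imputation variance
  t_var  <- u_bar + (1 + 1 / M) * b     # total variance
  nu     <- (M - 1) * (1 + u_bar / ((1 + 1 / M) * b))^2
  lambda <- (1 + 1 / M) * b / t_var     # share of variance due to imputation
  list(estimate = q_bar, within = u_bar, between = b,
       total_var = t_var, se = sqrt(t_var), df = nu, missinfo = lambda)
}

# Made-up estimates and sampling variances from five implicates
res <- rubin_pool(q = c(10, 11, 9, 10.5, 9.5), u = rep(0.25, 5))
str(res)
```

For these inputs the between-imputation variance is B = 0.625, so T = 0.25 + 1.2 × 0.625 = 1, the pooled standard error is 1, the degrees of freedom are 64/9 (about 7.1), and three quarters of the total variance comes from the imputation component.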

References

Barnard J, Rubin DB. Small-sample degrees of freedom with multiple imputation. doi:10.1093/biomet/86.4.948.

Little RJA, Rubin DB. Statistical analysis with missing data. ISBN: 9780470526798.

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("MIcombine_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Pool simple survey mean for mock data
outlist <- lapply(scf2022$mi_design, function(d) survey::svymean(~I(age >= 65), d))
pooled  <- scf_MIcombine(outlist)     # vcov/coef extracted automatically
SE(pooled); coef(pooled)

unlink(td, recursive = TRUE, force = TRUE)

Activate SCF Plot Theme

Description

Sets the default ggplot2 theme to scf_theme(). Call this function manually in your session or script to apply the style globally.

Usage

scf_activate_theme()

Value

No return value, called for side effects.

Estimate Correlation Between Two Continuous Variables in SCF Microdata

Description

This function estimates the linear association between two continuous variables using Pearson's correlation. Estimates are computed within each implicate and then pooled across implicates to account for imputation uncertainty.

Usage

scf_corr(scf, var1, var2)

Arguments

scf

An scf_mi_survey object, created by scf_load()

var1

One-sided formula specifying the first variable

var2

One-sided formula specifying the second variable

Details

Computes the Pearson correlation coefficient between two continuous variables using multiply-imputed, replicate-weighted SCF data. Returns pooled estimates and standard errors using Rubin’s Rules.

Value

An object of class scf_corr, containing:

results: Data frame with pooled correlation estimate, standard error, t-statistic, degrees of freedom, p-value, and minimum/maximum values across implicates.
imps: Named vector of implicate-level correlations.
aux: Variable names used in the estimation.

Implementation

Inputs: an scf_mi_survey object and two one-sided formulas (e.g., ~income)
Correlation computed using cor(..., use = "complete.obs") within each implicate
Rubin’s Rules applied to pool results across implicates

Interpretation

Pearson’s r ranges from -1 to +1 and reflects the strength and direction of a linear bivariate association between two continuous variables. Values near 0 indicate weak linear association. Note that the statistic is sensitive to outliers, does not capture nonlinear relationships, and does not adjust for covariates.

Statistical Notes

Correlation is computed within each implicate using complete cases. Rubin’s Rules are applied manually to pool estimates and calculate total variance. This function does not use scf_MIcombine(), which is intended for vector-valued estimates; direct pooling is more appropriate for scalar statistics like correlation coefficients.

Note

Degrees of freedom are approximated using a simplified Barnard-Rubin adjustment, since correlation is a scalar quantity. Interpret cautiously with few implicates.

Examples

# Ignore this code block.  It loads mock data for CRAN.
# In your analysis, download and load your data using the
# functions `scf_download()` and `scf_load()`
td <- tempfile("corr_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# EXAMPLE IMPLEMENTATION OF `scf_corr()`:
corr <- scf_corr(scf2022, ~income, ~networth)
print(corr)
summary(corr)

# Ignore the code below.  It is for CRAN:
unlink(td, recursive = TRUE, force = TRUE)

Construct SCF Core Data Object

Description

Stores SCF microdata as five implicate-specific designs created by survey::svrepdesign().

Usage

scf_design(design, year, n_households)

Arguments

design

A list of five survey::svrepdesign() objects (one per implicate).

year

Numeric SCF survey year (e.g., 2022).

n_households

Numeric total U.S. households represented in year.

Details

This is a helper function for the scf_download() and scf_load() functions. It wraps a list of replicate-weighted survey designs into an "scf_mi_survey" object and is typically called by scf_load(). The resulting object includes the Survey's five implicates, along with the year and an estimate of the total U.S. households in that year.

Value

An object of class "scf_mi_survey" with:

mi_design: List of replicate-weighted designs (one per implicate).
year: SCF survey year.
n_households: Estimated number of U.S. households.

Examples

# Ignore this code block.  It loads mock data for CRAN.
# In your analysis, download and load your data using the
# functions `scf_download()` and `scf_load()`
td <- tempfile("design_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# EXAMPLE IMPLEMENTATION: Construct scf_mi_survey object
obj <- scf_design(
  design = scf2022$mi_design,
  year = 2022,
  n_households = attr(scf2022, "n_households")
)
class(obj)
length(obj$mi_design)

# Ignore the code below.  It is for CRAN:
unlink(td, recursive = TRUE, force = TRUE)

Download and Prepare SCF Microdata for Local Analysis

Description

Downloads SCF public-use microdata from official servers. For each year, this function retrieves five implicates, merges them with replicate weights and official summary variables, and saves them as .rds files ready for use with scf_load().

Usage

scf_download(years = seq(1989, 2022, 3), overwrite = FALSE, verbose = TRUE)

Arguments

years

Integer vector of SCF years to download (e.g., c(2016, 2019)). Must be triennial from 1989 to 2022.

overwrite

Logical. If TRUE, re-download and overwrite existing .rds files. Default is FALSE.

verbose

Logical. If TRUE, display progress messages. Default is TRUE.

Value

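The function's documented output is a set of .rds files saved to the working directory, one for each requested year. These files are designed to be loaded using scf_load(), which wraps them into replicate-weighted designs.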

Implementation

This function downloads from official servers three types of files for each year:

five versions of the dataset (one per implicate), each stored as a separate data frame in a list
a table of replicate weights, and
a data table with official derived variables

These tables are collected into a list and saved as an .rds file in the working directory. By default, the function downloads all available years.

Details

The SCF uses multiply-imputed data sets to address item-level missing data. Each household appears in all five implicates, with imputed values that may differ from one implicate to the next. This function ensures all implicates are downloaded, merged, and prepared for downstream analysis using scf_load(), scf_design(), and the scf workflow.

References

U.S. Federal Reserve. Codebook for 2022 Survey of Consumer Finances. https://www.federalreserve.gov/econres/scfindex.htm

Examples

if (FALSE) {
  # Download and prepare SCF data for 2022
  td <- tempfile("download_")
  dir.create(td)
  old <- getwd()
  setwd(td)
  scf_download(2022)

  # Load into a survey design object
  scf2022 <- scf_load(2022, data_directory = td)

  # Cleanup for package check: restore the working directory
  # before deleting the temporary one
  setwd(old)
  unlink(td, recursive = TRUE, force = TRUE)
}

Estimate the Frequencies of a Discrete Variable from SCF Microdata

Description

This function estimates the relative frequency (proportion) of each category in a discrete variable from the SCF public-use microdata. Use this function to discern the univariate distribution of a discrete variable.

Usage

scf_freq(scf, var, by = NULL, percent = TRUE)

Arguments

scf

A scf_mi_survey object created by scf_load(). Must contain five replicate-weighted implicates.

var

A one-sided formula specifying a categorical variable (e.g., ~racecl).

by

Optional one-sided formula specifying a discrete grouping variable (e.g., ~own).

percent

Logical. If TRUE (default), scales results and standard errors to percentages.

Details

Computes weighted proportions and standard errors for a discrete variable in multiply-imputed SCF data, optionally stratified by a grouping variable. Proportions and standard errors are computed separately within each implicate using svymean(), then averaged across implicates using SCF-recommended pooling logic. Group-wise frequencies are supported, but users may find the features of scf_xtab() to be more useful.

Value

A list of class "scf_freq" with:

results: Pooled category proportions and standard errors, by group if specified.
imps: A named list of implicate-level proportion estimates.
aux: Metadata about the variable and grouping structure.

Details

Proportions are estimated within each implicate using survey::svymean(), then pooled using the standard MI formula for proportions. When a grouping variable is provided via by, estimates are produced separately for each group-category combination. Results may be scaled to percentages using the percent argument.

Estimates are pooled using the standard formula:

The mean of implicate-level proportions is the point estimate
The standard error reflects both within-implicate variance and across-implicate variation

Unlike model parameters pooled with scf_MIcombine(), category proportions do not use Rubin's full combination rules (e.g., degrees of freedom).
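The point-estimate half of this pooling can be sketched in base R: compute weighted category proportions within each implicate, then average them. The data are simulated, the column names own and wgt mirror names used elsewhere in this manual but are filled with made-up values, and the within-implicate replicate-weight variance is deliberately left out, so only the between-implicate spread is shown.

```r
set.seed(7)
n <- 400
base <- data.frame(
  own = factor(sample(c("owner", "renter"), n, replace = TRUE,
                      prob = c(0.65, 0.35))),
  wgt = runif(n, 100, 1000)
)

# Simulated implicates: 40 "imputed" rows get a fresh value in each one
flip_rows <- sample(n, 40)
implicates <- lapply(1:5, function(i) {
  d <- base
  d$own[flip_rows] <- sample(levels(d$own), length(flip_rows), replace = TRUE)
  d
})

# Weighted proportion of each category within one implicate
weighted_props <- function(d) tapply(d$wgt, d$own, sum) / sum(d$wgt)

imps        <- sapply(implicates, weighted_props)  # categories x implicates
pooled      <- rowMeans(imps)                      # pooled point estimates
between_var <- apply(imps, 1, var)                 # across-implicate variation

round(100 * pooled, 1)   # scaled to percentages, as with percent = TRUE
```

A full standard error would add a within-implicate sampling variance to the between-implicate term; in the package that component comes from survey::svymean() on the replicate-weighted design.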

Examples

# Ignore this code block.  It loads mock data for CRAN.
# In your analysis, download and load your data using the
# functions `scf_download()` and `scf_load()`
td <- tempfile("freq_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# EXAMPLE IMPLEMENTATION: Proportions of homeownership
scf_freq(scf2022, ~own)

# EXAMPLE IMPLEMENTATION: Cross-tabulate education by homeownership
scf_freq(scf2022, ~own, by = ~edcl)

# Ignore the code below.  It is for CRAN:
unlink(td, recursive = TRUE, force = TRUE)

Estimate Generalized Linear Model from SCF Microdata

Description

Estimates generalized linear models (GLMs) with SCF public-use microdata. Use this function when modeling outcomes that follow non-Gaussian distributions (e.g., binary or count data). Rubin's Rules are used to combine implicate-level coefficient and variance estimates.

GLMs are fitted to each SCF implicate using svyglm(). The function returns pooled coefficients, standard errors, z-values, p-values, and fit diagnostics including AIC and pseudo-R-Squared when applicable.

Usage

scf_glm(object, formula, family = binomial())

Arguments

object

A scf_mi_survey object, typically created using scf_load() and scf_design().

formula

A valid model formula, e.g., rich ~ age + factor(edcl).

family

A GLM family object such as binomial(), poisson(), or gaussian(). Defaults to binomial().

Value

An object of class "scf_glm" and "scf_model_result" with:

results: A data frame of pooled coefficients, standard errors, z-values, p-values, and significance stars.
fit: A list of fit diagnostics including mean and SD of AIC; for binomial models, pseudo-R2 and its SD.
models: A list of implicate-level svyglm model objects.
call: The matched function call.

Implementation

This function fits a GLM to each implicate in a scf_mi_survey object using survey::svyglm(). The user specifies a model formula and a valid GLM family (e.g., binomial(), poisson(), gaussian()). Coefficients and variance-covariance matrices are extracted from each implicate and pooled using Rubin's Rules.

Details

Generalized linear models (GLMs) extend linear regression to accommodate non-Gaussian outcome distributions. The choice of family determines the link function and error distribution. For example:

binomial() fits logistic regression for binary outcomes
poisson() models count data
gaussian() recovers standard OLS behavior

Model estimation is performed independently on each implicate using svyglm() with replicate weights. Rubin's Rules are used to pool coefficient estimates and variance matrices. For the pooling procedure, see scf_MIcombine().

Internal Suppression

For CRAN compliance and to prevent diagnostic overload during package checks, this function internally wraps each implicate-level model call in suppressWarnings(). This suppresses the known benign warning:

"non-integer #successes in a binomial glm!"

which arises from using replicate weights with family = binomial(). This suppression does not affect model validity or inference. Users wishing to inspect warnings can run survey::svyglm() directly on individual implicates via model$models[[i]].

For further background, see: https://stackoverflow.com/questions/12953045/warning-non-integer-successes-in-a-binomial-glm-survey-packages
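The fit-per-implicate-then-pool pattern, including the suppressed warning, can be imitated in base R. The sketch below substitutes stats::glm() with non-integer weights for survey::svyglm(), so its variances are model-based rather than replicate-weight based; it illustrates the mechanics only and is not a replacement for scf_glm(). All data are simulated.

```r
set.seed(123)
n <- 300
base <- data.frame(age = runif(n, 20, 80), wgt = runif(n, 0.5, 2.5))
base$own <- rbinom(n, 1, plogis(-2 + 0.04 * base$age))

# Simulated implicates: 60 rows carry a different "imputed" age in each one
imputed_rows <- sample(n, 60)
implicates <- lapply(1:5, function(i) {
  d <- base
  d$age[imputed_rows] <- d$age[imputed_rows] +
    rnorm(length(imputed_rows), sd = 5)
  d
})

# Fit the same model on every implicate; non-integer weights trigger
# "non-integer #successes in a binomial glm!", silenced here
fits <- lapply(implicates, function(d) {
  suppressWarnings(
    glm(own ~ age, family = binomial(), data = d, weights = wgt)
  )
})

# Pool coefficients and covariance matrices with Rubin's Rules
M     <- length(fits)
Q     <- sapply(fits, coef)                   # parameters x implicates
q_bar <- rowMeans(Q)                          # pooled coefficients
U_bar <- Reduce(`+`, lapply(fits, vcov)) / M  # mean within-implicate covariance
B     <- cov(t(Q))                            # between-implicate covariance
T_mat <- U_bar + (1 + 1 / M) * B              # total covariance
se    <- sqrt(diag(T_mat))

pooled <- data.frame(term = names(q_bar), estimate = q_bar,
                     se = se, z = q_bar / se, row.names = NULL)
pooled
```

Removing the suppressWarnings() wrapper shows the warning once per implicate, which is the "diagnostic overload" the package avoids.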

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("glm_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Run logistic regression
model <- scf_glm(scf2022, own ~ age + factor(edcl), family = binomial())
summary(model)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Extract Implicate-Level Estimates from SCF Results

Description

Returns implicate-level outputs from SCF result objects produced by functions in the scf suite. Supports result objects containing implicate-level data frames, svystat summaries, or svyglm model fits.

Usage

scf_implicates(x, long = FALSE)

Arguments

x

A result object containing implicate-level estimates (e.g., from scf_mean, scf_ols).

long

Logical. If TRUE, returns stacked data frame. If FALSE, returns list.

Value

A list of implicate-level data frames, or a single stacked data frame if long = TRUE.

Usage

This function allows users to inspect how estimates vary across the SCF’s five implicates, which is important for diagnostics, robustness checks, and transparent reporting.

For example:

scf_implicates(scf_mean(scf2022, ~income))
scf_implicates(scf_ols(scf2022, networth ~ age + income), long = TRUE)
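The difference between the list and stacked ("long") return shapes can be pictured with a small base R sketch. The helper stack_implicates(), the implicate index column, and the example values are hypothetical; the real function's column names may differ.

```r
# Made-up implicate-level results, one data frame per implicate
imps <- list(
  data.frame(term = c("owner", "renter"), estimate = c(0.66, 0.34)),
  data.frame(term = c("owner", "renter"), estimate = c(0.64, 0.36)),
  data.frame(term = c("owner", "renter"), estimate = c(0.65, 0.35))
)

# Hypothetical helper: add an implicate index and stack the rows
stack_implicates <- function(x) {
  do.call(rbind, Map(function(df, i) cbind(implicate = i, df),
                     x, seq_along(x)))
}

long <- stack_implicates(imps)
long

# The stacked form makes across-implicate summaries straightforward
tapply(long$estimate, long$term, range)
```

The list form is convenient for inspecting one implicate at a time, while the stacked form suits plotting or tabulating variation across implicates.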

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("implicates_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Extract implicate-level results
out <- scf_freq(scf2022, ~own)
scf_implicates(out, long = TRUE)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Internal Import Declarations

Description

Declares functions from base packages used in nonstandard evaluation or dynamic contexts across scf package functions. Ensures all used base functions are properly registered in the NAMESPACE.

Load SCF Data as Multiply-Imputed Survey Designs

Description

Converts SCF .rds files prepared by scf_download() into scf_mi_survey objects. Each object wraps five implicates per year in replicate-weighted, multiply-imputed survey designs suitable for use with scf_ functions.

Usage

scf_load(min_year, max_year = min_year, data_directory = ".")

Arguments

min_year

Integer. First SCF year to load (1989–2022, divisible by 3).

max_year

Integer. Last SCF year to load. Defaults to min_year.

data_directory

Character. Directory containing .rds files or a full path to a single .rds file. Defaults to the current working directory ".". For examples and tests, use tempdir() to avoid leaving files behind.

Value

Invisibly returns a scf_mi_survey (or named list if multiple years). Attributes: mock (logical), year, n_households.

Implementation

Provide a year or range and either (1) a directory containing scf<year>.rds files, or (2) a full path to a single .rds file. Files must contain five implicate data frames with columns wgt and wt1b1..wt1bK (typically K=999).
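As a way to picture these file requirements, the sketch below writes a tiny stand-in file and checks it against the stated rules in base R. The helper names make_implicate() and check_scf_implicates() are hypothetical, and the on-disk layout assumed here (a plain list of five data frames) is a simplification for illustration; files produced by scf_download() may be organized differently.

```r
# Hypothetical stand-in for one implicate: main weight plus K replicate weights
make_implicate <- function(n = 4, K = 3) {
  d <- data.frame(wgt = runif(n, 100, 200))
  for (k in seq_len(K)) d[[paste0("wt1b", k)]] <- runif(n, 100, 200)
  d
}

# Hypothetical validator for the requirements stated above
check_scf_implicates <- function(x) {
  is.list(x) && !is.data.frame(x) && length(x) == 5 &&
    all(vapply(x, function(d) {
      is.data.frame(d) &&
        "wgt" %in% names(d) &&
        any(grepl("^wt1b[0-9]+$", names(d)))
    }, logical(1)))
}

td <- tempfile("check_")
dir.create(td)
path <- file.path(td, "scf2022.rds")   # follows the scf<year>.rds pattern

set.seed(99)
saveRDS(lapply(1:5, function(i) make_implicate()), path)

ok  <- check_scf_implicates(readRDS(path))   # five valid implicates
bad <- check_scf_implicates(list(data.frame(wgt = 1)))  # too few, no replicates

unlink(td, recursive = TRUE, force = TRUE)
c(valid_file = ok, invalid_object = bad)
```

A check of this kind can be useful before pointing scf_load() at files that were prepared by hand rather than by scf_download().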

Examples

# Using with CRAN-compliant mock data:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("load_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Estimate Logistic Regression Model using SCF Microdata

Description

Fits a replicate-weighted logistic regression model to multiply-imputed SCF data, returning pooled coefficients or odds ratios with model diagnostics. Use this function to model a binary variable as a function of predictors.

Usage

scf_logit(object, formula, odds = TRUE, ...)

Arguments

object

A scf_mi_survey object created with scf_load() and scf_design().

formula

A model formula specifying a binary outcome and predictors, e.g., rich ~ age + factor(edcl).

odds

Logical. If TRUE (default), exponentiates coefficient estimates to produce odds ratios for interpretability.

...

Additional arguments passed to scf_glm().

Value

An object of class "scf_logit" and "scf_model_result" with:

results: A data frame of pooled estimates (log-odds or odds ratios), standard errors, and test statistics.
fit: Model diagnostics including AIC and pseudo-R-squared (for binomial family).
models: List of implicate-level svyglm model objects.
call: The matched function call.

Details

This function internally calls scf_glm() with family = binomial() and optionally exponentiates pooled log-odds to odds ratios.

Logistic regression models the probability of a binary outcome using the logit link.

Coefficients reflect the change in log-odds associated with a one-unit change in the predictor.

When odds = TRUE, the coefficient estimates and standard errors are transformed from log-odds to odds ratios and approximate SEs.
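The text does not say which approximation is used for the odds-ratio standard errors. The sketch below assumes a first-order (delta-method) approximation, which is one common choice; the numbers are made up for illustration.

```r
# Pooled log-odds estimate and its standard error (made-up numbers).
b  <- 0.40
se <- 0.10

# Odds ratio and a delta-method standard error: SE(exp(b)) is roughly exp(b) * SE(b).
# This approximation is an assumption, not a statement of the package's method.
or    <- exp(b)
se_or <- or * se

# A confidence interval is usually formed on the log-odds scale, then exponentiated.
ci_or <- exp(b + c(-1, 1) * qnorm(0.975) * se)
```

Because the interval is built on the log-odds scale, it is not symmetric around the odds ratio.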

Warning

When modeling binary outcomes using survey-weighted logistic regression, users may encounter the warning:

"non-integer #successes in a binomial glm!"

This message is benign. It results from replicate-weighted survey designs where the implied number of "successes" is non-integer. The model is estimated correctly. Coefficients are valid and consistent with maximum likelihood.

For background, see: https://stackoverflow.com/questions/12953045/warning-non-integer-successes-in-a-binomial-glm-survey-packages
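The warning can be reproduced outside the package with stats::glm() and non-integer weights. The sketch below uses simulated data and plain glm() rather than a survey design, so it only illustrates where the message comes from. It also shows that quasibinomial() gives the same coefficients without the warning.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
w <- runif(n, 0.5, 1.5)          # non-integer weights, as with survey weights

# Collect any warnings raised while fitting the weighted binomial model.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial(), weights = w),
  warning = function(cond) {
    msgs <<- c(msgs, conditionMessage(cond))
    invokeRestart("muffleWarning")
  }
)

# quasibinomial() uses the same estimating equations but skips the integer check.
fit_q <- glm(y ~ x, family = quasibinomial(), weights = w)
```

Comparing coef(fit) with coef(fit_q) confirms that the warning does not change the point estimates.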

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("logit_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Run logistic regression
model <- scf_logit(scf2022, own ~ age)
summary(model)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Estimate Mean in Multiply-Imputed SCF Data

Description

Returns the population-level estimate of a continuous variable's weighted mean across the Survey's five implicates. Use this operation to derive an estimate of a population's 'typical' or 'average' score on a continuous variable.

Usage

scf_mean(scf, var, by = NULL, verbose = FALSE)

Arguments

scf

A scf_mi_survey object created with scf_load(). Must contain five replicate-weighted implicates.

var

A one-sided formula identifying the continuous variable to summarize (e.g., ~networth).

by

Optional one-sided formula specifying a discrete grouping variable for stratified means.

verbose

Logical. If TRUE, include implicate-level results in print output. Default is FALSE.

Value

A list of class "scf_mean" with:

results: Pooled estimates with standard errors and range across implicates. One row per group, or one row total.
imps: A named list of implicate-level estimates.
aux: Variable and group metadata.

Details

The mean is a measure of central tendency that represents the arithmetic average of a distribution. It is most appropriate when the distribution is symmetric and not heavily skewed. Unlike the median, the mean is sensitive to extreme values, which may distort interpretation in the presence of outliers. Use this function to describe the “typical” value of a continuous variable in the population or within subgroups.
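The sensitivity of the mean to extreme values can be seen in a small base-R example. The data are made up, and weighted_median() is a hypothetical helper based on cumulative weight shares; it is not the estimator used by survey::svyquantile().

```r
# Hypothetical helper: weighted median via the cumulative weight share.
weighted_median <- function(x, w) {
  o <- order(x)
  x <- x[o]
  w <- w[o]
  x[which(cumsum(w) / sum(w) >= 0.5)[1]]
}

networth <- c(20, 50, 80, 120, 150)   # made-up values, in $1,000s
w        <- c(1, 2, 2, 2, 1)          # made-up survey weights

m1   <- weighted.mean(networth, w)
med1 <- weighted_median(networth, w)

# Add one very wealthy household and recompute.
networth2 <- c(networth, 50000)
w2        <- c(w, 1)
m2   <- weighted.mean(networth2, w2)
med2 <- weighted_median(networth2, w2)
```

Here the single extreme household moves the weighted mean by orders of magnitude while the weighted median stays where it was.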

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("mean_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Estimate means
scf_mean(scf2022, ~networth)
scf_mean(scf2022, ~networth, by = ~edcl)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Estimate the Population Median of a Continuous SCF Variable

Description

Estimates the median (50th percentile) of a continuous SCF variable. Use this operation to characterize a typical value. In contrast to the mean returned by scf_mean(), the median is largely insensitive to outliers.

Usage

scf_median(scf, var, by = NULL, verbose = FALSE)

Arguments

scf

A scf_mi_survey object created by scf_load(). Must contain five implicates.

var

A one-sided formula specifying the continuous variable of interest (e.g., ~networth).

by

Optional one-sided formula for a categorical grouping variable.

verbose

Logical; if TRUE, show implicate-level results.

Value

A list of class "scf_median" with:

results: A data frame with pooled medians, standard errors, and range across implicates.
imps: A list of implicate-level results.
aux: Variable and grouping metadata.

Implementation

This function wraps scf_percentile() with q = 0.5. The user supplies a scf_mi_survey object and a one-sided formula for the variable of interest, with an optional grouping formula. Output includes pooled medians, standard errors, min/max across implicates, and implicate-level values. Point estimates are the mean of the five implicate medians. Standard errors are computed using the Survey of Consumer Finances convention described below, not Rubin’s Rules.

Statistical Notes

Median estimates follow the Federal Reserve Board’s SCF variance convention. For each implicate, the median is computed with replicate weights via survey::svyquantile(). The pooled estimate is the average of the five implicate medians. The pooled variance is V_total = V1 + ((m + 1) / m) * B, where V1 is the replicate-weight sampling variance from the first implicate and B is the between-implicate variance of the five implicate medians, with m = 5 implicates. The reported standard error is sqrt(V_total). This matches the Federal Reserve Board's published SAS macro for SCF descriptive statistics and is not Rubin’s Rules.

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("median_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Estimate medians
scf_median(scf2022, ~networth)
scf_median(scf2022, ~networth, by = ~edcl)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Estimate an Ordinary Least Squares Regression on SCF Microdata

Description

Computes an OLS regression on SCF data using svyglm() across the SCF's five implicates. Returns coefficient estimates, standard errors, test statistics, and model diagnostics.

Usage

scf_ols(object, formula)

Arguments

object

A scf_mi_survey object created with scf_load() and scf_design(). Must contain five implicates with replicate weights.

formula

A model formula specifying a continuous outcome and predictor variables (e.g., networth ~ income + age).

Details

Fits a replicate-weighted linear regression model to each implicate of multiply-imputed SCF data and pools coefficients and standard errors using Rubin’s Rules.

Value

An object of class "scf_ols" and "scf_model_result" with:

results: A data frame of pooled coefficients, standard errors, t-values, p-values, and significance stars.
fit: A list of model diagnostics including mean AIC, standard deviation of AIC, mean R-squared, and its standard deviation.
imps: A list of implicate-level svyglm model objects.
call: The matched call used to produce the model.

Implementation

Ordinary least squares (OLS) regression estimates the linear relationship between a continuous outcome and one or more predictor variables. Each coefficient represents the expected change in the outcome for a one-unit increase in the corresponding predictor, holding all other predictors constant.

Use this function to model associations between SCF variables while accounting for complex survey design and multiple imputation.

This function takes a scf_mi_survey object and a model formula. Internally, it fits a weighted linear regression to each implicate using survey::svyglm(), extracts coefficients and variance-covariance matrices, and pools them via scf_MIcombine().
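For a single coefficient, the pooling step can be sketched with the standard form of Rubin's Rules. The helper rubin_pool() is hypothetical and the inputs are made up. scf_MIcombine() operates on full coefficient vectors and variance-covariance matrices and may apply further adjustments (such as degrees of freedom), which this sketch omits.

```r
# Hypothetical helper: pool one coefficient across m implicates with Rubin's Rules.
rubin_pool <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # average within-implicate variance
  b    <- var(est)                  # between-implicate variance
  tvar <- ubar + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(tvar))
}

# Made-up slope estimates and standard errors from five implicate-level fits.
est <- c(2.10, 2.05, 2.20, 2.15, 2.00)
se  <- c(0.30, 0.31, 0.29, 0.30, 0.30)
pooled <- rubin_pool(est, se)
```

The pooled standard error is never smaller than the average within-implicate standard error, because the between-implicate term adds imputation uncertainty.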

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("ols_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Run OLS model
model <- scf_ols(scf2022, networth ~ income + age)
summary(model)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Estimate Percentiles in SCF Microdata

Description

This function estimates a weighted percentile of a continuous variable in the Survey of Consumer Finances (SCF). It reproduces the procedure used in the Federal Reserve Board's published SCF Bulletin SAS macro for distributional statistics (Federal Reserve Board 2023c). This convention is specific to SCF descriptive distributional statistics (quantiles, proportions) and differs from standard handling (i.e., using Rubin's Rules).

Usage

scf_percentile(scf, var, q = 0.5, by = NULL, verbose = FALSE)

Arguments

scf

A scf_mi_survey object created with scf_load(). Must contain the list of replicate-weighted designs for each implicate in scf$mi_design.

var

A one-sided formula naming the continuous variable to summarize (for example ~networth).

q

Numeric percentile between 0 and 1. Default 0.5 (median).

by

Optional one-sided formula naming a categorical grouping variable. If supplied, the percentile is estimated separately within each group.

verbose

Logical. If TRUE, include implicate-level estimates in the returned object for inspection. Default FALSE.

Details

The estimates are computed as follows:

For each implicate, estimate the requested percentile using survey::svyquantile() with se = TRUE.
The reported point estimate is the mean of the M implicate-specific percentile estimates.
The standard error follows the SCF Bulletin SAS macro convention:
```
V_total = V1 + ((M + 1) / M) * B
```
where:
- V1 is the replicate-weight sampling variance of the percentile from the first implicate only.
- B is the between-implicate variance of the percentile estimates.
The reported standard error is sqrt(V_total).
If a grouping variable is supplied, the same logic is applied separately within each group.
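The pooling arithmetic in the steps above can be sketched in a few lines of base R. The helper scf_pool_quantile() is hypothetical and the inputs are made up. The sketch assumes B is the sample variance of the implicate estimates (denominator M - 1), which the text does not state explicitly.

```r
# Hypothetical helper implementing V_total = V1 + ((M + 1) / M) * B.
scf_pool_quantile <- function(est, se) {
  M  <- length(est)
  V1 <- se[1]^2                     # sampling variance from the first implicate only
  B  <- var(est)                    # between-implicate variance (assumed M - 1 denominator)
  c(estimate = mean(est), se = sqrt(V1 + ((M + 1) / M) * B))
}

# Made-up implicate-level medians and replicate-weight standard errors.
est <- c(190, 192, 188, 191, 189)
se  <- c(6, 7, 5, 6, 6)
pooled <- scf_pool_quantile(est, se)
```

With these inputs V1 = 36 and B = 2.5, so V_total = 36 + 1.2 * 2.5 = 39 and the pooled standard error is sqrt(39). Only the first implicate's sampling variance enters the formula; the other four standard errors are not used.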

Value

An object of class "scf_percentile" containing:

results: A data frame containing pooled percentile estimates, pooled standard errors, and implicate min/max values. One row per group (if by is supplied) or one row otherwise.
imps: A list of implicate-level percentile estimates and standard errors.
aux: A list containing the variable name, optional group variable name, and the quantile requested.
verbose: Logical flag indicating whether implicate-level estimates should be printed by print() or summary().

References

Federal Reserve Board. 2023c. "SAS Macro: Variable Definitions." https://www.federalreserve.gov/econres/files/bulletin.macro.txt

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()` for actual SCF data
td <- tempfile("percentile_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Estimate the 75th percentile of net worth
scf_percentile(scf2022, ~networth, q = 0.75)
# Estimate the median net worth by ownership group
scf_percentile(scf2022, ~networth, q = 0.5, by = ~own)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)
rm(scf2022)

Stacked Bar Chart of Two Discrete Variables in SCF Data

Description

Visualizes a discrete-discrete bivariate distribution using stacked bars based on pooled cross-tabulations from scf_xtab(). Use this function to visualize the relationship between two discrete variables.

Usage

scf_plot_bbar(
  design,
  rowvar,
  colvar,
  scale = c("percent", "count"),
  percent_by = c("total", "row", "col"),
  title = NULL,
  xlab = NULL,
  ylab = NULL,
  fill_colors = NULL,
  row_labels = NULL,
  col_labels = NULL
)

Arguments

design

A scf_mi_survey object created by scf_load(). Must contain five implicates with replicate weights.

rowvar

A one-sided formula for the x-axis grouping variable (e.g., ~edcl).

colvar

A one-sided formula for the stacked fill variable (e.g., ~racecl).

scale

Character. One of "percent" (default) or "count".

percent_by

Character. One of "total" (default), "row", or "col". Determines the normalization base when scale = "percent".

title

Optional character string for the plot title.

xlab

Optional character string for the x-axis label.

ylab

Optional character string for the y-axis label.

fill_colors

Optional vector of fill colors to pass to ggplot2::scale_fill_manual().

row_labels

Optional named vector to relabel row categories (x-axis).

col_labels

Optional named vector to relabel col categories (legend).

Value

A ggplot2 object.

Implementation

This function calls scf_xtab() to estimate the joint distribution of two categorical variables across multiply-imputed SCF data. The result is translated into a ggplot2 stacked bar chart using pooled counts or normalized percentages.
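The three normalization bases can be illustrated with base R's prop.table() on a small made-up table of pooled counts. The mapping of percent_by values to table margins shown in the comments is an assumption based on the argument names, not taken from the package source.

```r
# Made-up pooled counts: rows are one variable, columns the other.
tab <- matrix(c(30, 10,
                20, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(own = c("0", "1"), edcl = c("low", "high")))

pct_total <- 100 * prop.table(tab)       # "total": all cells sum to 100
pct_row   <- 100 * prop.table(tab, 1)    # "row": each row sums to 100
pct_col   <- 100 * prop.table(tab, 2)    # "col": each column sums to 100
```

The same counts give different bar heights under each base, so the choice should match the comparison the chart is meant to support.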

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_bbar_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Stacked bar chart: education by ownership
scf_plot_bbar(scf2022, ~own, ~edcl)
# Example for real analysis: Column percentages instead of total percent
scf_plot_bbar(scf2022, ~own, ~edcl, percent_by = "col")
# Example for real analysis: Raw counts (estimated number of households)
scf_plot_bbar(scf2022, ~own, ~edcl, scale = "count")
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Bar Plot of Summary Statistics by Grouping Variable in SCF Data

Description

Computes and plots a grouped summary statistic (either a mean, median, or quantile) for a continuous variable across a discrete factor. Estimates are pooled across implicates using scf_mean(), scf_median(), or scf_percentile(). Use this function to visualize the bivariate relationship between a discrete and a continuous variable.

Usage

scf_plot_cbar(
  design,
  yvar,
  xvar,
  stat = "mean",
  title = NULL,
  xlab = NULL,
  ylab = NULL,
  fill = "#0072B2",
  angle = 30,
  label_map = NULL
)

Arguments

design

A scf_mi_survey object from scf_load().

yvar

One-sided formula for the continuous variable (e.g., ~networth).

xvar

One-sided formula for the grouping variable (e.g., ~racecl).

stat

"mean" (default), "median", or a quantile (numeric between 0 and 1).

title

Plot title (optional).

xlab

X-axis label (optional).

ylab

Y-axis label (optional).

fill

Bar fill color. Default is "#0072B2".

angle

Angle of x-axis labels. Default is 30.

label_map

Optional named vector to relabel x-axis category labels.

Value

A ggplot2 object.

Implementation

The user specifies a continuous outcome (yvar) and a discrete grouping variable (xvar) via one-sided formulas. Group means are plotted by default. Medians or other percentiles can be specified via the stat argument.

Results are plotted using ggplot2::geom_col(), styled with scf_theme(), and optionally customized with additional arguments (e.g., axis labels, color, angles).

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_cbar_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Plot mean net worth by education level
scf_plot_cbar(scf2022, ~networth, ~edcl, stat = "mean")
# Example for real analysis: Visualize 90th percentile of income by education
scf_plot_cbar(scf2022, ~income, ~edcl, stat = 0.9, fill = "#D55E00")
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Plot Bar Chart of a Discrete Variable from SCF Data

Description

Creates a bar chart that visualizes the distribution of a discrete variable.

Usage

scf_plot_dbar(
  design,
  variable,
  title = NULL,
  xlab = NULL,
  ylab = "Percent",
  angle = 30,
  fill = "#0072B2",
  label_map = NULL
)

Arguments

design

A scf_mi_survey object created by scf_load(). Must contain valid implicates.

variable

A one-sided formula specifying a categorical variable (e.g., ~racecl).

title

Optional character string for the plot title. Default: "Distribution of <variable>".

xlab

Optional x-axis label. Default: variable name.

ylab

Optional y-axis label. Default: "Percent".

angle

Integer. Rotation angle for x-axis labels. Default is 30.

fill

Fill color for bars. Default is "#0072B2".

label_map

Optional named vector to relabel x-axis category labels.

Value

A ggplot2 object representing the pooled bar chart.

Implementation

This function internally calls scf_freq() to compute population proportion estimates, which are then plotted using ggplot2::geom_col(). The default output is scaled to percent and can be customized via title, axis labels, angle, and color.

Details

Produces a bar chart of category proportions from a one-way tabulation, pooled across SCF implicates using scf_freq(). This function summarizes weighted sample composition and communicates categorical distributions effectively in descriptive analysis.

Dependencies

Requires the ggplot2 package.

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_dbar_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Bar chart of education categories
scf_plot_dbar(scf2022, ~edcl)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Plot a Univariate Distribution of an SCF Variable

Description

This function provides a unified plotting interface for visualizing the distribution of a single variable from multiply-imputed SCF data. Discrete variables produce bar charts of pooled proportions; continuous variables produce binned histograms. Use this function to visualize the univariate distribution of an SCF variable.

Usage

scf_plot_dist(
  design,
  variable,
  bins = 30,
  title = NULL,
  xlab = NULL,
  ylab = "Percent",
  angle = 30,
  fill = "#0072B2",
  labels = NULL
)

Arguments

design

A scf_mi_survey object created by scf_load().

variable

A one-sided formula specifying the variable to plot.

bins

Number of bins for continuous variables. Default is 30.

title

Optional plot title.

xlab

Optional x-axis label.

ylab

Optional y-axis label. Default is "Percent".

angle

Angle for x-axis tick labels. Default is 30.

fill

Fill color for bars. Default is "#0072B2".

labels

Optional named vector of custom axis labels (for discrete variables only).

Value

A ggplot2 object.

Implementation

For discrete variables (factor or numeric with <= 25 unique values), the function uses scf_freq() to calculate category proportions and produces a bar chart. For continuous variables, it bins values across implicates and estimates Rubin-pooled frequencies for each bin.
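The discrete-versus-continuous rule stated above can be written as a short predicate. The helper is_discrete() is hypothetical and only mirrors the rule as described; the package's internal check may differ in detail.

```r
# Hypothetical helper mirroring the stated rule:
# factors, or numeric vectors with at most 25 unique values, count as discrete.
is_discrete <- function(x, max_levels = 25) {
  is.factor(x) || (is.numeric(x) && length(unique(x)) <= max_levels)
}

own <- c(0, 1, 1, 0, 1)           # few unique values: treated as discrete
age <- seq(18, 95, by = 0.5)      # many unique values: treated as continuous

kinds <- c(own = is_discrete(own), age = is_discrete(age))
```

Under this rule a coarsely coded numeric variable is plotted as a bar chart, and converting it to a factor gives the same result.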

Users may supply a named vector of custom axis labels using the labels argument.

Examples

# Mock workflow for CRAN (demo only, not real SCF data)
td <- tempfile("plot_dist_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
scf_plot_dist(scf2022, ~own)
scf_plot_dist(scf2022, ~age, bins = 10)
unlink(td, recursive = TRUE, force = TRUE)

Hexbin Plot of Two Continuous SCF Variables

Description

Visualizes the bivariate relationship between two continuous SCF variables using hexagonal bins.

Usage

scf_plot_hex(design, x, y, bins = 50, title = NULL, xlab = NULL, ylab = NULL)

Arguments

design

A scf_mi_survey object created by scf_load().

x

A one-sided formula for the x-axis variable (e.g., ~income).

y

A one-sided formula for the y-axis variable (e.g., ~networth).

bins

Integer. Number of hexagonal bins along the x-axis. Default is 50.

title

Optional character string for the plot title.

xlab

Optional x-axis label. Defaults to the variable name.

ylab

Optional y-axis label. Defaults to the variable name.

Value

A ggplot2 object displaying a Rubin-pooled hexbin plot.

Implementation

The function stacks all implicates into one data frame, retains replicate weights, and uses ggplot2::geom_hex() to produce a density-style scatterplot. The color intensity of each hexagon reflects the Rubin-pooled weighted count of households in that cell. Missing values are excluded.

This plot is especially useful for visualizing joint distributions with large samples and skewed marginals, such as net worth vs. income.

Aesthetic Guidance

This plot uses a log-scale fill and viridis palette to highlight variation in density. To adjust the visual style globally, use scf_theme() or set it explicitly with ggplot2::theme_set(scf_theme()). For mobile-friendly or publication-ready appearance, export the plot at 5.5 x 5.5 inches, 300 dpi.

Dependencies

Requires the ggplot2 package. The fill scale uses scale_fill_viridis_c() from ggplot2. Requires the hexbin package. The function will stop with an error if it is not installed.

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_hex_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Plot hexbin of income vs. net worth
scf_plot_hex(scf2022, ~income, ~networth)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Histogram of a Continuous Variable in Multiply-Imputed SCF Data

Description

Produces a histogram of a continuous SCF variable by binning across implicates, pooling weighted bin counts using scf_freq(), and plotting the result. Values outside xlim are clamped to the nearest endpoint to ensure all observations are included and replicate-weighted bins remain stable.

Usage

scf_plot_hist(
  design,
  variable,
  bins = 30,
  xlim = NULL,
  title = NULL,
  xlab = NULL,
  ylab = "Weighted Count",
  fill = "#0072B2"
)

Arguments

design

A scf_mi_survey object from scf_load().

variable

A one-sided formula indicating the numeric variable to plot.

bins

Number of bins (default: 30).

xlim

Optional numeric range. Values outside will be included in edge bins.

title

Optional plot title.

xlab

Optional x-axis label. Defaults to the variable name.

ylab

Optional y-axis label. Defaults to "Weighted Count".

fill

Fill color for bars (default: "#0072B2").

Value

A ggplot2 object representing the Rubin-pooled histogram.

Implementation

This function bins a continuous variable (after clamping to xlim if supplied), applies the same cut() breaks across implicates using scf_update_by_implicate(), and computes Rubin-pooled frequencies with scf_freq(). Results are filtered to remove bins with undefined proportions and then plotted using ggplot2::geom_col().

The logic here is specific to operations where the bin assignment must be computed within each implicate, not after pooling. This approach ensures consistent binning and stable pooled estimation in the presence of multiply-imputed microdata.
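The clamp-then-bin logic can be sketched in base R on two made-up implicates. This is only an illustration of shared cut() breaks applied within each implicate. It uses plain weighted sums and a simple average across implicates instead of scf_update_by_implicate() and scf_freq(), and the column names are assumptions.

```r
# Two made-up implicates with a value column and a weight column.
imps <- list(
  data.frame(age = c(25, 40, 70, 95), wgt = c(1, 2, 1, 1)),
  data.frame(age = c(30, 40, 65, 90), wgt = c(1, 2, 1, 1))
)

xlim   <- c(20, 80)
breaks <- seq(xlim[1], xlim[2], length.out = 4)   # 3 bins shared by all implicates

bin_counts <- function(d) {
  x   <- pmin(pmax(d$age, xlim[1]), xlim[2])      # clamp values into xlim
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Weighted count per bin; empty bins are kept and sum to zero.
  vapply(split(d$wgt, bin), sum, numeric(1))
}

per_imp <- vapply(imps, bin_counts, numeric(3))   # matrix: bins x implicates
pooled  <- rowMeans(per_imp)                      # point estimate: mean across implicates
```

Because the breaks are fixed before looping over implicates, every implicate contributes to the same set of bins, and values above xlim (95 and 90 here) land in the last bin instead of being dropped.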

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_hist_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Plot histogram of age
scf_plot_hist(scf2022, ~age, bins = 10)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Smoothed Distribution Plot of a Continuous Variable in SCF Data

Description

Draws a smoothed distribution plot of a continuous variable in the SCF. Use this function to visualize a single continuous variable's distribution.

Usage

scf_plot_smooth(
  design,
  variable,
  binwidth = NULL,
  xlim = NULL,
  method = "loess",
  span = 0.2,
  color = "blue",
  xlab = NULL,
  ylab = "Percent of Households",
  title = NULL
)

Arguments

design

A scf_mi_survey object created by scf_load().

variable

A one-sided formula specifying a continuous variable (e.g., ~networth).

binwidth

Optional bin width. Default uses Freedman–Diaconis rule.

xlim

Optional numeric vector of length 2 to truncate axis.

method

Character. Smoothing method: "loess" (default) or "lm".

span

Numeric LOESS span. Default is 0.2. Ignored if method = "lm".

color

Line color. Default is "blue".

xlab

Optional label for x-axis. Defaults to the variable name.

ylab

Optional label for y-axis. Defaults to "Percent of Households".

title

Optional plot title.

Value

A ggplot2 object.

Implementation

Visualizes the weighted distribution of a continuous SCF variable by stacking implicates, binning observations, and smoothing pooled proportions. This function is useful for examining distribution shape, skew, or modality in variables like income or wealth.

All implicates are stacked and weighted, binned across a data-driven or user-specified bin width. Each bin's weight share is calculated, and a smoothing curve is fit to the resulting pseudo-density.
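The bin, share, and smooth steps can be sketched in base R on simulated data. The Freedman-Diaconis width is computed here from the unweighted values, and the share is taken over a single stacked vector; both are simplifying assumptions, and the package's handling of weights and implicates may differ.

```r
set.seed(42)
x <- rlnorm(500, meanlog = 11, sdlog = 1)   # made-up, right-skewed values
w <- runif(500, 0.5, 1.5)                   # made-up weights

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3) (unweighted, for simplicity).
bw     <- 2 * IQR(x) / length(x)^(1 / 3)
breaks <- seq(min(x), max(x) + bw, by = bw)
bin    <- cut(x, breaks = breaks, include.lowest = TRUE)

# Weight share per bin, in percent, located at the bin midpoints.
share <- 100 * vapply(split(w, bin), sum, numeric(1)) / sum(w)
mid   <- head(breaks, -1) + bw / 2

# Smooth the binned shares with LOESS (stats::loess).
fit    <- loess(share ~ mid, span = 0.2)
smooth <- predict(fit)
```

A smaller span follows the binned shares more closely, while a larger span gives a smoother curve at the cost of flattening narrow peaks.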

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("plot_smooth_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Example for real analysis: Plot smoothed distribution
scf_plot_smooth(scf2022, ~networth, xlim = c(0, 2e6),
                method = "loess", span = 0.25)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Test a Proportion in SCF Data

Description

Tests a binary variable's proportion against a null hypothesis (one-sample), or compares proportions across two groups (two-sample). Supports two-sided, less-than, or greater-than alternatives.

Usage

scf_prop_test(
  design,
  var,
  group = NULL,
  p = 0.5,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Arguments

design

A scf_mi_survey object created by scf_load(). Must contain replicate-weighted implicates.

var

A one-sided formula indicating a binary variable (e.g., ~rich).

group

Optional one-sided formula indicating a binary grouping variable (e.g., ~female). If omitted, a one-sample test is performed.

p

Null hypothesis value. Defaults to 0.5 for one-sample, 0 for two-sample tests.

alternative

Character. One of "two.sided" (default), "less", or "greater".

conf.level

Confidence level for the confidence interval. Default is 0.95.

Value

An object of class "scf_prop_test" with:

results: A data frame with the pooled estimate, standard error, z-statistic, p-value, confidence interval, and significance stars.
proportions: (Only in two-sample tests) A data frame of pooled proportions by group.
fit: A list describing the method, null value, alternative hypothesis, and confidence level.

Statistical Notes

Proportions are computed in each implicate using weighted means, and variances are approximated under the binomial model. Rubin’s Rules are applied to pool point estimates and standard errors. For pooling details, see scf_MIcombine().
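Once a pooled proportion and standard error are available, the one-sample test reduces to a z-statistic. The sketch below starts from made-up pooled values and assumes a normal reference distribution, which matches the z-statistic reported in the results but is otherwise an assumption about the package's internals.

```r
est    <- 0.12      # made-up pooled proportion
se_tot <- 0.0105    # made-up pooled standard error
p0     <- 0.10      # null hypothesis value

z <- (est - p0) / se_tot

# p-values for the three supported alternatives.
p_value <- c(
  two.sided = 2 * pnorm(-abs(z)),
  less      = pnorm(z),
  greater   = pnorm(z, lower.tail = FALSE)
)

# 95% confidence interval for the proportion.
ci <- est + c(-1, 1) * qnorm(0.975) * se_tot
```

For a two-sample test the same arithmetic applies to the difference in proportions, with a null value of 0.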

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("proptest_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Wrangle data for example
scf2022 <- scf_update(scf2022,
  rich   = networth > 1e6,
  female = factor(hhsex, levels = 1:2, labels = c("Male","Female")),
  over50 = age > 50
)
# Example for real analysis: One-sample test
scf_prop_test(scf2022, ~rich, p = 0.10)
# Example for real analysis: Two-sample test
scf_prop_test(scf2022, ~rich, ~female, alternative = "less")
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Format and Display Regression Results from Multiply-Imputed SCF Models

Description

This function formats and aligns coefficient estimates, standard errors, and significance stars from one or more SCF regression model objects (e.g., from scf_ols(), scf_logit(), or scf_glm()).

Usage

scf_regtable(
  ...,
  model.names = NULL,
  digits = 0,
  auto_digits = FALSE,
  labels = NULL,
  output = c("console", "markdown", "csv"),
  file = NULL
)

Arguments

...

One or more SCF regression model objects, or a single list of such models.

model.names

Optional character vector naming the models. Defaults to "Model 1", "Model 2", etc.

digits

Integer specifying decimal places for numeric formatting when auto_digits = FALSE. Default is 0.

auto_digits

Logical; if TRUE, uses adaptive decimal places: 0 digits for large numbers (>= 1000), 2 digits for moderate (>= 1), and 3 digits for smaller values.

labels

Optional named character vector or labeling function to replace term names with descriptive labels.

output

Output format: one of "console" (print to console), "markdown" (print Markdown table for R Markdown), or "csv" (write CSV file).

file

File path for CSV output; required if output = "csv".

Details

The function compiles a side-by-side table with terms matched across models, appends model fit statistics (sample size N, R-squared or pseudo-R-squared, and AIC), and outputs the results as console text, Markdown for R Markdown documents, or a CSV file.

The function aligns all unique coefficient terms across provided models, formats coefficients with significance stars and standard errors, appends model fit statistics as additional rows, and renders output in the specified format. It avoids external dependencies by using base R formatting and simple text or Markdown output.
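The adaptive rounding rule described for auto_digits can be sketched with base R formatting. The helper fmt_auto() is hypothetical. Applying the thresholds to absolute values is an assumption, since the text does not say how negative coefficients are treated.

```r
# Hypothetical helper: 0 digits for |x| >= 1000, 2 digits for |x| >= 1, else 3 digits.
fmt_auto <- function(x) {
  digits <- ifelse(abs(x) >= 1000, 0, ifelse(abs(x) >= 1, 2, 3))
  mapply(function(v, d) formatC(v, format = "f", digits = d), x, digits)
}

out <- fmt_auto(c(152340.7, 12.3456, 0.04567))
# out is c("152341", "12.35", "0.046")
```

This keeps dollar-scale coefficients free of spurious decimals while preserving precision for small effects such as proportions.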

Value

Invisibly returns a data frame with formatted regression results and fit statistics.

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("regtable_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)
# Wrangle data for example: Perform OLS regression
m1 <- scf_ols(scf2022, income ~ age)
# Example for real analysis: Print regression results as a console table
scf_regtable(m1, digits = 2)
# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Subset an scf_mi_survey Object

Description

Subsetting refers to the process of retaining only those observations that satisfy a logical (TRUE/FALSE) condition. This function applies such a filter independently to each implicate in an scf_mi_survey object created by scf_design() via scf_load(). The result is a new multiply-imputed, replicate-weighted survey object with appropriately restricted designs.

Usage

scf_subset(scf, expr)

Arguments

scf

A scf_mi_survey object, typically created by scf_load().

expr

A logical expression used to filter rows, evaluated separately in each implicate's variable frame (e.g., age < 65 & own == 1).

Value

A new scf_mi_survey object (see scf_design()).

Implementation

Use scf_subset() to focus analysis on analytically meaningful sub-populations. For example, to analyze only households headed by seniors:

scf2022_seniors <- scf_subset(scf2022, age >= 65)

This is especially useful when analyzing populations such as renters, homeowners, specific age brackets, or any group defined by logical expressions over SCF variables.

Details

Filtering is conducted separately in each implicate. This preserves valid design structure but means that the same household may fall into or out of the subset depending on imputed values. For example, a household with five different age imputations (say, 64, 66, 63, 65, and 67) would be classified as a senior in only three of five implicates if subsetting on age >= 65.

Empty subsets in any implicate can cause downstream analysis to fail. Always check subgroup sizes after subsetting.
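The per-implicate behavior described above can be reproduced with a small base R sketch. It uses toy data frames in place of real survey designs, so it shows only the filtering logic and the size check, not how scf_subset() is implemented. The household identifiers and the ages of households B and C are made up for illustration.

```r
# Toy data: five implicates, each holding the same three households.
# Household "A" has imputed ages 64, 66, 63, 65, 67 (the example from the text).
ages_A <- c(64, 66, 63, 65, 67)
implicates <- lapply(ages_A, function(a) {
  data.frame(id = c("A", "B", "C"), age = c(a, 70, 40))
})

# Apply the same logical condition separately within each implicate
seniors <- lapply(implicates, function(df) df[df$age >= 65, , drop = FALSE])

# Count the implicates in which household A is retained
in_subset <- vapply(seniors, function(df) "A" %in% df$id, logical(1))
n_kept <- sum(in_subset)

# Check subgroup sizes per implicate before any downstream analysis
sizes <- vapply(seniors, nrow, integer(1))
print(sizes)
if (any(sizes == 0)) warning("At least one implicate has an empty subset")
```

With a real object, a comparable check might inspect nrow() of each element of mi_design (for instance its variables data frame), assuming the subset object keeps the same layout as the original.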

Examples

# Mock workflow for CRAN (demo only; not real SCF data)
td <- tempfile("subset_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Filter for working-age households with positive net worth
scf_sub <- scf_subset(scf2022, age < 65 & networth > 0)
scf_mean(scf_sub, ~income)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Default Plot Theme for SCF Visualizations

Description

The theme is designed to:

Render cleanly in print (single-column or wrapped layout)
Scale well on HD desktop monitors without visual clutter
Remain legible on mobile with clear fonts and sufficient contrast

The default figure dimensions assumed for export are 5.5 inches by 5.5 inches at 300 dpi, which balances compactness with accessibility across media.
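As a rough illustration of exporting at those dimensions, the sketch below saves a plot at 5.5 by 5.5 inches and 300 dpi with ggplot2::ggsave(). It uses theme_minimal() with the documented defaults (base_size = 13, base_family = "sans") as a stand-in for scf_theme(), and the output file name is arbitrary.

```r
library(ggplot2)

# Stand-in for scf_theme(): theme_minimal() with the documented defaults
p <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar(fill = "#0072B2") +
  theme_minimal(base_size = 13, base_family = "sans")

# Export at the dimensions and resolution assumed by the theme
out_file <- file.path(tempdir(), "scf_theme_demo.png")
ggsave(out_file, plot = p, width = 5.5, height = 5.5, units = "in", dpi = 300)
```

At 300 dpi, a 5.5 inch side corresponds to 1650 pixels, which is why text sized for this export stays readable when the image is scaled down for mobile screens.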

All theme settings are exposed via comments to enable easy brand customization.

Usage

scf_theme(base_size = 13, base_family = "sans", grid = TRUE, axis = TRUE, ...)

Arguments

base_size

Base font size. Defaults to 13.

base_family

Font family. Defaults to "sans".

grid

Logical. Show gridlines? Defaults to TRUE.

axis

Logical. Include axis ticks and lines? Defaults to TRUE.

...

Additional arguments passed to ggplot2::theme_minimal().

Details

Defines the SCF package's default ggplot2 theme, optimized for legibility, clarity, and aesthetic coherence across print, desktop, and mobile platforms.

Value

A ggplot2 theme object applied by all scf_plot_*() functions.

Examples

library(ggplot2)
ggplot(mtcars, aes(factor(cyl))) +
  geom_bar(fill = "#0072B2") +
  scf_theme()

T-Test of Means using SCF Microdata

Description

Tests whether the mean of a continuous variable differs from a specified value (one-sample), or whether group means differ across a binary factor (two-sample). Estimates and standard errors are computed using svymean() within each implicate, then pooled using Rubin's Rules. Use this function to test hypotheses about means in the SCF microdata.
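The pooling step can be sketched in base R using the textbook form of Rubin's Rules. The helper pool_rubin() and the input numbers are hypothetical, chosen only to make the arithmetic easy to follow. The package's own pooling, including its degrees-of-freedom choice, may differ in detail.

```r
# Hypothetical helper: textbook Rubin's Rules for a scalar estimate.
# q = per-implicate point estimates, u = per-implicate sampling variances.
pool_rubin <- function(q, u) {
  m     <- length(q)
  qbar  <- mean(q)                      # pooled point estimate
  ubar  <- mean(u)                      # within-imputation variance
  b     <- var(q)                       # between-imputation variance
  total <- ubar + (1 + 1 / m) * b       # total variance
  df    <- (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2
  list(estimate = qbar, se = sqrt(total), df = df)
}

# Made-up inputs standing in for five implicate-level means and variances
q <- c(10, 12, 11, 13, 9)
u <- rep(4, 5)
pooled <- pool_rubin(q, u)

# One-sample, two-sided test of H0: mean = mu
mu      <- 10
t_stat  <- (pooled$estimate - mu) / pooled$se
p_value <- 2 * pt(-abs(t_stat), df = pooled$df)

# 95% confidence interval for the pooled mean
ci <- pooled$estimate + c(-1, 1) * qt(0.975, df = pooled$df) * pooled$se
```

Here the within-imputation variance is 4 and the between-imputation variance is 2.5, so the total variance is 4 + 1.2 × 2.5 = 7. The between-imputation term is what a single-implicate analysis would omit.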

Usage

scf_ttest(
  design,
  var,
  group = NULL,
  mu = 0,
  alternative = c("two.sided", "less", "greater"),
  conf.level = 0.95
)

Arguments

design

A scf_mi_survey object created by scf_load().

var

A one-sided formula specifying a numeric variable (e.g., ~income).

group

Optional one-sided formula specifying a binary grouping variable (e.g., ~female).

mu

Numeric. Null hypothesis value. Default is 0.

alternative

Character. One of "two.sided" (default), "less", or "greater".

conf.level

Confidence level for the confidence interval. Default is 0.95.

Value

An object of class scf_ttest with:

results: A data frame with pooled estimate, standard error, t-statistic, degrees of freedom, p-value, and confidence interval.
means: Group-specific means (for two-sample tests only).
fit: List describing the test type, null hypothesis, confidence level, and alternative.

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("ttest_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Wrangle data for example: Derive analysis vars
scf2022 <- scf_update(scf2022,
  female = factor(hhsex, levels = 1:2, labels = c("Male","Female")),
  over50 = age > 50
)

# Example for real analysis: One-sample t-test
scf_ttest(scf2022, ~income, mu = 75000)

# Example for real analysis: Two-sample t-test
scf_ttest(scf2022, ~income, group = ~female)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Create or Alter SCF Variables

Description

Use this function to create or alter SCF variables once the raw data set has been loaded into memory using the scf_load() function. This function updates an scf_mi_survey object by evaluating transformations within each implicate, and then returning a new object with the new or amended variables.

Most of the time, you can use scf_update() to define variables based on simple logical conditions, arithmetic transformations, or categorical binning. These rules are evaluated separately in each implicate, using the same formula. However, if the transformation you want to apply depends on the distribution of the data within each implicate, such as computing percentiles or ranking households, this function will not suffice. In those cases, use scf_update_by_implicate() to write a custom function that operates on each implicate individually.

Usage

scf_update(object, ...)

Arguments

object

A scf_mi_survey object, typically created by scf_load().

...

Named expressions assigning new or modified variables using = syntax. Each expression must return a vector of the same length as the implicate data frame.

Value

A new scf_mi_survey object with:

implicates: A list of updated data frames (one per implicate).
mi_design: A list of updated svyrep.design survey objects.
data: (If present in the original object) unchanged pooled data.

Usage

Use scf_update() during data wrangling to clean, create, or alter variables before calculating statistics or running models. The function is useful when the analyst wishes to:

Recode missing values that are coded as numeric data
Recast variables that are not in the desired format (e.g., converting a numeric variable to a factor)
Create new variables based on existing ones (e.g., calculating ratios, differences, or indicators)
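The three use cases above can be illustrated with a base R sketch that applies the same rules to each implicate in turn. It works on toy data frames with made-up values instead of an scf_mi_survey object. The variable tenure_yrs and its missing-data code of -1 are hypothetical, as is the helper recode_one().

```r
# Toy implicates (made-up values). tenure_yrs and its code -1 are hypothetical.
implicates <- list(
  data.frame(income = c(50000, 20000, 80000),
             networth = c(100000, 10000, 400000),
             hhsex = c(1, 2, 2),
             tenure_yrs = c(5, -1, 12)),
  data.frame(income = c(50000, 25000, 80000),
             networth = c(100000, 15000, 400000),
             hhsex = c(1, 2, 2),
             tenure_yrs = c(5, -1, 12))
)

# Hypothetical helper: the same rules, evaluated inside one implicate
recode_one <- function(df) {
  # 1. Recode a numeric missing-data code to NA
  df$tenure_yrs[df$tenure_yrs == -1] <- NA
  # 2. Recast a numeric code as a labelled factor
  df$sex <- factor(df$hhsex, levels = 1:2, labels = c("Male", "Female"))
  # 3. Create a new variable from existing ones
  df$nw_to_income <- df$networth / df$income
  df
}

updated <- lapply(implicates, recode_one)
str(updated[[1]])
```

None of these rules depends on the distribution of values within an implicate, which is what makes them suitable for scf_update() and not scf_update_by_implicate().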

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("update_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Create a binary indicator for being over age 50
scf2022 <- scf_update(scf2022,
  over50 = age > 50
)

# Example: Create a log-transformed income variable
scf2022 <- scf_update(scf2022,
  log_income = log(income + 1)
)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Modify Each Implicate Individually in SCF Data

Description

Each household in SCF data is represented by five implicates, which reflect uncertainty from the imputation process. Most transformations, such as computing log income or assigning categorical bins, can be applied uniformly across implicates using scf_update(). However, some operations depend on the internal distribution of variables within each implicate. For those, you need to modify each one separately.

This function extracts each implicate from the replicate-weighted survey design, applies your transformation, and rebuilds the survey design objects accordingly.

Usage

scf_update_by_implicate(object, f)

Arguments

object

A scf_mi_survey object from scf_load().

f

A function that takes a data frame as input and returns a modified data frame. This function will be applied independently to each implicate.

Details

Applies a user-defined transformation to each implicate's data frame separately. This is useful when you need to compute values that depend on the distribution within each implicate (such as ranks, percentiles, or groupwise comparisons), which cannot be computed reliably using scf_update().

Value

A modified scf_mi_survey object with updated implicate-level designs.

Use this When

You need implicate-specific quantiles (e.g., flag households in the top 10% of wealth)
You want to assign percentile ranks (e.g., income percentile by implicate)
You are computing statistics within groups (e.g., groupwise z-scores)
You need to derive a variable based on implicate-specific thresholds or bins
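As an illustration of the first case in the list, the sketch below defines a function that flags households above the 90th percentile of net worth within a single data frame, then applies it to simulated implicates with lapply(). The function name flag_top10 and the simulated data are hypothetical, and the quantile is unweighted for simplicity. A real analysis would likely need a weighted cutoff.

```r
set.seed(1)

# Simulated implicates (not SCF data): 200 households each
implicates <- lapply(1:5, function(i) {
  data.frame(networth = rlnorm(200, meanlog = 11, sdlog = 1.5))
})

# Hypothetical transformation: takes a data frame, returns a modified data frame
flag_top10 <- function(df) {
  cutoff <- quantile(df$networth, probs = 0.9, na.rm = TRUE)  # unweighted
  df$top10 <- df$networth > cutoff
  df
}

updated <- lapply(implicates, flag_top10)

# The cutoff differs from one implicate to the next
cutoffs <- vapply(implicates,
                  function(df) unname(quantile(df$networth, probs = 0.9)),
                  numeric(1))
print(round(cutoffs))
```

Because the cutoff is recomputed inside each data frame, a function of this shape matches the f argument described in Arguments and could be passed as scf_update_by_implicate(scf2022, flag_top10), provided the object contains a networth variable.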

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("update_by_implicate_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: compute implicate-specific z-scores of income
scf2022 <- scf_update_by_implicate(scf2022, function(df) {
  mu <- mean(df$income, na.rm = TRUE)
  sigma <- sd(df$income, na.rm = TRUE)
  df$z_income <- (df$income - mu) / sigma
  df
})

# Verify new variable exists
head(scf2022$mi_design[[1]]$variables$z_income)

# Do not implement these lines in real analysis: Cleanup for package check
unlink(td, recursive = TRUE, force = TRUE)

Cross-Tabulate Two Discrete Variables in Multiply-Imputed SCF Data

Description

Computes replicate-weighted two-way cross-tabulations of two discrete variables using multiply-imputed SCF data. Estimates cell proportions and standard errors, with optional scaling of proportions by cell, row, or column. Results are pooled across implicates using Rubin's Rules.

Usage

scf_xtab(scf, rowvar, colvar, scale = "cell")

Arguments

scf

A scf_mi_survey object, typically created by scf_load(). Must include five implicates with replicate weights.

rowvar

A one-sided formula specifying the row variable (e.g., ~edcl).

colvar

A one-sided formula specifying the column variable (e.g., ~racecl).

scale

Character. Proportion basis: "cell" (default), "row", or "col".

Value

A list of class "scf_xtab" with:

results: Data frame with one row per cell. Columns: row, col, prop, se, row_share, col_share, rowvar, and colvar.
matrices: List of matrices: cell (default proportions), row, col, and se.
imps: List of implicate-level cell count tables.
aux: List with rowvar and colvar names.

Statistical Notes

Implicate-level tables are created using svytable() on replicate-weighted designs. Proportions are calculated as shares of total population estimates. Variance across implicates is used to estimate uncertainty. Rubin's Rules are applied in simplified form.

For technical details on pooling logic, see scf_MIcombine() or the SCF package manual.
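The three scaling options can be illustrated with base R's prop.table(). The sketch below starts from two made-up implicate-level count tables, converts each to cell proportions, averages them, and then rescales by row and by column. The order of operations (pooling cell proportions first, then rescaling) is an assumption made for this illustration and may not match the package's internals.

```r
# Made-up weighted count tables for two implicates (not SCF estimates)
dn <- list(own = c("0", "1"), edcl = c("low", "high"))
imp_tables <- list(
  matrix(c(30, 20, 10, 40), nrow = 2, byrow = TRUE, dimnames = dn),
  matrix(c(28, 22, 12, 38), nrow = 2, byrow = TRUE, dimnames = dn)
)

# scale = "cell": each cell as a share of the total population estimate
cell_props  <- lapply(imp_tables, prop.table)
pooled_cell <- Reduce(`+`, cell_props) / length(cell_props)

# scale = "row": shares within each row; scale = "col": shares within each column
pooled_row <- prop.table(pooled_cell, margin = 1)
pooled_col <- prop.table(pooled_cell, margin = 2)

print(round(pooled_cell, 3))
print(round(pooled_row, 3))
```

Cell proportions sum to one over the whole table, row shares sum to one within each row, and column shares sum to one within each column, so the choice of scale changes which comparison the table supports.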

Examples

# Do not implement these lines in real analysis:
# Use functions `scf_download()` and `scf_load()`
td <- tempfile("xtab_")
dir.create(td)
src <- system.file("extdata", "scf2022_mock_raw.rds", package = "scf")
file.copy(src, file.path(td, "scf2022.rds"), overwrite = TRUE)
scf2022 <- scf_load(2022, data_directory = td)

# Example for real analysis: Cross-tabulate ownership by education
scf_xtab(scf2022, ~own, ~edcl, scale = "row")

# Do not implement these lines in real analysis: Cleanup for package check
unlink(file.path(td, "scf2022.rds"), force = TRUE)

Generic S3 Method: vcov.scf_model_result

Description

Reconstructs the pooled variance-covariance matrix of the model coefficients.

Usage

## S3 method for class 'scf_model_result'
vcov(object, ...)

Arguments

object

An object of class 'scf_model_result'.

...

Not used.

Details

NOTE: The pooled variance matrix is NOT stored directly, only the coefficients and SEs. This method is typically skipped in favour of direct SE access or custom pooling, but is included here to provide a complete S3 interface.

Value

Returns the variance-covariance matrix from the internal pooling object if available, or stops with an error if the model object doesn't retain the raw pooled variance.
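To show why direct SE access is often sufficient, the sketch below uses an ordinary lm() fit on mtcars as a stand-in for a pooled SCF model. It confirms that standard errors are the square roots of the diagonal of the variance-covariance matrix, and that a matrix rebuilt from SEs alone has no off-diagonal information.

```r
# Stand-in model: ordinary lm(), not a pooled scf_model_result
fit <- lm(mpg ~ wt + hp, data = mtcars)

V <- vcov(fit)                                  # full variance-covariance matrix
se_from_vcov <- sqrt(diag(V))                   # SEs recovered from the diagonal
se_reported  <- coef(summary(fit))[, "Std. Error"]
same_se <- isTRUE(all.equal(se_from_vcov, se_reported))

# A matrix rebuilt from SEs alone has zero covariances between coefficients
V_diag_only <- diag(se_reported^2)
dimnames(V_diag_only) <- dimnames(V)
lost_covariance <- V["wt", "hp"] - V_diag_only["wt", "hp"]
```

Single-coefficient tests and confidence intervals need only the SEs. Joint tests or standard errors for linear combinations of coefficients need the off-diagonal terms, which is when the full matrix matters.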

