Movatterモバイル変換

Title:

Plausibility Analysis of Epidemiological Signals

Version:

0.1.0

Description:

Provides functionality to prepare data and analyze plausibility of both forecasted and reported epidemiological signals. The functions implement a set of plausibility algorithms that are agnostic to geographic and time resolutions and are calculated independently then presented as a combined score.

License:

MIT + file LICENSE

URL:

https://signaturescience.github.io/rplanes/

Depends:

R (≥ 2.10)

Imports:

dplyr, dtw, ecp, lubridate, magrittr, purrr, readr, rlang,stringr, tibble, tidyr, utils

Suggests:

DT, ggplot2, markdown, knitr, MMWRweek, rmarkdown, shiny,shinyjs, shinyWidgets, shinycssloaders, testthat (≥ 3.0.0),tools

VignetteBuilder:

knitr

Config/testthat/edition:

Encoding:

UTF-8

RoxygenNote:

7.2.3

NeedsCompilation:

Packaged:

2024-07-15 16:28:47 UTC; vpnagraj

Author:

VP Nagraj

[aut, cre], Desiree Williams [aut], Amy Benefield [aut]

Maintainer:

VP Nagraj <nagraj@nagraj.net>

Repository:

CRAN

Date/Publication:

2024-07-17 10:50:02 UTC

rplanes: Plausibility Analysis of Epidemiological Signals

Description

logo

Author(s)

Maintainer: VP Nagrajnagraj@nagraj.net (ORCID)

Authors:

Desiree Williams
Amy Benefield

Pipe operator

Description

Seemagrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of callingrhs(lhs).

Month subtraction

Description

Seelubridate::%m-% for details.

Arguments

e1

A period or a date-time object of classPOSIXlt,POSIXctorDate.

e2

A period or a date-time object of classPOSIXlt,POSIXctorDate. Note that one of e1 and e2 must be a period and the other adate-time object.

Value

A date-time object of class POSIXlt, POSIXct or Date

Check completeness of seed and signal data

Description

This unexported helper is used internally invalid_dates to optionally issue a warning for potential completeness of seed and signal data based on dates provided.

Usage

check_incomplete(seed_date, signal_date, resolution)

Arguments

seed_date

Last date available in seed object

signal_date

First date available in signal object

resolution

Character vector specifying the temporal resolution (e.g., "weeks", "months")

Value

Operates as side-effect and returns awarning() if there are the seed and signal dates combined indicate an incomplete week or month.

Sliding windows

Description

This unexported helper function is used withinplane_shape() to generate sliding windows from a vector and return a data frame where each row is a subset (a sliding window) of a time series. The length of the each windowed time series (and therefore number of columns) is equal to "window_size". The number of windows is equal to(length(vector) - window_size) + 1. For example, given a time series of length 38 and a window size of length 4, then there will be 35 windowed time series (rows), with 4 time stamps each (columns).

Usage

create_sliding_windows_df(vector, window_size)

Arguments

vector

A numeric or integer vector that is the time series to be used to create sliding windows

window_size

An integer specifying the size (i.e., number of elements) of the windowed time series desired

Value

Adata.frame where each row is a subset (a sliding window) of a time series.

Cut into categorical differences

Description

This unexported helper function takes an input number for an observed difference and cuts it into a categorical description (e.g., "increase", "decrease", or "stable") of the change.

Usage

cutter(x, threshold = 1)

Arguments

x

Vector of length 1 with scaled difference to be categorized

threshold

Limit used to define the categorical differences; default is1

Value

Character vector of length 1 with the categorical description of difference

Epiweek start

Description

This unexported helper identifies the date of the first day for the epiweek of the given date. The function is used internally inside ofvalid_dates.

Usage

epiweek_start(date)

Arguments

date

Date to be queried

Value

Date of the first day of the epiweek for the input date.

Determine shapes

Description

This unexported helper function is used to identify the shape in theplane_shape() function's scaled difference ("sdiff") method.

Usage

get_shapes(input_data, window_size)

Arguments

input_data

A data frame containing at least two columns, one of which must be named "value" with the value assessed and another named "dates" with the date for the observed data

window_size

The number of of categorical differences used to define the shape

Value

A vector with the shapes identified. Each element of the vector will include a shape, which is a cluster of categorical differences (of the same size as the specified "window_size") collapsed with ";" (e.g.,c("decrease;stable;stable;stable","stable;stable;stable;increase","stable;stable;increase;increase")).

Check forecast

Description

This function checks if the object is of classsignal andforecast.

Usage

is_forecast(x)

Arguments

x

Input object to be checked

Value

Logical as to whether or not the input object inherits the "signal" and "forecast" classes.

Examples

## get path to example forecast filefp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")ex_forecast <- read_forecast(fp)sig <- to_signal(ex_forecast, outcome="flu.admits", type="forecast", horizon=4, resolution="weeks")is_forecast(sig)

Check observed

Description

This function checks if the object is of classsignal andobserved.

Usage

is_observed(x)

Arguments

x

Input object to be checked

Value

Logical as to whether or not the input object inherits the "signal" and "observed" classes.

Examples

hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")sig <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")is_observed(sig)

Month start

Description

This unexported helper identifies the date of the first day of the month for the given date. The function is used internally inside ofvalid_dates.

Usage

month_start(date)

Arguments

date

Date to be queried

Value

Date of the first day of the month for the input date.

Coverage component

Description

This function evaluates whether or not the evaluated signal interval covers the last observed value. The interval used in this plausibility component is drawn from the upper and lower bounds of the forecasted prediction interval. As such, the only accepted signal format isforecast, which will include upper and lower bounds.

Usage

plane_cover(location, input, seed)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast

seed

Preparedseed

Value

Alist with the following values:

indicator: Logical as to whether or not the last value falls outside of the interval (e.g., not in between lower and upper bounds of prediction interval) of the evaluated signal
last_value: A vector with the last value recorded in the seed
bounds: A list with a two elements corresponding to the upper and lower bounds of the evaluated signal interval

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_cover(location = "08", input = prepped_forecast, seed = prepped_seed)plane_cover(location = "47", input = prepped_forecast, seed = prepped_seed)

Difference component

Description

This function implements the point-to-point difference plausibility component. Differences in evaluated signals are calculated from input values iteratively subtracted from the previous values (i.e., for each x at time point i, the difference will be calculated as xi - xi-1). The plausibility analysis uses the evaluated differences to compare against the maximum difference observed and recorded in the seed.

Usage

plane_diff(location, input, seed)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast orobserved

seed

Preparedseed

Value

Alist with the following values:

indicator: Logical as to whether or not the absolute value of any of the evaluated differences exceeds the maximum difference
values: A vector with the values assessed including the last value in seed concatenated with the evaluated signal values
evaluated_differences: A vector with the consecutive differences for the values
maximum_difference: A vector with one value for the maximum difference observed in seed

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_diff(location = "10", input = prepped_forecast, seed = prepped_seed)plane_diff(location = "51", input = prepped_forecast, seed = prepped_seed)

Repeat component

Description

This function evaluates whether consecutive values in observations or forecasts are repeated a k number of times. This function takes in aforecast orobserved object that is either from an observed dataset or forecast dataset. Note that if a signal is contant (i.e., the same value is repeated for all time points) then the repeat component will returnFALSE.

Usage

plane_repeat(location, input, seed, tolerance = NULL, prepend = NULL)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast orobserved

seed

Preparedseed

tolerance

Integer value for the number of allowed repeats before flag is raised. Default isNULL and allowed repeats will be determined from seed.

prepend

Integer value for the number of values from seed to add before the evaluated signal. Default isNULL and the number of values will be determined from seed.

Value

Alist with the following values:

indicator: Logical as to whether or not the value is repeated sequentially k number of times.
repeats: Atibble with repeating values found. If there are no repeats (i.e., indicator isFALSE) then thetibble will have 0 rows.

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane component## use defaultsplane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed)## set tolerated repeats to 2plane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed, tolerance = 2)## use defaultsplane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed)## set number of values prepended for evaluation to 4plane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed, prepend = 4)

Score PLANES components

Description

This function wraps PLANES scoring for specified components across all locations in single step.

Usage

plane_score(input, seed, components = "all", args = NULL, weights = NULL)

Arguments

input

Input signal data to be scored; object must be one offorecast orobserved

seed

Preparedseed

components

Character vector specifying component; must be either"all" or any combination of"cover","diff","taper","trend","repeat","shape", and"zero"; default is"all" and will use all available components for the given signal

args

Named list of arguments for component functions. List elements must be named to match the given component and arguments passed as a nested list (e.g.,args = list("trend" = list("sig_lvl" = 0.05))). Default isNULL and defaults for all components will be used

weights

Named vector with weights to be applied; default isNULL and all components will be equally weighted; if notNULL then the length of the vector must equal the number of components, with each component given a numeric weight (see Examples). Specified weights must be real numbers greater than or equal to 1.

Value

Alist with scoring results for all locations.

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane scoring with all componentsplane_score(input = prepped_forecast, seed = prepped_seed)## run plane scoring with select componentsplane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper"))## run plane scoring with all components and additional argstrend_args <- list("sig_lvl" = 0.05)repeat_args <- list("prepend" = 4, "tolerance" = 8)shape_args <- list("method" = "dtw")comp_args <- list("trend" = trend_args, "repeat" = repeat_args, "shape" = shape_args)plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args)## run plane scoring with specific components and weightscomps <- c("cover", "taper", "diff")wts <- c("cover" = 1.5, "taper" = 1, "diff" = 4)plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)

Create seed

Description

This function wraps theseed_engine to operate across all locations in the input signal.

Usage

plane_seed(input, cut_date = NULL)

Arguments

input

Input signal data used for seeding; must be an observed signal object

cut_date

Maximum date (inclusive) for which seeding should be performed; default isNULL and the entire input will be used for seeding

Value

A namedlist of lengthn, where multiple elements corresponding to seed characteristics and metadata for each of then locations are nested in independent lists.

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## prepare seed with no cut dateplane_seed(prepped_observed)## prepare seed with cut dateplane_seed(prepped_observed, cut_date = "2022-10-29")

Shape component

Description

This function identifies the shape of the trajectory for a forecasted signal to compare against existing shapes in seed data. If the shape is identified as novel, a flag is raised, and the signal is considered implausible. See the Details section for further information.

Usage

plane_shape(location, input, seed, method = "sdiff")

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast

seed

Preparedseed

method

The method for determining shapes; must be one of "sdiff" or "dtw" (see Details); default is "sdiff"

Details

The approach for determining shapes can be customized by the user with theplane_shape() "method" argument. The two methods available are "sdiff" (default) and "dtw". Compared with "sdiff", the "dtw" method has been shown to have a higher sensitivity, lower specificity, and much greater computational cost in some circumstances. The "sdiff" method is recommended if computational efficiency is a concern.

The "sdiff" method will use consecutive scaled differences to construct shapes. The algorithm operates in three steps:

The preparedseed data is combined with forecasted point estimates and each point-to-point difference is calculated.
The differences are centered and scaled, then cut into categories. Differences greater than or equal to one standard deviation above the mean of differences are considered an "increase". Differences less than or equal to one standard deviation below the mean of differences are considered a "decrease". All other differences are considered "stable".
The categorical differences are then combined into windows of equal size to the forecasted horizon. Collectively these combined categorical differences create a "shape" (e.g., "increase;stable;stable;decrease").
Lastly, the algorithm compares the shape for the forecast to all of the shapes observed. If the shape assessed has not been previously observed in the time series then a flag is raised and the indicator returned isTRUE.

The "dtw" method uses a Dynamic Time Warping (DTW) algorithm to identify shapes within the seed data and then compares the shape of the forecast input signal to the observed shapes. This is done in three broad steps:

The preparedseed data is divided into a set of sliding windows with a step size of one, each representing a section of the overall time series. The length of these windows is determined by the horizon length of the input data signal (e.g., 2 weeks). For example, if the seed data was a vector,c(1, 2, 3, 4, 5), and the horizon length was 2, then the sliding windows for the observed seed data would be:c(1, 2),c(2, 3),c(3, 4), andc(4, 5). Each sliding window is a subset of the total trajectory shape of the observed data.
Shape-based DTW distances are calculated for every 1x1 combination of the observed sliding windows and are stored in a distance matrix. These distances calibrate the function for identifying outlying shapes in forecast data. The algorithm finds the minimum distances for each windowed time series to use as a baseline for "observed distances" between chunks of the larger observed time series. The maximum of those minimum distances across the observed time series is set as the threshold. If the minimum of the forecast:observed distance matrix is greater than the threshold, then the forecast is inferred to be unfamiliar (i.e., a novel shape).
Next, the algorithm calculates the shape-based DTW distances between the forecast signal (including the point estimate, lower, and upper bounds) and every observed sliding window. If the distance between the forecast and any observed sliding window is less than or equal to the threshold defined above, then this shape is not novel and no flag is raised (indicator isFALSE).

Value

Alist with the following values:

indicator: Logical as to whether or not the the shape of the evaluated signal is novel (TRUE if shape is novel,FALSE if a familiar shape exists in the seed)

References

Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), 1-24. doi:10.18637/jss.v031.i07

Tormene, P.; Giorgino, T.; Quaglini, S. & Stefanelli, M. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artif Intell Med, 2009, 45, 11-34. doi:10.1016/j.artmed.2008.11.007

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))tmp_hosp <- hosp %>% dplyr::select(date, location, flu.admits) %>% dplyr::mutate(date = as.Date(date))prepped_observed <- to_signal(tmp_hosp,                               outcome = "flu.admits",                               type = "observed",                               resolution = "weeks")## read in example forecast and prep forecast signalprepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv",                                                package = "rplanes")) %>%   to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_shape(location = "37", input = prepped_forecast, seed = prepped_seed)## run plane component with DTW methodplane_shape(location = "37", input = prepped_forecast, seed = prepped_seed, method = "dtw")

Taper component

Description

This function evaluates whether or not the evaluated signal interval tapers (i.e., decreases in width) as horizons progress. The interval used in this plausibility component is drawn from the upper and lower bounds of the forecasted prediction interval. As such, the only accepted signal format isforecast, which will include upper and lower bounds.

Usage

plane_taper(location, input, seed)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast

seed

Preparedseed

Value

Alist with the following values:

indicator: Logical as to whether or not the prediction interval width tapers with advancing horizons
widths: Consecutive interval widths for forecasted data

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_taper(location = "19", input = prepped_forecast, seed = prepped_seed)plane_taper(location = "44", input = prepped_forecast, seed = prepped_seed)

Trend component

Description

This function identifies any change points in the forecast data or in the final observed data point. Change points are identified by any significant change in magnitude or direction of the slope of the time series.

Usage

plane_trend(location, input, seed, sig_lvl = 0.1)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must beforecast

seed

Preparedseed

sig_lvl

The significance level at which to identify change points (between zero and one); default is0.1

Details

This function usese.divisive(), which implements a hierarchical divisive algorithm to identify change points based on distances between segments (calculated using equations 3 and 5 in Matteson and James, 2014; the larger the distance, the more likely a change point). Then a permutation test is used to calculate an approximate p-value.

The input toe.divisive() is transformed using differencing (i.e.,diff(x) instead of the raw data,x). This slightly changes the way that change points are identified, as the index aligns with the gap between points rather than the points themselves. Instead of identifying a change point based on the change in size between two points, it identifies change points based on the change in the change itself. The dataframe below illustrates the difference betweenx anddiff(x):

Index	x	diff(x)
1	3	6
2	9	0
3	9	28
4	37	37
5	74	1
6	75	0
7	75	0

Given this data,e.divisive(x) would identify index 5 (74) as the change point, because there was a jump of +37 between index 4 and 5. Bute.divisive(diff(x)) would pick both index 3 (28) and 5 (1), because there was a jump of +28 from index 2 and 3, and there was a jump of -36 between index 4 and 5.

Internally, the trend function uses an extra argument toe.divisive() formin.size = 2, which requires a gap of at least 2 points between detecting change points. This can indirectly increase the significance level or decrease the number of change points identified.

Value

Alist with the following values:

indicator: Logical as to whether or not the any forecast data or the final observed data point are a significant change point
output: An n x 7 tibble. The length of the forecast plus the observed data determine the length of n. The columns are:
- Location: A character vector with the location code
- Index: An integer index of all observed and forecast data
- Date: The dates corresponding to all observed and forecast data (formatted as date)
- Value: The incidence of all observed and forecast data (e.g., hospitalization rates)
- Type: Indicates whether the data row is observed or forecast data
- Changepoint: Logical identifying any change point (whether in observed or forecast data). A TRUE is returned if any point is determined a change point based on the user defined significance level (sig_lvl).
- Flagged: Logical indicating whether or not the change point was flagged. Change points are only flagged if they are in the forecast data or are the final observed data point. A TRUE is returned if the Changepoint is TRUE and is a final observed data point or any forecast point.
flagged_dates: The date of any flagged change point(s). If there are none, NA is returned

References

Matteson, D. S., & James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), 334–345. https://doi.org/10.1080/01621459.2013.849605

Matteson DS, James NA (2013). “A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.” ArXiv e-prints. To appear in the Journal of the American Statistical Association, 1306.4933.

Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))tmp_hosp <-  hosp %>%  dplyr::select(date, location, flu.admits) %>%  dplyr::mutate(date = as.Date(date))prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits",                             type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalprepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv",                                               package = "rplanes")) %>%   to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_trend(location = "05", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2)## change locationplane_trend(location = "09", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2)## change sig_lvlplane_trend(location = "06", input = prepped_forecast, seed = prepped_seed, sig_lvl = .05)

Zero component

Description

This function checks for the presence of any value(s) equal to zero in the evaluated signal. If there are any zeros found, then the function assesses whether or not any zeros have been observed in theseed for the given location. If so, the function will consider the evaluated zero plausible and no flag will be raised (i.e., indicator returned asFALSE). If not, the function will consider the evaluated zero implausible and a flag will be raised (i.e., indicator returned asTRUE).

Usage

plane_zero(location, input, seed)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must be one offorecast orobserved

seed

Preparedseed

Value

Alist with the following values:

indicator: Logical as to whether or not there are zeros in evaluated signal but not in seed data

Examples

## read in example observed data and prep observed signalhosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d")prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")## read in example forecast and prep forecast signalfp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")prepped_forecast <- read_forecast(fp) %>%  to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4)## prepare seed with cut dateprepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29")## run plane componentplane_zero(location = "10", input = prepped_forecast, seed = prepped_seed)plane_zero(location = "51", input = prepped_forecast, seed = prepped_seed)

Quantile boundary

Description

This unexported helper generates a vector of lower bound, median, and upper bound for the prediction interval of specified width. The function is used internally inside ofread_forecast.

Usage

q_boundary(pi_width)

Arguments

pi_width

Interval width as an integer

Value

Vector of quantiles corresponding to lower and upper bounds centered on median.

Read in forecast file

Description

This function reads a probabilistic ("quantile") forecast csv file and prepares it for theto_signal function and downstream plausibility analysis. The quantile forecast file can be either a "legacy" or "hubverse" format (see Details for more information). The object returned is atibble with summarized forecast data (i.e., prediction interval) for each location and horizon in the original file.

Usage

read_forecast(file, pi_width = 95, format = "legacy")

Arguments

file

Path to csv file containing quantile forecasts

pi_width

Width of prediction interval as integer; default95 corresponds to 95% prediction interval

format

Format of the probabilistic format file; must be one of"legacy" or"hubverse" (see Details for more information); default is"legacy"

Details

The probabilistic forecast format has been used by multiple forecasting hubs. In general, this format includes one row per combination of quantile, location, target, and horizon. At each row the forecasted value is provided. The specific format, including columns required, has changed over time. This function accommodates the "legacy" as well as more recent "hubverse" formats. For more details on specific columns and see the links in the References.

Value

Atibble with the following columns:

location: Geographic unit such as FIPS code
date: Date corresponding the forecast horizon
horizon: Forecast horizon
lower: Lower limit of the prediction interval for the forecast
point: Point estimate for the forecast
upper: Upper limit of the prediction interval for the forecast

References

Hubverse:https://hubdocs.readthedocs.io/en/latest/user-guide/model-output.html

Legacy:https://github.com/cdcepi/Flusight-forecast-data/tree/master/data-forecasts#forecast-file-format

Examples

## read in example forecast and prep forecast signal (legacy format)fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")read_forecast(fp)fp2 <- system.file("extdata/forecast/2023-11-04-SigSci-TSENS.csv", package = "rplanes")read_forecast(fp2, format = "hubverse")

Resolve resolution

Description

This helper function uses argument matching to resolve the resolution from input. The function also handles casing. This will allow, for example, an input resolution of "daily" or "day" to be resolved to "days".

Usage

resolve_resolution(resolution)

Arguments

resolution

Character vector specifying the temporal resolution (e.g., "days", "weeks", "months")

Value

If the resolution matches to "days", "weeks", or "months" then the match will be returned. If not, the function will throw an error.

`rplanes` explorer app launcher

Description

Therplanes explorer app allows a user to interactively upload their own data (or view an internal example) to explore the plausibility analysis functionality.

Usage

rplanes_explorer(...)

Arguments

...

Additional arguments to be passed toshiny::runApp

Value

This function operates as a side-effect and starts therplanes Shiny app.

Examples

## Not run: # Launch the explorer apprplanes_explorer(host = "0.0.0.0",                 launch.browser = TRUE,                 port = 80)## End(Not run)

Seed engine

Description

This helper function is used inside ofplane_seed to evaluate characteristics of observed data to use for downstream plausibility analysis.

Usage

seed_engine(input, location, cut_date = NULL)

Arguments

input

Input signal data used for seeding; must be an observed signal object

location

Character vector with location code

cut_date

Maximum date (inclusive) for which seeding should be performed; default isNULL and the entire input will be used for seeding

Value

Alist of length 1 with multiple elements corresponding to seed characteristics and metadata for the given location.

Chunk a vector

Description

This unexported helper function creates a list with contents of a vector spit into chunks. The user can specify how large each chunk should be with the "size" argument.

Usage

to_chunk(x, size)

Arguments

x

Vector to be split into chunks as large as the "size" specified

size

Width of the chunks for "x" vector

Value

A list with as many elements as the number of chunks created. Each element will include vector with a length equal to the "size" specified.

Create signal object

Description

This function creates an object of the S3 class "signal". The user can conditionally specify either a "forecast" or "observed" signal.

Usage

to_signal(  input,  outcome,  type = "observed",  resolution = "weeks",  horizon = NULL)

Arguments

input

Data to be converted to signal; see "Details" for more information

outcome

Name of the outcome column in the input data

type

Signal type; must be one of "observed" or "forecast"; default is "observed"

resolution

The temporal resolution of the signal; data can be aggregated daily, weekly, or monthly; default is"weeks"; see "Details" for more information

horizon

Number of time steps ahead for forecast signals; only used iftype="forecast"; default isNULL

Details

The input signal data may be either "observed" or "forecast" data. Depending on the type, the input data must conform to certain format prior to submission. In both cases, the data must be passed as a data frame.

For "observed" data the data frame must at minimum include columns forlocation (geographic unit such as FIPS code) anddate (date of reported value; must bedate class). The data should also include a column that contains the outcome (e.g., case count).

For "forecast" data the data frame must include columns forlocation (geographic unit such as FIPS code),date (date corresponding to forecast horizon; must bedate class or character formatted as 'YYYY-MM-DD'),horizon (forecast horizon),lower (the lower limit of the prediction interval for the forecast),point (the point estimate for the forecast), andupper (the upper limit of the prediction interval for the forecast). Note that theread_forecast function returns data in this format.

The input data must at the daily, weekly, or monthly resolution. The "resolution" parameter is designed to use string matching. This allows flexibility for the user, such that, for example, an input of "day", "days", or "daily" would all resolve to a resolution ofdays. The same rules apply for designating weekly or monthly resolution.

Value

An object of the classsignal. The object will have a second class of eitherobserved orforecast depending on the value passed to the "type" argument.

Examples

hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes"))to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks")fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")ex_forecast <- read_forecast(fp)to_signal(ex_forecast, outcome = "flu.admits", type = "forecast", horizon = 4, resolution = "weeks")

Validate dates

Description

This function validates that there are no gaps or overlaps between dates specified in the "seed_date" and "signal_date". During plausibility component analyses, the function is called to validate the seed against the evaluated signal.

Usage

valid_dates(seed_date, signal_date, resolution, warn_incomplete = FALSE)

Arguments

seed_date

Last date available in seed object

signal_date

First date available in signal object

resolution

Character vector specifying the temporal resolution (e.g., "days", "weeks", "months")

warn_incomplete

Logical as to whether or not the validation should warn for completeness of seed and signal; default isFALSE

Value

The validation will return with astop() if there is an overlap or gap between seed and signal dates. Otherwise the function will invisibly returnTRUE indicating that the date span is valid.

Examples

seed_date <- as.Date("2023-03-08")signal_date <- as.Date("2023-03-15")valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="weeks")x <- try(valid_dates(seed_date = seed_date,                          signal_date = signal_date,                          resolution="days"), silent=TRUE)xx <- try(valid_dates(seed_date = seed_date,                          signal_date = signal_date,                          resolution="months"), silent=TRUE)x

Validate location

Description

This unexported helper is used inside of the individual plausibility component functions (e.g.,plane_diff()) to validate that the location specified appears in both the input signal and seed and that the location has as many values as other locations in the seed.

Usage

valid_location(location, input, seed)

Arguments

location

Character vector with location code; the location must appear in input and seed

input

Input signal data to be scored; object must beforecast

seed

Preparedseed

Value

The validation will return with astop() if the location is not found in the seed or input signal. Otherwise the function will invisibly returnTRUE indicating that the location is valid.

Movatterモバイル変換

rplanes: Plausibility Analysis of Epidemiological Signals

Description

Author(s)

See Also

Pipe operator

Description

Usage

Arguments

Value

Month subtraction

Description

Arguments

Value

Check completeness of seed and signal data

Description

Usage

Arguments

Value

Sliding windows

Description

Usage

Arguments

Value

Cut into categorical differences

Description

Usage

Arguments

Value

Epiweek start

Description

Usage

Arguments

Value

Determine shapes

Description

Usage

Arguments

Value

Check forecast

Description

Usage

Arguments

Value

Examples

Check observed

Description

Usage

Arguments

Value

Examples

Month start

Description

Usage

Arguments

Value

Coverage component

Description

Usage

Arguments

Value

Examples

Difference component

Description

Usage

Arguments

Value

Examples

Repeat component

Description

Usage

Arguments

Value

Examples

Score PLANES components

Description

Usage

Arguments

Value

Examples

`rplanes` explorer app launcher