Various analyses involve working with multiple signals at once. The covidcast package provides some helper functions for fetching multiple signals from the API, and aggregating them into one data frame for various downstream uses.
To load confirmed cases and deaths at the state level, in a single function call, we can use covidcast_signals() (note the plural form of “signals”):
    library(covidcast)

    start_day <- "2020-06-01"
    end_day <- "2020-10-01"

    signals <- covidcast_signals(
      data_source = "jhu-csse",
      signal = c("confirmed_7dav_incidence_prop",
                 "deaths_7dav_incidence_prop"),
      start_day = start_day, end_day = end_day,
      geo_type = "state", geo_values = "tx"
    )

    summary(signals[[1]])

    A `covidcast_signal` dataframe with 123 rows and 15 columns.

    data_source : jhu-csse
    signal      : confirmed_7dav_incidence_prop
    geo_type    : state

    first date                          : 2020-06-01
    last date                           : 2020-10-01
    median number of geo_values per day : 1

    summary(signals[[2]])

    A `covidcast_signal` dataframe with 123 rows and 15 columns.

    data_source : jhu-csse
    signal      : deaths_7dav_incidence_prop
    geo_type    : state

    first date                          : 2020-06-01
    last date                           : 2020-10-01
    median number of geo_values per day : 1

This returns a list of covidcast_signal objects. The argument structure for covidcast_signals() matches that of covidcast_signal(), except the first four arguments (data_source, signal, start_day, end_day) are allowed to be vectors. See the covidcast_signals() documentation for details.
To aggregate multiple signals together, we can use the aggregate_signals() function, which accepts a list of covidcast_signal objects, as returned by covidcast_signals(). With all arguments set to their default values, aggregate_signals() returns a data frame in “wide” format:
    library(dplyr)

    aggregate_signals(signals) %>% head()

      geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
    1        tx 2020-06-01                                       3.393256
    2        tx 2020-06-02                                       3.644320
    3        tx 2020-06-03                                       3.723629
    4        tx 2020-06-04                                       6.985028
    5        tx 2020-06-05                                       7.920192
    6        tx 2020-06-06                                       8.034533
      value+0:jhu-csse_deaths_7dav_incidence_prop
    1                                   0.0856342
    2                                   0.0953654
    3                                   0.0909864
    4                                   0.0977982
    5                                   0.1002310
    6                                   0.0909864

In “wide” format, only the latest issue of data is retained, and the columns data_source, signal, issue, lag, stderr, sample_size are all dropped from the returned data frame. Each unique signal (defined by a combination of data source name, signal name, and time-shift) is given its own column, whose name indicates its defining quantities.
As hinted above, aggregate_signals() can also apply time-shifts to the given signals, through the optional dt argument. This can be either a single vector of shifts or a list of vectors of shifts, this list having the same length as the list of covidcast_signal objects (to apply, respectively, the same shifts or a different set of shifts to each covidcast_signal object). Negative shifts translate into a lag value and positive shifts into a lead value; for example, if dt = -1, then the value on June 2 that gets reported is the original value on June 1; if dt = 0, then the values are left as is.
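The shift semantics can be sketched in base R without calling the API. The helper shift_values() below is hypothetical (it is not part of covidcast) and assumes one value per consecutive day for a single location; the six values are the Texas case rates that appear in this vignette's output.

```r
# Hypothetical helper, not part of covidcast: shift a daily series by dt days.
# Assumes `values` holds one observation per consecutive day for one location.
shift_values <- function(values, dt) {
  n <- length(values)
  idx <- seq_len(n) + dt          # row i reports the value from day i + dt
  idx[idx < 1 | idx > n] <- NA    # days outside the observed range become NA
  values[idx]
}

time_value <- as.Date("2020-06-01") + 0:5
value <- c(3.393256, 3.644320, 3.723629, 6.985028, 7.920192, 8.034533)

shifted <- data.frame(
  time_value = time_value,
  lag1  = shift_values(value, -1),  # dt = -1: June 2 reports the June 1 value
  same  = shift_values(value, 0),   # dt = 0: values are left as is
  lead1 = shift_values(value, 1)    # dt = 1: June 1 reports the June 2 value
)
print(shifted)
```

The first row of lag1 and the last row of lead1 are NA because no observation exists outside the six-day window, which is the same pattern visible in the shifted columns returned by aggregate_signals().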
    aggregate_signals(signals, dt = c(-1, 0)) %>% head()

      geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
    1        tx 2020-06-01                                             NA
    2        tx 2020-06-02                                       3.393256
    3        tx 2020-06-03                                       3.644320
    4        tx 2020-06-04                                       3.723629
    5        tx 2020-06-05                                       6.985028
    6        tx 2020-06-06                                       7.920192
      value+0:jhu-csse_confirmed_7dav_incidence_prop
    1                                       3.393256
    2                                       3.644320
    3                                       3.723629
    4                                       6.985028
    5                                       7.920192
    6                                       8.034533
      value-1:jhu-csse_deaths_7dav_incidence_prop
    1                                          NA
    2                                   0.0856342
    3                                   0.0953654
    4                                   0.0909864
    5                                   0.0977982
    6                                   0.1002310
      value+0:jhu-csse_deaths_7dav_incidence_prop
    1                                   0.0856342
    2                                   0.0953654
    3                                   0.0909864
    4                                   0.0977982
    5                                   0.1002310
    6                                   0.0909864

    aggregate_signals(signals, dt = list(0, c(-1, 0, 1))) %>% head()

      geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
    1        tx 2020-06-01                                       3.393256
    2        tx 2020-06-02                                       3.644320
    3        tx 2020-06-03                                       3.723629
    4        tx 2020-06-04                                       6.985028
    5        tx 2020-06-05                                       7.920192
    6        tx 2020-06-06                                       8.034533
      value-1:jhu-csse_deaths_7dav_incidence_prop
    1                                          NA
    2                                   0.0856342
    3                                   0.0953654
    4                                   0.0909864
    5                                   0.0977982
    6                                   0.1002310
      value+0:jhu-csse_deaths_7dav_incidence_prop
    1                                   0.0856342
    2                                   0.0953654
    3                                   0.0909864
    4                                   0.0977982
    5                                   0.1002310
    6                                   0.0909864
      value+1:jhu-csse_deaths_7dav_incidence_prop
    1                                   0.0953654
    2                                   0.0909864
    3                                   0.0977982
    4                                   0.1002310
    5                                   0.0909864
    6                                   0.0885536

Finally, aggregate_signals() also accepts a single data frame (instead of a list of data frames), intended to be convenient when applying shifts to a single covidcast_signal object:
    aggregate_signals(signals[[1]], dt = c(-1, 0, 1)) %>% head()

      geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
    1        tx 2020-06-01                                             NA
    2        tx 2020-06-02                                       3.393256
    3        tx 2020-06-03                                       3.644320
    4        tx 2020-06-04                                       3.723629
    5        tx 2020-06-05                                       6.985028
    6        tx 2020-06-06                                       7.920192
      value+0:jhu-csse_confirmed_7dav_incidence_prop
    1                                       3.393256
    2                                       3.644320
    3                                       3.723629
    4                                       6.985028
    5                                       7.920192
    6                                       8.034533
      value+1:jhu-csse_confirmed_7dav_incidence_prop
    1                                       3.644320
    2                                       3.723629
    3                                       6.985028
    4                                       7.920192
    5                                       8.034533
    6                                       7.957171

We can also use aggregate_signals() in “long” format, with one observation per row:
    aggregate_signals(signals, format = "long") %>% head()

      data_source                        signal geo_value time_value   source
    1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
    4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
    5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
    6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
      geo_type time_type      issue  lag missing_value missing_stderr
    1    state       day 2023-03-03 1005             0              5
    2    state       day 2023-03-03 1004             0              5
    3    state       day 2023-03-03 1003             0              5
    4    state       day 2023-03-03 1002             0              5
    5    state       day 2023-03-03 1001             0              5
    6    state       day 2023-03-03 1000             0              5
      missing_sample_size stderr sample_size dt    value
    1                   5     NA          NA  0 3.393256
    2                   5     NA          NA  0 3.644320
    3                   5     NA          NA  0 3.723629
    4                   5     NA          NA  0 6.985028
    5                   5     NA          NA  0 7.920192
    6                   5     NA          NA  0 8.034533

    aggregate_signals(signals, dt = c(-1, 0), format = "long") %>% head()

      data_source                        signal geo_value time_value   source
    1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
    6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
      geo_type time_type      issue  lag missing_value missing_stderr
    1    state       day 2023-03-03 1005             0              5
    2    state       day 2023-03-03 1005             0              5
    3    state       day 2023-03-03 1004             0              5
    4    state       day 2023-03-03 1004             0              5
    5    state       day 2023-03-03 1003             0              5
    6    state       day 2023-03-03 1003             0              5
      missing_sample_size stderr sample_size dt    value
    1                   5     NA          NA -1       NA
    2                   5     NA          NA  0 3.393256
    3                   5     NA          NA -1 3.393256
    4                   5     NA          NA  0 3.644320
    5                   5     NA          NA -1 3.644320
    6                   5     NA          NA  0 3.723629

    aggregate_signals(signals, dt = list(-1, 0), format = "long") %>% head()

      data_source                        signal geo_value time_value   source
    1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
    4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
    5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
    6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
      geo_type time_type      issue  lag missing_value missing_stderr
    1    state       day 2023-03-03 1005             0              5
    2    state       day 2023-03-03 1004             0              5
    3    state       day 2023-03-03 1003             0              5
    4    state       day 2023-03-03 1002             0              5
    5    state       day 2023-03-03 1001             0              5
    6    state       day 2023-03-03 1000             0              5
      missing_sample_size stderr sample_size dt    value
    1                   5     NA          NA -1       NA
    2                   5     NA          NA -1 3.393256
    3                   5     NA          NA -1 3.644320
    4                   5     NA          NA -1 3.723629
    5                   5     NA          NA -1 6.985028
    6                   5     NA          NA -1 7.920192

As we can see, time-shifts work just as before, in “wide” format. However, in “long” format, all columns are retained, and an additional dt column is added to record the time-shift being used.
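To make the link between the two layouts concrete, here is a minimal base R sketch that stacks "wide" value columns into "long" rows and recovers dt, data_source, and signal from each column name. The helper wide_to_long() is hypothetical and is not how the package implements its conversion; it assumes column names of the form seen in this vignette and a data source name that contains no underscore.

```r
# Toy wide-format frame using column names of the form shown in this vignette.
wide <- data.frame(
  geo_value = "tx",
  time_value = as.Date("2020-06-01") + 0:2,
  check.names = FALSE
)
wide[["value-1:jhu-csse_confirmed_7dav_incidence_prop"]] <-
  c(NA, 3.393256, 3.644320)
wide[["value+0:jhu-csse_deaths_7dav_incidence_prop"]] <-
  c(0.0856342, 0.0953654, 0.0909864)

# Hypothetical helper, not the package's implementation: stack the value
# columns and recover dt, data_source, and signal from each column name.
# Assumes the data source name contains no underscore.
wide_to_long <- function(wide) {
  value_cols <- grep("^value[+-][0-9]+:", names(wide), value = TRUE)
  pieces <- lapply(value_cols, function(nm) {
    rest <- sub("^value[+-][0-9]+:", "", nm)   # "<data_source>_<signal>"
    data.frame(
      data_source = sub("_.*$", "", rest),     # text before the first "_"
      signal = sub("^[^_]*_", "", rest),       # text after the first "_"
      geo_value = wide$geo_value,
      time_value = wide$time_value,
      dt = as.integer(sub("^value([+-][0-9]+):.*$", "\\1", nm)),
      value = wide[[nm]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

long <- wide_to_long(wide)
print(long)
```

Rows in this sketch are ordered column by column rather than by date, and the columns that the "wide" format drops (such as issue and lag) cannot be recovered from it.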
Just as before, aggregate_signals() can also operate on a single data frame, to conveniently apply shifts, in “long” format:
    aggregate_signals(signals[[1]], dt = c(-1, 0), format = "long") %>% head()

      data_source                        signal geo_value time_value   source
    1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
    3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
    5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
    6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
      geo_type time_type      issue  lag missing_value missing_stderr
    1    state       day 2023-03-03 1005             0              5
    2    state       day 2023-03-03 1005             0              5
    3    state       day 2023-03-03 1004             0              5
    4    state       day 2023-03-03 1004             0              5
    5    state       day 2023-03-03 1003             0              5
    6    state       day 2023-03-03 1003             0              5
      missing_sample_size stderr sample_size dt    value
    1                   5     NA          NA -1       NA
    2                   5     NA          NA  0 3.393256
    3                   5     NA          NA -1 3.393256
    4                   5     NA          NA  0 3.644320
    5                   5     NA          NA -1 3.644320
    6                   5     NA          NA  0 3.723629

The package also provides functions for pivoting an aggregated signal data frame longer or wider. These are essentially wrappers around pivot_longer() and pivot_wider() from the tidyr package that set the column structure and column names appropriately. For example, to pivot longer:
    aggregate_signals(signals, dt = list(-1, 0)) %>%
      covidcast_longer() %>%
      head()

      data_source                        signal geo_value time_value dt     value
    1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 -1        NA
    2    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-01  0 0.0856342
    3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 -1 3.3932560
    4    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-02  0 0.0953654
    5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 -1 3.6443200
    6    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-03  0 0.0909864

And to pivot wider:
    aggregate_signals(signals, dt = list(-1, 0), format = "long") %>%
      covidcast_wider() %>%
      head()

      geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
    1        tx 2020-06-01                                             NA
    2        tx 2020-06-02                                       3.393256
    3        tx 2020-06-03                                       3.644320
    4        tx 2020-06-04                                       3.723629
    5        tx 2020-06-05                                       6.985028
    6        tx 2020-06-06                                       7.920192
      value+0:jhu-csse_deaths_7dav_incidence_prop
    1                                   0.0856342
    2                                   0.0953654
    3                                   0.0909864
    4                                   0.0977982
    5                                   0.1002310
    6                                   0.0909864

Lastly, here’s a small sanity check: lagging cases by 7 days using aggregate_signals() and correlating this with deaths using covidcast_cor() yields the same result as telling covidcast_cor() to do the time-shifting itself:
    df_cor1 <- covidcast_cor(
      x = aggregate_signals(signals[[1]], dt = -7, format = "long"),
      y = signals[[2]]
    )
    df_cor2 <- covidcast_cor(x = signals[[1]], y = signals[[2]], dt_x = -7)

    identical(df_cor1, df_cor2)

    [1] TRUE
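The idea behind this check can be illustrated in base R on the seven daily Texas values that appear in this vignette's output. This is only a sketch under simplifying assumptions: it uses a one-day lag (seven days of data cannot support a seven-day lag), a plain Pearson correlation over time for a single location, and it does not reproduce what covidcast_cor() does internally.

```r
# Seven consecutive daily values for Texas (June 1 to June 7, 2020), taken
# from the output shown earlier in this vignette.
cases <- c(3.393256, 3.644320, 3.723629, 6.985028, 7.920192, 8.034533,
           7.957171)
deaths <- c(0.0856342, 0.0953654, 0.0909864, 0.0977982, 0.1002310,
            0.0909864, 0.0885536)

# Route 1: lag cases by one day first (dt = -1), then correlate with deaths,
# dropping the day that has no lagged value.
cases_lagged <- c(NA, head(cases, -1))
cor_shift_first <- cor(cases_lagged, deaths, use = "complete.obs")

# Route 2: pair each day's cases with the next day's deaths directly.
cor_pair_directly <- cor(head(cases, -1), tail(deaths, -1))

# Both routes correlate the same six (cases, deaths) pairs.
all.equal(cor_shift_first, cor_pair_directly)
```

Both routes use exactly the same pairs of observations, which is why shifting before correlating and shifting inside the correlation step agree.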