covidcast 0.5.2

3. Manipulating multiple signals

Source: vignettes/multi-signals.Rmd

Various analyses involve working with multiple signals at once. The covidcast package provides helper functions for fetching multiple signals from the API and aggregating them into one data frame for various downstream uses.

Fetching multiple signals

To load confirmed cases and deaths at the state level in a single function call, we can use covidcast_signals() (note the plural form of “signals”):

library(covidcast)

start_day <- "2020-06-01"
end_day <- "2020-10-01"

signals <- covidcast_signals(data_source = "jhu-csse",
                             signal = c("confirmed_7dav_incidence_prop",
                                        "deaths_7dav_incidence_prop"),
                             start_day = start_day, end_day = end_day,
                             geo_type = "state", geo_values = "tx")

summary(signals[[1]])

A `covidcast_signal` dataframe with 123 rows and 15 columns.
data_source : jhu-csse
signal      : confirmed_7dav_incidence_prop
geo_type    : state
first date                          : 2020-06-01
last date                           : 2020-10-01
median number of geo_values per day : 1

summary(signals[[2]])

A `covidcast_signal` dataframe with 123 rows and 15 columns.
data_source : jhu-csse
signal      : deaths_7dav_incidence_prop
geo_type    : state
first date                          : 2020-06-01
last date                           : 2020-10-01
median number of geo_values per day : 1

This returns a list of covidcast_signal objects. The argument structure for covidcast_signals() matches that of covidcast_signal(), except that the first four arguments (data_source, signal, start_day, end_day) are allowed to be vectors. See the covidcast_signals() documentation for details.

Aggregating signals, wide format

To aggregate multiple signals together, we can use the aggregate_signals() function, which accepts a list of covidcast_signal objects, as returned by covidcast_signals(). With all arguments set to their default values, aggregate_signals() returns a data frame in “wide” format:

library(dplyr)

aggregate_signals(signals) %>% head()

  geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                       3.393256
2        tx 2020-06-02                                       3.644320
3        tx 2020-06-03                                       3.723629
4        tx 2020-06-04                                       6.985028
5        tx 2020-06-05                                       7.920192
6        tx 2020-06-06                                       8.034533
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864

In “wide” format, only the latest issue of data is retained, and the columns data_source, signal, issue, lag, stderr, sample_size are all dropped from the returned data frame. Each unique signal (defined by a combination of data source name, signal name, and time-shift) is given its own column, whose name indicates its defining quantities.
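The naming scheme can be read off the output above: the prefix value, the signed time-shift, a colon, then the data source and signal names joined by an underscore. The sketch below rebuilds such a name. Note that wide_col_name() is a hypothetical helper written for illustration, not a covidcast function, and the pattern is inferred from the printed column names rather than from the package source.

```r
# Hypothetical helper (not part of covidcast): rebuilds a wide-format
# column name from the three quantities that define a signal column.
wide_col_name <- function(dt, data_source, signal) {
  # "%+d" always prints the sign, so a shift of 0 is rendered as "+0".
  sprintf("value%+d:%s_%s", as.integer(dt), data_source, signal)
}

name_cases <- wide_col_name(0, "jhu-csse", "confirmed_7dav_incidence_prop")
name_deaths <- wide_col_name(-1, "jhu-csse", "deaths_7dav_incidence_prop")

print(name_cases)
print(name_deaths)
```

These two calls reproduce the column names value+0:jhu-csse_confirmed_7dav_incidence_prop and value-1:jhu-csse_deaths_7dav_incidence_prop that appear in the outputs of this vignette.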

As hinted above, aggregate_signals() can also apply time-shifts to the given signals, through the optional dt argument. This can be either a single vector of shifts or a list of vectors of shifts, the list having the same length as the list of covidcast_signal objects (to apply, respectively, the same shifts or a different set of shifts to each covidcast_signal object). Negative shifts translate into a lag value and positive shifts into a lead value; for example, if dt = -1, then the value reported on June 2 is the original value on June 1; if dt = 0, then the values are left as is.

aggregate_signals(signals, dt = c(-1, 0)) %>% head()

  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.393256
2                                       3.644320
3                                       3.723629
4                                       6.985028
5                                       7.920192
6                                       8.034533
  value-1:jhu-csse_deaths_7dav_incidence_prop
1                                          NA
2                                   0.0856342
3                                   0.0953654
4                                   0.0909864
5                                   0.0977982
6                                   0.1002310
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864

aggregate_signals(signals, dt = list(0, c(-1, 0, 1))) %>% head()

  geo_value time_value value+0:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                       3.393256
2        tx 2020-06-02                                       3.644320
3        tx 2020-06-03                                       3.723629
4        tx 2020-06-04                                       6.985028
5        tx 2020-06-05                                       7.920192
6        tx 2020-06-06                                       8.034533
  value-1:jhu-csse_deaths_7dav_incidence_prop
1                                          NA
2                                   0.0856342
3                                   0.0953654
4                                   0.0909864
5                                   0.0977982
6                                   0.1002310
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
  value+1:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0953654
2                                   0.0909864
3                                   0.0977982
4                                   0.1002310
5                                   0.0909864
6                                   0.0885536
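To make the shift convention concrete, here is a minimal base R sketch that mimics it on a toy data frame holding the first three Texas case values from the output above. The helper shift_values() is hypothetical and written only for illustration; it is not part of covidcast and makes no claim about how aggregate_signals() is implemented internally.

```r
# Toy data: the first three confirmed-case values shown in this vignette.
toy <- data.frame(
  time_value = as.Date(c("2020-06-01", "2020-06-02", "2020-06-03")),
  value = c(3.393256, 3.644320, 3.723629)
)

# Hypothetical helper (not part of covidcast): after shifting by dt,
# the value reported at date t is the original value at date t + dt.
shift_values <- function(df, dt) {
  # Move each original observation from date t to date t - dt.
  moved_dates <- df$time_value - dt
  # For each original date, look up the value that was moved onto it;
  # dates that receive no value become NA.
  df$value <- df$value[match(df$time_value, moved_dates)]
  df
}

lagged <- shift_values(toy, dt = -1)  # June 2 now holds the June 1 value
led <- shift_values(toy, dt = 1)      # June 1 now holds the June 2 value

print(lagged)
print(led)
```

With dt = -1 the first row becomes NA because there is no observation before June 1 in the toy data, which mirrors the leading NA in the value-1 columns above.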

Finally, aggregate_signals() also accepts a single data frame (instead of a list of data frames), which is convenient when applying shifts to a single covidcast_signal object:

aggregate_signals(signals[[1]], dt = c(-1, 0, 1)) %>% head()

  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.393256
2                                       3.644320
3                                       3.723629
4                                       6.985028
5                                       7.920192
6                                       8.034533
  value+1:jhu-csse_confirmed_7dav_incidence_prop
1                                       3.644320
2                                       3.723629
3                                       6.985028
4                                       7.920192
5                                       8.034533
6                                       7.957171

Aggregating signals, long format

We can also use aggregate_signals() in “long” format, with one observation per row:

aggregate_signals(signals, format = "long") %>% head()

  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1004             0              5
3    state       day 2023-03-03 1003             0              5
4    state       day 2023-03-03 1002             0              5
5    state       day 2023-03-03 1001             0              5
6    state       day 2023-03-03 1000             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA  0 3.393256
2                   5     NA          NA  0 3.644320
3                   5     NA          NA  0 3.723629
4                   5     NA          NA  0 6.985028
5                   5     NA          NA  0 7.920192
6                   5     NA          NA  0 8.034533

aggregate_signals(signals, dt = c(-1, 0), format = "long") %>% head()

  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1005             0              5
3    state       day 2023-03-03 1004             0              5
4    state       day 2023-03-03 1004             0              5
5    state       day 2023-03-03 1003             0              5
6    state       day 2023-03-03 1003             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA  0 3.393256
3                   5     NA          NA -1 3.393256
4                   5     NA          NA  0 3.644320
5                   5     NA          NA -1 3.644320
6                   5     NA          NA  0 3.723629

aggregate_signals(signals, dt = list(-1, 0), format = "long") %>% head()

  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-04 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-05 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-06 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1004             0              5
3    state       day 2023-03-03 1003             0              5
4    state       day 2023-03-03 1002             0              5
5    state       day 2023-03-03 1001             0              5
6    state       day 2023-03-03 1000             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA -1 3.393256
3                   5     NA          NA -1 3.644320
4                   5     NA          NA -1 3.723629
5                   5     NA          NA -1 6.985028
6                   5     NA          NA -1 7.920192

As we can see, time-shifts work just as before, in “wide” format. However, in “long” format, all columns are retained, and an additional dt column is added to record the time-shift being used.

Just as before, aggregate_signals() can also operate on a single data frame, to conveniently apply shifts, in “long” format:

aggregate_signals(signals[[1]], dt = c(-1, 0), format = "long") %>% head()

  data_source                        signal geo_value time_value   source
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
2    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 jhu-csse
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
4    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 jhu-csse
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
6    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 jhu-csse
  geo_type time_type      issue  lag missing_value missing_stderr
1    state       day 2023-03-03 1005             0              5
2    state       day 2023-03-03 1005             0              5
3    state       day 2023-03-03 1004             0              5
4    state       day 2023-03-03 1004             0              5
5    state       day 2023-03-03 1003             0              5
6    state       day 2023-03-03 1003             0              5
  missing_sample_size stderr sample_size dt    value
1                   5     NA          NA -1       NA
2                   5     NA          NA  0 3.393256
3                   5     NA          NA -1 3.393256
4                   5     NA          NA  0 3.644320
5                   5     NA          NA -1 3.644320
6                   5     NA          NA  0 3.723629

Pivoting longer or wider

The package also provides functions for pivoting an aggregated signal data frame longer or wider. These are essentially wrappers around pivot_longer() and pivot_wider() from the tidyr package that set the column structure and column names appropriately. For example, to pivot longer:

aggregate_signals(signals, dt = list(-1, 0)) %>%
  covidcast_longer() %>%
  head()

  data_source                        signal geo_value time_value dt     value
1    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-01 -1        NA
2    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-01  0 0.0856342
3    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-02 -1 3.3932560
4    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-02  0 0.0953654
5    jhu-csse confirmed_7dav_incidence_prop        tx 2020-06-03 -1 3.6443200
6    jhu-csse    deaths_7dav_incidence_prop        tx 2020-06-03  0 0.0909864

And to pivot wider:

aggregate_signals(signals, dt = list(-1, 0), format = "long") %>%
  covidcast_wider() %>%
  head()

  geo_value time_value value-1:jhu-csse_confirmed_7dav_incidence_prop
1        tx 2020-06-01                                             NA
2        tx 2020-06-02                                       3.393256
3        tx 2020-06-03                                       3.644320
4        tx 2020-06-04                                       3.723629
5        tx 2020-06-05                                       6.985028
6        tx 2020-06-06                                       7.920192
  value+0:jhu-csse_deaths_7dav_incidence_prop
1                                   0.0856342
2                                   0.0953654
3                                   0.0909864
4                                   0.0977982
5                                   0.1002310
6                                   0.0909864
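For intuition about what such a wrapper has to do, the sketch below pivots a small hand-built wide data frame longer with tidyr directly, parsing the time-shift, data source, and signal out of each column name. This is an illustration under stated assumptions, not the covidcast_longer() implementation: the regular expression is our own, it assumes the data source name contains no underscore, and the resulting column order differs from the package output.

```r
library(tidyr)

# Hand-built wide data frame using two rows of values from this vignette.
wide <- data.frame(
  geo_value = "tx",
  time_value = as.Date(c("2020-06-01", "2020-06-02")),
  `value-1:jhu-csse_confirmed_7dav_incidence_prop` = c(NA, 3.393256),
  `value+0:jhu-csse_deaths_7dav_incidence_prop` = c(0.0856342, 0.0953654),
  check.names = FALSE  # keep the special characters in the column names
)

# Split each "value<dt>:<source>_<signal>" column name into three columns.
# Assumption: the data source name itself contains no underscore.
long <- pivot_longer(
  wide,
  cols = starts_with("value"),
  names_to = c("dt", "data_source", "signal"),
  names_pattern = "value([+-]\\d+):([^_]+)_(.*)",
  names_transform = list(dt = as.integer),
  values_to = "value"
)

print(long)
```

Each input row yields one output row per signal column, so the rows alternate between the lagged cases and the unshifted deaths, as in the covidcast_longer() output above.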

A sanity check

Lastly, here’s a small sanity check: lagging cases by 7 days using aggregate_signals() and correlating the result with deaths using covidcast_cor() yields the same result as telling covidcast_cor() to do the time-shifting itself:

df_cor1 <- covidcast_cor(x = aggregate_signals(signals[[1]], dt = -7,
                                               format = "long"),
                         y = signals[[2]])
df_cor2 <- covidcast_cor(x = signals[[1]], y = signals[[2]], dt_x = -7)
identical(df_cor1, df_cor2)

[1] TRUE
