prt 0.2.1

prt

Building on data.frame serialization provided by fst, prt offers an interface for working with partitioned data.frames, saved as individual fst files.

Installation

You can install the development version of prt from GitHub by running

source("https://install-github.me/nbenn/prt")

Alternatively, if you have the remotes package available, the latest release can be installed by calling install_github() as

# install.packages("remotes")
remotes::install_github("nbenn/prt@*release")

Short demo

Creating a prt object can be done either by calling new_prt() on a list of previously created fst files or by coercing a data.frame object to prt using as_prt().

tmp <- tempfile()
dir.create(tmp)
flights <- as_prt(nycflights13::flights, n_chunks = 2L, dir = tmp)
#> fstcore package v0.9.14
#> (OpenMP was not detected, using single threaded mode)
print(flights)
#> # A prt:        336,776 × 19
#> # Partitioning: [168,388, 168,388] rows
#>          year month   day dep_time sched_dep_t…¹ dep_delay arr_time sched_arr_…²
#>         <int> <int> <int>    <int>         <int>     <dbl>    <int>        <int>
#> 1        2013     1     1      517           515         2      830          819
#> 2        2013     1     1      533           529         4      850          830
#> 3        2013     1     1      542           540         2      923          850
#> 4        2013     1     1      544           545        -1     1004         1022
#> 5        2013     1     1      554           600        -6      812          837
#> …
#> 336,772  2013     9    30       NA          1455        NA       NA         1634
#> 336,773  2013     9    30       NA          2200        NA       NA         2312
#> 336,774  2013     9    30       NA          1210        NA       NA         1330
#> 336,775  2013     9    30       NA          1159        NA       NA         1344
#> 336,776  2013     9    30       NA           840        NA       NA         1020
#> # ℹ 336,771 more rows
#> # ℹ abbreviated names: ¹sched_dep_time, ²sched_arr_time
#> # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>

In case a prt object is created from a data.frame, the specified number of files is written to the directory of choice (a newly created directory within tempdir() by default).

list.files(tmp)
#> [1] "1.fst" "2.fst"
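The other construction route mentioned above, calling new_prt() on existing fst files, might look roughly like the following sketch. The directory, file names and toy data are made up for illustration, and passing the paths as a character vector is an assumption about how new_prt() expects its input.

```r
library(prt)

# Hypothetical scratch directory and file names, chosen for illustration only.
demo_dir <- tempfile()
dir.create(demo_dir)

# Split a small data.frame into two pieces and write each one as an fst file.
df <- data.frame(id = 1:6, grp = rep(c("a", "b"), each = 3L))
files <- file.path(demo_dir, c("1.fst", "2.fst"))
fst::write_fst(df[1:3, ], files[1])
fst::write_fst(df[4:6, ], files[2])

# Build the prt object from the files that already exist on disk
# (assumption: a character vector of paths is accepted).
parts <- new_prt(files)
n_rows <- nrow(parts)

# Remove the scratch directory once the object is no longer needed.
unlink(demo_dir, recursive = TRUE)
```

Since a prt object refers to files on disk, the files have to stay in place for as long as the object is in use.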

Subsetting and printing are closely modeled after tibble, and behavior that deviates from that of tibble will most likely be considered a bug (please report). Some design choices that do set a prt object apart from a tibble include the use of data.tables for any result of a subsetting operation and the complete disregard for row.names.

In addition to standard subsetting operations involving the functions `[`(), `[[`() and `$`(), the base generic function subset() is implemented for the prt class, enabling subsetting operations using non-standard evaluation. Combined with random access to tables stored as fst files, this can make data access more efficient in cases where only a subset of the data is of interest.

jan <- flights[flights$month == 1, ]
identical(jan, subset(flights, month == 1))
#> [1] TRUE
print(jan)
#>        year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>     1: 2013     1   1      517            515         2      830            819
#>     2: 2013     1   1      533            529         4      850            830
#>     3: 2013     1   1      542            540         2      923            850
#>     4: 2013     1   1      544            545        -1     1004           1022
#>     5: 2013     1   1      554            600        -6      812            837
#>    ---
#> 27000: 2013     1  31       NA           1325        NA       NA           1505
#> 27001: 2013     1  31       NA           1200        NA       NA           1430
#> 27002: 2013     1  31       NA           1410        NA       NA           1555
#> 27003: 2013     1  31       NA           1446        NA       NA           1757
#> 27004: 2013     1  31       NA            625        NA       NA            934
#>        arr_delay carrier flight tailnum origin dest air_time distance hour
#>     1:        11      UA   1545  N14228    EWR  IAH      227     1400    5
#>     2:        20      UA   1714  N24211    LGA  IAH      227     1416    5
#>     3:        33      AA   1141  N619AA    JFK  MIA      160     1089    5
#>     4:       -18      B6    725  N804JB    JFK  BQN      183     1576    5
#>     5:       -25      DL    461  N668DN    LGA  ATL      116      762    6
#>    ---
#> 27000:        NA      MQ   4475  N730MQ    LGA  RDU       NA      431   13
#> 27001:        NA      MQ   4658  N505MQ    LGA  ATL       NA      762   12
#> 27002:        NA      MQ   4491  N734MQ    LGA  CLE       NA      419   14
#> 27003:        NA      UA    337    <NA>    LGA  IAH       NA     1416   14
#> 27004:        NA      UA   1497    <NA>    LGA  IAH       NA     1416    6
#>        minute           time_hour
#>     1:     15 2013-01-01 05:00:00
#>     2:     29 2013-01-01 05:00:00
#>     3:     40 2013-01-01 05:00:00
#>     4:     45 2013-01-01 05:00:00
#>     5:      0 2013-01-01 06:00:00
#>    ---
#> 27000:     25 2013-01-31 13:00:00
#> 27001:      0 2013-01-31 12:00:00
#> 27002:     10 2013-01-31 14:00:00
#> 27003:     46 2013-01-31 14:00:00
#> 27004:     25 2013-01-31 06:00:00
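The remaining standard operators can be sketched on a small toy table. The data and directory below are hypothetical, and selecting a column by name inside `[`() is assumed to behave as it does for a tibble.

```r
library(prt)

# Toy data written to a hypothetical scratch directory as two partitions.
df <- data.frame(id = 1:6, grp = rep(c("a", "b"), each = 3L))
demo_dir <- tempfile()
dir.create(demo_dir)
parts <- as_prt(df, n_chunks = 2L, dir = demo_dir)

# Row filter combined with column selection by name; per the text above,
# the result of a subsetting operation is a data.table.
res <- parts[parts$grp == "b", "id"]
is_dt <- data.table::is.data.table(res)

# `[[` and `$` both extract a single column as a plain vector.
same_col <- identical(parts[["id"]], parts$id)

unlink(demo_dir, recursive = TRUE)
```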

A subsetting operation on a prt object yields a data.table. If the full table is of interest, a prt-specific implementation of the as.data.table() generic is available.
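As a minimal sketch of that conversion, again using made-up toy data and a hypothetical scratch directory rather than the flights table:

```r
library(prt)

# Toy data written to a hypothetical scratch directory as two partitions.
df <- data.frame(id = 1:6, grp = rep(c("a", "b"), each = 3L))
demo_dir <- tempfile()
dir.create(demo_dir)
parts <- as_prt(df, n_chunks = 2L, dir = demo_dir)

# Read all partitions into memory as one data.table.
full <- data.table::as.data.table(parts)
is_dt <- data.table::is.data.table(full)
n_rows <- nrow(full)

unlink(demo_dir, recursive = TRUE)
```

Because this materializes every partition in memory, it is best suited to tables that fit comfortably in RAM.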

unlink(tmp, recursive=TRUE)

