Tabular Data Backed by Partitioned `fst` Files

nbenn/prt

Building on data.frame serialization provided by fst, prt offers an interface for working with partitioned data.frames, saved as individual fst files.

Installation

You can install the development version of prt from GitHub by running

source("https://install-github.me/nbenn/prt")

Alternatively, if you have the remotes package available, the latest release is available by calling install_github() as

# install.packages("remotes")
remotes::install_github("nbenn/prt@*release")

Short demo

Creating a prt object can be done either by calling new_prt() on a list of previously created fst files or by coercing a data.frame object to prt using as_prt().

tmp <- tempfile()
dir.create(tmp)
flights <- as_prt(nycflights13::flights, n_chunks = 2L, dir = tmp)
#> fstcore package v0.9.14
#> (OpenMP was not detected, using single threaded mode)
print(flights)
#> # A prt:        336,776 × 19
#> # Partitioning: [168,388, 168,388] rows
#>          year month   day dep_time sched_dep_t…¹ dep_delay arr_time sched_arr_…²
#>         <int> <int> <int>    <int>         <int>     <dbl>    <int>        <int>
#> 1        2013     1     1      517           515         2      830          819
#> 2        2013     1     1      533           529         4      850          830
#> 3        2013     1     1      542           540         2      923          850
#> 4        2013     1     1      544           545        -1     1004         1022
#> 5        2013     1     1      554           600        -6      812          837
#> …
#> 336,772  2013     9    30       NA          1455        NA       NA         1634
#> 336,773  2013     9    30       NA          2200        NA       NA         2312
#> 336,774  2013     9    30       NA          1210        NA       NA         1330
#> 336,775  2013     9    30       NA          1159        NA       NA         1344
#> 336,776  2013     9    30       NA           840        NA       NA         1020
#> # ℹ 336,771 more rows
#> # ℹ abbreviated names: ¹sched_dep_time, ²sched_arr_time
#> # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>, time_hour <dttm>

In case a prt object is created from a data.frame, the specified number of files is written to the directory of choice (a newly created directory within tempdir() by default).

list.files(tmp)
#> [1] "1.fst" "2.fst"
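
The other construction route mentioned above, calling new_prt() on existing fst files, might look roughly like the sketch below. It assumes that new_prt() accepts a character vector of file paths and that fst::write_fst() is used to create the files; the mtcars data, the out_dir directory and the file names are illustrative choices, not part of the original demo.

```r
library(prt)

# Illustrative data: mtcars without row names, split into two halves by row.
dat <- data.frame(mtcars, row.names = NULL)
halves <- split(dat, rep(1:2, each = nrow(dat) / 2))

# Write each half to its own fst file in a fresh temporary directory.
out_dir <- tempfile()
dir.create(out_dir)
files <- file.path(out_dir, paste0(seq_along(halves), ".fst"))
for (i in seq_along(halves)) {
  fst::write_fst(halves[[i]], files[i])
}

# Assumption: new_prt() takes the vector of fst file paths, one per partition.
cars <- new_prt(files)
n_rows <- nrow(cars)

unlink(out_dir, recursive = TRUE)
```

Each file becomes one partition, so the files are expected to share the same columns.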

Subsetting and printing are closely modeled after tibble, and behavior that deviates from that of tibble will most likely be considered a bug (please report). Some design choices that do set a prt object apart from a tibble include the use of data.tables for any result of a subsetting operation and the complete disregard for row.names.

In addition to standard subsetting operations involving the functions `[`(), `[[`() and `$`(), the base generic function subset() is implemented for the prt class, enabling subsetting operations using non-standard evaluation. Combined with random access to tables stored as fst files, this can make data access more efficient in cases where only a subset of the data is of interest.

jan <- flights[flights$month == 1, ]
identical(jan, subset(flights, month == 1))
#> [1] TRUE
print(jan)
#>        year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>     1: 2013     1   1      517            515         2      830            819
#>     2: 2013     1   1      533            529         4      850            830
#>     3: 2013     1   1      542            540         2      923            850
#>     4: 2013     1   1      544            545        -1     1004           1022
#>     5: 2013     1   1      554            600        -6      812            837
#>    ---
#> 27000: 2013     1  31       NA           1325        NA       NA           1505
#> 27001: 2013     1  31       NA           1200        NA       NA           1430
#> 27002: 2013     1  31       NA           1410        NA       NA           1555
#> 27003: 2013     1  31       NA           1446        NA       NA           1757
#> 27004: 2013     1  31       NA            625        NA       NA            934
#>        arr_delay carrier flight tailnum origin dest air_time distance hour
#>     1:        11      UA   1545  N14228    EWR  IAH      227     1400    5
#>     2:        20      UA   1714  N24211    LGA  IAH      227     1416    5
#>     3:        33      AA   1141  N619AA    JFK  MIA      160     1089    5
#>     4:       -18      B6    725  N804JB    JFK  BQN      183     1576    5
#>     5:       -25      DL    461  N668DN    LGA  ATL      116      762    6
#>    ---
#> 27000:        NA      MQ   4475  N730MQ    LGA  RDU       NA      431   13
#> 27001:        NA      MQ   4658  N505MQ    LGA  ATL       NA      762   12
#> 27002:        NA      MQ   4491  N734MQ    LGA  CLE       NA      419   14
#> 27003:        NA      UA    337    <NA>    LGA  IAH       NA     1416   14
#> 27004:        NA      UA   1497    <NA>    LGA  IAH       NA     1416    6
#>        minute           time_hour
#>     1:     15 2013-01-01 05:00:00
#>     2:     29 2013-01-01 05:00:00
#>     3:     40 2013-01-01 05:00:00
#>     4:     45 2013-01-01 05:00:00
#>     5:      0 2013-01-01 06:00:00
#>    ---
#> 27000:     25 2013-01-31 13:00:00
#> 27001:      0 2013-01-31 12:00:00
#> 27002:     10 2013-01-31 14:00:00
#> 27003:     46 2013-01-31 14:00:00
#> 27004:     25 2013-01-31 06:00:00
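
The same two subsetting styles can be tried on a small built-in dataset without installing nycflights13. This is a minimal sketch: the mtcars data, the out_dir directory and the variable names are illustrative assumptions, and only the calls shown in the demo above (as_prt(), `[`, `$`, subset()) are used.

```r
library(prt)

out_dir <- tempfile()
dir.create(out_dir)
cars <- as_prt(data.frame(mtcars, row.names = NULL), n_chunks = 2L,
               dir = out_dir)

# Standard evaluation: build the logical row index explicitly.
a <- cars[cars$cyl == 4, ]
# Non-standard evaluation: refer to the column by its bare name.
b <- subset(cars, cyl == 4)

same <- identical(a, b)                # compare the two results
is_dt <- data.table::is.data.table(b)  # subsetting returns a data.table
n_four <- nrow(b)                      # number of four-cylinder cars

unlink(out_dir, recursive = TRUE)
```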

A subsetting operation on a prt object yields a data.table. If the full table is of interest, a prt-specific implementation of the as.data.table() generic is available.

unlink(tmp, recursive = TRUE)
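
Reading a complete partitioned table back into memory through the as.data.table() method mentioned above could be sketched as follows. The mtcars data and the out_dir directory are again illustrative assumptions; the generic is called via the data.table namespace.

```r
library(prt)

out_dir <- tempfile()
dir.create(out_dir)
cars <- as_prt(data.frame(mtcars, row.names = NULL), n_chunks = 2L,
               dir = out_dir)

# Read all partitions and combine them into a single in-memory data.table.
full <- data.table::as.data.table(cars)
dims <- dim(full)

unlink(out_dir, recursive = TRUE)
```

Since this loads every partition, it is only sensible when the whole table fits comfortably in memory; otherwise subsetting first keeps the result small.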
