
Title: Tabular Data Backed by Partitioned 'fst' Files
Version: 0.2.1
Description: Intended for larger-than-memory tabular data, 'prt' objects provide an interface to read row and/or column subsets into memory as data.table objects. Data queries, constructed as 'R' expressions, are evaluated using the non-standard evaluation framework provided by 'rlang' and file-backing is powered by the fast and efficient 'fst' package.
URL: https://nbenn.github.io/prt/
BugReports: https://github.com/nbenn/prt/issues
License: GPL-3
Imports: assertthat, fst, data.table, utils, vctrs, tibble, cli, pillar (≥ 1.7.0), crayon, backports, rlang
Suggests: testthat, xml2, covr, withr, nycflights13, datasets, rmarkdown, knitr, bench
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-09-03 20:23:08 UTC; nbennett
Author: Nicolas Bennett [aut, cre], Drago Plecko [ctb]
Maintainer: Nicolas Bennett <r@nbenn.ch>
Repository: CRAN
Date/Publication: 2025-09-03 21:50:16 UTC

Get a glimpse of your data

Description

The tibble S3 generic function pillar::glimpse() is implemented for prt objects as well. It is inspired by the output of str() when applied to data.frames. The function displays the structure of the data in terms of columns, irrespective of how the data is organized in terms of R objects. As with format_dt(), the function that provides the bulk of the functionality, glimpse_dt(), is exported. This makes it straightforward to implement a class-specific pillar::glimpse() function for other classes that represent tabular data.

Usage

## S3 method for class 'prt'
glimpse(x, width = NULL, ...)

glimpse_dt(x, width = NULL)

str_sum(x)

## S3 method for class 'prt'
str(object, ...)

str_dt(x, ...)

Arguments

x

An object to glimpse at.

width

Width of output: defaults to the setting of the width option (if finite) or the width of the console.

...

Unused, for extensibility.

object

Any R object about which you want to have some information.

Details

Alongside a prt-specific pillar::glimpse() method, a str() method is provided for prt objects as well. This method breaks with base R expectations. It does not show the structure of the object in terms of R objects. Instead, in the same spirit as pillar::glimpse(), it prints the structure of the data. How the data is represented in terms of R objects is abstracted away, so the output looks as it would if the data were represented by a data.frame.

In a similar spirit to format_dt() and glimpse_dt(), a str_dt() function is exported, which provides the core functionality driving the prt implementation of str(). This function requires a head() method to be available for any object that is passed. Output can be customized by implementing an optional str_sum() function.

Examples

cars <- as_prt(mtcars)

pillar::glimpse(cars)
pillar::glimpse(cars, width = 30)

str(cars)
str(cars, vec.len = 1)
str(unclass(cars))

str_sum(cars)

Methods for creating and inspecting prt objects

Description

The constructor new_prt() creates a prt object from one or several fst files, making sure that each table consists of identically named, ordered and typed columns. To create a prt object from an in-memory table, as_prt() coerces objects inheriting from data.frame to prt in three steps: it splits the rows into n_chunks, writes fst files to the directory dir and calls new_prt() on the resulting fst files. This default splitting of rows might affect the efficiency of subsequent queries on the data. If it is not optimal, a list of objects inheriting from data.frame is a valid x argument as well.
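Both routes described above can be sketched as follows, assuming prt and fst are installed. Splitting mtcars by cyl is only one illustrative choice of partitioning. The file names part_1.fst and part_2.fst are hypothetical.

```r
library(prt)

# (a) Control the partitioning yourself by passing a list of data.frames:
#     here one partition per cylinder count (an illustrative choice)
by_cyl <- as_prt(split(mtcars, mtcars$cyl))
n_part(by_cyl)     # one partition per list element
part_nrow(by_cyl)  # number of rows in each partition

# (b) Build a prt object from fst files written beforehand
dir <- tempfile()
dir.create(dir)
files <- file.path(dir, c("part_1.fst", "part_2.fst"))  # hypothetical names

# All files must share identically named, ordered and typed columns
fst::write_fst(mtcars[1:16, ], files[1])
fst::write_fst(mtcars[17:32, ], files[2])

from_files <- new_prt(files)
nrow(from_files)
```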

Usage

new_prt(files)

as_prt(x, n_chunks = NULL, dir = tempfile())

is_prt(x)

n_part(x)

part_nrow(x)

## S3 method for class 'prt'
head(x, n = 6L, ...)

## S3 method for class 'prt'
tail(x, n = 6L, ...)

## S3 method for class 'prt'
as.data.table(x, ...)

## S3 method for class 'prt'
as.list(x, ...)

## S3 method for class 'prt'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S3 method for class 'prt'
as.matrix(x, ...)

Arguments

files

Character vector of file name(s).

x

A prt object.

n_chunks

Count variable specifying the number of chunks x is split into.

dir

Directory in which the chunked fst::fst() objects reside.

n

Count variable indicating the number of rows to return.

...

Generic consistency: additional arguments are ignored and a warning is issued.

row.names,optional

Generic consistency: passing anything other than the default value issues a warning.

Details

Three helpers are exported for inspecting prt objects. is_prt() checks whether an object inherits from prt, n_part() returns the number of partitions, and part_nrow() returns the number of rows per partition.

The base R S3 generic functions dim(), length(), dimnames() and names() have prt-specific implementations:

- dim() returns the overall table dimensions.
- length() is synonymous with ncol().
- dimnames() returns a length-2 list containing NULL in place of row names and the column names as a character vector.
- names() is synonymous with colnames().

Neither setting nor getting row names is supported on prt objects. More generally, calling replacement functions such as names<-() or dimnames<-() leads to an error, as prt objects are immutable. The base R S3 generic functions head() and tail() are available as well. They are used internally to provide an extensible mechanism for printing (see format_dt()).
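A short sketch of these dimension methods and of immutability follows, assuming prt is installed. The toupper() renaming is an arbitrary example of a replacement call that is expected to fail.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# dim() reports the overall table dimensions across all partitions
dim(dat)

# length() counts columns, like ncol()
length(dat) == ncol(dat)

# dimnames(): NULL in place of row names, followed by the column names
dimnames(dat)

# Replacement functions are expected to error because prt objects are
# immutable; tryCatch() turns that error into FALSE
renamed <- tryCatch({
  names(dat) <- toupper(names(dat))
  TRUE
}, error = function(e) FALSE)
renamed
```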

Coercion to other base R objects is possible via as.list(), as.data.frame() and as.matrix(). For coercion to data.table, the generic function data.table::as.data.table() is available for prt objects. All coercion reads the full data into memory at once, which might be problematic for large data sets.
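For large data, one way to avoid materializing everything is to narrow the query with subset() first and coerce only the result. The sketch below assumes prt is installed. The particular rows and columns are arbitrary.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 4L)

# Coercing the whole object would read every partition into memory:
# whole <- as.data.frame(dat)

# Narrow the query first; subset() returns a data.table holding only the
# requested rows and columns, which can then be coerced cheaply
small <- subset(dat, cyl == 6, select = c(mpg, hp))
small_df <- as.data.frame(small)
small_df
```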

Examples

cars <- as_prt(mtcars, n_chunks = 2L)

is_prt(cars)
n_part(cars)
part_nrow(cars)

nrow(cars)
ncol(cars)

colnames(cars)
names(cars)

head(cars)
tail(cars, n = 2)

str(as.list(cars))
str(as.data.frame(cars))

NSE subsetting

Description

A cornerstone feature of prt is the ability to load a (small) subset of rows (or columns) from a much larger tabular dataset. To specify such a subset, prt provides an implementation of the base R S3 generic function subset(). It drives the non-standard evaluation (NSE) of an expression within the context of the data, with semantics similar to the base R implementation for data.frames.
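The similarity to base R can be checked by running the same query against a prt object and against the in-memory data.frame. The sketch below assumes prt is installed. It compares values only, because the returned data.table carries no row names.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# Row and column subset via NSE on the prt object (returns a data.table)
from_prt <- subset(dat, cyl == 6 & hp > 110, select = c(mpg, hp))

# The same query on the in-memory data.frame using base R subset()
from_base <- subset(mtcars, cyl == 6 & hp > 110, select = c(mpg, hp))

# Compare column values, ignoring attributes such as row names
all.equal(as.data.frame(from_prt), from_base, check.attributes = FALSE)
```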

Usage

## S3 method for class 'prt'
subset(x, subset, select, part_safe = FALSE, drop = FALSE, ...)

subset_quo(
  x,
  subset = NULL,
  select = NULL,
  part_safe = FALSE,
  env = parent.frame()
)

Arguments

x

object to be subsetted.

subset

logical expression indicating elements or rows to keep: missing values are taken as false.

select

expression, indicating columns to select from a data frame.

part_safe

Logical flag indicating whether the subset expression can safely be applied to individual partitions.

drop

passed on to the [ indexing operator.

...

further arguments to be passed to or from other methods.

env

The environment in which subset and select are evaluated. This environment is not applicable to quosures, because they carry their own environments.

Details

Two rlang functions power NSE. rlang::enquo() quotes the subset and select arguments, and rlang::eval_tidy() evaluates the resulting expressions. This makes some rlang-specific features available, such as the .data/.env pronouns and the double-curly brace forwarding operator. For example code, please refer to vignette("prt", package = "prt").
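These rlang features can be sketched as follows, assuming prt and rlang are installed. The wrapper rows_where() and the variable hp_min are hypothetical names chosen for illustration.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# Hypothetical wrapper: {{ }} forwards the caller's expression, together
# with its environment, to the NSE machinery of subset()
rows_where <- function(x, condition) {
  subset(x, {{ condition }})
}

six_cyl <- rows_where(dat, cyl == 6)

# .data refers to a column and .env to a variable in the calling
# environment, which avoids ambiguity when both share a name
hp_min <- 110
powerful <- subset(dat, .data$hp > .env$hp_min)

# An already quoted expression (here a quosure) goes through subset_quo()
cond <- rlang::quo(gear == 4)
four_gears <- subset_quo(dat, cond)
```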

The function subset() quotes the arguments passed as subset and select, whereas subset_quo() operates on already quoted expressions. A final noteworthy departure from the base R interface is the part_safe argument. This logical flag indicates whether the expression can safely be evaluated on each partition individually, or whether dependencies between partitions would prevent this from yielding correct results. Whether such dependencies exist is not easy to determine from the expression alone, so the default is FALSE. In many cases this default results in a less efficient resolution of the row selection, and it is up to the user to enable the optimization.
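The following base R sketch shows why part_safe = TRUE can change results. It does not use prt. The two equally sized chunks of 16 rows are an assumption for illustration and need not match how as_prt() splits rows. In the condition qsec > mean(qsec), the mean depends on all rows. Evaluating the condition per chunk therefore compares each row against a chunk-level mean instead of the overall mean.

```r
# Split mtcars into two chunks of 16 rows each (illustrative assumption)
chunks <- split(mtcars, rep(1:2, each = 16))

# Partition-wise evaluation: mean(qsec) is computed within each chunk
per_part <- do.call(
  rbind,
  lapply(chunks, function(d) d[d$qsec > mean(d$qsec), ])
)

# Evaluation on the full table: mean(qsec) over all 32 rows
global <- mtcars[mtcars$qsec > mean(mtcars$qsec), ]

# The two approaches keep different numbers of rows
c(per_partition = nrow(per_part), full_table = nrow(global))
```

A condition without cross-row dependencies, such as cyl == 6, gives the same rows either way. That is the kind of expression for which enabling part_safe is appropriate.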

Examples

dat <- as_prt(mtcars, n_chunks = 2L)

subset(dat, cyl == 6)
subset(dat, cyl == 6 & hp > 110)
colnames(subset(dat, select = mpg:hp))
colnames(subset(dat, select = -c(vs, am)))

sub_6 <- subset(dat, cyl == 6)

thresh <- 6
identical(subset(dat, cyl == thresh), sub_6)
identical(subset(dat, cyl == .env$thresh), sub_6)

cyl <- 6
identical(subset(dat, cyl == cyl), data.table::as.data.table(dat))
identical(subset(dat, cyl == !!cyl), sub_6)
identical(subset(dat, .data$cyl == .env$cyl), sub_6)

expr <- quote(cyl == 6)
# passing a quoted expression to subset() will yield an error
## Not run:
  subset(dat, expr)
## End(Not run)
identical(subset_quo(dat, expr), sub_6)

identical(
  subset(dat, qsec > mean(qsec), part_safe = TRUE),
  subset(dat, qsec > mean(qsec), part_safe = FALSE)
)

Printing prt

Description

Printing of prt objects combines two designs. From tibble it takes the concise yet informative approach of showing only as many columns as the terminal width allows. From data.table it takes the approach of showing both the first and last few rows of a table. The interface is designed to mimic tibble printing as closely as possible, offering the same function arguments and using the same option settings (and default values) as introduced by tibble.
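The data.table side of this design, showing the first and last few rows, is illustrated by the base R sketch below. The helper head_tail_rows() is hypothetical and is not how prt computes its output; prt relies on its head() and tail() methods instead. The sketch only shows which rows such a printout covers.

```r
# Hypothetical helper (not part of prt): indices of the rows shown by a
# head-and-tail style printout of a table with nr rows
head_tail_rows <- function(nr, n = 3L) {
  if (nr <= 2L * n) {
    return(seq_len(nr))  # small tables are shown in full
  }
  c(seq_len(n), seq.int(nr - n + 1L, nr))  # first n and last n rows
}

# For mtcars (32 rows) with n = 2 this selects rows 1, 2, 31 and 32
shown <- head_tail_rows(nrow(mtcars), n = 2L)
mtcars[shown, c("mpg", "cyl")]
```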

Usage

## S3 method for class 'prt'
print(x, ..., n = NULL, width = NULL, max_extra_cols = NULL)

## S3 method for class 'prt'
format(x, ..., n = NULL, width = NULL, max_extra_cols = NULL)

format_dt(
  x,
  ...,
  n = NULL,
  width = NULL,
  max_extra_cols = NULL,
  max_footer_lines = NULL
)

trunc_dt(...)

Arguments

x

Object to format or print.

...

These dots are for future extensions and must be empty.

n

Number of rows to show. If NULL, the default, all rows are printed if there are fewer than the print_max option. Otherwise, as many rows as specified by the print_min option are printed.

width

Width of text output to generate. This defaults to NULL, which means the width option is used.

max_extra_cols

Number of extra columns to print abbreviated information for, if the width is too small for the entire tibble. If NULL, the max_extra_cols option is used. The previously defined n_extra argument is soft-deprecated.

max_footer_lines

Maximum number of footer lines. If NULL, the max_footer_lines option is used.

Details

The function tibble::trunc_mat() does most of the heavy lifting for formatting tibble printing output. prt exports the analogous function trunc_dt(), which adds the top/bottom n row concept. trunc_dt() can be used to create print() methods for other classes that represent tabular data, provided the class implements dim(), utils::head() and utils::tail() methods (and optionally pillar::tbl_sum()). For an example of this, see vignette("prt", package = "prt").

The following session options are set by tibble and are respected by prt, as well as by any other package that calls trunc_dt():

Both tibble and prt rely on pillar for formatting columns. Therefore, the following options set by pillar apply to prt printing as well.

Options for the pillar package

Examples

cars <- as_prt(mtcars)

print(cars)
print(cars, n = 2)
print(cars, width = 30)
print(cars, width = 30, max_extra_cols = 2)

Subsetting operations

Description

Both single-element subsetting via [[ and $ and multi-element subsetting via [ are available for prt objects. Subsetting semantics are modeled after those of the tibble class. The main difference is that where tibble returns tibble objects, prt returns data.tables. There are two differences from base R: partial column name matching for $ is not allowed, and coercion to lower dimensions for [ is disabled by default (drop = FALSE). As prt objects are immutable, all subset-replace functions ([[<-, $<- and [<-) yield an error when passed a prt object.
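The return types described above can be checked with a short sketch, assuming prt is installed. The drop = TRUE call follows the tibble-style semantics described here and is expected to yield a plain vector.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# Multi-element subsetting returns a data.table, not a prt object
rows <- dat[1:3, c("mpg", "cyl")]
class(rows)

# With the default drop = FALSE, a single column selected via [, j]
# stays a (one-column) data.table
one_col <- dat[, "mpg"]

# Opting into drop = TRUE is expected to return a plain vector,
# as do [[ and $
mpg_vec <- dat[, "mpg", drop = TRUE]
head(mpg_vec)
```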

Usage

## S3 method for class 'prt'
x[[i, j, ..., exact = TRUE]]

## S3 method for class 'prt'
x$name

## S3 method for class 'prt'
x[i, j, drop = FALSE]

Arguments

x

A prt object.

i,j

Row/column indexes. If j is omitted, i is used as column index.

...

Generic compatibility: any further arguments are ignored.

exact

Generic compatibility: only the default value of TRUE is supported.

name

a literal character string or a name (possibly backtick quoted).

drop

Coerce to a vector if fetching one column via tbl[, j]. Default FALSE, ignored when accessing a column via tbl[j].

Examples

dat <- as_prt(mtcars)

identical(dat$mpg, dat[["mpg"]])

dat$mp
mtcars$mp

identical(dim(dat["mpg"]), dim(mtcars["mpg"]))
identical(dim(dat[, "mpg"]), dim(mtcars[, "mpg"]))
identical(dim(dat[1L, ]), dim(mtcars[1L, ]))
