| Title: | Tabular Data Backed by Partitioned 'fst' Files |
| Version: | 0.2.1 |
| Description: | Intended for larger-than-memory tabular data, 'prt' objects provide an interface to read row and/or column subsets into memory as data.table objects. Data queries, constructed as 'R' expressions, are evaluated using the non-standard evaluation framework provided by 'rlang' and file-backing is powered by the fast and efficient 'fst' package. |
| URL: | https://nbenn.github.io/prt/ |
| BugReports: | https://github.com/nbenn/prt/issues |
| License: | GPL-3 |
| Imports: | assertthat, fst, data.table, utils, vctrs, tibble, cli, pillar (≥ 1.7.0), crayon, backports, rlang |
| Suggests: | testthat, xml2, covr, withr, nycflights13, datasets, rmarkdown, knitr, bench |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-09-03 20:23:08 UTC; nbennett |
| Author: | Nicolas Bennett [aut, cre], Drago Plecko [ctb] |
| Maintainer: | Nicolas Bennett <r@nbenn.ch> |
| Repository: | CRAN |
| Date/Publication: | 2025-09-03 21:50:16 UTC |
Get a glimpse of your data
Description
The tibble S3 generic function pillar::glimpse() is implemented for prt objects as well. Inspired by the output of str() when applied to data.frames, this function is intended to display the structure of the data in terms of columns, irrespective of how the data is organized in terms of R objects. Similarly to format_dt(), the function providing the bulk of functionality, glimpse_dt(), is exported such that implementing a class-specific pillar::glimpse() function for other classes that represent tabular data is straightforward.
Usage
## S3 method for class 'prt'
glimpse(x, width = NULL, ...)

glimpse_dt(x, width = NULL)

str_sum(x)

## S3 method for class 'prt'
str(object, ...)

str_dt(x, ...)
Arguments
x | An object to glimpse at. |
width | Width of output: defaults to the setting of the |
... | Unused, for extensibility. |
object | any R object about which you want to have some information. |
Details
Alongside a prt-specific pillar::glimpse() method, a str() method is provided as well for prt objects. However, breaking with base R expectations, it is not the structure of the object in terms of R objects that is shown, but in the same spirit as pillar::glimpse() it is the structure of the data that is printed. How this data is represented with respect to R objects is abstracted away so as to show output as would be expected if the data were represented by a data.frame.
In similar spirit as format_dt() and glimpse_dt(), a str_dt() function is exported which provides the core functionality driving the prt implementation of str(). This function requires availability of a head() function for any object that is passed, and output can be customized by implementing an optional str_sum() function.
Examples
cars <- as_prt(mtcars)

pillar::glimpse(cars)
pillar::glimpse(cars, width = 30)

str(cars)
str(cars, vec.len = 1)

str(unclass(cars))

str_sum(cars)
Methods for creating and inspecting prt objects
Description
The constructor new_prt() creates a prt object from one or several fst files, making sure that each table consists of identically named, ordered and typed columns. In order to create a prt object from an in-memory table, as_prt() coerces objects inheriting from data.frame to prt by first splitting rows into n_chunks, writing fst files to the directory dir and calling new_prt() on the resulting fst files. If this default splitting of rows (which might impact efficiency of subsequent queries on the data) is not optimal, a list of objects inheriting from data.frame is a valid x argument as well.
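As a minimal sketch of the two construction routes described above, the following writes two fst files by hand and passes them to new_prt(), then lets as_prt() work on a list of data.frames. The file names and the choice to split mtcars by cyl are illustrative assumptions, not something the package prescribes.

```r
library(prt)

# Route 1: write the partitions manually, then point new_prt() at the files.
# The directory and file names are arbitrary choices for this sketch.
part_dir <- tempfile()
dir.create(part_dir)
files <- file.path(part_dir, c("first.fst", "second.fst"))

fst::write_fst(mtcars[1:16, ], files[1])
fst::write_fst(mtcars[17:32, ], files[2])

cars <- new_prt(files)
n_part(cars)
nrow(cars)

# Route 2: control the partitioning by passing a list of data.frames,
# here one partition per number of cylinders.
by_cyl <- as_prt(split(mtcars, mtcars$cyl))
n_part(by_cyl)
part_nrow(by_cyl)
```

Choosing partitions along a variable that is frequently used for filtering is one plausible reason to prefer the list interface over the default row splitting.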
Usage
new_prt(files)

as_prt(x, n_chunks = NULL, dir = tempfile())

is_prt(x)

n_part(x)

part_nrow(x)

## S3 method for class 'prt'
head(x, n = 6L, ...)

## S3 method for class 'prt'
tail(x, n = 6L, ...)

## S3 method for class 'prt'
as.data.table(x, ...)

## S3 method for class 'prt'
as.list(x, ...)

## S3 method for class 'prt'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S3 method for class 'prt'
as.matrix(x, ...)
Arguments
files | Character vector of file name(s). |
x | A |
n_chunks | Count variable specifying the number of chunks |
dir | Directory where the chunked |
n | Count variable indicating the number of rows to return. |
... | Generic consistency: additional arguments are ignored and a warning is issued. |
row.names, optional | Generic consistency: passing anything other than the default value issues a warning. |
Details
To check whether an object inherits from prt, the function is_prt() is exported; the number of partitions can be queried by calling n_part() and the number of rows per partition is available as part_nrow().
The base R S3 generic functions dim(), length(), dimnames() and names() have prt-specific implementations, where dim() returns the overall table dimensions, length() is synonymous with ncol(), dimnames() returns a length 2 list containing NULL and the column names as a character vector, and names() is synonymous with colnames(). Neither setting nor getting row names on prt objects is supported and, more generally, calling replacement functions such as names<-() or dimnames<-() leads to an error, as prt objects are immutable. The base R S3 generic functions head() and tail() are available as well and are used internally to provide an extensible mechanism for printing (see format_dt()).
Coercion to other base R objects is possible via as.list(), as.data.frame() and as.matrix(), and for coercion to data.table, its generic function data.table::as.data.table() is available to prt objects. All coercion involves reading the full data into memory at once, which might be problematic in cases of large data sets.
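The behaviour of the dimension helpers and the immutability of prt objects can be illustrated with a short sketch. The tryCatch() wrapper and the variable name attempt are only there for demonstration; the error itself is what the text above describes.

```r
library(prt)

cars <- as_prt(mtcars, n_chunks = 2L)

dim(cars)       # overall dimensions across all partitions
length(cars)    # number of columns, same as ncol(cars)
names(cars)     # column names, same as colnames(cars)
dimnames(cars)  # length 2 list, no row names

# Replacement functions are expected to fail because prt objects are
# immutable; the error is caught here so the script keeps running.
attempt <- tryCatch(
  {
    names(cars) <- toupper(names(cars))
    "modified"
  },
  error = function(e) "error"
)
attempt
```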
Examples
cars <- as_prt(mtcars, n_chunks = 2L)

is_prt(cars)
n_part(cars)
part_nrow(cars)

nrow(cars)
ncol(cars)

colnames(cars)
names(cars)

head(cars)
tail(cars, n = 2)

str(as.list(cars))
str(as.data.frame(cars))
NSE subsetting
Description
A cornerstone feature of prt is the ability to load a (small) subset of rows (or columns) from a much larger tabular dataset. In order to specify such a subset, an implementation of the base R S3 generic function subset() is provided, driving the non-standard evaluation (NSE) of an expression within the context of the data (with similar semantics as the base R implementation for data.frames).
Usage
## S3 method for class 'prt'
subset(x, subset, select, part_safe = FALSE, drop = FALSE, ...)

subset_quo(
  x,
  subset = NULL,
  select = NULL,
  part_safe = FALSE,
  env = parent.frame()
)
Arguments
x | object to be subsetted. |
subset | logical expression indicating elements or rows to keep: missing values are taken as false. |
select | expression, indicating columns to select from a data frame. |
part_safe | Logical flag indicating whether the |
drop | passed on to |
... | further arguments to be passed to or from other methods. |
env | The environment in which |
Details
The functions powering NSE are rlang::enquo(), which quotes the subset and select arguments, and rlang::eval_tidy(), which evaluates the expressions. This allows for some rlang-specific features to be used, such as the .data/.env pronouns, or the double-curly brace forwarding operator. For some example code, please refer to vignette("prt", package = "prt").
While the function subset() quotes the arguments passed as subset and select, the function subset_quo() can be used to operate on already quoted expressions. A final noteworthy departure from the base R interface is the part_safe argument: this logical flag indicates whether it is safe to evaluate the expression on partitions individually or whether dependencies between partitions prevent this from yielding correct results. As it is not straightforward to determine from the expression alone whether such dependencies exist, the default is FALSE, which in many cases will result in a less efficient resolution of the row selection, and it is up to the user to enable this optimization.
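To make the rlang features mentioned above more concrete, here is a small sketch of forwarding a filter condition through a wrapper function with the double-curly operator, alongside the .data/.env pronouns. The helper keep_rows() and the variable min_hp are hypothetical names introduced for illustration only.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# Hypothetical helper: forwards the unevaluated condition to subset()
# using the double-curly brace operator.
keep_rows <- function(data, cond) {
  subset(data, {{ cond }})
}

six <- keep_rows(dat, cyl == 6)
nrow(six)

# The pronouns disambiguate between columns (.data) and variables in the
# calling environment (.env).
min_hp <- 110
strong <- subset(dat, .data$hp > .env$min_hp)
nrow(strong)
```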
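The distinction drawn by part_safe can be sketched with two kinds of conditions. A purely row-wise condition gives the same rows whether it is evaluated per partition or on the whole table, whereas a condition involving an aggregate such as mean() depends on which rows the aggregate sees. The variable names below are illustrative assumptions.

```r
library(prt)

dat <- as_prt(mtcars, n_chunks = 2L)

# Row-wise condition: each row is judged on its own values, so evaluating
# per partition is safe.
fast <- subset(dat, cyl == 6 & hp > 110, part_safe = TRUE)
slow <- subset(dat, cyl == 6 & hp > 110)
identical(fast, slow)

# Aggregate condition: mean(qsec) should be taken over all rows, so the
# default part_safe = FALSE is the appropriate setting here.
above_mean <- subset(dat, qsec > mean(qsec), part_safe = FALSE)
nrow(above_mean)
```

With part_safe = TRUE the second query would compare each row against the mean of its own partition, which is generally not what is intended.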
Examples
dat <- as_prt(mtcars, n_chunks = 2L)

subset(dat, cyl == 6)
subset(dat, cyl == 6 & hp > 110)

colnames(subset(dat, select = mpg:hp))
colnames(subset(dat, select = -c(vs, am)))

sub_6 <- subset(dat, cyl == 6)

thresh <- 6
identical(subset(dat, cyl == thresh), sub_6)
identical(subset(dat, cyl == .env$thresh), sub_6)

cyl <- 6
identical(subset(dat, cyl == cyl), data.table::as.data.table(dat))
identical(subset(dat, cyl == !!cyl), sub_6)
identical(subset(dat, .data$cyl == .env$cyl), sub_6)

expr <- quote(cyl == 6)

# passing a quoted expression to subset() will yield an error
## Not run: 
subset(dat, expr)

## End(Not run)

identical(subset_quo(dat, expr), sub_6)

identical(
  subset(dat, qsec > mean(qsec), part_safe = TRUE),
  subset(dat, qsec > mean(qsec), part_safe = FALSE)
)
Printing prt
Description
Printing of prt objects combines the concise yet informative design of only showing as many columns as the terminal width allows for, introduced by tibble, with the data.table approach of showing both the first and last few rows of a table. Implementation-wise, the interface is designed to mimic that of tibble printing as closely as possible, offering the same function arguments and using the same option settings (and default values) as introduced by tibble.
Usage
## S3 method for class 'prt'
print(x, ..., n = NULL, width = NULL, max_extra_cols = NULL)

## S3 method for class 'prt'
format(x, ..., n = NULL, width = NULL, max_extra_cols = NULL)

format_dt(
  x,
  ...,
  n = NULL,
  width = NULL,
  max_extra_cols = NULL,
  max_footer_lines = NULL
)

trunc_dt(...)
Arguments
x | Object to format or print. |
... | These dots are for future extensions and must be empty. |
n | Number of rows to show. If |
width | Width of text output to generate. This defaults to |
max_extra_cols | Number of extra columns to print abbreviated information for,if the width is too small for the entire tibble. If |
max_footer_lines | Maximum number of footer lines. If |
Details
While the function tibble::trunc_mat() does most of the heavy lifting for formatting tibble printing output, prt exports the function trunc_dt(), which drives analogous functionality while adding the top/bottom n row concept. This function can be used for creating print() methods for other classes which represent tabular data, given that this class implements dim(), utils::head() and utils::tail() (and optionally pillar::tbl_sum()) methods. For an example of this, see vignette("prt", package = "prt").
The following session options are set by tibble and are respected by prt, as well as any other package that were to call trunc_dt():
tibble.print_max: Row number threshold: Maximum number of rows printed. Set to Inf to always print all rows. Default: 20.
tibble.print_min: Number of rows printed if row number threshold is exceeded. Default: 10.
tibble.width: Output width. Default: NULL (use width option).
tibble.max_extra_cols: Number of extra columns printed in reduced form. Default: 100.
Both tibble and prt rely on pillar for formatting columns and therefore, the following options set by pillar are applicable to prt printing as well.
Options for the pillar package
width: The width option controls the output width. Setting options(pillar.width = ) to a larger value will lead to printing in multiple tiers (stacks).
pillar.print_max: Maximum number of rows printed, default: 20. Set to Inf to always print all rows. For compatibility reasons, getOption("tibble.print_max") and getOption("dplyr.print_max") are also consulted, this will be soft-deprecated in pillar v2.0.0.
pillar.print_min: Number of rows printed if the table has more than print_max rows, default: 10. For compatibility reasons, getOption("tibble.print_min") and getOption("dplyr.print_min") are also consulted, this will be soft-deprecated in pillar v2.0.0.
pillar.width: Output width. Default: NULL (use getOption("width")). This can be larger than getOption("width"), in this case the output of the table's body is distributed over multiple tiers for wide tibbles. For compatibility reasons, getOption("tibble.width") and getOption("dplyr.width") are also consulted, this will be soft-deprecated in pillar v2.0.0.
pillar.max_footer_lines: The maximum number of lines in the footer, default: 7. Set to Inf to turn off truncation of footer lines. The max_extra_cols option still limits the number of columns printed.
pillar.max_extra_cols: The maximum number of columns printed in the footer, default: 100. Set to Inf to show all columns. Set the more predictable max_footer_lines to control the number of footer lines instead.
pillar.bold: Use bold font, e.g. for column headers? This currently defaults to FALSE, because many terminal fonts have poor support for bold fonts.
pillar.subtle: Use subtle style, e.g. for row numbers and data types? Default: TRUE.
pillar.subtle_num: Use subtle style for insignificant digits? Default: FALSE, is also affected by the subtle option.
pillar.neg: Highlight negative numbers? Default: TRUE.
pillar.sigfig: The number of significant digits that will be printed and highlighted, default: 3. Set the subtle option to FALSE to turn off highlighting of significant digits.
pillar.min_title_chars: The minimum number of characters for the column title, default: 20. Column titles may be truncated up to that width to save horizontal space. Set to Inf to turn off truncation of column titles.
pillar.min_chars: The minimum number of characters wide to display character columns, default: 3. Character columns may be truncated up to that width to save horizontal space. Set to Inf to turn off truncation of character columns.
pillar.max_dec_width: The maximum allowed width for decimal notation, default: 13.
pillar.bidi: Set to TRUE for experimental support for bidirectional scripts. Default: FALSE. When this option is set, "left right override" and "first strong isolate" Unicode controls are inserted to ensure that text appears in its intended direction and that the column headings correspond to the correct columns.
pillar.superdigit_sep: The string inserted between superscript digits and column names in the footnote. Defaults to a "\u200b", a zero-width space, on UTF-8 platforms, and to ": " on non-UTF-8 platforms.
pillar.advice: Should advice be displayed in the footer when columns or rows are missing from the output? Defaults to TRUE for interactive sessions, and to FALSE otherwise.
Examples
cars <- as_prt(mtcars)

print(cars)
print(cars, n = 2)
print(cars, width = 30)
print(cars, width = 30, max_extra_cols = 2)
Subsetting operations
Description
Both single-element subsetting via [[ and $, as well as multi-element subsetting via [ are available for prt objects. Subsetting semantics are modeled after those of the tibble class, with the main difference being that where tibble returns tibble objects, prt returns data.tables. Differences to base R include that partial column name matching for $ is not allowed and coercion to lower dimensions for [ is always disabled by default. As prt objects are immutable, all subset-replace functions ([[<-, $<- and [<-) yield an error when passed a prt object.
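The points above can be checked with a brief sketch: row/column subsetting yields a data.table, selecting a single column with [ keeps two dimensions by default, and replacement is rejected. The tryCatch() wrapper and the name attempt are illustrative additions.

```r
library(prt)

dat <- as_prt(mtcars)

# Multi-element subsetting returns a data.table rather than a prt or tibble.
first_rows <- dat[1:3, c("mpg", "cyl")]
data.table::is.data.table(first_rows)

# Selecting one column with [ does not drop to a vector by default.
dim(dat[, "mpg"])

# Single-element subsetting extracts the column itself.
identical(dat$mpg, dat[["mpg"]])

# Subset-replace functions are expected to fail; the error is caught so
# the script continues.
attempt <- tryCatch(
  {
    dat$mpg <- 0
    "modified"
  },
  error = function(e) "error"
)
attempt
```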
Usage
## S3 method for class 'prt'
x[[i, j, ..., exact = TRUE]]

## S3 method for class 'prt'
x$name

## S3 method for class 'prt'
x[i, j, drop = FALSE]
Arguments
x | A |
i,j | Row/column indexes. If |
... | Generic compatibility: any further arguments are ignored. |
exact | Generic compatibility: only the default value of |
name | a literal character string or a name (possibly backtick quoted). |
drop | Coerce to a vector if fetching one column via |
Examples
dat <- as_prt(mtcars)

identical(dat$mpg, dat[["mpg"]])

dat$mp
mtcars$mp

identical(dim(dat["mpg"]), dim(mtcars["mpg"]))
identical(dim(dat[, "mpg"]), dim(mtcars[, "mpg"]))
identical(dim(dat[1L, ]), dim(mtcars[1L, ]))