Importing data.table

2025-07-07

This document focuses on using data.table as a dependency in other R packages. If you are interested in using data.table C code from a non-R application, or in calling its C functions directly, jump to the sections on importing C routines and on non-R applications near the end of this vignette.

Importing data.table is no different from importing other R packages. This vignette is meant to answer the most common questions arising around that subject; the lessons presented here can be applied to other R packages.

Why import data.table

One of the biggest features of data.table is its concise syntax, which makes exploratory analysis faster and easier to write and read; this convenience can drive package authors to use data.table. Another, perhaps more important, reason is high performance. When outsourcing heavy computing tasks from your package to data.table, you usually get top performance without needing to reinvent any of the numerical optimization tricks on your own.

Importing data.table is easy

It is very easy to use data.table as a dependency because data.table does not have any dependencies of its own. This applies both to operating system and to R dependencies. It means that if you have R installed on your machine, it already has everything needed to install data.table. It also means that adding data.table as a dependency of your package will not result in a chain of other recursive dependencies to install, making it very convenient for offline installation.

DESCRIPTION file

The first place to define a dependency in a package is the DESCRIPTION file. Most commonly, you will need to add data.table under the Imports: field. Doing so will necessitate an installation of data.table before your package can compile/install. As mentioned above, no other packages will be installed because data.table does not have any dependencies of its own. You can also specify the minimum required version of a dependency; for example, if your package uses the fwrite function, which was introduced in data.table version 1.9.8, you should declare this as Imports: data.table (>= 1.9.8). This ensures that the installed version of data.table is 1.9.8 or later before your users can install your package. Besides the Imports: field, you can also use Depends: data.table, but we strongly discourage this approach (and may disallow it in future) because it loads data.table into your user’s workspace; i.e. it enables data.table functionality in your user’s scripts without them requesting it. Imports: is the proper way to use data.table within your package without inflicting data.table on your user. In fact, we hope the Depends: field is eventually deprecated in R, since this is true for all packages.
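A version requirement such as (>= 1.9.8) is compared component by component rather than as text, which matters once a component reaches two digits. A minimal sketch using base R's package_version(); the version numbers other than 1.9.8 are only illustrative:

```r
# Version requirements are compared numerically, component by component,
# not as character strings.
required <- package_version("1.9.8")

package_version("1.10.4") >= required   # TRUE: 10 > 9 in the second component
package_version("1.9.6")  >= required   # FALSE: 6 < 8 in the third component

# A plain string comparison gets the first case wrong,
# because the character "1" sorts before "9".
"1.10.4" >= "1.9.8"                     # FALSE
```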

NAMESPACE file

The next thing is to define what content of data.table your package is using. This needs to be done in the NAMESPACE file. Most commonly, package authors will want to use import(data.table), which will import all exported (i.e., listed in data.table’s own NAMESPACE file) functions from data.table.

You may also want to use just a subset of data.table functions; for example, some packages may simply make use of data.table’s high-performance CSV reader and writer, for which you can add importFrom(data.table, fread, fwrite) in your NAMESPACE file. It is also possible to import all functions from a package excluding particular ones using import(data.table, except=c(fread, fwrite)).
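Before listing names in an importFrom() directive, it can help to confirm that they are actually exported. A small sketch, assuming data.table is installed; getNamespaceExports() is base R, and the vector wanted is just an illustrative selection:

```r
# Names exported by data.table, i.e. what import(data.table) would make
# available inside your package.
exported <- getNamespaceExports("data.table")

# Candidates for an importFrom(data.table, ...) directive.
wanted <- c("fread", "fwrite", ":=", ".N")

# One logical per candidate; any FALSE means the name cannot be imported.
wanted %in% exported
```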

Be sure to also read the note about non-standard evaluation in data.table in the section on “undefined globals”.

Usage

As an example, we will define two functions in a package, a.pkg, that uses data.table. One function, gen, will generate a simple data.table; another, aggr, will do a simple aggregation of it.

    gen = function (n = 100L) {
      dt = as.data.table(list(id = seq_len(n)))
      dt[, grp := ((id - 1) %% 26) + 1
         ][, grp := letters[grp]
           ][]
    }
    aggr = function (x) {
      stopifnot(
        is.data.table(x),
        "grp" %in% names(x)
      )
      x[, .N, by = grp]
    }
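To see what the two functions produce, they can be tried outside a package in an interactive session. The sketch below repeats the definitions so that it is self-contained and attaches data.table directly, which stands in for the package's imports:

```r
library(data.table)

gen = function (n = 100L) {
  dt = as.data.table(list(id = seq_len(n)))
  dt[, grp := ((id - 1) %% 26) + 1
     ][, grp := letters[grp]
       ][]
}
aggr = function (x) {
  stopifnot(is.data.table(x), "grp" %in% names(x))
  x[, .N, by = grp]
}

dt  = gen()        # 100 rows: id 1..100, grp cycling through a..z
res = aggr(dt)     # one row per letter with its count N

nrow(res)            # 26
res[grp == "a", N]   # 4 (ids 1, 27, 53, 79)
res[grp == "z", N]   # 3 (ids 26, 52, 78)
```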

Testing

Be sure to include tests in your package. Before each major release of data.table, we check reverse dependencies. This means that if any changes in data.table would break your code, we will be able to spot breaking changes and inform you before releasing the new version. This of course assumes you will publish your package to CRAN or Bioconductor. The most basic test can be a plaintext R script in your package directory, tests/test.R:

    library(a.pkg)
    dt = gen()
    stopifnot(nrow(dt) == 100)
    dt2 = aggr(dt)
    stopifnot(nrow(dt2) < 100)

When testing your package, you may want to use R CMD check --no-stop-on-test-error, which will continue after an error and run all your tests (as opposed to stopping on the first line of script that failed). Note that this requires R 3.4.0 or greater.

Testing using testthat

It is very common to use the testthat package for tests. Testing a package that imports data.table is no different from testing other packages. An example test script, tests/testthat/test-pkg.R:

    context("pkg tests")

    test_that("generate dt", { expect_true(nrow(gen()) == 100) })
    test_that("aggregate dt", { expect_true(nrow(aggr(gen())) < 100) })

If data.table is in Suggests (but not Imports), then you need to declare .datatable.aware=TRUE in one of the R/* files to avoid “object not found” errors when testing via testthat::test_package or testthat::test_check.

Dealing with “undefined global functions or variables”

data.table’s use of R’s deferred evaluation (especially on the left-hand side of :=) is not well recognised by R CMD check. This results in NOTEs like the following during package check:

    * checking R code for possible problems ... NOTE
    aggr: no visible binding for global variable 'grp'
    gen: no visible binding for global variable 'grp'
    gen: no visible binding for global variable 'id'
    Undefined global functions or variables:
    grp id

The easiest way to deal with this is to pre-define those variables within your package and set them to NULL, optionally adding a comment (as is done in the refined version of gen below). When possible, you could also use a character vector instead of symbols (as in aggr below):

    gen = function (n = 100L) {
      id = grp = NULL # due to NSE notes in R CMD check
      dt = as.data.table(list(id = seq_len(n)))
      dt[, grp := ((id - 1) %% 26) + 1
         ][, grp := letters[grp]
           ][]
    }
    aggr = function (x) {
      stopifnot(
        is.data.table(x),
        "grp" %in% names(x)
      )
      x[, .N, by = "grp"]
    }

The case for data.table’s special symbols (e.g. .SD and .N) and assignment operator (:=) is slightly different (see ?.N for more, including a complete listing of such symbols). You should import whichever of these values you use from data.table’s namespace to protect against any issues arising from the unlikely scenario that we change the exported value of these in the future; e.g., if you want to use .N, .I, and :=, a minimal NAMESPACE would have:

importFrom(data.table, .N, .I, ':=')

Much simpler is to just use import(data.table), which will greedily allow usage in your package’s code of any object exported from data.table.

If you don’t mind having id and grp registered as variables globally in your package namespace, you can use ?globalVariables. Be aware that these notes do not have any impact on the code or its functionality; if you are not going to publish your package, you may simply choose to ignore them.
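A sketch of the globalVariables() route. The file name R/globals.R is only a convention assumed here, and the version guard is the commonly used idiom, since the function was added in R 2.15.1:

```r
# R/globals.R (hypothetical file name)
# Declare the column names used via non-standard evaluation so that
# R CMD check does not report them as undefined globals.
if (getRversion() >= "2.15.1") utils::globalVariables(c("id", "grp"))
```

The declaration applies to the whole package, so a genuinely misspelled id or grp elsewhere in the package would no longer be flagged; the per-function NULL assignment shown earlier is narrower.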

Care needed when providing and using options

Common practice by R packages is to provide customization options set by options(name=val) and fetched using getOption("name", default). Function arguments often specify a call to getOption() so that the user knows (from ?fun or args(fun)) the name of the option controlling the default for that parameter; e.g. fun(..., verbose=getOption("datatable.verbose", FALSE)). All data.table options start with datatable. so as to not conflict with options in other packages. A user simply calls options(datatable.verbose=TRUE) to turn on verbosity. This affects all data.table function calls unless verbose=FALSE is provided explicitly; e.g. fun(..., verbose=FALSE).
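The same pattern can be used for your own package's options. A minimal base R sketch; the function describe and the option name a.pkg.verbose are invented for illustration:

```r
# A hypothetical package function whose default is controlled by an option.
describe = function (x, verbose = getOption("a.pkg.verbose", FALSE)) {
  if (verbose) message("describe(): ", length(x), " element(s)")
  length(x)
}

describe(1:3)                    # silent: the option is unset, so FALSE is used

old = options(a.pkg.verbose = TRUE)
describe(1:3)                    # now emits a message on every call
describe(1:3, verbose = FALSE)   # an explicit argument overrides the option
options(old)                     # restore the previous (unset) state
```

Prefixing the option with the package name, as data.table does with datatable., keeps it from colliding with options of other packages.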

The option mechanism in R is global. This means that if a user sets a data.table option for their own use, that setting also affects code inside any package that uses data.table. For an option like datatable.verbose, this is exactly the desired behavior, since the aim is to trace and log all data.table operations from wherever they originate; turning on verbosity does not affect the results. Another unique-to-R and excellent-for-production option is R’s options(warn=2), which turns all warnings into errors. Again, the aim is to affect any warning in any package so as to not miss any warnings in production. There are 6 datatable.print.* options and 3 optimization options which do not affect the result of operations. However, there is one data.table option that does and is now a concern: datatable.nomatch. This option changes the default join from outer to inner. [Aside: the default join is outer because outer is safer; it doesn’t drop missing data silently; moreover, it is consistent with the base R way of matching by names and indices.] Some users prefer inner join to be the default, and we provided this option for them. However, a user setting this option can unintentionally change the behavior of joins inside packages that use data.table. Accordingly, in v1.12.4 (Oct 2019) a message was printed when the datatable.nomatch option was used, and from v1.14.2 it is ignored with a warning. It was the only data.table option with this concern.
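Inside package code, one way to stay independent of result-changing global settings of this kind is to state the join type explicitly instead of relying on a default. A sketch with made-up data, assuming data.table 1.12.0 or later so that nomatch=NULL is accepted:

```r
library(data.table)

X = data.table(id = 1:3, v = c("a", "b", "c"))
Y = data.table(id = 2:4)

# Default (outer) join: every row of Y is kept;
# id 4 has no match in X, so its v is NA.
outer_res = X[Y, on = "id"]

# Explicit inner join: rows of Y without a match in X are dropped.
inner_res = X[Y, on = "id", nomatch = NULL]

nrow(outer_res)   # 3
nrow(inner_res)   # 2
```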

Troubleshooting

If you face any problems in creating a package that uses data.table, please confirm that the problem is reproducible in a clean R session using the R console: R CMD check package.name.

Some of the most common issues developers face are related to helper tools that are meant to automate some package development tasks, for example, using roxygen to generate your NAMESPACE file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using the R console (run R on the command line) and ensure the import is defined in the DESCRIPTION and NAMESPACE files following the instructions above.

If you are not able to reproduce your problems using the plain R console build and check, you may try to get some support based on past issues we’ve encountered with data.table interacting with helper tools: devtools#192 or devtools#1472.

License

Since version 1.10.5, data.table is licensed under the Mozilla Public License (MPL). The reasons for the change from GPL should be read in full here, and you can read more about MPL on Wikipedia here and here.

Optionally import data.table: Suggests

If you want to use data.table conditionally, i.e., only when it is installed, you should use Suggests: data.table in your DESCRIPTION file instead of Imports: data.table. By default, this definition will not force installation of data.table when installing your package. This also requires you to use data.table conditionally in your package code, which should be done using the ?requireNamespace function. The example below demonstrates conditional use of data.table’s fast CSV writer ?fwrite. If the data.table package is not installed, the much slower base R ?write.table function is used instead.

    my.write = function (x) {
      if(requireNamespace("data.table", quietly=TRUE)) {
        data.table::fwrite(x, "data.csv")
      } else {
        write.table(x, "data.csv")
      }
    }

A slightly more extended version of this would also ensure that the installed version of data.table is recent enough to have the fwrite function available:

    my.write = function (x) {
      if(requireNamespace("data.table", quietly=TRUE) &&
        utils::packageVersion("data.table") >= "1.9.8") {
        data.table::fwrite(x, "data.csv")
      } else {
        write.table(x, "data.csv")
      }
    }

When using a package as a suggested dependency, you should not import it in the NAMESPACE file. Just mention it in the DESCRIPTION file. When using data.table functions in package code (R/* files), you need to use the data.table:: prefix because none of them are imported. When using data.table in package tests (e.g. tests/testthat/test* files), you need to declare .datatable.aware=TRUE in one of the R/* files.

data.table in Imports but nothing imported

Some users may prefer to eschew using importFrom or import in their NAMESPACE file and instead use data.table:: qualification on all internal code (of course keeping data.table under their Imports: in DESCRIPTION).

In this case, the un-exported function [.data.table will revert to calling [.data.frame as a safeguard, since data.table has no way of knowing that the parent package is aware it’s attempting to make calls against the syntax of data.table’s query API (which could lead to unexpected behavior, as the structure of calls to [.data.frame and [.data.table fundamentally differ, e.g. the latter has many more arguments).

If this is nevertheless your preferred approach to package development, please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.

data.table determines on the fly whether the calling function is aware it’s tapping into data.table with the internal cedta function (Calling Environment is Data Table Aware), which, beyond checking the ?getNamespaceImports for your package, also checks the existence of this variable (among other things).
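Put together, a package that keeps data.table in Imports but imports nothing might contain a file like the following. This is a sketch under stated assumptions: the file name and the function count_grp are hypothetical, and data.table must be installed.

```r
# R/aaa.R (hypothetical file name)

# Tell data.table that this package intentionally uses its [ syntax,
# even though NAMESPACE has no import(data.table) or importFrom() line.
.datatable.aware = TRUE

# Hypothetical helper: count rows per value of the column "grp".
# Calls to data.table functions are qualified with data.table::, and the
# grouping column is given as a character vector rather than a symbol.
count_grp = function (x) {
  x = data.table::as.data.table(x)
  x[, .N, by = "grp"]   # .N is supplied by data.table while evaluating j
}
```

Because .N is not imported here, R CMD check may still list it among the undefined globals; the approaches from the section on undefined globals apply to it as well.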

Further information on dependencies

For the canonical documentation on defining package dependencies, see the official manual: Writing R Extensions.

Importing data.table C routines

Some of the internally used C routines are now exported at the C level and can thus be used in R packages directly from their C code. See ?cdt for details, and the “Linking to native routines in other packages” section of Writing R Extensions for usage.

Importing from non-R applications

Some tiny parts of data.table C code were isolated from the R C API and can now be used from non-R applications by linking to .so / .dll files. More concrete details about this will be provided later; for now you can study the C code that was isolated from the R C API in src/fread.c and src/fwrite.c.

How to convert your Depends dependency on data.table to Imports

To convert a Depends dependency on data.table to an Imports dependency in your package, follow these steps:

Step 0. Ensure your package is passing R CMD check initially

Step 1. Update the DESCRIPTION file to put data.table in Imports, not Depends

Before:

    Depends:
        R (>= 3.5.0),
        data.table
    Imports:

After:

    Depends:
        R (>= 3.5.0)
    Imports:
        data.table

Step 2.1: Run R CMD check

Run R CMD check to identify any missing imports or symbols. This step helps detect functions and symbols from data.table that your package uses but does not explicitly import.

Note: Not all such usages are caught by R CMD check. In particular, R CMD check skips some symbols/functions in formulas and will completely miss parsed expressions like parse(text = "data.table(a = 1)"). Packages will need good test coverage to detect these edge cases.
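The blind spot for parsed expressions can be reproduced with codetools, the recommended package whose static analysis R CMD check relies on for its "possible problems" step. A sketch; f_direct and f_parsed are illustrative names, and neither function is called, so data.table does not need to be installed:

```r
library(codetools)

# The call to data.table() is visible in the function body.
f_direct = function () data.table(a = 1)

# The same call is hidden inside a string that is parsed only at run time.
f_parsed = function () eval(parse(text = "data.table(a = 1)"))

findGlobals(f_direct)   # includes "data.table"
findGlobals(f_parsed)   # does not include "data.table"
```

Static analysis only sees eval and parse in the second function, so only a test that actually runs f_parsed would reveal a missing import.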

Step 2.2: Modify the NAMESPACE file

Based on the R CMD check results, ensure all used functions, special symbols, S3 generics, and S4 classes from data.table are imported.

That means adding importFrom(data.table, ...) directives for symbols, functions, and S3 generics, and/or importClassesFrom(data.table, ...) directives for S4 classes as appropriate. See ‘Writing R Extensions’ for full details on how to do so properly.

Blanket import

Alternatively, you can import all functions from data.table at once, though this is generally not recommended:

import(data.table)

Justification for Avoiding Blanket Imports:

  1. Documentation: The NAMESPACE file can serve as good documentation of how you depend on certain packages.
  2. Avoiding Conflicts: Blanket imports leave you open to subtle breakage. For example, if you import(pkgA) and import(pkgB), but later pkgB exports a function also exported by pkgA, this will break your package due to conflicts in your namespace, which is disallowed by R CMD check and CRAN.

Step 3: Update Your R code files outside the package’s R/ directory

When you move a package from Depends to Imports, it will no longer be automatically attached when your package is loaded. This can be important for examples, tests, vignettes, and demos, where Imports packages need to be attached explicitly.

Before (with Depends):

    # data.table functions are directly available
    library(MyPkgDependsDataTable)
    dt <- data.table(x = 1:10, y = letters[1:10])
    setDT(dt)
    result <- merge(dt, other_dt, by = "x")

After (with Imports):

    # Explicitly load data.table in user scripts or vignettes
    library(data.table)
    library(MyPkgDependsDataTable)
    dt <- data.table(x = 1:10, y = letters[1:10])
    setDT(dt)
    result <- merge(dt, other_dt, by = "x")

Benefits of using Imports
