data: Data Sets

data	R Documentation

Data Sets

Description

Loads specified data sets, or lists the available data sets.

Usage

data(..., list = character(), package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"), envir = .GlobalEnv,
     overwrite = TRUE)

Arguments

`...`	literal character strings or names.
`list`	a character vector.
`package`	a character vector giving the package(s) to look in for data sets, or `NULL`. By default, all packages in the search path are used, then the ‘data’ subdirectory (if present) of the current working directory.
`lib.loc`	a character vector of directory names of R libraries, or `NULL`. The default value of `NULL` corresponds to all libraries currently known.
`verbose`	a logical. If `TRUE`, additional diagnostics are printed.
`envir`	the environment where the data should be loaded.
`overwrite`	logical: should existing objects of the same name in envir be replaced?

Details

Currently, four formats of data files are supported:

files ending ‘.R’ or ‘.r’ are source()d in, with the R working directory changed temporarily to the directory containing the respective file. (data ensures that the utils package is attached, in case it had been run via utils::data.)
files ending ‘.RData’ or ‘.rda’ are load()ed.
files ending ‘.tab’, ‘.txt’ or ‘.TXT’ are read using read.table(..., header = TRUE, as.is=FALSE), and hence result in a data frame.
files ending ‘.csv’ or ‘.CSV’ are read using read.table(..., header = TRUE, sep = ";", as.is=FALSE), and also result in a data frame.

If more than one matching file name is found, the first on this list is used. (Files with extensions ‘.txt’, ‘.tab’ or ‘.csv’ can be compressed, with or without further extension ‘.gz’, ‘.bz2’ or ‘.xz’.)

The data sets to be loaded can be specified as a set of character strings or names, or as the character vector list, or as both.

For each given data set, the first two types (‘.R’ or ‘.r’, and ‘.RData’ or ‘.rda’ files) can create several variables in the load environment, which might all be named differently from the data set. The third and fourth types will always result in the creation of a single variable with the same name (without extension) as the data set.

If no data sets are specified, data lists the available data sets. For each package, it looks for a data index in the ‘Meta’ subdirectory or, if this is not found, scans the ‘data’ subdirectory for data files using list_files_with_type. The information about available data sets is returned in an object of class "packageIQR". The structure of this class is experimental. Where the datasets have a different name from the argument that should be used to retrieve them, the index will have an entry like beaver1 (beavers), which tells us that dataset beaver1 can be retrieved by the call data(beavers).
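As a hedged illustration of the index, and of a data set whose objects are named differently from it, the sketch below lists the data sets in the datasets package and then loads beavers into a scratch environment. It assumes that the "packageIQR" object carries a character matrix named results with an "Item" column; since the structure of the class is experimental, that layout may change.

```r
# List the data sets in the 'datasets' package (no data set names given).
idx <- data(package = "datasets")
class(idx)

# Assumption: the listing lives in idx$results, a character matrix with an
# "Item" column. An entry such as "beaver1 (beavers)" names the object first
# and the argument to pass to data() in parentheses.
items <- idx$results[, "Item"]
grep("beaver", items, value = TRUE)

# Load into a scratch environment rather than the user's workspace.
e <- new.env()
data("beavers", package = "datasets", envir = e)
ls(e)  # one data set name, more than one object
```

Loading into a fresh environment keeps the demonstration from touching objects in the workspace.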

If lib.loc and package are both NULL (the default), the data sets are searched for in all the currently loaded packages then in the ‘data’ directory (if any) of the current working directory.

If lib.loc = NULL but package is specified as a character vector, the specified package(s) are searched for first amongst loaded packages and then in the default library/ies (see .libPaths).

If lib.loc is specified (and not NULL), packages are searched for in the specified library/ies, even if they are already loaded from another library.

To just look in the ‘data’ directory of the current working directory, set package = character(0) (and lib.loc = NULL, the default).
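A minimal sketch of that last case follows, using a throwaway working directory. The directory and the file name scores.csv are hypothetical, chosen only for illustration. The file is semicolon-separated because ‘.csv’ files are read with sep = ";".

```r
# Build a temporary working directory that has a 'data' subdirectory.
wd <- tempfile("data-demo-")
dir.create(file.path(wd, "data"), recursive = TRUE)

# '.csv' data files are read with sep = ";", so write semicolon-separated text.
writeLines(c("id;score", "1;2.5", "2;4.0"),
           file.path(wd, "data", "scores.csv"))

# Search only the 'data' directory of the current working directory,
# restoring the previous working directory afterwards.
old <- setwd(wd)
e <- new.env()
found <- tryCatch(
  data("scores", package = character(0), envir = e),
  finally = setwd(old)
)

str(e$scores)  # a data frame named after the file, without the extension
```

A file written with commas would not fail here, but would be read as a single column, so the separator is worth checking when preparing such files.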

Value

A character vector of all data sets specified (whether found or not), or information about all available data sets in an object of class "packageIQR" if none were specified.
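The "whether found or not" part is easy to overlook, so here is a small sketch. The name no_such_dataset is a deliberately non-existent, hypothetical data set. The call is wrapped in suppressWarnings() because a missing data set is reported with a warning.

```r
e <- new.env()

# Request one real data set and one that does not exist.
res <- suppressWarnings(
  data(list = c("USArrests", "no_such_dataset"),
       package = "datasets", envir = e)
)

res    # every requested name is returned, found or not
ls(e)  # only the data set that was actually found has been created
```

Code that needs to know whether loading succeeded should therefore check the target environment rather than rely on the return value.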

Good practice

There is no requirement for data(foo) to create an object named foo (nor to create one object), although it much reduces confusion if this convention is followed (and it is enforced if datasets are lazy-loaded).

data() was originally intended to allow users to load datasets from packages for use in their examples, and as such it loaded the datasets into the workspace .GlobalEnv. This avoided having large datasets in memory when not in use: that need has been almost entirely superseded by lazy-loading of datasets.

The ability to specify a dataset by name (without quotes) is a convenience: in programming the datasets should be specified by character strings (with quotes).

Use of data within a function without an envir argument has the almost always undesirable side-effect of putting an object in the user's workspace (and indeed, of replacing any object of that name already there). It would almost always be better to put the object in the current evaluation environment by data(..., envir = environment()). However, two alternatives are usually preferable, both described in the ‘Writing R Extensions’ manual.

For sets of data, set up a package to use lazy-loading of data.
For objects which are system data, for example lookup tables used in calculations within the function, use a file ‘R/sysdata.rda’ in the package sources or create the objects by R code at package installation time.

A sometimes important distinction is that the second approach places objects in the namespace but the first does not. So if it is important that the function sees mytable as an object from the package, it is system data and the second approach should be used. In the unusual case that a package uses a lazy-loaded dataset as a default argument to a function, that needs to be specified by ::, e.g., survival::survexp.us.
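Where data() does have to be called inside a function, the envir = environment() form mentioned above can be sketched as follows. The function name mean_murder_rate is hypothetical. The data set is given as a quoted string, as recommended for programming use.

```r
# Hypothetical helper: loads USArrests into the function's own evaluation
# environment, so nothing is created or replaced in the user's workspace.
mean_murder_rate <- function() {
  data("USArrests", package = "datasets", envir = environment())
  mean(USArrests$Murder)
}

rate <- mean_murder_rate()

# The workspace itself holds no copy of USArrests after the call
# (assuming none was there beforehand).
exists("USArrests", envir = globalenv(), inherits = FALSE)
```

The object loaded this way disappears with the function's environment when the call returns.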

Warning

This function creates objects in the envir environment (by default the user's workspace), replacing any which already existed. data("foo") can silently create objects other than foo: there have been instances in published packages where it created/replaced .Random.seed and hence changed the seed for the session.
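One way to guard against accidental replacement is the overwrite argument. The sketch below is a hedged illustration: it assumes an R version that supports overwrite and that a skipped object is reported with a warning, which is why the call is wrapped in suppressWarnings().

```r
e <- new.env()

# An existing object that happens to share its name with a data set.
assign("USArrests", "my own object", envir = e)

# With overwrite = FALSE the existing object is left in place.
suppressWarnings(
  data("USArrests", package = "datasets", envir = e, overwrite = FALSE)
)

e$USArrests  # still the character string assigned above
```

Combining overwrite = FALSE with a dedicated envir keeps both the workspace and any pre-existing objects out of harm's way.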

Note

One can take advantage of the search order and the fact that a ‘.R’ file will change directory. If raw data are stored in ‘mydata.txt’ then one can set up ‘mydata.R’ to read ‘mydata.txt’ and pre-process it, e.g., using transform(). For instance one can convert numeric vectors to factors with the appropriate labels. Thus, the ‘.R’ file can effectively contain a metadata specification for the plaintext formats.

In older versions of R, up to 3.6.x, both package = "base" and package = "stats" were using package = "datasets" (with a warning), as before 2004 (most of) the datasets in datasets were either in base or stats. For these packages, the result is now empty as they contain no data sets.
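The ‘mydata.R’ and ‘mydata.txt’ arrangement can be sketched as below, in a temporary working directory. The column names and factor labels are invented for illustration. Because ‘.R’ precedes ‘.txt’ in the list of supported formats, data("mydata") runs the script, which in turn reads the raw file by its bare name.

```r
wd <- tempfile("data-note-")
dir.create(file.path(wd, "data"), recursive = TRUE)

# Raw data: whitespace-separated, with 'group' coded numerically as 1/2.
writeLines(c("group value", "1 10", "2 20", "1 30"),
           file.path(wd, "data", "mydata.txt"))

# The '.R' file is sourced with the working directory changed to 'data',
# so it can refer to 'mydata.txt' without a path.
writeLines(c(
  'mydata <- read.table("mydata.txt", header = TRUE)',
  'mydata <- transform(mydata,',
  '  group = factor(group, levels = 1:2, labels = c("control", "treated")))'
), file.path(wd, "data", "mydata.R"))

# Load from the 'data' directory of the working directory only.
old <- setwd(wd)
e <- new.env()
tryCatch(
  data("mydata", package = character(0), envir = e),
  finally = setwd(old)
)

str(e$mydata)  # 'group' arrives as a labelled factor, not as integers
```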

Examples

require(utils)
data()                         # list all available data sets
try(data(package = "rpart"), silent = TRUE) # list the data sets in the rpart package
data(USArrests, "VADeaths")    # load the data sets 'USArrests' and 'VADeaths'
## Not run: 
## Alternatively
ds <- c("USArrests", "VADeaths"); data(list = ds)
## End(Not run)
help(USArrests)                # give information on data set 'USArrests'
