The datapack R package provides an abstraction for collating heterogeneous collections of data objects and metadata into a bundle that can be transported and loaded into a single composite file. The methods in this package provide a convenient way to load data from common repositories such as DataONE into the R environment, and to document, serialize, and save data from R to data repositories worldwide.
Note that this package (‘datapack’) is not related to the similarly named rOpenSci package ‘DataPackageR’. Documentation from the DataPackageR GitHub repository states that “DataPackageR is used to reproducibly process raw data into packaged, analysis-ready datasets.”
The datapack R package requires the R package redland. If you are installing on Ubuntu, the Redland C libraries must be installed before the redland and datapack packages can be installed. If you are installing on Mac OS X or Windows, installing these libraries is not required.
The following instructions illustrate how to install datapack and its requirements.
On Mac OS X, datapack can be installed with the following commands:

    install.packages("datapack")
    library(datapack)

The datapack R package should be available for use at this point.
Note: if you wish to build the required redland package from source before installing datapack, please see the redland installation instructions.
For Ubuntu, install the required Redland C libraries by entering the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get install librdf0 librdf0-dev

Then install the R packages from the R console:

    install.packages("datapack")
    library(datapack)

The datapack R package should be available for use at this point.
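On any platform, one way to confirm that the installation succeeded is to ask R whether both packages can be found. The following is a minimal sketch using only base R. The helper name `check_installed` is hypothetical and is not part of datapack or redland.

```r
# Hypothetical helper (not part of datapack): report which of the
# required packages can be found in the current R library paths.
check_installed <- function(pkgs = c("redland", "datapack")) {
  # requireNamespace() returns TRUE if the package namespace can be
  # loaded and FALSE otherwise, without attaching it to the search path.
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_installed()
print(status)

# List any packages that still need to be installed.
if (!all(status)) {
  message("Missing: ", paste(names(status)[!status], collapse = ", "))
}
```

If redland is reported as missing on Ubuntu, the likely cause is that the Redland C libraries were not installed before the R packages.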
For Windows, the required redland R package is distributed as a binary release, so it is not necessary to install any additional system libraries.
To install the R packages from the R console:

    install.packages("datapack")
    library(datapack)

See the full manual for documentation, but once installed, the package can be run in R using:

    library(datapack)
    help("datapack")

Create a DataPackage and add metadata and data DataObjects to it:

    library(datapack)
    library(uuid)
    dp <- new("DataPackage")

    # Metadata object: the sample EML file shipped with datapack
    mdFile <- system.file("extdata/sample-eml.xml", package="datapack")
    mdId <- paste("urn:uuid:", UUIDgenerate(), sep="")
    md <- new("DataObject", id=mdId, format="eml://ecoinformatics.org/eml-2.1.0", filename=mdFile)
    # addData returns the updated package, so assign the result back to dp
    dp <- addData(dp, md)

    # Science data object: the sample CSV file shipped with datapack
    csvfile <- system.file("extdata/sample-data.csv", package="datapack")
    sciId <- paste("urn:uuid:", UUIDgenerate(), sep="")
    sciObj <- new("DataObject", id=sciId, format="text/csv", filename=csvfile)
    dp <- addData(dp, sciObj)

    # List the identifiers of all objects now in the package
    ids <- getIdentifiers(dp)

Add a relationship to the DataPackage that shows that the metadata describes, or “documents”, the science data:

    dp <- insertRelationship(dp, subjectID=mdId, objectIDs=sciId)
    relations <- getRelationships(dp)

Create a Resource Description Framework (RDF) representation of the relationships in the package:

    serializationId <- paste("resourceMap", UUIDgenerate(), sep="")
    filePath <- file.path(tempdir(), sprintf("%s.rdf", serializationId))
    status <- serializePackage(dp, filePath, id=serializationId, resolveURI="")

Save the DataPackage to a file, using the BagIt packaging format:

    bagitFile <- serializeToBagIt(dp)

Note that the dataone R package can be used to upload a DataPackage to a DataONE Member Node using the uploadDataPackage method. Please see the documentation for the dataone R package, for example:

    vignette("upload-data", package="dataone")

Work on this package was supported by:
Additional support was provided for working group collaboration by the National Center for Ecological Analysis and Synthesis, a Center funded by the University of California, Santa Barbara, and the State of California.