| Version: | 1.0.0 |
| Title: | High Performance Interface to 'GBIF' |
| Description: | A high performance interface to the Global Biodiversity Information Facility, 'GBIF'. In contrast to 'rgbif', which can access small subsets of 'GBIF' data through web-based queries to a central server, 'gbifdb' provides enhanced performance for R users performing large-scale analyses on servers and cloud computing providers, providing full support for arbitrary 'SQL' or 'dplyr' operations on the complete 'GBIF' data tables (now over 1 billion records, and over a terabyte in size). 'gbifdb' accesses a copy of the 'GBIF' data in 'parquet' format, which is already readily available in commercial computing clouds such as the Amazon Open Data portal and the Microsoft Planetary Computer, or can be accessed directly without downloading, or downloaded to any server with suitable bandwidth and storage space. The high-performance techniques for local and remote access are described in https://duckdb.org/why_duckdb and https://arrow.apache.org/docs/r/articles/fs.html, respectively. |
| License: | Apache License (≥ 2) |
| Encoding: | UTF-8 |
| ByteCompile: | true |
| Depends: | R (≥ 4.0) |
| Imports: | arrow (≥ 8.0.0), dplyr, duckdbfs |
| Suggests: | spelling, dbplyr, testthat (≥ 3.0.0), covr, knitr, rmarkdown, minioclient |
| URL: | https://docs.ropensci.org/gbifdb/, https://github.com/ropensci/gbifdb |
| BugReports: | https://github.com/ropensci/gbifdb |
| Language: | en-US |
| RoxygenNote: | 7.2.3 |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2023-10-19 19:47:13 UTC; cboettig |
| Author: | Carl Boettiger |
| Maintainer: | Carl Boettiger <cboettig@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2023-10-19 20:30:03 UTC |
gbifdb: High Performance Interface to 'GBIF'
Description
A high performance interface to the Global Biodiversity Information Facility, 'GBIF'. In contrast to 'rgbif', which can access small subsets of 'GBIF' data through web-based queries to a central server, 'gbifdb' provides enhanced performance for R users performing large-scale analyses on servers and cloud computing providers, providing full support for arbitrary 'SQL' or 'dplyr' operations on the complete 'GBIF' data tables (now over 1 billion records, and over a terabyte in size). 'gbifdb' accesses a copy of the 'GBIF' data in 'parquet' format, which is already readily available in commercial computing clouds such as the Amazon Open Data portal and the Microsoft Planetary Computer, or can be accessed directly without downloading, or downloaded to any server with suitable bandwidth and storage space. The high-performance techniques for local and remote access are described in https://duckdb.org/why_duckdb and https://arrow.apache.org/docs/r/articles/fs.html, respectively.
Author(s)
Maintainer: Carl Boettiger <cboettig@gmail.com> (ORCID)
See Also
Useful links:
Report bugs at https://github.com/ropensci/gbifdb
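A minimal sketch of the two access patterns described above; the remote call assumes a live internet connection, and the column names in the query are standard GBIF occurrence fields:
library(gbifdb)
library(dplyr)
## Stream directly from the public parquet snapshots on AWS:
gbif <- gbif_remote()
## Or download once, then query a local copy:
# gbif_download()
# gbif <- gbif_local()
gbif %>%
  select(species, year) %>%
  head() %>%
  collect()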
Default storage location
Description
The default location can be set with the environment variable GBIF_HOME; otherwise the default provided by tools::R_user_dir() is used.
Usage
gbif_dir()
Value
path to the GBIF home directory.
Examples
gbif_dir()
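A small sketch of overriding the default location through the environment variable:
Sys.setenv(GBIF_HOME = tempdir())
gbif_dir()            # returns the value of GBIF_HOME
Sys.unsetenv("GBIF_HOME")
gbif_dir()            # falls back to the tools::R_user_dir() default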
Download GBIF data using minioclient
Description
Sync a local directory with selected release of the AWS copy of GBIF
Usage
gbif_download(
  version = gbif_version(),
  dir = gbif_dir(),
  bucket = gbif_default_bucket(),
  region = ""
)
Arguments
version | Release date (YYYY-MM-DD) which should be synced. Will detect the latest version by default. |
dir | path to the local directory where parquet files should be stored. Fine to leave at the default; see gbif_dir(). |
bucket | Name of the regional S3 bucket desired. Default is "gbif-open-data-us-east-1". Select a bucket closer to your compute location for improved performance, e.g. European researchers may prefer "gbif-open-data-eu-central-1" etc. |
region | bucket region (usually unnecessary to set; choosing the appropriate bucket name is sufficient) |
Details
Sync parquet files from the GBIF public data catalog, https://registry.opendata.aws/gbif/.
Note that the data can also be found on the Microsoft Cloud, https://planetarycomputer.microsoft.com/dataset/gbif
Also, some users may prefer to download this data using an alternative interface or work on a cloud-hosted machine where the data is already available. Note, these data include all CC0 and CC-BY licensed data in GBIF that have coordinates which passed automated quality checks; see https://github.com/gbif/occurrence/blob/master/aws-public-data.md.
Value
logical indicating success or failure.
Examples
gbif_download()
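A sketch of a more explicit call; the snapshot date and local directory below are illustrative values, not defaults:
## European users may prefer a bucket in their own region:
gbif_download(
  version = "2023-10-01",
  dir = "~/gbif-data",
  bucket = "gbif-open-data-eu-central-1"
)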
Return a path to the directory containing GBIF example parquet data
Description
Return a path to the directory containing GBIF example parquet data
Usage
gbif_example_data()
Details
Example data are taken from the first 1000 rows of the 2021-11-01 release of the parquet data.
Value
path to the example occurrence data installed with the package.
Examples
gbif_example_data()
Local connection to a downloaded GBIF Parquet database
Description
Local connection to a downloaded GBIF Parquet database
Usage
gbif_local(
  dir = gbif_parquet_dir(version = gbif_version(local = TRUE)),
  tblname = "gbif",
  backend = c("arrow", "duckdb"),
  safe = TRUE
)
Arguments
dir | location of downloaded GBIF parquet files |
tblname | name for the database table |
backend | choose duckdb or arrow. |
safe | logical, default TRUE. Should we exclude the columns mediatype and issue? These columns can slow down queries. |
Details
A summary of this GBIF data, along with column meanings, can be found at https://github.com/gbif/occurrence/blob/master/aws-public-data.md
Value
a remote tibble (tbl_sql class object)
Examples
gbif <- gbif_local(gbif_example_data())
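A brief sketch of a dplyr query against the local connection; countrycode and species are standard columns in the GBIF occurrence table:
library(dplyr)
gbif <- gbif_local(gbif_example_data())
gbif %>%
  filter(countrycode == "US") %>%
  count(species, sort = TRUE) %>%
  collect()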
gbif remote
Description
Connect to GBIF remote directly. Can be much faster than downloading for one-off use or when using the package from a server in the same region as the data. See Details.
Usage
gbif_remote(
  version = gbif_version(),
  bucket = gbif_default_bucket(),
  safe = TRUE,
  unset_aws = getOption("gbif_unset_aws", TRUE),
  endpoint_override = Sys.getenv("AWS_S3_ENDPOINT", "s3.amazonaws.com"),
  backend = c("arrow", "duckdb"),
  ...
)
Arguments
version | GBIF snapshot date |
bucket | GBIF bucket name (including region). A default can also be set using the option gbif_default_bucket. |
safe | logical, default TRUE. Should we exclude the columns mediatype and issue? These columns can slow down queries. |
unset_aws | Unset AWS credentials? GBIF is provided in a public bucket, so credentials are not needed, but having an AWS_ACCESS_KEY_ID or other AWS environment variables set can cause the connection to fail. By default, this will unset any such environment variables for the duration of the R session. This behavior can also be turned off globally by setting the option gbif_unset_aws to FALSE. |
endpoint_override | optional parameter to arrow::s3_bucket() |
backend | duckdb or arrow |
... | additional parameters passed to arrow::s3_bucket() |
Details
Query performance is dramatically improved in queries that return only a subset of columns. Consider using explicit select() commands to return only the columns you need.
A summary of this GBIF data, along with column meanings, can be found at https://github.com/gbif/occurrence/blob/master/aws-public-data.md
Value
a remote tibble (tbl_sql class object).
Examples
gbif <- gbif_remote()
gbif
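Echoing the advice in Details, a sketch that selects only the needed columns before filtering; it assumes a live connection, with species and year as standard occurrence columns:
library(dplyr)
gbif <- gbif_remote()
gbif %>%
  select(species, year) %>%
  filter(year >= 2020) %>%
  count(species, sort = TRUE) %>%
  collect()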
Get the latest gbif version string
Description
Can also return latest locally downloaded version, or list all versions
Usage
gbif_version(
  local = FALSE,
  dir = gbif_dir(),
  bucket = gbif_default_bucket(),
  all = FALSE,
  ...
)
Arguments
local | Search only local versions? logical, default FALSE. |
dir | local directory (see gbif_dir()) |
bucket | Which remote bucket (region) should be checked |
all | show all versions? (logical, default FALSE) |
... | additional arguments to arrow::s3_bucket |
Details
A default version can be set using the option gbif_default_version.
Value
the latest available GBIF version, as a string.
Examples
## Latest local version available:
gbif_version(local = TRUE)
## default version
options(gbif_default_version = "2021-01-01")
gbif_version()
## Latest online version available:
gbif_version()
## All online versions:
gbif_version(all = TRUE)