
Title:

Edit and Validate Darwin Core Taxon Data

Version:

2.0.4

Description:

Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core 'Taxon' class <https://dwc.tdwg.org/terms/#taxon>).

License:

MIT + file LICENSE

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.2

Imports:

assertthat, digest, dplyr, glue, purrr, rlang, settings, stringr, tibble

Suggests:

testthat (≥ 3.0.0), mockery, readr, usethis, knitr, rmarkdown, patrick, stringi, english, tidyr, utils, curl, httr

Depends:

R (≥ 4.2.0)

Config/testthat/edition:

URL:

https://docs.ropensci.org/dwctaxon/, https://github.com/ropensci/dwctaxon

BugReports:

https://github.com/ropensci/dwctaxon/issues

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-12-15 02:07:24 UTC; joelnitta

Author:

Joel H. Nitta [aut, cre, cph], Wataru Iwasaki [ctb], Collin Schwantes [rev] (Collin reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>), Stephen Formel [rev] (Stephen reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>)

Maintainer:

Joel H. Nitta <joelnitta@gmail.com>

Repository:

CRAN

Date/Publication:

2025-12-15 10:00:02 UTC

dwctaxon: Edit and Validate Darwin Core Taxon Data

Description


Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core 'Taxon' class <https://dwc.tdwg.org/terms/#taxon>).

Author(s)

Maintainer: Joel H. Nitta <joelnitta@gmail.com> (ORCID) [copyright holder]

Other contributors:

Wataru Iwasaki (ORCID) [contributor]
Collin Schwantes (Collin reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>) [reviewer]
Stephen Formel (Stephen reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>) [reviewer]

Add row(s) to a taxonomic database

Description

Add one or more rows to a taxonomic database in Darwin Core (DwC) format.

Usage

dct_add_row(
  tax_dat,
  taxonID = NULL,
  scientificName = NULL,
  taxonomicStatus = NULL,
  acceptedNameUsageID = NULL,
  acceptedNameUsage = NULL,
  new_dat = NULL,
  fill_taxon_id = dct_options()$fill_taxon_id,
  fill_usage_id = dct_options()$fill_usage_id,
  taxon_id_length = dct_options()$taxon_id_length,
  stamp_modified = dct_options()$stamp_modified,
  stamp_modified_by_id = dct_options()$stamp_modified_by_id,
  stamp_modified_by = dct_options()$stamp_modified_by,
  strict = dct_options()$strict,
  ...
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

taxonID

Character or numeric vector; values to add to the taxonID column. Ignored if new_dat is not NULL.

scientificName

Character vector; values to add to the scientificName column. Ignored if new_dat is not NULL.

taxonomicStatus

Character vector; values to add to the taxonomicStatus column. Ignored if new_dat is not NULL.

acceptedNameUsageID

Character or numeric vector; values to add to the acceptedNameUsageID column. Ignored if new_dat is not NULL.

acceptedNameUsage

Character vector; values to add to the acceptedNameUsage column. Ignored if new_dat is not NULL.

new_dat

A dataframe including columns corresponding to one or more of the above arguments, except for tax_dat. Other DwC terms can also be included as additional columns. All rows in new_dat will be appended to the input data (tax_dat).

fill_taxon_id

Logical vector of length 1; if taxonID is not provided, should values in the taxonID column be filled in by generating them automatically from the scientificName? If the taxonID column does not yet exist it will be created. Default TRUE.

fill_usage_id

Logical vector of length 1; if acceptedNameUsageID is not provided, should values in the acceptedNameUsageID column be filled in by matching acceptedNameUsage to scientificName? If the acceptedNameUsageID column does not yet exist it will be created. Default TRUE.

taxon_id_length

Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default 32.

stamp_modified

Logical vector of length 1; should the modified column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified column does not yet exist it will be created. Default TRUE.

stamp_modified_by_id

Logical vector of length 1; should the modifiedByID column of any newly created or modified row include the ID of the current user? If the modifiedByID column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedByID" is required in extra_cols. The current user ID can be specified with the user_id option. Default FALSE.

stamp_modified_by

Logical vector of length 1; should the modifiedBy column of any newly created or modified row include the name of the current user? If the modifiedBy column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedBy" is required in extra_cols. The current user can be specified with the user_name option. Default FALSE.

strict

Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default FALSE.

...

Additional data to add, specified as sets of named character or numeric vectors; e.g., parentNameUsageID = "6SH4". The name of each must be a valid column name for data in DwC format. Ignored if new_dat is not NULL.

Details

fill_taxon_id and fill_usage_id only act on the newly added data (they do not fill columns in tax_dat).

If "taxonID" is not provided for the new row and fill_taxon_id is TRUE, a value for taxonID will be automatically generated from the md5 hash digest of the scientific name.

To modify settings used for validation if strict is TRUE, use dct_options().
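The md5-based ID generation described above can be illustrated in base R. The helper below, make_taxon_id(), is hypothetical and is not the package's internal code: it hashes the raw UTF-8 bytes of each name and keeps the first taxon_id_length characters. dwctaxon imports the digest package and may hash differently (for example, after serializing the value), so these IDs are not guaranteed to match those produced by dct_add_row().

```r
# Hypothetical helper (not dwctaxon code): derive a taxonID from a scientific
# name by taking the MD5 digest of the name's bytes and truncating it.
make_taxon_id <- function(sci_name, taxon_id_length = 32) {
  stopifnot(taxon_id_length >= 1, taxon_id_length <= 32)
  vapply(sci_name, function(nm) {
    # tools::md5sum() hashes files, so write the name's bytes to a temp file
    f <- tempfile()
    on.exit(unlink(f))
    writeBin(charToRaw(enc2utf8(nm)), f)
    substr(unname(tools::md5sum(f)), 1, taxon_id_length)
  }, character(1), USE.NAMES = FALSE)
}

make_taxon_id("Foogenus barspecies var. bla", taxon_id_length = 8)
```

Because the ID depends only on the name, the same scientificName always yields the same taxonID; truncating with a small taxon_id_length raises the chance that two different names collide.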

Value

Dataframe; taxonomic database in DwC format.

Examples

tibble::tibble(
  taxonID = "123",
  scientificName = "Foogenus barspecies",
  acceptedNameUsageID = NA_character_,
  taxonomicStatus = "accepted"
) |>
  dct_add_row(
    scientificName = "Foogenus barspecies var. bla",
    parentNameUsageID = "123",
    nameAccordingTo = "me",
    strict = TRUE
  )

Check mapping of usage taxonomic IDs

Description

Check that values of terms like 'acceptedNameUsageID' map properly to taxonID in Darwin Core (DwC) taxonomic data.

Usage

dct_check_mapping(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  col_select = "acceptedNameUsageID",
  quiet = dct_options()$quiet
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

on_fail

Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".

on_success

Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".

col_select

Character vector of length 1; the name of the column (DwC term) to check. Default "acceptedNameUsageID".

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.

Details

The following rules are enforced:

Value of taxonID may not be identical to that of the selected column within a single row (in other words, a name cannot be its own accepted name, parent taxon, or basionym).
Every value in the selected column must have a corresponding taxonID.

col_select can take one of the following values:

"acceptedNameUsageID": taxonID corresponding to the accepted name (of a synonym).
"parentNameUsageID": taxonID corresponding to the immediate parent taxon of a name (for example, for a species, this would be the genus).
"originalNameUsageID": taxonID corresponding to the basionym of a name.
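The two mapping rules above can be expressed in a few lines of base R. check_mapping_sketch() is a hypothetical helper for illustration, not the function dct_check_mapping() uses internally; instead of raising an error or warning, it returns the offending rows.

```r
# Hypothetical base-R sketch of the two mapping rules (not dwctaxon code).
# Returns a data frame of offending rows (zero rows if the check passes).
check_mapping_sketch <- function(tax_dat, col_select = "acceptedNameUsageID") {
  ids <- as.character(tax_dat$taxonID)
  usage <- as.character(tax_dat[[col_select]])
  has_usage <- !is.na(usage)
  # Rule 1: a name may not map to itself
  self_map <- has_usage & !is.na(ids) & usage == ids
  # Rule 2: every non-missing value must match some taxonID
  no_match <- has_usage & !(usage %in% ids)
  bad <- self_map | no_match
  data.frame(
    taxonID = ids[bad],
    value = usage[bad],
    reason = ifelse(self_map[bad], "maps to itself", "no matching taxonID")
  )
}

bad_dat <- data.frame(
  taxonID = c("1", "2", "3"),
  acceptedNameUsageID = c(NA, "1", "4"),
  taxonomicStatus = c("accepted", "synonym", "synonym"),
  scientificName = c("Species foo", "Species bar", "Species bat")
)
check_mapping_sketch(bad_dat)
```

Missing values in the selected column are not flagged, since an accepted name normally has no acceptedNameUsageID.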

Value

Depends on the result of the check and on values of on_fail and on_success:

If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure

Examples

# The bad data has an acceptedNameUsageID (third row, "4") that lacks a
# corresponding taxonID
bad_dat <- tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", "4", "synonym", "Species bat"
)
dct_check_mapping(bad_dat, on_fail = "summary", quiet = TRUE)

Check scientificName

Description

Check for correctly formatted scientificName column in Darwin Core taxonomic data.

Usage

dct_check_sci_name(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  quiet = dct_options()$quiet
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

on_fail

Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".

on_success

Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.

Details

The following rules are enforced:

scientificName may not be missing (NA)
scientificName must be unique
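A base-R sketch of these two rules follows. check_sci_name_sketch() is hypothetical and simply reports the positions of missing values and any duplicated names; it is not how dct_check_sci_name() builds its summary. The same pattern applies to the taxonID rules of dct_check_taxon_id().

```r
# Hypothetical sketch (not dwctaxon code): report violations of the
# "non-missing" and "unique" rules for scientificName.
check_sci_name_sketch <- function(tax_dat) {
  sci <- tax_dat$scientificName
  list(
    # Row numbers where scientificName is NA
    missing = which(is.na(sci)),
    # Names that appear more than once (missing values reported separately)
    duplicated = unique(sci[!is.na(sci) & duplicated(sci)])
  )
}

dat <- data.frame(scientificName = c("Species foo", NA, "Species foo"))
check_sci_name_sketch(dat)
```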

Value

Depends on the result of the check and on values of on_fail and on_success:

If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure

Examples

dct_check_sci_name(
  data.frame(scientificName = NA_character_),
  on_fail = "summary", quiet = TRUE
)
dct_check_sci_name(data.frame(scientificName = "a"))

Check that taxonomicStatus is within valid values in Darwin Core taxonomic data

Description

Check that taxonomicStatus is within valid values in Darwin Core taxonomic data

Usage

dct_check_tax_status(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  valid_tax_status = dct_options()$valid_tax_status,
  quiet = dct_options()$quiet
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

on_fail

Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".

on_success

Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".

valid_tax_status

Character vector of length 1; valid values for taxonomicStatus. Each value must be separated by a comma. Default "accepted, synonym, variant, NA". "NA" indicates that missing (NA) values are valid. Case-sensitive (see Examples).

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.
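The comma-separated format of valid_tax_status can be illustrated with two hypothetical helpers, parse_valid_status() and invalid_status(). This is a sketch of the documented rules (comma-separated values, "NA" meaning that missing values are allowed, case-sensitive matching), not dwctaxon's own parser; in particular, trimming the spaces around each comma is an assumption.

```r
# Hypothetical sketch (not dwctaxon code): split a valid_tax_status string into
# allowed values and a flag saying whether missing (NA) values are allowed.
parse_valid_status <- function(valid_tax_status) {
  vals <- trimws(strsplit(valid_tax_status, ",", fixed = TRUE)[[1]])
  list(values = setdiff(vals, "NA"), allow_na = "NA" %in% vals)
}

# Return the distinct taxonomicStatus values that are not allowed
invalid_status <- function(status, valid_tax_status) {
  v <- parse_valid_status(valid_tax_status)
  # %in% is case-sensitive, matching the documented behaviour
  bad <- ifelse(is.na(status), !v$allow_na, !(status %in% v$values))
  unique(status[bad])
}

invalid_status(c("accepted", "synonym", "foo", NA), "accepted, synonym, variant, NA")
invalid_status(c("Accepted", NA), "accepted, synonym")
```

In the second call both values are rejected: "Accepted" differs in case, and NA is not listed as valid.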

Value

Depends on the result of the check and on values of on_fail and on_success:

If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure

References

https://dwc.tdwg.org/terms/#dwc:taxonomicStatus

Examples

# The bad data has a taxonomicStatus (third row, "foo") that is not
# a valid value
bad_dat <- tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", NA, "foo", "Species bat"
)
dct_check_tax_status(bad_dat, on_fail = "summary", quiet = TRUE)

# Example of setting valid values of taxonomicStatus via dct_options()
# First store existing settings, including any changes made by the user
old_settings <- dct_options()
# Change options for valid_tax_status
dct_options(valid_tax_status = "provisionally accepted, synonym, NA")
tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "provisionally accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", NA, NA, "Strange name"
) |>
  dct_check_tax_status()
# Reset options to those before this example was run
do.call(dct_options, old_settings)

Check taxonID

Description

Check for correctly formatted taxonID column in Darwin Core taxonomic data.

Usage

dct_check_taxon_id(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  quiet = dct_options()$quiet
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

on_fail

Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".

on_success

Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.

Details

The following rules are enforced:

taxonID may not be missing (NA)
taxonID must be unique

Value

Depends on the result of the check and on values of on_fail and on_success:

If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure

Examples

dct_check_taxon_id(
  data.frame(taxonID = NA_character_),
  on_fail = "summary", quiet = TRUE
)
dct_check_taxon_id(data.frame(taxonID = 1))

Drop row(s) of a taxonomic database

Description

Drop one or more rows from a taxonomic database in Darwin Core (DwC) format by taxonID or scientificName.

Usage

dct_drop_row(tax_dat, taxonID = NULL, scientificName = NULL)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

taxonID

Character or numeric vector; taxonID of the row(s) to be dropped.

scientificName

Character vector; scientificName of the row(s) to be dropped.

Details

Only works if values of taxonID or scientificName are unique and non-missing in the taxonomic database (tax_dat).

Either taxonID or scientificName should be provided, but not both.
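The constraints above can be sketched in base R. drop_rows_sketch() is a hypothetical helper, not the dwctaxon implementation: it stops if both or neither selectors are given, or if the selector column has missing or duplicated values, and otherwise keeps every row whose key is not listed.

```r
# Hypothetical sketch (not dwctaxon code) of dropping rows by taxonID or
# scientificName under the documented constraints.
drop_rows_sketch <- function(tax_dat, taxonID = NULL, scientificName = NULL) {
  # Exactly one selector must be supplied
  if (is.null(taxonID) == is.null(scientificName)) {
    stop("Provide either taxonID or scientificName, but not both")
  }
  col <- if (!is.null(taxonID)) "taxonID" else "scientificName"
  key <- as.character(tax_dat[[col]])
  # The selector column must be unique and non-missing
  if (anyNA(key) || anyDuplicated(key) > 0) {
    stop(col, " must be unique and non-missing")
  }
  to_drop <- as.character(c(taxonID, scientificName))
  tax_dat[!(key %in% to_drop), , drop = FALSE]
}

dat <- data.frame(
  taxonID = c("1", "2", "3"),
  scientificName = c("Species foo", "Species bar", "Species bat")
)
drop_rows_sketch(dat, scientificName = c("Species foo", "Species bat"))
```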

Value

Dataframe; taxonomic database in DwC format.

Examples

# Can drop rows by scientificName or taxonID
dct_filmies |>
  dct_drop_row(scientificName = "Cephalomanes atrovirens Presl")
dct_filmies |>
  dct_drop_row(taxonID = "54133783")
# Can drop multiple rows at once by providing multiple values for
# scientificName or taxonID
dct_filmies |>
  dct_drop_row(
    scientificName = c(
      "Cephalomanes atrovirens Presl",
      "Trichomanes crassum Copel."
    )
  )
dct_filmies |>
  dct_drop_row(
    taxonID = c(
      "54133783", "54133783"
    )
  )

Fill a column of a taxonomic database

Description

Fill a column in a taxonomic database in Darwin Core (DwC) format.

Usage

dct_fill_col(
  tax_dat,
  fill_to = "acceptedNameUsage",
  fill_from = "scientificName",
  match_to = "taxonID",
  match_from = "acceptedNameUsageID",
  stamp_modified = dct_options()$stamp_modified
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

fill_to

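Character vector of length 1; name of column to fill. If the column does not yet exist it will be created.

fill_from

Character vector of length 1; name of column to copy values from when filling.

match_to

Character vector of length 1; name of column to match to.

match_from

Character vector of length 1; name of column to match from.

stamp_modified

Logical vector of length 1; should the modified column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified column does not yet exist it will be created. Default TRUE.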

Details

Several terms (columns) in DwC format come in pairs of "term" and "termID"; for example, "acceptedNameUsage" and "acceptedNameUsageID", where the first is the value in a human-readable form (in this case, the scientific name of the accepted taxon) and the second is the value used by a machine (in this case, the taxonID of the accepted taxon). Other pairs include "parentNameUsage" and "parentNameUsageID", "scientificName" and "scientificNameID", etc. None are required to be used in a given DwC dataset.

Often when updating data, the user may fill in only one value or the other (e.g., "acceptedNameUsage" or "acceptedNameUsageID"), but not both. The purpose of dct_fill_col() is to fill the missing column.

match_from and match_to are used to locate the values used for filling each cell. The values in the match_to column must be unique.

The default settings are to fill acceptedNameUsage with values from scientificName by matching acceptedNameUsageID to taxonID (see Examples).

When adding timestamps with stamp_modified, any row that differs from the original data (tax_dat) is considered modified. This includes when a new column is added, in which case all rows will be considered modified.
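The default matching can be sketched with base R's match(). fill_col_sketch() is hypothetical: it overwrites the whole fill_to column and adds no modified timestamp, either of which may differ from what dct_fill_col() actually does.

```r
# Hypothetical sketch (not dwctaxon code): fill `fill_to` with values of
# `fill_from` taken from the row whose `match_to` equals this row's `match_from`.
fill_col_sketch <- function(tax_dat,
                            fill_to = "acceptedNameUsage",
                            fill_from = "scientificName",
                            match_to = "taxonID",
                            match_from = "acceptedNameUsageID") {
  keys <- tax_dat[[match_to]]
  # The values in the match_to column must be unique
  if (anyDuplicated(keys) > 0) stop("values in ", match_to, " must be unique")
  # Row in which each match_from value appears in match_to (NA if none)
  idx <- match(tax_dat[[match_from]], keys)
  idx[is.na(tax_dat[[match_from]])] <- NA
  tax_dat[[fill_to]] <- tax_dat[[fill_from]][idx]
  tax_dat
}

dat <- data.frame(
  taxonID = c("1", "2", "3"),
  acceptedNameUsageID = c(NA, "1", "1"),
  scientificName = c("Species foo", "Species bar", "Species bat")
)
fill_col_sketch(dat)
```

Both synonyms pick up "Species foo" as their acceptedNameUsage, while the accepted name (with no acceptedNameUsageID) is left as NA.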

Value

Dataframe; taxonomic database in DwC format.

Examples

# Fill acceptedNameUsage with values from scientificName by
# matching acceptedNameUsageID to taxonID
head(dct_filmies, 5) |>
  dct_fill_col(
    fill_to = "acceptedNameUsage",
    fill_from = "scientificName",
    match_to = "taxonID",
    match_from = "acceptedNameUsageID"
  )

Taxonomic data of filmy ferns

Description

Taxonomic data of filmy ferns (family Hymenophyllaceae) in Darwin Core format. Non-ASCII characters have been converted to ASCII, so some author names may not be as expected. Meant for demonstration purposes only, not formal data analysis.

Usage

dct_filmies

Format

Dataframe (tibble), with 2451 rows and 5 columns. For details about data format, see https://dwc.tdwg.org/terms/#taxon.

Details

Modified from data downloaded from the Catalog of Life under the Creative Commons Attribution (CC BY) 4.0 license.

Source

https://www.catalogueoflife.org/

Examples

dct_filmies

Modify row(s) of a taxonomic database

Description

Modify one or more rows in a taxonomic database in Darwin Core (DwC) format.

Usage

dct_modify_row(
  tax_dat,
  taxonID = NULL,
  scientificName = NULL,
  taxonomicStatus = NULL,
  acceptedNameUsageID = NULL,
  acceptedNameUsage = NULL,
  clear_usage_id = dct_options()$clear_usage_id,
  clear_usage_name = dct_options()$clear_usage_name,
  fill_usage_name = dct_options()$fill_usage_name,
  remap_names = dct_options()$remap_names,
  remap_parent = dct_options()$remap_parent,
  remap_variant = dct_options()$remap_variant,
  stamp_modified = dct_options()$stamp_modified,
  stamp_modified_by_id = dct_options()$stamp_modified_by_id,
  stamp_modified_by = dct_options()$stamp_modified_by,
  strict = dct_options()$strict,
  quiet = dct_options()$quiet,
  args_tbl = NULL,
  ...
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

taxonID

Character or numeric vector of length 1; taxonID of the row to be modified (the selected row).

scientificName

Character vector of length 1; scientificName of the row to be modified if taxonID is NULL, OR the scientificName to assign to the selected row if taxonID is provided (see Details).

taxonomicStatus

Character vector of length 1; taxonomicStatus to assign to the selected row.

acceptedNameUsageID

Character or numeric vector of length 1; acceptedNameUsageID to assign to the selected row.

acceptedNameUsage

Character vector of length 1; acceptedNameUsage to assign to the selected row.

clear_usage_id

Logical vector of length 1; should acceptedNameUsageID of the selected row be set to NA if the word "accepted" is detected in taxonomicStatus (not case-sensitive)? Default TRUE.

clear_usage_name

Logical vector of length 1; should acceptedNameUsage of the selected row be set to NA if the word "accepted" is detected in taxonomicStatus (not case-sensitive)? Default TRUE.

fill_usage_name

Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default TRUE.

remap_names

Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default TRUE.

remap_parent

Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default TRUE.

remap_variant

Same as remap_names, but applies specifically to rows with taxonomicStatus of "variant". Default FALSE.

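stamp_modified

Logical vector of length 1; should the modified column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified column does not yet exist it will be created. Default TRUE.

stamp_modified_by_id

Logical vector of length 1; should the modifiedByID column of any newly created or modified row include the ID of the current user? If the modifiedByID column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedByID" is required in extra_cols. The current user ID can be specified with the user_id option. Default FALSE.

stamp_modified_by

Logical vector of length 1; should the modifiedBy column of any newly created or modified row include the name of the current user? If the modifiedBy column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedBy" is required in extra_cols. The current user can be specified with the user_name option. Default FALSE.

strict

Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default FALSE.

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.

args_tbl

A dataframe including columns corresponding to one or more of the above arguments, except for tax_dat. In this case, the input taxonomic database will be modified sequentially over each row of input in args_tbl. Other DwC terms can also be included as additional columns, similar to using ... to modify a single row.

...

Other DwC terms to modify, specified as sets of named values. Each element of the vector must have a name corresponding to a valid DwC term; see dct_terms.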

Details

taxonID is only used to identify the row(s) to modify and is not itself modified. scientificName can be used in the same way if taxonID is not provided (as long as scientificName matches a single row). If both taxonID and scientificName are provided, scientificName will be assigned to the scientificName of the row identified by taxonID, replacing any value that already exists.

acceptedNameUsageID and acceptedNameUsage must match existing values of taxonID and scientificName, respectively, in the input data (tax_dat). On default settings, either can be used and the other will be filled in automatically (fill_usage_id and fill_usage_name are both TRUE).

Any other arguments provided that are DwC terms will be assigned to theselected row (i.e., they will modify the row).

If remap_names is TRUE (default) and acceptedNameUsageID is provided, any names that have an acceptedNameUsageID matching the taxonID of the selected row (i.e., synonyms of that row) will also have their acceptedNameUsageID replaced with the new acceptedNameUsageID. This applies to acceptedNameUsage as well. This behavior is not applied to names with taxonomicStatus of "variant" by default, but can be turned on for such names with remap_variant.
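The remapping of synonyms can be sketched in base R. remap_names_sketch() is a hypothetical helper that only updates acceptedNameUsageID (not acceptedNameUsage or taxonomicStatus); it illustrates the rule and does not reproduce dct_modify_row().

```r
# Hypothetical sketch (not dwctaxon code): make `selected_id` a synonym of
# `new_accepted_id` and re-point its existing synonyms to the same name.
remap_names_sketch <- function(tax_dat, selected_id, new_accepted_id,
                               remap_variant = FALSE) {
  usage <- tax_dat$acceptedNameUsageID
  # Rows whose acceptedNameUsageID points at the selected row
  points_here <- !is.na(usage) & usage == selected_id
  # Variants are left alone unless remap_variant = TRUE
  if (!remap_variant) {
    points_here <- points_here & !(tax_dat$taxonomicStatus %in% "variant")
  }
  usage[points_here] <- new_accepted_id
  # The selected row itself now points at the new accepted name
  usage[which(tax_dat$taxonID == selected_id)] <- new_accepted_id
  tax_dat$acceptedNameUsageID <- usage
  tax_dat
}

dat <- data.frame(
  taxonID = c("1", "2", "3", "4"),
  acceptedNameUsageID = c(NA, "1", "1", NA),
  taxonomicStatus = c("accepted", "synonym", "variant", "accepted"),
  scientificName = c("Species a", "Species b", "Species c", "Species d")
)
remap_names_sketch(dat, selected_id = "1", new_accepted_id = "4")
```

Here the synonym in row 2 follows its accepted name to "4", while the variant in row 3 keeps pointing at "1" unless remap_variant = TRUE.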

If remap_parent is TRUE (default) and the parentNameUsageID of the selected row points to a synonym, the parentNameUsageID will be changed to that of the accepted name (of the parent taxon). This will also apply to any other row with the same parentNameUsageID as the selected row. This applies to parentNameUsage as well.
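A base-R sketch of the parent remapping follows, with one stated assumption: a parent is treated as a synonym whenever its own acceptedNameUsageID is non-missing. remap_parent_sketch() is hypothetical and, unlike dct_modify_row(), applies the rule to every row at once rather than only to rows sharing the selected row's parentNameUsageID.

```r
# Hypothetical sketch (not dwctaxon code): if a row's parent is a synonym,
# point the row at the parent's accepted name instead.
remap_parent_sketch <- function(tax_dat) {
  parent <- tax_dat$parentNameUsageID
  # Row index of each parent (NA when there is no parent)
  idx <- match(parent, tax_dat$taxonID)
  idx[is.na(parent)] <- NA
  # Assumption: a parent is a synonym if its acceptedNameUsageID is set
  parent_accepted <- tax_dat$acceptedNameUsageID[idx]
  is_syn_parent <- !is.na(parent_accepted)
  tax_dat$parentNameUsageID[is_syn_parent] <- parent_accepted[is_syn_parent]
  tax_dat
}

dat <- data.frame(
  taxonID = c("g1", "g2", "s1", "s2"),
  acceptedNameUsageID = c(NA, "g1", NA, NA),
  parentNameUsageID = c(NA, NA, "g2", "g1"),
  scientificName = c("Genusa", "Genusb", "Genusb foo", "Genusa bar")
)
remap_parent_sketch(dat)
```

The species "Genusb foo" was placed in the synonymous genus "g2", so its parent becomes the accepted genus "g1".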

If clear_usage_id or clear_usage_name is TRUE and taxonomicStatus includes the word "accepted", acceptedNameUsageID or acceptedNameUsage, respectively, will be set to NA, regardless of the values of acceptedNameUsageID, acceptedNameUsage, or fill_usage_name.
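The "accepted" test can be sketched as a case-insensitive pattern match. is_accepted_status() is hypothetical; it uses a plain substring search, so a status such as "unaccepted" would also match, and dwctaxon's own matching may differ.

```r
# Hypothetical sketch (not dwctaxon code): does taxonomicStatus contain the
# word "accepted", ignoring case? Missing statuses count as not accepted.
is_accepted_status <- function(status) {
  !is.na(status) & grepl("accepted", status, ignore.case = TRUE)
}

is_accepted_status(c("accepted", "Provisionally Accepted", "synonym", NA))
# TRUE TRUE FALSE FALSE

# Clearing acceptedNameUsageID for accepted names, as clear_usage_id describes
dat <- data.frame(
  taxonomicStatus = c("accepted", "synonym"),
  acceptedNameUsageID = c("2", "1")
)
dat$acceptedNameUsageID[is_accepted_status(dat$taxonomicStatus)] <- NA
dat
```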

Can either modify a single row in the input taxonomic database if each argument is supplied as a vector of length 1, or can apply a set of changes to the taxonomic database if the input is supplied as a dataframe via args_tbl.

Value

Dataframe; taxonomic database in DwC format.

Examples

# Swap the accepted / synonym status of
# Cephalomanes crassum (Copel.) M. G. Price
# and Trichomanes crassum Copel.
dct_filmies |>
  dct_modify_row(
    scientificName = "Cephalomanes crassum (Copel.) M. G. Price",
    taxonomicStatus = "synonym",
    acceptedNameUsage = "Trichomanes crassum Copel."
  ) |>
  dct_modify_row(
    scientificName = "Trichomanes crassum Copel.",
    taxonomicStatus = "accepted"
  ) |>
  dct_validate(
    check_tax_status = FALSE,
    check_mapping_accepted_status = FALSE,
    check_sci_name = FALSE
  )
# Sometimes changing one name will affect others, if they map
# to the new synonym
dct_modify_row(
  tax_dat = dct_filmies |> head(),
  scientificName = "Cephalomanes crassum (Copel.) M. G. Price",
  taxonomicStatus = "synonym",
  acceptedNameUsage = "Cephalomanes densinervium (Copel.) Copel."
)
# Apply a set of changes
library(tibble)
updates <- tibble(
  scientificName = c(
    "Cephalomanes atrovirens Presl",
    "Cephalomanes crassum (Copel.) M. G. Price"
  ),
  taxonomicStatus = "synonym",
  acceptedNameUsage = "Trichomanes crassum Copel."
)
dct_filmies |>
  dct_modify_row(args_tbl = updates) |>
  dct_modify_row(
    scientificName = "Trichomanes crassum Copel.",
    taxonomicStatus = "accepted"
  )

Get and set function arguments via options

Description

Changes the default values of function arguments.

Usage

dct_options(reset = FALSE, ...)

Arguments

reset

Logical vector of length 1; if TRUE, reset all options to their default values.

...

Any number of argument = value pairs, where the left side is the name of the argument and the right side is its value. See Details and Examples.

Details

Use this to change the default values of function arguments. That way, you don't have to type the same thing each time you call a function.

The arguments that can be set with this function are as follows:

Validation arguments

check_col_names: Logical vector of length 1; should all column names be required to be a valid DwC term? Default TRUE.
check_mapping_accepted_status: Logical vector of length 1; should rules about mapping of variants and synonyms be enforced? Default FALSE (see dct_validate()).
check_mapping_accepted: Logical vector of length 1; should all values of acceptedNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_original: Logical vector of length 1; should all values of originalNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_parent: Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_parent_accepted: Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an accepted name? Default FALSE.
check_sci_name: Logical vector of length 1; should all instances of scientificName be required to be non-missing and unique? Default TRUE.
check_status_diff: Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default FALSE.
check_tax_status: Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default TRUE.
check_taxon_id: Logical vector of length 1; should all instances of taxonID be required to be non-missing and unique? Default TRUE.
extra_cols: Character vector; names of columns that should be allowed beyond those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect.
on_fail: Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".
on_success: Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".
skip_missing_cols: Logical vector of length 1; should checks be silently skipped if any of the columns they inspect are missing? Default FALSE.
valid_tax_status: Character vector of length 1; valid values for taxonomicStatus. Each value must be separated by a comma. Default "accepted, synonym, variant, NA". "NA" indicates that missing (NA) values are valid. Case-sensitive.
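The check_status_diff rule can be sketched with tapply(). names_with_status_diff() is hypothetical and only lists the affected names; a scientific name can only carry two statuses if it appears in more than one row.

```r
# Hypothetical sketch (not dwctaxon code): list scientific names that occur
# with more than one distinct taxonomicStatus.
names_with_status_diff <- function(tax_dat) {
  # Count distinct statuses per scientific name
  n_status <- tapply(
    tax_dat$taxonomicStatus, tax_dat$scientificName,
    function(s) length(unique(s))
  )
  names(n_status)[n_status > 1]
}

dat <- data.frame(
  scientificName = c("Species foo", "Species foo", "Species bar"),
  taxonomicStatus = c("accepted", "synonym", "accepted")
)
names_with_status_diff(dat)
```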

Editing arguments

clear_usage_id: Logical vector of length 1; should acceptedNameUsageID of the selected row be set to NA if the word "accepted" is detected in taxonomicStatus (not case-sensitive)? Default TRUE.
clear_usage_name: Logical vector of length 1; should acceptedNameUsage of the selected row be set to NA if the word "accepted" is detected in taxonomicStatus (not case-sensitive)? Default TRUE.
fill_taxon_id: Logical vector of length 1; if taxonID is not provided, should values in the taxonID column be filled in by generating them automatically from the scientificName? If the taxonID column does not yet exist it will be created. Default TRUE.
fill_usage_id: Logical vector of length 1; if acceptedNameUsageID is not provided, should values in the acceptedNameUsageID column be filled in by matching acceptedNameUsage to scientificName? If the acceptedNameUsageID column does not yet exist it will be created. Default TRUE.
fill_usage_name: Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default TRUE.
remap_names: Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default TRUE.
remap_parent: Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default TRUE.
remap_variant: Same as remap_names, but applies specifically to rows with taxonomicStatus of "variant". Default FALSE.
stamp_modified: Logical vector of length 1; should the modified column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified column does not yet exist it will be created. Default TRUE.
stamp_modified_by: Logical vector of length 1; should the modifiedBy column of any newly created or modified row include the name of the current user? If the modifiedBy column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedBy" is required in extra_cols. The current user can be specified with the user_name option. Default FALSE.
stamp_modified_by_id: Logical vector of length 1; should the modifiedByID column of any newly created or modified row include the ID of the current user? If the modifiedByID column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedByID" is required in extra_cols. The current user ID can be specified with the user_id option. Default FALSE.
taxon_id_length: Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default 32.

General arguments

quiet: Logical vector of length 1; should warnings be silenced? Default FALSE.
strict: Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default FALSE.
user_name: Character vector of length 1; the name of the current user. Default "".
user_id: Character vector of length 1; the ID of the current user. Default "".

Value

Nothing; used for its side-effect.

Examples

# Show all options
dct_options()
# Store existing settings, including any changes made by the user
old_settings <- dct_options()
# View one option
dct_options()$valid_tax_status
# Change one option
dct_options(valid_tax_status = "accepted, weird, whatever")
dct_options()$valid_tax_status
# Reset to default values
dct_options(reset = TRUE)
dct_options()$valid_tax_status
# Multiple options may also be set at once
dct_options(check_taxon_id = FALSE, check_status_diff = TRUE)
# Reset options to those before this example was run
do.call(dct_options, old_settings)

Darwin Core Taxon terms

Description

A table of valid Darwin Core terms. Only terms in the Taxon class or at the record level are included.

Usage

dct_terms

Format

Dataframe (tibble), including two columns:

group: Darwin Core term group; either "taxon" (terms in the Taxon class) or "record-level" (terms that are generic in that they might apply to any type of record in a dataset).
term: Darwin Core term

with two additional attributes:

retrieved: Date the terms were obtained
url: URL from which the terms were obtained

Details

Modified from data downloaded from TDWG Darwin Core under the Creative Commons Attribution (CC BY) 4.0 license.

Source

https://dwc.tdwg.org/terms/#taxon

Examples

dct_terms

Validate a taxonomic database

Description

Runs a series of automated checks on a taxonomic database in Darwin Core (DwC) format.

Usage

dct_validate(
  tax_dat,
  check_taxon_id = dct_options()$check_taxon_id,
  check_tax_status = dct_options()$check_tax_status,
  check_mapping_accepted = dct_options()$check_mapping_accepted,
  check_mapping_parent = dct_options()$check_mapping_parent,
  check_mapping_parent_accepted = dct_options()$check_mapping_parent_accepted,
  check_mapping_original = dct_options()$check_mapping_original,
  check_mapping_accepted_status = dct_options()$check_mapping_accepted_status,
  check_sci_name = dct_options()$check_sci_name,
  check_status_diff = dct_options()$check_status_diff,
  check_col_names = dct_options()$check_col_names,
  valid_tax_status = dct_options()$valid_tax_status,
  extra_cols = dct_options()$extra_cols,
  on_success = dct_options()$on_success,
  on_fail = dct_options()$on_fail,
  skip_missing_cols = dct_options()$skip_missing_cols,
  quiet = dct_options()$quiet
)

Arguments

tax_dat

Dataframe; taxonomic database in DwC format.

check_taxon_id

Logical vector of length 1; should all instances of taxonID be required to be non-missing and unique? Default TRUE.

check_tax_status

Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default TRUE.

check_mapping_accepted

Logical vector of length 1; should all values of acceptedNameUsageID be required to map to the taxonID of an existing name? Default TRUE.

check_mapping_parent

Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an existing name? Default TRUE.

check_mapping_parent_accepted

Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an accepted name? Default FALSE.

check_mapping_original

Logical vector of length 1; should all values of originalNameUsageID be required to map to the taxonID of an existing name? Default TRUE.

check_mapping_accepted_status

Logical vector of length 1; should rules about mapping of variants and synonyms be enforced (see Details)? Default FALSE.

check_sci_name

Logical vector of length 1; should all instances of scientificName be required to be non-missing and unique? Default TRUE.

check_status_diff

Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default FALSE.

check_col_names

Logical vector of length 1; should all column names be required to be valid DwC terms? Default TRUE.

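valid_tax_status

Character vector of length 1; valid values for taxonomicStatus, separated by commas (for example, "accepted, synonym, variant").
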
extra_cols

Character vector; names of columns that should be allowed beyond those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect.

on_success

Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".

on_fail

Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".

skip_missing_cols

Logical vector of length 1; should checks be silently skipped if any of the columns they inspect are missing? Default FALSE.

quiet

Logical vector of length 1; should warnings be silenced? Default FALSE.
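Several of the arguments above describe simple set-membership rules. The base-R sketch below mirrors check_taxon_id and check_sci_name (non-missing and unique), check_tax_status with a comma-separated valid_tax_status, the check_mapping_* family (IDs must match an existing taxonID), and check_col_names with extra_cols. All helper names, the toy data, and the short dwc_terms vector are hypothetical; this is an illustration of the rules as documented, not the dwctaxon implementation.

```r
# check_taxon_id / check_sci_name: values must be non-missing and unique
is_complete_unique <- function(x) {
  !anyNA(x) && anyDuplicated(x) == 0
}

# check_tax_status: each taxonomicStatus must be one of the comma-separated
# values in valid_tax_status (in this sketch, NA never counts as valid)
has_valid_status <- function(status, valid_tax_status) {
  valid <- trimws(strsplit(valid_tax_status, ",", fixed = TRUE)[[1]])
  all(status %in% valid)
}

# check_mapping_accepted / _parent / _original: every non-missing ID
# must match an existing taxonID
maps_to_taxon_id <- function(ids, taxon_id) {
  all(ids[!is.na(ids)] %in% taxon_id)
}

# check_col_names: every column must be a DwC term or listed in extra_cols
cols_are_valid <- function(col_names, dwc_terms, extra_cols = NULL) {
  all(col_names %in% c(dwc_terms, extra_cols))
}

# Hypothetical subset of DwC terms, standing in for dct_terms$term
dwc_terms <- c("taxonID", "acceptedNameUsageID", "taxonomicStatus", "scientificName")

# Toy data: the synonym in row 3 points to a taxonID ("9") that does not exist
tax_dat <- data.frame(
  taxonID = c("1", "2", "3"),
  acceptedNameUsageID = c(NA, "1", "9"),
  taxonomicStatus = c("accepted", "synonym", "synonym"),
  scientificName = c("Foo bar", "Foo baz", "Foo qux")
)

is_complete_unique(tax_dat$taxonID)
has_valid_status(tax_dat$taxonomicStatus, "accepted, synonym, variant")
maps_to_taxon_id(tax_dat$acceptedNameUsageID, tax_dat$taxonID)
cols_are_valid(c(names(tax_dat), "notes"), dwc_terms, extra_cols = "notes")
```

The mapping check fails here because "9" has no matching taxonID, which is the kind of problem check_mapping_accepted is described as catching.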

Details

For check_mapping_accepted_status and check_status_diff, "accepted", "synonym", and "variant" are determined by case-sensitive string matching of taxonomicStatus, so "provisionally accepted" is counted as "accepted", "ambiguous synonym" is counted as "synonym", and so on.
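The string matching described above can be sketched in base R. The helpers classify_status() and has_one_status_per_name() are hypothetical; assigning exactly one category per status (the last keyword matched wins if a status contains more than one) is an assumption of this sketch, not documented dwctaxon behaviour. The second helper applies a check_status_diff-style test: each scientificName should fall into only one status category.

```r
# Map each taxonomicStatus to "accepted", "synonym", "variant", or NA using
# case-sensitive substring matching (fixed = TRUE: no regular expressions).
classify_status <- function(status) {
  out <- rep(NA_character_, length(status))
  for (s in c("accepted", "synonym", "variant")) {
    # grepl() returns FALSE for NA input, so missing statuses stay NA
    out[grepl(s, status, fixed = TRUE)] <- s
  }
  out
}

# check_status_diff-style test: count distinct categories per scientific name
has_one_status_per_name <- function(sci_name, status) {
  cls <- classify_status(status)
  n_status <- tapply(cls, sci_name, function(x) length(unique(x)))
  all(n_status == 1)
}

# "Accepted" (capital A) is not matched because matching is case-sensitive
classify_status(c("provisionally accepted", "ambiguous synonym", "Accepted", "variant"))

# "A b" is both accepted and a synonym, so this fails
has_one_status_per_name(c("A b", "A b", "C d"), c("accepted", "synonym", "accepted"))
```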

For check_mapping_accepted_status, the following rules are enforced:

Rows with taxonomicStatus of "synonym" (synonyms) must have an acceptedNameUsageID matching the taxonID of an accepted name (taxonomicStatus of "accepted").
Rows with taxonomicStatus of "variant" (orthographic variants) must have an acceptedNameUsageID matching the taxonID of an accepted name or synonym (but not of another variant).
Rows with taxonomicStatus of "accepted" must not have any value entered for acceptedNameUsageID.
Rows with a value for acceptedNameUsageID must have a valid value for taxonomicStatus.
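A minimal base-R sketch of the four rules above, assuming a dataframe with taxonID, acceptedNameUsageID, and taxonomicStatus columns. The function accepted_status_violations() and the toy data are hypothetical. For the fourth rule the sketch treats only statuses matching "accepted", "synonym", or "variant" as valid, whereas dct_validate() presumably consults valid_tax_status.

```r
# Return the row numbers that break any check_mapping_accepted_status rule
# (integer(0) if none). Status categories use case-sensitive substring matching.
accepted_status_violations <- function(tax_dat) {
  status <- tax_dat$taxonomicStatus
  acc_id <- tax_dat$acceptedNameUsageID
  is_status <- function(s) grepl(s, status, fixed = TRUE)

  accepted_ids <- tax_dat$taxonID[is_status("accepted")]
  synonym_ids <- tax_dat$taxonID[is_status("synonym")]
  bad <- rep(FALSE, nrow(tax_dat))

  # Rule 1: synonyms must point to an accepted name
  bad <- bad | (is_status("synonym") & !(acc_id %in% accepted_ids))
  # Rule 2: variants must point to an accepted name or a synonym (not a variant)
  bad <- bad | (is_status("variant") & !(acc_id %in% c(accepted_ids, synonym_ids)))
  # Rule 3: accepted names must not have an acceptedNameUsageID
  bad <- bad | (is_status("accepted") & !is.na(acc_id))
  # Rule 4: rows with an acceptedNameUsageID need a recognised status
  recognised <- is_status("accepted") | is_status("synonym") | is_status("variant")
  bad <- bad | (!is.na(acc_id) & !recognised)

  which(bad)
}

# Toy data: row 4 is a variant of a variant, row 5 has no status
tax_dat <- data.frame(
  taxonID = c("1", "2", "3", "4", "5"),
  acceptedNameUsageID = c(NA, "1", "2", "3", "1"),
  taxonomicStatus = c("accepted", "synonym", "variant", "variant", NA)
)
accepted_status_violations(tax_dat)
```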

Default settings of all arguments can be modified with dct_options() (see Examples).

Most columns are expected to be vectors of class character, but this is not checked for all columns. Columns (DwC terms) with names including 'ID', for example 'taxonID', may be character, numeric, or integer.
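The column-class expectation above can be expressed as a small base-R check. column_types_ok() is a hypothetical helper: unlike dct_validate(), which does not check every column, it tests all columns, and it simply treats any column whose name contains "ID" as an ID column.

```r
# For each column, report whether its class matches the expectation:
# ID columns may be character, numeric, or integer; others must be character.
column_types_ok <- function(tax_dat) {
  vapply(names(tax_dat), function(col) {
    x <- tax_dat[[col]]
    if (grepl("ID", col, fixed = TRUE)) {
      is.character(x) || is.numeric(x) # is.numeric() covers integer and double
    } else {
      is.character(x)
    }
  }, logical(1))
}

# Integer taxonID is acceptable; a factor taxonRank is not
tax_dat <- data.frame(
  taxonID = 1:2,
  scientificName = c("A b", "C d"),
  taxonRank = factor(c("species", "species"))
)
column_types_ok(tax_dat)
```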

Value

Depends on the result of the check and on the values of on_fail and on_success:

If the check passes and on_success is "logical", return TRUE.
If the check passes and on_success is "data", return the input dataframe.
If the check fails and on_fail is "error", throw an error.
If the check fails and on_fail is "summary", issue a warning and return a dataframe summarizing the reasons for failure.
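The four outcomes listed above amount to a small dispatch on on_success and on_fail. This base-R sketch uses a hypothetical report_result() function and assumes failures arrive as a dataframe with one row per problem; the real summary returned by dct_validate() may be structured differently.

```r
# Decide what to return once the checks have run.
# `failures` has one row per problem; zero rows means the data passed.
report_result <- function(tax_dat, failures,
                          on_success = c("data", "logical"),
                          on_fail = c("error", "summary")) {
  on_success <- match.arg(on_success) # defaults to "data"
  on_fail <- match.arg(on_fail) # defaults to "error"
  if (nrow(failures) == 0) {
    if (on_success == "logical") return(TRUE)
    return(tax_dat)
  }
  if (on_fail == "error") {
    stop("Validation failed: ", nrow(failures), " problem(s) found", call. = FALSE)
  }
  warning("Validation failed; returning summary", call. = FALSE)
  failures
}

dat <- data.frame(taxonID = "1")
no_fail <- data.frame(taxonID = character(0), error = character(0))
one_fail <- data.frame(taxonID = "1", error = "duplicated scientificName")

report_result(dat, no_fail, on_success = "logical")
try(report_result(dat, one_fail))
```

The defaults in this sketch mirror the documented defaults, "data" for on_success and "error" for on_fail.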

Examples

# The example dataset dct_filmies is already correctly formatted and passes
# validation
dct_validate(dct_filmies)
# So make some bad data on purpose with a duplicated scientific name
bad_dat <- dct_filmies
bad_dat$scientificName[1] <- bad_dat$scientificName[2]
# The incorrectly formatted data won't pass
try(
  dct_validate(bad_dat)
)
# It will pass if we allow duplicated scientific names though
dct_validate(bad_dat, check_sci_name = FALSE)
# Individual checks can also be turned on or off with dct_options()
# First save the current settings before making any changes
old_settings <- dct_options()
# Let's allow duplicated scientific names by default
dct_options(check_sci_name = FALSE)
# The data passes validation as before, but we don't have to specify
# `check_sci_name = FALSE` in the function call
dct_validate(bad_dat)
# Reset options to those before this example was run
do.call(dct_options, old_settings)
