| Title: | Edit and Validate Darwin Core Taxon Data |
| Version: | 2.0.4 |
| Description: | Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core 'Taxon' class https://dwc.tdwg.org/terms/#taxon). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Imports: | assertthat, digest, dplyr, glue, purrr, rlang, settings, stringr, tibble |
| Suggests: | testthat (≥ 3.0.0), mockery, readr, usethis, knitr, rmarkdown, patrick, stringi, english, tidyr, utils, curl, httr |
| Depends: | R (≥ 4.2.0) |
| Config/testthat/edition: | 3 |
| URL: | https://docs.ropensci.org/dwctaxon/, https://github.com/ropensci/dwctaxon |
| BugReports: | https://github.com/ropensci/dwctaxon/issues |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-12-15 02:07:24 UTC; joelnitta |
| Author: | Joel H. Nitta |
| Maintainer: | Joel H. Nitta <joelnitta@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-15 10:00:02 UTC |
dwctaxon: Edit and Validate Darwin Core Taxon Data
Description

Edit and validate taxonomic data in compliance with Darwin Core standards (Darwin Core 'Taxon' class https://dwc.tdwg.org/terms/#taxon).
Author(s)
Maintainer: Joel H. Nitta joelnitta@gmail.com (ORCID) [copyright holder]
Other contributors:
Wataru Iwasaki (ORCID) [contributor]
Collin Schwantes (Collin reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>) [reviewer]
Stephen Formel (Stephen reviewed the package (v. 1.0.0.9000) for rOpenSci, see <https://github.com/ropensci/software-review/issues/574>) [reviewer]
See Also
Useful links:
Report bugs at https://github.com/ropensci/dwctaxon/issues
Add row(s) to a taxonomic database
Description
Add one or more rows to a taxonomic database in Darwin Core (DwC) format.
Usage
dct_add_row(
  tax_dat,
  taxonID = NULL,
  scientificName = NULL,
  taxonomicStatus = NULL,
  acceptedNameUsageID = NULL,
  acceptedNameUsage = NULL,
  new_dat = NULL,
  fill_taxon_id = dct_options()$fill_taxon_id,
  fill_usage_id = dct_options()$fill_usage_id,
  taxon_id_length = dct_options()$taxon_id_length,
  stamp_modified = dct_options()$stamp_modified,
  stamp_modified_by_id = dct_options()$stamp_modified_by_id,
  stamp_modified_by = dct_options()$stamp_modified_by,
  strict = dct_options()$strict,
  ...
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
taxonID | Character or numeric vector; values to add to taxonID column. Ignored if |
scientificName | Character vector; values to add to scientificName column. Ignored if |
taxonomicStatus | Character vector; values to add to taxonomicStatus column. Ignored if |
acceptedNameUsageID | Character or numeric vector; values to add to acceptedNameUsageID column. Ignored if |
acceptedNameUsage | Character vector; values to add to acceptedNameUsage column. Ignored if |
new_dat | A dataframe including columns corresponding to one or more of the above arguments, except for |
fill_taxon_id | Logical vector of length 1; if |
fill_usage_id | Logical vector of length 1; if |
taxon_id_length | Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default |
stamp_modified | Logical vector of length 1; should the |
stamp_modified_by_id | Logical vector of length 1; should the |
stamp_modified_by | Logical vector of length 1; should the |
strict | Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default |
... | Additional data to add, specified as sets of named character or numeric vectors; e.g., |
Details
fill_taxon_id and fill_usage_id only act on the newly added data (they do not fill columns in tax_dat).
If "taxonID" is not provided for the new row and fill_taxon_id is TRUE, a value for taxonID will be automatically generated from the md5 hash digest of the scientific name.
To modify settings used for validation if strict is TRUE, use dct_options().
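The ID generation described above can be illustrated with a minimal sketch. It assumes the digest package is installed; `make_id()` is a hypothetical helper, not part of dwctaxon. Whether dwctaxon hashes the raw string or a serialized object is not stated here, so the values produced may differ from the IDs the package generates.

```r
# Sketch only: derive a short ID from the md5 hash of a scientific name.
# `make_id()` is a hypothetical helper, not a dwctaxon function.
make_id <- function(sci_name, id_length = 8) {
  # An md5 hex digest has 32 characters, hence the 1 to 32 limit
  stopifnot(id_length >= 1, id_length <= 32)
  hashes <- vapply(
    sci_name,
    # Assumption: hash the raw string (serialize = FALSE)
    function(x) digest::digest(x, algo = "md5", serialize = FALSE),
    character(1),
    USE.NAMES = FALSE
  )
  # Keep only the first `id_length` characters of each hash
  substr(hashes, 1, id_length)
}

ids <- make_id(c("Foogenus barspecies", "Foogenus barspecies var. bla"))
ids
```

Because the ID is derived from the name, the same scientific name always yields the same ID, and shorter values of id_length raise the chance of collisions.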
Value
Dataframe; taxonomic database in DwC format.
Examples
tibble::tibble(
  taxonID = "123",
  scientificName = "Foogenus barspecies",
  acceptedNameUsageID = NA_character_,
  taxonomicStatus = "accepted"
) |>
  dct_add_row(
    scientificName = "Foogenus barspecies var. bla",
    parentNameUsageID = "123",
    nameAccordingTo = "me",
    strict = TRUE
  )
Check mapping of usage taxonomic IDs
Description
Check that values of terms like 'acceptedNameUsageID' map properly to taxonID in Darwin Core (DwC) taxonomic data.
Usage
dct_check_mapping(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  col_select = "acceptedNameUsageID",
  quiet = dct_options()$quiet
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
on_fail | Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success | Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
col_select | Character vector of length 1; the name of the column (DwC term) to check. Default |
quiet | Logical vector of length 1; should warnings be silenced? Default |
Details
The following rules are enforced:
Value of taxonID may not be identical to that of the selected column within a single row (in other words, a name cannot be its own accepted name, parent taxon, or basionym).
Every value in the selected column must have a corresponding taxonID.
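The two rules above can be sketched in base R as follows. This is an illustration of the logic only, not the package's implementation; the data and the names `maps_to_self` and `lacks_taxon_id` are made up for the example.

```r
# Toy data: row 3 points to a taxonID that does not exist,
# row 4 points to itself
tax_dat <- data.frame(
  taxonID = c("1", "2", "3", "4"),
  acceptedNameUsageID = c(NA, "1", "9", "4"),
  stringsAsFactors = FALSE
)
col <- "acceptedNameUsageID"

# Only rows with a value in the selected column are checked
has_value <- !is.na(tax_dat[[col]])

# Rule 1: a name cannot map to itself
maps_to_self <- has_value & tax_dat[[col]] == tax_dat$taxonID

# Rule 2: every value must have a corresponding taxonID
lacks_taxon_id <- has_value & !(tax_dat[[col]] %in% tax_dat$taxonID)

# Rows that break either rule
tax_dat[maps_to_self | lacks_taxon_id, ]
```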
col_select can take one of the following values:
"acceptedNameUsageID": taxonID corresponding to the accepted name (of a synonym).
"parentNameUsageID": taxonID corresponding to the immediate parent taxon of a name (for example, for a species, this would be the genus).
"originalNameUsageID": taxonID corresponding to the basionym of a name.
Value
Depends on the result of the check and on values of on_fail and on_success:
If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure
Examples
# The bad data has an acceptedNameUsageID (third row, "4") that lacks a
# corresponding taxonID
bad_dat <- tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", "4", "synonym", "Species bat"
)
dct_check_mapping(bad_dat, on_fail = "summary", quiet = TRUE)
Check scientificName
Description
Check for correctly formatted scientificName column in Darwin Core taxonomic data.
Usage
dct_check_sci_name(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  quiet = dct_options()$quiet
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
on_fail | Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success | Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
quiet | Logical vector of length 1; should warnings be silenced? Default |
Details
The following rules are enforced:
scientificName may not be missing (NA)
scientificName must be unique
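As a rough base R illustration of these two rules (not the package's own code), missing and duplicated names can be flagged like this. The vector `sci_name` is invented for the example.

```r
# Toy vector: one missing name and one name that appears twice
sci_name <- c("Species foo", NA, "Species bar", "Species foo")

# Rule 1: scientificName may not be missing
is_missing <- is.na(sci_name)

# Rule 2: scientificName must be unique. Checking from both ends
# flags every copy of a repeated name, not just the later ones.
is_duplicated <- !is_missing &
  (duplicated(sci_name) | duplicated(sci_name, fromLast = TRUE))

data.frame(sci_name, is_missing, is_duplicated)
```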
Value
Depends on the result of the check and on values of on_fail and on_success:
If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure
Examples
dct_check_sci_name(
  data.frame(scientificName = NA_character_),
  on_fail = "summary",
  quiet = TRUE
)
dct_check_sci_name(data.frame(scientificName = "a"))
Check that taxonomicStatus is within valid values in Darwin Core taxonomic data
Description
Check that taxonomicStatus is within valid values in Darwin Core taxonomic data.
Usage
dct_check_tax_status(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  valid_tax_status = dct_options()$valid_tax_status,
  quiet = dct_options()$quiet
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
on_fail | Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success | Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
valid_tax_status | Character vector of length 1; valid values for |
quiet | Logical vector of length 1; should warnings be silenced? Default |
Value
Depends on the result of the check and on values of on_fail and on_success:
If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure
References
https://dwc.tdwg.org/terms/#dwc:taxonomicStatus
Examples
# The bad data has a taxonomicStatus (third row, "foo") that is not
# a valid value
bad_dat <- tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", NA, "foo", "Species bat"
)
dct_check_tax_status(bad_dat, on_fail = "summary", quiet = TRUE)

# Example of setting valid values of taxonomicStatus via dct_options()
# First store existing settings, including any changes made by the user
old_settings <- dct_options()

# Change options for valid_tax_status
dct_options(valid_tax_status = "provisionally accepted, synonym, NA")
tibble::tribble(
  ~taxonID, ~acceptedNameUsageID, ~taxonomicStatus, ~scientificName,
  "1", NA, "provisionally accepted", "Species foo",
  "2", "1", "synonym", "Species bar",
  "3", NA, NA, "Strange name"
) |>
  dct_check_tax_status()

# Reset options to those before this example was run
do.call(dct_options, old_settings)
Check taxonID
Description
Check for correctly formatted taxonID column in Darwin Core taxonomic data.
Usage
dct_check_taxon_id(
  tax_dat,
  on_fail = dct_options()$on_fail,
  on_success = dct_options()$on_success,
  quiet = dct_options()$quiet
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
on_fail | Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
on_success | Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
quiet | Logical vector of length 1; should warnings be silenced? Default |
Details
The following rules are enforced:
taxonID may not be missing (NA)
taxonID must be unique
Value
Depends on the result of the check and on values of on_fail and on_success:
If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure
Examples
dct_check_taxon_id(
  data.frame(taxonID = NA_character_),
  on_fail = "summary",
  quiet = TRUE
)
dct_check_taxon_id(data.frame(taxonID = 1))
Drop row(s) of a taxonomic database
Description
Drop one or more rows from a taxonomic database in Darwin Core (DwC) format by taxonID or scientificName.
Usage
dct_drop_row(tax_dat, taxonID = NULL, scientificName = NULL)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
taxonID | Character or numeric vector; taxonID of the row(s) to be dropped. |
scientificName | Character vector; scientificName of the row(s) to be dropped. |
Details
Only works if values of taxonID or scientificName are unique and non-missing in the taxonomic database (tax_dat).
Either taxonID or scientificName should be provided, but not both.
Value
Dataframe; taxonomic database in DwC format
Examples
# Can drop rows by scientificName or taxonID
dct_filmies |>
  dct_drop_row(scientificName = "Cephalomanes atrovirens Presl")

dct_filmies |>
  dct_drop_row(taxonID = "54133783")

# Can drop multiple rows at once by providing multiple values for
# scientificName or taxonID
dct_filmies |>
  dct_drop_row(
    scientificName = c(
      "Cephalomanes atrovirens Presl",
      "Trichomanes crassum Copel."
    )
  )

dct_filmies |>
  dct_drop_row(
    taxonID = c(
      "54133783", "54133783"
    )
  )
Fill a column of a taxonomic database
Description
Fill a column in a taxonomic database in Darwin Core (DwC) format.
Usage
dct_fill_col(
  tax_dat,
  fill_to = "acceptedNameUsage",
  fill_from = "scientificName",
  match_to = "taxonID",
  match_from = "acceptedNameUsageID",
  stamp_modified = dct_options()$stamp_modified
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
fill_to | Character vector of length 1; name of column to fill. If the column does not yet exist it will be created. |
fill_from | Character vector of length 1; name of column to copy values from when filling. |
match_to | Character vector of length 1; name of column to match to. |
match_from | Character vector of length 1; name of column to match from. |
stamp_modified | Logical vector of length 1; should the |
Details
Several terms (columns) in DwC format come in pairs of "term" and "termID"; for example, "acceptedNameUsage" and "acceptedNameUsageID", where the first is the value in a human-readable form (in this case, scientific name of the accepted taxon) and the second is the value used by a machine (in this case, taxonID of the accepted taxon). Other pairs include "parentNameUsage" and "parentNameUsageID", "scientificName" and "scientificNameID", etc. None are required to be used in a given DwC dataset.
Often when updating data, the user may only fill in one value or the other (e.g., "acceptedNameUsage" or "acceptedNameUsageID"), but not both. The purpose of dct_fill_col() is to fill the missing column.
match_from and match_to are used to locate the values used for filling each cell. The values in the match_to column must be unique.
The default settings are to fill acceptedNameUsage with values from scientificName by matching acceptedNameUsageID to taxonID (see Example).
When adding timestamps with stamp_modified, any row that differs from the original data (tax_dat) is considered modified. This includes when a new column is added, in which case all rows will be considered modified.
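The matching logic under the default settings can be approximated in base R with match(). This is a sketch of the idea on invented data, not the package's implementation, and it ignores timestamps.

```r
# Toy data: rows 2 and 3 are synonyms of row 1
tax_dat <- data.frame(
  taxonID = c("1", "2", "3"),
  acceptedNameUsageID = c(NA, "1", "1"),
  scientificName = c("Species foo", "Species bar", "Species bat"),
  stringsAsFactors = FALSE
)

# The match_to column must be unique for the lookup to be unambiguous
stopifnot(!anyDuplicated(tax_dat$taxonID))

# For each row, find the position of its match_from value
# (acceptedNameUsageID) in the match_to column (taxonID)
idx <- match(tax_dat$acceptedNameUsageID, tax_dat$taxonID)

# Copy the fill_from value (scientificName) found at that position
# into the fill_to column (acceptedNameUsage); unmatched rows get NA
tax_dat$acceptedNameUsage <- tax_dat$scientificName[idx]
tax_dat
```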
Value
Dataframe; taxonomic database in DwC format.
Examples
# Fill acceptedNameUsage with values from scientificName by
# matching acceptedNameUsageID to taxonID
(head(dct_filmies, 5)) |>
  dct_fill_col(
    fill_to = "acceptedNameUsage",
    fill_from = "scientificName",
    match_to = "taxonID",
    match_from = "acceptedNameUsageID"
  )
Taxonomic data of filmy ferns
Description
Taxonomic data of filmy ferns (family Hymenophyllaceae) in Darwin Core format. Non-ASCII characters have been converted to ASCII, so some author names may not be as expected. Meant for demonstration purposes only, not formal data analysis.
Usage
dct_filmies
Format
Dataframe (tibble), with 2451 rows and 5 columns. For details about data format, see https://dwc.tdwg.org/terms/#taxon.
Details
Modified from data downloaded from the Catalog of Life under the Creative Commons Attribution (CC BY) 4.0 license.
Source
https://www.catalogueoflife.org/
Examples
dct_filmies
Modify row(s) of a taxonomic database
Description
Modify one or more rows in a taxonomic database in Darwin Core (DwC) format.
Usage
dct_modify_row(
  tax_dat,
  taxonID = NULL,
  scientificName = NULL,
  taxonomicStatus = NULL,
  acceptedNameUsageID = NULL,
  acceptedNameUsage = NULL,
  clear_usage_id = dct_options()$clear_usage_id,
  clear_usage_name = dct_options()$clear_usage_name,
  fill_usage_name = dct_options()$fill_usage_name,
  remap_names = dct_options()$remap_names,
  remap_parent = dct_options()$remap_parent,
  remap_variant = dct_options()$remap_variant,
  stamp_modified = dct_options()$stamp_modified,
  stamp_modified_by_id = dct_options()$stamp_modified_by_id,
  stamp_modified_by = dct_options()$stamp_modified_by,
  strict = dct_options()$strict,
  quiet = dct_options()$quiet,
  args_tbl = NULL,
  ...
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
taxonID | Character or numeric vector of length 1; taxonID of the row to be modified (the selected row). |
scientificName | Character vector of length 1; scientificName of the row to be modified if |
taxonomicStatus | Character vector of length 1; taxonomicStatus to assign to the selected row. |
acceptedNameUsageID | Character or numeric vector of length 1; acceptedNameUsageID to assign to the selected row. |
acceptedNameUsage | Character vector of length 1; acceptedNameUsage to assign to the selected row. |
clear_usage_id | Logical vector of length 1; should acceptedNameUsageID of the selected row be set to |
clear_usage_name | Logical vector of length 1; should acceptedNameUsage of the selected row be set to |
fill_usage_name | Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default |
remap_names | Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default |
remap_parent | Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default |
remap_variant | Same as |
stamp_modified | Logical vector of length 1; should the |
stamp_modified_by_id | Logical vector of length 1; should the |
stamp_modified_by | Logical vector of length 1; should the |
strict | Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default |
quiet | Logical vector of length 1; should warnings be silenced? Default |
args_tbl | A dataframe including columns corresponding to one or more of the above arguments, except for |
... | Other DwC terms to modify, specified as sets of named values. Each element of the vector must have a name corresponding to a valid DwC term; see dct_terms. |
Details
taxonID is only used to identify the row(s) to modify and is not itself modified. scientificName can be used in the same way if taxonID is not provided (as long as scientificName matches a single row). If both taxonID and scientificName are provided, scientificName will be assigned to the scientificName of the row identified by taxonID, replacing any value that already exists.
acceptedNameUsageID and acceptedNameUsage must match existing values of acceptedNameUsageID and acceptedNameUsage in the input data (tax_dat). On default settings, either can be used and the other will be filled in automatically (fill_usage_id and fill_usage_name are both TRUE).
Any other arguments provided that are DwC terms will be assigned to the selected row (i.e., they will modify the row).
If remap_names is TRUE (default) and acceptedNameUsageID is provided, any names that have an acceptedNameUsageID matching the taxonID of the selected row (i.e., synonyms of that row) will also have their acceptedNameUsageID replaced with the new acceptedNameUsageID. This applies to acceptedNameUsage as well. This behavior is not applied to names with taxonomicStatus of "variant" by default, but can be turned on for such names with remap_variant.
If remap_parent is TRUE (default) and the parentNameUsageID of the selected row is a synonym, the parentNameUsageID will be changed to that of the accepted name (of the parent taxon). This will also apply to any other row with the same parentNameUsageID as the selected row. This applies to parentNameUsage as well.
If clear_usage_id or clear_usage_name is TRUE and taxonomicStatus includes the word "accepted", acceptedNameUsageID or acceptedNameUsage will be set to NA respectively, regardless of the values of acceptedNameUsageID, acceptedNameUsage, or fill_usage_name.
Can either modify a single row in the input taxonomic database if each argument is supplied as a vector of length 1, or can apply a set of changes to the taxonomic database if the input is supplied as a dataframe via args_tbl.
Value
Dataframe; taxonomic database in DwC format
Examples
# Swap the accepted / synonym status of
# Cephalomanes crassum (Copel.) M. G. Price
# and Trichomanes crassum Copel.
dct_filmies |>
  dct_modify_row(
    scientificName = "Cephalomanes crassum (Copel.) M. G. Price",
    taxonomicStatus = "synonym",
    acceptedNameUsage = "Trichomanes crassum Copel."
  ) |>
  dct_modify_row(
    scientificName = "Trichomanes crassum Copel.",
    taxonomicStatus = "accepted"
  ) |>
  dct_validate(
    check_tax_status = FALSE,
    check_mapping_accepted_status = FALSE,
    check_sci_name = FALSE
  )

# Sometimes changing one name will affect others, if they map
# to the new synonym
dct_modify_row(
  tax_dat = dct_filmies |> head(),
  scientificName = "Cephalomanes crassum (Copel.) M. G. Price",
  taxonomicStatus = "synonym",
  acceptedNameUsage = "Cephalomanes densinervium (Copel.) Copel."
)

# Apply a set of changes
library(tibble)
updates <- tibble(
  scientificName = c(
    "Cephalomanes atrovirens Presl",
    "Cephalomanes crassum (Copel.) M. G. Price"
  ),
  taxonomicStatus = "synonym",
  acceptedNameUsage = "Trichomanes crassum Copel."
)
dct_filmies |>
  dct_modify_row(args_tbl = updates) |>
  dct_modify_row(
    scientificName = "Trichomanes crassum Copel.",
    taxonomicStatus = "accepted"
  )
Get and set function arguments via options
Description
Changes the default values of function arguments.
Usage
dct_options(reset = FALSE, ...)
Arguments
reset | Logical vector of length 1; if TRUE, reset all options to their default values. |
... | Any number of |
Details
Use this to change the default values of function arguments. That way, you don't have to type the same thing each time you call a function.
The arguments that can be set with this function are as follows:
Validation arguments
check_col_names: Logical vector of length 1; should all column names be required to be a valid DwC term? Default TRUE.
check_mapping_accepted_status: Logical vector of length 1; should rules about mapping of variants and synonyms be enforced? Default FALSE. (See dct_validate()).
check_mapping_accepted: Logical vector of length 1; should all values of acceptedNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_original: Logical vector of length 1; should all values of originalNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_parent: Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an existing name? Default TRUE.
check_mapping_parent_accepted: Logical vector of length 1; should all values of parentNameUsageID be required to map to the taxonID of an accepted name? Default FALSE.
check_sci_name: Logical vector of length 1; should all instances of scientificName be required to be non-missing and unique? Default TRUE.
check_status_diff: Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default FALSE.
check_tax_status: Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default TRUE.
check_taxon_id: Logical vector of length 1; should all instances of taxonID be required to be non-missing and unique? Default TRUE.
extra_cols: Character vector; names of columns that should be allowed beyond those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect.
on_fail: Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default "error".
on_success: Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default "data".
skip_missing_cols: Logical vector of length 1; should checks be silently skipped if any of the columns they inspect are missing? Default FALSE.
valid_tax_status: Character vector of length 1; valid values for taxonomicStatus. Each value must be separated by a comma. Default accepted, synonym, variant, NA. "NA" indicates that missing (NA) values are valid. Case-sensitive.
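To make the valid_tax_status format concrete, the following base R sketch splits a comma-separated string into individual values and checks a few statuses against it. How dwctaxon parses the string internally is an assumption here; only the documented behavior (comma-separated, "NA" allows missing values, case-sensitive) is taken from the description above.

```r
# The documented default value of valid_tax_status
valid_tax_status <- "accepted, synonym, variant, NA"

# Split on commas and trim surrounding whitespace
# (assumed parsing, for illustration only)
valid_values <- trimws(strsplit(valid_tax_status, ",", fixed = TRUE)[[1]])

# The literal string "NA" signals that missing values are acceptable
allow_missing <- "NA" %in% valid_values

# "Synonym" fails because matching is case-sensitive
status <- c("accepted", "Synonym", NA, "foo")
is_valid <- ifelse(is.na(status), allow_missing, status %in% valid_values)
is_valid
```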
Editing arguments
clear_usage_id: Logical vector of length 1; should acceptedNameUsageID of the selected row be set to NA if the word "accepted" is detected in tax_status (not case-sensitive)? Default TRUE.
clear_usage_name: Logical vector of length 1; should acceptedNameUsage of the selected row be set to NA if the word "accepted" is detected in tax_status (not case-sensitive)? Default TRUE.
fill_taxon_id: Logical vector of length 1; if taxon_id is not provided, should values in the taxonID column be filled in by generating them automatically from the scientificName? If the taxonID column does not yet exist it will be created. Default TRUE.
fill_usage_id: Logical vector of length 1; if usage_id is not provided, should values in the acceptedNameUsageID column be filled in by matching acceptedNameUsage to scientificName? If the acceptedNameUsageID column does not yet exist it will be created. Default TRUE.
fill_usage_name: Logical vector of length 1; should the acceptedNameUsage of the selected row be set to the scientificName corresponding to its acceptedNameUsageID? Default TRUE.
remap_names: Logical vector of length 1; should the acceptedNameUsageID be updated (remapped) for rows with the same acceptedNameUsageID as the taxonID of the row to be modified? Default TRUE.
remap_parent: Logical vector of length 1; should the parentNameUsageID be updated (remapped) to that of its accepted name if it is a synonym? Will also apply to any other rows with the same parentNameUsageID as the taxonID of the row to be modified. Default TRUE.
remap_variant: Same as remap_names, but applies specifically to rows with taxonomicStatus of "variant". Default FALSE.
stamp_modified: Logical vector of length 1; should the modified column of any newly created or modified row include a timestamp with the date and time of its creation/modification? If the modified column does not yet exist it will be created. Default TRUE.
stamp_modified_by: Logical vector of length 1; should the modifiedBy column of any newly created or modified row include the name of the current user? If the modifiedBy column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedBy" is required in extra_cols. The current user can be specified with the user_name option. Default FALSE.
stamp_modified_by_id: Logical vector of length 1; should the modifiedByID column of any newly created or modified row include the ID of the current user? If the modifiedByID column does not yet exist it will be created; note that this is a non-DwC standard column, so "modifiedByID" is required in extra_cols. The current user ID can be specified with the user_id option. Default FALSE.
taxon_id_length: Numeric vector of length 1; how many characters should be included in automatically generated values of taxonID? Must be between 1 and 32, inclusive. Default 32.
General arguments
quiet: Logical vector of length 1; should warnings be silenced? Default FALSE.
strict: Logical vector of length 1; should taxonomic checks be run on the updated taxonomic database? Default FALSE.
user_name: Character vector of length 1; the name of the current user. Default "".
user_id: Character vector of length 1; the ID of the current user. Default "".
Value
Nothing; used for its side-effect.
Examples
# Show all options
dct_options()

# Store existing settings, including any changes made by the user
old_settings <- dct_options()

# View one option
dct_options()$valid_tax_status

# Change one option
dct_options(valid_tax_status = "accepted, weird, whatever")
dct_options()$valid_tax_status

# Reset to default values
dct_options(reset = TRUE)
dct_options()$valid_tax_status

# Multiple options may also be set at once
dct_options(check_taxon_id = FALSE, check_status_diff = TRUE)

# Reset options to those before this example was run
do.call(dct_options, old_settings)
Darwin Core Taxon terms
Description
A table of valid Darwin Core terms. Only terms in the Taxon class or at the record-level are included.
Usage
dct_terms
Format
Dataframe (tibble), including two columns:
group: Darwin Core term group; either "taxon" (terms in the Taxon class) or "record-level" (terms that are generic in that they might apply to any type of record in a dataset).
term: Darwin Core term
with two additional attributes:
retrieved: Date the terms were obtained
url: URL from which the terms were obtained
Details
Modified from data downloaded from TDWG Darwin Core under the Creative Commons Attribution (CC BY) 4.0 license.
Source
https://dwc.tdwg.org/terms/#taxon
Examples
dct_terms
Validate a taxonomic database
Description
Runs a series of automated checks on a taxonomic database in Darwin Core (DwC) format.
Usage
dct_validate(
  tax_dat,
  check_taxon_id = dct_options()$check_taxon_id,
  check_tax_status = dct_options()$check_tax_status,
  check_mapping_accepted = dct_options()$check_mapping_accepted,
  check_mapping_parent = dct_options()$check_mapping_parent,
  check_mapping_parent_accepted = dct_options()$check_mapping_parent_accepted,
  check_mapping_original = dct_options()$check_mapping_original,
  check_mapping_accepted_status = dct_options()$check_mapping_accepted_status,
  check_sci_name = dct_options()$check_sci_name,
  check_status_diff = dct_options()$check_status_diff,
  check_col_names = dct_options()$check_col_names,
  valid_tax_status = dct_options()$valid_tax_status,
  extra_cols = dct_options()$extra_cols,
  on_success = dct_options()$on_success,
  on_fail = dct_options()$on_fail,
  skip_missing_cols = dct_options()$skip_missing_cols,
  quiet = dct_options()$quiet
)
Arguments
tax_dat | Dataframe; taxonomic database in DwC format. |
check_taxon_id | Logical vector of length 1; should all instances of |
check_tax_status | Logical vector of length 1; should all taxonomic names be required to have a valid value for taxonomic status (by default, "accepted", "synonym", or "variant")? Default |
check_mapping_accepted | Logical vector of length 1; should all values of |
check_mapping_parent | Logical vector of length 1; should all values of |
check_mapping_parent_accepted | Logical vector of length 1; should all values of |
check_mapping_original | Logical vector of length 1; should all values of |
check_mapping_accepted_status | Logical vector of length 1; should rules about mapping of variants and synonyms be enforced? Default |
check_sci_name | Logical vector of length 1; should all instances of |
check_status_diff | Logical vector of length 1; should each scientific name be allowed to have only one taxonomic status? Default |
check_col_names | Logical vector of length 1; should all column names be required to be a valid DwC term? Default |
valid_tax_status | Character vector of length 1; valid values for |
extra_cols | Character vector; names of columns that should be allowed beyond those defined by the DwC taxon standard. Default NULL. Providing column name(s) that are valid DwC taxon column(s) has no effect. |
on_success | Character vector of length 1, either "logical" or "data". Describes what to do if the check passes. Default |
on_fail | Character vector of length 1, either "error" or "summary". Describes what to do if the check fails. Default |
skip_missing_cols | Logical vector of length 1; should checks be silently skipped if any of the columns they inspect are missing? Default |
quiet | Logical vector of length 1; should warnings be silenced? Default |
Details
For check_mapping_accepted_status and check_status_diff, "accepted", "synonym", and "variant" are determined by string matching of taxonomicStatus; so "provisionally accepted" is counted as "accepted", "ambiguous synonym" is counted as "synonym", etc. (case-sensitive).
For check_mapping_accepted_status, the following rules are enforced:
Rows with taxonomicStatus of "synonym" (synonyms) must have an acceptedNameUsageID matching the taxonID of an accepted name (taxonomicStatus of "accepted")
Rows with taxonomicStatus of "variant" (orthographic variants) must have an acceptedNameUsageID matching the taxonID of an accepted name or synonym (but not another variant)
Rows with taxonomicStatus of "accepted" must not have any value entered for acceptedNameUsageID
Rows with a value for acceptedNameUsageID must have a valid value for taxonomicStatus.
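The first three rules for check_mapping_accepted_status can be sketched in base R using the same string matching idea. This is an illustration on invented data, not the package's implementation; the names `bad_synonym`, `bad_variant`, and `bad_accepted` are hypothetical.

```r
# Toy data: row 4 is a synonym that maps to a variant (row 3),
# which breaks the first rule
tax_dat <- data.frame(
  taxonID = c("1", "2", "3", "4"),
  taxonomicStatus = c("accepted", "ambiguous synonym", "variant", "synonym"),
  acceptedNameUsageID = c(NA, "1", "2", "3"),
  stringsAsFactors = FALSE
)

# Status of the name each row maps to (NA if it maps to nothing)
target_status <- tax_dat$taxonomicStatus[
  match(tax_dat$acceptedNameUsageID, tax_dat$taxonID)
]

# Case-sensitive substring matching, so "ambiguous synonym"
# counts as a synonym. grepl() returns FALSE for NA input.
is_syn <- grepl("synonym", tax_dat$taxonomicStatus, fixed = TRUE)
is_var <- grepl("variant", tax_dat$taxonomicStatus, fixed = TRUE)
is_acc <- grepl("accepted", tax_dat$taxonomicStatus, fixed = TRUE)
target_acc <- grepl("accepted", target_status, fixed = TRUE)
target_syn <- grepl("synonym", target_status, fixed = TRUE)

# Synonyms must map to an accepted name
bad_synonym <- is_syn & !target_acc
# Variants must map to an accepted name or a synonym
bad_variant <- is_var & !(target_acc | target_syn)
# Accepted names must not have an acceptedNameUsageID
bad_accepted <- is_acc & !is.na(tax_dat$acceptedNameUsageID)

tax_dat[bad_synonym | bad_variant | bad_accepted, ]
```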
Default settings of all arguments can be modified with dct_options() (see Examples).
Most columns are expected to be vectors of class character, but this is not checked for all columns. Columns (DwC terms) with names including 'ID', for example 'taxonID', may be character, numeric, or integer.
Value
Depends on the result of the check and on values of on_fail and on_success:
If the check passes and on_success is "logical", return TRUE
If the check passes and on_success is "data", return the input dataframe
If the check fails and on_fail is "error", return an error
If the check fails and on_fail is "summary", issue a warning and return a dataframe with a summary of the reasons for failure
Examples
# The example dataset dct_filmies is already correctly formatted and passes
# validation
dct_validate(dct_filmies)

# So make some bad data on purpose with a duplicated scientific name
bad_dat <- dct_filmies
bad_dat$scientificName[1] <- bad_dat$scientificName[2]

# The incorrectly formatted data won't pass
try(
  dct_validate(bad_dat)
)

# It will pass if we allow duplicated scientific names though
dct_validate(bad_dat, check_sci_name = FALSE)

# Individual checks can also be turned on or off with dct_options()
# First save the current settings before making any changes
old_settings <- dct_options()

# Let's allow duplicated scientific names by default
dct_options(check_sci_name = FALSE)

# The data passes validation as before, but we don't have to specify
# `check_sci_name = FALSE` in the function call
dct_validate(bad_dat)

# Reset options to those before this example was run
do.call(dct_options, old_settings)