Check out the vignettes with detailed documentation on each module of the bdc package
brunobrr/bdc
Handling biodiversity data from several different sources is not an easy task. Here, we present the Biodiversity Data Cleaning (bdc), an R package to address quality issues and improve the fitness-for-use of biodiversity datasets. bdc contains functions to harmonize and integrate data from different sources following common standards and protocols, and implements various tests and tools to flag, document, clean, and correct taxonomic, spatial, and temporal data.
Compared to other available R packages, the main strength of the bdc package is that it brings together available tools, and a series of new ones, to assess the quality of different dimensions of biodiversity data in a single, flexible toolkit. The functions can be applied to a multitude of taxonomic groups, datasets (including regional or local repositories), countries, or worldwide.
The bdc toolkit is organized in thematic modules related to different biodiversity dimensions.
⚠️ The modules illustrated, and functions within, were linked to form a proposed reproducible workflow (see vignettes). However, all functions can also be executed independently.
Standardization and integration of different datasets into a standard database.
- `bdc_standardize_datasets()` Standardization and integration of different datasets into a new dataset with column names following Darwin Core terminology
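To make the idea of standardization concrete, here is a minimal base-R sketch of renaming columns from two sources to Darwin Core terms and stacking them. It does not call the bdc package; the source tables, the mappings, the `standardize()` helper, and the `source_dataset` column are all hypothetical and only illustrate the kind of operation this module performs.

```r
# Two toy sources with different column names (values are made up)
src_a <- data.frame(species = "Panthera onca", lat = -15.6, long = -47.9)
src_b <- data.frame(taxon = "Puma concolor", y = -22.9, x = -43.2)

# Hypothetical mappings: Darwin Core term = original column name
map_a <- c(scientificName = "species", decimalLatitude = "lat", decimalLongitude = "long")
map_b <- c(scientificName = "taxon", decimalLatitude = "y", decimalLongitude = "x")

# Keep the mapped columns, rename them, and record where each row came from
standardize <- function(df, mapping, source_name) {
  out <- df[, unname(mapping), drop = FALSE]
  names(out) <- names(mapping)
  out$source_dataset <- source_name  # hypothetical bookkeeping column
  out
}

merged <- rbind(
  standardize(src_a, map_a, "A"),
  standardize(src_b, map_b, "B")
)
print(merged)
```

Keeping the mapping as data rather than hard-coding renames makes it easy to add a new source without touching the merging logic.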
Flagging and removal of invalid or non-interpretable information, followed by data amendments (e.g., correct transposed coordinates and standardize country names).
- `bdc_scientificName_empty()` Identification of records lacking names or with names not interpretable
- `bdc_coordinates_empty()` Identification of records lacking information on latitude or longitude
- `bdc_coordinates_outOfRange()` Identification of records with out-of-range coordinates (latitude > 90 or < -90; longitude > 180 or < -180)
- `bdc_basisOfRecords_notStandard()` Identification of records from doubtful sources (e.g., fossil or machine observation) impossible to interpret and not compatible with Darwin Core recommended vocabulary
- `bdc_country_from_coordinates()` Derive country name from valid geographic coordinates
- `bdc_country_standardized()` Standardization of country names and retrieval of country codes
- `bdc_coordinates_transposed()` Identification of records with potentially transposed latitude and longitude
- `bdc_coordinates_country_inconsistent()` Identification of coordinates in other countries or far from a specified distance from the coast of a reference country (i.e., in the ocean)
- `bdc_coordinates_from_locality()` Identification of records lacking coordinates but with a detailed description of the locality associated with records from which coordinates can be derived
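The first three checks in this list can be sketched in a few lines of base R. This is an illustration of the logic only, not the package's implementation. The toy data, the flag column names (`name_ok`, `coords_present`, `coords_in_range`), and the convention that `TRUE` means "passed" are assumptions made for this example.

```r
# Toy occurrence table using Darwin Core column names (values are made up)
occ <- data.frame(
  scientificName   = c("Panthera onca", "", "Puma concolor", "Tapirus terrestris"),
  decimalLatitude  = c(-15.6, -22.9, 95.0, NA),
  decimalLongitude = c(-47.9, -43.2, -60.0, -55.1),
  stringsAsFactors = FALSE
)

# Assumed convention: TRUE = record passed the test, FALSE = record flagged
occ$name_ok <- !is.na(occ$scientificName) & nzchar(trimws(occ$scientificName))

occ$coords_present <- !is.na(occ$decimalLatitude) & !is.na(occ$decimalLongitude)

# Valid ranges: latitude in [-90, 90], longitude in [-180, 180].
# Records with missing coordinates are also marked FALSE here (a choice of this sketch).
occ$coords_in_range <- occ$coords_present &
  abs(occ$decimalLatitude) <= 90 &
  abs(occ$decimalLongitude) <= 180

print(occ)
```

Each test writes its own logical column instead of dropping rows, so flagged records remain available for inspection before anything is removed.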
3. Taxonomy
Cleaning, parsing, and harmonization of scientific names against multiple taxonomic references.
- `bdc_clean_names()` Name-checking routines to clean and split a taxonomic name into its binomial and authority components
- `bdc_query_names_taxadb()` Harmonization of scientific names by correcting spelling errors and converting nomenclatural synonyms to currently accepted names
- `bdc_filter_out_names()` Function used to filter out records according to their taxonomic status present in the column “notes”. For example, to filter only valid accepted names categorized as “accepted”
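As a rough illustration of what "splitting a name into its binomial and authority components" means, the sketch below uses a deliberately naive whitespace split in base R. It is not how the package parses names. The `split_name()` helper is hypothetical, and it assumes the first two words are genus and specific epithet, which fails for subspecies, hybrids, and many other real cases.

```r
# Naive split: first two whitespace-separated tokens form the binomial,
# anything after them is treated as the authority string.
split_name <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  n_binomial <- min(2, length(parts))
  list(
    binomial  = paste(parts[seq_len(n_binomial)], collapse = " "),
    authority = if (length(parts) > 2) {
      paste(parts[-(1:2)], collapse = " ")
    } else {
      NA_character_
    }
  )
}

with_author    <- split_name("Panthera onca (Linnaeus, 1758)")
without_author <- split_name("Puma concolor")
str(with_author)
str(without_author)
```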
4. Space
Flagging of erroneous, suspicious, and low-precision geographic coordinates.
- `bdc_coordinates_precision()` Identification of records with a coordinate precision below a specified number of decimal places
- `clean_coordinates()` (From the CoordinateCleaner package and part of the data-cleaning workflow). Identification of potentially problematic geographic coordinates based on geographic gazetteers and metadata. Includes tests for flagging records: around country capitals or country or province centroids, duplicated, with equal coordinates, around biodiversity institutions, within urban areas, plain zeros in the coordinates, and suspect geographic outliers
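A precision check of this kind boils down to counting decimal places. The base-R sketch below shows one way to do that. It is an assumption-laden illustration rather than the package's code: `count_decimals()` and `min_decimals` are hypothetical names, and the approach relies on `as.character()` printing the number in plain (non-scientific) notation, which holds for typical coordinate values.

```r
# Count digits after the decimal point by working on the printed number
count_decimals <- function(x) {
  txt <- as.character(x)
  # Strip the optional sign, the integer part, and the decimal point
  nchar(sub("^-?[0-9]*\\.?", "", txt))
}

lat <- c(-15.6, 10, -47.93)
lon <- c(-47.9, 20, -60.12)

min_decimals <- 1  # hypothetical threshold chosen for this example

# TRUE = both coordinates have at least `min_decimals` decimal places
precise_enough <- count_decimals(lat) >= min_decimals &
  count_decimals(lon) >= min_decimals
print(precise_enough)
```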
5. Time
Flagging and, whenever possible, correction of inconsistent collection dates.
- `bdc_eventDate_empty()` Identification of records lacking information on event date (i.e., when a record was collected or observed)
- `bdc_year_outOfRange()` Identification of records with illegitimate or potentially imprecise collecting year. The year provided can be out-of-range (e.g., in the future) or collected before a specified year supplied by the user (e.g., 1900)
- `bdc_year_from_eventDate()` This function extracts a four-digit year from unambiguously interpretable collecting dates
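The three temporal checks can be illustrated together in base R. This sketch is not the package's implementation. It assumes that only ISO-style `YYYY-MM-DD` strings count as unambiguous, treats records without a usable year as failing the range test, and uses hypothetical names (`date_present`, `year_ok`, `year_threshold`).

```r
event_date <- c("2015-03-21", "1850-07-01", "", "03/04/2015", "2999-01-01")

# Is any date information present at all?
date_present <- !is.na(event_date) & nzchar(trimws(event_date))

# Extract a four-digit year only from strings that look like YYYY-MM-DD;
# "03/04/2015" is ambiguous (day/month order unknown), so it is left as NA
is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", event_date)
year <- rep(NA_integer_, length(event_date))
year[is_iso] <- as.integer(substr(event_date[is_iso], 1, 4))

# Flag years before a user-supplied threshold or in the future
year_threshold <- 1900L  # hypothetical threshold, as in the example above
current_year <- as.integer(format(Sys.Date(), "%Y"))
year_ok <- !is.na(year) & year >= year_threshold & year <= current_year

print(data.frame(event_date, date_present, year, year_ok))
```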
To facilitate the documentation, visualization, and interpretation of the results of data-quality tests, the package contains functions for documenting the results of the data-cleaning tests, including functions for saving i) records needing further inspection, ii) figures, and iii) data-quality reports.
- `bdc_create_report()` Creation of data-quality reports documenting the results of data-quality tests and the taxonomic harmonization process
- `bdc_create_figures()` Creation of figures (i.e., bar plots and maps) reporting the results of data-quality tests
- `bdc_filter_out_flags()` Removal of columns containing the results of data-quality tests (i.e., columns starting with “.”) or other columns specified
- `bdc_quickmap()` Creation of a map of points using ggplot2. Helpful in inspecting the results of data-cleaning tests
- `bdc_summary_col()` This function creates or updates the column summarizing the results of data-quality tests (i.e., the column “.summary”)
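The relationship between the per-test columns (names starting with “.”) and the “.summary” column can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the two flag column names are examples, and the sketch assumes `TRUE` means a record passed a test, so a record passes overall only if every flag is `TRUE`.

```r
# Toy table with two flag columns (names are illustrative)
records <- data.frame(
  id = 1:3,
  .coordinates_empty = c(TRUE, TRUE, FALSE),
  .year_outOfRange   = c(TRUE, FALSE, TRUE)
)

# Collect every flag column except an existing summary, then AND them row-wise
flag_cols <- setdiff(grep("^\\.", names(records), value = TRUE), ".summary")
records$.summary <- Reduce(`&`, records[flag_cols])

# Keep only records that passed all tests and drop the "." columns
cleaned <- records[records$.summary, !startsWith(names(records), "."), drop = FALSE]

print(records)
print(cleaned)
```

Excluding an existing `.summary` before recomputing means the same code can both create and update the summary after further tests are added.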
You can install the released version of bdc from CRAN using:

```r
install.packages("bdc")
library(bdc)
```

or the development version from GitHub using:

```r
install.packages("remotes")
remotes::install_github("brunobrr/bdc")
```

Load the package with:

```r
library(bdc)
```

See the bdc package website (https://brunobrr.github.io/bdc/) for a detailed explanation of each module.
If you encounter a clear bug, please file an issue here. For questions or suggestions, please send us an email (ribeiro.brr@gmail.com).
Ribeiro, BR; Velazco, SJE; Guidoni-Martins, K; Tessarolo, G; Jardim, Lucas; Bachman, SP; Loyola, R (2022). bdc: A toolkit for standardizing, integrating, and cleaning biodiversity data. Methods in Ecology and Evolution. doi.org/10.1111/2041-210X.13868
