Check out the vignettes with detailed documentation on each module of the bdc package

brunobrr/bdc

A toolkit for standardizing, integrating, and cleaning biodiversity data

Overview

Handling biodiversity data from several different sources is not an easy task. Here, we present Biodiversity Data Cleaning (bdc), an R package to address quality issues and improve the fitness-for-use of biodiversity datasets. bdc contains functions to harmonize and integrate data from different sources following common standards and protocols, and implements various tests and tools to flag, document, clean, and correct taxonomic, spatial, and temporal data.

Compared to other available R packages, the main strength of the bdc package is that it brings together available tools, and a series of new ones, to assess the quality of different dimensions of biodiversity data in a single, flexible toolkit. The functions can be applied to a multitude of taxonomic groups, datasets (including regional or local repositories), countries, or worldwide.

Structure of bdc

The bdc toolkit is organized in thematic modules related to different biodiversity dimensions.


⚠️ The modules illustrated, and the functions within them, were linked to form a proposed reproducible workflow (see the vignettes). However, all functions can also be executed independently.



Standardization and integration of different datasets into a standard database.

  • bdc_standardize_datasets() Standardization and integration of different datasets into a new dataset with column names following Darwin Core terminology
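
To make the idea of integrating sources under Darwin Core column names concrete, here is a minimal base R sketch. It does not call bdc_standardize_datasets() and is not its implementation; the two source tables, the dwc_map lookup, the to_dwc() helper, and the record_id column are all hypothetical names used only for illustration.

```r
# Two hypothetical source tables with different column names.
src_a <- data.frame(species = c("Puma concolor", "Panthera onca"),
                    lat = c(-15.6, -3.1), lon = c(-47.9, -60.0),
                    stringsAsFactors = FALSE)
src_b <- data.frame(taxon = "Myrmecophaga tridactyla",
                    latitude = -19.9, longitude = -43.9,
                    stringsAsFactors = FALSE)

# Hypothetical lookup: Darwin Core term -> column name in each source.
dwc_map <- list(
  a = c(scientificName = "species",
        decimalLatitude = "lat",
        decimalLongitude = "lon"),
  b = c(scientificName = "taxon",
        decimalLatitude = "latitude",
        decimalLongitude = "longitude")
)

# Hypothetical helper: keep the mapped columns, rename them to Darwin Core
# terms, and add an identifier that records the source of each row.
to_dwc <- function(df, map, source_id) {
  out <- df[, unname(map), drop = FALSE]
  names(out) <- names(map)
  out$record_id <- paste0(source_id, "_", seq_len(nrow(out)))
  out
}

merged <- rbind(to_dwc(src_a, dwc_map$a, "a"),
                to_dwc(src_b, dwc_map$b, "b"))
print(merged)
```

Keeping the mapping in a lookup rather than in code means a new source can be added by describing its columns, without touching the merge logic.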

Flagging and removal of invalid or non-interpretable information, followed by data amendments (e.g., correct transposed coordinates and standardize country names).

  • bdc_scientificName_empty() Identification of records lacking names or with names not interpretable
  • bdc_coordinates_empty() Identification of records lacking information on latitude or longitude
  • bdc_coordinates_outOfRange() Identification of records with out-of-range coordinates (latitude > 90 or < -90; longitude > 180 or < -180)
  • bdc_basisOfRecords_notStandard() Identification of records from doubtful sources (e.g., fossil or machine observation) that are impossible to interpret and not compatible with the Darwin Core recommended vocabulary
  • bdc_country_from_coordinates() Derivation of country names from valid geographic coordinates
  • bdc_country_standardized() Standardization of country names and retrieval of country codes
  • bdc_coordinates_transposed() Identification of records with potentially transposed latitude and longitude
  • bdc_coordinates_country_inconsistent() Identification of coordinates that fall in other countries or farther than a specified distance from the coast of a reference country (i.e., in the ocean)
  • bdc_coordinates_from_locality() Identification of records lacking coordinates but with a detailed description of the locality associated with the record, from which coordinates can be derived
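
The logic behind the empty and out-of-range coordinate tests can be sketched in base R as follows. This is an illustration of the checks described above, not the package's code. The flag column names are chosen for the example, and the convention that TRUE means "record passes the test" is an assumption.

```r
# Hypothetical records using Darwin Core coordinate column names.
records <- data.frame(
  decimalLatitude  = c(-15.6, 95.0, NA, -47.9),
  decimalLongitude = c(-47.9, 10.0, -60.0, 190.0)
)
lat <- records$decimalLatitude
lon <- records$decimalLongitude

# Assumed convention: TRUE = passes the test, FALSE = flagged.
# A record fails the "empty" test when either coordinate is missing.
records$.coordinates_empty <- !is.na(lat) & !is.na(lon)

# A record fails the "out of range" test when latitude is outside [-90, 90]
# or longitude is outside [-180, 180]. Missing coordinates are left to the
# "empty" test, so they pass here.
records$.coordinates_outOfRange <-
  is.na(lat) | is.na(lon) | (abs(lat) <= 90 & abs(lon) <= 180)

print(records)
```

Keeping each test in its own column, rather than dropping rows immediately, makes it possible to review why a record was flagged before removing it.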

Cleaning, parsing, and harmonization of scientific names against multiple taxonomic references.

  • bdc_clean_names() Name-checking routines to clean and split a taxonomic name into its binomial and authority components
  • bdc_query_names_taxadb() Harmonization of scientific names by correcting spelling errors and converting nomenclatural synonyms to currently accepted names
  • bdc_filter_out_names() Function used to filter out records according to their taxonomic status present in the column “notes”. For example, to keep only valid names categorized as “accepted”
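
As a rough illustration of what splitting a name into binomial and authority components means, the base R sketch below treats the first two words as the binomial and everything after them as the authority. This is a deliberately naive assumption: it ignores infraspecific ranks, hybrids, and qualifiers such as "cf." or "aff.", and it is not how bdc_clean_names() works. The split_name() helper is hypothetical.

```r
# Hypothetical helper: naive split of a scientific name string.
split_name <- function(x) {
  x <- trimws(gsub("\\s+", " ", x))            # collapse repeated whitespace
  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  n <- length(parts)
  list(
    binomial  = paste(parts[seq_len(min(2, n))], collapse = " "),
    authority = if (n > 2) paste(parts[-(1:2)], collapse = " ") else NA_character_
  )
}

with_author    <- split_name("Puma  concolor (Linnaeus, 1771)")
without_author <- split_name("Panthera onca")
str(with_author)
str(without_author)
```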

Flagging of erroneous, suspicious, and low-precision geographic coordinates.

  • bdc_coordinates_precision() Identification of records with a coordinate precision below a specified number of decimal places
  • clean_coordinates() (from the CoordinateCleaner package and part of the data-cleaning workflow). Identification of potentially problematic geographic coordinates based on geographic gazetteers and metadata. Includes tests for flagging records: around country capitals or country or province centroids, duplicated, with equal coordinates, around biodiversity institutions, within urban areas, with plain zeros in the coordinates, and suspect geographic outliers
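
A precision test of this kind can be approximated by counting the decimal places in each coordinate, as in the base R sketch below. The n_decimals() helper, the ndec threshold name, and the .low_precision column are assumptions for illustration. The package may count decimals differently, and this version does not handle values that R prints in scientific notation.

```r
# Hypothetical helper: number of decimal places in the printed value.
n_decimals <- function(x) {
  s <- as.character(x)
  ifelse(grepl(".", s, fixed = TRUE),
         nchar(sub("^[^.]*\\.", "", s)),  # characters after the decimal point
         0L)
}

coords <- data.frame(
  decimalLatitude  = c(-15.61, -3, -19.925),
  decimalLongitude = c(-47.93, -60.02, -43.9)
)

ndec <- 1  # assumed threshold: at least one decimal place in both coordinates

# Assumed convention: TRUE = precision is acceptable, FALSE = flagged.
least_precise <- pmin(n_decimals(coords$decimalLatitude),
                      n_decimals(coords$decimalLongitude))
coords$.low_precision <- least_precise >= ndec

print(coords)
```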

Flagging and, whenever possible, correction of inconsistent collection dates.

  • bdc_eventDate_empty() Identification of records lacking information on event date (i.e., when a record was collected or observed)
  • bdc_year_outOfRange() Identification of records with illegitimate or potentially imprecise collecting years. The year provided can be out-of-range (e.g., in the future) or earlier than a year specified by the user (e.g., 1900)
  • bdc_year_from_eventDate() Extraction of the four-digit year from unambiguously interpretable collecting dates
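
The three temporal checks can be sketched together in base R. The example below is an illustration under stated assumptions rather than the package's implementation: only ISO-like strings (YYYY, YYYY-MM, or YYYY-MM-DD) are treated as unambiguous, records without a year pass the range test, and the variable names are chosen for the example.

```r
event_date <- c("2015-03-21", "1850-07-04", "", NA, "21/03/2015", "2999-01-01")

# Empty test: TRUE when a non-blank date is present (assumed convention).
has_date <- !is.na(event_date) & nzchar(trimws(event_date))

# Year extraction: only ISO-like strings are considered unambiguous here.
year <- rep(NA_integer_, length(event_date))
iso <- grepl("^[0-9]{4}(-[0-9]{2}){0,2}$", event_date)
year[iso] <- as.integer(substr(event_date[iso], 1, 4))

# Range test: the year must not be in the future or before the threshold.
year_threshold <- 1900L  # example value supplied by the user
current_year <- as.integer(format(Sys.Date(), "%Y"))
year_ok <- is.na(year) | (year >= year_threshold & year <= current_year)

print(data.frame(event_date, has_date, year, year_ok))
```

Here "21/03/2015" yields no year because the sketch refuses to guess between day-first and month-first formats, which mirrors the idea of extracting years only from unambiguous dates.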

To facilitate the documentation, visualization, and interpretation of the results of data-quality tests, the package contains functions for saving i) records needing further inspection, ii) figures, and iii) data-quality reports.

  • bdc_create_report() Creation of data-quality reports documenting the results of data-quality tests and the taxonomic harmonization process
  • bdc_create_figures() Creation of figures (i.e., bar plots and maps) reporting the results of data-quality tests
  • bdc_filter_out_flags() Removal of columns containing the results of data-quality tests (i.e., columns starting with “.”) or other specified columns
  • bdc_quickmap() Creation of a map of points using ggplot2. Helpful in inspecting the results of data-cleaning tests
  • bdc_summary_col() Creation or update of the column summarizing the results of data-quality tests (i.e., the column “.summary”)
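
The relationship between the individual test columns, the “.summary” column, and the final removal of flag columns can be sketched in base R. This is not the code of bdc_summary_col() or bdc_filter_out_flags(). The example data and the convention that TRUE means "passed" are assumptions.

```r
# Hypothetical table after running two tests; flag columns start with ".".
flagged <- data.frame(
  scientificName     = c("Puma concolor", "Panthera onca", "Tapirus terrestris"),
  .coordinates_empty = c(TRUE, TRUE, FALSE),
  .year_outOfRange   = c(TRUE, FALSE, TRUE),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Collect the test columns, excluding any existing summary column so the
# summary can be recomputed safely.
flag_cols <- setdiff(grep("^\\.", names(flagged), value = TRUE), ".summary")

# A record passes overall only if it passes every individual test.
flagged$.summary <- rowSums(!as.matrix(flagged[flag_cols])) == 0

# Keep the records that passed, then drop every column starting with ".".
clean <- flagged[flagged$.summary, !startsWith(names(flagged), "."), drop = FALSE]

print(flagged)
print(clean)
```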

Installation

install.packages("bdc")
library(bdc)

Or install the development version from GitHub using:

install.packages("remotes")
remotes::install_github("brunobrr/bdc")

Load the package with:

library(bdc)

Package website

See the bdc package website (https://brunobrr.github.io/bdc/) for a detailed explanation of each module.

Getting help

If you encounter a clear bug, please file an issue in the brunobrr/bdc GitHub repository. For questions or suggestions, please send us an email (ribeiro.brr@gmail.com).

Citation

Ribeiro, BR; Velazco, SJE; Guidoni-Martins, K; Tessarolo, G; Jardim, L; Bachman, SP; Loyola, R (2022). bdc: A toolkit for standardizing, integrating, and cleaning biodiversity data. Methods in Ecology and Evolution. https://doi.org/10.1111/2041-210X.13868
