The R package forcis is an interface to the FORCIS database on global foraminifera distribution (Chaabane et al. 2023). This database includes data on the diversity and distribution of living planktonic foraminifera in the global ocean from 1910 until 2018, collected using plankton tows, continuous plankton recorders, sediment traps, and plankton pumps.
This package has been developed for researchers interested in working with the FORCIS database, even without advanced R skills. It provides basic functions to facilitate the handling of this large database, including functions to download, select, filter, homogenize, and visualize the data. It also enables users to explore the spatial distribution and temporal evolution of planktonic foraminifera.
This vignette is an overview of the main features of the package.
To install the forcis package, run:

    ## Install < remotes > package (if not already installed) ----
    if (!requireNamespace("remotes", quietly = TRUE)) {
      install.packages("remotes")
    }

    ## Install dev version of < forcis > from GitHub ----
    remotes::install_github("FRBCesab/forcis")

The forcis package depends on the sf package, which requires some spatial system libraries (GDAL and PROJ). Please read this page if you have any trouble installing forcis.
Now let’s attach the required packages.
The FORCIS database consists of a collection of five csv files hosted on Zenodo. These csv files are regularly updated, and we recommend using the latest version.
Let’s download the latest version of the FORCIS database with download_forcis_db():

    # Create a data/ folder in the current directory ----
    dir.create("data")

    # Download latest version of the database ----
    download_forcis_db(path = "data", version = NULL)

By default (i.e. version = NULL), this function downloads the latest version of the database. The database is saved in data/forcis-db/version-99/, where 99 is the version number.
N.B. The package forcis is designed to handle the versioning of the database on Zenodo. Read the Database versions vignette for more information.
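Since each version is saved in its own folder, it can be handy to check which versions are already on disk. The following is a minimal base R sketch, not a forcis function: the helper list_local_versions() is hypothetical and relies only on the data/forcis-db/version-XX/ layout described above.

```r
# Hypothetical helper (not part of forcis): list the database versions
# already saved locally, assuming the layout <path>/forcis-db/version-XX/
list_local_versions <- function(path = "data") {
  db_dir <- file.path(path, "forcis-db")

  # Nothing has been downloaded yet
  if (!dir.exists(db_dir)) {
    return(character(0))
  }

  # Keep only the sub-folders named version-XX and strip the prefix
  folders <- list.dirs(db_dir, full.names = FALSE, recursive = FALSE)
  folders <- folders[grepl("^version-", folders)]
  sort(sub("^version-", "", folders))
}

list_local_versions(path = "data")
```

The function returns an empty character vector when no version has been downloaded yet.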
In this vignette, we will use the plankton nets data of the FORCIS database. Let’s import the latest release of the data.
    # Print data ----
    net_data
    #> # A tibble: 2,451 × 86
    #>    data_type cruise_id    profile_id sample_id sample_min_depth sample_max_depth
    #>    <chr>     <chr>        <chr>      <chr>                <dbl>            <dbl>
    #>  1 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  2 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  3 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  4 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              106
    #>  5 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  6 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  7 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               64
    #>  8 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               73
    #>  9 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               83
    #> 10 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              127
    #> # ℹ 2,441 more rows
    #> # ℹ 80 more variables: profile_depth_min <int>, profile_depth_max <dbl>,
    #> #   profile_date_time <chr>, cast_net_op_m2 <dbl>, subsample_id <chr>,
    #> #   sample_segment_length <lgl>, subsample_count_type <chr>,
    #> #   subsample_size_fraction_min <int>, subsample_size_fraction_max <int>,
    #> #   site_lat_start_decimal <dbl>, site_lon_start_decimal <dbl>,
    #> #   sample_volume_filtered <dbl>, …

N.B. For this vignette, we use a subset of the plankton nets data, not the whole dataset.
The FORCIS database provides three different taxonomies:
- OT: original taxonomy, i.e. the initial list of species names and attributes (e.g., shell pigmentation, coiling direction) as reported in various datasets and studies.
- VT: validated taxonomy, i.e. a refined version of the original taxonomy that resolves issues of synonymy (different names for the same taxon) and shifting taxonomic concepts.
- LT: lumped taxonomy, i.e. a simplified version of the validated taxonomy. It merges taxa that are difficult to distinguish across datasets (morphospecies), ensuring consistency and comparability in broader analyses.

See the associated data paper for further information.
After importing the data and before going any further, the next step involves choosing the taxonomic level for the analyses. This is mandatory to avoid duplicated records.
Let’s use the function select_taxonomy() to select the VT taxonomy (validated taxonomy):
    # Select taxonomy ----
    net_data_vt <- net_data |>
      select_taxonomy(taxonomy = "VT")

    net_data_vt
    #> # A tibble: 2,451 × 80
    #>    data_type cruise_id    profile_id sample_id sample_min_depth sample_max_depth
    #>    <chr>     <chr>        <chr>      <chr>                <dbl>            <dbl>
    #>  1 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  2 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  3 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  4 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              106
    #>  5 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  6 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  7 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               64
    #>  8 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               73
    #>  9 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               83
    #> 10 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              127
    #> # ℹ 2,441 more rows
    #> # ℹ 74 more variables: profile_depth_min <int>, profile_depth_max <dbl>,
    #> #   profile_date_time <chr>, cast_net_op_m2 <dbl>, subsample_id <chr>,
    #> #   sample_segment_length <lgl>, subsample_count_type <chr>,
    #> #   subsample_size_fraction_min <int>, subsample_size_fraction_max <int>,
    #> #   site_lat_start_decimal <dbl>, site_lon_start_decimal <dbl>,
    #> #   sample_volume_filtered <dbl>, …

This function has removed species columns associated with other taxonomies.
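To illustrate what this column selection amounts to, here is a minimal base R sketch on a toy table. It is not the package's implementation: the helper keep_taxonomy() is hypothetical, and the `_OT` and `_LT` suffixes are assumed by analogy with the `_VT` suffix of columns such as n_pachyderma_VT.

```r
# Toy table: one species counted under the three taxonomies.
# Column names are illustrative (suffixes _OT and _LT are assumptions).
toy <- data.frame(
  sample_id        = c("s1", "s2"),
  n_pachyderma_OT  = c(3, 0),
  n_pachyderma_VT  = c(3, 0),
  n_pachyderma_LT  = c(3, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the metadata columns and the species columns
# of a single taxonomy, dropping the columns of the two other taxonomies
keep_taxonomy <- function(data, taxonomy = "VT") {
  other   <- setdiff(c("OT", "VT", "LT"), taxonomy)
  pattern <- paste0("_(", paste(other, collapse = "|"), ")$")
  data[, !grepl(pattern, colnames(data)), drop = FALSE]
}

toy_vt <- keep_taxonomy(toy, taxonomy = "VT")
colnames(toy_vt)
```

Keeping a single taxonomy means each species is represented by one column only, which is why this step prevents the same record from being counted several times.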
At this stage, users can choose what to do with this cleaned dataset. In the next sections, we present some use cases.
In this first use case, we want to have an overview of our data.
    # How many subsamples do we have? ----
    nrow(net_data_vt)
    #> [1] 2451

    # How many species have been sampled? ----
    net_data_vt |>
      get_species_names() |>
      length()
    #> [1] 56

We can use the plot_record_by_year() function to display the number of samples per year.
The plot_record_by_month() and plot_record_by_season() functions are also available to display samples at different temporal resolutions.
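The tally behind such a plot can be sketched in base R. This is an illustration on made-up rows, not the code of plot_record_by_year(); in particular, the day/month/year layout assumed for profile_date_time is a guess and should be checked against the actual data.

```r
# Made-up records; the day/month/year layout of the dates is an assumption
toy <- data.frame(
  sample_id         = c("s1", "s2", "s3"),
  profile_date_time = c("12/03/1975 10:00:00", "05/07/1975 08:30:00",
                        "21/11/1998 14:15:00"),
  stringsAsFactors  = FALSE
)

# Parse the date-time strings, then extract the year of each record
date_time <- as.POSIXct(toy$profile_date_time,
                        format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
years <- format(date_time, "%Y")

# Number of records per year
records_per_year <- table(years)
records_per_year
```

Replacing "%Y" with "%m" in the call to format() gives the same tally per month.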
Let’s use the ggmap_data() function to get an idea of the spatial extent of these data.
The Data visualization vignette provides a complete description of all plotting functions available in forcis.
In this second use case we want to answer the following question:
What is the distribution of the planktonic foraminifera species Neogloboquadrina pachyderma between 1970 and 2000 in the Mediterranean Sea?
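We can divide the problem into several stages: filter the data by species, remove the samples in which the species is absent, filter by year, filter by ocean, and map the result.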
    # Filter data by species ----
    net_data_vt_pachyderma <- net_data_vt |>
      filter_by_species(species = "n_pachyderma_VT")

    net_data_vt_pachyderma
    #> # A tibble: 2,451 × 25
    #>    data_type cruise_id    profile_id sample_id sample_min_depth sample_max_depth
    #>    <chr>     <chr>        <chr>      <chr>                <dbl>            <dbl>
    #>  1 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  2 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  3 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  4 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              106
    #>  5 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              118
    #>  6 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               86
    #>  7 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               64
    #>  8 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               73
    #>  9 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0               83
    #> 10 Net       ATLANTIS_II… ATLANTIS_… ATLANTIS…                0              127
    #> # ℹ 2,441 more rows
    #> # ℹ 19 more variables: profile_depth_min <int>, profile_depth_max <dbl>,
    #> #   profile_date_time <chr>, cast_net_op_m2 <dbl>, subsample_id <chr>,
    #> #   sample_segment_length <lgl>, subsample_count_type <chr>,
    #> #   subsample_size_fraction_min <int>, subsample_size_fraction_max <int>,
    #> #   site_lat_start_decimal <dbl>, site_lon_start_decimal <dbl>,
    #> #   sample_volume_filtered <dbl>, …

    # Remove empty samples for N. pachyderma ----
    net_data_vt_pachyderma <- net_data_vt_pachyderma |>
      dplyr::filter(n_pachyderma_VT > 0)

    net_data_vt_pachyderma
    #> # A tibble: 823 × 25
    #>    data_type cruise_id profile_id    sample_id sample_min_depth sample_max_depth
    #>    <chr>     <chr>     <chr>         <chr>                <dbl>            <dbl>
    #>  1 Net       NIOP-C1   NIOP-C1_309-… NIOP-C1_…             28.3             49.5
    #>  2 Net       NIOP-C1   NIOP-C1_309-… NIOP-C1_…              8               18.2
    #>  3 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…             74.3             99.6
    #>  4 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…             48.5             74.3
    #>  5 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…             23.3             48.5
    #>  6 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…              8.1             23.3
    #>  7 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…            298.             498.
    #>  8 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…            200.             298.
    #>  9 Net       NIOP-C1   NIOP-C1_310-… NIOP-C1_…            148.             200.
    #> 10 Net       NIOP-C1   NIOP-C1_313-… NIOP-C1_…             74.8             99.6
    #> # ℹ 813 more rows
    #> # ℹ 19 more variables: profile_depth_min <int>, profile_depth_max <dbl>,
    #> #   profile_date_time <chr>, cast_net_op_m2 <dbl>, subsample_id <chr>,
    #> #   sample_segment_length <lgl>, subsample_count_type <chr>,
    #> #   subsample_size_fraction_min <int>, subsample_size_fraction_max <int>,
    #> #   site_lat_start_decimal <dbl>, site_lon_start_decimal <dbl>,
    #> #   sample_volume_filtered <dbl>, …

    # Filter data by year ----
    net_data_vt_pachyderma_7000 <- net_data_vt_pachyderma |>
      filter_by_year(years = 1970:2000)

    # Get the list of ocean names ----
    get_ocean_names()
    #> [1] "Arctic Ocean"         "Indian Ocean"         "Mediterranean Sea"
    #> [4] "North Atlantic Ocean" "North Pacific Ocean"  "South Atlantic Ocean"
    #> [7] "South Pacific Ocean"  "Southern Ocean"

    # Filter data by ocean ----
    net_data_vt_pachyderma_7000_med <- net_data_vt_pachyderma_7000 |>
      filter_by_ocean(ocean = "Mediterranean Sea")

    # Number of records ----
    nrow(net_data_vt_pachyderma_7000_med)
    #> [1] 2

Finally, we can combine all these steps into one single pipeline:
    # Final use case 2 code ----
    net_data_vt |>
      filter_by_species(species = "n_pachyderma_VT") |>
      dplyr::filter(n_pachyderma_VT > 0) |>
      filter_by_year(years = 1970:2000) |>
      filter_by_ocean(ocean = "Mediterranean Sea") |>
      ggmap_data()

The Select, reshape, and filter data vignette shows examples of how to handle FORCIS data.
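To make the row-filtering logic of this pipeline concrete, here is a base R sketch on a toy table. It is only an illustration under assumptions, not how the forcis functions are implemented: the rows are made up, the day/month/year date layout is assumed, and the ocean filter is left out of the toy.

```r
# Made-up samples with a count column for N. pachyderma
toy <- data.frame(
  sample_id         = c("a", "b", "c", "d"),
  profile_date_time = c("01/06/1965 09:00:00", "15/02/1982 11:30:00",
                        "30/09/1995 07:45:00", "10/10/2005 16:20:00"),
  n_pachyderma_VT   = c(4, 0, 12, 7),
  stringsAsFactors  = FALSE
)

# Stage 1: keep samples in which the species was actually counted
present <- toy[toy$n_pachyderma_VT > 0, ]

# Stage 2: keep samples collected between 1970 and 2000
# (date layout is an assumption)
sample_year <- as.integer(format(
  as.POSIXct(present$profile_date_time,
             format = "%d/%m/%Y %H:%M:%S", tz = "UTC"),
  "%Y"
))
result <- present[sample_year %in% 1970:2000, ]

result$sample_id
```

Sample "b" is dropped because the species is absent, and samples "a" and "d" because they fall outside the 1970 to 2000 window, so only sample "c" remains.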
Additional vignettes are available depending on user wishes:
- Database versions
- Select, reshape, and filter data
- a vignette on the functions available in forcis to compute abundances, concentrations, and frequencies
- Data visualization, which describes the plotting functions available in forcis

Chaabane S, De Garidel-Thoron T, Giraud X, Schiebel R, Beaugrand G, Brummer G-J, Casajus N, Greco M, Grigoratou M, Howa H, Jonkers L, Kucera M, Kuroyanagi A, Meilland J, Monteiro F, Mortyn G, Almogi-Labin A, Asahi H, Avnaim-Katav S, Bassinot F, Davis CV, Field DB, Hernández-Almeida I, Herut B, Hosie G, Howard W, Jentzen A, Johns DG, Keigwin L, Kitchener J, Kohfeld KE, Lessa DVO, Manno C, Marchant M, Ofstad S, Ortiz JD, Post A, Rigual-Hernandez A, Rillo MC, Robinson K, Sagawa T, Sierro F, Takahashi KT, Torfstein A, Venancio I, Yamasaki M & Ziveri P (2023) The FORCIS database: A global census of planktonic Foraminifera from ocean waters. Scientific Data, 10, 354. DOI: 10.1038/s41597-023-02264-2.