Last updated date: Mar 8, 2022
Biodiversity online databases: An applied R protocol to get and curate
spatial and climatic data
Coca-de-la-Iglesia, M.1*, Valcárcel, V.1,2 and G. Medina, N.1,2
1Departamento de Biología, Universidad Autónoma de Madrid (UAM), Madrid, Spain; 2Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM), Madrid, Spain
*For correspondence: marina.coca@inv.uam.es
[Abstract]
Ecological and evolutionary studies often require high-quality biodiversity data. This information is easy to access through the many online databases that have compiled biodiversity data from herbaria, museums, and human observations. However, the process of getting this information ready for analyses is complex and time-consuming. In this study, we have developed a protocol in the R language to process spatial data (download, merge, clean and correct) and extract climatic data, using some genera of the ginseng family (Araliaceae) as an example. The protocol provides an automatic way to process spatial and climatic data for numerous taxa independently and from multiple online databases. The script uses GBIF, BIEN and WorldClim as the online data sources, but it is easy to adapt to include other online databases. The script uses genera as the sample unit, but it also provides a way to use species as the target. The cleaning process incorporates a filter that removes occurrences outside the natural range of taxa, occurrences from gardens and other human environments, and erroneous locations, as well as a spatial correction for misplaced occurrences (i.e., occurrences within a distance buffer from the coastal limit). Additionally, each step of the protocol can be run independently. Thus, the protocol can be started at the data cleaning step, if the database is already compiled, or at the climatic data extraction step, if the database is already parsed. Every line in the R script is commented so that it can also be run by users with little knowledge of R.
Keywords: BIEN, Data cleaning, GBIF, Online biodiversity databases, R language, WorldClim.
[Background] Our knowledge of species distributions is central to biogeographers, but also to phylogeneticists and ecologists. Indeed, species ranges are needed to perform phylogenetic climatic reconstructions, species niche characterizations or species distribution models, and to address multiple evolutionary questions. However, achieving accurate spatial information on species distributions requires good-quality occurrence databases with high geographic coverage, which are difficult to gather.
The principal sources of geographical information are field inventories and biodiversity collections (museums and herbaria), for which accessibility was a serious limitation until recently. The digitization effort of the last decades has facilitated access to large amounts of biodiversity data that were previously scattered across institutions around the world, through online databases such as the Global Biodiversity Information Facility (GBIF; GBIF.org, 2021). As a result, a large amount of biodiversity data is now available, which provides an unprecedented opportunity to take advantage of centuries of naturalists’ observations across the world. However, the use of this valuable information is limited by (i) persistent gaps of knowledge and (ii) technical limitations. On the one hand (i), our knowledge of species distributions is still poor, biased or imprecise (Hortal et al., 2007), and this is reflected in the information gathered in biodiversity databases, which is not uniform across lineages or across regions. These biases lead to groups of organisms and regions of the world with scarce information, while others concentrate large amounts of data (Hortal & Lobo, 2005). On the other hand (ii), the process of getting online data parsed and ready for analyses is highly complex. For example, online repositories frequently include records with imprecise or erroneous spatial information (such as land organisms falling into the sea) or with outdated taxonomic nomenclature (Soberón & Peterson, 2004). Thus, every study based on online data requires an initial step of cleaning and parsing to remove or minimize the impact of these sources of uncertainty (persistent gaps of knowledge and technical limitations) on further analyses (Hortal et al., 2007).
In parallel with the international digitization effort of the last decades, several methodologies and pipelines have been conceived to deal with these sources of uncertainty and simplify the different steps of working with online biodiversity data. Some of the most relevant protocols have been developed in R (R Core Team, 2018) and include geographic, taxonomic or temporal data cleaning (see for example: bRacatus, Arlé et al., 2021; BDcleaner, Jin & Yang, 2020; plantR, Lima et al., 2021; Biogeo, Robertson et al., 2016; SpeciesGeoCoder, Töpel et al., 2016; CoordinateCleaner, Zizka et al., 2019). However, none of them deals with the uncertainty introduced by both the spatial gaps of knowledge and the technical limitations. Also, most of them focus on one or a few steps of the process. Thus, to complete the process (from the initial download of raw occurrences to the climatic data extraction from the cleaned and parsed spatial database), users need to deal with different protocols, some of which require programming skills or a deep R background.
The R protocol that we present here is designed to create reliable databases of species occurrences and climatic data from online repositories. It provides an automatic procedure to deal with the most frequent sources of spatial uncertainty in online biodiversity databases. It also includes an automatic script to run each sample (species, genus, family, etc.) separately, which allows for an easy and fast way to process hierarchical databases. The script also includes post-processing code to run after the spatial pipeline and extract the climatic data. The protocol describes step by step how to download, parse, clean and merge spatial and climatic data from three online databases (Figure 1; GBIF, GBIF.org, 2021; BIEN, Maitner, 2020; and WorldClim, Fick & Hijmans, 2017). However, the protocol can be easily adapted to include any other online biodiversity database that may be of interest. The cleaning steps include how to automatically update nomenclatural information and how to identify and remove records outside the natural distribution of taxa, records from gardens and other human environments, and geographically inaccurate records. To explain the protocol, we used the Asian Palmate Group (AsPG) of Araliaceae as a case study, using genera as the sample unit. To speed up the execution of the protocol, we selected 16 genera of the AsPG. The genera were selected so that they display uneven spatial information across genera and across areas of the world (to address the issue derived from gaps of knowledge as a source of spatial uncertainty) and are largely affected by erroneous and misplaced records (to address the issue derived from technical limitations as a source of spatial uncertainty).
To summarize, the main advantages of this protocol are that it: (1) can be applied to all groups of organisms (as long as they have information available in the GBIF or BIEN databases) and at any taxonomic rank, not only at the species level; (2) provides an automatic way to process hierarchical databases, which is very helpful when studying highly diversified groups (genera with a high number of species, families with a high number of genera, etc.); (3) provides a complete pipeline from spatial data download (including the merging of multiple databases) to climatic data extraction; (4) deals with uncertainty coming from technical limitations (such as wrong records), but also with the uncertainty derived from persistent gaps of knowledge (such as spatial biases across different parts of the world and across lineages); (5) provides an easy way of filtering records outside the natural range; (6) applies a spatial correction for erroneous occurrences outside the coastal limit; (7) includes independent steps for each part of the process that can be run separately; and (8) can be easily used and modified by any kind of user, from undergraduate students to professors, irrespective of their ability, knowledge or background in R, because it is accompanied by instructions to guide the user.
Equipment
1.Computer with the Microsoft® Windows® XP or Mac® OS X® 10.4 operating system, or later versions of either.
Software
1.R version 3.5.1 (https://r-project.org/).
Packages: “BIEN”, “countrycode”, “data.table”, “devtools”, “dplyr”, “plyr”, “raster”, “readr”, “rgbif”, “rgdal”, “spocc”, “spThin”, “SEEG-Oxford/seegSDM” and “tidyr”.
2.RStudio version 1.1.456 (https://rstudio.com/products/rstudio/)
The use of RStudio is optional. RStudio is an interface that improves the use of R.
3.Microsoft® Excel® 2016.
4.Any text-editing program capable of exporting files in .txt format, such as WordPad, Microsoft® Word®, Notepad++, etc.
Procedure
The R script can be freely downloaded from GitHub (link: https://github.com/NiDEvA/R-protocols.git; Note 1). The pipeline of the procedure coincides with the steps of the R script of this protocol (Figure 1). First, you have to create two working folders, one named “input” (it contains the information needed to run the R script) and another one named “output” (it will contain the resulting files after running the R script). In this protocol we used the genus rank as the sample unit but the script also includes commented lines (those preceded with “#”) with the functions needed if you want to use species as the sample unit. Besides, it can be easily modified to use family or any other higher taxonomic level as the sample unit if needed. Also, we cleaned the data by removing records outside the natural distribution of genera, from gardens and other human environments or geographically inaccurate, but it can also be easily modified to meet any particular data cleaning requirements.
A. Build a checklist of the native range of the taxa. It is necessary to know the countries to which the taxa are native. For plants, this information can be found in the World Checklist of Selected Plant Families (WCSP, Govaerts et al., 2008). WCSP is a database that compiles checklists of 200 seed plant families. The database is updated frequently: each new name published by the International Plant Names Index (IPNI; International Plant Names Index, 2020) is reviewed and added to WCSP. Information on the natural distribution range of other organisms is available in the ASM Mammal Diversity Database (https://www.mammaldiversity.org/index.html, Mammal Diversity Database, 2020), Avibase - The World Bird Database (https://avibase.bsc-eoc.org/avibase.jsp, Lepage et al., 2014), Catalogue of Life (https://www.catalogueoflife.org/, Bánki et al., 2022), Checklist of Ferns and Lycophytes of the World (Hassler, 2022a), Global Assessment of Reptile Distributions (http://www.gardinitiative.org/, GARD, 2022), Reptile Database (http://www.reptile-database.org/, Uetz et al., 2021), USDA Plants Database (https://plants.usda.gov, USDA, NRCS, 2022) and World Plants (https://www.worldplants.de, Hassler, 2022b). To build the checklist:
1.Create a txt file with the names of all taxa separated by “Enter” in a plain text editor (e.g., Notepad++, BBedit). Save it as “Natural_Distribution_Checklist_TDWG.txt”.
2. Visit the website https://wcsp.science.kew.org/home.do (or the corresponding webpage, see above) and enter the taxon name in the search engine. WCSP describes the distribution of taxa in two ways, one in narrative form and the other through international codes (Figure 2). The international code used in WCSP is the third level of geographical codes of the Taxonomic Databases Working Group (TDWG, Brummitt, 2001) (Note 2).
3.Copy each code of three capital letters (just the codes, not the numbers that appear at the beginning of the country code line) and paste it in “Natural_Distribution_Checklist_TDWG.txt” on the same line, right after the corresponding taxon name, separated by “;”. In some cases, symbols (“?”, “(?)”, “+”, “†”) or lowercase letters may appear in the distribution. According to TDWG, “?” is used when the presence of a taxon in a given area is not certain. If this symbol appears within brackets, no exact location is known within a country. When a taxon is extinct or may be extinct in an area, the symbol “†” is placed after the country code. When the country code is not known, “+” is used. Lowercase letters in the country code indicate naturalization. For this protocol, we have only used the codes with three capital letters that do not have any symbol. For more information, consult the “about checklist” section on the WCSP website.
4.Repeat steps 2 and 3 until all taxa are completed and save the document in the “input” folder. The format of the resulting txt file should look as in Figure 3. It is advisable to sort the taxa alphabetically in the text file.
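Once the checklist file exists, it can be read into R as a list of native codes per taxon. The following is a minimal base-R sketch, not the code of the published script: the taxon names and three-letter codes are placeholders, the file is written to a temporary folder only so that the example runs on its own, and the object name native.codes is an assumption.

```r
# Placeholder content: taxon names and codes below are NOT real data.
checklist_path <- file.path(tempdir(), "Natural_Distribution_Checklist_TDWG.txt")
writeLines(c("GenusA;AAA;BBB;CCC",
             "GenusB;DDD"),
           checklist_path)

# Read one line per taxon and split each line on ";".
checklist_lines <- readLines(checklist_path)
fields <- strsplit(checklist_lines, ";", fixed = TRUE)

# The first field is the taxon name; the remaining fields are TDWG level-3 codes.
taxa.names <- vapply(fields, function(x) trimws(x[1]), character(1))
native.codes <- lapply(fields, function(x) trimws(x[-1]))
names(native.codes) <- taxa.names

# Keep only codes made of exactly three capital letters (no "?", "+" or "†").
native.codes <- lapply(native.codes, function(x) x[grepl("^[A-Z]{3}$", x)])

print(native.codes)
```

In a real run, checklist_path would point to the file saved in the “input” folder.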
B. Create an account in GBIF database.
1.Visit the website https://www.gbif.org/. Click on “Login” located in the upper right corner of the web, and then on “REGISTER” (Figure 4).
2.Fill the “COUNTRY”, “EMAIL”, ”USERNAME”, and ”PASSWORD” fields, click on next and follow the instructions to create the account. It is also possible to create the account through Google, Facebook or Github. Important to remember: do not forget the information filled in the email, username and password, because it will be used later in the R script.
C. Initial preparation in R.
D. Download the occurrence data from online databases (GBIF and BIEN, or the desired database).
Table 1. Necessary arguments of the occ_download function from the “rgbif” package.
Argument | Description | To download |
pred | Downloads only the occurrences that match a single condition | Select “taxon” for “taxonKey” and TRUE for “hasCoordinate” |
pred_not | Downloads only the occurrences that do not match the condition | Select “INTRODUCED”, “INVASIVE”, “MANAGED” and “NATURALISED” for “establishmentMeans” |
pred_in | Downloads the occurrences that match any of multiple values | Select “taxon.keys” for “taxonKey” |
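As an illustration of how the predicates in Table 1 combine, the sketch below wraps a GBIF download request in a function. It is an assumption about how such a request could be written, not the code of the published script: the function name request_gbif_download, the backbone lookup with name_backbone, and the “SIMPLE_CSV” format are choices made for this example. Calling the function requires the “rgbif” package, network access, and the GBIF username, password and email created in step B.

```r
# Sketch only: defining the function does not contact GBIF.
request_gbif_download <- function(taxa.names, user, pwd, email) {
  # Look up the GBIF backbone key of each genus (assumed lookup step;
  # it fails if a name has no match in the backbone).
  taxon.keys <- vapply(taxa.names, function(x) {
    rgbif::name_backbone(name = x, rank = "genus")$usageKey
  }, numeric(1))

  # Predicates are combined with "and" by default.
  rgbif::occ_download(
    rgbif::pred_in("taxonKey", taxon.keys),      # any of the requested taxa
    rgbif::pred("hasCoordinate", TRUE),          # georeferenced records only
    rgbif::pred_not(rgbif::pred("establishmentMeans", "INTRODUCED")),
    rgbif::pred_not(rgbif::pred("establishmentMeans", "INVASIVE")),
    rgbif::pred_not(rgbif::pred("establishmentMeans", "MANAGED")),
    rgbif::pred_not(rgbif::pred("establishmentMeans", "NATURALISED")),
    format = "SIMPLE_CSV",
    user = user, pwd = pwd, email = email
  )
}

# Example call (not run here):
# request <- request_gbif_download(taxa.names, "my_user", "my_password", "me@example.org")
```

The request is queued on the GBIF servers, so the resulting file still has to be retrieved once GBIF has prepared it.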
2. Use the BIEN_occurrence_genus function from the R package “BIEN” to download the records from the BIEN database version 4.1.1 (Note 5). Some arguments must be specified to start the download (Table 2). We will refer to the resulting dataset as “raw.BIEN.dataset” from here on. If there are no records in “raw.BIEN.dataset”, go directly to step E-3 to replace the column names and see Note 6.
Table 2. Necessary arguments of the BIEN_occurrence_genus function from the “BIEN” package. If you use species as the sample unit, use the BIEN_occurrence_species function instead and replace the argument “genus” with “species”; the remaining arguments stay the same.
Argument | Description | For download |
genus | Name of genus | This argument corresponds to the vector of taxon names created in R (taxa.names) |
cultivated | If TRUE, it also returns cultivated occurrences | Select FALSE (the default) |
all.taxonomy | If TRUE, it returns all taxonomic information | Select TRUE (FALSE is the default) |
collection.info | If TRUE, it returns additional information about collection and identification | Select TRUE (FALSE is the default) |
observation.type | If TRUE, it returns information on the type of observation | Select TRUE (FALSE is the default) |
political.boundaries | If TRUE, it returns information on political boundaries | Select TRUE (FALSE is the default) |
natives.only | If TRUE, it returns only native species | Select TRUE (the default) |
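The argument values in Table 2 translate into a single function call. The wrapper below is a hedged sketch rather than the script's own code: the name download_bien_genera is hypothetical, and running it requires the “BIEN” package and network access.

```r
# Sketch only: defining the function does not contact the BIEN database.
download_bien_genera <- function(taxa.names) {
  BIEN::BIEN_occurrence_genus(
    genus = taxa.names,            # vector of genus names
    cultivated = FALSE,            # do not return cultivated occurrences
    all.taxonomy = TRUE,           # return all taxonomic information
    collection.info = TRUE,        # return collection and identification details
    observation.type = TRUE,       # return the type of observation
    political.boundaries = TRUE,   # return political boundary information
    natives.only = TRUE            # return native species only
  )
}

# Example call (not run here):
# raw.BIEN.dataset <- download_bien_genera(taxa.names)
```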
3. Save the R workspace with the downloaded data as “1_Workspace_Download.RData”. It is very useful to save the objects created during the data download: if there is a problem in later steps, this workspace can be loaded, which avoids having to repeat the download.
E. Unify the format of the downloaded databases and simplify the database by removing unnecessary columns. In order to join the information from the two databases the number of columns and their names have to be identical in “raw.GBIF.list” and “raw.BIEN.dataset”. Note that some columns from GBIF and BIEN have different names and yet contain the same information. In those cases, it is necessary to rename the columns (see below). Columns with information that will not be used in further analysis can be removed in this step also.
Table 3. Equivalences between information of GBIF and BIEN simple datasets needed for merging datasets. Names of selected columns of “simple.GBIF.dataset” and “simple.BIEN.dataset”, and their corresponding name in the merged dataset.
GBIF | BIEN | Merged dataset |
ID_Originin (new) | ID_Originin | ID_Originin |
Data_Origin (new) | Data_Origin | Data_Origin |
genus | scrubbed_genus | Genus |
species | scrubbed_species_binomial | Spp |
scientificName | verbatim_scientific_name | Scientific_name |
decimalLatitude | latitude | Latitude |
decimalLongitude | longitude | Longitude |
elevation | Not available (Later created as “elevation”) | Elevation |
countryName (new) | country | Country_Name |
countryCode | Not available (Later created as “country_code”) | Country_code |
locality | locality | Locality |
eventDate | date_collected | Date |
institutionCode | datasource | Institution_code |
collectionCode | collection_code | Collection_code |
catalogNumber | catalog_number | Catalog_number |
basisOfRecord | observation_type | Basis_of_Record |
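The renaming and merging summarized in Table 3 can be sketched in base R with a lookup vector per source. This is a simplified illustration under stated assumptions: only four of the columns are included, the two one-row data frames hold invented values, and the helper rename_columns is hypothetical. Each source's latitude column is mapped to Latitude and each longitude column to Longitude.

```r
# Toy stand-ins for the simplified datasets; all values are invented.
simple.GBIF.dataset <- data.frame(
  genus = "GenusA", species = "GenusA alpha",
  decimalLatitude = 35.1234, decimalLongitude = 139.5678,
  issue = "none", stringsAsFactors = FALSE)
simple.BIEN.dataset <- data.frame(
  scrubbed_genus = "GenusB", scrubbed_species_binomial = "GenusB beta",
  latitude = -12.5, longitude = -70.25, stringsAsFactors = FALSE)

# Source column name -> merged column name.
gbif_to_merged <- c(genus = "Genus", species = "Spp",
                    decimalLatitude = "Latitude", decimalLongitude = "Longitude")
bien_to_merged <- c(scrubbed_genus = "Genus", scrubbed_species_binomial = "Spp",
                    latitude = "Latitude", longitude = "Longitude")

# Keep only the columns named in the lookup and rename them.
rename_columns <- function(dataset, lookup) {
  dataset <- dataset[, names(lookup), drop = FALSE]
  names(dataset) <- unname(lookup)
  dataset
}

gbif <- rename_columns(simple.GBIF.dataset, gbif_to_merged)
gbif$Data_Origin <- "GBIF"
bien <- rename_columns(simple.BIEN.dataset, bien_to_merged)
bien$Data_Origin <- "BIEN"

# Both data frames now share identical column names, so rows can be stacked.
merged.dataset <- rbind(gbif, bien)
print(merged.dataset)
```

Columns not listed in the lookup (the toy “issue” column here) are dropped, which is the simplification described in step E.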
5. Save “merged.dataset” as a csv file named “2_merged_dataset.csv”. This file contains all the simplified GBIF and BIEN information, or only the GBIF data if no record was downloaded from BIEN.
F. Add occurrences from other sources. This step is only necessary if the data from GBIF and BIEN are incomplete (that is, they do not completely reflect the distribution range of the study case) and the author deems it necessary to include other data sources (such as additional online databases, herbarium specimens or citations in the literature) to complete the taxa ranges. If this is not the case, skip this step and go to G.
G. Check the “merged.dataset” object. It is necessary to check that the dataset has the correct format.
H. Data cleaning. This step is focused on cleaning the most common errors.
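One of the cleaning filters referred to later in the protocol is a minimum threshold on the number of decimals of the coordinates (step H-3). The sketch below shows one possible way to apply such a threshold in base R. It is an assumption, not the published code: the helper count_decimals, the threshold object min.decimals, and the toy coordinates are all made up for the example.

```r
# Count the decimal places of each number as it prints with up to 15 digits.
count_decimals <- function(x) {
  vapply(x, function(value) {
    txt <- format(value, digits = 15, scientific = FALSE)
    if (!grepl(".", txt, fixed = TRUE)) return(0L)
    nchar(sub("^[^.]*\\.", "", txt))
  }, integer(1))
}

# Toy occurrences with invented coordinates of decreasing precision.
occurrences <- data.frame(
  Latitude  = c(35.1234, 35.12, 35.1, 35),
  Longitude = c(139.5678, 139.57, 139.5, 139)
)

# Keep records whose latitude AND longitude both reach the threshold.
min.decimals <- 2
precise <- count_decimals(occurrences$Latitude) >= min.decimals &
  count_decimals(occurrences$Longitude) >= min.decimals
cleaned <- occurrences[precise, , drop = FALSE]
print(cleaned)
```

Coordinates stored as numbers lose trailing zeros (35.10 becomes 35.1), so a stricter check would need the coordinates as originally written text.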
I. Distribution maps. This step is focused on visualizing the global distribution of all taxa together and individual maps of each sample unit after data cleaning.
J. Data thinning. The “spThin” package chooses an occurrence and randomly removes nearby occurrences according to the distance indicated in the buffer. This step is intended to reduce the uncertainty that arises when spatial data are unevenly distributed across your dataset and certain areas are oversampled for all or a few sample units. To identify this sampling bias, visually inspect the maps created in step I. If the sampling bias affects most or all of your sample units, proceed with the thinning in step J-1-a; if it affects only one or two sample units, proceed with the thinning in step J-1-b. If you detect sampling bias, thinning is crucial to minimize errors in further spatial analyses, for example to avoid overestimation in the bioclimatic data caused by oversampled areas. If there is no bias in your dataset, you can skip this step and go to step K.
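To make the idea of distance-based thinning concrete, the following base-R sketch visits occurrences in random order and keeps a record only if it is at least a given distance from every record already kept. It is a simplified stand-in written for illustration, not the algorithm or interface of “spThin”; the functions haversine_km and thin_occurrences, the 10 km buffer, and the coordinates are assumptions.

```r
# Great-circle distance in km between one point and a vector of points.
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Greedy thinning: random visiting order, keep a record only if no kept
# record lies closer than min_km.
thin_occurrences <- function(occ, min_km, seed = 1) {
  set.seed(seed)
  kept <- integer(0)
  for (i in sample(nrow(occ))) {
    d <- haversine_km(occ$Longitude[i], occ$Latitude[i],
                      occ$Longitude[kept], occ$Latitude[kept])
    if (all(d >= min_km)) kept <- c(kept, i)
  }
  occ[sort(kept), , drop = FALSE]
}

# Three clustered records (under 1 km apart) and one distant record.
occ <- data.frame(Longitude = c(0, 0.005, 0, 1),
                  Latitude  = c(0, 0, 0.005, 1))
thinned <- thin_occurrences(occ, min_km = 10)
print(thinned)
```

Which of the clustered records survives depends on the random order, so a different seed can keep a different record from the cluster while the number of retained records stays the same.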
K. Load bioclimatic variables from WorldClim version 2. This online climatic database contains 19 variables with the average values of 19 parameters that represent precipitation and temperature for the years between 1970 and 2000. There are two ways to obtain these bioclimatic variables. Alternative 1 is shown in step K-1 and works with all the resolutions available in WorldClim (10, 5 and 2.5 minutes, and 30 seconds). Alternative 2 is shown in step K-2 and is only available for resolutions of 10, 5 and 2.5 minutes.
c. Import bioclimatic variables to R. Remove “_” and “.tif” characters in column names.
2. Alternative 2. Download the standard WorldClim bioclimatic variables directly from R using the R package “raster”. This alternative is only available for resolutions of 10, 5 and 2.5 minutes. In the argument “res” of the “getData” function, indicate 10, 5 or 2.5 for the selected resolution. Thus, if you chose a minimum threshold of two decimals in step H-3 and select this alternative, be aware that you will lose precision in your climatic analysis.
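Both alternatives can be sketched briefly in R. For Alternative 1, removing “_” and “.tif” from the layer names only needs gsub; the file names below are hypothetical and real WorldClim file names may differ. For Alternative 2, the wrapper download_worldclim is a hypothetical name around the getData call; running it requires the “raster” package and network access, and recent releases of “raster” may flag getData as deprecated, so check your installed version.

```r
# Alternative 1, name cleanup: hypothetical file names of imported layers.
layer.files <- c("bio_1.tif", "bio_12.tif")
layer.names <- gsub(".tif", "", layer.files, fixed = TRUE)
layer.names <- gsub("_", "", layer.names, fixed = TRUE)
print(layer.names)

# Alternative 2, direct download (sketch only; defining it downloads nothing).
download_worldclim <- function(res = 10, path = tempdir()) {
  # res must be 10, 5 or 2.5 (minutes) for this alternative.
  raster::getData("worldclim", var = "bio", res = res, path = path)
}

# Example call (not run here):
# bioclim <- download_worldclim(res = 10, path = "input")
```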
L. Spatial correction for terrestrial organisms. Despite the removal of all occurrences with inaccurate coordinates, some occurrences may still fall outside the limits of the earth's surface according to the cartographic base used as a template. For terrestrial organisms, these occurrences may be wrong (if the distance to the coastal limit is large) or simply misplaced (if the distance to the coastal limit is small). Because our ultimate goal is to extract climatic data (see step M), we do not want to include wrong occurrences, but it is desirable not to lose misplaced records. Thus, we need to check for occurrences outside the Earth’s limits in order to remove wrong occurrences and apply a spatial correction to misplaced occurrences. If there are no occurrences outside the Earth’s limits in your database, go to step M and proceed with the climatic data extraction. If there are occurrences outside the limits, identify the occurrences that lie between the coastal limit and 5 km from the coastal limit (misplaced occurrences), as established by bioclimatic variable 1 of WorldClim version 2 (the limit is the same for the 19 available bioclimatic variables), and recalculate new coordinates so that each occurrence falls in the nearest climatic cell of the template. Occurrences located more than 5 km from the coastal limit (wrong occurrences) are eliminated (Note 19).
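The logic of the correction (snap misplaced records to the nearest climatic cell, drop wrong records) can be illustrated with a small base-R sketch. It is a simplified stand-in under stated assumptions, not the published code: land cells are represented by a toy table of cell-centre coordinates instead of a WorldClim raster, the occurrences passed in are assumed to be already identified as falling outside the land template, and the names haversine_km and correct_offshore are hypothetical.

```r
# Great-circle distance in km between one point and a vector of points.
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Move each offshore record to the nearest land cell centre if that cell is
# within max_km; otherwise drop the record.
correct_offshore <- function(offshore, land_cells, max_km = 5) {
  corrected <- offshore
  keep <- logical(nrow(offshore))
  for (i in seq_len(nrow(offshore))) {
    d <- haversine_km(offshore$Longitude[i], offshore$Latitude[i],
                      land_cells$Longitude, land_cells$Latitude)
    nearest <- which.min(d)
    if (d[nearest] <= max_km) {
      corrected$Longitude[i] <- land_cells$Longitude[nearest]
      corrected$Latitude[i] <- land_cells$Latitude[nearest]
      keep[i] <- TRUE
    }
  }
  corrected[keep, , drop = FALSE]
}

# Toy data: two land cell centres and two offshore records.
land_cells <- data.frame(Longitude = c(0, 0.1), Latitude = c(0, 0))
offshore <- data.frame(Longitude = c(0.02, 0.5), Latitude = c(0, 0.5))
corrected <- correct_offshore(offshore, land_cells, max_km = 5)
print(corrected)
```

In the toy data, the first record lies about 2.2 km from a land cell and is moved onto it, while the second is tens of kilometres away and is removed.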
M. Extract climatic data from bioclimatic variable layers.
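A minimal sketch of the extraction with the “raster” package is shown below. It uses a toy two-layer raster with invented values instead of the WorldClim layers, the object names are assumptions, and the whole block is skipped when “raster” is not installed.

```r
if (requireNamespace("raster", quietly = TRUE)) {
  # Toy 2 x 2 template covering -10..10 degrees in both directions.
  template <- raster::raster(nrows = 2, ncols = 2,
                             xmn = -10, xmx = 10, ymn = -10, ymx = 10)

  # Cell values are filled row by row, starting at the top-left cell.
  bio1 <- raster::setValues(template, c(1, 2, 3, 4))
  bio12 <- raster::setValues(template, c(10, 20, 30, 40))
  layers <- raster::stack(bio1, bio12)
  names(layers) <- c("bio1", "bio12")

  # Invented occurrences; extract() expects x (longitude) then y (latitude).
  occ <- data.frame(Longitude = c(5, -5), Latitude = c(5, -5))
  climate <- raster::extract(layers,
                             as.matrix(occ[, c("Longitude", "Latitude")]))

  # One row per occurrence, one column per bioclimatic layer.
  final.dataset <- cbind(occ, climate)
  print(final.dataset)
}
```

Records that fall on cells without data return NA, which is one way to spot the occurrences that step L is meant to correct or remove.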
N. Visualize and export the final dataset as “6_Final_dataset.csv”. Save the final workspace as “4_Workspace_Final_Data.RData”.
Notes
Acknowledgments
This protocol was derived from the preprint Coca-de-la-Iglesia et al. (2021), currently under review in American Journal of Botany. We are also indebted to the participants in the Writing Workshop organized by the Biology and Ecology Departments of the Universidad Autónoma de Madrid for all the comments and discussions that have helped to realize this work, especially to I. Ramos for helping us to correct code errors. This study was supported by the Spanish Ministry of Economy, Industry and Competitiveness [CGL2017-87198-P] and the Spanish Ministry of Science and Innovation [PID2019-106840GA-C22]. M. Coca de la Iglesia was supported by the Youth Employment Initiative of the European Social Fund and Community of Madrid [PEJ-2017-AI-AMB-6636 and CAM_2020_PEJD-2019-11 PRE/AMB-15871].
Competing interests
We declare no competing interests.
References
Aiello-Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., Anderson, R. P., Bjornson, R., & Weston, S. (2019). spThin: Functions for Spatial Thinning of Species Occurrence Records for Use in Ecological Models (0.2.0) [Computer software]. Website https://CRAN.R-project.org/package=spThin
Arel-Bundock, V., Enevoldsen, N., & Yetman, C. (2018). countrycode: An R package to convert country names and country codes. Journal of Open Source Software, 3(28): 848. https://doi.org/10.21105/joss.00848
Arlé, E., Zizka, A., Keil, P., Winter, M., Essl, F., Knight, T., Weigelt, P., Jiménez‐Muñoz, M., & Meyer, C. (2021). bRacatus: A method to estimate the accuracy and biogeographical status of georeferenced biological data. Methods in Ecology and Evolution, 12(9): 1609–1619. https://doi.org/10.1111/2041-210X.13629
ALA. (2021). Atlas of Living Australia – Open access to Australia’s biodiversity data. Website https://www.ala.org.au/ [accessed 7 October 2021].
Bánki, O., Roskov, Y., Döring, M., Ower, G., Vandepitte, L., Hobern, D., Remsen, D., Schalk, P., DeWalt, R. E., Keping, M., Miller, J., Orrell, T., Aalbu, R., Adlard, R., Adriaenssens, E., Aedo, C., Aescht, E., Akkari, N., Alonso-Zarazaga, M. A., et al. (2022). Catalogue of Life Checklist (Version 2022-01-14). Catalogue of Life. Website https://doi.org/10.48580/d4tp [accessed January 2021].
Berkeley Ecoinformatics Engine. (2021). Website https://ecoengine.berkeley.edu/ [accessed 7 October 2021].
BISON. (2021). Biodiversity Information Serving Our Nation. Website https://bison.usgs.gov/ [accessed 7 October 2021].
Bivand, R., Keitt, T., Rowlingson, B., Pebesma, E., Sumner, M., Hijmans, R., Rouault, E., Warmerdam, F., Ooms, J., & Rundel, C. (2020). rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library (1.5-8) [Computer software]. Website https://CRAN.R-project.org/package=rgdal
Brummitt, R. K. (2001). World Geographical Scheme for Recording Plant Distributions. https://web.archive.org/web/20160125135239/http:/www.nhm.ac.uk/hosted_sites/tdwg/TDWG_geo2.pdf
Chamberlain, S., Oldoni, D., Barve, V., Desmet, P., Geffert, L., Mcglinn, D., & Ram, K. (2020a). rgbif: Interface to the Global ‘Biodiversity’ Information Facility API (2.3) [Computer software]. Website https://CRAN.R-project.org/package=rgbif
Chamberlain, S., Ram, K., Hart, T., & rOpenSci. (2020b). spocc: Interface to Species Occurrence Data Sources (1.0.8) [Computer software]. Website https://CRAN.R-project.org/package=spocc
Coca-de-la-Iglesia, M., Medina, N. G., Wen, J., & Valcárcel, V. (2021). Tropical-temperate dichotomy falls apart in the Asian Palmate Group of Araliaceae [Preprint]. bioRxiv. https://doi.org/10.1101/2021.10.20.465102
Dowle, M., Srinivasan, A., Gorecki, J., Chirico, M., Stetsenko, P., Short, T., Lianoglou, S., Antonyan, E., Bonsch, M., Parsonage, H., Ritchie, S., Ren, K., Tan, X., Saporta, R., Seiskari, O., Dong, X., Lang, M., Iwasaki, W., Wenchel, S., Vaughan, D. (2019). data.table: Extension of ‘data.frame’ (1.12.8) [Computer software]. Website https://CRAN.R-project.org/package=data.table
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12): 4302–4315. https://doi.org/10.1002/joc.5086
GARD. (2022). Global Assessment of Reptile Distributions. Website http://www.gardinitiative.org/data.html [accessed 10 February 2022]
GBIF.org. (2021). GBIF: The Global Biodiversity Information Facility. Website https://www.gbif.org/ [accessed 14 July 2021].
Golding, N., & Shearer, F. (2021). SeegSDM: Streamlined Functions for Species Distribution Modelling in the SEEG Research Group [HTML]. spatial ecology and epidemiology group. Website https://github.com/SEEG-Oxford/seegSDM (Original work published 2013)
Govaerts, R., Dransfield, J., Zona, S., Hodel, D. R., & Henderson, A. (2008). World Checklist of Selected Plant Families: Royal Botanic Gardens, Kew. Published on the Internet; http://wcsp.science.kew.org/ Retrieved. Website https://wcsp.science.kew.org/home.do
Grassle, F. (2000). The Ocean Biogeographic Information System (OBIS): An On-line, Worldwide Atlas for Accessing, Modeling and Mapping Marine Biological Data in a Multidimensional Geographic Context. Oceanography, 13(3): 5–7. https://doi.org/10.5670/oceanog.2000.01
Hassler, M. (2022a). Checklist of Ferns and Lycophytes of the World. Website http://www.catalogueoflife.org/annual-checklist/2018/details/database/id/140 [accessed 9 February 2022]
Hassler, M. (2022b). World Plants. Synonymic Checklist and Distribution of the World Flora (12.9) [Computer software]. Website www.worldplants.de
Hijmans, R. J., Etten, J. van, Sumner, M., Cheng, J., Bevan, A., Bivand, R., Busetto, L., Canty, M., Forrest, D., Ghosh, A., Golicher, D., Gray, J., Greenberg, J. A., Hiemstra, P., Hingee, K., Geosciences, I. for M. A., Karney, C., Mattiuzzi, M., Mosher, S., Wueest, R. (2020). raster: Geographic Data Analysis and Modeling (3.1-5) [Computer software]. Website https://CRAN.R-project.org/package=raster
Hortal, J., & Lobo, J. M. (2005). An ED-based Protocol for Optimal Sampling of Biodiversity. Biodiversity and Conservation, 14(12): 2913–2947. https://doi.org/10.1007/s10531-004-0224-z
Hortal, J., Lobo, J. M., & Jiménez-Valverde, A. (2007). Limitations of Biodiversity Databases: Case Study on Seed-Plant Diversity in Tenerife, Canary Islands. Conservation Biology, 21(3): 853–863. https://doi.org/10.1111/j.1523-1739.2007.00686.x
iDigBio. (2021). Integrated digitized biocollections. IDigBio. Website https://www.idigbio.org/home [accessed 7 October 2021]
iNaturalist. (2021). iNaturalist. Website https://www.inaturalist.org/ [accessed 15 July 2021]
IPNI. (2020). International Plant Names Index. Website https://www.ipni.org/ [accessed 11 September 2020]
Jin, J., & Yang, J. (2020). BDcleaner: A workflow for cleaning taxonomic and geographic errors in occurrence data archived in biodiversity databases. Global Ecology and Conservation, 21: e00852. https://doi.org/10.1016/j.gecco.2019.e00852
Lepage, D., Vaidya, G., & Guralnick, R. (2014). Avibase – a database system for managing and organizing taxonomic concepts. ZooKeys, 420: 117–135. https://doi.org/10.3897/zookeys.420.7089
Lima, R. A. F., Sánchez‐Tapia, A., Mortara, S. R., Steege, H., & Siqueira, M. F. (2021). plantR: An R package and workflow for managing species records from biological collections. Methods in Ecology and Evolution, 2041-210X.13779. https://doi.org/10.1111/2041-210X.13779
Maitner, B. (2020). BIEN: Tools for Accessing the Botanical Information and Ecology Network Database (1.2.4) [Computer software]. Website https://CRAN.R-project.org/package=BIEN
Maitner, B. S., Boyle, B., Casler, N., Condit, R., Donoghue, J., Durán, S. M., Guaderrama, D., Hinchliff, C. E., Jørgensen, P. M., Kraft, N. J. B., McGill, B., Merow, C., Morueta‐Holme, N., Peet, R. K., Sandel, B., Schildhauer, M., Smith, S. A., Svenning, J.-C., Thiers, B., … Enquist, B. J. (2018). The bien r package: A tool to access the Botanical Information and Ecology Network (BIEN) database. Methods in Ecology and Evolution, 9(2): 373–379. https://doi.org/10.1111/2041-210X.12861
Mammal Diversity Database. (2020). Mammal Diversity Database (1.2) [Data set]. Zenodo. https://doi.org/10.5281/ZENODO.4139818
R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Website https://www.R-project.org/.
Robertson, M. P., Visser, V., & Hui, C. (2016). Biogeo: An R package for assessing and improving data quality of occurrence record datasets. Ecography, 39(4): 394–401. https://doi.org/10.1111/ecog.02118
Soberón, J., & Peterson, T. (2004). Biodiversity informatics: Managing and applying primary biodiversity data. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 359(1444): 689–698. https://doi.org/10.1098/rstb.2003.1439
Sullivan, B. L., Wood, C. L., Iliff, M. J., Bonney, R. E., Fink, D., & Kelling, S. (2009). eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation, 142(10): 2282–2292. https://doi.org/10.1016/j.biocon.2009.05.006
Töpel, M., Zizka, A., Calió, M. F., Scharn, R., Silvestro, D., & Antonelli, A. (2016). SpeciesGeoCoder: Fast Categorization of Species Occurrences for Analyses of Biodiversity, Biogeography, Ecology, and Evolution. Systematic Biology, syw064. https://doi.org/10.1093/sysbio/syw064
Uetz, P., Freed, P., Aguilar, R., & Hošek, J. (2021). The Reptile Database. Website http://www.reptile-database.org
USDA, NRCS. (2022). The PLANTS Database. National Plant Data Team, Greensboro, NC USA. Website http://plants.usda.gov [accessed 2 October 2022]
VertNet. (2021). VertNet. Website http://vertnet.org/ [accessed 7 October 2021]
Wickham, H. (2020). plyr: Tools for Splitting, Applying and Combining Data (1.8.6) [Computer software]. Website https://CRAN.R-project.org/package=plyr
Wickham, H., François, R., Henry, L., Müller, K., & RStudio. (2020). dplyr: A Grammar of Data Manipulation (1.0.0) [Computer software]. Website https://CRAN.R-project.org/package=dplyr
Wickham, H., Girlich, M., & RStudio. (2022). tidyr: Tidy Messy Data (1.2.0) [Computer software]. Website https://CRAN.R-project.org/package=tidyr
Wickham, H., Hester, J., Chang, W., & RStudio. (2021). devtools: Tools to Make Developing R Packages Easier (2.4.2) [Computer software]. Website https://CRAN.R-project.org/package=devtools
Wickham, H., Hester, J., Francois, R., R Core Team, Jylänki, J., & Jørgensen, M. (2018). readr: Read Rectangular Text Data (1.3.1) [Computer software]. Website https://CRAN.R-project.org/package=readr
Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter, C., Edler, D., Farooq, H., Herdean, A., Ariza, M., Scharn, R., Svantesson, S., Wengström, N., Zizka, V., & Antonelli, A. (2019). CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution, 10(5): 744–751. https://doi.org/10.1111/2041-210X.13152