b. Subdirección Científica, Jardín Botánico de Bogotá "José Celestino Mutis", Bogotá, D.C., Colombia;
c. Independent Researcher, Bogotá, D.C., Colombia;
d. Tropical Diversity Section, Royal Botanic Garden Edinburgh, UK;
e. School of Biological, Earth and Environmental Sciences, University College Cork, Cork, Ireland;
f. Departamento de Química y Biología, Universidad del Norte, Km. 5 Vía Puerto Colombia, Área Metropolitana de Barranquilla 081007, Colombia
Identifying spatial patterns of biodiversity distribution is fundamental to understanding how they were established. They are also essential for designing effective conservation and management strategies. However, it is well known that biodiversity information is incomplete or biased, limiting the generalization of results and the predictive power of models (Feeley and Silman, 2011a; García Márquez et al., 2012; Sousa-Baena et al., 2013; Vargas et al., 2022).
Although researchers are generally aware of the deficiencies of biological data, and different strategies have been developed to reduce their impact (Elith et al., 2006; Syfert et al., 2013; Engemann et al., 2015), many agree on the need to improve data availability (Graham et al., 2004; Feeley and Silman, 2010; Feeley, 2015; Ball-Damerow et al., 2019) and to generate more data through field explorations (Hopkins, 2007; Feeley and Silman, 2011a, 2011b). Museums and biological collections centralise a significant amount of biodiversity data that is currently available through global public online repositories (e.g., GBIF (https://www.gbif.org/), BIEN (https://bien.nceas.ucsb.edu/bien/)). These online data are used for purposes that include the study of evolutionary processes, the causes and limits of species distributions, the response of species to climate change, and the design of protected areas (Soberón and Peterson, 2004; Bebber et al., 2010; Feeley and Silman, 2010; Gaira et al., 2011). This availability has been made possible by the mass digitalisation of biological collections into standard formats that can be handled by different data analysis programs. Digitalisation opens up centuries of data and makes it possible to study biodiversity dynamics, evaluate changes through time, and model future scenarios for management, use and conservation (Feeley, 2012; Morueta-Holme et al., 2015; Nualart et al., 2017). However, around 40%–80% of biological records available online or in databases are discarded because of taxonomic (e.g., poor determination, undetermined material and nomenclatural issues), geographical (e.g., poor georeference precision or samples assigned to a centroid), or temporal (i.e., undated) data deficiencies (Gueta and Carmel, 2016; Meyer et al., 2016; Daru et al., 2018). These data can be rescued through curatorial work (Feeley and Silman, 2011b), although this is a time-consuming effort.
Additionally, not all museum collections are digitised, particularly small and local collections, and much work remains to be done in curating these collections to a high taxonomic standard.
Fieldwork is an alternative way to increase the amount and the accuracy of data. However, it requires the investment of financial resources by institutions whose role is to describe biodiversity (e.g., governmental agencies, botanic gardens). Given the limited financial resources for research, it is therefore necessary to analyze where funds could best be invested (O'Connell et al., 2004; Franco et al., 2007; Gardner et al., 2008; Targetti et al., 2014) to obtain the best return.
Biological collections contain large amounts of data that could be retrieved through curatorial work. Curatorial work has the potential to increase known species richness through the discovery of new species (Goodwin et al., 2015), to expand species distributions and the environmental envelope of species (Feeley and Silman, 2011b), to support the categorization of threatened species (Nualart et al., 2017), and/or to fill geographical gaps (Daru et al., 2018). At the regional scale, curatorial work can increase the geographical coverage of collections by filling gaps in the collection information. However, it is not well known how the impact of curatorial work on biodiversity knowledge compares with that of fieldwork.
In this paper, we explore how curatorial work of herbarium collections increases biodiversity data quality and quantity in contrast to fieldwork. Using the Flora de Bogotá project as a model, we analyze the change in species richness, spatial coverage, and sample coverage of plant records in Bogotá (capital of Colombia). We evaluate the impact of both curation and fieldwork on increasing the taxonomic and geographical robustness of biodiversity information and highlight their unique contributions.
2. Materials and methods
2.1. Study area
Bogotá, the capital of Colombia and the most populated city in the country (ca. 7.5 million people over ca. 1630 km2; http://www.sdp.gov.co/, accessed on 9th Sep 2022), is located in the Colombian Cordillera Oriental between 2510 and 3780 m elevation (Fig. 1). The climate is typical of tropical mountains, with daily temperatures varying between 6 and 22 ℃ and low annual seasonality (Secretaría Distrital de Ambiente, 2007). The rainfall regime is bimodal, with two peaks occurring in April–May and October–November. Bogotá is situated on two physiographic units: a flat area north of the city's urban area, where an enormous lake disappeared 30,000 years ago (Van der Hammen, 1986), and a mountainous area surrounding the metropolitan area to the east and south, which includes the most extensive area of páramo vegetation on Earth (Sumapaz Páramo). Seventy-five per cent of Bogotá's territory is rural, while the remaining 25% is occupied by the urban area, where 80% of the population lives. Urban ecosystems are represented mainly by metropolitan parks and the wetland system associated with the Bogotá River. Meanwhile, natural ecosystems are concentrated in the city's rural areas, where the páramo ecosystem predominates and relicts of Andean and high Andean forests are also found.
2.2. Data
The Flora de Bogotá database is an initiative of the Jardín Botánico de Bogotá to study the plant diversity of the city. The database was established in 2013 to gather information on Bogotá's plant records deposited in herbaria and contains 37,468 plant records obtained from local and worldwide herbaria. Additionally, 5401 new plant records from fieldwork carried out by the Jardín Botánico de Bogotá between 2011 and 2016 are included in the database. The total database consists of 42,869 plant records (Table 1).
Source | Number of records |
Herbario Nacional Colombiano (COL) | 12,869 |
Bibliographic review | 10, 966 |
Herbario Jardín Botánico de Bogotá (JBB) | 8671 |
Herbario Forestal Gilberto Emilio Mahecha (UDBC) | 1533 |
Missouri Botanical Garden (MO) | 1186 |
Pontificia Universidad Javeriana (HPUJ) | 1077 |
Instituto de investigación Alexander von Humboldt (FMB) | 544 |
Smithsonian Institution (US) | 452 |
New York Botanical Garden (NY) | 170 |
Jardín Botánico de Bogotá Fieldwork | 5401 |
Total | 42,869 |
For this study, duplicate records (i.e., records sharing the same collection number) were removed, keeping only one of each in the database, and plant records with coordinates outside Bogotá were excluded. This reduced the dataset to 21,926 plant records that, after curatorial work and fieldwork, represent 2384 species, 903 genera and 187 families. All specimens used in this research are available in open databases (see: Vargas et al., 2022; https://doi.org/10.1093/botlinnean/boac035). The Herbario Nacional Colombiano (COL, http://www.biovirtual.unal.edu.co/es/colecciones/search/plants/) and the Jardín Botánico de Bogotá (JBB, https://herbario.jbb.gov.co/especimen/simple) provide high-resolution images for many of the specimens that support this manuscript. The Flora de Bogotá checklist is online (https://florabog.jbb.gov.co/), and its names are supported by specimens that users can check. The flora of Bogotá is also continuously revised and improved. Therefore, the two main datasets used in this manuscript are digitized and available for checking by specialists.
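The two filtering steps described above (dropping duplicate collection numbers, then excluding records outside Bogotá) can be sketched in Python. This is a minimal illustration, not the project's actual pipeline: the field names `collection_number`, `lon` and `lat` are hypothetical, the city boundary is stood in for by a simple rectangle of (lon, lat) vertices, and the point-in-polygon check is the standard ray-casting test rather than a GIS library.

```python
from typing import Iterable, Optional

Polygon = list[tuple[float, float]]  # boundary vertices as (lon, lat)


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray-casting test: toggle `inside` each time a ray cast east crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def deduplicate_and_clip(records: Iterable[dict], boundary: Polygon) -> list[dict]:
    """Keep the first record per collection number, then drop records outside `boundary`."""
    seen: set[str] = set()
    kept: list[dict] = []
    for rec in records:
        key = rec["collection_number"]  # hypothetical field name
        if key in seen:
            continue  # duplicate specimen: only the first occurrence is kept
        seen.add(key)
        lon: Optional[float] = rec.get("lon")
        lat: Optional[float] = rec.get("lat")
        if lon is None or lat is None:
            continue  # no coordinates: such records would need georeferencing first
        if point_in_polygon(lon, lat, boundary):
            kept.append(rec)
    return kept


# Hypothetical rectangle standing in for the real city boundary.
BOX: Polygon = [(-74.3, 4.2), (-73.9, 4.2), (-73.9, 4.8), (-74.3, 4.8)]
records = [
    {"collection_number": "A 1", "lon": -74.1, "lat": 4.6},
    {"collection_number": "A 1", "lon": -74.1, "lat": 4.6},  # duplicate
    {"collection_number": "B 7", "lon": -75.5, "lat": 6.2},  # outside the box
    {"collection_number": "C 3", "lon": None, "lat": None},  # not georeferenced
]
print([r["collection_number"] for r in deduplicate_and_clip(records, BOX)])  # ['A 1']
```

Deduplicating before clipping mirrors the order in which the text describes the two steps; in a real workflow the boundary would come from the official city limits rather than a rectangle.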
2.3. Data treatment
To study the effect of curatorial work vs. fieldwork on the biodiversity patterns of Bogotá, we analyzed the change in species richness, spatial coverage of plant records and completeness at four different stages of data treatment during the first phase of the Flora de Bogotá project (2012–2016):
(1). Raw dataset: Plant records of the Flora of Bogotá database obtained from herbaria without nomenclatural and coordinate corrections.
(2). Curated dataset: Plant records in the Flora database obtained from herbaria whose nomenclatural and coordinate metadata were corrected. The taxonomic work consisted of revising herbarium specimens, correcting obvious orthographic and spelling errors in the names assigned to plant records, and screening for synonymy. Taxon names were standardized using the Catálogo de plantas y líquenes de Colombia (Bernal et al., 2016). Geographical curation was carried out on every plant record by correcting and standardizing coordinates. Plant records without coordinates were georeferenced using the locality information from the specimen label, following the point-radius method (Wieczorek et al., 2004). Specimens with insufficient locality data were excluded from the analysis; these included specimens collected before 1900, mainly by José Jerónimo Triana, who recorded specimens from "Bogotá province", a region wider than the current area of Bogotá city.
(3). Fieldwork dataset: Plant records from fieldwork conducted between 2012 and 2016 by the Jardín Botánico in different parts of Bogotá to characterize areas without data. These areas were characterized by collecting plants in reproductive condition. The plant records were deposited in the Jardín Botánico de Bogotá herbarium and identified to species level by the botanical team. Their geographical and taxonomic data were checked and any mistakes corrected.
(4). Total (Curated–fieldwork): Curated dataset in addition to plant records obtained from fieldwork done between 2012 and 2016 by Jardín Botánico in Bogotá city.
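The nomenclatural part of the curation in item (2) can be sketched as three steps: normalize capitalization and spacing, map known synonyms to accepted names, and catch remaining misspellings by fuzzy matching against an accepted list. This sketch rests on loud assumptions: the short `ACCEPTED` list and `SYNONYMS` table are hypothetical stand-ins for the Catálogo de plantas y líquenes de Colombia, only the family names mentioned in Section 4.1 are used, and the 0.85 cutoff passed to Python's standard `difflib.get_close_matches` is an arbitrary choice.

```python
import difflib
from typing import Optional

# Hypothetical stand-ins for an authoritative checklist.
ACCEPTED = ["Arecaceae", "Asteraceae", "Ericaceae", "Poaceae"]
SYNONYMS = {"Compositae": "Asteraceae", "Gramineae": "Poaceae", "Palmae": "Arecaceae"}


def normalize(name: str) -> str:
    """Collapse whitespace; capitalize the first word and lowercase the rest."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def standardize(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the accepted name for `name`, or None if no confident match exists."""
    clean = normalize(name)
    if clean in SYNONYMS:
        return SYNONYMS[clean]  # known synonym
    if clean in ACCEPTED:
        return clean  # already accepted
    # Possible misspelling: closest accepted name above the similarity cutoff.
    match = difflib.get_close_matches(clean, ACCEPTED, n=1, cutoff=cutoff)
    return match[0] if match else None


raw_names = ["Compositae", "Asteraceae", "asteracea", "GRAMINEAE", "Poaceae"]
curated = {standardize(n) for n in raw_names}
print(len(set(raw_names)), "->", len(curated))  # 5 -> 2
```

The toy run shows why curation lowers the name count: five distinct raw strings collapse to two accepted families. Fuzzy matches are suggestions only and would still need expert review.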
2.4. Data analysis
We used the taxonomic and geographical changes in plant records to evaluate the differences in species richness, spatial coverage and sample coverage of the Flora of Bogotá achieved through curatorial work compared with fieldwork. To analyze the effect of curatorial work (on the database and on herbarium specimens) and of fieldwork on the quality of the Flora of Bogotá data, we evaluated the changes between 2012 and 2016. We conducted the taxonomic and spatial analyses for each dataset and compared datasets to evaluate the improvement of data through curatorial work and fieldwork.
The analysis was conducted at two levels.
2.4.1. Taxonomic
To understand the change in taxonomic quality through curatorial work and fieldwork, we calculated the number of taxon names and the number of plant records per species at every data stage. We compared stages using non-parametric tests (Kruskal–Wallis and Wilcoxon tests) in R 3.6.1 (R Development Core Team, 2019).
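To make the stage comparison concrete, the Kruskal–Wallis H test on records per species can be sketched in plain Python. The original tests were run in R; this standard-library re-implementation is only an illustration of the test (average ranks for ties, the usual tie correction, and a chi-square approximation with k − 1 degrees of freedom for the p-value). The pairwise Wilcoxon follow-up tests are not reproduced, and the toy input is hypothetical, not the Flora de Bogotá data.

```python
import math
from collections import Counter


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # closed form for even df
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: complementary error function plus a finite series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def kruskal_wallis(*groups: list[float]) -> tuple[float, float]:
    """Return (H, p) for k independent samples, with tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical records-per-species counts at three data stages.
raw, curated, total = [1, 2, 3], [4, 5, 6], [7, 8, 9]
h, p = kruskal_wallis(raw, curated, total)
print(f"H = {h:.2f}, p = {p:.4f}")  # H = 7.20, p = 0.0273
```

Records-per-species counts are heavily skewed and tie-rich, which is why a rank-based test with tie correction suits this comparison better than an ANOVA on raw counts.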
2.4.2. Spatial and sample coverage
To understand the geographical contribution of curatorial work and fieldwork, we created a grid of 1 km × 1 km cells over the city. We analyzed the change in spatial coverage (number of grid cells with plant records), record density (number of plant records per grid cell), observed richness (number of species observed per grid cell) and sample coverage in the raw, curated and total datasets. The same analysis was conducted at the ecosystem level, using the Colombian ecosystem map (Etter, 1998) to delimit Bogotá's ecosystems.
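The per-cell summaries described above can be sketched as follows: each record is assigned to a 1 km × 1 km cell, and record density and observed richness are tallied per cell, so that the number of occupied cells is what the spatial coverage analysis counts. We assume, hypothetically, that coordinates have already been projected to a metric coordinate system (x, y in metres) whose origin aligns with the grid. The original analysis used GIS software, not this code.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 1000.0  # 1 km x 1 km cells


def cell_of(x: float, y: float, size: float = CELL_SIZE_M) -> tuple[int, int]:
    """Index of the grid cell containing projected point (x, y)."""
    return math.floor(x / size), math.floor(y / size)


def summarize_grid(records, size: float = CELL_SIZE_M) -> dict:
    """Map each occupied cell to (record density, observed richness)."""
    counts: dict = defaultdict(int)
    species: dict = defaultdict(set)
    for x, y, name in records:
        cell = cell_of(x, y, size)
        counts[cell] += 1
        species[cell].add(name)
    return {cell: (counts[cell], len(species[cell])) for cell in counts}


# Hypothetical projected records: (x in m, y in m, species name).
records = [
    (100, 200, "Vaccinium floribundum"),
    (900, 50, "Gaultheria anastomosans"),
    (950, 10, "Vaccinium floribundum"),
    (1500, 200, "Vaccinium floribundum"),
]
summary = summarize_grid(records)
print(summary)  # {(0, 0): (3, 2), (1, 0): (1, 1)}
print("occupied cells:", len(summary))  # 2
```

Using `math.floor` rather than `int` keeps cell indices correct for negative projected coordinates.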
We conducted a spatial coverage analysis to assess the representativeness of plant records in Bogotá, calculating spatial coverage as the proportion of 1 km × 1 km grid cells with plant records over the total number of grid cells in Bogotá (1842) at the four stages of the data:
$$\text{Spatial coverage} = \frac{N_r}{N}$$
where $N$ is the total number of grid cells in Bogotá and $N_r$ is the number of grid cells with plant records at each stage of data treatment.
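As a worked check of the spatial coverage calculation, using the occupied-cell counts reported in the Results (364 cells in the raw dataset and 753 in the total dataset, out of 1842):

$$\frac{364}{1842} \approx 0.198 \;(19.8\%), \qquad \frac{753}{1842} \approx 0.409 \;(40.9\%)$$

The second value is reported as 41% in the Results after rounding.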
We also analyzed the effect of curatorial work and fieldwork on record density per km2, as a first step towards describing collection patterns across the territory (Soberón et al., 2007), and on observed richness and sample coverage in every grid cell. For sample coverage, rarefaction was calculated for cells with more than 20 plant records. Sample coverage is a measure of sample completeness: the proportion of the total number of individuals in a community that belong to the species represented in the sample (Chao and Jost, 2012). It is defined as the total relative abundance of the observed species in the sample and ranges from 0 to 1. Sample completeness was estimated using the iNEXT R package (Hsieh et al., 2016).
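The coverage estimate for a single grid cell can be sketched in Python. Treating each plant record as one individual (an assumption), the code computes the Chao and Jost (2012) estimator of sample coverage for the observed sample from the number of singletons (f1) and doubletons (f2). To our understanding this is the reference-sample coverage that iNEXT reports, but this standard-library code is an illustration, not the package, and it does not reproduce iNEXT's rarefaction and extrapolation curves. The more-than-20-records threshold follows the text; cell identifiers and data are hypothetical.

```python
from collections import Counter

MIN_RECORDS = 20  # only cells with more than 20 plant records are analysed


def sample_coverage(species_records: list[str]) -> float:
    """Chao & Jost (2012) coverage estimate; each record counts as one individual."""
    abundances = Counter(species_records).values()
    n = sum(abundances)
    f1 = sum(1 for a in abundances if a == 1)  # singletons
    f2 = sum(1 for a in abundances if a == 2)  # doubletons
    if f1 == 0:
        return 1.0  # no singletons: every species has been seen at least twice
    if f2 > 0:
        weight = (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
    else:
        weight = (n - 1) * (f1 - 1) / ((n - 1) * (f1 - 1) + 2)
    return 1 - f1 / n * weight


def coverage_by_cell(cells: dict[str, list[str]]) -> dict[str, float]:
    """Coverage for each cell that passes the sample-size threshold."""
    return {
        cell: sample_coverage(recs)
        for cell, recs in cells.items()
        if len(recs) > MIN_RECORDS
    }


# Hypothetical cell with 12 records: abundances 1, 1, 2, 3, 5.
cell = ["a", "b", "c", "c", "d", "d", "d"] + ["e"] * 5
print(round(sample_coverage(cell), 3))  # 0.847
```

The estimator is driven by singletons: a cell where many species are known from a single record has low coverage, which is why sparsely sampled cells score low even after curation adds records.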
Finally, we tested for differences in richness and sample coverage between the raw, curated and total datasets using non-parametric tests (Kruskal–Wallis and Wilcoxon tests) in R 3.6.1 (R Development Core Team, 2019) and illustrated the results in maps created in QGIS (QGIS Development Team, 2015).
3. Results
3.1. Taxonomic changes following cleaning and fieldwork
Taxonomic cleaning decreased the number of taxon names, while fieldwork added new ones to the Flora de Bogotá database. As a result, the number of taxa decreased by 24% at the family and species levels and by 7% at the genus level (Table 2). On the other hand, fieldwork added 83 new names at the species level (3.5% of total species diversity), most of which were herbs and epiphytes (Table S1).
Rank | Raw | Curated | Fieldwork | Total |
Family | 225 | 187 | 110 | 187 |
Genera | 967 | 904 | 312 | 904 |
Species | 2878 | 2301 | 749 | 2384 |
The curatorial process and fieldwork significantly increased the number of records per species (p < 0.05) (Fig. S1): the probability of a species having fewer than five plant records decreased from 0.35 in the raw dataset to 0.21 in the curated dataset and 0.19 in the total dataset (curated–fieldwork). Notably, the number of records increased for 744 species through fieldwork, for 690 species through the curatorial process, for 1157 species through either of the two approaches (curatorial work or fieldwork), and for 277 species through the combined effect of both. Vaccinium floribundum (Ericaceae) had the maximum number of plant records (143), and species with more than 100 plant records represented 0.6% (15 species) of the species recorded for Bogotá (Table S2).
3.2. Spatial representation
The spatial distribution of species showed significant differences between datasets (p < 0.05) (Fig. S2): the probability of a species occurring in a single cell decreased from the raw to the curated and total (curated–fieldwork) datasets. Fieldwork added grid cells to 28% of species, while the curatorial process added grid cells to 26% of species. The combination of cleaning and fieldwork added grid cells to 46% of the species reported in the Flora database (Table S3). Gaultheria anastomosans (Ericaceae) was the most widely distributed species, recorded in 78 (4.2%) grid cells in Bogotá.
3.3. Spatial and ecosystem changes through curatorial work and fieldwork
3.3.1. Grid cells and record density
Curatorial work on the georeferences of plant records, together with fieldwork, increased the number of plant records with coordinates in the Flora de Bogotá database. The number of plant records with coordinates increased by 77% from the raw to the total dataset, but the main contribution came from curatorial work, which added coordinates to 59% of records, while fieldwork added only 18% (Fig. 2). Additionally, curatorial work and fieldwork increased the number of cells with plant records from 364 in the raw dataset (19.8% of total grid cells) to 753 (41%) in the total dataset (curated–fieldwork). However, fieldwork added only three new grid cells (0.1%) (Fig. 3).
On the other hand, cells with low record density predominated at all three stages of the data, although georeferencing and fieldwork slightly increased density. The number of grid cells with very low (1–10 plant records per grid cell), low (11–100) and medium (101–500) density increased by 11, 8 and 2%, respectively, from the raw dataset to the total dataset. The number of grid cells with high (501–1000 plant records) and very high (1001–1500 plant records) density also increased, although only by 0.1 and 0.2%, respectively (Fig. 4).
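The density classes above can be written as a small lookup, which makes the class boundaries explicit. This sketch assumes non-overlapping, inclusive bounds with the medium class starting at 101 records, and returns None for empty cells or counts above 1500; the per-cell counts in the example are hypothetical, not the study's data.

```python
from collections import Counter
from typing import Optional

# (lower bound, upper bound, label), both bounds inclusive.
DENSITY_CLASSES = [
    (1, 10, "very low"),
    (11, 100, "low"),
    (101, 500, "medium"),
    (501, 1000, "high"),
    (1001, 1500, "very high"),
]


def density_class(n_records: int) -> Optional[str]:
    """Label for a grid cell holding `n_records` plant records."""
    for low, high, label in DENSITY_CLASSES:
        if low <= n_records <= high:
            return label
    return None  # empty cell, or above the highest class


# Hypothetical record counts for five grid cells.
cell_counts = [3, 12, 100, 101, 1200]
print(Counter(density_class(n) for n in cell_counts))
```

Tallying class labels per data stage in this way gives the kind of comparison summarized in Fig. 4.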
At the ecosystem level, plant records increased across Bogotá as a result of curatorial work and fieldwork, and ecosystems that were not represented in the raw dataset became represented by plant records. Ecosystem representation was higher after curatorial work (ecosystem mean 57.1%) than through fieldwork (mean 10%) or in the raw data (mean 32.9%). For instance, record density increased in all ecosystems through curation, and a lake ecosystem (La Regadera lake) appeared in the floristic record. In contrast, fieldwork increased record density in just five ecosystems, and only in one of them (the western dry páramo of the city) did it contribute most of the plant records (73%) (Table 3).
Ecosystem | Ecosystem code | Area (km2) | Raw (%) | Clean (%) | Fieldwork (%) | Total (%) |
Rural areas transformed by human activities | II | 266.20 | 1439 (20.4) | 2987 (42.4) | 2614 (37.1) | 7040 (100) |
Humid páramos | 19 | 582.54 | 1034 (22.4) | 3351 (72.7) | 222 (4.8) | 4607 (100) |
Dry Andean forest | 18b | 42.96 | 748 (21.1) | 2795 (78.9) | 0 (0) | 3543 (100) |
Urban Area | U | 316.03 | 423 (17.9) | 1912 (81.0) | 26 (1.1) | 2361 (100) |
Mix agrosystems | C3 | 133.22 | 766 (39.1) | 538 (27.5) | 655 (33.4) | 1959 (100) |
Dry páramos | 20 | 40.31 | 248 (24.8) | 750 (75.2) | 0 (0) | 998 (100) |
Dry Andean forest | 18b | 16.81 | 87 (16.5) | 382 (72.5) | 58 (11.0) | 527 (100) |
Dry páramos | 20 | 15.37 | 102 (24.9) | 9 (2.2) | 298 (72.9) | 409 (100) |
Milky agrosystems | C4 | 62.37 | 47 (28.8) | 116 (71.2) | 0 (0) | 163 (100) |
Humid Andean forest | 18a | 29.66 | 14 (12.5) | 98 (87.5) | 0 (0) | 112 (100) |
Oak Andean forest | 18c | 31.90 | 37 (41.1) | 53 (58.9) | 0 (0) | 90 (100) |
Mix agrosystems | C3 | 10.93 | 54 (98.2) | 1 (1.8) | 0 (0) | 55 (100) |
Humid Andean forest | 18a | 2.49 | 20 (90.9) | 2 (9.1) | 0 (0) | 22 (100) |
Humid Andean forest | 18a | 36.45 | 3 (17.6) | 14 (82.4) | 0 (0) | 17 (100) |
Lake | La | 1.86 | 6 (50) | 6 (50) | 0 (0) | 12 (100) |
Lake | La | 1.86 | 0 (0) | 11 (100) | 0 (0) | 11 (100) |
Humid Andean forest | 18a | 1.53 | 0 (0) | 0 (0) | 0 (0) | 0 (0) |
Overall, 80% of the Flora of Bogotá species are found in just four ecosystems. Two of these (humid páramo and dry high Andean forest) are natural ecosystems, while the other two are ecosystems with human intervention around the urban area. The best-conserved ecosystems (e.g., cloud and high Andean humid forests; wet páramos) are located in the rural area, some distance from the urban area of Bogotá.
3.4. Richness and completeness changes through curatorial work and fieldwork
3.4.1. Richness
Curatorial processes and fieldwork increased the median number of species observed in grid cells; however, there were no significant differences between datasets. The median observed richness was four in the raw dataset, and five and six in the curated and total (curated–fieldwork) datasets, respectively. Three quarters of grid cells recorded at most 11 species in the raw dataset, 15 in the curated dataset and 18 in the total dataset. On the other hand, very few grid cells showed a high number of observed species. For example, 4% of grid cells contained more than 50 species in the raw stage, whereas 7% of grid cells in the curated dataset and 9% in the total dataset recorded more than 50 observed species (Fig. S3).
3.4.2. Sample coverage
This analysis discarded many grid cells with plant records because of their low sample size, even though the curatorial process and fieldwork added new ones. For example, while 19.8% of grid cells had plant records in the raw dataset, only 2.8% reached the threshold for the sample coverage analysis (more than 20 plant records per cell). In the curated dataset, 40.9% of grid cells had plant records, but only 8.8% were valid for this analysis. The total dataset had 41% of grid cells with plant records, but only 10.3% were valid for sample coverage. Sample coverage values did not differ significantly between data stages (p > 0.05). Median values across grid cells were 0.22 in the raw dataset, 0.21 in the curated dataset and 0.24 in the total dataset. Three quarters of grid cells showed sample coverage values below 0.5; the maximum was 0.6 in the raw dataset and 0.91 in both the curated and total datasets (Fig. 5). Significant differences were, however, found in the grid cells where fieldwork was done (p < 0.05). During the study, only ten grid cells had fieldwork, and those grid cells showed a significant increase in sample coverage due to fieldwork. In these cells, sample coverage did not differ between the raw and curated datasets (0.25 each), but in the total dataset (curated–fieldwork) it increased to 0.68, showing the input of fieldwork.
At the ecosystem level, sample coverage mainly increased in the curated dataset, with a slight increase in the fieldwork dataset. Only in one ecosystem (C3) was the increase in sample coverage from fieldwork higher than that in the curated dataset. Although the overall sample coverage of ecosystems improved more in the curated dataset than through fieldwork, the differences between datasets were not significant (p > 0.05) (Fig. S4).
4. Discussion
In this study, we analyzed the change in the magnitude of floristic knowledge and information quality for Bogotá city through both curatorial work (nomenclatural cleaning and georeferencing of plant records) and fieldwork between 2012 and 2016. These two activities were done simultaneously in the first phase of the Flora de Bogotá project. We also evaluated their impact on the taxonomic, geographical, richness and sample coverage aspects of biodiversity information. We found that the largest change in Bogotá's floristic data was due to curatorial processes rather than fieldwork. Taxonomic curatorial work decreased the recorded alpha diversity by resolving synonyms and orthographical mistakes in species names. At the same time, geographical curatorial work increased spatial coverage by increasing the number of georeferenced plant records.
4.1. Taxonomic changes resulting from curatorial work and fieldwork
Our work significantly improved the taxonomic quality of the Flora of Bogotá database, decreasing the number of species names by 24% through the removal of synonyms and orthographical errors. This loss of names is not surprising, given that the data came from sources with different curatorial levels, a problem also evident in open databases such as GBIF, where much inaccurate and incorrect information is published (Maldonado et al., 2015). The Flora of Bogotá database contained names from different classification systems that refer to the same taxon (e.g., Compositae = Asteraceae, Palmae = Arecaceae, and Gramineae = Poaceae), which, together with orthographical mistakes, had inflated the alpha diversity of Bogotá. Although checking for nomenclatural issues is the first step in obtaining an accurate list of species for a region, this step is far from trivial, and local and regional species inventories are full of nomenclatural mistakes that artificially increase diversity. For example, Cardoso et al. (2017) indicate that almost 7% of the species reported for the Amazon basin were mistakes, because individual species were listed more than once as synonyms and spelling variants. The problem is worsened because much of the information reported in open databases is not reviewed by experts (Goodwin et al., 2015) and does not use up-to-date nomenclature.
On the other hand, the contribution of fieldwork to Bogotá's plant species richness was small (only 3.5% of the species on the Bogotá list were new), compared with the data obtained from collection databases. Several factors contribute to the low rate of new species records obtained by fieldwork. For instance, all the analyzed fieldwork collections were biased towards grid cells that had already been intensively sampled. Other factors that may explain the low rate of detection of new species in the Bogotá area include collector expertise (Ahrends et al., 2011), the sensitivity of sampling methods, which may exclude some groups of plants, collector preferences (e.g., for angiosperms over ferns or non-vascular plants) (Daru et al., 2018), the detectability of plants (e.g., depending on phenology or life form) (Chen et al., 2009), species density (McCarthy et al., 2013), and sampling bias.
This study found that fieldwork had limited reach compared to the data from collections, which resulted from the combined efforts of multiple explorers and researchers over a long period of time. The data registered in collections reflects contributions from many collectors and the exploration of diverse locations. As a result, improving the data associated with existing herbarium specimens through curatorial work may be more effective in producing a more accurate representation of the flora. This can be achieved by adding new collection sites and enhancing the representation of species' climatic niches (Feeley and Silman, 2011a), as well as expanding geographical coverage. Alternatively, the curated information could assist in planning efficient fieldwork strategies for areas and plant groups with limited data.
4.2. Geographical changes resulting from curatorial work and fieldwork
4.2.1. Spatial and environmental changes
Our study showed an important increase in the spatial coverage of the Flora through curatorial work. The geographical dimension of biological records has been an important issue, since most biological records, especially old ones (e.g., those collected before 1990, when GPS was not widely used) (Feeley and Silman, 2010), are deposited in collections without coordinates. As a result, many records that could represent new areas and environmental combinations are discarded from ecological analyses. Curatorial work allowed the recovery of important floristic information, such as ecosystems in Bogotá not previously represented, which were revealed through georeferencing (e.g., La Regadera lake). Georeferencing (curated dataset) also revealed parts of the environmental and climatic spectrum of the Flora of Bogotá that were not represented in the non-curated raw data. In contrast, fieldwork was carried out in ecosystems and grid cells that had already been sampled, resulting in few new samples. This finding suggests that there were flaws in the sampling design at the regional scale, as supported by previous information.
4.2.2. Taxonomic perspective changes
Recovering data from collections could improve species niche information by adding environmental data not previously recorded, which would help to improve species distribution models (Feeley and Silman, 2011a). As expected, our work improved species distribution data. However, we found that the species whose number of distinct localities (records in new grid cells) increased through curatorial work and fieldwork were the most common ones. In contrast, for many species, particularly rare ones, no new localities were added after cleaning and fieldwork. Rare species may be under-represented in collections because of their low detectability or collection preferences. However, curatorial work could help identify rare species, and this information could be used to focus targeted fieldwork and add new environmental information.
4.2.3. Richness perspective
Although the number of species names in the database decreased as a result of curatorial work, these corrections improved the taxonomic quality, resulting in a more reliable species list for the city. Taxonomic corrections increased the number of records for some species, extending their known geographic distribution. Fieldwork added new species to the Flora checklist, but the increase was small (83 species, i.e., 3.5% of the total species diversity). The low rate of new species recorded by fieldwork could be due to an already exhaustive sampling of Bogotá. However, our analysis showed low sampling rates in the grid cells with plant records and a high proportion of grid cells without records (59% of Bogotá's grid cells). Spatial sampling bias, collection preferences for some taxonomic groups (e.g., collections made by experts who prefer angiosperms to ferns), and collector expertise could explain the low rate of novel species recorded (i.e., undescribed species or new records for the area). In our case, we found sampling biased towards the surroundings of Bogotá's urban area, especially the "Cerros Orientales", and towards some places such as the páramo of Sumapaz (e.g., Laguna de Chisacá and Nazareth). Only in one place, "Páramos de Pasquilla", where the Flora of Bogotá project undertook intensive fieldwork, did sampling increase significantly; there, observed richness increased and completeness rose above 40%, exceeding 80% in some grid cells.
At the local scale (i.e., grid cells), curatorial work significantly improved the number of species observed in grid cells with plant records. On the other hand, 50% of new grid cells were represented by plant records, increasing the number of species observed in areas previously without plant information. However, many grid cells showed no change in record density or observed richness, especially those far from the urban area, where lack of access and social conflicts (e.g., the presence of guerrilla groups) can make them difficult to reach (Negret et al., 2017).
Georeferencing increased the coverage of plant records, the number of species observed and sample coverage at local and regional scales in Bogotá. On the other hand, fieldwork (Fig. 5c) made significant changes at the local scale. For example, in the grid cells where fieldwork was performed, few plant records existed at the raw data stage, and few were recovered by curatorial work. After fieldwork, completeness in those grid cells increased significantly, reaching values above 0.8. Sample coverage analysis and richness estimators depend on sample size (Gotelli and Colwell, 2011). Although curatorial work increased the number of plant records at the regional scale, at the local scale (i.e., grid cells) 80% of grid cells had fewer than 10 plant records (Fig. 4), and only 20% of grid cells had more than 20 plant records (the threshold used to calculate sample coverage). The main changes were observed in the grid cells where fieldwork was undertaken.
By fixing taxonomic and geographical issues, our study recovered information that increased observed richness and completeness at the local scale. Given the limited resources to explore the territory, especially in low- and middle-income countries, it is essential to invest those scarce resources carefully in order to obtain as much information as possible. Biological collections contain information that compiles the efforts of many researchers and projects over the years. As many researchers have pointed out, physical collections hold vast amounts of data that are not usable because of issues in three basic dimensions (taxonomy, geography and time) (Lavoie, 2013; Feeley, 2015; Hortal et al., 2015; Meyer et al., 2016). As we showed in this study, investing in curatorial work (both physical and digital) as the first step in describing biodiversity could reveal where the few resources available for biodiversity studies can be used most efficiently. Fieldwork is a crucial activity for studying biodiversity, but it should be targeted to under-collected areas, which can be identified by improving collection data through curatorial work.
Despite the great efforts made in digitization, further improvements could enhance biodiversity studies. Small herbaria such as that of the Jardín Botánico de Bogotá would also benefit from citizen science contributions. Label digitization, for example, would allow retirees to read and transcribe label data from home. At the Jardín Botánico de Bogotá herbarium, volunteers help to scan and photograph the specimen collection, as well as to mount specimens. We believe that involvement in these activities can inspire new taxonomists who will contribute to herbarium research. Research could also be assisted by new technologies based on artificial intelligence that can check the consistency of identifications and flag problematic specimens that require expert review (see, for example, Hussein et al., 2022). All of these approaches can contribute to the concept of the 'global meta-herbarium', which links digitized specimens with other digital data (Davis, 2023).
5. Conclusions
Curatorial work on biodiversity collections and fieldwork are not distinct processes; rather, they exist in a continuous feedback loop that generates and improves biodiversity knowledge. Therefore, to make the most of the scarce resources that research organizations and institutions invest in biodiversity, it is crucial that each stage of this circular process informs the next. From the point of view of research institutes in charge of biodiversity characterization, curatorial work, facilitated by digitalization, is an investment that could yield a large amount of improved data from biological collections at relatively low cost and in relatively little time (Suarez and Tsutsui, 2004; Lavoie, 2013). However, many herbaria, especially small and local ones, have reduced their investment in curators, and care for collections is steadily declining (Vogel et al., 2017).
In contrast, fieldwork requires a high investment in personnel, logistics, preparation and time (Suarez and Tsutsui, 2004), with limited capacity to capture new data. Although fieldwork remains essential for acquiring biodiversity information, its high cost underscores the importance of maintaining constant feedback between curatorial work and fieldwork, which enables better identification of the areas or taxonomic groups that require further investigation. To improve baseline biodiversity information, we advocate increased investment in the curation and maintenance of herbaria, in terms of both trained personnel and infrastructure. This investment is particularly necessary in smaller, local herbaria, which have been shown to contain important collections that contribute to a better understanding of the overall distribution of biodiversity (e.g., Marsico et al., 2020; Monfils et al., 2020).
Acknowledgements
We are grateful to all the collaborators and contributors of the Jardín Botánico de Bogotá (Proyecto flora de Bogotá), especially Diego Moreno, manager of the Flora de Bogotá database. We would like to thank the group "Genética evolutiva, filogeografía y ecología de biodiversidad Neotropical" and the High-Performance Computing service of the Universidad del Rosario for hosting our PostgreSQL database on their servers. This study would not have been possible without the support of MinCiencias Doctoral funds and of the Universidad del Rosario. We also thank Iván Jiménez (curator at the Missouri Botanical Garden) for his valuable comments on the document, and Domingos Cardoso and an anonymous reviewer for their valuable comments on the manuscript. This project was supported by Colciencias Doctoral funding (727–2015) and Universidad del Rosario, through a teaching assistantship and a doctoral grant.
Author contributions
CV, TS, JER, AS conceived and designed the research. CV, MB, MC, BV obtained and processed the plant records. CV, MB analyzed the data. CV, AS wrote the first draft. CV, TS, JER, AS edited a revised version of the manuscript.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2023.06.003.
Ahrends, A., Rahbek, C., Bulling, M.T., et al., 2011. Conservation and the botanist effect. Biol. Conserv., 144: 131-140. DOI:10.1016/j.biocon.2010.08.008
Ball-Damerow, J.E., Brenskelle, L., Barve, N., et al., 2019. Research applications of primary biodiversity databases in the digital age. bioRxiv: 1-26. DOI:10.1101/605071
Bebber, D.P., Carine, M.A., Wood, J.R.I., et al., 2010. Herbaria are a major frontier for species discovery. Proc. Natl. Acad. Sci. U.S.A., 107: 22169-22171. DOI:10.1073/pnas.1011841108
Bernal, R., Gradstein, S.R., Celis, M. (Eds.), 2016. Catálogo de plantas y líquenes de Colombia. Editorial Universidad Nacional de Colombia, Bogotá.
Cardoso, D., Särkinen, T., Alexander, S., et al., 2017. Amazon plant diversity revealed by a taxonomically verified species list. Proc. Natl. Acad. Sci. U.S.A., 114: 10695-10700. DOI:10.1073/pnas.1706756114
Chao, A., Jost, L., 2012. Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93: 2533-2547. DOI:10.1890/11-1952.1
Chen, G., Kéry, M., Zhang, J., et al., 2009. Factors affecting detection probability in plant distribution studies. J. Ecol., 97: 1383-1389. DOI:10.1111/j.1365-2745.2009.01560.x
Daru, B.H., Park, D.S., Primack, R.B., et al., 2018. Widespread sampling biases in herbaria revealed from large-scale digitization. New Phytol., 217: 939-955. DOI:10.1111/nph.14855
Elith, J., Graham, C.H., Anderson, R.P., et al., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29: 129-151. DOI:10.1111/j.2006.0906-7590.04596.x
Engemann, K., Enquist, B.J., Sandel, B., et al., 2015. Limited sampling hampers "big data" estimation of species richness in a tropical biodiversity hotspot. Ecol. Evol., 5: 807-820. DOI:10.1002/ece3.1405
Etter, A., 1998. Mapa general de ecosistemas de Colombia (1:2,000,000). Instituto Alexander von Humboldt y PNUD, Bogotá.
Feeley, K.J., 2015. Are we filling the data void? An assessment of the amount and extent of plant collection records and census data available for tropical South America. PLoS One, 10: 1-17. DOI:10.1371/journal.pone.0125629
Feeley, K.J., 2012. Distributional migrations, expansions, and contractions of tropical plant species as revealed in dated herbarium records. Global Change Biol., 18: 1335-1341. DOI:10.1111/j.1365-2486.2011.02602.x
Feeley, K.J., Silman, M.R., 2011a. Keep collecting: accurate species distribution modelling requires more collections than previously thought. Divers. Distrib., 17: 1132-1140. DOI:10.1111/j.1472-4642.2011.00813.x
Feeley, K.J., Silman, M.R., 2011b. The data void in modelling current and future distributions of tropical species. Global Change Biol., 17: 626-630. DOI:10.1111/j.1365-2486.2010.02239.x
Feeley, K.J., Silman, M.R., 2010. Modelling the responses of Andean and Amazonian plant species to climate change: the effects of georeferencing errors and the importance of data filtering. J. Biogeogr., 37: 733-740. DOI:10.1111/j.1365-2699.2009.02240.x
Franco, A.M.A., Palmeirim, J.M., Sutherland, W.J., 2007. A method for comparing effectiveness of research techniques in conservation and applied ecology. Biol. Conserv., 134: 96-105. DOI:10.1016/j.biocon.2006.08.008
Gaira, K.S., Dhar, U., Belwal, O.K., 2011. Potential of herbarium records to sequence phenological pattern: a case study of Aconitum heterophyllum in the Himalaya. Biodivers. Conserv., 20: 2201-2210. DOI:10.1007/s10531-011-0082-4
García Márquez, J., Dormann, C., Sommer, J.H., et al., 2012. A methodological framework to quantify the spatial quality of biological databases. Biodivers. Ecol., 4: 25-39. DOI:10.7809/b-e.00057
Gardner, T.A., Barlow, J., Araujo, I.S., et al., 2008. The cost-effectiveness of biodiversity surveys in tropical forests. Ecol. Lett., 11: 139-150. DOI:10.1111/j.1461-0248.2007.01133.x
Goodwin, Z.A., Harris, D.J., Filer, D., et al., 2015. Widespread mistaken identity in tropical plant collections. Curr. Biol., 25: R1066-R1067. DOI:10.1016/j.cub.2015.10.002
Gotelli, N.J., Colwell, R.K., 2011. Estimating species richness. In: Biological Diversity: Frontiers in Measurement and Assessment. Oxford University Press, New York.
Graham, C.H., Ferrier, S., Huettman, F., et al., 2004. New developments in museum-based informatics and applications in biodiversity analysis. Trends Ecol. Evol., 19: 497-503. DOI:10.1016/j.tree.2004.07.006
Gueta, T., Carmel, Y., 2016. Quantifying the value of user-level data cleaning for big data: a case study using mammal distribution models. Ecol. Inf., 34: 139-145. DOI:10.1016/j.ecoinf.2016.06.001
Hopkins, M.J.G., 2007. Modelling the known and unknown plant biodiversity of the Amazon Basin. J. Biogeogr., 34: 1400-1411. DOI:10.1111/j.1365-2699.2007.01737.x
Hortal, J., de Bello, F., Diniz-Filho, J.A.F., et al., 2015. Seven shortfalls that beset large-scale knowledge of biodiversity. Annu. Rev. Ecol. Evol. Syst., 46: 523-549. DOI:10.1146/annurev-ecolsys-112414-054400
Hsieh, T.C., Ma, K.H., Chao, A., 2016. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods Ecol. Evol., 7: 1451-1456. DOI:10.1111/2041-210X.12613
Lavoie, C., 2013. Biological collections in an ever changing world: herbaria as tools for biogeographical and environmental studies. Perspect. Plant Ecol. Evol. Syst., 15: 68-76. DOI:10.1016/j.ppees.2012.10.002
Maldonado, C., Molina, C.I., Zizka, A., et al., 2015. Estimating species diversity and distribution in the era of Big Data: to what extent can we trust public databases? Global Ecol. Biogeogr., 24: 973-984. DOI:10.1111/geb.12326
Marsico, T.D., Krimmel, E.R., Carter, J.R., et al., 2020. Small herbaria contribute unique biogeographic records to county, locality, and temporal scales. Am. J. Bot., 107: 1577-1587. DOI:10.1002/ajb2.1563
McCarthy, M.A., Moore, J.L., Morris, W.K., et al., 2013. The influence of abundance on detectability. Oikos, 122: 717-726. DOI:10.1111/j.1600-0706.2012.20781.x
Meyer, C., Weigelt, P., Kreft, H., et al., 2016. Multidimensional biases, gaps and uncertainties in global plant occurrence information. Ecol. Lett., 19: 992-1006. DOI:10.1111/ele.12624
Monfils, A.K., Krimmel, E.R., Bates, J.M., et al., 2020. Regional collections are an essential component of biodiversity research infrastructure. Bioscience, 70: 1045-1047. DOI:10.1093/biosci/biaa102
Morueta-Holme, N., Engemann, K., Sandoval-Acuña, P., et al., 2015. Strong upslope shifts in Chimborazo's vegetation over two centuries since Humboldt. Proc. Natl. Acad. Sci. U.S.A., 112: 12741-12745. DOI:10.1073/pnas.1509938112
Negret, P.J., Allan, J., Braczkowski, A., et al., 2017. Need for conservation planning in postconflict Colombia. Conserv. Biol., 31: 499-500. DOI:10.1111/cobi.12935
Nualart, N., Ibáñez, N., Soriano, I., et al., 2017. Assessing the relevance of herbarium collections as tools for conservation biology. Bot. Rev., 83: 303-325. DOI:10.1007/s12229-017-9188-z
O'Connell, A.F., Gilbert, A.T., Hatfield, J.S., 2004. Contribution of natural history collection data to biodiversity assessment in national parks. Conserv. Biol., 18: 1254-1261. DOI:10.1111/j.1523-1739.2004.00336.x
QGIS Development Team, 2015. QGIS Geographic Information System, Open Source Geospatial Foundation Project, version 3.8.0.
R Development Core Team, 2019. R: A Language and Environment for Statistical Computing (Version 3.6.1). http://www.r-project.org/.
Secretaría Distrital de Ambiente, 2007. Atlas ambiental de Bogotá D.C. Imprenta Nacional de Colombia, Bogotá, Colombia.
Soberón, J., Jiménez, R., Golubov, J., et al., 2007. Assessing completeness of biodiversity databases at different spatial scales. Ecography, 30: 152-160. DOI:10.1111/j.0906-7590.2007.04627.x
Soberón, J., Peterson, A.T., 2004. Biodiversity informatics: managing and applying primary biodiversity data. Philos. Trans. R. Soc. B-Biol. Sci., 359: 689-698. DOI:10.1098/rstb.2003.1439
Sousa-Baena, M.S., Garcia, L.C., Peterson, A.T., 2013. Completeness of digital accessible knowledge of the plants of Brazil and priorities for survey and inventory. Divers. Distrib., 20: 1-13. DOI:10.1111/ddi.12136
Suarez, A.V., Tsutsui, N.D., 2004. The value of museum collections for research and society. Bioscience, 54: 66-74.
Syfert, M.M., Smith, M.J., Coomes, D.A., 2013. The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models. PLoS One, 8. DOI:10.1371/journal.pone.0055158
Targetti, S., Herzog, F., Geijzendorffer, I.R., et al., 2014. Estimating the cost of different strategies for measuring farmland biodiversity: evidence from a Europe-wide field evaluation. Ecol. Indicat., 45: 434-443. DOI:10.1016/j.ecolind.2014.04.050
Van der Hammen, T., 1986. La Sabana de Bogotá y su lago en el Pleniglacial Medio. Caldasia, 15: 249-262.
Vargas, C.A., Bottin, M., Särkinen, T., et al., 2022. Environmental and geographical biases in plant specimen data from the Colombian Andes. Bot. J. Linn. Soc., 200: 451-464. DOI:10.1093/botlinnean/boac035
Vogel, C., Bordignon, S.A. de L., Trevisan, R., et al., 2017. Implications of poor taxonomy in conservation. J. Nat. Conserv., 36: 10-13. DOI:10.1016/j.jnc.2017.01.003
Wieczorek, J., Guo, Q., Hijmans, R.J., 2004. The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. Int. J. Geogr. Inf. Sci., 18: 745-767. DOI:10.1080/13658810412331280211