Impacts of road networks on the geography of floristic collections in China

Citation

Jingyang He, Wenjing Yang, Qinghui You, Qiwu Hu, Mingyang Cong, Chao Tian, Keping Ma. Impacts of road networks on the geography of floristic collections in China[J]. Plant Diversity, 2025, 47(3): 403-414.

Jingyang He^a,b, Wenjing Yang^a,b^,*, Qinghui You^c, Qiwu Hu^b, Mingyang Cong^c, Chao Tian^a,b, Keping Ma^d

a. Key Laboratory of Poyang Lake Wetland and Watershed Research (Jiangxi Normal University), Ministry of Education, Nanchang 330022, China;
b. School of Geography and Environment, Jiangxi Normal University, Nanchang 330022, China;
c. Key Laboratory of Biodiversity Conservation and Bioresource Utilization of Jiangxi Province, College of Life Sciences, Jiangxi Normal University, Nanchang 330022, China;
d. State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing 100093, China

Received 13 November 2024; Received in revised form 15 February 2025; Accepted 18 February 2025; Available online 21 February 2025

Corresponding author: E-mail address: yangwenjing@jxnu.edu.cn (W. Yang).
Peer review under responsibility of Editorial Office of Plant Diversity.

^* Corresponding author. Key Laboratory of Poyang Lake Wetland and Watershed Research (Jiangxi Normal University), Ministry of Education, Nanchang 330022, China.

Abstract: Biological collections are critical for the understanding of species distributions and for formulating biodiversity conservation strategies. However, biological collections are susceptible to various biases, including the "road-map effect", meaning that the geography of biological collections can be influenced by road networks. Here, using species occurrence records derived from 921, 233 plant specimens, we quantified the intensity of the "road-map effect" on floristic collections of China, and investigated its relationships with various environmental and socio-economic variables. Species occurrence records mainly distributed in major mountain ranges, while lowlands were underrepresented. The distance of species occurrence records to the nearest road decreased from 19.54 km in 1960s to 3.58 km in 2010s. These records showed significant clustering within 5 km and 10 km buffer zones of roads. The road density surrounding these records was significantly higher than that in random patterns. Collectively, our results confirmed a significant "road-map effect" in the floristic collections of China, and this effect has substantially intensified from the 1960s to the 2010s, even after controlling for the impact of road network expansion. Topographic, climatic and socio-economic variables that determine regional species diversity, vegetation cover and human impact on vegetation played crucial roles in predicting the intensity of the "road-map effect". Our findings indicate that biological surveys have become increasingly dependent on road networks, a trend rarely reported in published studies. Future floristic surveys in China should prioritize the lowland areas that have experienced stronger human disturbances, as well as remote areas that may harbor more unique and rare species.

Keywords: Biological sampling Environmental and socio-economic variables Plants Road-map effect Species occurrence records

1. Introduction

Biological collections are a fundamental resource for understanding the temporal and spatial distributions of species in the natural world (Nualart et al., 2017). Since the 14th century, numerous specimens have been collected by scientists and explorers and systematically deposited in museums and herbaria (Bordelon, 2022). Over the past two decades, a portion of these biological collections has been digitized, with their information now accessible through online databases such as the Global Biodiversity Information Facility (Canhos et al., 2004). The accessibility of vast species occurrence data has greatly accelerated the advancement of disciplines related to biodiversity and biogeography (Beck et al., 2013; Zizka et al., 2021).

Nevertheless, species occurrence data are prone to various biases, especially geographical bias (Bowler et al., 2022; Zhang et al., 2024). A number of studies have demonstrated that currently available species occurrence data are not evenly distributed across regions and are heavily influenced by environmental and socio-economic factors (Meyer et al., 2015; Cosentino and Maiorano, 2021). For instance, regions of Ecuador characterized by higher temperatures, lower precipitation, and pronounced seasonality tended to be underrepresented in species occurrence datasets (Loiselle et al., 2008). This is often because these regions typically exhibit lower species diversity, while collectors tend to more often visit environments that are comparatively richer in species. The tropical Andes is one of the most species-rich regions in the world. However, species occurrence records in the northern Andes, particularly in Venezuela and Colombia, remain relatively sparse due to historical internal conflicts, although the situation has been improved in recent years (Vargas et al., 2022). This conflict has made fieldwork extremely risky, leading to minimal collecting efforts in these regions. In addition, the level of economic development also determines the availability of species occurrence records in a region. For instance, more economically developed countries in Africa generally have a greater amount of species occurrence data compared to those developing countries (Reddy and Dávalos, 2003).

An especially prevalent sampling bias is the so-called "road-map effect" (Crisp et al., 2001), which describes how the geographical distribution of biological collections is influenced by road networks (Lituma and Buehler, 2016). For instance, strong correlations have been observed between the density of access routes and the number of species occurrence records across various taxonomic groups in Brazil (Oliveira et al., 2016). A majority of the species occurrence records are concentrated within the 1 km buffer zone of these access routes, known as the roadside bias. Similar patterns have been observed in arid inland regions of Australia, where plant collections are predominantly found near a few main roads (Chapman and Busby, 1994). In Norway, the density of species occurrence records is much lower in remote areas with limited accessibility compared to highly accessible areas (Petersen et al., 2021).

According to published studies, there are generally three different methods to quantify the "road-map effect". The first method measures the distance of species occurrence records to the nearest roads (Parnell et al., 2003). If the number of species occurrence records decreases with increasing distance from the roads, it suggests that there is a significant "road-map effect" in the species occurrence datasets. Another method is to quantify the probability of biological collections falling within a certain buffer zone (usually 5 km) of roads (Zizka et al., 2021). If there is a higher likelihood of species records falling within a given buffer zone when compared to a random distribution, the roadside bias is considered to be statistically significant (Kadmon et al., 2004). The third method involves assessing the accessibility of areas where species occurrence records are located, typically quantified by the road density within the surveyed areas (Yang et al., 2014). Increased collecting efforts in areas with higher accessibility suggest a notable "road-map effect" in biological collections. However, very few studies have simultaneously employed these three methods to quantify the intensity of the "road-map effect" in species occurrence datasets, although each of these methods reveals the influence of road networks on the spatial distribution of species occurrence records from distinct perspectives.

The intensity of the "road-map effect" can vary largely across different regions, being influenced by a combination of various factors including climatic, topographic, and socio-economic variables. For instance, plant occurrence records are concentrated in the wet and mountainous areas of Israel despite the road density being relatively low (Kadmon et al., 2004). In contrast, mammal occurrence records on the island of Borneo display a distinct bias toward urban areas with high human population and road density, attributed to enhanced accommodation and transportation infrastructure (Zizka et al., 2021). It has also been found that collectors tend to stick closer to roads in the areas with low species richness (e.g., desert areas), owing to the limited opportunities for collecting additional species (Crisp et al., 2001). Thus, exploring the influence of various environmental and socio-economic variables on the "road-map effect" would be valuable for predicting the impact of road networks on the distribution of species occurrence records, and facilitating the mitigation of this impact during data utilization.

China is one of the most biodiverse countries in the world, with over 30, 000 native species of vascular plants (Zhuang et al., 2021). The flora of China have been extensively surveyed over the past centuries, and more than 10 million specimens have been collected and deposited in herbaria (Liu et al., 2022b). These collections have served as crucial foundations for understanding the geographical distribution of plant species and formulating biodiversity conservation policies in China (Peng et al., 2021). Yet, floristic collections in China suffer strong geographical bias, and only 9% of all counties have been completely sampled (Yang et al., 2013). The rapid economic development of China in recent decades has led to a substantial expansion of its road networks (Pan et al., 2024), greatly facilitating field surveys and biological collections. However, this expansion could exacerbate the impact of road networks on biological surveys, potentially increasing the dependency of such surveys on roads (Grilo et al., 2010).

In this paper, we utilize ~0.92 million species occurrence records derived from vascular plant specimens with precise collection location information and collected from 1960 to 2020. Our objectives are threefold: (1) to quantify the intensity of the "road-map effect" in the floristic collections of China; (2) to assess whether the intensity of the "road-map effect" varies across different time periods; and (3) to identify the primary environmental and socio-economic factors influencing the effect of road networks on the geography of floristic collections in China.

2. Materials and methods 2.1. Species occurrence data

Species occurrence records, derived from 8, 137, 097 vascular plant specimens, were obtained from the Chinese Virtual Herbarium (CVH, https://www.cvh.ac.cn/). All data were quality-controlled according to the following criteria: (1) duplicate records for the same specimen that may have occurred during the digitization process were removed; (2) only records containing precise information on collection locations and years were retained; (3) records were georeferenced to the most detailed geographical level available, down to the village level and finer granularities. These criteria yielded 921, 233 records collected from 1960 to 2020, representing 27, 838 species, which were used for subsequent analyses.

2.2. Road map data

The mileage of roads in China from 1960 to 2020 was obtained from datasets published by the National Bureau of Statistics of China (Department of Comprehensive Statistics of National Bureau of Statistics, 2010; National Bureau of Statisitics of China, 2009–2020). Each occurrence record was associated with the road network of the corresponding time period. Road maps of China for the years 1962, 1974, and 1986 were digitally converted from published paper maps using ArcGIS 10.2 (China Cartographic Publishing House, 1962, 1974, 1986), while those for the years 1995, 2000 and 2012 were downloaded from the Resource and Environmental Science Data Platform (https://www.resdc.cn/) and the Geographic Data Platform (https://geodata.pku.edu.cn/). These datasets were utilized to represent the road networks of their respective decades. We have not obtained road map data prior to 1960; therefore, data preceding this period were excluded from the analysis.

2.3. Environmental and socio-economic data

Ten socio-economic variables taken over 1960 to 2020 were compiled to investigate the impacts of these factors on "road map effect" intensity, including mean elevation, elevation range, annual mean temperature, mean temperature in the coldest month of the year, potential evapotranspiration, annual precipitation, seasonality of precipitation, Normalized Difference Vegetation Index (NDVI), human population density, and gross domestic product (GDP). NDVI data was only available for the period from 1981 to 2020, whereas human population density and GDP data were available from 1995 to 2020. For analyses conducted prior to these specified periods, the nearest available data were utilized. For example, NDVI data for 1981 were used for analyses involving species occurrence data collected in the 1960s and 1970s. The mean elevation and elevation range within a specified buffer zone (5–20 km) of species occurrence records or within the county where species occurrence records were located, were employed as indicators of accessibility and topographic complexity (Rahbek and Graves, 2001). Climatic variables determine plant species diversity within an area (Qian and Kissling, 2010; Yang et al., 2013), whereas NDVI measures the density of green vegetation cover (Liu et al., 2022a), potentially influencing the appeal of the area to collectors, i.e., areas with higher species richness and superior vegetation cover are likely to be more attractive to collectors (Loiselle et al., 2008). Preferred collection areas can also be influenced by human population density and GDP; higher levels of these factors are likely to provide better infrastructure for field surveys (Reddy and Dávalos, 2003).

Temperature and precipitation data were obtained from WorldClim with a spatial resolution of 30 arc-seconds (https://worldclim.org/data/worldclim21.html). Mean elevation and elevation range were calculated from the GTOPO-30 digital elevation model at a spatial resolution of 30 arc-seconds (U.S. Geological Survey, 1996). All other datasets were sourced from the Resource and Environmental Science Data Platform (https://www.resdc.cn/), with a spatial resolution of 1 km, except for NDVI, which has a resolution of 8 km.

2.4. Quantification of the "road-map effect"

Three different methods were used to quantify the "road-map effect" on plant occurrence data of China. The first method calculated the distance of an occurrence record to the nearest road to assess whether the observed geographical patterns of occurrence record distributions significantly differed from random patterns. An equal number of random points, corresponding to the number of species occurrence records, were generated and distributed randomly across China. The mean distance of random points to the nearest road in each pattern was then calculated. This process was repeated 1000 times, and the 5th percentile of the mean distances was computed. If the observed mean distance to road is smaller than the 5th percentile value, it suggests a significant "road-map effect". The ratio between the observed mean distance and the mean distance derived from random patterns was calculated to assess the extent to which the observed distance deviates from random patterns. A ratio value closer to 0 indicates a greater deviation from randomness and a stronger "road-map effect".

In addition, gamma distributions were applied to model the observed distance data. The shape parameter k in the probability density function of the gamma distribution determines the right skewness of the density curve, employed to indicate the rate at which occurrence record numbers decrease with distance from the road. A higher k value indicates a slower decay rate of occurrence record numbers with distance from the road, suggesting a weaker bias to roadsides (Husak et al., 2007). The value of k was estimated using the "fitdistr" function from the R package "MASS" (Venables and Ripley, 2002), employing maximum likelihood estimation with the scale parameter θ unconstrained (Appendix A).

The second method assessed whether species occurrence records showed significant clustering along roadsides, specifically within designated buffer zones of either 5 or 10 km from the roads. The distances of 5 and 10 km were selected because they can be covered on foot in 1–2 h. Each record could be classified into two categories—either within the buffer zones or outside of them. This classification enables the use of the binomial distribution to evaluate both the significance and degree of clustering of species occurrence records within these buffer zones. The evaluation is grounded in the assumption that all records are uniformly and randomly distributed across the region of interest.

A metric, denoted as P_clustering, was developed to indicate the degree of clustering of records within the buffer zones. For regions that have more records falling within the buffer zones than expected, the value of P_clustering was determined through the application of the following formula:

(1)

where the random variable X follows a binomial distribution with parameters N and p_d. N denotes the total number of species occurrence records in the relevant region, while p_d represents the probability for a given occurrence record falling within a given distance (d, either 5 or 10 km) to the nearest road. The value of p_d is determined as the ratio of the buffer zone area to the total area of the region. The variable n_d indicates the observed number of species occurrence records falling within the buffer zones. The term P(X ≤ n_d–1) represents the cumulative probability of observing n_d–1 or fewer records within the buffer zones. Accordingly, 1–P(X ≤ n_d–1) corresponds to the p-value for the observation of n_d records or more within the buffer zones in a right-tailed test. If the value of P(X ≤ n_d–1) exceeds 0.95, it suggests a significant clustering of records within the buffer zones.

For a region where the number of records within the road buffer zones is equal to the expected number (N × p_d), P_clustering was assigned a value of 0.50. If the number of records within the buffer zones is less than the expected count, the following equation was employed to compute the P_clustering:

(2)

where P(X ≤ n_d) represents the cumulative probability of observing n_d or fewer records within the buffer zones. The P(X ≤ n_d) value corresponds to the p-value for observing n_d or less records within the buffer zones in a left-tailed test. If the value of P(X ≤ n_d) is less than 0.05, it suggests that the records are significantly situated away from road buffer zones.

The value of P_clustering derived from the above equations varied between 0 and 1 (excluding 0 or 1), with higher values indicating a greater degree of record clustering within the buffer zones. The P_clustering value was calculated at two different spatial scales: the national level and the county level. Counties are political units characterized by relatively low environmental and socio-economic heterogeneity. To obtain the p_d values for both national and county levels, buffer zones were created around all roads within the country, and the buffer zone area within each county was measured. The p_d value at the national level was calculated as the ratio of the total buffer zone area across the entire country to the country's overall area. The p_d value for each individual county was determined as the ratio of the buffer zone area within that county to the county's total area.

The third method calculated the road density within designated buffer zones (5 km, 10 km, 20 km, 30 km, 50 km and 60 km, respectively) of each species occurrence record. If the mean road density around the observed occurrence records is higher than that of randomly distributed patterns across the country in 1000 simulations, and it surpasses 95th percentile of the mean road density in those random patterns, it implies that sampling sites with greater accessibility tend to be sampled more frequently (Reddy and Dávalos, 2003).

2.5. Random forest modeling

Random forest models were employed to quantify the effects of various socio-economic and environmental variables on the intensity of "road-map effect" via all three methods. Random forests are a powerful ensemble learning method that combines the strengths of multiple decision trees to create a robust model. One of the key advantages of random forests is their ability to handle complex relationships between predictor variables and response variables, as well as to mitigate overfitting (Valavi et al., 2021). The Akaike information criterion (AIC) was used along with a step-wise backward selection strategy to choose the most parsimonious multi-predictor model (Johnson and Omland, 2004). The random forest regressions provided a measure of variance explained that is equivalent to R² in linear regressions. The relative importance of each predictor variable in the model was calculated by randomly permuting the values of each predictor and observing the resulting change in mean square error from the original variance explained by the model (Van Buskirk and Jansen van Rensburg, 2020). In addition, Pearson correlation coefficients were computed to examine the positive or negative relationships between the response and explanatory variables.

Spatial autocorrelation in response variables and residuals of the random forest models is relatively weak, yet still statistically significant, particularly within distances of less than 300 km, as evidenced by the spatial correlograms and global Moran's I-tests (Fig. S1). This can lead to the overestimation of degrees of freedom and consequently an inflation of Type I errors in the random forest models (Diniz-Filho et al., 2003). Eigenfunction-based methods are effective in handling complex spatial patterns and are commonly utilized to account for spatial autocorrelation in regression analyses (Diniz-Filho and Bini, 2005). The pairwise Euclidean distances among species occurrence records involved in the random forest modeling were computed from their geographical coordinates in the Albers projection. The distance matrix was truncated at a distance of 300 km. Distances exceeding 300 km were substituted with 1200 km (four times the truncation distance), while distances of 300 km or less were preserved as originally calculated (Yang et al., 2014). This truncation distance is important because it gives more weight to short-distance effects after the filtering process. A distance of 300 km was chosen due to the observation of minimal spatial autocorrelation in residuals of non-spatial random forest models beyond this threshold (Fig. S1).

Principal coordinates of neighbor matrices (PCNM) were employed to decompose the spatial structures among species occurrence records using the "pcnm" function in the R package "vegan" (Borcard and Legendre, 2002). Forty eigenvectors with positive eigenvalues were derived to capture the spatial relationships among species occurrence records across various scales. Eigenvector-based spatial filters that contributed the most to the variance explained by the random forest model were subsequently included in the model until spatial autocorrelation in model residuals became insignificant. Random forest modeling was carried out using the "randomForest" function from the R package "randomForest" (van der Maaten et al., 2017; Appendix A).

3. Results 3.1. Road network in China

China has dramatically increased its road network from a mileage of 519, 500 km in 1960 to 5, 198, 100 km in 2020 (Fig. 1A). This growth was relatively slow before 2000, increasing at an average of 21, 388 km per year. The growth accelerated to an average of 124, 364 km per year after 2005. In general, road density tends to be higher in the eastern and southern regions compared to the western and northern regions (Fig. S2). A similar geographical pattern has been observed for the increase in road density over the last decades (Fig. 1B).

Fig. 1 (A) Annual road mileage in China and (B) increase in road density across various counties from 1960 to 2020. (B) The legend uses a quantile classification; the map is an Albers projection. The inset in the bottom right corner show the south boundary of China, including all islands in the South China Sea.

Figure options

3.2. Species occurrence records

The number of species occurrence records that can be georeferenced to the village level and even finer granularities, and that include collection year information, varied substantially across different time periods (Fig. 2). The number of species occurrence records was highest in the 1980s, followed by the 1970s and 1960s, at 276, 054, 223, 194, and 206, 989, respectively. By contrast, the number of species occurrence records was much fewer in the 1990s, 2000s and 2010s (with the 2010s referring to the period from 2010 to 2020), at 82, 114, 58, 202 and 74, 680, respectively. The distribution of species occurrence records showed similar geographical patterns in different time periods (Fig. 3). These records were predominantly concentrated in southern and southwestern China, especially along the major mountain ranges (e.g., Hengduan, Daba and Nanling Mountains). In contrast, they were notably fewer in the northern (e.g., North China Plain) and northwestern regions.

Fig. 2 The number of species occurrence records used in this study across different time periods.

Figure options

Fig. 3 Geographical patterns of species occurrence records used in this study across different time periods. The road networks for each of the time periods are based on data from 1962, 1974, 1986, 1995, 2000, and 2012, respectively.

Figure options

3.3. The intensity of the "road-map effect"

The number of species occurrence records decreased as the distance from roads increased in all time periods (Fig. 4). The mean distance of species occurrence records to the nearest road decreased from 19.54 km in the 1960s to 3.58 km in the 2010s, significantly lower that of random patterns in the corresponding time period (p < 0.001). The ratio of observed mean distance to the mean distance derived from random patterns, decreased from 0.62 in the 1960s to 0.31 in the 2010s (Fig. 5A), suggesting that the distribution of species occurrence records has increasingly deviated from random patterns. The shape parameter (k) in the probability density function of the gamma distribution fitted to the observed distance data declined from 0.97 in the 1960s to 0.40 in the 2010s (Fig. 5B), suggesting that the rate at which record numbers decreased with increasing distance from roads accelerated from the 1960s to the 2010s.

Fig. 4 The histograms illustrate the distances of species occurrence records to the nearest road (in yellow) in different time periods, compared to the mean distances derived from randomly generated patterns (in blue). These patterns are created by randomly and uniformly distributing the same number of points as the species occurrence records across China, and the mean distance from these random points to the nearest road is calculated for each pattern. The process is iterated for 1000 times, resulting in 1000 mean distance values. The red curves represent the gamma fit applied to the observed distance values, with the shape parameter (k) indicating the rate at which the number of species occurrence records decreases as the distance to the road increases. The yellow and blue dashed vertical lines represent the observed mean distance of species occurrence records to the nearest road and the 5th percentile value of the mean distances in the random patterns, respectively.

Figure options

Fig. 5 Trends in the ratios between the observed mean distance of species occurrence records to the nearest road and the mean distance derived from random patterns (A), and the shape parameter (k) of gamma fits for the observed distances (B) from the 1960s to the 2010s (including 2020).

Figure options

At the national level, species occurrence records were disproportionally concentrated within the 5 km and 10 km buffer zones of roads from the 1960s to the 2010s (Fig. 6). The binomial tests indicated a significant clustering of records within the buffer zones, with P_clustering values exceeding 0.99 for all time periods (associated p < 0.01). At the county level, counties exhibiting either significantly more or fewer records within the 5 km and 10 km buffer zones than expected (P_clustering > 0.95 or P_clustering < 0.05, p < 0.05) tended to be located along mountain ranges, such as the Hengduan, Qilian and Daba Mountains (Figs. 7 and S3).

Fig. 6 Proportions of the country's area and species occurrence records located within and outside the 5 km and 10 km buffer zones of roads at the national scale in different time periods.

Figure options

Fig. 7 The P_clustering values for the 5 km buffer zones of roads in different counties of China. The P_clustering value indicates the degree of clustering of species occurrence records within specific buffer zones of roads. Each category of P_clustering value corresponds to particular interpretations of the spatial distribution of records in relation to the buffer zones. P_clustering < 0.05: records are significantly situated away from the road buffer zones (left-tailed binomial test, p < 0.05); 0.05 ≤ P_clustering < 0.50: fewer records are present within the buffer zones than expected, but this difference is not statistically significant (p ≥ 0.05); P_clustering = 0.50: the number of records within the buffer zones is equal to the expected count; 0.50 < P_clustering ≤ 0.95: more records are found within the buffer zones than expected, although this is not statistically significant (p ≥ 0.05); P_clustering > 0.95: there is a significant clustering of species occurrence records within the buffer zones (right-tailed binomial test, p < 0.05). Maps are Albers projections. Insets in the bottom right of each subplot show the south boundary of China, including all islands in the South China Sea.

Figure options

There was an increasing trend in road density surrounding species occurrence records from the 1960s to the 2010s across the 5–60 km buffer zones (Fig. 8). The mean road density within all buffer zones and in all time periods was significantly higher than that of random patterns, indicating that areas with higher road density tended to be more frequently sampled.

Fig. 8 Mean road density within various buffer zones (5–60 km) of species occurrence records in different time periods, compared with road density in random patterns where species occurrence records are randomly distributed over 1000 iterations.

Figure options

3.4. Explanatory power of environmental and socio-economic variables on the intensity of the "road-map effect"

The random forest model explained 61.82% of variance in the distance of species occurrence records to the nearest road (Table 1). Elevation range within the 5 km buffer zone of species occurrence records, human population density, annual precipitation, NDVI, and annual mean temperature were selected in the most parsimonious model for the distance to the nearest road. Spatial autocorrelation was eliminated in the model residuals by including three eigenvector-based spatial filters (Fig. S1). The elevation range was found to be the most important variable in the model (Table 1). Elevation range, annual precipitation and NDVI showed positive relationships with the distance to the nearest road, while the other two explanatory variables were negatively correlated.

Table 1 Results of the most parsimonious random forest models for three measurements of the "road-map effect".

Explanatory variables	Pearson's r^a	Relative importance (%)	Overall R²
Distance to the nearest road^b			0.62
Elevation range^c	0.27	24.25
Human population density	−0.26	21.83
Annual precipitation	0.03	15.04
NDVI	0.19	14.90
Annual mean temperature	−0.19	9.07
3 spatial filters		14.91
P_clustering value^d			0.48
Annual precipitation	−0.14	27.49
NDVI	−0.19	24.78
Human population density	0.03	23.99
Mean elevation^e	0.12	23.74
Road density^f			0.39
Annual precipitation	−0.05	25.79
Mean elevation^g	−0.11	25.33
NDVI	−0.12	25.12
Human population density	0.18	23.76
a The Pearson correlation coefficient between the response and explanatory variables. b The distance of species occurrence records to the nearest road. c Elevation range within the 5 km buffer zone of species occurrence records. d P_clustering value indicates the degree of clustering of species occurrence records within the 5 km buffer zone of roads. e Mean elevation within counties of China. f Road density within the 5 km buffer zone of species occurrence records. g Mean elevation within the 5 km buffer zone of species occurrence records. The parameter "ntree" is configured as 2000, 1450, and 1250 for the three random forest models, respectively.

Table options

The random forest model accounted for 48.42% of the variance in P_clustering for the 5 km buffer zone (Table 1). Annual precipitation, NDVI, human population density, and mean elevation within counties were included in the most parsimonious model for P_clustering. Their relative importance decreased sequentially. Spatial autocorrelation in model residuals was insignificant without including any eigenvector-based spatial filters (Fig. S1). Mean elevation and human population density showed positive relationships with P_clustering, while the other selected explanatory variables exhibited negative relationships (Table 1). Similar results of random forest modeling were obtained for P_clustering within the 10 km buffer zone (Table S1).

The random forest model explained 39.35% of the variance in road density within the 5 km buffer zone of species occurrence records (Table 1). The most parsimonious model for road density included annual precipitation, mean elevation within the 5 km buffer zone of species occurrence records, NDVI, and human population density. Their importance in the model decreased successively. With the exception of human population density, all other selected variables exhibited negative relationships with road density. Similar results were obtained for the road density within the 10 km buffer zone of species occurrence records (Table S1).

4. Discussion 4.1. Road network expansion in China

The road network of China expanded rapidly from 1960 to 2020, with particularly accelerated growth after the year 2005 (Fig. 1A). This is closely linked to the rapid economic development and the acceleration of urbanization in China (Zhou et al., 2022). The road network expansion in China can be divided into two stages. The first stage is from 1960 to 2004, during which the road mileage showed a steady growth (Fig. 1A). In this period, the total road mileage increased from 1, 351, 691 km in 1999 to 1, 679, 848 km in 2000. This considerable growth is due to the inclusion of low-class roads (low-speed minor roads) in the national road census conducted in 2000, which may have been overlooked in earlier road mileage calculations (Ministry of Transport of the People's Republic of China, 2005). The second stage began in 2005, during which the growth rate of road mileage accelerated, which can be attributed to the rapid development of highway network and the implementation of the Village to Village Roads Project (Wong et al., 2017). Notably, between 2004 and 2005, there was a 1, 474, 500 km increase in road mileage. This growth was a result of a modification in statistical criteria from 2005 onwards, now encompassing village roads in the calculation of total road mileage.

Road density has been consistently higher in the eastern regions compared to the western regions since the 1960s (Fig. S2), which is strongly determined by the human population density, i.e., areas with greater population density generally have higher road density (Hu et al., 2018). This spatial pattern has become increasingly prominent in recent decades, with a notably greater increase in road mileage in the eastern and southeastern regions (Fig. 1B), attributed to the rapid economic development in these regions. The rapid expansion of road networks could induce stronger human-caused disturbances to the natural ecosystems, particularly in the eastern and southern regions, which are renowned for their high biodiversity (Huang et al., 2011). Addressing the impact of road expansion on natural ecosystems and biodiversity remains a substantial challenge in the future (Kleinschroth et al., 2019).

4.2. Variation in floristic collecting efforts over the past decades

The number of species occurrence records from the 1960s to the 1980s is higher than that from the 1990s and 2010s, and peaked in the 1980s (Fig. 2). This temporal trend is consistent with the overall trend of the intensity of floristic collections in China, which is largely influenced by national policies (Yang et al., 2013). One peak period for floristic collections in China was from 1958 to 1962, during which extensive plant resource surveys was conducted nationwide with the aim of understanding the distribution of all plant species, especially economic plants (e.g., edible or medicinal plants; Yang et al., 2021). Relatively fewer plant specimens were collected in the 1960s due to the influence of the Cultural Revolution (Chen, 1994). In 1973, the central government initiated the first comprehensive scientific investigation on the Qinghai-Tibet Plateau (1973–1980) with the aim of gathering fundamental information on natural resources, including plant resources. The comprehensive scientific survey of the Hengduan Mountains was conducted from 1980 to 1985, during which a large number of plant specimens were collected.

However, the number of plant specimens has remained relatively low since the 1990s (Liu et al., 2022b), largely due to decreased research funding for traditional taxonomy, as resources have shifted towards disciplines such as cytology and molecular biology. Due to the notable geographical and taxonomic biases in current collecting efforts and the lack of complete species checklists in the majority of counties in China (Yang et al., 2021), it is necessary to continue field surveys and floristic collections in China, especially focusing on regions and species that have been underrepresented in current biological collections.

Floristic collections showed similar geographical distributions in all decades (Fig. 3), with a large proportion of these collections concentrating in the prominent mountain ranges (e.g., Hengduan, Daba and Nanling Mountains). These mountains are renowned for their rich biodiversity (Qian and Kissling, 2010; Wang et al., 2012), making them a preferred destination for collectors seeking to encounter a diverse array of species (Loiselle et al., 2008). In contrast, lowland areas (e.g., North China Plain and Sichuan Basin) are underrepresented in the floristic collections. These areas have experienced stronger human disturbances compared to mountainous areas (Zhang et al., 2022). As a result, biodiversity in lowlands is subjected to greater threats, including biological invasion, the decline of native species, and the homogenization of biological communities (Šipek and Šajna, 2024). It is therefore crucial to enhance biological surveys in these areas to improve our understanding of the impact of human activities on biodiversity. Our findings are different from the global patterns that suggest the availability of species occurrence records primarily determined by proximity to researchers and local research funding (Meyer et al., 2015). This implies that the factors influencing spatial patterns in biological survey efforts might be specific to particular regions or countries.

4.3. "Road-map effect" on floristic collections of China

The mean distance of species occurrence records to the nearest road was significantly shorter than that in random patterns across all time periods (Fig. 4), suggesting that areas closer to roads are more likely to be sampled (Crisp et al., 2001; Petersen et al., 2021). In addition, the distance to the nearest road declined from the 1960s to the 2010s, suggesting that plant specimen collections have been increasingly closer to roads. This is partly due to the gradual densification of road networks, increasing the likelihood of species occurrence records being closer to roads. Furthermore, our results showed that the reduction in the observed mean distance from the 1960s to the 2010s was disproportionally faster than the reduction in the mean distance derived from random patterns (Figs. 4 and 5A). This suggests that collectors have increasingly stayed closer to roads, even after accounting for the influence of road network expansion.

Previous studies have explored the spatial relationships between species occurrence records and road networks, often neglecting the temporal dynamics of these relationships (Freitag et al., 1998; Parnell et al., 2003). In this study, we correlated the records collected in different decades with the corresponding road networks of those decades. This approach facilitated a more accurate assessment of the "road-map effect" and revealed a temporal trend that biological sampling has increasingly relied on road networks, a pattern seldom reported in earlier studies.

Elevation range, human population density, annual precipitation, NDVI, and annual mean temperature are all important in modelling the distance to the nearest road (Table 1). Human population density and annual mean temperature exhibited negative correlations, while the remaining variables demonstrated positive correlations with the distance to the nearest road. This suggests that in mountainous areas well covered with vegetation, collectors tend to stray further away from roads for collecting specimens. Compared to lowland areas with similar climatic conditions, mountainous areas provide diverse habitats for biological organisms and therefore foster greater species diversity (Wang et al., 2012; Badgley et al., 2017). Additionally, the low human population density implies minimal human disturbances. In such areas (e.g., the mountainous areas in southwestern China), collectors are likely to discover a wider range of species (including unique and rare species), and thus motivated to explore the areas farther away from roads (Loiselle et al., 2008).

Mean elevation was positively correlated with the P_clustering values, while annual precipitation and NDVI were negatively correlated with them (Tables 1 and S1). Low precipitation and NDVI indicate low species diversity and sparse vegetation cover (Liu et al., 2022a; Coelho et al., 2023). This suggests that in high elevation areas with low species diversity and vegetation cover, collectors often concentrate their sampling efforts along roadside areas (Crisp et al., 2001; Daru et al., 2018). This is evidenced by the spatial distribution of counties with P_clustering values larger than 0.95 in the relatively dry plateau regions (Figs. 7 and S3). In contrast, counties with P_clustering values less than 0.05 were primarily located along mountain ranges, such as the Daba and Nanling Mountains. These mountainous regions, characterized by relatively high precipitation and NDVI, generally display greater species diversity, which makes them more attractive to collectors, resulting in more surveys being conducted away from roads.

The mean road density within various buffer zones (5–60 km) of species occurrence records was significantly higher than that of random patterns in all time periods (Fig. 8), which is consistent with previous findings that areas with higher accessibility tend to be more frequently sampled (Oliveira et al., 2016). Human population density was positively correlated with road density, while mean elevation is negatively correlated with it. This is likely due to the denser road network in the highly populated and lowland regions of China. Annual precipitation and NDVI were negatively correlated with the road density surrounding species occurrence records, suggesting that collectors tend to sample in easily accessible locations within regions of lower vegetation coverage. Fewer species are expected to be found in these regions, hindering enthusiasm to venture into less accessible locations.

Our findings reveal a significant "road-map effect" in the floristic collections of China, and the intensity of this effect increased from the 1960s to the 2010s. Elevation range, mean elevation, annual precipitation, NDVI, and human population density were identified as crucial factors for predicting the intensity of the "road-map effect" across various regions in China. Adequate rainfall, complex topography, and dense vegetation facilitate greater species diversity (Irl et al., 2015), motivating collectors to explore remote, roadless locations for sampling in such areas (Loiselle et al., 2008). In contrast, collectors tend to prioritize sampling locations near roads in sparsely vegetated mountainous areas, as the rugged terrain makes accessing remote locations difficult. Human population density signifies the degree of disruption caused by human activities (Ai et al., 2024). The positive effect of human population density on the intensity of "road-map effect" suggests that collectors may prefer to sample the areas with more pristine vegetation (Yang et al., 2014).

5. Conclusion

Species occurrence records collected in different time periods showed similar geographical patterns, with a large proportion of these records located along mountain ranges, while the lowland areas are much underrepresented. A notable intensification of "road-map effect" was identified from the 1960s to the 2010s, suggesting an increased reliance of biological surveys on road networks even after controlling for the influence of road network expansion, which has rarely been reported in published studies. Specifically, species occurrence records of vascular plants have been increasingly concentrated nearer to roads. The road density surrounding the records has progressively increased, indicating that accessibility plays a growing role in determining the likelihood of an area being sampled. Environmental and socio-economic variables affecting regional species diversity, vegetation cover, and the pristine state of vegetation, all of which influence willingness to invest additional efforts in surveying locations situated far from roads, played important roles in predicting the intensity of the "road-map effect". These results indicate that future floristic surveys in China should focus on lowland areas and remote areas characterized by low road density and considerable distances from roads. Our study also suggests that given the prevalence of the "road-map effect" in species occurrence data, it is important to account for it when using such data for both theoretical and practical purposes.

Acknowledgements

This research was funded by National Natural Science Foundation of China (32460276, 32060275), and Jiangxi Provincial Natural Science Foundation (20232BAB203058, 20242BAB27001). We appreciate Dr. Adam Devlin from the University of Hawaiʻi at Mānoa for the language editing.

CRediT authorship contribution statement

Jingyang He: Writing – original draft, Visualization, Validation, Software, Methodology, Formal analysis. Wenjing Yang: Writing – review & editing, Supervision, Resources, Methodology, Investigation, Conceptualization. Qinghui You: Writing – review & editing, Resources, Methodology, Investigation, Funding acquisition, Data curation. Qiwu Hu: Writing – review & editing, Validation, Methodology, Data curation. Mingyang Cong: Writing – review & editing, Visualization, Methodology, Data curation. Chao Tian: Writing – review & editing, Validation, Software. Keping Ma: Writing – review & editing, Resources, Project administration, Investigation, Data curation.

Availability of data and materials

The datasets that support the findings of this study are available on request from the corresponding author, upon reasonable request.

Declaration of competing interest

These authors have no conflict in interest.

Appendix A. Supplementary data

Supplementary data related to this article can be found at https://doi.org/10.1016/j.pld.2025.02.001.

References

Ai, M., Chen, X., Yu, Q., 2024. Spatial correlation analysis between human disturbance intensity (HDI) and ecosystem services value (ESV) in the Chengdu-Chongqing urban agglomeration. Ecol. Indic., 158: 111555.

Badgley, C., Smiley, T.M., Terry, R., et al., 2017. Biodiversity and topographic complexity: modern and geohistorical perspectives. Trends Ecol. Evol., 32: 211-226.

Beck, J., Ballesteros-Mejia, L., Nagel, P., et al., 2013. Online solutions and the "Wallacean shortfall": what does GBIF contribute to our knowledge of species' ranges?. Divers. Distrib., 19: 1043-1050. DOI:10.1111/ddi.12083

Borcard, D., Legendre, P., 2002. All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol. Model., 153: 51-68.

Bordelon, A., 2022. Herbarium: the quest to preserve and classify the world's plants. J. Bot. Res. Inst. Tex., 16: 448. DOI:10.17348/jbrit.v16.i2.1274

Bowler, D.E., Callaghan, C.T., Bhandari, N., et al., 2022. Temporal trends in the spatial bias of species occurrence records. Ecography, 2022: e06219.

Canhos, V., Souza, S., De Giovanni, R., et al., 2004. Global biodiversity informatics: setting the scene for a "new world" of ecological modeling. Biodivers. Inform., 1: 1-13.

Chapman, A.D., Busby, J.R., 1994. Linking plant species information to continental biodiversity inventory, climate modeling and environmental monitoring. In: Miller, R.I. (Ed. ), Mapping the Diversity of Nature. Springer, Dordrecht, pp. 179-195.

Chen, C., 1994. History of plant taxonomy in China. In: Wang, Z. (Ed. ), History of Chinese Botany. Science Press, Beijing, pp. 121-144.

China Cartographic Publishing House, 1962. China Transportation and Travel Map. China Cartographic Publishing House, Beijing.

China Cartographic Publishing House, 1974. China Transportation Map. China Cartographic Publishing House, Xi'an.

China Cartographic Publishing House, 1986. China Transportation Atlas. China Cartographic Publishing House, Beijing.

Coelho, M.T.P., Barreto, E., Rangel, T.F., et al., 2023. The geography of climate and the global patterns of species diversity. Nature, 622: 537-544. DOI:10.1038/s41586-023-06577-5

Cosentino, F., Maiorano, L., 2021. Is geographic sampling bias representative of environmental space?. Ecol. Inform., 64: 101369.

Crisp, M.D., Laffan, S., Linder, H.P., et al., 2001. Endemism in the Australian flora. J. Biogeogr., 28: 183-198.

Daru, B.H., Park, D.S., Primack, R.B., et al., 2018. Widespread sampling biases in herbaria revealed from large-scale digitization. New Phytol., 217: 939-955. DOI:10.1111/nph.14855

Department of Comprehensive Statistics of National Bureau of Statistics, 2010. China Compendium of Statistics 1949-2008. China Statistics Press, Beijing.

Diniz-Filho, J.A.F., Bini, L.M., 2005. Modelling geographical patterns in species richness using eigenvector-based spatial filters. Global Ecol. Biogeogr., 14: 177-185.

Diniz-Filho, J.A.F., Bini, L.M., Hawkins, B.A., 2003. Spatial autocorrelation and red herrings in geographical ecology. Global Ecol. Biogeogr., 12: 53-64.

Freitag, S., Hobson, C., Biggs, H.C., et al., 1998. Testing for potential survey bias: the effect of roads, urban areas and nature reserves on a southern African mammal data set. Anim. Conserv., 1: 119-127.

Grilo, C., Bissonette, J., Cramer, P., 2010. Mitigation measures to reduce impacts on biodiversity. In: Jones, S.R. (Ed. ), Highways: Construction, Management, and Maintenance. Nova Science Publishers, Inc, Hauppauge, pp. 73-114.

Hu, X., Wu, C., Wang, J., et al., 2018. Identification of spatial variation in road network and its driving patterns: economy and population. Reg. Sci. Urban Econ., 71: 37-45.

Huang, J., Chen, J., Ying, J., et al., 2011. Features and distribution patterns of Chinese endemic seed plant species. J. Syst. Evol., 49: 81-94. DOI:10.1111/j.1759-6831.2011.00119.x

Husak, G., Michaelsen, J., Funk, C., 2007. Use of the gamma distribution to represent monthly rainfall in Africa for drought monitoring applications. Int. J. Climatol., 27: 935-944. DOI:10.1002/joc.1441

Irl, S.D.H., Harter, D.E.V., Steinbauer, M.J., et al., 2015. Climate vs. topography – spatial patterns of plant species diversity and endemism on a high-elevation island. J. Ecol., 103: 1621-1633. DOI:10.1111/1365-2745.12463

Johnson, J.B., Omland, K.S., 2004. Model selection in ecology and evolution. Trends Ecol. Evol., 19: 101-108.

Kadmon, R., Farber, O., Danin, A., 2004. Effect of roadside bias on the accuracy of predictive maps produced by bioclimatic models. Ecol. Appl., 14: 401-413. DOI:10.1890/02-5364

Kleinschroth, F., Laporte, N., Laurance, W.F., et al., 2019. Road expansion and persistence in forests of the Congo Basin. Nat. Sustain., 2: 628-634. DOI:10.1038/s41893-019-0310-6

Lituma, C., Buehler, D., 2016. Minimal bias in surveys of grassland birds from roadsides. Condor, 118: 715-727.

Liu, D., Liu, L., You, Q., et al., 2022a. Development of a landscape-based multi-metric index to assess wetland health of the Poyang Lake. Remote Sens., 14: 1082. DOI:10.3390/rs14051082

Liu, H., Qin, H., Bao, B., et al., 2022b. Analysis of digitized specimens of higher plants in China. Guihaia, 42: 29-45. DOI:10.1117/12.2661782

Loiselle, B.A., Jørgensen, P.M., Consiglio, T., et al., 2008. Predicting species distributions from herbarium collections: does climate bias in collection sampling influence model outcomes?. J. Biogeogr., 35: 105-116. DOI:10.1111/j.1365-2699.2007.01779.x

Meyer, C., Kreft, H., Guralnick, R., et al., 2015. Global priorities for an effective information basis of biodiversity distributions. Nat. Commun., 6: 8221.

Ministry of Transport of the People's Republic of China, 2005. The Second National Road Census Data Compilation. China Communications Press, Beijing.

National Bureau of Statisitics of China, 2009-2020. China Statistical Yearbook. China Statistics Press, Beijing.

Nualart, N., Ibáñez, N., Soriano, I., et al., 2017. Assessing the relevance of herbarium collections as tools for conservation biology. Bot. Rev., 83: 303-325. DOI:10.1007/s12229-017-9188-z

Oliveira, U., Paglia, A.P., Brescovit, A.D., et al., 2016. The strong influence of collection bias on biodiversity knowledge shortfalls of Brazilian terrestrial biodiversity. Divers. Distrib., 22: 1232-1244. DOI:10.1111/ddi.12489

Pan, J., Zhao, X., Guo, W., et al., 2024. Characterizing China's road network development from a spatial entropy perspective. J. Transport Geogr., 116: 103848.

Parnell, J.A.N., Simpson, D.A., Moat, J., et al., 2003. Plant collecting spread and densities: their potential impact on biogeographical studies in Thailand. J. Biogeogr., 30: 193-209.

Peng, H.E., Chen, J., Kong, H., et al., 2021. Important supporting role of biological specimen in biodiversity conservation and research. Bull. Chin. Acad. Sci., 36: 425-435.

Petersen, T.K., Speed, J.D.M., Grøtan, V., et al., 2021. Species data for understanding biodiversity dynamics: the what, where and when of species occurrence data collection. Ecol. Solut. Evid., 2: e12048.

Qian, H., Kissling, W.D., 2010. Spatial scale and cross-taxon congruence of terrestrial vertebrate and vascular plant species richness in China. Ecology, 91: 1172-1183. DOI:10.1890/09-0620.1

Rahbek, C., Graves, G.R., 2001. Multiscale assessment of patterns of avian species richness. Proc. Natl. Acad. Sci. U.S.A., 98: 4534-4539.

Reddy, S., Dávalos, L.M., 2003. Geographical sampling bias and its implications for conservation priorities in Africa. J. Biogeogr., 30: 1719-1727.

Šipek, M., Šajna, N., 2024. Lowland forest fragment characteristics and anthropogenic disturbances determine alien plant species richness and composition. Biol. Invasions, 26: 1595-1614. DOI:10.1007/s10530-024-03269-7

U.S. Geological Survey (USGS), 1996. GTOPO30. https://doi.org/10.5066/F7DF6PQS. Accessed July 19, 2023. https://www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-hydro1k#overview. Data repository: USGS Earth Resources Observation and Science (EROS) Center.

Valavi, R., Elith, J., Lahoz-Monfort, J.J., et al., 2021. Modelling species presence-only data with random forests. Ecography, 44: 1731-1742. DOI:10.1111/ecog.05615

Van Buskirk, J., Jansen van Rensburg, A., 2020. Relative importance of isolation-by-environment and other determinants of gene flow in an alpine amphibian. Evolution, 74: 962-978. DOI:10.1111/evo.13955

van der Maaten, E., Hamann, A., van der Maaten-Theunissen, M., et al., 2017. Species distribution models predict temporal but not spatial variation in forest growth. Ecol. Evol., 7: 2585-2594. DOI:10.1002/ece3.2696

Vargas, C.A., Bottin, M., Särkinen, T., et al., 2022. Environmental and geographical biases in plant specimen data from the Colombian Andes. Bot. J. Linn. Soc., 200: 451-464. DOI:10.1093/botlinnean/boac035

Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S, fouth ed., Springer, New York.

Wang, Z., Fang, J., Tang, Z., et al., 2012. Geographical patterns in the beta diversity of China's woody plants: the influence of space, environment and range size. Ecography, 35: 1092-1102. DOI:10.1111/j.1600-0587.2012.06988.x

Wong, H.L., Wang, Y., Luo, R., et al., 2017. Local governance and the quality of local infrastructure: evidence from village road projects in rural China. J. Publ. Econ., 152: 119-132.

Yang, W., Liu, D., You, Q., et al., 2021. Taxonomic bias in occurrence information of angiosperm species in China. Sci. China Life Sci., 64: 584-592. DOI:10.1007/s11427-020-1821-x

Yang, W., Ma, K., Kreft, H., 2013. Geographical sampling bias in a large distributional database and its effects on species richness-environment models. J. Biogeogr., 40: 1415-1426. DOI:10.1111/jbi.12108

Yang, W., Ma, K., Kreft, H., 2014. Environmental and socio-economic factors shaping the geography of floristic collections in China. Global Ecol. Biogeogr., 23: 1284-1292. DOI:10.1111/geb.12225

Zhang, J., Xiao, C., Duan, X., et al., 2024. Species' geographical range, environmental range and traits lead to specimen collection preference of dominant plant species of grasslands in Northern China. Plant Divers., 46: 353-361. DOI:10.3390/educsci14040353

Zhang, T., Sun, Y., Guan, M., et al., 2022. Human activity intensity in China under multi-factor interactions: spatiotemporal characteristics and influencing factors. Sustainability, 14: 3113. DOI:10.3390/su14053113

Zhuang, H., Wang, C., Wang, Y., et al., 2021. Native useful vascular plants of China: a checklist and use patterns. Plant Divers., 43: 134-141.

Zhou, Y., Tong, C., Wang, Y., 2022. Road construction, economic growth, and poverty alleviation in China. Growth Change, 53: 1306-1332. DOI:10.1111/grow.12617

Zizka, A., Antonelli, A., Silvestro, D., 2021. sampbias, a method for quantifying geographic sampling biases in species distribution data. Ecography, 44: 25-32. DOI:10.1111/ecog.05102

Article Information

Article history

Workspace