J. Meteor. Res.  2018, Vol. 32 Issue (4): 636-647 PDF
http://dx.doi.org/10.1007/s13351-018-7139-1
The Chinese Meteorological Society
0

#### Article Information

WU, Dingrong, Chunyi WANG, Fang WANG, et al., 2018.
Uncertainty in Simulating the Impact of Cultivar Improvement on Winter Wheat Phenology in the North China Plain. 2018.
J. Meteor. Res., 32(4): 636-647
http://dx.doi.org/10.1007/s13351-018-7139-1

### Article History

Received October 10, 2017
in final form March 25, 2018
Uncertainty in Simulating the Impact of Cultivar Improvement on Winter Wheat Phenology in the North China Plain
Dingrong WU1, Chunyi WANG1,2, Fang WANG1, Chaoyang JIANG1, Zhiguo HUO1, Peijuan WANG1
1. State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081;
2. Hainan Meteorological Service, Haikou 570203
ABSTRACT: The phenology model is one of the major tools in evaluating the impact of cultivar improvement on crop phenology. Understanding uncertainty in simulating the impact is an important prerequisite for reliably interpreting the effect of cultivar improvement and climate change on phenology. However, uncertainty induced by different temperature response functions and parameterization methods have not been properly addressed. Based on winter wheat phenology observations during 1986–2012 in 47 agro-meteorology observation stations in the North China Plain (NCP), the uncertainty of the simulated impacts caused by four widely applied temperature response functions and two parameterization methods were investigated. The functions were firstly calibrated using observed phenology data during 1986–1988 from each station by means of two parameterization methods, and were then used to quantify the impact of cultivar improvement on wheat phenology during 1986–2012. The results showed that all functions and all parameterization methods could reach acceptable precision (RMSE < 3 days for all functions and parameterization methods), however, substantial differences exist in the simulated impacts between different functions and parameterization methods. For vegetative growth period, the simulated impact is 0.20 day (10 yr) –1 [95% confidence interval: –2.81–3.22 day (10 yr)–1] across the NCP, while for reproductive period, the value is 1.50 day (10 yr)–1 [–1.03–4.02 day (10 yr)–1]. Further analysis showed that uncertainty can be induced by both different functions and parameterization methods, while the former has greater influence than the latter. During vegetative period, there is a significant positive linear relationship between ranges of simulated impact and growth period average temperature, while during reproductive period, the relationship is polynomial. This highlights the large inconsistency that exists in most impact quantifying functions and the urgent need to carry out field experiment to provide realistic impacts for all functions. Before applying a simulated effect, we suggest that the function should be calibrated over a wide temperature range.
Key words: observed phenology     temperature response function     parameterization method     renewing cultivar
1 Introduction

Crop phenology is defined as the seasonal plant activity driven by environmental factors (Menzel and Fabian, 1999), and is the key signal of entering the next development stage. It controls the partitioning of assimilations between various crop organs and determines the timing of various agronomic management practices (Ruml and Vulić, 2005). Crop phenology is affected by genotype and environmental factors. Genotype determines the type and rate of the response to the envrironment, while environmental factors, especially temperature, determine the external conditions where the crop is grown (McMaster et al., 1992; Estrella et al., 2007; Wilczek et al., 2010; Tao et al., 2014). Field management practice can also impact phenology, since it modifies the temperature, water, and nutrition condition of the crop grown. In order to simulate phenology accurately, numerous models with different temperature response functions (such as linear, bilinear, trilinear, and nonlinear) and different time steps (such as daily and hourly) were developed. Model parameter values were derived by several methods, including trial-and-error (e.g., Braga et al., 2008), Bayesian (e.g., Dose and Menzel, 2004), and iterative methods (e.g., Danuso et al., 2012).

Under the background of changing climate, phenology has drawn increasing interest amongst researchers (Cleland et al., 2007; White et al., 2009; Ibáñez et al., 2010; Richardson et al., 2010), since its close relationship to temperature makes it a powerful tool to track global change (Doi, et al., 2010; Jochner et al., 2016). On the other hand, crop cultivars are renewed frequently under the pressure of meeting the increasing food demand by the rising population and living standards (Cassman, 1999; Ray et al., 2012; Zhang W. Y. et al., 2016). As a consequence, the observed crop phenology carried out by major observation systems (e.g., Chmielewski et al., 2004; Tao et al., 2014) is the result of the combined effects of climate change and cultivar improvement. Climate change has complicated effects on phenology. According to the most acceptable understanding, winter wheat under warmer conditions has a higher development rate and thus shortened number of growing days. On the other hand, the beginning of the growth season will also be advanced under a warmer climate, thus the growth season will shift towards the cooler season with a shorter day length, and the shortened day length will also affect wheat phenology (Siebert and Ewert, 2012). Cultivar change was believed to have extended winter wheat phenology through adopting cultivars requiring more accumulated temperature since those kind of cultivars generally have better yield performance (Tao et al., 2014). The phenology model depicts how crop phenology responds to climate according to the genetic characteristic parameters (Wang E. L. et al., 2017), and thus, the combined effects can be disentangled by using phenology models. During the last decade, the effects of cultivar improvement of major crops including: rice (Liu L. L. et al., 2012, 2013), maize (Liu et al., 2010; Liu Z. J., 2013; Tao et al., 2014; Xiao et al., 2015; Zhao et al., 2015), and wheat (Liu et al., 2010; Wang et al., 2013; He et al., 2014; Xiao et al., 2014; Li et al., 2015; Mo et al., 2016) on phenology have been studied by using various phenology models, such as CERES (Tao et al., 2014; Mo et al., 2016), APSIM (Liu et al., 2010; Liu Z. J., 2013; Wang et al., 2013; He et al., 2014; Xiao et al., 2014, 2015; Li et al., 2015; Zhao et al., 2015), and RiceGrow (Liu L. L. et al., 2012, 2013). These findings provide the scientific basis for policy-making relevant to food security.

However, most results are carried out based on a single phenology model and a single parameterization method. There are many kinds of phenology models developed for different crops in various regions. These models vary in the mechanism of describing crop developmental response to temperature (Wang N. et al., 2015; He et al., 2017; Wang E. L. et al., 2017; Wu et al., 2017). In the same model, the mechanisms can also be substantially changed if different genetic parameters are selected (Gao et al., 1992; Yin et al., 1995). Consequently, the simulated impact of cultivar shift on phenology is likely to be varied between different models and different parameterization methods, but until now, the uncertainties arising from the application of different models have not been properly addressed.

The North China Plain (NCP), one of the major granaries in China (Wu et al., 2006; Liu et al., 2010), is a region where climate has changed substantially (Wang et al., 2004; Ding et al., 2007) and winter wheat varieties are also changed frequently (Xiao et al., 2014). It covers two metropolises (Beijing and Tianjin) and five provinces (Anhui, Hebei, Henan, Jiangsu, and Shandong), with an area of 31 million ha of which 17.95 million ha is used for agriculture (Wu et al., 2006). More than 50% of the nation’s wheat yield is produced in the NCP (Wang et al., 2012). Uncertainty in simulated effects of winter wheat cultivar improvement on phenology is important to the strategy of maintaining the national food security. Therefore, it is of particular importance to assess the uncertainty in the simulated phenology effects caused by cultivar improvement through using different temperature response functions and different parameterization methods.

The objectives of this study are to: (1) describe the impact of cultivar improvement on winter wheat phenology in the NCP; (2) evaluate uncertainty introduced by different temperature response functions and different parameterization methods; and (3) identify how the ranges of simulated impacts respond to temperature.

2 Material and methods 2.1 Study area, observation stations, and cultivar improvement

The historical data on wheat phenology and climate during 1986–2012 were obtained from the National Meteorological Information Center (NMIC). Forty-seven agro-meteorological observation stations were selected within the NCP. These stations are almost evenly distributed over the area (Fig. 1). All stations have the full 27-yr winter wheat phenology observations from 1986–2012. In each station, the time of wheat phenology, including turning green (TG), heading (HD), and maturity (MT), were observed in each growing season following the uniformed China Meteorological Administration (CMA) observation standards (China Meteorological Administration (CMA), 1993). TG–HD is the main vegetative growth period of winter wheat, while HD–MT is the reproductive period. In order to have a better representation of local traditions, crop management practices at the experimental stations are generally the same as or better than in the surrounding local practices (Tao et al., 2006). The traditional management practices did not change much during the studied period except for cultivars. According to water availability and requirement, irrigation was not conducted every year and in each station, but fertilizers (such as organic and/or chemical fertilizers), were used several times every year. Historical daily weather data at the selected stations collecting from NIMC includes: sunshine hours (h), precipitation (mm), and mean, minimum, and maximum temperatures (°C).

 Figure 1 Study area and location of agro-meteorological observation stations.

Winter wheat was sown between early October in the northern plain and early November in the south and harvested in late May in the south and mid-June in the north. During the study period, cultivars changed frequently in most stations. The cultivar was changed about every two years (Fig. 2). Its changing rate steadily decreased during 2003–09, but abruptly increased after 2010. In total, 358 cultivars were used during the whole period. In most stations (74.5%), 5–15 cultivars were used during 1986–2012. Among other stations, 4.3% of stations used less than 5 cultivars and 21.3% recorded more than 15 cultivars. Taian in Shandong Province is the site with the most frequent changing rate of cultivar. During the study period, 23 cultivars were used in this station. The frequently improved cultivars were believed to impact wheat phenology (Wang et al., 2013; Xiao et al., 2014; Li et al., 2015).

 Figure 2 Percentage (%) of sites that changed cultivar relative to the previous year.
2.2 Descriptions of temperature response functions

Four widely applied temperature response functions were used in this study. They are the Blackman function (F1), daily bilinear function (F2), hourly bilinear function (F3), and Gao function (F4). The Blackman and daily bilinear functions were used in the WOFOST model (Supit et al., 1994). The hourly bilinear function was used in ORYZA2000 (van Oort et al., 2011) and its revised form was used in DSSAT (Jones et al., 2010). The Gao function was originally designed to simulate rice development (Gao et al., 1992) but it is a general model and was adopted in simulated development of other crops including wheat. Response curves of the four functions to temperature were shown in Fig. 3. Detailed descriptions of the four functions were as follows.

2.2.1 Blackman function

The Blackman function assumes development is linearly increased with the increase of temperature until the optimal temperature was reached. The function can be defined as:

 ${T_{\rm sum}} = \left\{ {\begin{array}{*{20}{lr}}{0},&{{T_{\rm d}} \leqslant {T_{\rm b}}}\\{{T_{\rm d}} - {T_{\rm b}}}, &\quad {{T_{\rm b}} < {T_{\rm d}} \leqslant {T_{\rm opt}}}\\{{T_{\rm opt}} - {T_{\rm b}}}, &{{T_{\rm opt}} < {T_{\rm d}}}\end{array}} \right. ,$ (1)

where Tsum (°C day) is the increment in thermal time over the day, Td is the daily temperature during the day, Tb is the base temperature (°C), and Topt is the optimal temperature (°C). In this model, there is no development below Tb, Tsum increases linearly with Td up to Topt and then remains at its maximum ToptTb. The temperature sum for a specific development phase is then the sum of the daily Tsum values during this phase.

2.2.2 Daily bilinear function

This differs from the Blackman function, which assumes development remains constant after Td exceeds Topt, the daily bilinear model assumes development rate slows linearly if Td exceed Topt. The bilinear function is defined as:

 $T_{\rm sum}=\left\{ {\begin{array}{*{20}{lr}}{0}, & {T_{\rm d} \leqslant T_{\rm b}\; {\rm {or}}\; T_{\rm d} > T_{\max}}\\{T_{\rm d}-T_{\rm b}}, & {T_{\rm b} where Tmax is the maximum temperature above which the development rate is zero (°C). 2.2.3 Hourly bilinear function The hourly bilinear function assumes the same temperature response function as the daily bilinear function but in an hourly time step. The hourly temperature was determined by the daily minimum and maximum temperatures using a cosine function. This function is defined as: ${T_t} = \frac{{{T_{\min}} + {T_{\max}}}}{2} + \left( {{T_{\max}} - {T_{\min}}} \right) \times \frac{{{\rm{cos}}\left[ {0.2618 \times \left( {t - 14} \right)} \right]}}{2},$(3) where Tmax and Tmin are daily maximum and minimum temperatures (°C), respectively, t is hour (h) in a day and it is assumed that Tmax is reached at 14.00 h, and Tt is the temperature at hour t. After Tt was calculated, Eq. (2) was used to determine the accumulation of Tsum in hour t. The average Tsum in the day was then calculated and was taken as the accumulated thermal time in the day. 2.2.4 Gao function The Gao function assumes a nonlinear temperature response function in the daily time step at the two sides of Topt. The function describes the response of the development rate to temperature as: $R = k \times {\left( {\frac{{{T_{\rm d}} - {T_{\rm b}}}}{{{T_{\rm opt}} - {T_{\rm b}}}}} \right)^p} \times {\left( {\frac{{{T_{\max}} - {T_{\rm d}}}}{{{T_{\max}} - {T_{\rm opt}}}}} \right)^q},$(4) where R is the development rate in a day, k, p, and q are the model parameters, k defines the maximum development rate while Td equals Topt, while p and q defines the curvatures of the relationship with temperatures between Tb and Topt and between Topt and Tmax, respectively. However, according to Yin’s report (Yin et al., 1995), in order to make sure maximum development rate occurs at Topt, the value of q should be restricted as: $q = \frac{{{T_{\max}} - {T_{\rm opt}}}}{{{T_{\rm opt}} - {T_{\rm b}}}} \times p.\$ (5)

Hence, this implicit condition was applied to keep the continuity of the response function in this study.

 Figure 3 Normalized temperature response curves of the four functions. (a) Blackman function, (b) daily bilinear function, (c) hourly bilinear function, and (d) Gao function.

In this normalization, base, optimum, and maximum temperature are 3, 22, and 32°C, respectively. The thermal time accumulation needed by a crop to finish a specific development phase was set as 1000°C day. Diurnal temperature range was fixed at 12°C. In the Gao function, k and p are 0.05 and 1.5, respectively, and consequently q is 0.79 according to Eq. (5).

2.3 Descriptions of two parameterization methods

Two widely applied parameterization methods were used in this study. The first one (M1) is using fixed cardinal temperatures (Tb, Topt, and Tmax), and better fit to the observations were achieved through adjusting Tsum needed to finish a specific development period. In the second (M2), cardinal temperatures were derived through optimization over a wide temperature range, and then Tsum was adjusted to achieve the best fitting.

In the two methods, fixed cardinal temperatures and the temperature range used for optimization were referred to in previous literature (Supit et al., 1994; Porter and Gawith, 1999). In M1, Tb, Topt, and Tmax in the vegetative growth period are set as 3, 22, and 32°C, respectively (Table 1), while in the reproductive growth period, the three cardinal temperatures are set as 15, 30, and 40°C, respectively. In M2, cardinal temperature ranges in the vegetative growth period are set as 0–6, 18–24, and 26–40°C, respectively, while in the reproductive growth period, the ranges are set as 12–16, 26–32, and 38–45°C, respectively (Table 1).

Table 1 Fixed cardinal temperatures and the temperature range in the two parameterization methods
 Parameterization method Tb (°C) Topt (°C) Tmax (°C) Vegetative 　period M1 3 22 32 M2 0–6 18–24 26–40 Reproductive 　period M1 15 30 40 M2 12–16 26–32 38–45
2.4 Simulated impact of cultivar improvement on phenology and statistical methods

In each station, phenology data from 1986–88 were used to calibrate the thermal functions and derived function parameters. Data from 1989–91 were used for validation. Function performances in calibration and validation were evaluated by using statistical indicators, such as mean absolute error (MAE) and root mean square error (RMSE) between observed and simulated values. The calculation methods MAE and RMSE were drawn from literature (Willmott, 1982).

After calibration, the functions were applied to simulate wheat HD from TG and MT from HD during 1986–2012. In simulating HD and MT, the start date of simulation in each year is the observed date of TG and HD in each year, respectively. Differences between observed and simulated dates of HD and MT were assumedly caused by cultivar improvement. This is a common and necessary assumption in similar studies (Tao et al., 2014; Mo et al., 2016). Based on the assumption, the temporal linear trends in the differences between the observed and estimated HD and MT dates from 1986 to 2012 were derived for each station using the Least Squares Method (LSM). The statistical test for the linear trends was conducted by a paired t-test between each pair of temperature response functions and between each pair of parameterization methods. P < 0.05 was considered statistically significant and meaningful.

3 Results 3.1 Spatial and temporal trends in climate change

The NCP belongs to the zone of warm temperate monsoon climate. Figure 4 shows the spatial pattern of its climate trends during 1986–2012. Though one station has positive value, annual sunshine hour decreased by 13.5 h yr–1 across the NCP, and the negative trends are significant at 35 stations (Fig. 4a). Precipitation significantly increased by 8.9 and 10.0 mm yr–1 in 2 stations (Fig. 4b). Annual mean temperature increased by 0.10 to 0.65°C (10 yr)–1, and the trends are significant in 28 stations, with larger increases in minimum temperatures [0.56°C (10 yr)–1] and lower increases in maximum temperatures [0.15°C (10 yr)–1]. The trends are significant in 38 and 3 stations for minimum and maximum temperatures, respectively.

 Figure 4 Spatial patterns of climatic trends in (a) sunshine hours, (b) precipitation, (c) average temperature, (d) maximum temperature, and (e) minimum temperature in the North China Plain from 1986 to 2012. Stations with red circle indicate statistical significance at P = 0.05.
3.2 Performance of functions in simulating HD and MT date

Differences between simulated and observed values in calibration and validation for the wheat HD and MT dates were shown in Table 2. All functions and parameterization methods can give acceptable simulation accuracy. The results of calibration showed MAE and RMSE were less than 3 days for almost all functions. According to the statistical indicators, all functions give similar simulation accuracy, with F4 being a little more accurate than other three. Since M2 is actually the optimization of M1, the accuracy of M2 is always better than M1. After optimization, MAE and RMSE are decreased by 0.29 and 0.22 day in vegetative and reproductive periods, respectively, though their values are dependent on different functions and different stations. All functions give better replications in simulating reproductive periods than in simulating vegetative periods. On average, MAE and RMSE in the reproductive period are less than in the vegetative period by 0.63 and 0.70 day, respectively. Averaged RMSE in vegetative and reproductive periods are 3.28 and 2.35 days, respectivly, while MAE in the two periods are 2.98 and 2.11 days, respectively. RMSE and MAE during validation are 1 day larger than during calibration. It is reasonable because the variety is likely to be changed during the validation period, as indicated by Fig. 2. Table 2 indicates that, after calibration and validation, all functions and parameterization methods can be used in simulating the impact of cultivar improvement on phenology.

Table 2 Statistical indicators for the performance of four temperature response functions in simulating heading and maturity date
 Function Parameterization method Calibration Validation TG–HD HD–MT TG–HD HD–MT MAE (day) RMSE (day) MAE (day) RMSE (day) MAE (day) RMSE (day) MAE (day) RMSE (day) F1 M1 2.14 2.47 1.51 1.77 2.79 3.07 2.43 2.67 M2 1.66 1.94 1.15 1.36 2.69 3.02 2.34 2.6 F2 M1 2.11 2.43 1.50 1.74 2.77 3.03 2.41 2.66 M2 1.48 1.76 1.06 1.28 3.01 3.34 2.27 2.52 F3 M1 2.35 2.64 1.32 1.56 2.93 3.19 1.89 2.13 M2 1.60 1.87 0.84 1.06 3.04 3.33 2.07 2.28 F4 M1 1.66 1.94 1.03 1.23 3.31 3.67 1.71 1.96 M2 1.30 1.55 0.82 1.00 3.28 3.59 1.77 1.96
3.3 Impact of cultivar improvement on wheat phenology 3.3.1 Average impact of cultivar improvement

Cultivar improvement has an obvious impact on both vegetative and reproductive periods (Fig. 5). Among different stations, the impact on the vegetative period varied between –3.16–3.16 day (10 yr)–1, with an average value of 0.20 day (10 yr)–1 [95% conﬁdence interval –2.81–3.22 day (10 yr)–1]. In 27 stations, the effect is positive. Impacts on the reproductive period varied between –1.72 and 4.45 day (10 yr)–1, with an average value of 1.50 day (10 yr)–1 [95% conﬁdence interval –1.03–4.02 day (10 yr)–1]. In 41 stations, the effect is positive. The effect in both two growth periods showed no obvious regional distribution pattern (Fig. 5).

 Figure 5 Averaged impacts of cultivar improvement on (a) vegetative period and (b) reproductive period.

Ranges in simulated impacts of cultivar improvement on the vegetative period simulated by four temperature response functions and two parameterizations vary from 1.21 to 5.61 day (10 yr)–1, with an average value of 3.08 day (10 yr)–1 but the range in the reproductive period is relatively small, the range is 0.64–4.31 day (10 yr)–1, and average value is 2.37 day (10 yr)–1 (Fig. 6). The standard deviation of the range in both periods is 0.97 and 0.90 day (10 yr)–1, respectively. The range in the both two periods is generally smaller in the northern and higher in the southern regions.

 Figure 6 Range in simulated impacts of cultivar improvement on (a) vegetative growth and (b) reproductive growth simulated by 4 models and 2 parameterization methods. The range in each station was derived by subtracting the maximum trend by the minimum trend simulated by different response functions and different optimization methods.
3.3.2 Differences induced by different parameterization methods

Differences induced by different parameterization methods in the vegetative period are larger than those in the reproductive period (Fig. 7). In the vegetative period, the average ranges of simulated impacts of F1 to F4 are 0.29, 0.36, 0.35, and 0.26 day (10 yr)–1, respectively. Therefore, F2 has the largest range. In the reproductive period, the average ranges of simulated impacts of F1 to F4 are 0.19, 0.21, 0.18, and 0.28 day (10 yr)–1, respectively. F4 has the largest range in this period. In both growth periods, F4 function gives very different values to the others. In the vegetative period, all the other three functions give positive impacts while F4 gives negative impacts. In the reproductive period, F4 gives much lower impacts. The F4 function also gives different range values to the others. In the vegetative period, F4 gives the lowest impact, while in the reproductive period, it gives the highest. Paired t-test indicates that parameterization methods bring a significant difference to F1 and F2 during the vegetative period, and to F2 and F4 during the reproductive period.

 Figure 7 Average values simulated by specific response functions in specific development stages by two parameterization methods and differences induced by different parameterization methods. Solid lines and hollow squares indicate medians and means, respectively. Box boundaries indicate upper and lower quartiles, whisker caps indicate 95th and 5th percentiles, and upper and lower crosses indicate maximums and minimums. Solid black squares indicate the averaged values simulated by specific response functions in specific development stages by two parameterization methods. V and R indicate vegetative and reproductive growth periods, respectively.
3.3.3 Differences induced by different temperature response functions

Differences induced by different functions in the vegetative period are also larger than in the reproductive period (Fig. 8). M1 and M2 give similar impact ranges in the vegetative and reproductive periods. Averaged impacts in the vegetative period are very small [M1: 0.24 day (10 yr)–1; M2: 0.17 day (10 yr)–1], but show a large variation between different functions, with ranges in M1 and M2 of 2.87 and 2.78 day (10 yr)–1, respectively. In the reproductive period, averaged impacts given by M1 and M2 are 1.55 and 1.44 day (10 yr)–1, respectively, with ranges in M1 and M2 of 2.03 and 2.25 day (10 yr)–1, respectively. Paired t-test indicates that, during the vegetative period, temperature response functions caused significant simulation differences except for F1 and F2 in M1, and F1 and F2, F1 and F3, F2 and F3 in M2. While during the reproductive period, temperature response functions caused significant simulation differences except for F1 and F2 in both M1 and M2 (Table 3).

 Figure 8 Averaged values simulated by specific parameterization methods in specific development stages by four response functions and differences induced by different response functions. Solid lines and hollow squares indicate medians and means, respectively. Box boundaries indicate upper and lower quartiles, whisker caps indicate 95th and 5th percentiles, and upper and lower crosses indicate maximums and minimums. Solid black squares indicate the averaged values simulated by specific parameterization methods in specific development stages by four response functions. V and R indicate vegetative and reproductive growth periods, respectively.
Table 3 P-value of paired t-test for simulation differences induced by different temperature response functions
 F1 F2 F3 Vegetative period M1 F2 0.102 F3 < 0.001 < 0.001 F4 < 0.001 < 0.001 < 0.001 M2 F2 0.130 F3 0.199 0.899 F4 < 0.001 < 0.001 < 0.001 Reproductive period M1 F2 0.663 F3 0.004 0.004 F4 < 0.001 < 0.001 < 0.001 M2 F2 0.189 F3 0.007 0.001 F4 < 0.001 < 0.001 < 0.001
3.4 Influence of temperature on range in simulated impacts

Figure 9 shows the relationship between ranges of simulated impact and growth period average temperature. There is a significant positive linear relationship in the vegetative period, indicating that the simulated impacts will vary to a greater extent among different functions and parameterization methods if temperature is increased. As for the reproductive period, a polynomial relationship fits better than a linear one, indicating that the range will increase if the temperature increases or decreases.

 Figure 9 Relationship between ranges of simulated impact and growth period average temperature during (a) vegetative growth period and (b) reproductive growth period. The growth period average temperature on x-axis and the range in simulated impact on y-axis are averaged from 47 stations each year. Consequently, there are 27 points in each figure. ** and *** indicate statistical significance at P = 0.01 and P = 0.001, respectively.
4 Summary and discussion

Facing the great challenges imposed by increasing food demand and changing climate, researchers have shown a strong willingness to understand contributions of climate change and cultivar improvement on crop yield over recent decades (Tao et al., 2006; Liu et al., 2010). As part of these efforts, phenology modelling has been increasingly used in evaluating the impact of cultivar improvement on crop phenology, especially in most Chinese major food production regions, such as the NCP (Liu et al., 2010; Wang et al., 2013), the middle and lower reaches of the Yangtze River (Liu L. L. et al., 2012; Liu Z. J. et al., 2013), Northeast Plain (Liu L.L. et al., 2012) and the Loess Plateau (He et al., 2014; Mo et al., 2016). Due to the imperfect model structure and model parameterization methods, simulation uncertainty exists in such studies. Before these results are useful to assist in policy making, the degree of uncertainty, and what factors affect the uncertainty must be investigated.

In this study, we assembled wheat phenology observations from 47 agro-meteorology observation stations in the NCP during 1986–2012 to identify the uncertainty of the simulated impacts induced by four widely applied temperature response functions and two parameterization methods. Our results clearly indicate that the uncertainty was affected by different temperature response functions. The average ranges in different temperature response functions are 2.87 and 2.78 day (10 yr)–1 in M1 and M2 in the vegetative period, and are 2.03 and 2.25 day (10 yr)–1 in the reproductive period, respectively. Results also showed uncertainty was affected by different parameterization methods. In the vegetative and reproductive periods, the average ranges are 0.31 and 0.22 day (10 yr)–1, respectively. Therefore, uncertainty induced by response functions is much higher than in parameterization methods. However, it is should be noticed that our study disregards the impact of managements practice on phenology, especially nutrition and irrigation, since these kind of data are inconsistency and incomplete. It is reported fertilizers have positive effect on effects on plant development. The general response of wheat to water stress was to reach growth stages earlier, but this varied in different growth periods (McMaster and Wilhelm, 2003; Liu et al., 2016). During the study period, fertilization level and irrigation ability have been increasing. As the results, the lack of nutrition and irrigation information is likely to overestimate the impact of cultivar shift on phenology.

The simulated advance in wheat phenology as affected by climate warming, especially in the reproductive period, is consistent with previous studies (Liu et al., 2010; Wang et al., 2013; Li et al., 2015). Other studies based on the sensitivity of the development rate to temperature (He et al., 2014) and the comparison of genetic coefficients in the two periods (Xiao et al., 2014) also showed similar results. In spite of this consistency, the wide range of results from multiple functions and parameterization methods highlights the knowledge gaps in understanding wheat phenology response to cultivar improvement. Furthermore, we found the uncertainty in both vegetative and reproductive period has obvious a regional pattern, i.e., larger in the southern NCP but lower in the northern. Beside the two optimization methods used in this study, another very popular parameterization method is trial-and-error. Since trial-and-error adjusts parameters manually, parameter values obtained through this method may be optimal locally but not optimal globally. Consequently, if this method was considered, uncertainty is likely to be further increased. Uncertainty in simulated phenology will cast a potential impact on simulated yield. Generally, longer growth durations will result in higher simulated yields, especially during the reproductive period. Larger uncertainties in simulated phenology are likely to result in larger uncertainties in simulated yield. As a result, policy decisions taken using these results also need to take into consideration the degree of uncertainty and the regional distribution.

The temperature response function is the core assumption of a phenology model. Before model application, the validity of the assumption should be verified over a wide range of temperatures (e.g., Wang N. et al., 2015; Zhang T. Y. et al., 2016; Wu et al., 2017; Liu et al., 2018). However, for practical reasons, function was validated commonly using 2–4 yr of observation data, thus the range of temperature is unlikely to be wide enough. In addition, insufficient data can easily cause excessive fitting, which will further increase the uncertainty, for example, based on observations for the same variety under various temperature conditions, Zhang T. Y. et al. (2016) calibrated several models in lower temperature environments and validated them in higher temperatures, and found that in warmer climates, bilinear and beta models resulted in gradually increasing bias. In this study, based on observations in the context of variety changes, we also found uncertainties changed with growth period average temperature. In Fig. 9, the narrowest ranges appearing in the vegetative and reproductive periods are 9–10°C and 20–21°C, respectively, which are the growth period of average temperatures during three years for calibration. In other years with many different temperatures, the range becomes larger. Although we have not carried out further research because of the limitations of this study, it is still enough to see the uncertainties among various models can be aroused in both incomplete validation and excessive fitting caused by insufficient observations.

Phenology models always differ in the temperature response function. As a consequence, uncertainty always exists. Moreover, only the temperature response function was considered in this study. Other factors, such as photoperiod and vernalization, may also impact winter wheat phenology (Wang and Engel, 1998). The parameterization process of these two impact factors and its interaction with temperature response function will also become the source of uncertainty. From this point of view, uncertainty existing in simulation results can only be reduced, but cannot be fully eliminated. Uncertainty may be reduced through applying several methods. The first, as indicated by Fig. 9, is to calibrate the function over a wide range of temperature. The second is using an ensemble model. It is reported that median simulation from a multi-model ensemble is currently more accurate than simulations from single models (Chenu et al., 2017). The third is field experiments, i.e., plant cultivars bred in different years in the same field and under the same management practices and then comparing the phenology response to temperature (Wang et al., 2016). Because of the increasing need to evaluate the impacts, such experiments are urgently needed, since they provide the most accurate impact value.

In general, temperature response functions and parameterization methods determine the impact range, while temperatures in calibration years determine relationship between range and temperature. Our results demonstrate the need for authors to clearly communicate the response function, parameterization method and calibration years when evaluating the impact of cultivar improvement on crop phenology so that others can correctly interpret and apply the reported results. Moreover, before applied simulated effects, we suggest that the functions should be calibrated over a wide temperature range.

References