J. Meteor. Res. 2017, Vol. 31 Issue (2): 409-419
http://dx.doi.org/10.1007/s13351-017-6114-6
The Chinese Meteorological Society

Article Information

Mengzi ZHOU, Huijun WANG, Zhiguo HUO . 2017.
A New Prediction Model for Grain Yield in Northeast China Based on Spring North Atlantic Oscillation and Late-Winter Bering Sea Ice Cover.
J. Meteor. Res., 31(2): 409-419
http://dx.doi.org/10.1007/s13351-017-6114-6

Article History

Received July 8, 2016
in final form September 13, 2016
A New Prediction Model for Grain Yield in Northeast China Based on Spring North Atlantic Oscillation and Late-Winter Bering Sea Ice Cover
Mengzi ZHOU1, Huijun WANG2,3, Zhiguo HUO1     
1. State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 10081;
2. International Joint Laboratory on Climate and Environment Change/Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing 211044;
3. Nansen–Zhu International Research Centre, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029
ABSTRACT: Accurate estimates of grain output in the agriculturally important region of Northeast China are of great strategic significance for guaranteeing food security. New prediction models for maize and rice yields are built in this paper based on the spring North Atlantic Oscillation index and the Bering Sea ice cover index. The year-to-year increment is forecast first, and the original yield value is then obtained by adding the observed yield of the previous year. The multivariate linear prediction model of maize shows good predictive ability, with a low normalized root-mean-square error (NRMSE) of 13.9%, and the simulated yield accounts for 81% of the total variance of the observation. To improve on the performance of the multivariate linear model for rice, a combined forecasting model is built that assigns weights to the predictors. The NRMSE of this model is 12.9% and the predicted rice yield explains 71% of the total variance. The corresponding cross-validation test and independent samples test further demonstrate the efficiency of the models. It is inferred that the statistical models established here with the year-to-year increment approach can make reasonable predictions of the maize and rice yields in Northeast China before harvest. The present study may shed new light on predicting yield in advance by making adequate use of antecedent large-scale climate signals.
Key words: crop yield     linear forecasting model     spring North Atlantic Oscillation index     Bering Sea ice cover index     year-to-year increment    
1 Introduction

Agricultural production of grain is necessary to sustain human life and is strongly associated with social and economic development. Crop yield is known to be sensitive to climate change. In the past 100 years, the average global surface temperature has increased by 0.89°C (IPCC, 2013). Against the background of global warming, extreme climate events are expected to become more frequent (Mitchell et al., 1995; Schlenker and Roberts, 2009; Schlenker and Lobell, 2010). Both climate change and climatic variation could pose enormous challenges for agriculture (You et al., 2009). Lobell et al. (2011) revealed that in the past 30 years, global maize and wheat production declined by 3.8% and 5.5%, respectively, in the context of climate change.

Northeast China, comprising the provinces of Heilongjiang, Jilin, and Liaoning, is one of the most productive agricultural regions. Spring maize, single-crop rice, soybean, and some other cash crops are widely grown in this region. The production of maize in Northeast China accounts for 30% of the national total (Gouveia et al., 2008). Rice is the second most important grain crop, and japonica rice production accounts for more than 40% of national output (Sun et al., 2010). In recent decades, the air temperature in Northeast China has shown an evident warming trend. Accompanying this warming, extreme low-temperature events decreased significantly in this region (Li et al., 2012), whereas the frequency of extreme precipitation and heat increased (Zhang and Zhang, 2005; Yang et al., 2008). Previous research suggested that climate warming is favorable to crop yield on the whole (Yang et al., 2010), but that negative impacts would emerge as the warming intensifies (Zhang et al., 2008). That is, climate change has brought great uncertainty to agricultural production in Northeast China (Cheng and Zhang, 2005; Zhao et al., 2009). Given the current and future climate and environment of the region, accurate estimation of grain output in Northeast China is critical for guiding adaptation efforts and providing information for policymakers.

Two approaches are most commonly used to predict crop production: crop simulation models and statistical models. Process-based crop models, whose development began in the late 1960s, attempt to emulate the main physiological processes of crop growth and development in daily time steps (Rosenberg, 2010). However, they do not consider the losses caused by pests and diseases, which are assumed to be controlled in such models (Parry et al., 2005; Lobell et al., 2007; Yao et al., 2007). Also, extensive parameters describing cultivars, management, and soil conditions are required, which in many parts of the world are not available or cannot be calibrated (Lobell and Burke, 2010). Furthermore, most crop simulation models are limited to field scale due to the spatiotemporal mismatch between dynamic climate models and crop growth models when extrapolated to regional scale (Hansen and Indeje, 2004).

Statistical models, which we employ here, are built based on historical yield data and climate variables. Generally speaking, crop yield exhibits a highly significant positive trend with time as a result of scientific and technological progress. Hence, an effective detrending method to remove or reduce the influence of technology is necessary. Some studies predict crop yield by including a temporal trend term directly in the regression equations (Lobell et al., 2007), while other studies employ detrended yield as the response variable. The detrended yield refers to the deviation of observed yield from trend yield, and the trend yield can be obtained by using a linear temporal model, curve fitting, or a sliding average method (Chen et al., 2012; Liu et al., 2012).

Various functional forms can be used in statistical models to describe the relationship between predictors and crop yield. For example, Xu and Chen (1991) used a multivariate linear regression model to predict spring wheat yield in Mulei County of Xinjiang. Sometimes, a squared term is added to improve the performance of empirical crop models, especially when there is an optimum value that falls within the observed data (Lobell et al., 2006). More complicated formulations can also be applied, such as the Miami model (Lieth, 1973) or artificial neural networks (Hsu and Chen, 2003). However, a more complex model does not always help, and instead may increase the computational cost and lead to overfitting (Lobell, 2010). A simple linear model therefore remains a relatively common yield forecasting method. In this approach, the critical step is the selection of appropriate predictors. Almost all previous models have been based upon local growing-season meteorological variables. However, the seasonal prediction of these meteorological variables is difficult, particularly in the middle and high latitudes. Furthermore, close correlations among meteorological elements can lead to collinearity and subsequent bias in the result. Recently, studies have begun to introduce remotely sensed data (e.g., the normalized difference vegetation index) into models instead of local climate factors (Becker-Reshef et al., 2010); however, poor data quality and short time series both restrict the applicability of such data (Mkhabela et al., 2011).

In our previous work, we focused on identifying antecedent large-scale signals from the climate system and exploring their viability as predictors in crop yield forecasting models (Zhou et al., 2013; Zhou and Wang, 2014). ENSO, as a large-scale air–sea interaction phenomenon, is a crucial determinant of global interannual climate variability. Given the significant impact of ENSO on climate in China, we first examined the correlation between crop yield and the ENSO index. The results indicated that the summer and autumn ENSO index in the same year could affect yield negatively. However, our aim is to predict crop yield with a lead time of at least one season before harvest, so the rest of this article mainly investigates the contribution of antecedent signals in the extratropics to crop yield.

The North Atlantic Oscillation (NAO), a large-scale seesaw in atmospheric mass between the subtropical high (centered on the Azores) and the polar low (centered on Iceland), plays an important role in the climate fluctuations of the North Atlantic and surrounding regions, and even of Asia (Sun et al., 2008, 2009; Sun and Wang, 2012). Our research found that the springtime (March–May) North Atlantic Oscillation index could potentially be applied in predictions of crop yield in Northeast China (Zhou et al., 2013). The possible physical explanation is as follows. The spring North Atlantic Oscillation can induce sea surface temperature anomalies in the North Atlantic, which display a tripole pattern and can persist into summer (Fig. 1). The summer SST anomalies in the North Atlantic lead to atmospheric circulation anomalies over the Siberian–Mongolian region via a Rossby wave pattern, ultimately affecting the summer climate in Northeast China. Summer (June–August) is an important period for crop growth, in which maize experiences its jointing stage and rice experiences its tillering stage, ahead of the grain filling stage. The abnormal climate can significantly affect the physiological processes of crops and eventually lead to changes in production. For more details, refer to our previous article (Zhou et al., 2013).

Figure 1 Geographical distribution of correlation coefficients between ΔSNAOI and spring sea surface temperature. Regions with correlation exceeding the 95% (90%) confidence level are shaded. The rectangular areas refer to the key ocean of the tripole pattern. SST data were taken from the NOAA monthly extended reconstructed version 3 (ERSST3) dataset.

Sea ice is another critical component of the climate system. The recent accelerated decline in Arctic sea ice cover has captured much attention (Stroeve et al., 2005; Li and Wang, 2012, 2013a, b, 2014). We previously found that the late-winter Bering Sea ice area index could also be employed to predict crop yields in Northeast China (Zhou and Wang, 2014). The physical mechanism is as follows. An increase (decrease) of the late-winter (February–March) Bering Sea ice cover can induce a positive (negative)-phase NPO pattern and then lead to a positive (negative) SST anomaly over the Kuroshio region by affecting oceanic currents (Fig. 2). In response to the SST anomalies, Northeast China experiences abnormal climate in summer, which eventually affects crop yield in this region. More details about the physical linkage between sea ice and yield in Northeast China can likewise be found in our published paper (Zhou and Wang, 2014).

Figure 2 Correlation maps of spring sea surface temperature with reference to changes in spring North Pacific Oscillation (ΔNPO) for 1969–2008. Regions with correlation exceeding the 99% (95%) confidence level between SST and NPO are shaded. The rectangular area is the selected Kuroshio region. SST data used were the same as in Fig. 1.

In the above context, new prediction models for maize and rice yields in Northeast China are built in this paper based on the spring NAO index (SNAO) and the late-winter Bering Sea ice area index (BSI). The objective of this study is to provide some insights to improve the predictability of agricultural production at a lead time of several months by using the observed antecedent large-scale climate indices. Advance warning of crop yield fluctuations may then present opportunities for modifying practices to reduce losses or take advantage of favorable meteorological conditions.

2 Data and method

The annual provincial-scale data of crop area and production used in this study were obtained from China Agricultural Yearbooks (National Bureau of Statistics, 2012). The NAO index data were obtained from the website . Spring NAO index (SNAO) refers to the average value for March, April, and May. The Met Office Hadley Centre’s sea ice and sea temperature dataset (HadISST1) were also used (Rayner et al., 2003). The Bering Sea ice cover index (BSI) was defined as the accumulation of sea ice concentration within the region of 55°–65°N, 160°E–160°W, with due consideration given to the variation of the grid area with latitude.
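The BSI definition above can be illustrated with a minimal Python sketch. It assumes a regular latitude–longitude grid on which the area of a cell is proportional to the cosine of its latitude; the exact area weighting used in this study is not stated, and the function name `bering_sea_ice_index` and the input layout are hypothetical.

```python
import math

def bering_sea_ice_index(conc, lats, lons):
    """Area-weighted accumulation of sea ice concentration over
    55-65N, 160E-160W (the box crosses the dateline).

    conc[i][j] is the concentration (0..1) at latitude lats[i] and
    longitude lons[j]; longitudes are in degrees east, given either
    as -180..180 or as 0..360.
    """
    total = 0.0
    for i, lat in enumerate(lats):
        if not (55.0 <= lat <= 65.0):
            continue
        # Assumption: cell area on a regular grid scales with cos(latitude).
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            # 160E..180..160W is 160..200 in degrees east.
            if 160.0 <= lon % 360.0 <= 200.0:
                total += weight * conc[i][j]
    return total
```

Mapping longitudes with `lon % 360` keeps the box contiguous across the dateline, so the same function accepts either longitude convention.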

In this paper, we used the year-to-year increment approach to remove the temporal trends of variables. The year-to-year increment of a variable is the difference between its value in the current year and that in the preceding year (current minus preceding) and is denoted by Δ. In recent years, the method has been widely adopted to predict climate variability (Fan and Wang, 2009; Fan et al., 2009; Fan, 2010; Fan and Tian, 2013). When applied to crop yield data, it can minimize the influence of long-term factors of change (e.g., the development of science and technology). In addition, the signal of the variables can be amplified by the year-to-year increment method (Wang et al., 2010), since the climate and corresponding atmospheric circulation in East Asia have an evident quasi-biennial oscillation (Chang and Li, 2000; Li et al., 2001).
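As a small illustration of the increment transform, the sketch below computes signed year-to-year differences in plain Python. The function name and the sample series are hypothetical; the example only shows that a purely linear trend collapses to a constant, which is why the transform acts as a detrending step.

```python
def year_to_year_increment(series):
    """Return the signed differences x[t] - x[t-1] for consecutive years."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

# Hypothetical series with a pure linear trend of 0.5 per year.
trend_only = [1.0, 1.5, 2.0, 2.5, 3.0]
increments = year_to_year_increment(trend_only)  # every increment equals 0.5
```

The increment series is one element shorter than the input, because the first year has no predecessor.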

To demonstrate the advantage of the year-to-year increment approach, we computed the correlation coefficients between SNAO/BSI and crop yields in Northeast China, using year-to-year increment data and original data, respectively (Table 1). The correlation between SNAO and yield was significant only when year-to-year increments were used and insignificant for the original data. For BSI, the correlation coefficients were significant in both the increment form and the original form, but with opposite signs. Crop yields have increased remarkably with the development of technology over time, while the sea ice cover in the Arctic has declined sharply due to global warming. It is therefore quite possible that the relationship between the original time series of crop yield and BSI is spurious, arising through their common dependence on time. The original data thus need to be transformed before being used in regression analysis. As mentioned above, the year-to-year increment approach is an effective detrending method. Hereafter, all dependent and independent variables are considered in their year-to-year increment form unless otherwise noted.

Table 1 Correlation between crop yield and SNAO and BSI based on year-to-year increments and raw data, respectively, for 1969–2008
Crop     ΔSNAO (Original)       ΔBSI (Original)
Maize    –0.3698* (–0.0674)     0.4179* (–0.3569*)
Rice     –0.3419* (–0.0348)     0.3122* (–0.4028*)
* Statistically significant at the 95% confidence level.

Three quantitative statistics, the normalized root-mean-square error (NRMSE), the index of agreement (D), and the Nash–Sutcliffe efficiency (NSE), were used in model evaluation. NRMSE is frequently used as a measure of the difference between estimated and measured historical values. The range of values for D is 0 to 1, and the closer the value is to 1, the more accurate the simulation (Willmott, 1982). NSE is a normalized statistic that expresses the proportion of the initial variance accounted for by the model (Moriasi et al., 2007). NSE ranges between –∞ and 1.0, and values between 0 and 1.0 are generally viewed as acceptable levels of performance.

$\text{NRMSE} = \frac{\sqrt{\sum\limits_{t = 1}^N {(Y_t - y_t)^2} / N}}{\bar y},$ (1)
$D = 1 - \left[ \sum\limits_{t = 1}^N {(Y_t - y_t)^2} \Big/ \sum\limits_{t = 1}^N {\left( {\left| Y_t - \bar y \right|}^2 + {\left| y_t - \bar y \right|}^2 \right)} \right],$ (2)
$\text{NSE} = 1 - \frac{\sum\limits_{t = 1}^N {(y_t - Y_t)^2}}{\sum\limits_{t = 1}^N {(y_t - \bar y)^2}},$ (3)

where $Y_t$ is the predicted crop yield in year t, $y_t$ is the observed yield, $\bar y$ is the average of the observed yields during 1969–2008, and N is the number of cases (40 yr in this study).
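The three statistics in Eqs. (1)–(3) can be written as a short Python sketch. The function names are hypothetical, and the mean $\bar y$ is taken over whatever observed series is passed in, whereas the paper uses the 1969–2008 mean. `index_of_agreement` follows Eq. (2) exactly as written above.

```python
import math

def nrmse(pred, obs):
    """Eq. (1): root-mean-square error divided by the mean observation."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mse = sum((p - o) ** 2 for p, o in zip(pred, obs)) / n
    return math.sqrt(mse) / mean_obs

def index_of_agreement(pred, obs):
    """Eq. (2), as printed in this section."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum(abs(p - mean_obs) ** 2 + abs(o - mean_obs) ** 2
              for p, o in zip(pred, obs))
    return 1.0 - num / den

def nse(pred, obs):
    """Eq. (3): Nash-Sutcliffe efficiency."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for p, o in zip(pred, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

A perfect prediction gives NRMSE = 0 and D = NSE = 1, while predicting the observed mean in every year gives NSE = 0, which is the usual reference point for judging skill.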

To better evaluate the performance of the models, two statistical tests were applied in this study. The first is the cross-validation test, which is conducted as follows. Suppose we have N samples for each variable. We remove the data for year t (1 ≤ t ≤ N), build a new regression model based on the retained years (N – 1 yr), and then obtain the prediction for year t by using the new equation. The process is repeated N times, so that ultimately a new forecast for each year in the training period is generated (Barnett and Preisendorfer, 1987; Liu, 2014). The second approach is the independent samples test (Fan, 2010). In this procedure, we use the 1969–2000 dataset to build a model to forecast the crop yield in 2001, and employ the 1969–2001 dataset to predict the yield in 2002. Continuing with this same process, 10 predicted values extending from 2001 to 2010 are obtained.
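Both validation procedures can be sketched in a few lines of Python. To keep the example self-contained, it assumes a single predictor and a least-squares fit without an intercept; the models in this paper use two predictors, and all function names here are hypothetical.

```python
def fit_slope(x, y):
    """Least-squares slope b of y = b * x (one predictor, no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def leave_one_out(x, y):
    """Cross-validation: refit without year t, then predict year t."""
    preds = []
    for t in range(len(x)):
        x_train = x[:t] + x[t + 1:]
        y_train = y[:t] + y[t + 1:]
        preds.append(fit_slope(x_train, y_train) * x[t])
    return preds

def expanding_window(x, y, first_test):
    """Independent samples test: fit on all years before t, predict year t."""
    preds = []
    for t in range(first_test, len(x)):
        preds.append(fit_slope(x[:t], y[:t]) * x[t])
    return preds
```

With annual data starting in 1969, calling `expanding_window(x, y, first_test=32)` would fit on 1969–2000 to predict 2001, then on 1969–2001 to predict 2002, and so on. The difference between the two tests is that cross-validation uses years on both sides of the target year, whereas the independent samples test uses only earlier years.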

Given that the selected climate factors could have different degrees of influence on yield, specifying weights for the different variables is necessary, and the key is how best to allocate the weight. Many algorithms exist that can determine the weight coefficient, such as linear programming (Zhang et al., 2002), rough set theory (Ann et al., 2000), and the entropy method (Zou et al., 2006). In our research, the weights of the predictors for combined forecasting models are set up by applying the following simple procedure. First, we establish a linear regression model based on SNAO and BSI, separately. Second, we use two evaluation measures (NRMSE and D) to assess the ability of the univariate linear models, and rank the models in order of performance. The sum of the ranks for each model under the two individual standards is taken as $S_i$. The reciprocal of the normalized $S_i$ is then regarded as a model reliability indicator ($R_i$). That is,

$R_i= \frac{{\sum\limits_1^N {{S_i}} }}{{{S_i}}}.$ (4)

The higher the $R_i$ value, the better the quality of the model. Finally, the weight of each statistical model ($W_i$) is defined as

${W_i} = \frac{{{R_i}}}{{\sum\limits_1^N {{R_i}} }},$ (5)

and the sum of the weights is

$\sum\limits_1^N {{W_i}} = 1,$ (6)

where i refers to the ith model and N is the number of models (2 in this study).
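The weighting procedure in Eqs. (4)–(6) can be traced with a short Python sketch. The rank sums used below are a hypothetical example, not values reported in this paper: they assume one univariate model ranks first under both NRMSE and D (S = 2) and the other ranks second under both (S = 4).

```python
def combination_weights(rank_sums):
    """Turn rank sums S_i into weights W_i following Eqs. (4) and (5)."""
    total = sum(rank_sums)
    reliability = [total / s for s in rank_sums]   # Eq. (4): R_i
    r_total = sum(reliability)
    return [r / r_total for r in reliability]      # Eq. (5): W_i

# Hypothetical rank sums for two univariate models.
weights = combination_weights([2, 4])   # R = [3.0, 1.5], so W = [2/3, 1/3]
```

Under this assumed ranking the weights come out as 2/3 and 1/3, and they sum to 1 as required by Eq. (6).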

3 Results

3.1 Multiple linear regression model of maize

Multiple regression analysis is one of the most widely used statistical procedures for prediction research. As a practical matter, however, problems may arise because of potential collinearity among correlated predictor variables (Mason and Perreault Jr., 1991). Here, the correlation coefficient between SNAO and BSI is –0.0775, which indicates that collinearity between the two predictors is weak.

The physical statistical prediction model for the year-to-year increment of maize yield based upon SNAO and BSI is written as

$\Delta y_{m\!,\,t} = - 0.2724 \times \Delta {\rm SNAO} + 0.0133 \times \Delta {\rm BSI}.$ (7)

The modeled original maize yield can then be calculated as

$Y{}_{m\!,\,t} = y_{m\!,\,t - 1} + \Delta y_{m\!,\,t},$ (8)

where $\Delta y_{m\!,\,t}$ is the simulated increment yield in year t relative to year t – 1, $y_{m\!,\,t - 1}$ is the observational yield in year t – 1, and $Y_{m\!,\,t}$ represents the predicted yield in year t.
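Equations (7) and (8) translate directly into code. The sketch below uses the coefficients from Eq. (7); the function name and the input values in the example are hypothetical and are not data from this study.

```python
def predict_maize_yield(prev_yield, d_snao, d_bsi):
    """Return (predicted yield, predicted increment) for year t.

    prev_yield: observed maize yield in year t - 1 (t/ha)
    d_snao, d_bsi: year-to-year increments of SNAO and BSI
    """
    increment = -0.2724 * d_snao + 0.0133 * d_bsi   # Eq. (7)
    return prev_yield + increment, increment         # Eq. (8)

# Hypothetical inputs: SNAO rises by 1.0 and BSI rises by 10.0.
predicted, increment = predict_maize_yield(5.0, 1.0, 10.0)
```

For these inputs the increment is -0.2724 + 0.1330 = -0.1394 t/ha, so the predicted yield falls slightly below the previous year's value: a rise in SNAO lowers the forecast, while a rise in BSI raises it.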

The analysis of variance table with degrees of freedom, sums of squares (SS), mean squares (MS), and F test for our maize model is shown in Table 2. The simultaneous F test of the two predictors confirms that the multivariate regression equation is significant, as F = 7.438 > F0.99 = 5.2290 (0.99 refers to the confidence level). The regression coefficients of SNAO and BSI are also both significant, as Table 2 shows (p = 0.0121 and p = 0.0078, respectively).

Table 2 Analysis of variance for multiple linear regression of maize yield
Source   Degrees of freedom   Sum of squares (SS)   Mean squares (MS)   F statistic   P value
Model    2                    6.2563                3.1282              7.4380        0.0019
NAO      1                    2.9309                2.9309              6.9700        0.0121
BSI      1                    3.3291                3.3291              7.9170        0.0078
Error    37                   15.5568               0.4205
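As a quick consistency check, the F statistics in Table 2 can be recomputed from the tabulated sums of squares and degrees of freedom: each F value is the mean square of the source divided by the error mean square. The Python sketch below uses only the numbers printed in the table; the variable names are hypothetical.

```python
# (sum of squares, degrees of freedom) for each source in Table 2
sources = {"Model": (6.2563, 2), "NAO": (2.9309, 1), "BSI": (3.3291, 1)}
ss_error, df_error = 15.5568, 37

ms_error = ss_error / df_error   # error mean square
f_stats = {name: (ss / df) / ms_error for name, (ss, df) in sources.items()}
```

The recomputed values agree with the tabulated F statistics to within rounding of the printed sums of squares.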

The correlation coefficient between the modeled $\Delta y_{m\!,\,t}$ and observed $\Delta y_{m\!,\,t}$ is 0.5377, exceeding the 99% confidence level. We then obtain the time series of the predicted original yield by adding the previous maize yield to the modeled increment. Figure 3 indicates that the predicted maize yield corresponds well to the actual data, yielding points very close to the 1:1 line, and could explain approximately 81% of the variance of the observation. Further, the NRMSE is 13.9%, D = 0.95, and NSE = 0.79, all of which suggest strong capability of the model.

Figure 3 Quantile–quantile plots for maize yield between observation and prediction, 1969–2008. Diagonal line indicates 1:1 correspondence. The solid dots refer to the years when the prediction is relatively poor.

It should be noted that the autocorrelation of crop yield cannot be ignored when we predict the original crop yield using Eq. (8). To show that our model is superior to the first-order autoregression model, Hotelling’s t test is applied (Steiger, 1980). Hotelling’s statistic is given by Eq. (9):

$t = \frac{{(r_{12} - r_{13})\sqrt {(n - 3)(1 + r_{23})} }}{{\sqrt {2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23})} }},$ (9)

where $r_{12}$ is the Pearson correlation between the maize yield predicted by our model and the observation, $r_{13}$ is the Pearson correlation between the maize yield predicted by the first-order autoregression model and the observation, $r_{23}$ is the Pearson correlation between the two predicted series, and n is the sample size. The value of t = 2.173 is greater than the critical value of 2.026 (two-sided 5% level with n – 3 = 37 degrees of freedom), so the null hypothesis that the two correlations are equal is rejected at the 5% level. That is, our model improves the level of prediction significantly compared with the first-order autoregression model.
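Equation (9) is simple to evaluate once the three correlations are known. The Python sketch below implements the statistic as written; the function name is hypothetical and the correlations in the example are illustrative values, not those of this study.

```python
import math

def hotelling_t(r12, r13, r23, n):
    """Eq. (9): t statistic for comparing two dependent correlations."""
    numerator = (r12 - r13) * math.sqrt((n - 3) * (1.0 + r23))
    det = 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23
    return numerator / math.sqrt(2.0 * det)

# Illustrative correlations only.
t_value = hotelling_t(0.9, 0.8, 0.7, 40)
```

The statistic is zero when the two models correlate equally well with the observation, and it grows with the gap between $r_{12}$ and $r_{13}$ and with the sample size.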

Figure 4 Time series of disaster-affected area ratio of drought and flood in Northeast China, 1968–2008. Data used here were obtained from China Agricultural Yearbooks (National Bureau of Statistics, 2012).

A year is defined as a disaster year when the increment of crop yield is less than 10% of the average yield of the past 40 years. Seven years (1972, 1985, 1989, 1997, 1999, 2000, and 2007) are selected. Our model captures the losses well in these years except 1997, which is another advantage of our model over the first-order autoregression model. In these disaster years, the SNAO or BSI also changed dramatically, except in 1997 and 1999, which to some extent explains the failed prediction for 1997. Northeast China is located in a transitional zone from temperate to frigid. Crops in this region are mainly affected by drought, flood, hail, and low temperature (Ma et al., 2012). We plotted the time series of the disaster-affected area ratio (the ratio of disaster-affected area to total planting area) for drought and flood (Fig. 4). As expected, there were floods in 1985, while severe drought occurred in the other six years. We further analyzed the impact of SNAO on precipitation in Northeast China. Figure 5 indicates that Northeast China tended to receive much less rainfall during years of positive SNAO. Meanwhile, abnormally low sea ice cover in the Bering Strait is also well connected with decreased precipitation (Zhou and Wang, 2014). In 1989 in particular, there was continuous drought from spring to summer. In 1972, not only drought but also low temperature contributed to the yield loss.

Figure 5 Patterns of correlation between SNAO and precipitation in summer. Dark (light) shadings indicate values significantly exceeding the 95% (90%) confidence level. The precipitation data were obtained from the China Meteorological Administration.

Applying the forecast model, we hindcast the maize yields of 2009 and 2010. Table 3 shows that the hindcast values agree well with the observed data. In 2009, there was a great yield loss, which was associated with a higher SNAO and lower BSI relative to 2008. Correspondingly, Northeast China experienced severe drought and persistent low temperature in the second half of this year (Shen et al., 2011). In Heilongjiang, for example, low temperature lasted for 50 days from 3 June to 23 July, which caused rice to suffer severe sterile-type cooling damage. Low temperature can block anther dehiscence, lead to inadequate pollination, and increase the number of kernel abortions, with subsequent adverse impacts on yield potential.

Table 3 Comparison of the hindcast values of the increment of maize yield, original maize yield, and the observed data in 2009 and 2010
Maize                              2009       2010
Increment (t ha–1)    Predicted    –0.4550    0.9684
                      Observed     –0.8086    0.5076
Yield (t ha–1)        Predicted    5.6183     6.2232
                      Observed     5.2548     5.7624

Next, we employ the cross-validation test to further verify the prediction model for maize yield during 1969–2008. The predicted $\Delta y_{m\!,\,t}$ correlates significantly with the observed $\Delta y_{m\!,\,t}$ at the 99% confidence level (p = 0.0024), with a correlation coefficient of 0.4687. When transformed to the original yield data, the correlation coefficient is 0.8913, the NRMSE is 14.82%, and D = 0.94.

The independent sample test extending from 2001 to 2010 also supports a good prediction skill of the multivariate linear model. The correlation coefficient between predicted $\Delta y_{m\!,\,t}$ and historical $\Delta y_{m\!,\,t}$ is 0.7504, significant at the 95% level (p = 0.0124). Also, the predicted original maize yield derived from the independent samples test fits well with the observed yield, with the correlation coefficient of 0.7425 and NRMSE of 6.87%.

In spite of the above positive results, we must admit that the model shows relatively poor predictive skill in 1989 and in the mid-to-late 1990s (1995, 1996, 1997, and 1998), as shown in Fig. 3. Crop production is affected not only by the natural environment but also by economic, social, and other factors (Ma and Chu, 2007), which are not considered in this paper. During the 1990s, China was in transition from a planned economy to a socialist market economy, and food policy underwent a series of reforms. Moreover, the influence of these non-natural factors is not always synchronous, but is sometimes delayed. Social factors such as economic reform and government intervention brought great uncertainty to crop production and led to large random fluctuations. Therefore, the results of the forecasting model based on climate variables show large biases compared with the observed data.

3.2 Multiple linear regression model of rice

The predicted year-to-year increment of rice yield, calculated by multivariate linear model, is significantly correlated to the observed data in Northeast China during the training period, with a correlation coefficient of 0.4461. However, based on the result of the independent sample test, the model is not satisfactory, almost completely lacking any forecasting capacity in the first 5 years of the 21st century.

3.3 Combined forecasting model of rice

To overcome the deficiency of the multivariate linear model discussed above, we establish a new combined forecasting model that gives greater weight to the predictor that appears to contain more useful information.

The simple linear models for year-to-year increment of rice yield based upon SNAO or BSI can be expressed by

$\Delta y_{r\!,\,s\!,\,t} = - 0.2897 \times \Delta {\rm SNAO},$ (10)
$\Delta y_{r\!,\,b\!,\,t} = 0.0111 \times \Delta {\rm BSI}.$ (11)

The F test certifies that the above two models provide appropriate fits to the increment rice yield with p = 0.0293 and p = 0.0459, respectively.

Using the method described in Section 2, the weights of the SNAO and BSI indices are 2/3 and 1/3, respectively. The final combination forecasting model is given by Eqs. (12) and (13), where $\Delta y_{r\!,\,t}$ is the simulated increment yield in year t relative to year t – 1, $y_{r\!,\,t - 1}$ is the observed yield in year t – 1, and $Y_{r\!,\,t}$ refers to the predicted original rice yield in year t:

$\Delta y_{r\!,\,t} = 2/3 \times \Delta y_{r\!,\,s\!,\,t} + 1/3 \times \Delta y_{r\!,\,b\!,\,t},$ (12)
$Y_{r\!,\,t} = y_{r\!,\,t - 1} + \Delta y_{r\!,\,t}.$ (13)
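The combined rice model in Eqs. (10)–(13) can be sketched in Python as follows. The coefficients and weights are those given above; the function name and the example inputs are hypothetical.

```python
W_SNAO, W_BSI = 2.0 / 3.0, 1.0 / 3.0   # weights from Eq. (12)

def predict_rice_yield(prev_yield, d_snao, d_bsi):
    """Return (predicted yield, predicted increment) for year t."""
    inc_snao = -0.2897 * d_snao                          # Eq. (10)
    inc_bsi = 0.0111 * d_bsi                             # Eq. (11)
    increment = W_SNAO * inc_snao + W_BSI * inc_bsi      # Eq. (12)
    return prev_yield + increment, increment             # Eq. (13)

# Hypothetical inputs: SNAO rises by 1.5 and BSI rises by 30.0.
rice_yield, rice_increment = predict_rice_yield(7.0, 1.5, 30.0)
```

For these inputs the two univariate increments are -0.43455 and 0.333 t/ha, and the weighted combination is -0.1787 t/ha. Unlike the multivariate model, each predictor is fitted separately and the two forecasts are then averaged with fixed weights.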

The NRMSE is 13% and D = 0.92, both of which manifest the strong forecast capacity of our combined model for rice (Fig. 6).

Figure 6 As in Fig. 3, but for rice yield in place of maize yield.

For rice yield, there were great losses in 1972, 1976, 1985, and 1989. The combined model established here could predict the severe reduction in the above years when SNAO and BSI exhibited extreme changes as mentioned before in Section 3.1.

Table 4 Comparison of the hindcast values of the increment of rice yield, original rice yield, and the observed data in 2009 and 2010
Rice                               2009       2010
Increment (t ha–1)    Predicted    –0.3311    0.4420
                      Observed     –0.1749    0.1226
Yield (t ha–1)        Predicted    6.6876     7.2858
                      Observed     6.8437     6.9964

Figure 7 Time series of (a) the increment of rice yield and (b) the original rice yield for the independent samples test of the combined forecasting model during the validation period 2001–2010.

We use the combined forecasting model to hindcast the rice yields of 2009 and 2010 and obtain an NRMSE of 4.59% (Table 4). The result of the cross-validation test for the new model taking into account the forecast predictor weight shows that the simulated rice yield fits well with the observed value (correlation coefficient of 0.83, NRMSE of 13.4%, D = 0.91). In particular, when validated by the independent sample test, the combined forecasting model shows an enormous advantage with NRMSE of 6.86% compared with the multiple linear regression model (Fig. 7). There is still some departure in the first five years of the 21st century when the unexpected volatility was likely caused by the implementation of the preferential agricultural policies, but the model reproduces well the interannual fluctuations in the later years of the 2000s. It is clear that the combined forecasting model can enhance the capability to predict rice yields in Northeast China.

4 Discussion and conclusions

Northeast China is an important region of the country in terms of agricultural production and trade of grain. Therefore, accurate yield prediction ahead of the harvest is valuable in order to provide useful information to policymakers, to guarantee food security, and stabilize the market provisions.

The new models established in this paper can reasonably predict (NRMSE < 15%) the maize and rice yields in Northeast China. Indeed, when we employ the Markov Chain Monte Carlo (MCMC) technique (Andrieu et al., 2003; Iizumi et al., 2013) as an alternative to the least-squares method to estimate the statistical models, there is little difference between the regression coefficients, which further indicates the stability and reliability of the models. One caveat, however, applies to statistical crop models: they cannot be extrapolated to predict yield impacts for future scenarios in which any independent variable (SNAO or BSI) lies beyond its historical range without assuming that crop responses remain linear outside that range. That is, statistical models are inherently limited to the range of conditions over which they are trained.

In recent years, various empirical models have been employed to predict grain yields in Northeast China. Liu et al. (1998) proposed that a logical linear model could be used if the climatic potential productivity is known or can be calculated. Yao et al. (2009) built a combined forecasting model comprising a gray prediction model, a gray Markov model, and a logical model, and reported a reduced mean relative error with respect to any individual model. However, these models analyze food crops as a whole system rather than considering individual crop species. Also, several process-based crop models have been shown to simulate crop growth and development in Northeast China well. For example, Mi et al. (2012) showed that the WOFOST (World Food Studies) model can reasonably simulate the maize yield in Northeast China. However, in most cases, crop simulation models are used mainly for projecting the potential impact of future climate change on crop yield, not for prediction. Moreover, these models operate principally at field scales, although large-scale crop models have been developed recently (Osborne et al., 2007; Iizumi et al., 2009; Masutomi et al., 2009).

Because the time series and spatial scales differ among studies, it is difficult to compare the diverse range of prediction models directly. However, we believe that the models used in the present study possess some key advantages. They can predict crop yield in Northeast China skillfully with a lead time of up to a season and therefore provide effective early warning for farmers and the relevant departments. Also, the values of the predictors can be obtained easily (see Section 2). Moreover, in our previous work, we addressed the physical mechanisms through which the large-scale climate predictors (SNAO and BSI) influence the summer climate in Northeast China; it is possible that the NAO and sea ice also affect the wintertime (Li and Wang, 2013a) or springtime climate of the region, which precede the growing season and have been shown to be important for crop yields (Lobell et al., 2007). That is to say, the two selected variables could serve as composite indicators that provide even better predictions of crop yield. In addition, the year-to-year increment approach avoids the subjective choice of a function type for detrending, and it captures the trend of the predicted variable well by accumulating the predicted increments. Although the forecasting models presented in this article are still relatively simple, they may provide a new approach for predicting crop yield variations in advance at the regional level by use of antecedent large-scale climate indices.
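The year-to-year increment approach can be sketched as follows: regress the yield increment on the predictors, forecast the increment for the target year, and add the observed yield of the previous year. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictors are taken in increment form as well, the regression is an ordinary least-squares fit with an intercept, and all series are synthetic values constructed for the example.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the
    normal equations (X^T X) beta = X^T y."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def increments(series):
    """Year-to-year increments: value in year t minus value in year t-1."""
    return [b - a for a, b in zip(series, series[1:])]

# Synthetic series, for illustration only (not data from the paper)
snao = [0.2, -0.5, 0.9, 0.1, -0.8, 0.6, 1.0]
bsi = [0.5, 0.1, -0.4, 0.8, 0.3, -0.6, 0.2]
yields = [5.00, 4.97, 5.59, 5.21, 5.14, 5.84, 5.90]

d_snao, d_bsi, d_yield = increments(snao), increments(bsi), increments(yields)

# Train on all increments except the last one, which is held out as the target
train_rows = list(zip(d_snao[:-1], d_bsi[:-1]))
beta = fit_ols(train_rows, d_yield[:-1])

# Forecast the increment of the final year, then add last year's observed yield
predicted_increment = beta[0] + beta[1] * d_snao[-1] + beta[2] * d_bsi[-1]
predicted_yield = yields[-2] + predicted_increment
print(round(predicted_yield, 2), "vs observed", yields[-1])
```

Because the previous year's observed yield is added back at each step, the long-term trend enters through the observations themselves, so no detrending function has to be chosen.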

Certainly, the climate in Northeast China is affected not only by climate modes originating in the middle and high latitudes, but also by tropical systems. As mentioned in the introduction, the pre-winter and spring ENSO indices do not significantly affect crop yield. However, He and Wang (2013) indicated that the PDO (Pacific decadal oscillation) could modulate the impact of ENSO on the East Asian winter monsoon after the 1950s. In future work, we will examine whether the capacity of ENSO to predict crop yield is also modulated by the phase of the PDO. In addition, other systems such as the Siberian high also strongly influence the regional climate. We found only a weak Pearson correlation between the Siberian high and crop yield in Northeast China. Nevertheless, considering the complicated interactions between different climate signals and crop yield, we will try to establish more sophisticated models, such as support vector machines, that include more variables and take full account of the nonlinear response of crop yield.

Acknowledgments. We thank the editors of the journal and the anonymous reviewers for their helpful comments.

References
Andrieu C., de Freitas N., Doucet A., et al., 2003: An introduction to MCMC for machine learning. Machine Learning, 50, 5–43. DOI:10.1023/A:1020281327116
Ann B. S., Cho S. S., Kim C. Y., 2000: The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications, 18, 65–74. DOI:10.1016/S0957-4174(99)00053-6
Barnett T. P., Preisendorfer R., 1987: Origins and levels of monthly and seasonal forecast skill for United States surface air temperatures determined by canonical correlation analysis. Mon. Wea. Rev., 115, 1825–1850. DOI:10.1175/1520-0493(1987)115<1825:OALOMA>2.0.CO;2
Becker-Reshef I., Vermote E., Lindeman M., et al., 2010: A generalized regression-based model for forecasting winter wheat yields in Kansas and Ukraine using MODIS data. Remote Sensing of Environment, 114, 1312–1323. DOI:10.1016/j.rse.2010.01.010
Chang C.-P., Li T., 2000: A theory for the tropical tropospheric biennial oscillation. J. Atmos. Sci., 57, 2209–2224. DOI:10.1175/1520-0469(2000)057<2209:ATFTTT>2.0.CO;2
Chen C. Q., Qian C. R., Deng A. X., et al., 2012: Progressive and active adaptations of cropping system to climate change in Northeast China. European Journal of Agronomy, 38, 94–103. DOI:10.1016/j.eja.2011.07.003
Cheng Y. Q., Zhang P. Y., 2005: Regional patterns changes of Chinese grain production and response of commodity grain base in Northeast China. Scientia Geographica Sinica, 25, 513–520.
Fan K., 2010: A prediction model for Atlantic named storm frequency using a year-by-year increment approach. Wea. Forecasting, 25, 1842–1851. DOI:10.1175/2010WAF2222406.1
Fan K., Wang H. J., 2009: A new approach to forecasting typhoon frequency over the western North Pacific. Wea. Forecasting, 24, 974–986. DOI:10.1175/2009WAF2222194.1
Fan K., Tian B. Q., 2013: Prediction of wintertime heavy snow activity in Northeast China. Chinese Sci. Bull., 58, 1420–1426. DOI:10.1007/s11434-012-5502-7
Fan K., Lin M. J., Gao Y. Z., 2009: Forecasting the summer rainfall in North China using the year-to-year increment approach. Sci. China (Ser. D), 52, 532–539. DOI:10.1007/s11430-009-0040-0
Gouveia C., Trigo R. M., DaCamara C. C., et al., 2008: The North Atlantic Oscillation and European vegetation dynamics. Int. J. Climatol., 28, 1835–1847. DOI:10.1002/joc.v28:14
Hansen J. W., Indeje M., 2004: Linking dynamic seasonal climate forecasts with crop simulation for maize yield prediction in semi-arid Kenya. Agricultural and Forest Meteorology, 125, 143–157. DOI:10.1016/j.agrformet.2004.02.006
He S. P., Wang H. J., 2013: Oscillating relationship between the East Asian winter monsoon and ENSO. J. Climate, 26, 9819–9838. DOI:10.1175/JCLI-D-13-00174.1
Hsu C. C., Chen C. Y., 2003: Regional load forecasting in Taiwan—applications of artificial neural networks. Energy Conversion and Management, 44, 1941–1949. DOI:10.1016/S0196-8904(02)00225-X
Iizumi T., Yokozawa M., Nishimori M., 2009: Parameter estimation and uncertainty analysis of a large-scale crop model for paddy rice: Application of a Bayesian approach. Agricultural and Forest Meteorology, 149, 333–348. DOI:10.1016/j.agrformet.2008.08.015
Iizumi T., Sakuma H., Yokozawa M., et al., 2013: Prediction of seasonal climate-induced variations in global food production. Nature Climate Change, 3, 903–908.
IPCC, 2013: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Stocker, T. F., D. Qin, G.-K. Plattner, et al., Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 866 pp.
Li C. Y., Sun S. Q., Mu M. Q., 2001: Origin of the TBO-interaction between anomalous East-Asian winter monsoon and ENSO cycle. Adv. Atmos. Sci., 18, 554–566. DOI:10.1007/s00376-001-0044-y
Li F., Wang H. J., 2012: Autumn sea ice cover, winter Northern Hemisphere annular mode, and winter precipitation in Eurasia. J. Climate, 26, 3968–3981.
Li F., Wang H. J., 2013a: Relationship between Bering Sea ice cover and East Asian winter monsoon year-to-year variations. Adv. Atmos. Sci., 30, 48–56. DOI:10.1007/s00376-012-2071-2
Li F., Wang H. J., 2013b: Spring surface cooling trend along the East Asian coast after the late 1990s. Chinese Sci. Bull., 58, 3847–3851. DOI:10.1007/s11434-013-5853-8
Li F., Wang H. J., 2014: Autumn Eurasian snow depth, autumn Arctic sea ice cover and East Asian winter monsoon. Int. J. Climatol., 34, 3616–3625. DOI:10.1002/joc.3936
Li S. F., Lian Y., Chen S. B., et al., 2012: Distribution of extreme cool events over Northeast China in early summer and the related dynamical processes. Scientia Geographica Sinica, 32, 752–758.
Lieth H., 1973: Primary production: Terrestrial ecosystems. Human Ecology, 1, 303–332. DOI:10.1007/BF01536729
Liu Y. Q., 2014: A regression model for smoke plume rise of prescribed fires using meteorological conditions. Journal of Applied Meteorology and Climatology, 53, 1961–1975. DOI:10.1175/JAMC-D-13-0114.1
Liu Z. J., Yang X. G., Hubbard K. G., et al., 2012: Maize potential yields and yield gaps in the changing climate of Northeast China. Global Change Biology, 18, 3441–3454. DOI:10.1111/gcb.2012.18.issue-11
Lobell, D., 2010: Crop responses to climate: Time-series models. Climate Change and Food Security. Lobell, D., and M. Burke, Eds., Springer, Netherlands, 85–98.
Lobell D. B., Burke M. B., 2010: On the use of statistical models to predict crop yield responses to climate change. Agricultural and Forest Meteorology, 150, 1443–1452. DOI:10.1016/j.agrformet.2010.07.008
Lobell D. B., Field C. B., Cahill K. N., et al., 2006: Impacts of future climate change on California perennial crop yields: Model projections with climate and crop uncertainties. Agricultural and Forest Meteorology, 141, 208–218. DOI:10.1016/j.agrformet.2006.10.006
Lobell D. B., Cahill K. N., Field C. B., 2007: Historical effects of temperature and precipitation on California crop yields. Climatic Change, 81, 187–203. DOI:10.1007/s10584-006-9141-3
Lobell D. B., Schlenker W., Costa-Roberts J., 2011: Climate trends and global crop production since 1980. Science, 333, 616–620. DOI:10.1126/science.1204531
Ma H. B., Chu Q. Q., 2007: Study on fluctuation and influence factors of grain production in China. Journal of Anhui Agricultural Sciences, 35, 8735–8737.
Ma J. Y., Xu Y. L., Pan J., 2012: Analysis of agro-meteorological disasters tendency variation and the impacts on grain yield over Northeast China. Chinese J. Agrometeor., 33, 283–288.
Mason C. H., Perreault Jr W. D., 1991: Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28, 268–280. DOI:10.2307/3172863
Masutomi Y., Takahashi K., Harasawa H., et al., 2009: Impact assessment of climate change on rice production in Asia in comprehensive consideration of process/parameter uncertainty in general circulation models. Agriculture, Ecosystems & Environment, 131, 281–291.
Mi N., Zhang Y. S., Cai F., et al., 2012: Modelling the impacts of future climate change on maize productivity in Northeast China. Journal of Arid Land Resources and Environment, 26, 117–123.
Mitchell R. A. C., Lawlor D. W., Mitchell V. J., et al., 1995: Effects of elevated CO2 concentration and increased temperature on winter wheat: Test of ARCWHEAT1 simulation model. Plant, Cell and Environment, 18, 736–748. DOI:10.1111/pce.1995.18.issue-7
Mkhabela M. S., Bullock P., Raj S., et al., 2011: Crop yield forecasting on the Canadian Prairies using MODIS NDVI data. Agricultural and Forest Meteorology, 151, 385–393. DOI:10.1016/j.agrformet.2010.11.012
Moriasi D. N., Arnold J. G., Van Liew M. W., et al., 2007: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the ASABE, 50, 885–900. DOI:10.13031/2013.23153
National Bureau of Statistics, 2012: China Rural Statistical Yearbook. China Statistics Press, 133–153.
Osborne T. M., Lawrence D. M., Challinor A. J., et al., 2007: Development and assessment of a coupled crop-climate model. Global Change Biology, 13, 169–183. DOI:10.1111/j.1365-2486.2006.01274.x
Parry M., Rosenzweig C., Livermore M., 2005: Climate change, global food supply and risk of hunger. Philosophical Transactions of the Royal Society B: Biological Sciences, 360, 2125–2138. DOI:10.1098/rstb.2005.1751
Rayner N. A., Parker D. E., Horton E. B., et al., 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature since the late nineteenth century. J. Geophys. Res., 108(D14). DOI:10.1029/2002JD002670
Rosenberg N. J., 2010: Climate change, agriculture, water resources: What do we tell those that need to know. Climatic Change, 100, 113–117. DOI:10.1007/s10584-010-9823-8
Schlenker W., Roberts M. J., 2009: Nonlinear temperature effects indicate severe damages to U.S. crop yields under climate change. Proceedings of the National Academy of Sciences of the United States of America, 106, 15594–15598. DOI:10.1073/pnas.0906865106
Schlenker W., Lobell D. B., 2010: Robust negative impacts of climate change on African agriculture. Environmental Research Letters, 5, 014010. DOI:10.1088/1748-9326/5/1/014010
Shen B. Z., Liu S., Lian Y., et al., 2011: An investigation into 2009 summer low temperature in Northeast China and its association with prophase changes of the air–sea system. Acta Meteor. Sinica, 29, 320–333.
Steiger J. H., 1980: Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245–251. DOI:10.1037/0033-2909.87.2.245
Stroeve J. C., Serreze M. C., Fetterer F., et al., 2005: Tracking the Arctic’s shrinking ice cover: Another extreme September minimum in 2004. Geophys. Res. Lett., 32, L04501.
Sun J. Q., Wang H. J., 2012: Changes of the connection between the summer North Atlantic Oscillation and the East Asian summer rainfall. J. Geophys. Res., 117(D8), D08110.
Sun J. Q., Wang H. J., Yuan W., 2008: Decadal variations of the relationship between the summer North Atlantic Oscillation and middle East Asian air temperature. J. Geophys. Res., 113(D15), D15107. DOI:10.1029/2007JD009626
Sun J. Q., Wang H. J., Yuan W., 2009: Role of the tropical Atlantic sea surface temperature in the decadal change of the summer North Atlantic Oscillation. J. Geophys. Res., 114(D20), D20110. DOI:10.1029/2009JD012395
Sun Q., Zhang S., Zhang J., et al., 2010: Current situation of rice production in Northeast China and countermeasures. North Rice, 40, 72–74.
Wang H. J., Zhang Y., Lang X. M., 2010: On the predictand of short-term climate prediction. Climatic Environ. Res., 15, 225–228.
Willmott C. J., 1982: Some comments on the evaluation of model performance. Bull. Amer. Meteor. Soc., 63, 1309–1313. DOI:10.1175/1520-0477(1982)063<1309:SCOTEO>2.0.CO;2
Xu W. X., Chen J. F., 1991: Regression analysis for yield of dryland spring wheat. Journal of “Ba Yi” Agricultural College, 14, 50–55.
Yang S. Y., Sun F. H., Ma J. Z., 2008: Evolvement of precipitation extremes in Northeast China on the background of climate warming. Scientia Geographica Sinica, 28, 224–228.
Yang X., Liu Z., Chen F., 2010: The possible effects of global warming on cropping systems in China. Ⅰ: The possible effects of climate warming on northern limits of cropping systems and crop yields in China. Scientia Agricultura Sinica, 43, 329–336.
Yao F. M., Xu Y. L., Lin E. D., et al., 2007: Assessing the impacts of climate change on rice yields in the main rice areas of China. Climatic Change, 80, 395–409. DOI:10.1007/s10584-006-9122-6
Yao Z. F., Liu X. T., Yang F., et al., 2009: The combinatorial predicting model of the grain yields in the northeast of China. Acta Agriculturae Boreali-Sinica, 24, 215–219.
You L. Z., Rosegrant M. W., Wood S., et al., 2009: Impact of growing season temperature on wheat productivity in China. Agricultural and Forest Meteorology, 149, 1009–1014. DOI:10.1016/j.agrformet.2008.12.004
Zhang H. Y., Liu G. L., He Y., 2002: Combined forecasting model and its application in grain yield forecasting. Journal of Agricultural Mechanization Research, 166-167, 173.
Zhang J. P., Zhao Y. X., Wang C. Y., et al., 2008: Simulation of maize production under climate change scenario in Northeast China. Chinese Journal of Eco-Agriculture, 16, 1448–1452.
Zhang Y. C., Zhang L. Q., 2005: Precipitation and temperature probability characteristics in climatic and ecological transition zone of Northeast China in recent 50 years. Scientia Geographica Sinica, 25, 561–566.
Zhao J. F., Yang X. G., Liu Z. J., 2009: Influence of climate warming on serious low temperature and cold damage and cultivation pattern of spring maize in Northeast China. Acta Ecologica Sinica, 29, 6544–6551.
Zhou M. Z., Wang H. J., 2014: Late winter sea ice in the Bering Sea: Predictor for maize and rice production in Northeast China. Journal of Applied Meteorology and Climatology, 53, 1183–1192. DOI:10.1175/JAMC-D-13-0242.1
Zhou M. Z., Wang H. J., Yang S., et al., 2013: Influence of springtime North Atlantic Oscillation on crops yields in Northeast China. Climate Dyn., 41, 3317–3324. DOI:10.1007/s00382-012-1597-4
Zou Z. H., Yun Y., Sun J. N., 2006: Entropy method for determination of weight of evaluating indicators in fuzzy synthetic evaluation for water quality assessment. Journal of Environmental Sciences, 18, 1020–1023. DOI:10.1016/S1001-0742(06)60032-6