J. Meteor. Res. 2018, Vol. 32 Issue (3): 410-420
http://dx.doi.org/10.1007/s13351-018-7107-9
The Chinese Meteorological Society

Article Information

GUO, Lianyi, Zhihong JIANG, and Weilin CHEN, 2018.
Using a Hidden Markov Model to Analyze the Flood-Season Rainfall Pattern and Its Temporal Variation over East China.
J. Meteor. Res., 32(3): 410-420
http://dx.doi.org/10.1007/s13351-018-7107-9

Article History

Received July 6, 2017
in final form March 6, 2018
Using a Hidden Markov Model to Analyze the Flood-Season Rainfall Pattern and Its Temporal Variation over East China
Lianyi GUO¹, Zhihong JIANG², Weilin CHEN²
1. Key Laboratory of Meteorological Disaster of Ministry of Education, Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing 210044;
2. Joint International Research Laboratory of Climate and Environment Change, Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing 210044
ABSTRACT: The homogeneous hidden Markov model (HMM), a statistical pattern recognition method, is introduced in this paper. Based on the HMM, a 53-yr record of daily precipitation during the flood season (April–September) at 389 stations in East China during 1961–2013 is classified into six patterns: the South China (SC) pattern, the southern Yangtze River (SY) pattern, the Yangtze–Huai River (YH) pattern, the North China (NC) pattern, the overall wetter (OW) pattern, and the overall drier (OD) pattern. Features of the transition probability matrix of the first four patterns reveal that 1) the NC pattern is the most persistent, followed by the YH pattern, and the SY pattern is the least persistent; and 2) there exists an SY–SC–SY–YH–NC propagation process for the rain belt over East China during the flood season. The intraseasonal variability in the occurrence frequency of each pattern determines its start and end time. Furthermore, analysis of interdecadal variability in the occurrence frequency of each pattern in the recent six decades has identified three obvious interdecadal variations for the SC, YH, and NC patterns in the mid–late 1970s, the early 1990s, and the late 1990s. After 2000, the patterns concentrated in the southern region play a dominant role, and thus a “flooding in the south and drought in the north” rainfall distribution is maintained in eastern China. In summary, the HMM provides a unique approach for obtaining both the spatial distribution and temporal variation features of flood-season rainfall.
Key words: hidden Markov model (HMM), rainfall patterns, flood season, rain belt propagation, interdecadal variability
1 Introduction

The monsoon rainfall over East China during the flood season shows strong interdecadal variability (Wang et al., 2002; Gao et al., 2014; He and Liu, 2016; Ren et al., 2017). In particular, rainfall patterns play a vital role in the analysis of precipitation variability, and are of great concern to forecasters and researchers. Many statistical techniques, such as empirical orthogonal function (EOF) analysis, principal component analysis (PCA), and various cluster analyses, have been applied to describe precipitation patterns in different parts of the world (e.g., Barnston and Livezey, 1987; Kulkarni et al., 1992; Wang et al., 1998; Golian et al., 2010; Venkatanagendra and Maligelaussenaiah, 2017). As early as the 1980s, Liao et al. (1981) classified the precipitation in East China into three rainfall patterns, which remain widely recognized. Svensson (1999) used EOF to study the characteristics of dominant rainfall patterns in the upper reaches of the Huai River basin, China; subsequently, Kim et al. (2002) reported that the k-means (KM) clustering method appears more reliable than EOF, especially for synoptic climatology/composite techniques; however, Dikbas et al. (2012) showed that the fuzzy c-means (FCM) clustering method performed better than the KM method in identifying homogeneous precipitation regions. Additionally, it is noted that the KM and Ward’s clustering methods can lead to similar classifications of the observations (Ramos, 2001; Soltani and Modarres, 2006). The above statistical techniques have also been applied to explore the spatial and temporal variability of rainfall patterns (Chen et al., 1995; Wang et al., 1998; Xu et al., 2000, 2005), providing important references for current climate prediction over East China during the flood season.

The rainfall patterns over East China during the flood season have a close relationship with the persistence and propagation of the rain belt. Because the movement of the rain belt is greatly affected by the advancement of the summer monsoon, some studies have used indicators of the monsoon’s propagation to characterize the rain belt’s movement (Lau and Yang, 1997; Fasullo and Webster, 2003). The rainfall amount and its variation have also been used directly as indicators of the rain belt’s movement. In the late 1950s, Tao et al. (1958) employed the percentage of total summer precipitation falling in each half month to analyze the movement of the rain belt over East China during the Meiyu rainy season. Chen et al. (2000) defined a relative coefficient of precipitation, which is the ratio of the N-day running-mean daily precipitation to the annual average precipitation, to describe the summer monsoon rainfall migration. However, because of regional differences in precipitation, these methods cannot establish a uniform standard to determine the propagation process of the rain belt. To solve this problem, Jiang et al. (2006) standardized the average daily precipitation in summer (May–August) using a five-day running mean method to obtain a spatially homogenized precipitation index. The movement of the rain belt was then determined from changes in the area of high index values. It should be noted that most previous methods have used the precipitation and its variations to analyze the position and movement of the rain belt; more unified objective indicators or approaches for quantitatively depicting the flood-season rainfall characteristics are lacking. In fact, the persistence and propagation of the rain belt are closely related to the transition probabilities between different rainfall spatial structures.

The hidden Markov model (HMM) describes essentially a double stochastic process consisting of hidden states and an observed state sequence. The HMM can be used to analyze the distribution of a local meteorological factor and the transition rule between its different distributions using the Markov chain principle (Robertson et al., 2004; Greene et al., 2008, 2011; Chen et al., 2013; Tan et al., 2013; Mares et al., 2014). The HMM has been used to study rainfall patterns or rain belts. For example, Robertson et al. (2006) used the HMM to study the summer daily precipitation at 11 stations in North Queensland, Australia, during October–April, from 1958 to 1998, and found that the rainfall occurrence probability and intensity distribution well captured the main location and characteristics of the rain belt. Greene et al. (2008) employed the HMM to investigate the daily rainfall at 13 stations in central and western India from 1901 to 1970. They found that the method is able to objectively analyze the characteristics of the rain belt propagation. In addition, Yoo et al. (2010) examined the intraseasonal and interannual variability of the Asian summer monsoon using the HMM. However, the HMM has scarcely been applied in the studies of the East Asian monsoon precipitation, especially the temporal variability of related rainfall patterns.

The objective of this study is to examine whether the HMM could be used to investigate, in a more objective and comprehensive way, the local spatial structure and propagation characteristics of the rain belt over East China during the flood season. The remainder of this paper is organized as follows. Section 2 describes the dataset used in this study and the methodology. In Section 3, we use the HMM to categorize the daily precipitation in East China during the flood season (April–September) from 1961 to 2013 into several rainfall patterns and explore the local features of each pattern. In Section 4, the quantitative characteristics of the rain belt movement are derived based on a transition probability matrix of the HMM, and the interdecadal variability of the rain belt movement in East China is analyzed based on the annual occurrence frequencies of each rainfall pattern obtained by the HMM. Finally, conclusions and discussion are given in Section 5.

2 Data and method

2.1 Data

The daily rainfall data at 389 stations in East China (south of 50°N, east of 110°E) for the period 1961–2013 are obtained from the National Climate Center of China. Each annual time series of rainfall data covers the 183 days from April to September.

2.2 The hidden Markov model (HMM)

The HMM contains a double stochastic process consisting of hidden states that cannot be directly observed and an observed state sequence. Its two components are the statistical relationship between hidden states and observed states, and the transition probabilities from one hidden state to another. The basic principle of the HMM is that the time series of the precipitation field across the study area consists of several rainfall patterns, or hidden states. The rainfall occurrence frequency at each station is decided by the rainfall pattern on each day. If precipitation occurs at a station, an exponential distribution function is used to stochastically simulate the rainfall amount. In addition, the rainfall pattern on each day depends on the rainfall pattern of the previous day.

We define St as the rainfall pattern on day t (t = 1, 2, ..., T), where T is the total number of days in this study, i.e., 183 days (April–September) in each year. If the number of rainfall patterns is N, the set of patterns is S = {q1, q2, ..., qi, ..., qN}, where qi (i = 1, 2, ..., N) denotes each rainfall pattern. We use $R_t^w$ to represent the observed precipitation on day t at station $w$ ($w = 1, 2, \ldots, W$), where W is the total number of stations in this study, i.e., 389 stations across East China; the W station values together form the multivariate observation vector $R_t$. The variable $P(S_t \mid S_{1:t-1})$ gives the transition probabilities between different rainfall patterns; under the first-order Markov assumption below, it reduces to $P(S_t \mid S_{t-1})$, the probability of qj at time t given qi at time t – 1. The HMM for rainfall data makes two conditional independence assumptions (Hughes and Guttorp, 1994). The first assumption is that the multivariate precipitation observations $R_t$ at time t are independent of all other variables in the model up to time t, conditional on the hidden state St at time t. The second assumption is that the hidden-state process, $S_{1:T}$, is first-order Markov and homogeneous in time, that is, the N × N transition probability matrix does not change with t. The conditional independence assumptions, given in Eqs. (1) and (2), are easily visualized as edges in a directed graph of the HMM, as shown in Fig. 1.

$P(R_t \mid S_{1:t}, R_{1:t-1}) = P(R_t \mid S_t),$ (1)
$P(S_t \mid S_{1:t-1}) = P(S_t \mid S_{t-1}).$ (2)
Figure 1 Schematic representation of the basic principle of the HMM.
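
The hidden-state part of the model, Eq. (2), can be illustrated by simulating a pattern sequence from a homogeneous first-order Markov chain. The following Python code is only a minimal sketch and not the code used in this study; the function name `sample_patterns`, the toy two-pattern probabilities, and the random seed are illustrative assumptions.

```python
import random


def sample_patterns(initial, transition, n_days, seed=0):
    """Draw a hidden pattern sequence S_1..S_T from a homogeneous Markov chain.

    initial    -- P(S_1), one probability per pattern
    transition -- N x N matrix, transition[i][j] = P(S_t = j | S_{t-1} = i)
    """
    rng = random.Random(seed)
    states = list(range(len(initial)))
    # S_1 is drawn from the initial probabilities P(S_1).
    current = rng.choices(states, weights=initial, k=1)[0]
    sequence = [current]
    for _ in range(n_days - 1):
        # Eq. (2): the next pattern depends only on the current pattern,
        # and the same matrix is used on every day (homogeneity in time).
        current = rng.choices(states, weights=transition[current], k=1)[0]
        sequence.append(current)
    return sequence


# Hypothetical two-pattern example; these values are not from the paper.
toy_initial = [0.5, 0.5]
toy_transition = [[0.9, 0.1],
                  [0.2, 0.8]]
toy_sequence = sample_patterns(toy_initial, toy_transition, n_days=183)
```

Here `n_days=183` mirrors the length of one flood season; larger diagonal entries in the toy matrix produce longer runs of the same pattern.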

According to the HMM that was introduced by Hughes and Guttorp (1994) and Kirshner (2005), the precipitation probability distribution function (conditional on the rainfall pattern St) $P\left( {{R_{{t}}}|{S_{{t}}}} \right)$ in Eq. (1) establishes a statistical relationship between the multivariate precipitation observations and rainfall patterns. An exponential distribution function was chosen for the rainfall in this study. Woolhiser and Roldán (1982) showed that a two-exponential distribution function was more effective for the simulation of daily precipitation, which is given as

$P(R_t^w = r \mid S_t = q_i) = \begin{cases} P_{iw0}, & r = 0 \\ \sum\limits_{c = 1}^{2} P_{iwc}\,\lambda_{iwc}\,{\rm e}^{-\lambda_{iwc} r}, & r > 0 \end{cases},$ (3)

where r is the observed precipitation at station w on day t, qi is the rainfall pattern on day t, $w = 1, 2, \ldots, W$, $i = 1, 2, \ldots, N$, c indexes the exponential components (c = 1, 2), Piw0 is the probability of a dry day, Piwc refers to the weight of component c, and λiwc denotes the exponential distribution function parameter. It is noted that $P(R_t \mid S_t)$ is related to station w, day t, and precipitation observation r. The observed value r > 0 when rain is observed on day t at station w, and r = 0 when it is dry.
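
Equation (3) can be made concrete with a short numerical sketch for a single station and a single pattern. The Python code below is illustrative only; the function names and the parameter values are assumptions, and the weights are chosen so that Piw0 + Piw1 + Piw2 = 1, which Eq. (3) needs in order to be a proper distribution.

```python
import math


def rain_density(r, p_dry, weights, rates):
    """Evaluate Eq. (3) for one station w and one pattern q_i.

    p_dry   -- P_iw0, probability of a dry day
    weights -- (P_iw1, P_iw2), weights of the two exponential components
    rates   -- (lambda_iw1, lambda_iw2), exponential rate parameters
    Returns a probability mass for r == 0 and a density for r > 0.
    """
    if r < 0:
        raise ValueError("precipitation cannot be negative")
    if r == 0:
        return p_dry
    return sum(p * lam * math.exp(-lam * r) for p, lam in zip(weights, rates))


def mean_wet_day_intensity(weights, rates):
    """Mean rainfall amount on wet days implied by the two-exponential mixture."""
    # Each exponential component has mean 1 / lambda; the mixture mean is the
    # weighted average, renormalized by the total wet-day probability.
    return sum(p / lam for p, lam in zip(weights, rates)) / sum(weights)


# Hypothetical parameters (not estimated from the station data).
example_p_dry = 0.4
example_weights = (0.4, 0.2)
example_rates = (0.5, 0.1)
example_mean = mean_wet_day_intensity(example_weights, example_rates)
```

In this toy setting, the rainfall occurrence probability is 1 − Piw0 = 0.6, and `example_mean` plays the role of the mean intensity on wet days.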

To solve for the transition probability $P(S_t \mid S_{t-1})$ between different rainfall patterns in Eq. (2), five quantities are involved: the precipitation observation $R_t^w$, the rainfall pattern St, the initial probabilities of rainfall patterns $P(S_1)$, the transition probability matrix between different patterns $P(S_t \mid S_{t-1})$, and the precipitation probability distribution function (PDF) conditional on St, $P(R_t \mid S_t)$; the unknown quantities among them are initialized randomly. In addition, four indices need to be taken into account: the station $w = 1, 2, \ldots, W$, the day $t = 1, 2, \ldots, T$, the rainfall pattern $i = 1, 2, \ldots, N$, and the year $yr = 1, 2, \ldots, 53$. It is noted that while the precipitation at each station is characterized by a PDF that is both station-specific and pattern-specific, the PDFs for all stations are coupled by a pattern, as per the i subscript in Eq. (3). Thus, the HMM accounts for spatial dependence in the data, and there are W × N PDFs altogether to be considered. Together, these PDFs constitute $P(R_t \mid S_t)$ for all patterns and stations and need to be optimized, meaning that Piw0, Piwc, and λiwc in Eq. (3) should also be optimized.

Using a forward–backward algorithm and the expectation maximization (EM) algorithm, we estimate the set of parameters Θ that maximizes the likelihood, i.e., the conditional probability of the observed data as a function of the parameters. This study uses 10 iterations of the EM algorithm to estimate these parameters. Considering the large amount of data, we also tried estimating the parameters through 100 iterations of the EM steps (figure omitted), and the rainfall patterns obtained are highly similar to those obtained through 10 iterations. In addition, the rainfall patterns of the HMM established through 10 iterations of EM steps are closely consistent with the propagation of the East Asian monsoon. The HMM established in this way is therefore considered reasonable, although more iterations of EM steps could make it more stable.
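
The forward pass of the forward–backward algorithm, which yields the likelihood that EM tries to increase, can be sketched as follows. This is a generic scaled forward recursion, not the implementation used in this study; the function name and the toy inputs are assumptions. In the present setting, each entry `emission_probs[t][i]` would presumably be the product over stations of Eq. (3) for pattern qi, assuming the stations are conditionally independent given the pattern.

```python
import math


def forward_log_likelihood(initial, transition, emission_probs):
    """Return log P(R_1..R_T) using the scaled forward recursion.

    initial        -- P(S_1), length N
    transition     -- N x N matrix, transition[i][j] = P(S_t = j | S_{t-1} = i)
    emission_probs -- T x N table, emission_probs[t][i] = P(R_t | S_t = i)
    """
    n = len(initial)
    # alpha[i] is proportional to P(R_1..R_t, S_t = i).
    alpha = [initial[i] * emission_probs[0][i] for i in range(n)]
    log_lik = 0.0
    for t in range(len(emission_probs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the
            # probability of the new observation under each pattern.
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n))
                * emission_probs[t][j]
                for j in range(n)
            ]
        # Rescale to avoid numerical underflow on long sequences and
        # accumulate the log of the scaling factors.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-pattern, two-day example (values are not from the paper).
toy_log_lik = forward_log_likelihood(
    [0.6, 0.4],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.1], [0.2, 0.9]],
)
```

An EM iteration would combine this forward pass with a backward pass to obtain the posterior pattern probabilities, and then re-estimate Θ from them.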

3 Classification of rainfall patterns based on the HMM

3.1 Number of rainfall patterns

Establishing the HMM first requires determining the number of hidden states, or rainfall patterns, N. This number is often optimized by the Bayesian information criterion (BIC), but the criterion is not absolute. Greene et al. (2008) pointed out that use of a small number of patterns facilitates diagnosis and model comprehensibility. In that study, a four-pattern model providing a physically meaningful description was chosen, even though a 10-pattern model was the most suitable according to the BIC. Pineda and Willems (2016) subjectively chose a four-pattern model, guided by the number of physically interpretable patterns. Therefore, we take into account both the above opinions and the BIC in this study. The BIC score with N states is defined as

${\rm BIC}_N = 2L(\varTheta_N^*) - p\log T,$ (4)

where $\varTheta_N^*$ is the maximum likelihood parameter vector estimated by EM on the training data for a model with N states (see Section 2.2), $L(\varTheta_N^*)$ is the log-likelihood of the model evaluated at $\varTheta_N^*$, p is the number of parameters in the N-state model, and T is the total number of days of the observed data used to train the model. With the sign convention of Eq. (4), the largest BIC score normally corresponds to the optimal model, i.e., the one that best explains the training data after the number of parameters is penalized.
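
For illustration, Eq. (4) and the size of its penalty term can be computed as in the following Python sketch. The parameter count is an assumption about how the model is parameterized (it is not stated in the text): N − 1 free initial probabilities, N(N − 1) free transition probabilities, and four free emission parameters per pattern and station (two free weights and two rates in Eq. (3)).

```python
import math


def count_parameters(n_patterns, n_stations):
    """Number of free parameters p under the assumed parameterization."""
    initial = n_patterns - 1
    transition = n_patterns * (n_patterns - 1)
    # Per pattern and station: three weights that sum to one (two free)
    # plus two exponential rate parameters.
    emission = n_patterns * n_stations * 4
    return initial + transition + emission


def bic_score(log_likelihood, n_params, n_days):
    """Eq. (4): BIC_N = 2 L(Theta*_N) - p log T, with L the log-likelihood."""
    return 2.0 * log_likelihood - n_params * math.log(n_days)


# Six patterns, 389 stations, 53 flood seasons of 183 days each.
n_params_six = count_parameters(6, 389)
n_days_total = 53 * 183
penalty_six = n_params_six * math.log(n_days_total)
```

Under these assumptions, the emission parameters dominate p, so each additional pattern adds a penalty of roughly 4W log T to the score.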

Figure 2 shows the change in the BIC with N. There is no overfitting when as many as 18 states are prescribed. However, we analytically found that models with N > 6 begin to exhibit “pattern splitting.” Meanwhile, when N < 6, the North China or South China or Yangtze–Huaihe River rainfall pattern, which is a typical and non-negligible part of the precipitation during the flood season in East China, is missing (see online supplemental material). Therefore, a six-pattern model is chosen.

Figure 2 The Bayesian information criterion (BIC) for models with different numbers of rainfall patterns. The BIC value calculated by Eq. (4) represents the degree to which the model interprets the data. Each diamond denotes a model of certain patterns established based on the observed rainfall data at 389 stations in East China from April to September during 1961–2013, which corresponds to 3, 4, …, 24 patterns respectively.
3.2 The rainfall patterns identified by the HMM

Rainfall occurrence probabilities and mean intensities for the six rainfall spatial structures obtained by the HMM are shown in Fig. 3. These rainfall spatial structures are concentrated in South China, the region south of the Yangtze River, the Yangtze–Huaihe River region, North China, and the whole of eastern China, with a substantial northeast–southwest distribution in both occurrence probability and mean intensity. It is noted that, in terms of the spatial features, the rainfall probabilities differ from the mean intensities for each pattern (Fig. 3). The highest probabilities (0.7) and maximum intensities (20 mm day–1) are located in South China; this is the South China rainfall pattern, whose intensities change sharply between 23° and 28°N, spanning from 5 to 20 mm day–1 (Fig. 3a). High probabilities (0.8) and intensities (20 mm day–1) are situated in the center of the region south of the Yangtze River; this is the southern Yangtze River rainfall pattern (Fig. 3b). Large intensity values lie in the Yangtze and Huaihe river region, while the high-probability area is located farther west; this is called the Yangtze–Huai River pattern (Fig. 3c). When the rainfall occurrence probabilities in most areas are no more than 0.15, with two high-value centers located in Northeast China (0.7, 17.5 mm day–1) and southern South China (0.5, 12.5 mm day–1), this is the North China rainfall pattern (Fig. 3d). The last two rainfall spatial structures are characterized as the “wet” and “dry” patterns, respectively (Figs. 3e, f). The “wet” pattern shows relatively high rainfall probabilities and intensities at all stations, while the “dry” pattern has relatively low probabilities and intensities.

In summary, the HMM with six rainfall spatial structures is found to be optimal, insofar as it captures sufficient detail to represent the essential rainfall features of East China during the flood season. These diagnosed patterns comprise the South China (SC) pattern, the southern Yangtze River (SY) pattern, the Yangtze–Huai River (YH) pattern, the North China (NC) pattern, the overall wetter (OW) pattern, and the overall drier (OD) pattern, which account for 18.1%, 15%, 17.2%, 16.4%, 16.1%, and 17.2% of the total rainfall, respectively. Comparing Fig. 3d1 with Fig. 3d2 indicates that the rainfall occurrence probabilities are not spatially consistent with the rainfall intensities. For example, for the NC pattern, the high-probability center lies to the northeast of the high-intensity center. This means that rainfall is more likely to occur in southern Northeast China, but the rainfall intensity in this region is only 12.5–15 mm day–1, which is lower than the highest intensity of 17.5 mm day–1. By the same token, the rainfall probability at the high-intensity center is about 0.35, which is lower than the highest probability of 0.7. Therefore, when the rain belt moves to North China, rain is more likely to occur in the northeastern part of the region, but with weaker intensity.

Figure 3 Rainfall occurrence probabilities (a1, b1, c1, d1, e1, f1) and mean daily rainfall intensities (mm day–1) (a2, b2, c2, d2, e2, f2) for the six rainfall spatial structures, in which SC, SY, YH, NC, OW, and OD denote the South China, southern Yangtze River, Yangtze–Huai River, North China, overall wetter, and overall drier pattern, respectively. The proportion of each pattern is 18.1%, 15%, 17.2%, 16.4%, 16.1%, and 17.2%, respectively.
4 Propagation and interdecadal variability of rainfall patterns

4.1 Transition and intraseasonal variation of rainfall patterns

Through the analysis in the previous section, we have obtained the rainfall spatial structures and their intensity centers. The HMM also yields a transition probability matrix, from which the persistence and propagation characteristics of the rain belt over East China during the flood season are further analyzed. The transition probability matrix is shown in Table 1. The largest values in the transition matrix lie along the leading diagonal; these values represent the self-transitions, which indicate the persistence of the patterns. Excluding the OW and OD patterns, the NC pattern (0.508) is the most persistent, the YH pattern (0.431) is the second most persistent, and the SC pattern (0.427) is the third. This is consistent with the three areas of eastern China that feature nearly stagnant, long-lasting monsoon rainfall during the northward movement of the summer monsoon. The persistence of the rainfall patterns could be related to the stability of relevant weather systems, such as the subtropical high, mid–high latitude systems, and so on (Zhang et al., 2006; Wan and Wu, 2007; Yu et al., 2013). In contrast, the SY pattern (0.415) has the lowest self-transition probability, so the stability of this pattern is relatively poor. Apart from the leading diagonal, the other entries represent transition probabilities from one pattern to another. Excluding the OW and OD patterns, the SY pattern transitions to the SC pattern with the largest probability (0.261); then, the SC pattern transitions to the SY pattern with the largest probability (0.149); next, the SY pattern transitions to the YH pattern with the second-largest probability (0.099); and after that, the YH pattern transitions to the NC pattern with the largest probability (0.159).

In the HMM, the transition matrix can be regarded as descriptive, summarizing the temporal dependence of the observations in a quantitative and objective probabilistic form. Specifically, the self-transition probabilities for the SY, SC, YH, and NC patterns objectively determine that the NC pattern is the most persistent climatologically, followed by the YH, and the SY pattern is the least persistent. According to the largest and the second-largest transition probabilities from one pattern to another, the whole propagation process of the rain belt over East China during the flood season can be inferred quantitatively as SY–SC–SY–YH–NC.

Table 1 Transition matrix for the six patterns from the HMM. The rows represent the “from” patterns and the columns represent the “to” patterns, e.g., the probability of a transition from SY to NC is 0.009. Excluding the OW and OD patterns, dark grey represents the largest self-transition probability and light grey represents the second-largest one along the leading diagonal. Asterisks indicate the largest and the second-largest transition probabilities between different rainfall patterns
Transition probability (rows: “From” pattern; columns: “To” pattern)
From \ To   SC       SY       YH       NC       OW       OD
SC          0.427    0.149*   0.079    0.075    0.085    0.185
SY          0.261*   0.415    0.099*   0.009    0.132    0.085
YH          0.031    0.129    0.431    0.159*   0.203    0.048
NC          0.066    0.019    0.195    0.508    0.119    0.094
OW          0.196    0.137    0.098    0.115    0.421    0.033
OD          0.093    0.076    0.125    0.122    0.020    0.563
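
The persistence ranking and the most probable onward transitions discussed above can be read directly from Table 1. The Python sketch below copies the table values; the helper names are illustrative assumptions. The expected spell length uses the standard result that, in a homogeneous Markov chain, the number of consecutive days spent in a pattern with self-transition probability p is geometrically distributed with mean 1/(1 − p).

```python
PATTERNS = ["SC", "SY", "YH", "NC", "OW", "OD"]
REGIONAL = ["SC", "SY", "YH", "NC"]  # OW and OD are excluded, as in the text

# Rows are "from" patterns and columns are "to" patterns (values from Table 1).
TRANSITION = [
    [0.427, 0.149, 0.079, 0.075, 0.085, 0.185],
    [0.261, 0.415, 0.099, 0.009, 0.132, 0.085],
    [0.031, 0.129, 0.431, 0.159, 0.203, 0.048],
    [0.066, 0.019, 0.195, 0.508, 0.119, 0.094],
    [0.196, 0.137, 0.098, 0.115, 0.421, 0.033],
    [0.093, 0.076, 0.125, 0.122, 0.020, 0.563],
]


def self_transition(name):
    """Diagonal entry of Table 1 for the given pattern."""
    k = PATTERNS.index(name)
    return TRANSITION[k][k]


def persistence_ranking(names=REGIONAL):
    """Patterns ordered from most to least persistent."""
    return sorted(names, key=self_transition, reverse=True)


def expected_spell_length(name):
    """Mean number of consecutive days in a pattern: 1 / (1 - p_ii)."""
    return 1.0 / (1.0 - self_transition(name))


def most_likely_next(name, candidates=REGIONAL):
    """Most probable different pattern on the following day."""
    row = TRANSITION[PATTERNS.index(name)]
    others = [c for c in candidates if c != name]
    return max(others, key=lambda c: row[PATTERNS.index(c)])


ranking = persistence_ranking()
next_after = {name: most_likely_next(name) for name in REGIONAL}
```

With the Table 1 values, `ranking` is NC, YH, SC, SY, and the NC pattern lasts about two days on average once it is established.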

Previous studies have shown that the formation of the rainy season during the flood season in East China is related to stagnation of the rain band. Therefore, we can further analyze the intraseasonal variability of the occurrence frequency for each pattern obtained by the HMM (Fig. 4). In Fig. 4, the dominant or peak period of each pattern is defined, with relatively high credibility, as the period in which the occurrence frequency exceeds 2.5 times the standard deviation of the corresponding sequence; the shading indicates this part.

Figure 4a shows that the occurrence frequency of the SY pattern reaches a peak (0.347–0.349) in the 20th–21st pentads (6–15 April). The 20th–27th pentads (6 April–15 May) denote a main period of dominance (0.239–0.349) matching the spring rainy season for the SY pattern. This rainy season results from the onset of the East Asian subtropical summer monsoon and indicates the arrival of the rainy season in East China (Chen et al., 2000; Zhu et al., 2011). The 28th–34th pentads (16 May–20 June) are the peak period (0.257–0.29) for the occurrence frequency of the SC pattern. Notably, the SC pattern is closely related to the onset of the South China Sea summer monsoon (SCSSM), and an earlier or later onset of the SCSSM further affects the overall distribution of the rainy zone over East China in the flood season (Ding and Ma, 1996; Li, 2004). As the SCSSM moves, the rain belt advances northward. The 35th–39th pentads (21 June–15 July) are a dominant period (0.218–0.267) for the occurrence frequency of the YH pattern, and that period matches the Meiyu rainy season. Subsequently, the 39th–44th pentads (11 July–10 August) are a peak period (0.3–0.384) for the occurrence frequency of the NC pattern, which is consistent with the rainy season in North China. Moreover, a second sub-peak period (0.219–0.257) in the 45th–51st pentads (11 August–15 September) emerges for the occurrence frequency of the YH pattern. Chen et al. (2000) pointed out that this sub-peak period results from typhoons, tropical circulation systems, and so on. Figure 4b shows that the 34th–40th pentads (16 June–20 July) and 51st–54th pentads (11 September–30 September) emerge as the major peak periods for the occurrence frequency of the OW and OD patterns, respectively.

In general, detailed analysis of the time evolution of “most likely” rainfall occurrence probabilities for the SY, SC, YH, and NC patterns can further quantify the climatological start and end times of several rainy seasons, and reveal the temporal characteristics of the advancement of the rain belt over East China during the flood season.
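
The pentad numbers quoted above can be converted to calendar dates with a small helper. The Python sketch below assumes the 72-pentad convention in which every month contains six pentads and the sixth runs to the end of the month; this convention is inferred from the dates quoted in the text rather than stated there, and the function name is illustrative.

```python
import calendar


def pentad_dates(pentad, year=2013):
    """Return ((month, first_day), (month, last_day)) for pentad 1..72.

    Assumes six pentads per calendar month, with the sixth pentad
    extended (or shortened) to end on the last day of the month.
    """
    if not 1 <= pentad <= 72:
        raise ValueError("pentad must be between 1 and 72")
    month = (pentad - 1) // 6 + 1
    slot = (pentad - 1) % 6  # 0..5 within the month
    first_day = 5 * slot + 1
    if slot == 5:
        # monthrange returns (weekday of day 1, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
    else:
        last_day = first_day + 4
    return (month, first_day), (month, last_day)


# The flood season (April-September) spans pentads 19-54 under this convention.
sy_peak_start = pentad_dates(20)[0]
sy_peak_end = pentad_dates(21)[1]
```

For example, `sy_peak_start` and `sy_peak_end` correspond to 6 April and 15 April, matching the peak period quoted for the SY pattern.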

By combining the information in Table 1 and Fig. 4, it is found that the persistence probabilities for the SY, SC, YH, and NC patterns can objectively determine the climatological characteristics of the rain belt stagnation. The transition probabilities and the intraseasonal variabilities of rainfall occurrence probabilities reveal the SY–SC–SY–YH–NC advancement process of the rain belt over East China during the flood season, and the climatological features of the rainy season variations. Notably, the propagation process of the rain belt is in accordance with previous studies. Moreover, the characteristics of the rain belt’s advancement and retreat are controlled by the East Asian summer monsoon’s circulation features and thermodynamic properties (Guo and Wang, 1981; Wang and Lin, 2002), and are also the result of the interaction between the East Asian planetary frontal area and the East Asian monsoon (Ding, 1992; Ding and Katsuhito, 1994; Yao and Yu, 2005).

Figure 4 Intraseasonal variability of the occurrence probabilities of the (a) SY (purple), SC (blue), YH (orange), and NC (black) rainfall patterns and the (b) OW (green) and OD (red) patterns. Pentads are shown on the x-axis, and the y-axis represents the rainfall occurrence frequencies for each pattern. The curves produced by the five-spot triple smoothing algorithm denote the intraseasonal variability, and the shading indicates the part where the frequency exceeds 2.5 times the standard deviation for each sequence.
4.2 Interdecadal variability of rainfall patterns

A low-pass filter is applied to the annual occurrence frequency sequence of each pattern obtained by the HMM to further explore the interdecadal variability of the rainfall patterns in East China during the flood season (Fig. 5). Three evident interdecadal changes exist during 1961–2013, which occur in the mid–late 1970s, the early 1990s, and the late 1990s (Gao et al., 2014; Ren et al., 2017). From the 1960s to the mid–late 1970s, the SC and NC patterns occur with relatively high frequencies and have positive anomalies, while the YH pattern presents a negative anomaly. Then, until the early 1990s, the SC and NC patterns have a decreasing trend, with the YH pattern occurring with moderate frequency. Zhang et al. (2007) showed that the rainfall in North China decreased (changed to a dry period) in the mid–late 1970s, and rainfall in the Yangtze River basin changed from a low rainfall period to a high one. This is in accordance with the occurrence frequency variation of the NC and YH patterns in this study. These transformations are caused by the Pacific Decadal Oscillation and East Asian summer monsoon (Zhu and Yang, 2003; Kwon et al., 2007), and are also worthy of further study. Between the early and late 1990s, the SC, SY, and YH patterns occur more frequently than the NC pattern, despite the NC pattern having positive anomalies for a short duration. After the late 1990s, the patterns concentrated in the southern region play a dominant role. In addition, the occurrence frequency of the SY pattern grows slowly during 1961–2013. In the most recent decade, in contrast to the NC pattern, there are overall positive anomalies for the SC, SY, and YH patterns. Thus, the rain belt remains in southern China for a long period and could bring greater amounts of precipitation. The result was verified by Lyu et al. (2014).
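
The kind of series analyzed in this subsection can be sketched as follows. The Python code is illustrative only: it assumes that a pattern label is available for each day of each flood season (for example, from the most likely state sequence), and it uses a centered running mean as a stand-in low-pass filter because the filter used here is not specified. All function names and the window length are assumptions.

```python
def annual_frequency(labels_by_year, pattern):
    """Fraction of days in each year that are assigned to `pattern`."""
    return [
        sum(1 for label in labels if label == pattern) / len(labels)
        for labels in labels_by_year
    ]


def running_mean(series, window=9):
    """Centered running mean with an odd window (a simple low-pass filter).

    Near the ends of the series the window shrinks symmetrically so that
    the output has the same length as the input.
    """
    half = window // 2
    smoothed = []
    for k in range(len(series)):
        h = min(half, k, len(series) - 1 - k)
        chunk = series[k - h:k + h + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def anomalies(series):
    """Departures of a series from its own long-term mean."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


# Hypothetical labels for two short "years" (not real data).
toy_labels = [["SC", "SY", "SC", "NC"], ["NC", "NC", "SC", "SY"]]
toy_sc_frequency = annual_frequency(toy_labels, "SC")
```

Applied to 53 yearly frequencies, `anomalies(running_mean(...))` would give a smoothed curve of positive and negative departures from the long-term mean, comparable in spirit to the decadal curves and the average line in Fig. 5.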

Analysis of the OW and OD patterns shows that the total rainfall amount is slowly increasing over the entire period, and the main drought and flood years are in agreement with Li and Jiang (2007). In the recent decade, with the frequency of the OW pattern increasing steadily and that of the OD pattern obviously decreasing, the total rainfall amounts in East China during the flood season are increasing.

Figure 5 Interannual and interdecadal variabilities of the six rainfall patterns. Grey bars denote the annual occurrence frequency sequence; black curves denote the decadal sequence with low-pass filtering; and light grey dashed line represents the average for each series.
5 Conclusions

This study used an HMM to analyze the daily rainfall occurrence and intensity at 389 stations across East China during the rainy season from 1961 to 2013. Quantitative characteristics of the rain belt movement and interdecadal variability were derived based on the self-transition and transition probabilities. The occurrence frequencies of each pattern reveal the interdecadal variability characteristics of rainfall in East China. In summary, the HMM provides a unique classification method with transition probabilities for the rainfall patterns. The conclusions are as follows.

(1) The HMM with six rainfall spatial structures is found to be optimal, insofar as it captures sufficient detail to represent the essential features of the rainfall over East China during the flood season. These diagnosed patterns, with explicit rainfall occurrence probabilities and mean daily intensity spatial distributions, are the South China (SC), the southern Yangtze River (SY), the Yangtze–Huaihe River (YH), the North China (NC), the overall wetter (OW), and the overall drier (OD) rainfall patterns.

(2) The transition matrix of the HMM, together with the intraseasonal variability in occurrence frequency, describes in a quantitative and objective way the intraseasonal advancement of the rain belt and the climatological start and end time of each rainfall pattern and its corresponding rainy season. Specifically, excluding the OW and OD patterns, the self-transition probabilities indicate that the NC pattern is the most persistent and the SY pattern is the least persistent. Moreover, according to the largest and second-largest transition probabilities as well as the intraseasonal variability in the occurrence frequency, it is inferred that an SY–SC–SY–YH–NC propagation process exists for the rain belt in East China during the flood season.

(3) On the interdecadal timescale, the HMM determines the decadal distribution mode of rainfall, based on the decadal variation of each rainfall pattern’s occurrence probability. The results show that the SC, YH, and NC patterns have three evident interdecadal changes during 1961–2013, i.e., in the mid–late 1970s, the early 1990s, and the late 1990s. This is in accordance with previous studies. In addition, after 2000, the total rainfall amounts in East China during the flood season have been increasing, although a mode of flooding in the south and drought in the north persists.
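How persistence and propagation are read from a transition matrix, as summarized in conclusion (2), can be sketched as follows. The sketch counts day-to-day transitions in a decoded label sequence, which only approximates the transition matrix that an HMM estimates during fitting; the function names and the toy sequence are hypothetical. Transitions across year boundaries are skipped, on the assumption that consecutive flood seasons are not contiguous in time.

```python
import numpy as np

def transition_matrix(states, years, n_states):
    """Row-normalized counts of day-to-day pattern transitions."""
    counts = np.zeros((n_states, n_states))
    same_year = years[1:] == years[:-1]  # ignore September -> next April
    np.add.at(counts, (states[:-1][same_year], states[1:][same_year]), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

def expected_duration(P):
    """Mean spell length in days for a first-order Markov chain.

    A spell in state i has a geometric length with mean 1 / (1 - p_ii).
    """
    return 1.0 / (1.0 - np.diag(P))

def most_likely_next(P):
    """Most probable successor of each pattern, excluding self-transitions."""
    off_diag = P.copy()
    np.fill_diagonal(off_diag, -1.0)
    return off_diag.argmax(axis=1)

# Toy sequence with three patterns in a single season.
states = np.array([0, 0, 0, 1, 1, 2, 0, 0])
years = np.full(states.size, 1961)

P = transition_matrix(states, years, n_states=3)
print(P)                     # rows: [0.75 0.25 0], [0 0.5 0.5], [1 0 0]
print(expected_duration(P))  # [4. 2. 1.]
print(most_likely_next(P))   # [1 2 0]
```

A larger diagonal entry implies a longer mean spell, which is the sense in which one pattern is described as more persistent than another, and chaining the most probable successors gives a first guess at a propagation sequence.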

In terms of the spatial classification of rainfall, several methods, such as the rotated empirical orthogonal function method, the three-group stepwise discriminant analysis method, and the Ward and KM clustering method, have been applied in previous studies (Liao et al., 1981; Chen et al., 1995; Wang et al., 1998; Xu et al., 2000, 2005). In comparison, the daily rainfall spatial structures obtained from the HMM not only exhibit the features of the rainfall intensity field that the above methods also provide, but also reveal the rainfall occurrence probability. These two fields together display more comprehensive characteristics of each rainfall pattern. In addition, the self-transition and transition probabilities reveal the persistence and propagation of the rain belt, and thus supply richer information about the rain-belt movement. Moreover, the HMM is also applicable to other monsoonal regions (Robertson et al., 2006; Greene et al., 2008).
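The two fields that characterize each pattern can be approximated from decoded labels by simple compositing, as in the sketch below. In an HMM these quantities are emission parameters estimated jointly with the state sequence, so the composite is only a rough stand-in; the function name, the 0.1 mm wet-day threshold, and the toy data are assumptions for illustration.

```python
import numpy as np

def state_composites(rain, states, n_states, wet_threshold=0.1):
    """Per-pattern rainfall occurrence probability and mean wet-day intensity.

    rain:   (n_days, n_stations) daily rainfall in mm
    states: (n_days,) decoded pattern label of each day
    Returns two (n_states, n_stations) arrays; entries are NaN where a
    pattern never occurs or a station has no wet day under that pattern.
    """
    n_stations = rain.shape[1]
    occurrence = np.full((n_states, n_stations), np.nan)
    intensity = np.full((n_states, n_stations), np.nan)
    for k in range(n_states):
        days = rain[states == k]
        if days.shape[0] == 0:
            continue
        wet = days >= wet_threshold
        occurrence[k] = wet.mean(axis=0)           # fraction of wet days
        wet_counts = wet.sum(axis=0)
        wet_totals = np.where(wet, days, 0.0).sum(axis=0)
        intensity[k] = np.divide(wet_totals, wet_counts,
                                 out=np.full(n_stations, np.nan),
                                 where=wet_counts > 0)
    return occurrence, intensity

# Toy data: four days, two stations, two patterns.
rain = np.array([[0.0, 5.0],
                 [2.0, 0.0],
                 [4.0, 1.0],
                 [0.0, 0.0]])
states = np.array([0, 0, 1, 1])

occurrence, intensity = state_composites(rain, states, n_states=2)
print(occurrence)  # [[0.5 0.5], [0.5 0.5]]
print(intensity)   # [[2. 5.], [4. 1.]]
```

Mapping the rows of `occurrence` and `intensity` at the station locations would give one pair of fields per pattern, which is the additional information that intensity-only classification methods do not provide.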

In this paper, the HMM, a climate diagnostic tool that is able to produce both the classification of rainfall patterns and their transition matrix, is successfully applied to the East Asian monsoonal region. The physical meaning of the six rainfall patterns obtained statistically from the HMM has not been fully examined here owing to length limits. It warrants further study, together with other factors such as the circulation background of the rainfall patterns and the relation of their decadal variability to ENSO.

Acknowledgments. We greatly thank the editor and anonymous reviewers for their constructive comments.

References
Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality and persistence of low-frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083–1126. DOI:10.1175/1520-0493(1987)115<1083:CSAPOL>2.0.CO;2
Chen, C., A. M. Greene, A. W. Robertson, et al., 2013: Scenario development for estimating potential climate change impacts on crop production in the North China Plain. Int. J. Climatol., 33, 3124–3140. DOI:10.1002/joc.3648
Chen, L. X., W. Li, P. Zhao, et al., 2000: On the process of summer monsoon onset over East Asia. Climatic Environ. Res., 5, 345–355. DOI:10.3969/j.issn.1006-9585.2000.04.002
Chen, Y. S., N. Shi, and H. B. Liu, 1995: A study of the diagnostic and prognostic method for distributive patterns of summer rainfall in eastern China. Quart. J. Appl. Meteor., 6, 327–332.
Dikbas, F., M. Firat, A. C. Koc, et al., 2012: Classification of precipitation series using fuzzy cluster method. Int. J. Climatol., 32, 1596–1603. DOI:10.1002/joc.2350
Ding, Y. H., 1992: Summer monsoon rainfalls in China. J. Meteor. Soc. Japan, 70, 373–396. DOI:10.2151/jmsj1965.70.1B_373
Ding, Y. H., and M. Katsuhito, 1994: Monsoons in East Asia. China Meteorological Press, Beijing, 74–92.
Ding, Y. H., and H. N. Ma, 1996: The Present Status and Future Research of the East Asian Monsoon. China Meteorological Press, 1–14.
Fasullo, J., and P. J. Webster, 2003: A hydrological definition of Indian monsoon onset and withdrawal. J. Climate, 16, 3200–3211. DOI:10.1175/1520-0442(2003)016<3200a:AHDOIM>2.0.CO;2
Gao, H., W. Jiang, and W. J. Li, 2014: Changed relationships between the East Asian summer monsoon circulations and the summer rainfall in eastern China. J. Meteor. Res., 28, 1075–1084. DOI:10.1007/s13351-014-4327-5
Golian, S., B. Saghafian, S. Sheshangosht, et al., 2010: Comparison of classification and clustering methods in spatial rainfall pattern recognition at Northern Iran. Theor. Appl. Climatol., 102, 319–329. DOI:10.1007/s00704-010-0267-x
Greene, A. M., A. W. Robertson, and S. Kirshner, 2008: Analysis of Indian monsoon daily rainfall on subseasonal to multidecadal timescales using a hidden Markov model. Quart. J. Roy. Meteor. Soc., 134, 875–887. DOI:10.1002/qj.254
Greene, A. M., A. W. Robertson, P. Smyth, et al., 2011: Downscaling projections of Indian monsoon rainfall using a non-homogeneous hidden Markov model. Quart. J. Roy. Meteor. Soc., 137, 347–359. DOI:10.1002/qj.788
Guo, Q. Y., and J. Q. Wang, 1981: Interannual variations of rain spell during predominant summer monsoon over China for recent 30 years. Acta Geogr. Sinica, 36, 187–195. DOI:10.11821/xb198102007
He, J. H., and B. Q. Liu, 2016: The East Asian subtropical summer monsoon: Recent progress. J. Meteor. Res., 30, 135–155. DOI:10.1007/s13351-016-5222-z
Hughes, J. P., and P. Guttorp, 1994: A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Resour. Res., 30, 1535–1546. DOI:10.1029/93WR02983
Jiang, Z. H., J. H. He, J. P. Li, et al., 2006: Northerly advancement characteristics of the East Asian summer monsoon with its interdecadal variations. Acta Geogr. Sinica, 61, 675–686. DOI:10.3321/j.issn:0375-5444.2006.07.001
Kim, B. J., R. H. Kripalani, J. H. Oh, et al., 2002: Summer monsoon rainfall patterns over South Korea and associated circulation features. Theor. Appl. Climatol., 72, 65–74. DOI:10.1007/s007040200013
Kirshner, S., 2005: Modeling of multivariate time series using hidden Markov models. Ph. D. dissertation, University of California, Long Beach, CA, USA, 202 pp.
Kulkarni, A., R. H. Kripalani, and S. V. Singh, 1992: Classification of summer monsoon rainfall patterns over India. Int. J. Climatol., 12, 269–280. DOI:10.1002/joc.3370120304
Kwon, M. H., J. G. Jhun, and K. J. Ha, 2007: Decadal change in East Asian summer monsoon circulation in the mid-1990s. Geophys. Res. Lett., 34, L21706. DOI:10.1029/2007GL031977
Lau, K. M., and S. Yang, 1997: Climatology and interannual variability of the Southeast Asian summer monsoon. Adv. Atmos. Sci., 14, 141–162. DOI:10.1007/s00376-997-0016-y
Li, A. H., and Z. H. Jiang, 2007: Interannual and interdecadal changes of summer rain band propagation over eastern China. J. Nanjing Inst. Meteor., 30, 186–193. DOI:10.3969/j.issn.1674-7097.2007.02.006
Li, C. Y., 2004: New research progress of the intraseasonal oscillation. Pro. Nat. Sci., 14, 734–741.
Liao, Q. S., G. Y. Chen, and G. Z. Chen, 1981: Westerlies Circulation in Northern Hemisphere and Summer Precipitation in China. China Meteorological Press, Beijing, 103–114.
Lyu, J. M., C. W. Zhu, J. H. Ju, et al., 2014: Interdecadal variability in summer precipitation over East China during the past 100 years and its possible causes. Chinese J. Atmos. Sci., 38, 782–794. DOI:10.3878/j.issn.1006-9895.1401.13227
Mares, C., I. Mares, H. Huebener, et al., 2014: A hidden Markov model applied to the daily spring precipitation over the Danube basin. Adv. Meteor. DOI:10.1155/2014/237247
Pineda, L. E., and P. Willems, 2016: Multisite downscaling of seasonal predictions to daily rainfall characteristics over Pacific-Andean river basins in Ecuador and Peru using a nonhomogeneous hidden Markov model. J. Hydrometeor., 17, 481–498. DOI:10.1175/JHM-D-15-0040.1
Ramos, M. C., 2001: Divisive and hierarchical clustering techniques to analyze variability of rainfall distribution patterns in a Mediterranean region. Atmos. Res., 57, 123–138. DOI:10.1016/S0169-8095(01)00065-5
Ren, Y. J., L. C. Song, Z. Y. Wang, et al., 2017: A possible abrupt change in summer precipitation over eastern China around 2009. J. Meteor. Res., 31, 397–408. DOI:10.1007/s13351-016-6021-2
Robertson, A. W., S. Kirshner, and P. Smyth, 2004: Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. J. Climate, 17, 4407–4424. DOI:10.1175/JCLI-3216.1
Robertson, A. W., S. Kirshner, P. Smyth, et al., 2006: Subseasonal-to-interdecadal variability of the Australian monsoon over North Queensland. Quart. J. Roy. Meteor. Soc., 132, 519–542. DOI:10.1256/qj.05.75
Soltani, S., and R. Modarres, 2006: Classification of spatiotemporal pattern of rainfall in Iran using a hierarchical and divisive cluster analysis. J. Spat. Hydrol., 6, 1–12.
Svensson, C., 1999: Empirical orthogonal function analysis of daily rainfall in the upper reaches of the Huai river basin, China. Theor. Appl. Climatol., 62, 147–161. DOI:10.1007/s007040050080
Tan, W. L., F. Yusof, and Z. Yusop, 2013: Non-homogeneous hidden Markov model for daily rainfall amount in peninsular Malaysia. Jurnal Teknologi, 63, 75–80. DOI:10.11113/jt.v63.1916
Tao, S. Y., Y. J. Zhao, and X. M. Chen, 1958: Meiyu Rainfall in China. Meteorological Proceedings of Central Weather Bureau, Beijing, No. 4, 36 pp.
Venkatanagendra, K., and D. Maligelaussenaiah, 2017: Classification of rainfall data using linear Kernel based support vector machines. Int. J. Appl. Eng. Res., 12, 9717–9722. Available at www.ripublication.com. Accessed on 20 May 2018.
Wan, R. J., and G. X. Wu, 2007: Mechanism of the spring persistent rains over southeastern China. Sci. China Ser. D Earth Sci., 50, 130–144. DOI:10.1007/s11430-007-2069-2
Wang, B., and H. Lin, 2002: Rainy season of the Asian–Pacific summer monsoon. J. Climate, 15, 386–398. DOI:10.1175/1520-0442(2002)015<0386:RSOTAP>2.0.CO;2
Wang, S. W., J. L. Ye, D. Y. Gong, et al., 1998: Study on the patterns of summer rainfall in eastern China. Quart. J. Appl. Meteor., 9 (suppl), 65–74.
Wang, S. W., J. N. Cai, J. H. Zhu, et al., 2002: The interdecadal variations of annual precipitation in China during the 1880s–1990s. Acta Meteor. Sinica, 60, 637–639. DOI:10.11676/qxxb2002.076
Woolhiser, D. A., and J. Roldán, 1982: Stochastic daily precipitation models: 2. A comparison of distributions of amounts. Water Resour. Res., 18, 1461–1468. DOI:10.1029/WR018i005p01461
Xu, L., Z. G. Zhao, Y. G. Wang, et al., 2000: A study of summer rainfall patterns in eastern China. Scientia Meteor. Sinica, 20, 270–276. DOI:10.3969/j.issn.1009-0827.2000.03.005
Xu, L., Z. G. Zhao, L. H. Sun, et al., 2005: Distinguishing wetter/drier rainfall patterns in China and analysis of associated characteristics of general circulation. J. Appl. Meteor. Sci., 16, 77–84. DOI:10.3969/j.issn.1001-7313.2005.z1.010
Yao, X. P., and Y. B. Yu, 2005: Activity of dry cold air and its impacts on Meiyu rain during the 2003 Meiyu period. Chinese J. Atmos. Sci., 29, 973–985. DOI:10.3878/j.issn.1006-9895.2005.06.13
Yoo, J. H., A. W. Robertson, and I. S. Kang, 2010: Analysis of intraseasonal and interannual variability of the Asian summer monsoon using a hidden Markov model. J. Climate, 23, 5498–5516. DOI:10.1175/2010JCLI3473.1
Yu, Y. X., S. G. Wang, Z. A. Qian, et al., 2013: Climatic linkages between SHWP position and EASM rainy belts and areas in East China in the summer half year. Plateau Meteor., 32, 1510–1525. DOI:10.7522/j.issn.1000-0534.2013.00033
Zhang, Q. Y., J. M. Lyu, L. M. Yang, et al., 2007: The interdecadal variation of precipitation pattern over China during summer and its relationship with the atmospheric internal dynamic processes and extra-forcing factors. Chinese J. Atmos. Sci., 31, 1290–1300. DOI:10.3878/j.issn.1006-9895.2007.06.23
Zhang, R., Z. J. Dong, J. Z. Min, et al., 2006: A mechanism analyses of East Asia monsoon and its rainfall influencing West Pacific subtropical high activity. J. Basic Sci. Eng., 14, 332–336.
Zhu, C. W., X. J. Zhou, P. Zhao, et al., 2011: Onset of East Asian subtropical summer monsoon and rainy season in China. Sci. China Earth Sci., 54, 1845–1853. DOI:10.1007/s11430-011-4284-0
Zhu, Y. M., and X. Q. Yang, 2003: Relationships between Pacific decadal oscillation (PDO) and climate variabilities in China. Acta Meteor. Sinica, 61, 641–654. DOI:10.3321/j.issn:0577-6619.2003.06.001