In the last decades, various well-known financial models based on the time-homogeneous diffusion processes have arisen. In this work, we are concerned about how to estimate the drift and diffusion coefficients of the processes. We generally consider the time-homogeneous Markov process {Xt} given by the stochastic differential equations
$ {\rm{d}}{X_t} = \mu \left( {{X_t}} \right){\rm{d}}t + \sigma \left( {{X_t}} \right){\rm{d}}{B_t}, $ | (1) |
where {Bt, t>0} is a standard one-dimensional Brownian motion, and μ(·) and σ(·) are the drift and diffusion coefficients, respectively, dependent on {Xt}. There have been a great amount of studies on parametric methods to estimate μ(·) and σ(·)[1, 2, 3]. One of the major limitations of these methods is that the specific parametric forms of μ(·) and σ(·) have to be assumed first. In order to model more complex drift and diffusion functions and reduce the bias of the estimators, nonparametric regression techniques have been introduced in this area. Florens-Zmirou[4] first proposed nonparametric estimation with discrete sampling observations for the diffusion term, while the nonparametric kernel estimators of the drift and diffusion functions of a stationary process based on a set of discrete sampling observations were developed by Jiang and Knight[5].
Stanton[6] constructed the first-, second-, and third-order approximation formulas for μ(·) and σ(·), in which the unknown conditional expectations estimated by Nadaraya-Watson kernel regression were given, and claimed the superiority of higher-order approximations. By deriving the higher-order approximations and computing the asymptotic variances, Fan and Zhang[7] concluded that higher-order approximations could reduce the numerical approximation errors of asymptotic biases, but the asymptotic variances would escalate nearly exponentially with the order of approximation. To overcome this difficulty, they adopted the local polynomial techniques to estimate the drift and diffusion functions and applied such techniques to some well-known short-term interest rate models. It turned out that this approach might produce the negative results for the diffusion functions when the higher-order approximations were used. To ensure the nonnegativity of the diffusion functions, Ziegelmann[8] and Yu and Jones[9] proposed positive nonparametric conditional variance estimators and Xu[10] proposed reweighted functional estimation of the diffusion coefficient based on reweighting the conventional Nadaraya-Watson estimator. In addition, Yu et al.[11] developed the log P-splines and local log-linear method to semiparametrically estimate a class of time-inhomogeneous diffusion processes. Due to the heavy reliance on the logarithmic transformation of the diffusion function, which brings additional bias term, the method is not asymptotically equivalent to the local linear estimator.
In this work, we make use of B-spline basis functions to estimate the drift and diffusion functions based on higher-order approximations, and compare it with the local polynomial techniques. Our simulation results show that the B-spline method indeed outperforms the local polynomial method.
1 The estimators of the drift and diffusion functionsWe introduce the estimators of the drift and diffusion functions based on higher-order approximations[7]. Given a time-homogeneous diffusion process {Xt} satisfying (0.1) and an arbitrary n+1 order continuous differentiable function f(·), the conditional expected increment can be expressed as
$ \begin{array}{*{20}{c}} {E\left[{\left. {f\left( {{X_{t + \mathit{\Delta } }}} \right)-f\left( {{X_t}} \right)} \right|{X_t}} \right] = {\cal L}f\left( {{X_t}} \right)\mathit{\Delta } + }\\ {\frac{1}{2}{{\cal L}^2}f\left( {{X_t}} \right){\mathit{\Delta } ^2} + \cdots + \frac{1}{{n!}}{{\cal L}^n}f\left( {{X_t}} \right){\mathit{\Delta } ^n} + O\left( {{\mathit{\Delta } ^{n + 1}}} \right), } \end{array} $ | (2) |
where the infinitesimal generator
$ {\cal L}f\left( x \right) = f'\left( x \right)\mu \left( x \right) + \frac{1}{2}f''\left( x \right){\sigma ^2}\left( x \right). $ | (3) |
Setting f(x)=x-Xt and (x-Xt)2 in (3), we have
Furthermore, we rewrite (2) at the different time steps jΔ, j=1, …, k, and get a kth-order approximation scheme
$ \frac{1}{\mathit{\Delta } }\sum\limits_{j = 1}^k {{a_{k, j}}E\left[{f\left( {{X_{t + j\mathit{\Delta } }}} \right)-f\left( {{X_t}} \right)\left| {{X_t}} \right.} \right]} \approx {\cal L}f\left( {{X_t}} \right) $ | (4) |
with the approximation error (-1)
$ \begin{array}{*{20}{c}} {\mu \left( {{X_t}} \right) = \frac{1}{\mathit{\Delta } }\sum\limits_{j = 1}^k {{a_{k, j}}E\left( {{X_{t + j\mathit{\Delta } }} - {X_t}\left| {{X_t}} \right.} \right)} + O\left( {{\mathit{\Delta } ^k}} \right), }\\ {{\sigma ^2}\left( {{X_t}} \right) = \frac{1}{\mathit{\Delta } }\sum\limits_{j = 1}^k {{a_{k, j}}E\left( {{{\left( {{X_{t + j\mathit{\Delta } }} - {X_t}} \right)}^2}\left| {{X_t}} \right.} \right)} + }\\ {O\left( {{\mathit{\Delta } ^k}} \right).} \end{array} $ | (5) |
Fan and Zhang[7] proposed the qth-degree (q≥0) local polynomial techniques for estimating the conditional expectations in (5), and computed the asymptotic variances of nonparametric estimators. In this section, we will adopt B-spline techniques to estimate μ(·) and σ2(·).
2.1 Estimation for μ(·) using B-SplineOur goal is to obtain the smooth function μ(x)=E(YiΔ|XiΔ). To this end, we fit the discrete (n-k) pairs of synthetic data (XiΔ, YiΔ), XiΔ∈[α, β], i=1, …, n-k by the model YiΔ=μ(XiΔ)+εi, where
$ {B_{j, 1}}\left( x \right) = \left\{ \begin{array}{l} 1, \;\;\;\;{x_j} < \cdots < {x_{j + 1}}\\ 0, \;\;\;\;{\rm{otherwise}}{\rm{.}} \end{array} \right. $ |
$ \begin{array}{*{20}{c}} {{B_{j, p1}}\left( x \right) = \frac{{x - {x_j}}}{{{x_{j + {p_1} - 1}} - {x_j}}}{B_{j, {p_1} - 1}}\left( x \right) + }\\ {\frac{{{x_{j + {p_1}}} - x}}{{{x_{j + {p_1}}} - {x_{j + 1}}}}{B_{j + 1, {p_1} - 1}}\left( x \right), } \end{array} $ |
for j=1, …, m1 (the convention x/0=0 is used). When p1 is chosen as the order of B-spline basis functions, Bj(x)=Bj, p1(x), j=1, …, m1 for the simplicity.
To avoid the excessively "wiggly" estimated curve
$ {\rm{PENSSE}}\left( {\rm{a}} \right) = {\left( {\mathit{\boldsymbol{Y}} - {\mathit{\boldsymbol{H}}_1}\mathit{\boldsymbol{a}}} \right)^\prime }\left( {\mathit{\boldsymbol{Y}} - {\mathit{\boldsymbol{H}}_1}\mathit{\boldsymbol{a}}} \right) + {\lambda _1}\mathit{\boldsymbol{a'}}{\mathit{\boldsymbol{R}}_1}\mathit{\boldsymbol{a}}, $ | (6) |
where Y=(Y1Δ, …, Y(n-k)Δ)′, R1=
In this work, we choose the best value of λ1 by minimizing GCV(λ1)(generalized cross-validation)
$ {\rm{GCV}}\left( {{\lambda _1}} \right) = \frac{{{\rm{RS}}S\left( {{\lambda _1}} \right)}}{{{{\left\{ {1 - {{\left( {n - k} \right)}^{ - 1}}{\rm{tr}}\left( {{\mathit{\boldsymbol{S}}_{{\lambda _1}}}} \right)} \right\}}^2}}}, $ |
where RSS
$ \hat \mu \left( x \right) = {\mathit{\boldsymbol{B}}_1}\left( x \right)\mathit{\boldsymbol{\hat a}} + {\mathit{\boldsymbol{B}}_1}\left( x \right){\left( {{{\mathit{\boldsymbol{H'}}}_1}{\mathit{\boldsymbol{H}}_1} + {\lambda _1}{\mathit{\boldsymbol{R}}_1}} \right)^{ - 1}}{{\mathit{\boldsymbol{H'}}}_1}\mathit{\boldsymbol{Y}}. $ | (7) |
Similarly, the square of diffusion function σ2(x) can be also expressed as a linear combination of the B-splines Bj(x) in the form σ2(x)=B2(x)c, where c=(c1, …, cm2), B2(x)=(B1(x), …, Bm2(x)), m2=J2+p2, J2 and p2 are the numbers of interval knots and the order of B-spline, respectively. In order to ensure the positivity of σ2(x), we impose the reasonable constraint cj≥0, (j=1, …, m2) due to the property of the coefficients cj of B-spline Bj(x) that there are no more sign changes in σ2(x) than those in the sequence cj.
Under the constraint c≥0, the estimation of c can be achieved by the following optimization problem with a linear inequality constraint
$ \begin{array}{*{20}{c}} {{\rm{minimise}} \to {{\left( {\mathit{\boldsymbol{Z}} - {\mathit{\boldsymbol{H}}_2}\mathit{\boldsymbol{c}}} \right)}^\prime }\left( {\mathit{\boldsymbol{Z}} - {\mathit{\boldsymbol{H}}_2}\mathit{\boldsymbol{c}}} \right) + {\lambda _2}\mathit{\boldsymbol{c'}}{\mathit{\boldsymbol{R}}_2}\mathit{\boldsymbol{c}}}\\ {{\rm{with}}\;\mathit{\boldsymbol{c}} \ge 0, } \end{array} $ | (8) |
where Z=(Z1Δ, …, Z(n-k)Δ)′, ZiΔ=
The smoothing parameter λ2 can be chosen by minimizing GCV(λ2),
$ {\rm{GCV}}\left( {{\lambda _2}} \right) = \frac{{{\rm{RS}}S\left( {{\lambda _2}} \right)}}{{{{\left\{ {1 - {{\left( {n - k} \right)}^{ - 1}}{\rm{tr}}\left( {{\mathit{\boldsymbol{S}}_{{\lambda _2}}}} \right)} \right\}}^2}}}, $ |
where RSS(λ2)=
According to the view of Stanton[6], the approximations to the drift and diffusion functions converge pointwise to μ(·) and σ(·) at a rate Δk, and the first-order approximation performs well when the data are sampled weekly or more frequently. However, as the frequency of data decreases, the approximation error of the first-order approximation increases. In order to reduce asymptotic bias of the approximation error, Fan and Zhang[7] proposed higher-order approximation method. However, the asymptotic variances of such approximations escalate nearly exponentially with the approximations order.
In this work, we balance the asymptotic bias and variance by choosing k=1 for the data sampled weekly or more frequently and opting for the second-order approximation when the sampling frequency decreases or the data over a small range of values.
2.4 Selection of the number of knots and order pRuppert[13] pointed out that the number of the interval knots was not so crucial as the penalty parameter λ. When the chosen number of knot is enough to fit the feature of data, any increase in J will have little effect on the fitting result. Moreover, the computation time will increase as J rises. As a trade-off, Yoshimoto et al.[14] proposed the statistical criterion BIC to select a suitable J. Apart from BIC statistical criterion, one might use a simple default value. In Ref.[13], Matt Wand suggested that min([
To simplify the computation, we set J1=J2=min([
As for the selection of p, we choose p1=p2=4. The reason for such selection is that the simulation results show the order 4 B-splines fit the drift and diffusion functions well enough and the higher order spline functions will increase the computation, which overwhelms the approximation precision.
3 SimulationIn this section, we apply our proposed method to three examples. Then we compare our method with local polynomial method. To assess the performance of the two estimation methods, we use the following two measures[15].
Measure 1 Mean absolute deviation error
$ {\rm{MADE}}\left( {\hat \sigma _t^2} \right) = {m^{ - 1}}\sum\limits_{i = 1}^m {\left| {{Z_{i\mathit{\Delta } }} - \hat \sigma _i^2} \right|}, $ |
Measure 2 Ideal mean absolute deviation error
$ {\rm{IMADE}}\left( {\hat \sigma _t^2} \right) = {m^{ - 1}}\sum\limits_{i = 1}^m {\left| {\hat \sigma _i^2 - \sigma _i^2} \right|}, $ |
where m is the length of the testing sample,
In this example, we consider the well-known CIR SR model
$ {\rm{d}}{X_t} = \left( {\alpha + \beta {X_t}} \right){\rm{d}}t + \sigma X_t^{1/2}{\rm{d}}{B_t}, t \ge {t_0}, $ | (9) |
where {Xt} is a measurable Markov process. Bt is a standard Brownian motion. Our simulation data are generated by using the discrete time order 1.0 strong approximation scheme (see Ref.[16] for details) as follows:
$ \begin{array}{l} {X_{{t_{i + 1}}}} \approx {X_{{t_i}}} + \left[ {\alpha + \beta {X_{{t_i}}} - {4^{ - 1}}{\sigma ^2}} \right]{\mathit\Delta } + \\ \;\;\;\;\;\;{2^{ - 1}}\sigma \left\{ {\left[ {{X_{{t_i}}} + \left( {\alpha + \beta {X_{{t_i}}} - {4^{ - 1}}{\sigma ^2}} \right){\mathit\Delta } + } \right.} \right.\\ \;\;\;\;\;\;\left. {\left. {\sigma \left( {{X_{{t_i}}}} \right)_ + ^{1/2}{\varepsilon _i}\sqrt {\mathit\Delta } } \right]_ + ^{1/2} + \left( {{X_{{t_i}}}} \right)_ + ^{1/2}} \right\}{\varepsilon _i}\sqrt {\mathit\Delta } ,\\ \;\;\;\;\;\;\;\;\;\;\;\;\;{\rm{for}}\;i = 1, \cdots ,n - 1, \end{array} $ | (10) |
where {εi}i=1n-1 are independent and standard normal variables, x+=max(x, 0), the time points t1 < … < tn, are equally spaced, and Δ is the step size.
In order to compare with local polynomial techniques, we adopt the same values of model parameters for simulation: α=0.018 392 5, β=-0.214 59, σ=0.078 30, and Δ=1/52. To begin with, a sample path of length 5 000 is generated by the discretization scheme (10) and setting the initial value X0=0.1, where the first 4 500 observations are in-sample data and the remaining 500 observations out-sample data. Then the estimators of the drift and diffusion functions are calculated respectively through kth-order approximation. Here we choose k=1, as the simulation data are weekly ones. For the purpose of comparison, we simulate 100 sample paths with range interval
Download:
|
|
Solid: true function, dashed-dotted: the median of the estimates, and dashed: the 25th and 75th percentiles of the estimates. Fig. 1 Estimated drift and diffusions among 100 simulations for CIR SR model |
In this example, smoothing parameters λ1 and λ2 are chosen as 1×10-3 and 1×10-4 by minimizing GCV(λ1) and GCV(λ2), respectively. The bandwidth parameter for the local linear estimation is generated from the rule
$ h = C \times {\rm{std}}\left( {{X_\mathit{\Delta } }, \cdots, {X_{\left( {n - k} \right)\mathit{\Delta } }}} \right){n^{ - 1/5}}, $ | (11) |
where C=6. In Table 1 it is shown that the B-spline method possesses certain superior performance because of its lower MADE and IMADE.
3.2 Geometric Brownian motion (GBM) modelWe consider another familiar example of geometric Brownian motion determined by
$ {\rm{d}}{X_t} = \left( {\mu + \frac{1}{2}{\sigma ^2}} \right){X_t}{\rm{d}}t + \sigma {X_t}{\rm{d}}{B_t}, t \ge {t_0}, $ | (12) |
where Bt is a standard one-dimensional Brownian motion. The used parameter values are the same as these in Ref.[7]. μ=0.087, σ=0.178, and the state value X0=1, Δ=1/250, h=2×std(XΔ, …, X(n-k)Δ)n-1/5. Using these values and following the order 1.0 scheme, we generate 1 000 sample paths of length 1 000 observations,
$ \begin{array}{*{20}{c}} {{X_{{t_{i + 1}}}} \approx {X_{{t_i}}} + \left( {\mu + {2^{ - 1}}{\sigma ^2}} \right){X_{{t_i}}}{\mathit\Delta } + \sigma {X_{{t_i}}}{\varepsilon _i}\sqrt {\mathit\Delta } + }\\ {{2^{ - 1}}{\sigma ^2}{X_{{t_i}}}\left( {\varepsilon _i^2 - 1} \right){\mathit\Delta } .} \end{array} $ | (13) |
For each simulation, we set the first 900 observations as in-sample data and the remaining observations as out-sample data. In this example, smoothing parameters λ1=0.01 and λ2=0.005. Other parameters are the same as these in subsection 3.1. The results are summarized in Table 2.
In this part, we generate 600 observations from model (9), and use the first 500 observations as in-sample observations and the remaining as out-sample observations. The simulation data set is presented in Fig. 2(a). Stanton[6] proposed that the first-order approximation had a bad performance over a small range of values or when the sampling frequency decreased. In order to illustrate that the local polynomial techniques are not suitable for the higher-order approximations, we use the second-order approximation to estimate the squared volatility and set h=0.016, λ2=1×10-5. In Fig. 2(b) we display the discrete points of second-order approximation of squared volatility, many of which are negative.The results are shown in Fig. 3 and Table 3. It is seen in Fig. 3 that the fitting results of the squared volatility have negative values at the right boundary when we use the local polynomial method.
Download:
|
|
Fig. 2 Simulation data |
Download:
|
|
Fig. 3 Estimated squared volatility |
This example shows that the restricted B-spline method not only has a lower MADE and IMADE, but also guarantees the nonnegativity of the volatility which is shown in Fig. 3.
4 Application 4.1 Application of the B-spline techniques to HS300 indexTo understand the dynamics of the stock market of China, we apply our proposed techniques to the HS300 index data. The data consist of 2 610 daily observations from 8 April 2005 to 31 December 2015, and are presented in Fig. 4.
Download:
|
|
Fig. 4 Index of HS300 from 2005-04-08 to 2015-12-31 |
Because the data are the daily ones, the first-order approximation is a good choice in the light of the estimation effect and variance inflation. Following the conventional practice in finance research, we first take the logarithmic transformation of the price index and then obtain the range interval
Download:
|
|
Solid—estimator, dashed—95% bootstrap confidence bond. Fig. 5 Estimation of the coefficients of diffusion processes using B-spline |
Regression bootstrap technique is used to calculate the bootstrap confidence intervals (see Ref[17] for details). We denote Sti=Xti+1-Xti for the original HS300 data {Xti, i=1, …, n}, and then generate the bootstrap responses {Sti*} using
We apply the two techniques proposed above to the three-year treasury bond from the secondary market rates on the last trading day of each month and compare the estimation effects of squared volatility between the two methods. The data set contains 753 monthly observations from 30 April 1953 to 31 December 2015, and is presented in Fig. 6.
Download:
|
|
Fig. 6 Three-year treasury bond data set |
Here we use the second-order approximation to estimate the squared volatility of model (1) for the monthly three-year treasury bond data. The estimation of squared volatility with the 95% bootstrap confidence bond based on 1 000 bootstrap samples by the penalized B-spline with nonnegative restriction techniques is displayed in Fig. 7 (the solid curve). In Fig. 7 the dashed-dotted curve denotes the estimation obtained by local polynomial techniques. It is obvious that the squared volatility estimated by local polynomial techniques with h=0.028 9 is negative when the treasury bond rate is equal to 0.16. This result shows that the fitting effect of penalized B-spline with nonnegative restriction techniques is better than that of the local polynomial. Such effect will be very obvious when the higher-order approximation is used. In this study, we choose
Download:
|
|
Fig. 7 Estimated squared volatility with 95% bootstrap confidence bond |
The B-spline method gives a good performance for the drift and volatility estimation. When a higher-order approximation is adopted, the B-spline method is a better choice because it not only has lower MADE and IMADE but also guarantees the nonnegativity of the volatility, while the local polynomial method does not. Our techniques will be extended to time-inhomogeneous diffusion models in the future.
[1] |
Hansen L P, Scheinkman J A. Back to the future:generating moment implications for continuous-time Markov processes[J]. Econometrica, 1995, 63(4):767–804.
DOI:10.2307/2171800 |
[2] |
Pedersen A R. Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes[J]. Bernoulli, 1995, 1(3):257–279.
DOI:10.3150/bj/1193667818 |
[3] |
Aït-Sahalia Yacine. Maximum likelihood estimation of discretely sampled diffusions:a closed-form approximation approach[J]. Econometrica, 2002, 70(1):223–262.
DOI:10.1111/ecta.2002.70.issue-1 |
[4] |
Florens-Zmirou D. On estimating the diffusion coefficient from discrete observations[J]. Journal of Applied Probability, 1993, 30(4):790–804.
DOI:10.1017/S0021900200044570 |
[5] |
Jiang G J, Knight J L. A nonparametric approach to the estimation of diffusion processes:with an application to a short-term interest rate model[J]. Econometric Theory, 1997, 13(5):615–645.
DOI:10.1017/S0266466600006101 |
[6] |
Stanton R. A nonparametric model of term structure dynamics and the market price of interest rate risk[J]. The Journal of Finance, 1997, 52(5):1973–2002.
DOI:10.1111/j.1540-6261.1997.tb02748.x |
[7] |
Fan J, Zhang C. A reexamination of diffusion estimators with applications to financial model validation[J]. Journal of the American Statistical Association, 2003, 98(461):118–134.
DOI:10.1198/016214503388619157 |
[8] |
Ziegelmann F A. Nonparametric estimation of volatility functions:the local exponential estimator[J]. Econometric Theory, 2002, 18(4):985–991.
|
[9] |
Yu K, Jones M C. Likelihood-based local linear estimation of the conditional variance function[J]. Journal of the American Statistical Association, 2004, 99(465):139–144.
DOI:10.1198/016214504000000133 |
[10] |
Xu K L. Reweighted functional estimation of diffusion models[J]. Econometric Theory, 2010, 26(2):541–563.
DOI:10.1017/S0266466609100087 |
[11] |
Yu Y, Yu K, Wang H, et al. Semiparametric estimation for a class of time-inhomogeneous diffusion processes[J]. Statistica Sinica, 2009, 19(2):843–867.
|
[12] |
Ramsay J O, Silverman B W.
Functional data analysis[M]. New York: Springer, 2005.
|
[13] |
Ruppert D. Selecting the number of knots for penalized splines[J]. Journal of computational and graphical statistics, 2002, 11(4):735–757.
DOI:10.1198/106186002853 |
[14] |
Yoshimoto F, Harada T, Yoshimoto Y. Data fitting with a spline using a real-coded genetic algorithm[J]. Computer-Aided Design, 2003, 35(8):751–760.
DOI:10.1016/S0010-4485(03)00006-X |
[15] |
Fan J, Fan Y, Jiang J. Dynamic integration of time-and state-domain methods for volatility estimation[J]. Journal of the American Statistical Association, 2007, 102(478):618–631.
DOI:10.1198/016214507000000176 |
[16] |
Kloeden P E. On effects of discretization on estimators of drift parameters for diffusion processes[J]. Journal of Applied Probability, 1996, 33(4):606–610.
|
[17] |
Franke J, Kreiss J P, Mammen E. Bootstrap of kernel smoothing in nonlinear time series[C]//Bernoulli. Humboldt University of Berlin, Interdisciplinary Research Project 373: Quantification and Simulation of Economic Processes, 1997: 1-37.
http://www.jstor.org/stable/3318641 |