Portfolio selection is a classical problem in finance, which was first addressed and solved by Markowitz[1] using the mean-variance paradigm. This method is still a standard approach today. The mean-variance model rests on estimates of the mean and covariance of the returns of the underlying assets, so when these estimators are inaccurate, the results of the model are very imprecise. Kan and Zhou[2] showed that the estimators of the mean-variance paradigm have shortcomings in large-scale mean-variance models. Michaud[3] showed that estimation errors lead this model to overstate expected returns. To solve these problems, an extensive literature focuses on improving the accuracy of the estimators. For instance, the shrinkage estimation method was proposed by James and Stein[4]. Sharpe[5] and Chan et al.[6] imposed a factor structure on the covariance among assets to reduce the number of free parameters of the covariance matrix. Other methods involve dimension reduction. Ao et al.[7] took advantage of LASSO to achieve dimension reduction. Chen and Yuan[8] provided a framework for portfolio selection under a subspace and adopted a factor model to achieve the global Sharpe ratio. Although the aforementioned methods improve the original method in different respects, two problems remain. First, these methods employ long historical data, which is not compatible with the current rapidly changing market. Second, the number of assets has increased dramatically, so modern investors face far more assets than in the traditional market. The essence of the second problem is feature screening, which was first proposed by Fan and Lyu[9]. They investigated data in which the number of features p is much larger than the number of samples n and proposed a screening method called sure independence screening (SIS) to quickly reduce the ultrahigh dimension to a suitable one.
Following Fan and Lyu[9], screening methods have been explored in two directions: Fan and Song[10] studied a specific model, while Zhu et al.[11] and Liu et al.[12] studied model-free screening. However, SIS methods rest on the premise of independent and identically distributed samples, which does not hold in the financial field. To tackle these problems, Wang et al.[13] proposed an asset selection method based on the high frequency Sharpe ratio and D-SEV, which does not need the assumption of identically distributed samples. They used D-SEV to measure the correlation between stocks and the high frequency Sharpe ratio, where the variables can be seen as time series. Their method is reasonable because D-SEV gives the proportion of the variance explained by the selected assets. Yet it has two disadvantages: a lack of robustness and a high computation cost. To deal with these issues, we propose to use the robust correlation coefficient instead of D-SEV to evaluate the correlation between assets and the high frequency Sharpe ratio index. Simulations and empirical analysis show that our method outperforms D-SEV and other traditional methods in several different situations.
1 High frequency Sharpe ratio index
1.1 Sharpe ratio
The Sharpe ratio, proposed by Sharpe[5], is a well-accepted measure of asset performance in the financial field, defined as
$ \mathrm{SR}_{\mathrm{p}}=\frac{E\left(R_{\mathrm{p}}\right)-R_{\mathrm{f}}}{\sigma_{\mathrm{p}}}, $ (1)
where $R_{\mathrm{f}}$ is the annualized risk-free interest rate, commonly set as the benchmark, $\sigma_{\mathrm{p}}$ is the standard deviation of the annualized return of the portfolio, and $E(R_{\mathrm{p}})$ is the expected annualized rate of return of the portfolio.
The Sharpe ratio measures the excess return earned over the risk-free rate per unit of risk taken. A high risk-adjusted return is the ultimate goal of portfolio construction: a higher Sharpe ratio means a higher excess return for a given amount of risk.
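As a minimal sketch of Eq. (1), the function below computes an annualized Sharpe ratio from a series of daily portfolio returns. The function name, the 252-trading-day annualization and the use of the sample standard deviation are illustrative assumptions, not choices stated in this paper.

```python
import numpy as np

def annualized_sharpe(daily_returns, annual_risk_free=0.0, days_per_year=252):
    """Eq. (1) applied to daily returns (annualization convention is assumed)."""
    r = np.asarray(daily_returns, dtype=float)
    expected_return = r.mean() * days_per_year            # E(R_p), annualized
    volatility = r.std(ddof=1) * np.sqrt(days_per_year)   # sigma_p, annualized
    return (expected_return - annual_risk_free) / volatility

# Hypothetical daily returns of a portfolio.
example_returns = [0.01, -0.01, 0.02, 0.0]
example_sharpe = annualized_sharpe(example_returns, annual_risk_free=0.02)
```

Other annualization conventions (for example compounding daily returns) would give somewhat different values, so comparisons are only meaningful under one fixed convention.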
1.2 High frequency Sharpe ratio
With the development of high frequency data, high frequency stock volatility can be calculated as a proxy for the standard deviation. Integrated volatility can be defined as
$ \mathrm{RV}_{i, t}=\sum\limits_{j=1}^M\left(r_{t, j}^i\right)^2, $ (2)
where M is the number of time intervals into which a trading day is divided, and $r_{t, j}^i$ is the return of stock i over the j-th interval of day t.
Consider stock i=1, 2, …, p. For day t=1, 2, …, T, the high frequency Sharpe ratio of this stock is defined from the daily stock prices and the market index yield:
$ \mathrm{SR}_{i, t}=\frac{R_{i, t}-R_{m, t}}{\sqrt{\mathrm{RV}_{i, t}}}. $ (3)
Here $R_{i, t}$ is the yield of stock i during day t, $R_{m, t}$ is the market index yield during day t, and $\mathrm{RV}_{i, t}$ is the integrated volatility described above. We choose the market index yield instead of the risk-free rate as the benchmark because our portfolio contains stocks only, so the benchmark should contain only stocks as well.
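Eqs. (2) and (3) can be sketched for a single stock and a single day as follows. The function names and the sample numbers are hypothetical, and the intraday returns are assumed to be given already as an array of length M.

```python
import numpy as np

def realized_volatility(intraday_returns):
    """Eq. (2): sum of squared intraday returns of one stock on one day."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))

def high_frequency_sharpe(daily_return, market_return, intraday_returns):
    """Eq. (3): excess return over the market index divided by sqrt(RV)."""
    rv = realized_volatility(intraday_returns)
    return (daily_return - market_return) / np.sqrt(rv)

# Hypothetical day with M = 4 intraday intervals.
intraday = [0.01, -0.02, 0.02, 0.01]
rv = realized_volatility(intraday)
sr = high_frequency_sharpe(daily_return=0.02, market_return=0.005,
                           intraday_returns=intraday)
```

A day on which all intraday returns are zero gives RV = 0 and an undefined ratio, so such days would need separate handling in practice.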
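1.3 Choose assets to construct a return time series of an index
As with the original Sharpe ratio, a higher high frequency Sharpe ratio can be taken as an indication of better performance. Thus, the high frequency Sharpe ratio can serve as a basis for portfolio construction. Following Wang et al.[13], who proposed D-SEV, we first calculate the high frequency Sharpe ratio of each stock and rank the stocks in descending order for each day. The top d stocks are then combined into a portfolio weighted by market value. Writing $d_i$ for the i-th selected stock, $R_{d_i, t}$ for its return and $m_{i, t}$ for its market-value weight on day t, the return of the high frequency Sharpe ratio index on day t is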
$ R_{\mathrm{SR}, t}=\sum\limits_{i=1}^d m_{i, t} R_{d_i, t}. $ (4)
Although the high frequency Sharpe ratio index has an excellent return, it cannot be held directly because the index for a given day is only available after the market closes. Thus we cannot use the high frequency Sharpe ratio of the same day to construct a portfolio. The only data available is the high frequency Sharpe ratio index over past periods.
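The index return in Eq. (4) for one day can be sketched as below, following steps 3) and 4) of the algorithm in Section 2.4. The function name and inputs are hypothetical, and the market-value weights are assumed to be normalized to sum to one over the d selected stocks.

```python
import numpy as np

def hf_sharpe_index_return(sharpe, returns, market_value, d):
    """Eq. (4) for one day: market-value weighted return of the top-d stocks."""
    sharpe = np.asarray(sharpe, dtype=float)
    returns = np.asarray(returns, dtype=float)
    market_value = np.asarray(market_value, dtype=float)
    top = np.argsort(-sharpe)[:d]                           # largest Sharpe ratios first
    weights = market_value[top] / market_value[top].sum()   # assumed normalization
    return float(weights @ returns[top]), top

# Hypothetical cross-section of four stocks on one day.
index_return, selected = hf_sharpe_index_return(
    sharpe=[0.5, -0.1, 1.2, 0.3],
    returns=[0.02, -0.01, 0.03, 0.01],
    market_value=[100.0, 50.0, 300.0, 200.0],
    d=2,
)
```

Because the ranking uses the same day's Sharpe ratios, this return can only be computed after the close, which is exactly why the index cannot be held directly.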
2 Portfolio established by correlation coefficient
As mentioned in the previous section, we can only calculate the high frequency Sharpe ratio index over past periods, so we cannot directly apply this index to filter assets and construct a portfolio. To overcome this, the D-SEV approach calculates the correlation between each stock and the high frequency Sharpe ratio index over a pre-specified period and selects the stocks with higher correlation to construct the portfolio.
2.1 Dependent sure explained variability (D-SEV)
Wang et al.[13] proposed D-SEV as the correlation coefficient for choosing assets. The dependent sure explained variability of (X, Y) is defined as
$ \operatorname{D-SEV}(X, Y)=1-\frac{E\left[\{Y-E(Y \mid X)\}^2\right]}{\operatorname{var}(Y)}. $ (5)
Using D-SEV to measure the correlation between a stock and the index may not be efficient for two reasons. First, D-SEV is not a robust correlation coefficient. Our simulation results show that D-SEV cannot handle abnormal data, which are commonly encountered in real stock market data. For example, significant events can cause dramatic fluctuations in a stock price; these fluctuations are usually short-term and do not affect the future trend of the price, so stocks should not be excluded from the portfolio merely because of them. The second reason is computation time. Since D-SEV relies on kernel estimation of the density function, the estimation process takes a long time with different kinds of kernel functions, which causes difficulties because the number of stocks to be screened is large. To solve these problems, we propose to use a new correlation coefficient proposed by Chatterjee[14] to measure the relationship between the return of a stock and the high frequency Sharpe ratio index and thereby filter assets.
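To give a rough sense of where the kernel cost comes from, the sketch below plugs a generic Nadaraya-Watson estimate of E(Y | X) with a Gaussian kernel into Eq. (5). This is not the estimator of Wang et al.[13]; the function name, the kernel and the fixed bandwidth are assumptions made only for illustration.

```python
import numpy as np

def dsev_kernel(x, y, bandwidth):
    """Plug-in estimate of Eq. (5) using Nadaraya-Watson regression (assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # n-by-n matrix of Gaussian kernel weights: O(n^2) time and memory per stock.
    scaled = (x[:, None] - x[None, :]) / bandwidth
    kernel = np.exp(-0.5 * scaled ** 2)
    fitted = kernel @ y / kernel.sum(axis=1)   # estimate of E(Y | X = x_i)
    return 1.0 - np.mean((y - fitted) ** 2) / np.var(y)

# Hypothetical check on a noiseless linear relationship.
grid = np.linspace(0.0, 1.0, 200)
value = dsev_kernel(grid, 2.0 * grid, bandwidth=0.1)
```

Each evaluation builds an n-by-n weight matrix and uses the raw values of Y, so a single extreme observation shifts both the fitted values and var(Y); any bandwidth search multiplies the cost further.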
2.2 A robust correlation coefficient
Chatterjee[14] proposed a new coefficient of correlation, which is defined as
$ \zeta_n(X, Y)=1-\frac{n \sum\limits_{i=1}^{n-1}\left|r_{i+1}-r_i\right|}{2 \sum\limits_{i=1}^n l_i\left(n-l_i\right)}. $ (6)
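Let the pairs $(X_i, Y_i)$, i=1, 2, …, n, be rearranged as $(X_{(1)}, Y_{(1)}), \ldots, (X_{(n)}, Y_{(n)})$ such that $X_{(1)} \leqslant \cdots \leqslant X_{(n)}$. Then $r_i$ is the number of j such that $Y_{(j)} \leqslant Y_{(i)}$, and $l_i$ is the number of j such that $Y_{(j)} \geqslant Y_{(i)}$, as in Chatterjee[14].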
The robust correlation coefficient is more suitable for choosing assets correlated with the high frequency Sharpe ratio index for two reasons. First, it is a function of ranks, which guarantees robustness. We have discussed the dramatic fluctuations caused by significant events, which can be considered abnormal data. Building the correlation coefficient on ranks partially solves this problem, as we illustrate in the following simulations. Second, the computation time of the new correlation coefficient is much shorter than that of D-SEV, since it does not involve complicated density function estimation.
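A minimal sketch of Eq. (6) is given below, using the rank definitions of Chatterjee[14] (r_i counts the Y values not larger than the i-th one after sorting by X, l_i counts those not smaller). The function name is hypothetical, and ties in X are broken by original order here rather than at random, which is a simplifying assumption.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Rank-based coefficient of Eq. (6); undefined when y is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    y_by_x = y[np.argsort(x, kind="stable")]                 # reorder Y by increasing X
    y_sorted = np.sort(y)
    r = np.searchsorted(y_sorted, y_by_x, side="right")      # #{j: Y_j <= Y_i}
    l = n - np.searchsorted(y_sorted, y_by_x, side="left")   # #{j: Y_j >= Y_i}
    numerator = n * np.abs(np.diff(r)).sum()
    denominator = 2.0 * np.sum(l * (n - l))
    return 1.0 - numerator / denominator

# Hypothetical data: a monotone relationship, then the same data with one
# extreme observation standing in for a significant market event.
x = np.arange(100.0)
clean = chatterjee_xi(x, x ** 2)
y_shocked = x ** 2
y_shocked[50] = 1e9
shocked = chatterjee_xi(x, y_shocked)
```

Only sorting and counting are involved, so the cost is O(n log n) per stock, and the size of the outlier is irrelevant: it only moves one rank, which is the robustness property the text relies on.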
2.3 Establish a portfolio of assets held
In this part, we establish the portfolio selected by the robust correlation coefficient between the return of each stock and the high frequency Sharpe ratio index. To begin with, the robust correlation coefficient between the return of every stock in the market and the high frequency Sharpe ratio index is calculated. The stocks are then ranked by this coefficient, and the r stocks with the largest values of $\zeta_n(X_k, Y)$, k=1, 2, …, p, are chosen to establish a portfolio that is held for a period of time. Markowitz's portfolio theory shows that investors should diversify their holdings to reduce risk while pursuing high returns, so the number of assets held is critical. Theoretically, if the number of assets is small, the degree of diversification is low, resulting in greater volatility; if the number of assets is large, diversification is more effective and volatility is lower, but the return is greatly affected. In this article, a percentage of the total number of assets is used instead of a threshold on the correlation coefficient. The reason is that, according to market experience, the top 1% of assets are generally regarded as high-quality assets. Here we follow the parameter setting of Wang et al.[13], who also use the top 1% of assets to construct the portfolio. The stocks in the portfolio constructed above can be weighted in two ways: equal weights and minimum variance portfolio weights. Both achieve good yields.
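The two weighting schemes can be sketched as follows. The paper does not spell out its minimum variance formulation, so the code assumes the standard unconstrained global minimum variance solution, with weights proportional to the inverse covariance matrix times a vector of ones (short positions allowed); the function names are hypothetical.

```python
import numpy as np

def equal_weights(r):
    """Weight 1/r on each of the r selected stocks."""
    return np.full(r, 1.0 / r)

def min_variance_weights(cov):
    """Global minimum variance weights, w = inv(cov) 1 / (1' inv(cov) 1)."""
    cov = np.asarray(cov, dtype=float)
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))  # avoids an explicit inverse
    return raw / raw.sum()

# Hypothetical covariance of two uncorrelated stocks with variances 1 and 4.
cov_example = np.array([[1.0, 0.0], [0.0, 4.0]])
w_mv = min_variance_weights(cov_example)
w_eq = equal_weights(2)
```

In this example the minimum variance portfolio has variance 0.8, against 1.25 for equal weights. With 30 stocks and a 120 d window the sample covariance is invertible but noisy, which is one practical argument for equal weights.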
2.4 Algorithm
Finally, we present the complete algorithm for establishing the asset portfolio.
1) Calculate the integrated volatility of every stock for every day by Eq. (2).
2) Calculate the high frequency Sharpe ratio of every stock for every day by Eq. (3).
3) For every day, rank the high frequency Sharpe ratios in descending order and choose the top d stocks.
4) Use the d stocks to construct a portfolio weighted by market value as in Eq. (4); the return of this portfolio is the high frequency Sharpe ratio index for that day.
5) Calculate the robust correlation coefficient in Eq. (6) between the return of every asset in the market and the high frequency Sharpe ratio index over z periods of data, where one period is w days.
6) Rank the robust correlation coefficients and select the r assets with the largest values.
7) Establish an equally weighted portfolio of the r assets; this is the portfolio we hold for the period immediately after the z periods in step 5).
8) Repeat steps 5) to 7) to obtain the portfolio held in every period over a long horizon.
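Steps 5) to 8) can be sketched as a rolling loop, assuming the daily index returns from steps 1) to 4) are already available. The function names and the array layout (days in rows, stocks in columns) are hypothetical, and the held return is taken as the daily average over the selected stocks, which assumes daily rebalancing to equal weights.

```python
import numpy as np

def robust_corr(x, y):
    """Eq. (6) computed from ranks of y after sorting by x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    y_by_x = y[np.argsort(x, kind="stable")]
    y_sorted = np.sort(y)
    r = np.searchsorted(y_sorted, y_by_x, side="right")
    l = n - np.searchsorted(y_sorted, y_by_x, side="left")
    return 1.0 - n * np.abs(np.diff(r)).sum() / (2.0 * np.sum(l * (n - l)))

def rolling_selection(stock_returns, index_returns, w, z, r):
    """Steps 5)-8): re-select r stocks every w days from the last z*w days."""
    n_days, n_stocks = stock_returns.shape
    lookback = w * z
    periods = []
    for start in range(lookback, n_days - w + 1, w):
        past = slice(start - lookback, start)
        # Step 5: coefficient between each stock and the index over the window.
        scores = np.array([robust_corr(stock_returns[past, k], index_returns[past])
                           for k in range(n_stocks)])
        chosen = np.argsort(-scores)[:r]                              # step 6
        held = stock_returns[start:start + w, chosen].mean(axis=1)    # step 7
        periods.append((start, chosen, held))
    return periods

# Hypothetical data: 60 days, 5 stocks, stock 0 moves exactly with the index.
rng = np.random.default_rng(0)
index = rng.standard_normal(60)
stocks = rng.standard_normal((60, 5))
stocks[:, 0] = 2.0 * index
result = rolling_selection(stocks, index, w=10, z=3, r=1)
```

Only data strictly before each holding period enters the selection, so the sketch does not use same-day index values that would be unavailable in practice.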
3 Simulation of the correlation coefficient
Before applying the robust correlation coefficient method to construct the portfolio, we examine its performance on simulated data. We first generate a p-dimensional random vector $X=(X_1, X_2, \ldots, X_p)$ with $X \sim N(0, \Sigma)$, where the covariance matrix is $\Sigma=(\sigma_{ij})_{p \times p}$ with $\sigma_{ij}=\sigma^{|i-j|}$. Then we construct two linear models as follows:
$ Y_1=c_1 X_1+c_2 X_2+c_3 X_{12}+c_5 \exp \left(X_{22}\right) \varepsilon, $ (7)
$ Y_2=c_1 X_1+c_2 X_2+c_3 I\left(X_{12}<0\right)+c_4 X_{22}+\varepsilon. $ (8)
We generate n samples to perform the simulation experiments. In Eq. (7), we add an exponential term to compare robustness. In Eq. (8), the indicator function is used to simulate abnormal data.
To avoid highly correlated active features, we choose $X_1$, $X_2$, $X_{12}$ and $X_{22}$ as the active features, as in Li et al.[15]. The random error is $\varepsilon \sim N(0, 1)$. The regression coefficients are set to $(c_1, c_2, c_3, c_4, c_5)=(5, 2, 7, 5, 2)$ as in Wang et al.[13].
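One way to generate a single replication of the data in Eqs. (7) and (8) is sketched below. The function name and seed are hypothetical, the covariance entries are assumed to be σ raised to the power |i-j|, and the same error draw is reused for both models, which the text does not specify.

```python
import numpy as np

def simulate(n, p, sigma, coefs=(5, 2, 7, 5, 2), seed=0):
    """Draw X ~ N(0, Sigma) and the two responses of Eqs. (7) and (8)."""
    if p < 22:
        raise ValueError("p must be at least 22 so that X_22 exists")
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    cov = sigma ** np.abs(idx[:, None] - idx[None, :])   # assumed sigma^{|i-j|}
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    eps = rng.standard_normal(n)
    c1, c2, c3, c4, c5 = coefs
    # Columns are 0-based, so X_1, X_2, X_12, X_22 are columns 0, 1, 11, 21.
    x1, x2, x12, x22 = X[:, 0], X[:, 1], X[:, 11], X[:, 21]
    y1 = c1 * x1 + c2 * x2 + c3 * x12 + c5 * np.exp(x22) * eps    # Eq. (7)
    y2 = c1 * x1 + c2 * x2 + c3 * (x12 < 0) + c4 * x22 + eps      # Eq. (8)
    return X, y1, y2

# One replication with n=50, p=30, sigma=0.8.
X, y1, y2 = simulate(n=50, p=30, sigma=0.8)
```

Repeating this with different seeds and recording how often the four active columns are ranked near the top by each coefficient would give selection frequencies of the kind compared in the simulation.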
We calculate the correlation coefficient between Xi and Yi, set d as
Here we introduce the parameters of the models. In Table 1, parameters 1 are n=50, p=30, σ=0.8, q=100; parameters 2 are n=50, p=50, σ=0.8, q=100; parameters 3 are n=100, p=30, σ=0.8, q=100; and parameters 4 are n=200, p=50, σ=0.8, q=100. Table 2 presents the computation time of RCC and D-SEV under the different parameters. In the tables, RCC denotes the robust correlation coefficient.
The simulation shows that, in general, the robust correlation coefficient performs better in both the accuracy of variable selection and computational cost. Under parameters 1, both Y1 and Y2 have a higher selection probability with the robust correlation coefficient than with D-SEV. The situation is similar under parameters 2. Under parameters 3 and 4, Y1 has a slightly lower selection probability with the robust correlation coefficient than with D-SEV, while Y2 still has a higher one. Meanwhile, the computation time with RCC is much shorter than with D-SEV.
On the simulated data, the performance of our method improves as n increases. However, as n increases, the advantage of our method over D-SEV in the first linear model gradually disappears. As a method based on kernel estimation, D-SEV performs better when the sample size n is large, because kernel estimation relies crucially on sample size. Specifically, when n=200 under parameters 4, our method is worse than D-SEV in the first linear model. We offer two remarks on this. First, the first linear model is a traditional linear model, for which the complex estimation method used by D-SEV does have some advantages over our method when the sample size is large. In the second linear model, however, our method is uniformly superior to D-SEV. The indicator function in the second linear model simulates the significant events mentioned earlier, which is the crucial problem our method is proposed to solve. Second, we consider large n only to complement the simulation settings. Generally speaking, historical data from a long time ago have little impact on the current yield, which indicates that n in a real financial market will not be very large.
Overall, the robust correlation coefficient has advantages in both the accuracy of variable selection and computational cost.
4 Empirical analysis
4.1 Empirical data experiment
In this section, we construct the portfolio by the robust correlation coefficient using actual SSE and SZSE stock market data for 2019 and 2020. To calculate the high frequency Sharpe ratio index, we fix M as 16 and take $R_{m, t}$ as the Shanghai market index yield. The high frequency Sharpe ratio index can then be calculated by Eq. (4). Fig. 1 shows the excellent performance of this index. We then establish the portfolio as described in Section 2.4 and adopt the rolling window method (DeMiguel et al.[16]) to evaluate the performance of our method. In detail, we divide the sample into periods of 20 d, denoted by $P_1, P_2, \ldots, P_{24}$, where $P_1$ stands for the first 20 d. For each t≥7, we use the data in $P_{t-6}$ to $P_{t-1}$, a total of 120 d, to construct the high frequency Sharpe ratio index as Y, and then pick 30 stocks to constitute the portfolio and hold it for 20 d. We choose 30 stocks because this is 1% of the 3 000 assets in total, in line with the risk diversification argument above. The 20 d cycle is similar to the monthly calculation in other articles such as Su and Chen[17], because the stock market is open for about 20 d in a calendar month. The selected stocks are equally weighted in the portfolio. As shown in Fig. 2, the portfolio selected by the robust correlation coefficient earns 8% more excess annualized return than the one selected by D-SEV on the SSE and SZSE data for 2019 and 2020. Another important aspect of portfolio construction is risk, which is emphasized by Markowitz[1] and Sharpe[5]. We therefore calculate the Sharpe ratio of each portfolio as defined in Eq. (1). The Sharpe ratio is commonly used to evaluate the risk and return of a portfolio: at the same level of return, a portfolio with a higher Sharpe ratio bears less risk.
Fig. 1 Comparison between returns of high frequency Sharpe ratio index and returns of Shanghai market index in two years
Fig. 2 Comparison between returns with our method and returns of D-SEV method in two years under parameters: period=20, stocks=30, time series=120
The Sharpe ratio of our method is 1.172, while that of the D-SEV method is 0.018. The computation time of our method is 17.5 s, while that of the D-SEV method is 90 067 s.
4.2 Parameter robustness test
Although our method performs much better than D-SEV under the fixed setting of 30 stocks and a 20 d period, it is also valuable to check whether it achieves similar results under different parameter settings. In this section, we perform parameter adjustment experiments. The results in the following figures show that our method is relatively robust and that its selections perform well.
In Fig. 3(a), we set the period to 18 d with the other parameters unchanged. Under these parameters, the Sharpe ratio of our method is 1.47, while that of the D-SEV method is 1.11. The computation time of our method is 17.9 s, while that of the D-SEV method is 97 091 s.
Fig. 3 Parameter robustness test
In Fig. 3(b), we set the period to 19 d with the other parameters unchanged. Under these parameters, the Sharpe ratio of our method is 1.01, while that of the D-SEV method is -0.71. The computation time of our method is 17.6 s, while that of the D-SEV method is 89 499 s. In Fig. 3(c), we choose 40 stocks to constitute the portfolio with the other parameters unchanged. The Sharpe ratio of our method is 1.11, while that of the D-SEV method is 0.55. In Fig. 3(d), we choose 50 stocks to constitute the portfolio with the other parameters unchanged. The Sharpe ratio of our method is 0.88, while that of the D-SEV method is 0.49. In Fig. 3(e), we use a total of 100 d to construct the high frequency Sharpe ratio index time series as Y with the other parameters unchanged. The Sharpe ratio of our method is 1.91, while that of the D-SEV method is 0.41. In Fig. 3(f), we use a total of 80 d to construct the high frequency Sharpe ratio index time series as Y with the other parameters unchanged. The Sharpe ratio of our method is 1.52, while that of the D-SEV method is 0.63.
In the experiments above, the returns of our method are better than those of D-SEV under different parameters. Although under some parameters the returns of the two methods are similar in the first 14 periods, the Sharpe ratio of our method is clearly higher than that of D-SEV. This indicates that the portfolio of our method has not only higher returns but also lower risk. Our method is very robust in terms of both returns and risk, owing to the robustness of the correlation coefficient discussed in Section 2.2. Meanwhile, our method has a large advantage over D-SEV in computation time.
5 Conclusion
In this paper, we filter assets to establish a portfolio based on the high frequency Sharpe ratio and the robust correlation coefficient. We are among the first to introduce the robust correlation coefficient in asset selection. The robust correlation coefficient guarantees robustness and a lower computation cost. These benefits are demonstrated by our simulation and empirical analysis. Our method provides a robust way to construct an asset portfolio that generates high returns with low risk. Meanwhile, it is relatively easy to implement.
[1] Markowitz H. Portfolio selection[J]. The Journal of Finance, 1952, 7(1): 77-91. DOI:10.1111/j.1540-6261.1952.tb01525.x
[2] Kan R, Zhou G F. Optimal portfolio choice with parameter uncertainty[J]. Journal of Financial and Quantitative Analysis, 2007, 42(3): 621-656. DOI:10.1017/s0022109000004129
[3] Michaud R O. The Markowitz optimization enigma: is 'optimized' optimal?[J]. Financial Analysts Journal, 1989, 45(1): 31-42. DOI:10.2469/faj.v45.n1.31
[4] James W, Stein C. Estimation with quadratic loss[M]. New York, NY: Springer New York, 1992: 443-460. DOI:10.1007/978-1-4612-0919-5_30
[5] Sharpe W F. A simplified model for portfolio analysis[J]. Management Science, 1963, 9(2): 277-293. DOI:10.1287/mnsc.9.2.277
[6] Chan L K C, Karceski J, Lakonishok J. On portfolio optimization: forecasting covariances and choosing the risk model[J]. The Review of Financial Studies, 1999, 12(5): 937-974. DOI:10.1093/rfs/12.5.937
[7] Ao M M, Li Y Y, Zheng X H. Approaching mean-variance efficiency for large portfolios[J]. The Review of Financial Studies, 2018, 32(7): 2890-2919. DOI:10.1093/rfs/hhy105
[8] Chen J Q, Yuan M. Efficient portfolio selection in a large market[J]. Journal of Financial Econometrics, 2016, 14(3): 496-524. DOI:10.1093/jjfinec/nbw003
[9] Fan J Q, Lyu J C. Sure independence screening for ultrahigh dimensional feature space[J]. Journal of the Royal Statistical Society Series B (Statistical Methodology), 2008, 70(5): 849-911. DOI:10.1111/j.1467-9868.2008.00674.x
[10] Fan J Q, Song R. Sure independence screening in generalized linear models with NP-dimensionality[EB/OL]. arXiv: 0903.5255. (2009-03-30)[2012-11-13]. https://arxiv.org/abs/0903.5255.
[11] Zhu L P, Li L X, Li R Z, et al. Model-free feature screening for ultrahigh-dimensional data[J]. Journal of the American Statistical Association, 2011, 106(496): 1464-1475. DOI:10.1198/jasa.2011.tm10563
[12] Liu J Y, Li R Z, Wu R L. Feature selection for varying coefficient models with ultrahigh-dimensional covariates[J]. Journal of the American Statistical Association, 2014, 109(505): 266-274. DOI:10.1080/01621459.2013.850086
[13] Wang C D, Chen Z, Lian Y M, et al. Asset selection based on high frequency Sharpe ratio[J]. Journal of Econometrics, 2022, 227(1): 168-188. DOI:10.1016/j.jeconom.2020.05.007
[14] Chatterjee S. A new coefficient of correlation[J]. Journal of the American Statistical Association, 2021, 116(536): 2009-2022. DOI:10.1080/01621459.2020.1758115
[15] Li R Z, Zhong W, Zhu L P. Feature screening via distance correlation learning[J]. Journal of the American Statistical Association, 2012, 107(499): 1129-1139. DOI:10.1080/01621459.2012.695654
[16] DeMiguel V, Garlappi L, Uppal R. Optimal versus naive diversification: how inefficient is the 1/N portfolio strategy?[J]. The Review of Financial Studies, 2007, 22(5): 1915-1953. DOI:10.1093/rfs/hhm075
[17] Su X K, Chen L M. Asset selection model based on the vaR adjusted high-frequency sharp index[J]. Management Science and Engineering, 2017, 11(1): 67-75. DOI:10.3968/9412