High-dimensional data, often referred to as "small n, large p" data, are now ubiquitous and growing rapidly. Graphical models have been extensively used as a principled tool to characterize the conditional dependence structure among variables, with applications ranging from genetics, proteomics, and brain networks to social networks, online marketing, and portfolio optimization. It is well known that the edges of a Gaussian graphical model (GGM) are encoded by the corresponding entries of the precision matrix [1]. While most existing work concentrates on estimation and entrywise inference for the precision matrix, simultaneous inference methods are generally more useful in practice because they provide valid error control over groups of entries. There is therefore a pressing need for methods that perform inference on groups of entries of the precision matrix.
Individual inference for the precision matrix has been widely studied in the literature. Ref.[2] first advocated multiple testing for conditional dependence in GGMs with false discovery rate control, but that method cannot be used to construct confidence intervals directly. To address this issue, Refs.[3-4] employed the so-called de-biased (or de-sparsified) procedure, which removes the bias of an initial Lasso-type penalized estimator and yields an asymptotically normal distribution for each entry of the precision matrix. The difference is that Ref.[3] adopted the graphical Lasso as the initial Lasso-type penalized estimator, whereas Ref.[4] focused on the nodewise Lasso. Both followed Refs.[5-8], which proposed de-biasing steps for inference in high-dimensional linear models.
While most recent studies have focused on individual inference in the high-dimensional regime, simultaneous inference remains largely unexplored. Refs.[9-11] proposed the multiplier bootstrap method. Building on individual confidence intervals, Ref.[12] constructed simultaneous confidence intervals by applying a bootstrap scheme to high-dimensional linear models. In contrast to the earlier Bonferroni-Holm procedure, this bootstrap method is asymptotically non-conservative because it accounts for the correlation among the test statistics. More recently, Ref.[13] considered combinatorial inference aimed at testing the global structure of the graph, but at a heavy computational cost and only in the Gaussian case.
Motivated by these concerns, we develop a bootstrap-assisted procedure for simultaneous inference on the high-dimensional precision matrix, based on the de-biased nodewise Lasso estimator. Moreover, we summarize a unified framework for performing such inference. Our method follows Ref.[12] but generalizes the bootstrap-assisted scheme to graphical models, and our general theory shows that the method applies whenever the precision matrix estimator satisfies some common conditions. The major contributions of this paper are threefold. First, we develop a bootstrap-assisted procedure for simultaneous inference on the high-dimensional precision matrix that adapts to the dimension of the component of interest and accounts for the dependence among the de-biased nodewise Lasso estimators, which the Bonferroni-Holm procedure cannot do. Second, our method is easy to implement and computationally efficient without loss of accuracy. Third, we provide theoretical guarantees for constructing simultaneous confidence intervals for the precision matrix under a unified framework. We prove that our simultaneous testing procedure asymptotically attains the preassigned significance level even when the model is sub-Gaussian and the dimension grows exponentially with the sample size.
Notations. For a vector $\boldsymbol{x}=(x_{1}, \cdots, x_{p})^{\mathrm{T}}$, denote $\|\boldsymbol{x}\|_{q}=\left(\sum_{i=1}^{p}|x_{i}|^{q}\right)^{1 / q}$ for $1 \leqslant q<\infty$, $\|\boldsymbol{x}\|_{0}$ the number of nonzero entries of $\boldsymbol{x}$, and $\|\boldsymbol{x}\|_{\infty}=\max _{i}|x_{i}|$. For a matrix $\boldsymbol{A}$, $\|\boldsymbol{A}\|_{\max }=\max _{j, k}|A_{j k}|$ denotes the entrywise maximum norm, $\|\boldsymbol{A}\|_{1}=\max _{k} \sum_{j}|A_{j k}|$ the $\ell_{1}$ operator norm, and $\lambda_{\min }(\boldsymbol{A})$ and $\lambda_{\max }(\boldsymbol{A})$ its smallest and largest eigenvalues. We write $[p]=\{1, \cdots, p\}$.
Under the graphical model framework, denote by X an n×p random design matrix with p covariates. Assume that X has independent sub-Gaussian rows X(i); that is, there exists a constant K such that
$ \sup \limits_{\boldsymbol{\alpha} \in \mathbb{R}^{p}:\|\boldsymbol{\alpha}\|_{2} \leqslant 1} \mathbb{E} \exp \left(\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)}\right|^{2} / K^{2}\right) \leqslant 2. $ (1)
Characterizing the distribution of Lasso-type estimators of the precision matrix is difficult because such estimators are biased due to the $\ell_1$ penalization. To address this problem, Refs.[3-4] adopted the de-biasing idea, which starts with the graphical Lasso or nodewise Lasso estimator and then removes its bias. This results in a de-biased estimator of the general form
$ \breve{\boldsymbol{\varTheta}}=\widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right), $

where $\widehat{\boldsymbol{\varSigma}}=\sum_{i=1}^{n} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} / n$ denotes the sample covariance matrix.
Then we have the following estimation error decomposition
$ \begin{aligned} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) &=-\sqrt{n} \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k} \\ &=-\sum\limits_{i=1}^{n}\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right) / \sqrt{n}+\varDelta_{j k}, \end{aligned} $

where $\boldsymbol{\varTheta}_{j}$ denotes the $j$th column of $\boldsymbol{\varTheta}$ and $\varDelta_{j k}$ is a remainder term, which Proposition 2.1 below shows to be asymptotically negligible.
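To make the construction concrete, the de-biasing step admits a one-line implementation. The following minimal sketch (in NumPy; the helper name `debias` and its array arguments are ours, not from the paper) takes an initial estimate and the sample covariance matrix and returns $\breve{\boldsymbol{\varTheta}}$.

```python
import numpy as np

def debias(Theta_hat: np.ndarray, Sigma_hat: np.ndarray) -> np.ndarray:
    """De-biased estimator: Theta_hat - Theta_hat^T (Sigma_hat @ Theta_hat - I_p)."""
    p = Theta_hat.shape[0]
    return Theta_hat - Theta_hat.T @ (Sigma_hat @ Theta_hat - np.eye(p))
```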
Various estimators of $\boldsymbol{\varTheta}$ can serve as the initial estimate $\widehat{\boldsymbol{\varTheta}}$. Here we focus on the nodewise Lasso of Ref.[4], which, for each $j \in[p]$, regresses $\boldsymbol{X}_{j}$ on the remaining columns $\boldsymbol{X}_{-j}$:
$ \widehat{\boldsymbol{\gamma}}_{j}=\arg \min \limits_{\boldsymbol{\gamma} \in \mathbb{R}^{p-1}}\left\{\frac{1}{n}\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \boldsymbol{\gamma}\right\|_{2}^{2}+2 \lambda_{j}\|\boldsymbol{\gamma}\|_{1}\right\}. $
Further we let
$ \widehat{\boldsymbol{\varGamma}}_{j}=\left(-\widehat{\gamma}_{j, 1}, \cdots,-\widehat{\gamma}_{j, j-1}, 1,-\widehat{\gamma}_{j, j+1}, \cdots,-\widehat{\gamma}_{j, p}\right)^{\mathrm{T}}, $

$ \widehat{\tau}_{j}^{2}=\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \widehat{\boldsymbol{\gamma}}_{j}\right\|_{2}^{2} / n+\lambda_{j}\left\|\widehat{\boldsymbol{\gamma}}_{j}\right\|_{1}. $
Then the $j$th column of the nodewise Lasso estimator $\widehat{\boldsymbol{\varTheta}}$ is defined as
$ \widehat{\boldsymbol{\varTheta}}_{j}=\widehat{\boldsymbol{\varGamma}}_{j} / \widehat{\tau}_{j}^{2}. $
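A minimal sketch of this construction is given below, using scikit-learn's `Lasso`. Note that its objective $\frac{1}{2n}\|y-Xw\|_{2}^{2}+\alpha\|w\|_{1}$ has the same minimizer as the nodewise objective above when $\alpha=\lambda_{j}$; the function name and the simplification of a single penalty level for all nodes are our assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X: np.ndarray, lam: float) -> np.ndarray:
    """Nodewise Lasso estimate of the precision matrix, column by column."""
    n, p = X.shape
    Theta_hat = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        # Regress X_j on the remaining columns with an l1 penalty.
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X[:, idx], X[:, j])
        gamma_j = fit.coef_
        resid = X[:, j] - X[:, idx] @ gamma_j
        tau2_j = resid @ resid / n + lam * np.abs(gamma_j).sum()
        Gamma_j = np.ones(p)          # entry j stays 1,
        Gamma_j[idx] = -gamma_j       # the rest is -gamma_hat_j
        Theta_hat[:, j] = Gamma_j / tau2_j
    return Theta_hat
```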
However, the nodewise Lasso estimator $\widehat{\boldsymbol{\varTheta}}$ inherits the bias of the $\ell_1$ penalization, so it cannot be used for inference directly; this is precisely why the de-biasing step above is applied.
We extend the idea of the de-biased nodewise Lasso estimator to construct confidence intervals for any subset of the entries of the precision matrix. Specifically, for an index set $E \subseteq[p] \times[p]$, we are interested in deriving the distribution of
$ T_{E}:=\max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right). $ (2)
Following the idea of Refs.[9-10, 12], we use a bootstrap-assisted scheme to make simultaneous inference for the graphical model. Let $e_{1}, \cdots, e_{n}$ be i.i.d. standard normal multipliers independent of the data, and define the bootstrap statistic
$ W_{E}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n}\right|, $ (3)
where $\widehat{Z}_{i j k}$ is the plug-in counterpart of $Z_{i j k}=-\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right)$, obtained by replacing $\boldsymbol{\varTheta}$ with $\widehat{\boldsymbol{\varTheta}}$. The bootstrap critical value is
$ c_{1-\alpha, E}=\inf \left\{t \in \mathbb{R}: \mathbb{P}_{e}\left(W_{E} \leqslant t\right) \geqslant 1-\alpha\right\}, $ (4)
denoting the $(1-\alpha)$-quantile of the statistic $W_{E}$, where $\mathbb{P}_{e}$ is the probability with respect to the multipliers $e_{1}, \cdots, e_{n}$ only, that is, conditional on the observed data.
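In practice $c_{1-\alpha, E}$ is approximated by Monte Carlo over the multipliers. The sketch below assumes the plug-in form $\widehat{Z}_{i j k}=\widehat{\boldsymbol{\varTheta}}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \widehat{\boldsymbol{\varTheta}}_{k}-\widehat{\varTheta}_{j k}$ (the sign is immaterial inside the absolute value); the function name and the default of $B=1000$ draws are our choices.

```python
import numpy as np

def bootstrap_quantile(X, Theta_hat, E, alpha=0.05, B=1000, seed=None):
    """Approximate c_{1-alpha,E} in (4) by B multiplier bootstrap draws."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    proj = X @ Theta_hat                   # row i equals X_(i)^T Theta_hat
    # Column m of Z_hat holds the plug-in scores Z_hat_{ijk} for the m-th pair in E.
    Z_hat = np.stack([proj[:, j] * proj[:, k] - Theta_hat[j, k] for (j, k) in E],
                     axis=1)
    W = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)         # i.i.d. N(0,1) multipliers
        W[b] = np.max(np.abs(Z_hat.T @ e)) / np.sqrt(n)
    return np.quantile(W, 1 - alpha)
```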
Remark 1.1 The Bonferroni-Holm adjustment states that if an experimenter tests p hypotheses on a set of data, the statistical significance level applied to each hypothesis separately is 1/p times what it would be if only one hypothesis were tested. The bootstrap, by contrast, uses the quantile of the multiplier bootstrap statistic to estimate the quantile of the target statistic asymptotically, and thereby takes the dependence among the test statistics into account. Thus the Bonferroni-Holm-adjusted procedure is on the conservative side, while the bootstrap attains a level closer to the preassigned significance level.
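The following toy comparison (synthetic statistics with a strong common factor; all numbers are illustrative, not from the paper) shows why: under dependence the multiplier bootstrap cutoff is typically well below the Bonferroni cutoff.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p, alpha = 400, 200, 0.05
# p test-statistic scores with pairwise correlation about 0.8.
Z = 0.9 * rng.standard_normal((n, 1)) + np.sqrt(0.19) * rng.standard_normal((n, p))
bonf = norm.ppf(1 - alpha / (2 * p))   # Bonferroni cutoff, ignores the dependence
boot = np.quantile([np.max(np.abs(Z.T @ rng.standard_normal(n))) / np.sqrt(n)
                    for _ in range(2000)], 1 - alpha)
print(bonf, boot)                      # the bootstrap cutoff is noticeably smaller
```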
1.4 A unified theory for confidence intervals
Define the parameter set
$ \mathcal{M}(s)=\left\{\boldsymbol{\varTheta} \in \mathbb{R}^{p \times p}: 1 / L \leqslant \lambda_{\min }(\boldsymbol{\varTheta}) \leqslant \lambda_{\max }(\boldsymbol{\varTheta}) \leqslant L, \max \limits_{j \in[p]}\left\|\boldsymbol{\varTheta}_{j}\right\|_{0} \leqslant s,\|\boldsymbol{\varTheta}\|_{1} \leqslant C\right\} $
for some 1≤L≤C. We can extend the above conclusions to a more general regime. Let $H$ denote the event that a generic estimator $\widehat{\boldsymbol{\varTheta}}$ of the precision matrix satisfies
$ \|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{\max }=O_{p}(\sqrt{\log p / n}), $

$ \|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}=O_{p}(s \sqrt{\log p / n}), $

$ \left\|\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right\|_{\max }=O_{p}(\sqrt{\log p / n}), $
with probability tending to one uniformly over the parameter space $\mathcal{M}(s)$.
Before giving the theoretical properties, we list two technical conditions.
(A1) Assume that the sparsity satisfies $s \log p / \sqrt{n}=o(1)$.
(A2) Assume that $B_{n}^{2}(\log (p n))^{7} / n \leqslant C_{1} n^{-c_{1}}$, where $B_{n} \geqslant 1$ is a sequence of constants and $c_{1}, C_{1}>0$ are constants.
Proposition 2.1 (Lemma 1 of Ref.[4]) Consider the sub-Gaussian model (1), and let the tuning parameters satisfy $\lambda_{j} \asymp \sqrt{\log p / n}$ uniformly in $j \in[p]$. Then
$ \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right)=-\sqrt{n} \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k}, $
where the remainder term satisfies
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left\{\max \limits_{(j, k) \in[p] \times[p]}\left|\varDelta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=0. $
Furthermore, define the variance $\sigma_{j k}^{2}=\operatorname{Var}\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}\right)$ and let $\widehat{\sigma}_{j k}^{2}$ be a consistent estimate of it. Then for every $z \in \mathbb{R}$ and every $(j, k) \in[p] \times[p]$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)-\Phi(z)\right|=0, $
where $\Phi(z)$ denotes the standard normal distribution function.
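As an illustration of this pointwise result, the snippet below builds the individual interval for a single entry. It uses the Gaussian-case variance $\sigma_{j k}^{2}=\varTheta_{j j} \varTheta_{k k}+\varTheta_{j k}^{2}$ as the plug-in estimate, which is an assumption on our part; see Ref.[4] for the general sub-Gaussian form.

```python
import numpy as np
from scipy.stats import norm

def individual_ci(Theta_breve, Theta_hat, j, k, n, alpha=0.05):
    """Pointwise (1-alpha) CI for Theta_jk based on Proposition 2.1."""
    # Gaussian-case plug-in variance (assumption): Theta_jj Theta_kk + Theta_jk^2.
    sigma_jk = np.sqrt(Theta_hat[j, j] * Theta_hat[k, k] + Theta_hat[j, k] ** 2)
    half = norm.ppf(1 - alpha / 2) * sigma_jk / np.sqrt(n)
    return Theta_breve[j, k] - half, Theta_breve[j, k] + half
```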
Based on the asymptotic normality properties established in Proposition 2.1, we have the following simultaneous confidence intervals for multiple entries Θjk.
Theorem 2.1 Assume that conditions (A1)-(A2) hold. Then for any E⊆[p]×[p], we have
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
where ΘE denotes the entries of Θ with indices in E.
Theorem 2.1 states that we can approximate the $(1-\alpha)$-quantile of $\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max }$ by the bootstrap critical value $c_{1-\alpha, E}$. This yields the simultaneous confidence intervals $\left[\breve{\varTheta}_{j k}-c_{1-\alpha, E} / \sqrt{n}, \breve{\varTheta}_{j k}+c_{1-\alpha, E} / \sqrt{n}\right]$, valid uniformly over $(j, k) \in E$.
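Putting the pieces together, a minimal sketch of the resulting intervals reads as follows; the helper name is ours, and `c_boot` is the output of the bootstrap quantile routine sketched above.

```python
import numpy as np

def simultaneous_cis(Theta_breve, c_boot, n, E):
    """Simultaneous CIs [Theta_breve_jk -+ c_{1-alpha,E}/sqrt(n)] over (j,k) in E."""
    half = c_boot / np.sqrt(n)
    return {(j, k): (Theta_breve[j, k] - half, Theta_breve[j, k] + half)
            for (j, k) in E}
```

By Theorem 2.1, all |E| intervals cover their targets simultaneously with asymptotic probability 1-α.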
Next, we extend the above result to the general regime in which the initial estimator is only required to satisfy event H.
Theorem 2.2 Assume that event H holds. Then we have
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
for any E⊆[p]×[p], where ΘE denotes the entries of Θ with indices in E.
Finally, we conclude the unified theory, which covers both individual and simultaneous inference for the precision matrix.
Theorem 2.3 Assume that event H holds. Then we have
(A) (Individual inference)

$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)-\Phi(z)\right|=0, $
where $\widehat{\sigma}_{j k}$ is the variance estimate of Proposition 2.1.
(B) (Simultaneous inference)
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
for any E⊆[p]×[p], where ΘE denotes the entries of Θ with indices in E.
Theorem 2.3 presents general conclusions for both individual and simultaneous confidence intervals. That is, our inferential procedures work with any precision matrix estimation method whose estimation error satisfies event H.
3 Numerical studies
In this section, we investigate the finite sample performance of the proposed methods and compare them with simultaneous confidence intervals for the de-biased graphical Lasso; the two procedures are denoted by S-NL and S-GL, respectively. We present two numerical examples and evaluate the methods by the estimated average coverage probabilities (avgcov) and average confidence interval lengths (avglen) over two index sets: the support set S and its complement Sc. For convenience, we only consider the Gaussian setting. The implementations of the de-biased nodewise Lasso and the de-biased graphical Lasso follow Ref.[4]. Throughout the simulations, the significance level is set at α=0.05, and the coverage probabilities and interval lengths are calculated by averaging over 100 simulation runs and 500 Monte Carlo replications. For additional comparison, we also record individual confidence intervals for the de-biased nodewise Lasso and the de-biased graphical Lasso, denoted by I-NL and I-GL, respectively.
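For reference, the two evaluation metrics can be computed as below; this is our own sketch of the bookkeeping, with hypothetical names, where `cis` maps an index pair to its interval.

```python
import numpy as np

def coverage_and_length(cis, Theta_true, S):
    """avgcov and avglen over an index set S of entry pairs (j, k)."""
    cover = np.mean([cis[e][0] <= Theta_true[e] <= cis[e][1] for e in S])
    length = np.mean([cis[e][1] - cis[e][0] for e in S])
    return cover, length
```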
3.1 Numerical example 1: band structure
We start with a numerical example whose setting is similar to that of Ref.[3]. We consider a precision matrix Θ with band structure, in which the magnitude of the nonzero off-diagonal entries is governed by a parameter ρ.
In terms of avgcov and avglen, it is clear that our proposed S-NL method outperforms the alternatives, with higher avgcov and shorter avglen in most settings. Although the avglen over Sc can be slightly longer in some cases, the coverage probabilities over S approach the nominal level of 95%. Moreover, the advantage becomes more evident as p and ρ increase. Compared with individual confidence intervals, simultaneous confidence intervals have longer lengths and lower coverage probabilities; this is expected, since the multiplicity adjustment inevitably sacrifices some accuracy.
3.2 Numerical example 2: nonband structure
For the second numerical example, we use the same setup as simulation example 1 in Ref.[16] to test the performance of S-NL in more general cases. We generate the precision matrix in two steps. First, we create a band matrix Θ0 the same as that in Section 3.1. Second, we randomly permute the rows and columns of Θ0 to obtain the precision matrix Θ, which no longer has the band structure. Then we sample the rows of the n×p data matrix X as i.i.d. copies from the multivariate Gaussian distribution N(0, Σ), where Σ=Θ-1. Throughout this simulation, we fix the sample size n=200 and the dimensionality p=1 000, and consider a range of ρ=0.2, 0.3, 0.4, 0.5.
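A sketch of this data-generating process is given below. The exact band values are not spelled out in the text, so the tridiagonal form (unit diagonal, ρ on the first off-diagonals) is our assumption, reconstructed from the settings of Refs.[3, 16].

```python
import numpy as np

def make_precision(p, rho, permute=False, seed=None):
    """Band precision matrix (assumed tridiagonal), optionally randomly permuted."""
    rng = np.random.default_rng(seed)
    Theta = np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))
    if permute:
        perm = rng.permutation(p)          # random row/column permutation
        Theta = Theta[np.ix_(perm, perm)]  # destroys the band structure
    return Theta

n, p, rho = 200, 1000, 0.3
Theta = make_precision(p, rho, permute=True, seed=0)
Sigma = np.linalg.inv(Theta)
X = np.random.default_rng(0).multivariate_normal(np.zeros(p), Sigma, size=n)
```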
The simulation results summarized in Table 2 also illustrate that our method achieves the preassigned significance level asymptotically and performs better than the alternatives in most cases. Moreover, our method is quite robust, especially for large ρ.
In this paper, we apply a bootstrap-assisted procedure to make valid simultaneous inference for the high-dimensional precision matrix based on the recent de-biased nodewise Lasso estimator. In addition, we summarize a unified framework for constructing simultaneous confidence intervals for the high-dimensional precision matrix under the sub-Gaussian case. As long as the estimation error conditions of event H are satisfied, our procedure can be combined with different precision matrix estimation methods, which gives it great flexibility. Further, the method can be extended to more general settings, such as the functional graphical model, where the samples consist of functional data. We leave this problem for future investigation.
Appendix
A.1 Preliminaries
We first provide a brief overview of the results for the nodewise Lasso and the Gaussian approximation in the following propositions.
Proposition A.1 (Theorem 1 of Ref.[4], Asymptotic normality) Suppose that the conditions of Proposition 2.1 hold. Then for every $z \in \mathbb{R}$ and every $(i, j) \in[p] \times[p]$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{i j}-\varTheta_{i j}\right) / \sigma_{i j} \leqslant z\right)-\varPhi(z)\right|=0. $
Proposition A.2 (Lemma 2 of Ref.[4], Variance estimation) Suppose that the conditions of Proposition 2.1 hold. Then for every $\eta>0$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left(\max \limits_{i, j=1, \cdots, p}\left|\widehat{\sigma}_{i j}^{2}-\sigma_{i j}^{2}\right| \geqslant \eta\right)=0. $
Proposition A.3 (Corollary 2.1 of Ref.[9], Gaussian approximation) Let $\boldsymbol{x}_{1}, \cdots, \boldsymbol{x}_{n}$ be independent centered random vectors in $\mathbb{R}^{p}$, let $T_{0}=\max _{j \in[p]} \sum_{i=1}^{n} x_{i j} / \sqrt{n}$, and let $Z_{0}$ be the corresponding maximum for independent Gaussian vectors with the same means and covariances. Suppose that there exist constants $c_{1}, C_{1}, c_{2}, C_{2}>0$ such that, for all $j \in[p]$,
$ c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} x_{i j}^{2} / n \leqslant C_{1}, $

$ \max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|x_{i j}\right|^{2+r} / B_{n}^{r}\right) / n+\mathbb{E}\left(\exp \left(\left|x_{i j}\right| / B_{n}\right)\right) \leqslant 4, $
and $B_{n}^{4}(\log (p n))^{7} / n \leqslant C_{2} n^{-c_{2}}$, where $B_{n} \geqslant 1$ is a sequence of constants. Then there exist constants $c>0$ and $C>0$, depending only on $c_{1}$, $C_{1}$, $c_{2}$, and $C_{2}$, such that
$ \rho:=\sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(T_{0} \leqslant t\right)-\mathbb{P}\left(Z_{0} \leqslant t\right)\right| \leqslant C n^{-c} \rightarrow 0. $
A.2 Proof of Theorem 2.1
Without loss of generality, we set E=[p]×[p]. For any (j, k)∈E, define
$ T_{E}:=\max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right), \quad W_{E}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n}\right|, $
$ T_{0}:=\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n}, \quad W_{0}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} Z_{i j k} e_{i} / \sqrt{n}\right|, $
where $Z_{i j k}=-\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right)$ and $\widehat{Z}_{i j k}$ is its plug-in counterpart as before. By Lemmas A.3 and A.4, the quantiles of the bootstrap statistics can be compared as follows:
$ \mathbb{P}\left(c_{1-\alpha, W} \leqslant c_{\left(1-\alpha+\xi_{2}\right), W_{0}}+\xi_{1}\right) \geqslant 1-\xi_{2}, $

$ \mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+\pi(\nu), W}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu). $
Combining these quantile comparison bounds with the Gaussian approximation of Proposition A.3 and Lemma A.4, and letting $n \rightarrow \infty$, we obtain
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
which concludes the proof.
A.3 Proof of Theorem 2.3
To enhance readability, we split the proof into three steps: bounding the bias term, establishing asymptotic normality, and verifying the variance consistency.
Step 1 By the definition of $\breve{\boldsymbol{\varTheta}}$ and the symmetry of $\boldsymbol{\varTheta}$, we have the decomposition
$ \begin{aligned} \breve{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}=& \widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right)-\boldsymbol{\varTheta} \\ =&-\boldsymbol{\varTheta}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}-\left(\boldsymbol{\varTheta} \widehat{\boldsymbol{\varSigma}}-\boldsymbol{I}_{p}\right)(\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})-(\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right) \\ =&: \boldsymbol{Z}+\boldsymbol{\varDelta}_{1}+\boldsymbol{\varDelta}_{2}. \end{aligned} $
For the bias terms, it follows from event $H$, Lemma A.6, and $\|\boldsymbol{\varTheta}\|_{1} \leqslant C$ that $\left\|\boldsymbol{\varDelta}_{1}\right\|_{\max } \leqslant\left\|\boldsymbol{\varTheta} \widehat{\boldsymbol{\varSigma}}-\boldsymbol{I}_{p}\right\|_{\max }\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}=O_{p}(s \log p / n)$ and $\left\|\boldsymbol{\varDelta}_{2}\right\|_{\max } \leqslant\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}\left\|\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right\|_{\max }=O_{p}(s \log p / n)$, so that $\sqrt{n}\left\|\boldsymbol{\varDelta}_{1}+\boldsymbol{\varDelta}_{2}\right\|_{\max }=O_{p}(s \log p / \sqrt{n})=o_{p}(1)$.
Step 2 The claim follows directly from Theorem 1 of Ref.[4].
Step 3 The claim follows directly from Lemma 2 of Ref.[4].
The remainder of the proof is similar to that of Theorem 2.1, so we omit the details.
A.4 Lemmas and their proofs
The following lemmas will be used in the proof of the main theorem.
Lemma A.1 Assume that conditions (A1)-(A4) hold. Then for any E⊆[p]×[p] we have
$ \sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n} \leqslant t\right)-\mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Y_{i j k} / \sqrt{n} \leqslant t\right)\right| \leqslant C_{0} n^{-c_{0}}, $
where {Yijk}(j, k)∈E are Gaussian analogs of {Zijk}(j, k)∈E in the sense of sharing the same mean and covariance for i=1, 2, ⋯, n.
Proof The proof proceeds by verifying the conditions of Corollary 2.1 of Ref.[9]. Specifically, we need to verify the following condition (E.1):
$ c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} Z_{i j k}^{2} / n \leqslant C_{1}, $

$ \max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}\right) / n+\mathbb{E}\left(\exp \left(\left|Z_{i j k}\right| / B_{n}\right)\right) \leqslant 4, $
where $B_{n} \geqslant 1$ is the sequence of constants in condition (A2). The first bound follows from the eigenvalue condition on $\mathcal{M}(s)$, and by Lemma A.8 the variables $Z_{i j k}$ are sub-exponential, so that
$ \max \limits_{r=1,2} \mathbb{E}\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}+\mathbb{E} \exp \left(\left|Z_{i j k}\right| / B_{n}\right) \leqslant 4, $
which concludes the proof.
Lemma A.2 Let $\boldsymbol{V}$ and $\boldsymbol{Y}$ be centered Gaussian random vectors in $\mathbb{R}^{p}$ with covariance matrices $\boldsymbol{\varSigma}^{V}$ and $\boldsymbol{\varSigma}^{Y}$, respectively. Then
$ \sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} V_{j} \leqslant t\right)-\mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} Y_{j} \leqslant t\right)\right| \leqslant C \varDelta_{0}^{1 / 3}\left(1 \vee \log \left(p / \varDelta_{0}\right)\right)^{2 / 3}, $
where $\varDelta_{0}=\max _{1 \leqslant j, k \leqslant p}\left|\varSigma_{j k}^{V}-\varSigma_{j k}^{Y}\right|$ and $C>0$ is a universal constant.
Proof The proof is the same as that of Lemma 3.1 of Ref.[9].
Lemma A.3 Suppose that there are some constants $0<c_{1}<C_{1}$ such that $c_{1} \leqslant \sum_{i=1}^{n} \mathbb{E} Z_{i j k}^{2} / n \leqslant C_{1}$ uniformly over $(j, k) \in E$. Then for every $\nu>0$,
$ \mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+\pi(\nu), Y_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu), $

$ \mathbb{P}\left(c_{1-\alpha, Y_{0}} \leqslant c_{1-\alpha+\pi(\nu), W_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu). $
Proof Recall that, conditionally on the data, $W_{0}$ is the maximum of a centered Gaussian vector, while $Y_{0}$ is its Gaussian analog based on the population covariance. The claim follows from the comparison bound of Lemma A.2, applied conditionally with $\varDelta_{0}$ replaced by $\Gamma$; we omit the details.
Lemma A.4 Assume that conditions (A1)-(A4) hold. Then for any (j, k)∈E we have
$ \mathbb{P}\left(\left|T_{E}-T_{0}\right|>\xi_{1}\right)<\xi_{2}, $

$ \mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right)<\xi_{2}, $
where $\xi_{1}=o(1)$, $\xi_{2}=o(1)$, and $\mathbb{P}_{e}$ denotes the probability with respect to the multipliers $e_{1}, \cdots, e_{n}$ only.
Proof
Bounds for $\left|T_{E}-T_{0}\right|$: Recall that $\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right)=\sum_{i=1}^{n} Z_{i j k} / \sqrt{n}+\varDelta_{j k}$, so that
$ \left|T_{E}-T_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right|. $
It follows from Proposition 2.1 that
$ \mathbb{P}\left\{\max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=o(1), $

so we may take $\xi_{1}=O(s \log p / \sqrt{n})=o(1)$ and $\xi_{2}=o(1)$.
Bounds for $\left|W_{E}-W_{0}\right|$: By the triangle inequality,

$ \left|W_{E}-W_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right|. $
Let $A_{n}=\left|\sum_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right|$ for a fixed pair $(j, k) \in E$. By Jensen's inequality,
$ \begin{array}{l} \mathbb{E}\left(A_{n}\right) \leqslant \mathbb{E}_{X} \sqrt{\mathbb{E}_{e}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right]^{2}} \leqslant\\ \mathbb{E}_{X} \sqrt{\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n} \leqslant \sqrt{\mathbb{E}_{X}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n\right]}. \end{array} $ |
By Lemma A.5, the right-hand side is $o(1)$, so we may choose $\xi_{1}=o(1)$ and $\xi_{2}=o(1)$ such that $\mathbb{P}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right) \leqslant \xi_{2}^{2}$. Then by Markov's inequality,
$ \begin{array}{c} \mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right) \leqslant \\ \mathbb{E}\left[\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)\right] / \xi_{2}= \\ \mathbb{P}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right) / \xi_{2} \leqslant \xi_{2}^{2} / \xi_{2}=\xi_{2}, \end{array} $ |
which concludes the proof.
Lemma A.5 Under the conditions of Lemma A.4, we have
$ \max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=o_{p}(1). $
Proof
Since $(a-b)^{2} \leqslant 2\left(a^{2}+b^{2}\right)$, we can bound $\sum_{i=1}^{n}(\widehat{Z}_{i j k}-Z_{i j k})^{2} / n$ by $2\left(I_{1}+I_{2}\right)$, where $I_{1}=\sum_{i=1}^{n}\left(\widehat{\boldsymbol{\varTheta}}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \widehat{\boldsymbol{\varTheta}}_{k}-\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}\right)^{2} / n$ and $I_{2}=\left(\widehat{\varTheta}_{j k}-\varTheta_{j k}\right)^{2}$.
For the first part, it follows from the triangle inequality, the event $H$ bound on $\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}$, and Lemma A.7 that $\left|I_{1}\right| \leqslant O_{p}\left(s^{2} \log p \log (n p) / n\right)$.
For the second part, it is obvious that
$ \left|I_{2}\right| \leqslant O_{p}(\log p / n), $
which is a direct result of event H.
Combining the two parts, we conclude that $\max _{(j, k) \in E} \sum_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=O_{p}\left(s^{2} \log p \log (n p) / n\right)=o_{p}(1)$.
Lemma A.6 Assume that conditions (A1)-(A4) hold. Let $\widehat{\boldsymbol{\varSigma}}=\sum_{i=1}^{n} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} / n$ be the sample covariance matrix. Then
$ \|\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}\|_{\max } \leqslant O_{p}(\sqrt{\log p / n}). $
Proof The proof is the same as that of Lemma L.3 of Ref.[13], which follows by invoking the inequality
$ \left\|\boldsymbol{X}_{i} \boldsymbol{X}_{j}\right\|_{\psi_{1}} \leqslant 2\left\|\boldsymbol{X}_{i}\right\|_{\psi_{2}}\left\|\boldsymbol{X}_{j}\right\|_{\psi_{2}} \leqslant 2 c^{-2}, $
and Proposition 5.16 in Ref.[17] and the union bound.
Lemma A.7 Let $\boldsymbol{X}_{(1)}, \cdots, \boldsymbol{X}_{(n)}$ be independent sub-Gaussian random vectors satisfying (1). Then
$ \max \limits_{i \in[n]}\left\|\boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}}\right\|_{\max }=O_{p}(\log (n p)). $
Proof The proof is the same as that of Lemma L.5 of Ref.[13]. It follows from the fact that the entries of $\boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}}$ are sub-exponential, combined with a union bound over the $n p^{2}$ entries.
Lemma A.8 Let $\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^{p}$ satisfy $\|\boldsymbol{\alpha}\|_{2} \leqslant M$ and $\|\boldsymbol{\beta}\|_{2} \leqslant M$, and let $\boldsymbol{X}_{(i)}$ satisfy (1). Then for all integers $r \geqslant 1$,
$ \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /\left(2 M^{2} K^{2}\right)^{r} \leqslant r ! / 2. $
Proof The proof is the same as that of Lemma 5 of Ref.[3]. Since $\|\boldsymbol{\alpha} / M\|_{2} \leqslant 1$ and $\|\boldsymbol{\beta} / M\|_{2} \leqslant 1$, the sub-Gaussian condition (1) implies
$ \mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}} \leqslant 2 \text { and } \mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}} \leqslant 2. $
By the inequality $a b \leqslant a^{2} / 2+b^{2} / 2$ (for any $a, b \in \mathbb{R}$) and the Cauchy-Schwarz inequality, we have
$ \mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}} \leqslant \mathbb{E}\left[\mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /\left(2(M K)^{2}\right)} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /\left(2(M K)^{2}\right)}\right] \leqslant\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}}\right\}^{1 / 2}\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}}\right\}^{1 / 2} \leqslant 2. $
By the Taylor expansion, we have the inequality
$ 1+\frac{1}{r !} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}}. $
Next, it follows that
$ \begin{array}{c} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\ 2^{r-1} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\ 2^{r-1} r !\left(\mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}}-1\right) \leqslant 2^{r-1} r !=\frac{r !}{2} 2^{r}. \end{array} $
Therefore, we have
$ \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /\left(2 M^{2} K^{2}\right)^{r} \leqslant \frac{r !}{2}. $
References
[1] Lauritzen S L. Graphical models[M]. New York: Oxford University Press, 1996.
[2] Liu W. Gaussian graphical model estimation with false discovery rate control[J]. Ann Statist, 2013, 41(6): 2948-2978.
[3] Janková J, van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation[J]. Electron J Statist, 2015, 9: 1205-1229.
[4] Janková J, van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation[J]. Test, 2017, 26(1): 143-162. DOI: 10.1007/s11749-016-0503-5.
[5] Bühlmann P. Statistical significance in high-dimensional linear models[J]. Bernoulli, 2013, 19(4): 1212-1242.
[6] Javanmard A, Montanari A. Confidence intervals and hypothesis testing for high-dimensional regression[J]. J Mach Learn Res, 2014, 15: 2869-2909.
[7] van de Geer S, Bühlmann P, Ritov Y, et al. On asymptotically optimal confidence regions and tests for high-dimensional models[J]. Ann Statist, 2014, 42(3): 1166-1202.
[8] Zhang C, Zhang S. Confidence intervals for low dimensional parameters in high dimensional linear models[J]. J R Stat Soc Ser B Stat Methodol, 2014, 76(1): 217-242. DOI: 10.1111/rssb.12026.
[9] Chernozhukov V, Chetverikov D, Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors[J]. Ann Statist, 2013, 41(6): 2786-2819.
[10] Chernozhukov V, Chetverikov D, Kato K. Comparison and anti-concentration bounds for maxima of Gaussian random vectors[J]. Probab Theory Related Fields, 2015, 162: 47-70. DOI: 10.1007/s00440-014-0565-9.
[11] Chernozhukov V, Chetverikov D, Kato K. Central limit theorems and bootstrap in high dimensions[J]. Ann Probab, 2017, 45(4): 2309-2352.
[12] Zhang X, Cheng G. Simultaneous inference for high-dimensional linear models[J]. J Amer Statist Assoc, 2017, 112: 757-768.
[13] Neykov M, Lu J, Liu H. Combinatorial inference for graphical models[J]. Ann Statist, 2019, 47(6): 795-827.
[14] Cai T, Liu W, Luo X. A constrained l1 minimization approach to sparse precision matrix estimation[J]. J Amer Statist Assoc, 2011, 106(494): 594-607. DOI: 10.1198/jasa.2011.tm10155.
[15] Liu W D, Luo X. Fast and adaptive sparse precision matrix estimation in high dimensions[J]. J Multivariate Anal, 2015, 135: 153-162.
[16] Fan Y, Lv J. Innovated scalable efficient estimation in ultra-large Gaussian graphical models[J]. Ann Statist, 2016, 44(5): 2098-2126.
[17] Vershynin R. Introduction to the non-asymptotic analysis of random matrices[M]//Eldar Y C, Kutyniok G. Compressed Sensing: Theory and Applications. Cambridge: Cambridge University Press, 2012.