High-dimensional data, often referred to as "small n, large p" data, are now ubiquitous and growing rapidly. Graphical models have been extensively used as a principled tool to characterize the conditional dependence structure among variables, with applications ranging from genetics, proteomics, and brain networks to social networks, online marketing, and portfolio optimization. It is well known that the edges of a Gaussian graphical model (GGM) are encoded by the corresponding entries of the precision matrix [1]. While most existing work concentrates on estimation and entrywise inference for the precision matrix, simultaneous inference methods are generally more useful in practice because they provide valid error control over groups of entries. There is therefore a pressing need for methods that perform inference on groups of entries of the precision matrix.
Individual inference for the precision matrix has been widely studied in the literature. Ref.[2] first advocated multiple testing for conditional dependence in GGMs with false discovery rate control, but that method cannot be used to construct confidence intervals directly. To address this issue, Refs.[3-4] employed the so-called de-biased (or de-sparsified) procedure, which removes the bias of an initial Lasso-type penalized estimator and yields an asymptotically normal distribution for each entry of the precision matrix. The difference is that Ref.[3] adopted the graphical Lasso as the initial Lasso-type penalized estimator, whereas Ref.[4] focused on the nodewise Lasso. Both followed Refs.[5-8], which proposed de-biasing steps for inference in high-dimensional linear models.
While most recent studies have focused on individual inference in the high-dimensional regime, simultaneous inference remains largely unexplored. Refs.[9-11] proposed the multiplier bootstrap method. Building on individual confidence intervals, Ref.[12] constructed simultaneous confidence intervals by applying a bootstrap scheme to high-dimensional linear models. In contrast to the earlier Bonferroni-Holm procedure, this bootstrap method is asymptotically non-conservative because it accounts for the correlation among the test statistics. More recently, Ref.[13] considered combinatorial inference aimed at testing the global structure of the graph, but at a heavy computational cost and only in the Gaussian case.
Motivated by these concerns, we develop a bootstrap-assisted procedure for simultaneous inference on the high-dimensional precision matrix, based on the de-biased nodewise Lasso estimator. Moreover, we summarize a unified framework for performing such inference. Our method follows Ref.[12] but generalizes the bootstrap-assisted scheme to graphical models, and our general theory shows that the method applies whenever the precision matrix estimator satisfies some common conditions. The major contributions of this paper are threefold. First, we develop a bootstrap-assisted procedure for simultaneous inference on the high-dimensional precision matrix that adapts to the dimension of the component of interest and accounts for the dependence among the de-biased nodewise Lasso estimators, which the Bonferroni-Holm procedure cannot do. Second, our method is easy to implement and computationally efficient without loss of accuracy. Third, we provide theoretical guarantees for constructing simultaneous confidence intervals for the precision matrix under a unified framework. We prove that our simultaneous testing procedure asymptotically attains the preassigned significance level even when the model is sub-Gaussian and the dimension grows exponentially with the sample size.
Notations. For a vector $\boldsymbol{x}=(x_{1}, \cdots, x_{p})^{\mathrm{T}}$, denote $\|\boldsymbol{x}\|_{q}=\left(\sum_{i=1}^{p}|x_{i}|^{q}\right)^{1 / q}$ for $1 \leqslant q<\infty$, $\|\boldsymbol{x}\|_{0}$ the number of nonzero entries of $\boldsymbol{x}$, and $\|\boldsymbol{x}\|_{\infty}=\max _{i}|x_{i}|$. For a matrix $\boldsymbol{A}$, $\|\boldsymbol{A}\|_{\max }=\max _{j, k}|A_{j k}|$ denotes the entrywise maximum norm, $\|\boldsymbol{A}\|_{1}=\max _{k} \sum_{j}|A_{j k}|$ the $\ell_{1}$ operator norm, and $\lambda_{\min }(\boldsymbol{A})$ and $\lambda_{\max }(\boldsymbol{A})$ its smallest and largest eigenvalues. We write $[p]=\{1, \cdots, p\}$.
Under the graphical model framework, denote by X an n×p random design matrix with p covariates. Assume that X has independent sub-Gaussian rows X(i); that is, there exists a constant K such that
$ \sup \limits_{\boldsymbol{\alpha} \in \mathbb{R}^{p}:\|\boldsymbol{\alpha}\|_{2} \leqslant 1} \mathbb{E} \exp \left(\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)}\right|^{2} / K^{2}\right) \leqslant 2. $ (1)
Characterizing the distribution of Lasso-type estimators of the precision matrix is difficult because such estimators are biased due to the $\ell_1$ penalization. To address this problem, Refs.[3-4] adopted the de-biasing idea, which starts with the graphical Lasso or nodewise Lasso estimator and then removes its bias. This results in a de-biased estimator of the general form
$ \breve{\boldsymbol{\varTheta}}=\widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right), $

where $\widehat{\boldsymbol{\varSigma}}=\sum_{i=1}^{n} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} / n$ denotes the sample covariance matrix.
Then we have the following estimation error decomposition
$ \begin{aligned} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) &=-\sqrt{n} \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k} \\ &=-\sum\limits_{i=1}^{n}\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right) / \sqrt{n}+\varDelta_{j k}, \end{aligned} $

where $\boldsymbol{\varTheta}_{j}$ denotes the $j$th column of $\boldsymbol{\varTheta}$ and $\varDelta_{j k}$ is a remainder term, which Proposition 2.1 below shows to be asymptotically negligible.
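To make the construction concrete, the de-biasing step admits a one-line implementation. The following minimal sketch (in NumPy; the helper name `debias` and its array arguments are ours, not from the paper) takes an initial estimate and the sample covariance matrix and returns $\breve{\boldsymbol{\varTheta}}$.

```python
import numpy as np

def debias(Theta_hat: np.ndarray, Sigma_hat: np.ndarray) -> np.ndarray:
    """De-biased estimator: Theta_hat - Theta_hat^T (Sigma_hat @ Theta_hat - I_p)."""
    p = Theta_hat.shape[0]
    return Theta_hat - Theta_hat.T @ (Sigma_hat @ Theta_hat - np.eye(p))
```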
Various estimators of $\boldsymbol{\varTheta}$ can serve as the initial estimate $\widehat{\boldsymbol{\varTheta}}$. Here we focus on the nodewise Lasso of Ref.[4], which, for each $j \in[p]$, regresses $\boldsymbol{X}_{j}$ on the remaining columns $\boldsymbol{X}_{-j}$:
$ \widehat{\boldsymbol{\gamma}}_{j}=\arg \min \limits_{\boldsymbol{\gamma} \in \mathbb{R}^{p-1}}\left\{\frac{1}{n}\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \boldsymbol{\gamma}\right\|_{2}^{2}+2 \lambda_{j}\|\boldsymbol{\gamma}\|_{1}\right\}. $
Further we let
$ \widehat{\boldsymbol{\varGamma}}_{j}=\left(-\widehat{\gamma}_{j, 1}, \cdots,-\widehat{\gamma}_{j, j-1}, 1,-\widehat{\gamma}_{j, j+1}, \cdots,-\widehat{\gamma}_{j, p}\right)^{\mathrm{T}}, $

$ \widehat{\tau}_{j}^{2}=\left\|\boldsymbol{X}_{j}-\boldsymbol{X}_{-j} \widehat{\boldsymbol{\gamma}}_{j}\right\|_{2}^{2} / n+\lambda_{j}\left\|\widehat{\boldsymbol{\gamma}}_{j}\right\|_{1}. $
Then the $j$th column of the nodewise Lasso estimator $\widehat{\boldsymbol{\varTheta}}$ is defined as
$ \widehat{\boldsymbol{\varTheta}}_{j}=\widehat{\boldsymbol{\varGamma}}_{j} / \widehat{\tau}_{j}^{2}. $
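A minimal sketch of this construction is given below, using scikit-learn's `Lasso`. Note that its objective $\frac{1}{2n}\|y-Xw\|_{2}^{2}+\alpha\|w\|_{1}$ has the same minimizer as the nodewise objective above when $\alpha=\lambda_{j}$; the function name and the simplification of a single penalty level for all nodes are our assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X: np.ndarray, lam: float) -> np.ndarray:
    """Nodewise Lasso estimate of the precision matrix, column by column."""
    n, p = X.shape
    Theta_hat = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        # Regress X_j on the remaining columns with an l1 penalty.
        fit = Lasso(alpha=lam, fit_intercept=False).fit(X[:, idx], X[:, j])
        gamma_j = fit.coef_
        resid = X[:, j] - X[:, idx] @ gamma_j
        tau2_j = resid @ resid / n + lam * np.abs(gamma_j).sum()
        Gamma_j = np.ones(p)          # entry j stays 1,
        Gamma_j[idx] = -gamma_j       # the rest is -gamma_hat_j
        Theta_hat[:, j] = Gamma_j / tau2_j
    return Theta_hat
```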
However, the nodewise Lasso estimator $\widehat{\boldsymbol{\varTheta}}$ inherits the bias of the $\ell_1$ penalization, so it cannot be used for inference directly; this is precisely why the de-biasing step above is applied.
We extend the idea of the de-biased nodewise Lasso estimator to construct confidence intervals for any subset of the entries of the precision matrix. Specifically, for an index set $E \subseteq[p] \times[p]$, we are interested in deriving the distribution of
$ T_{E}:=\max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right). $ (2)
Following the idea of Refs.[9-10, 12], we use a bootstrap-assisted scheme to make simultaneous inference for the graphical model. Let $e_{1}, \cdots, e_{n}$ be i.i.d. standard normal multipliers independent of the data, and define the bootstrap statistic
$ W_{E}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n}\right|, $ (3)
where $\widehat{Z}_{i j k}$ is the plug-in counterpart of $Z_{i j k}=-\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right)$, obtained by replacing $\boldsymbol{\varTheta}$ with $\widehat{\boldsymbol{\varTheta}}$. The bootstrap critical value is
$ c_{1-\alpha, E}=\inf \left\{t \in \mathbb{R}: \mathbb{P}_{e}\left(W_{E} \leqslant t\right) \geqslant 1-\alpha\right\}, $ (4)
denoting the $(1-\alpha)$-quantile of the statistic $W_{E}$, where $\mathbb{P}_{e}$ is the probability with respect to the multipliers $e_{1}, \cdots, e_{n}$ only, that is, conditional on the observed data.
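In practice $c_{1-\alpha, E}$ is approximated by Monte Carlo over the multipliers. The sketch below assumes the plug-in form $\widehat{Z}_{i j k}=\widehat{\boldsymbol{\varTheta}}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \widehat{\boldsymbol{\varTheta}}_{k}-\widehat{\varTheta}_{j k}$ (the sign is immaterial inside the absolute value); the function name and the default of $B=1000$ draws are our choices.

```python
import numpy as np

def bootstrap_quantile(X, Theta_hat, E, alpha=0.05, B=1000, seed=None):
    """Approximate c_{1-alpha,E} in (4) by B multiplier bootstrap draws."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    proj = X @ Theta_hat                   # row i equals X_(i)^T Theta_hat
    # Column m of Z_hat holds the plug-in scores Z_hat_{ijk} for the m-th pair in E.
    Z_hat = np.stack([proj[:, j] * proj[:, k] - Theta_hat[j, k] for (j, k) in E],
                     axis=1)
    W = np.empty(B)
    for b in range(B):
        e = rng.standard_normal(n)         # i.i.d. N(0,1) multipliers
        W[b] = np.max(np.abs(Z_hat.T @ e)) / np.sqrt(n)
    return np.quantile(W, 1 - alpha)
```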
Remark 1.1 The Bonferroni-Holm adjustment states that if an experimenter tests p hypotheses on a set of data, the statistical significance level applied to each hypothesis separately is 1/p times what it would be if only one hypothesis were tested. The bootstrap, by contrast, uses the quantile of the multiplier bootstrap statistic to estimate the quantile of the target statistic asymptotically, and thereby takes the dependence among the test statistics into account. Thus the Bonferroni-Holm-adjusted procedure is on the conservative side, while the bootstrap attains a level closer to the preassigned significance level.
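The following toy comparison (synthetic statistics with a strong common factor; all numbers are illustrative, not from the paper) shows why: under dependence the multiplier bootstrap cutoff is typically well below the Bonferroni cutoff.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p, alpha = 400, 200, 0.05
# p test-statistic scores with pairwise correlation about 0.8.
Z = 0.9 * rng.standard_normal((n, 1)) + np.sqrt(0.19) * rng.standard_normal((n, p))
bonf = norm.ppf(1 - alpha / (2 * p))   # Bonferroni cutoff, ignores the dependence
boot = np.quantile([np.max(np.abs(Z.T @ rng.standard_normal(n))) / np.sqrt(n)
                    for _ in range(2000)], 1 - alpha)
print(bonf, boot)                      # the bootstrap cutoff is noticeably smaller
```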
1.4 A unified theory for confidence intervals
Define the parameter set
$ \mathcal{M}(s)=\left\{\boldsymbol{\varTheta} \in \mathbb{R}^{p \times p}: 1 / L \leqslant \lambda_{\min }(\boldsymbol{\varTheta}) \leqslant \lambda_{\max }(\boldsymbol{\varTheta}) \leqslant L, \max \limits_{j \in[p]}\left\|\boldsymbol{\varTheta}_{j}\right\|_{0} \leqslant s,\|\boldsymbol{\varTheta}\|_{1} \leqslant C\right\} $
for some 1≤L≤C. We can extend the above conclusions to a more general regime. Let $H$ denote the event that a generic estimator $\widehat{\boldsymbol{\varTheta}}$ of the precision matrix satisfies
$ \|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{\max }=O_{p}(\sqrt{\log p / n}), $

$ \|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}=O_{p}(s \sqrt{\log p / n}), $

$ \left\|\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right\|_{\max }=O_{p}(\sqrt{\log p / n}), $
with probability tending to one uniformly over the parameter space $\mathcal{M}(s)$.
Before giving the theoretical properties, we list two technical conditions.
(A1) Assume that the sparsity satisfies $s \log p / \sqrt{n}=o(1)$.
(A2) Assume that $B_{n}^{2}(\log (p n))^{7} / n \leqslant C_{1} n^{-c_{1}}$, where $B_{n} \geqslant 1$ is a sequence of constants and $c_{1}, C_{1}>0$ are constants.
Proposition 2.1 (Lemma 1 of Ref.[4]) Consider the sub-Gaussian model (1), and let the tuning parameters satisfy $\lambda_{j} \asymp \sqrt{\log p / n}$ uniformly in $j \in[p]$. Then
$ \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right)=-\sqrt{n} \boldsymbol{\varTheta}_{j}^{\mathrm{T}}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}_{k}+\varDelta_{j k}, $
where the remainder term satisfies
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left\{\max \limits_{(j, k) \in[p] \times[p]}\left|\varDelta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=0. $
Furthermore, define the variance $\sigma_{j k}^{2}=\operatorname{Var}\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}\right)$ and let $\widehat{\sigma}_{j k}^{2}$ be a consistent estimate of it. Then for every $z \in \mathbb{R}$ and every $(j, k) \in[p] \times[p]$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)-\Phi(z)\right|=0, $
where $\Phi(z)$ denotes the standard normal distribution function.
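As an illustration of this pointwise result, the snippet below builds the individual interval for a single entry. It uses the Gaussian-case variance $\sigma_{j k}^{2}=\varTheta_{j j} \varTheta_{k k}+\varTheta_{j k}^{2}$ as the plug-in estimate, which is an assumption on our part; see Ref.[4] for the general sub-Gaussian form.

```python
import numpy as np
from scipy.stats import norm

def individual_ci(Theta_breve, Theta_hat, j, k, n, alpha=0.05):
    """Pointwise (1-alpha) CI for Theta_jk based on Proposition 2.1."""
    # Gaussian-case plug-in variance (assumption): Theta_jj Theta_kk + Theta_jk^2.
    sigma_jk = np.sqrt(Theta_hat[j, j] * Theta_hat[k, k] + Theta_hat[j, k] ** 2)
    half = norm.ppf(1 - alpha / 2) * sigma_jk / np.sqrt(n)
    return Theta_breve[j, k] - half, Theta_breve[j, k] + half
```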
Based on the asymptotic normality properties established in Proposition 2.1, we have the following simultaneous confidence intervals for multiple entries Θjk.
Theorem 2.1 Assume that conditions (A1)-(A2) hold. Then for any E⊆[p]×[p], we have
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
where ΘE denotes the entries of Θ with indices in E.
Theorem 2.1 states that we can approximate the $(1-\alpha)$-quantile of $\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max }$ by the bootstrap critical value $c_{1-\alpha, E}$. This yields the simultaneous confidence intervals $\left[\breve{\varTheta}_{j k}-c_{1-\alpha, E} / \sqrt{n}, \breve{\varTheta}_{j k}+c_{1-\alpha, E} / \sqrt{n}\right]$, valid uniformly over $(j, k) \in E$.
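Putting the pieces together, a minimal sketch of the resulting intervals reads as follows; the helper name is ours, and `c_boot` is the output of the bootstrap quantile routine sketched above.

```python
import numpy as np

def simultaneous_cis(Theta_breve, c_boot, n, E):
    """Simultaneous CIs [Theta_breve_jk -+ c_{1-alpha,E}/sqrt(n)] over (j,k) in E."""
    half = c_boot / np.sqrt(n)
    return {(j, k): (Theta_breve[j, k] - half, Theta_breve[j, k] + half)
            for (j, k) in E}
```

By Theorem 2.1, all |E| intervals cover their targets simultaneously with asymptotic probability 1-α.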
Next, we extend the above result to the general regime in which the initial estimator is only required to satisfy event H.
Theorem 2.2 Assume that event H holds. Then we have
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
for any E⊆[p]×[p], where ΘE denotes the entries of Θ with indices in E.
Finally, we conclude the unified theory, which covers both individual and simultaneous inference for the precision matrix.
Theorem 2.3 Assume that event H holds. Then we have
(A) (Individual inference)

$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right) / \widehat{\sigma}_{j k} \leqslant z\right)-\Phi(z)\right|=0, $
where $\widehat{\sigma}_{j k}$ is the variance estimate of Proposition 2.1.
(B) (Simultaneous inference)
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
for any E⊆[p]×[p], where ΘE denotes the entries of Θ with indices in E.
Theorem 2.3 presents general conclusions for both individual and simultaneous confidence intervals. That is, our inferential procedures work with any precision matrix estimation method whose estimation error satisfies event H.
3 Numerical studies
In this section, we investigate the finite sample performance of the proposed methods and compare them with simultaneous confidence intervals for the de-biased graphical Lasso; the two procedures are denoted by S-NL and S-GL, respectively. We present two numerical examples and evaluate the methods by the estimated average coverage probabilities (avgcov) and average confidence interval lengths (avglen) over two index sets: the support set S and its complement Sc. For convenience, we only consider the Gaussian setting. The implementations of the de-biased nodewise Lasso and the de-biased graphical Lasso follow Ref.[4]. Throughout the simulations, the significance level is set at α=0.05, and the coverage probabilities and interval lengths are calculated by averaging over 100 simulation runs and 500 Monte Carlo replications. For additional comparison, we also record individual confidence intervals for the de-biased nodewise Lasso and the de-biased graphical Lasso, denoted by I-NL and I-GL, respectively.
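For reference, the two evaluation metrics can be computed as below; this is our own sketch of the bookkeeping, with hypothetical names, where `cis` maps an index pair to its interval.

```python
import numpy as np

def coverage_and_length(cis, Theta_true, S):
    """avgcov and avglen over an index set S of entry pairs (j, k)."""
    cover = np.mean([cis[e][0] <= Theta_true[e] <= cis[e][1] for e in S])
    length = np.mean([cis[e][1] - cis[e][0] for e in S])
    return cover, length
```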
3.1 Numerical example 1: band structure
We start with a numerical example whose setting is similar to that of Ref.[3]. We consider a precision matrix Θ with band structure, in which the magnitude of the nonzero off-diagonal entries is governed by a parameter ρ.
In terms of avgcov and avglen, it is clear that our proposed S-NL method outperforms the alternatives, with higher avgcov and shorter avglen in most settings. Although the avglen over Sc can be slightly longer in some cases, the coverage probabilities over S approach the nominal level of 95%. Moreover, the advantage becomes more evident as p and ρ increase. Compared with individual confidence intervals, simultaneous confidence intervals have longer lengths and lower coverage probabilities; this is expected, since the multiplicity adjustment inevitably sacrifices some accuracy.
3.2 Numerical example 2: nonband structure
For the second numerical example, we use the same setup as simulation example 1 in Ref.[16] to test the performance of S-NL in more general cases. We generate the precision matrix in two steps. First, we create a band matrix Θ0 the same as that in Section 3.1. Second, we randomly permute the rows and columns of Θ0 to obtain the precision matrix Θ, which no longer has the band structure. Then we sample the rows of the n×p data matrix X as i.i.d. copies from the multivariate Gaussian distribution N(0, Σ), where Σ=Θ-1. Throughout this simulation, we fix the sample size n=200 and the dimensionality p=1 000, and consider a range of ρ=0.2, 0.3, 0.4, 0.5.
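A sketch of this data-generating process is given below. The exact band values are not spelled out in the text, so the tridiagonal form (unit diagonal, ρ on the first off-diagonals) is our assumption, reconstructed from the settings of Refs.[3, 16].

```python
import numpy as np

def make_precision(p, rho, permute=False, seed=None):
    """Band precision matrix (assumed tridiagonal), optionally randomly permuted."""
    rng = np.random.default_rng(seed)
    Theta = np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))
    if permute:
        perm = rng.permutation(p)          # random row/column permutation
        Theta = Theta[np.ix_(perm, perm)]  # destroys the band structure
    return Theta

n, p, rho = 200, 1000, 0.3
Theta = make_precision(p, rho, permute=True, seed=0)
Sigma = np.linalg.inv(Theta)
X = np.random.default_rng(0).multivariate_normal(np.zeros(p), Sigma, size=n)
```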
The simulation results summarized in Table 2 also illustrate that our method achieves the preassigned significance level asymptotically and performs better than the alternatives in most cases. Moreover, our method is quite robust, especially for large ρ.
In this paper, we apply a bootstrap-assisted procedure to make valid simultaneous inference for the high-dimensional precision matrix based on the recent de-biased nodewise Lasso estimator. In addition, we summarize a unified framework for constructing simultaneous confidence intervals for the high-dimensional precision matrix under the sub-Gaussian case. As long as the estimation error conditions of event H are satisfied, our procedure can be combined with different precision matrix estimation methods, which gives it great flexibility. Further, the method can be extended to more general settings, such as the functional graphical model, where the samples consist of functional data. We leave this problem for future investigation.
Appendix
A.1 Preliminaries
We first provide a brief overview of the results for the nodewise Lasso and the Gaussian approximation in the following propositions.
Proposition A.1 (Theorem 1 of Ref.[4], Asymptotic normality) Suppose that the conditions of Proposition 2.1 hold. Then for every $z \in \mathbb{R}$ and every $(i, j) \in[p] \times[p]$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)}\left|\mathbb{P}\left(\sqrt{n}\left(\breve{\varTheta}_{i j}-\varTheta_{i j}\right) / \sigma_{i j} \leqslant z\right)-\varPhi(z)\right|=0. $
Proposition A.2 (Lemma 2 of Ref.[4], Variance estimation) Suppose that the conditions of Proposition 2.1 hold. Then for every $\eta>0$,
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \mathbb{P}\left(\max \limits_{i, j=1, \cdots, p}\left|\widehat{\sigma}_{i j}^{2}-\sigma_{i j}^{2}\right| \geqslant \eta\right)=0. $
Proposition A.3 (Corollary 2.1 of Ref.[9], Gaussian approximation) Let $\boldsymbol{x}_{1}, \cdots, \boldsymbol{x}_{n}$ be independent centered random vectors in $\mathbb{R}^{p}$, let $T_{0}=\max _{j \in[p]} \sum_{i=1}^{n} x_{i j} / \sqrt{n}$, and let $Z_{0}$ be the corresponding maximum for independent Gaussian vectors with the same means and covariances. Suppose that there exist constants $c_{1}, C_{1}, c_{2}, C_{2}>0$ such that, for all $j \in[p]$,
$ c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} x_{i j}^{2} / n \leqslant C_{1}, $

$ \max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|x_{i j}\right|^{2+r} / B_{n}^{r}\right) / n+\mathbb{E}\left(\exp \left(\left|x_{i j}\right| / B_{n}\right)\right) \leqslant 4, $
and $B_{n}^{4}(\log (p n))^{7} / n \leqslant C_{2} n^{-c_{2}}$, where $B_{n} \geqslant 1$ is a sequence of constants. Then there exist constants $c>0$ and $C>0$, depending only on $c_{1}$, $C_{1}$, $c_{2}$, and $C_{2}$, such that
$ \rho:=\sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(T_{0} \leqslant t\right)-\mathbb{P}\left(Z_{0} \leqslant t\right)\right| \leqslant C n^{-c} \rightarrow 0. $
A.2 Proof of Theorem 2.1
Without loss of generality, we set E=[p]×[p]. For any (j, k)∈E, define
$ T_{E}:=\max \limits_{(j, k) \in E} \sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right), \quad W_{E}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} \widehat{Z}_{i j k} e_{i} / \sqrt{n}\right|, $
$ T_{0}:=\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n}, \quad W_{0}:=\max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n} Z_{i j k} e_{i} / \sqrt{n}\right|, $
where $Z_{i j k}=-\left(\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}-\varTheta_{j k}\right)$ and $\widehat{Z}_{i j k}$ is its plug-in counterpart as before. By Lemmas A.3 and A.4, the quantiles of the bootstrap statistics can be compared as follows:
$ \mathbb{P}\left(c_{1-\alpha, W} \leqslant c_{\left(1-\alpha+\xi_{2}\right), W_{0}}+\xi_{1}\right) \geqslant 1-\xi_{2}, $

$ \mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+\pi(\nu), W}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu). $
Combining these quantile comparison bounds with the Gaussian approximation of Proposition A.3 and Lemma A.4, and letting $n \rightarrow \infty$, we obtain
$ \lim \limits_{n \rightarrow \infty} \sup \limits_{\boldsymbol{\varTheta} \in \mathcal{M}(s)} \sup \limits_{\alpha \in(0,1)}\left|\mathbb{P}\left(\sqrt{n}\left\|\breve{\boldsymbol{\varTheta}}_{E}-\boldsymbol{\varTheta}_{E}\right\|_{\max } \leqslant c_{1-\alpha, E}\right)-(1-\alpha)\right|=0, $
which concludes the proof.
A.3 Proof of Theorem 2.3
To enhance readability, we split the proof into three steps: bounding the bias term, establishing asymptotic normality, and verifying the variance consistency.
Step 1 By the definition of $\breve{\boldsymbol{\varTheta}}$ and the symmetry of $\boldsymbol{\varTheta}$, we have the decomposition
$ \begin{aligned} \breve{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}=& \widehat{\boldsymbol{\varTheta}}-\widehat{\boldsymbol{\varTheta}}^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right)-\boldsymbol{\varTheta} \\ =&-\boldsymbol{\varTheta}(\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}) \boldsymbol{\varTheta}-\left(\boldsymbol{\varTheta} \widehat{\boldsymbol{\varSigma}}-\boldsymbol{I}_{p}\right)(\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})-(\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta})^{\mathrm{T}}\left(\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right) \\ =&: \boldsymbol{Z}+\boldsymbol{\varDelta}_{1}+\boldsymbol{\varDelta}_{2}. \end{aligned} $
For the bias terms, it follows from event $H$, Lemma A.6, and $\|\boldsymbol{\varTheta}\|_{1} \leqslant C$ that $\left\|\boldsymbol{\varDelta}_{1}\right\|_{\max } \leqslant\left\|\boldsymbol{\varTheta} \widehat{\boldsymbol{\varSigma}}-\boldsymbol{I}_{p}\right\|_{\max }\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}=O_{p}(s \log p / n)$ and $\left\|\boldsymbol{\varDelta}_{2}\right\|_{\max } \leqslant\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}\left\|\widehat{\boldsymbol{\varSigma}} \widehat{\boldsymbol{\varTheta}}-\boldsymbol{I}_{p}\right\|_{\max }=O_{p}(s \log p / n)$, so that $\sqrt{n}\left\|\boldsymbol{\varDelta}_{1}+\boldsymbol{\varDelta}_{2}\right\|_{\max }=O_{p}(s \log p / \sqrt{n})=o_{p}(1)$.
Step 2 The claim follows directly from Theorem 1 of Ref.[4].
Step 3 The claim follows directly from Lemma 2 of Ref.[4].
The remainder of the proof is similar to that of Theorem 2.1, so we omit the details.
A.4 Lemmas and their proofs
The following lemmas will be used in the proof of the main theorem.
Lemma A.1 Assume that conditions (A1)-(A4) hold. Then for any E⊆[p]×[p] we have
$ \sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Z_{i j k} / \sqrt{n} \leqslant t\right)-\mathbb{P}\left(\max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n} Y_{i j k} / \sqrt{n} \leqslant t\right)\right| \leqslant C_{0} n^{-c_{0}}, $
where {Yijk}(j, k)∈E are Gaussian analogs of {Zijk}(j, k)∈E in the sense of sharing the same mean and covariance for i=1, 2, ⋯, n.
Proof The proof proceeds by verifying the conditions of Corollary 2.1 of Ref.[9]. Specifically, we need to verify the following condition (E.1):
$ c_{1} \leqslant \sum\limits_{i=1}^{n} \mathbb{E} Z_{i j k}^{2} / n \leqslant C_{1}, $

$ \max \limits_{r=1,2} \sum\limits_{i=1}^{n} \mathbb{E}\left(\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}\right) / n+\mathbb{E}\left(\exp \left(\left|Z_{i j k}\right| / B_{n}\right)\right) \leqslant 4, $
where $B_{n} \geqslant 1$ is the sequence of constants in condition (A2). The first bound follows from the eigenvalue condition on $\mathcal{M}(s)$, and by Lemma A.8 the variables $Z_{i j k}$ are sub-exponential, so that
$ \max \limits_{r=1,2} \mathbb{E}\left|Z_{i j k}\right|^{2+r} / B_{n}^{r}+\mathbb{E} \exp \left(\left|Z_{i j k}\right| / B_{n}\right) \leqslant 4, $
which concludes the proof.
Lemma A.2 Let $\boldsymbol{V}$ and $\boldsymbol{Y}$ be centered Gaussian random vectors in $\mathbb{R}^{p}$ with covariance matrices $\boldsymbol{\varSigma}^{V}$ and $\boldsymbol{\varSigma}^{Y}$, respectively. Then
$ \sup \limits_{t \in \mathbb{R}}\left|\mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} V_{j} \leqslant t\right)-\mathbb{P}\left(\max \limits_{1 \leqslant j \leqslant p} Y_{j} \leqslant t\right)\right| \leqslant C \varDelta_{0}^{1 / 3}\left(1 \vee \log \left(p / \varDelta_{0}\right)\right)^{2 / 3}, $
where $\varDelta_{0}=\max _{1 \leqslant j, k \leqslant p}\left|\varSigma_{j k}^{V}-\varSigma_{j k}^{Y}\right|$ and $C>0$ is a universal constant.
Proof The proof is the same as that of Lemma 3.1 of Ref.[9].
Lemma A.3 Suppose that there are some constants $0<c_{1}<C_{1}$ such that $c_{1} \leqslant \sum_{i=1}^{n} \mathbb{E} Z_{i j k}^{2} / n \leqslant C_{1}$ uniformly over $(j, k) \in E$. Then for every $\nu>0$,
$ \mathbb{P}\left(c_{1-\alpha, W_{0}} \leqslant c_{1-\alpha+\pi(\nu), Y_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu), $

$ \mathbb{P}\left(c_{1-\alpha, Y_{0}} \leqslant c_{1-\alpha+\pi(\nu), W_{0}}\right) \geqslant 1-\mathbb{P}(\Gamma>\nu). $
Proof Recall that, conditionally on the data, $W_{0}$ is the maximum of a centered Gaussian vector, while $Y_{0}$ is its Gaussian analog based on the population covariance. The claim follows from the comparison bound of Lemma A.2, applied conditionally with $\varDelta_{0}$ replaced by $\Gamma$; we omit the details.
Lemma A.4 Assume that conditions (A1)-(A4) hold. Then for any (j, k)∈E we have
$ \mathbb{P}\left(\left|T_{E}-T_{0}\right|>\xi_{1}\right)<\xi_{2}, $

$ \mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right)<\xi_{2}, $
where $\xi_{1}=o(1)$, $\xi_{2}=o(1)$, and $\mathbb{P}_{e}$ denotes the probability with respect to the multipliers $e_{1}, \cdots, e_{n}$ only.
Proof
Bounds for $\left|T_{E}-T_{0}\right|$: Recall that $\sqrt{n}\left(\breve{\varTheta}_{j k}-\varTheta_{j k}\right)=\sum_{i=1}^{n} Z_{i j k} / \sqrt{n}+\varDelta_{j k}$, so that
$ \left|T_{E}-T_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right|. $
It follows from Proposition 2.1 that
$ \mathbb{P}\left\{\max \limits_{(j, k) \in E}\left|\varDelta_{j k}\right| \geqslant O\left(\frac{s \log p}{\sqrt{n}}\right)\right\}=o(1), $

so we may take $\xi_{1}=O(s \log p / \sqrt{n})=o(1)$ and $\xi_{2}=o(1)$.
Bounds for $\left|W_{E}-W_{0}\right|$: By the triangle inequality,

$ \left|W_{E}-W_{0}\right| \leqslant \max \limits_{(j, k) \in E}\left|\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right|. $
Let $A_{n}=\left|\sum_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right|$ for a fixed pair $(j, k) \in E$. By Jensen's inequality,
$ \begin{array}{l} \mathbb{E}\left(A_{n}\right) \leqslant \mathbb{E}_{X} \sqrt{\mathbb{E}_{e}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right) e_{i} / \sqrt{n}\right]^{2}} \leqslant\\ \mathbb{E}_{X} \sqrt{\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n} \leqslant \sqrt{\mathbb{E}_{X}\left[\sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n\right]}. \end{array} $ |
By Lemma A.5, the right-hand side is $o(1)$, so we may choose $\xi_{1}=o(1)$ and $\xi_{2}=o(1)$ such that $\mathbb{P}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right) \leqslant \xi_{2}^{2}$. Then by Markov's inequality,
$ \begin{array}{c} \mathbb{P}\left(\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)>\xi_{2}\right) \leqslant \\ \mathbb{E}\left[\mathbb{P}_{e}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right)\right] / \xi_{2}= \\ \mathbb{P}\left(\left|W_{E}-W_{0}\right|>\xi_{1}\right) / \xi_{2} \leqslant \xi_{2}^{2} / \xi_{2}=\xi_{2}, \end{array} $ |
which concludes the proof.
Lemma A.5 Under the conditions of Lemma A.4, we have
$ \max \limits_{(j, k) \in E} \sum\limits_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=o_{p}(1). $
Proof
Since $(a-b)^{2} \leqslant 2\left(a^{2}+b^{2}\right)$, we can bound $\sum_{i=1}^{n}(\widehat{Z}_{i j k}-Z_{i j k})^{2} / n$ by $2\left(I_{1}+I_{2}\right)$, where $I_{1}=\sum_{i=1}^{n}\left(\widehat{\boldsymbol{\varTheta}}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \widehat{\boldsymbol{\varTheta}}_{k}-\boldsymbol{\varTheta}_{j}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\varTheta}_{k}\right)^{2} / n$ and $I_{2}=\left(\widehat{\varTheta}_{j k}-\varTheta_{j k}\right)^{2}$.
For the first part, it follows from the triangle inequality, the event $H$ bound on $\|\widehat{\boldsymbol{\varTheta}}-\boldsymbol{\varTheta}\|_{1}$, and Lemma A.7 that $\left|I_{1}\right| \leqslant O_{p}\left(s^{2} \log p \log (n p) / n\right)$.
For the second part, it is obvious that
$ \left|I_{2}\right| \leqslant O_{p}(\log p / n), $
which is a direct result of event H.
Combining the two parts, we conclude that $\max _{(j, k) \in E} \sum_{i=1}^{n}\left(\widehat{Z}_{i j k}-Z_{i j k}\right)^{2} / n=O_{p}\left(s^{2} \log p \log (n p) / n\right)=o_{p}(1)$.
Lemma A.6 Assume that conditions (A1)-(A4) hold. Let $\widehat{\boldsymbol{\varSigma}}=\sum_{i=1}^{n} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} / n$ be the sample covariance matrix. Then
$ \|\widehat{\boldsymbol{\varSigma}}-\boldsymbol{\varSigma}\|_{\max } \leqslant O_{p}(\sqrt{\log p / n}). $
Proof The proof is the same as that of Lemma L.3 of Ref.[13], which follows by invoking the inequality
$ \left\|\boldsymbol{X}_{i} \boldsymbol{X}_{j}\right\|_{\psi_{1}} \leqslant 2\left\|\boldsymbol{X}_{i}\right\|_{\psi_{2}}\left\|\boldsymbol{X}_{j}\right\|_{\psi_{2}} \leqslant 2 c^{-2}, $
and Proposition 5.16 in Ref.[17] and the union bound.
Lemma A.7 Let $\boldsymbol{X}_{(1)}, \cdots, \boldsymbol{X}_{(n)}$ be independent sub-Gaussian random vectors satisfying (1). Then
$ \max \limits_{i \in[n]}\left\|\boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}}\right\|_{\max }=O_{p}(\log (n p)). $
Proof The proof is the same as that of Lemma L.5 of Ref.[13]. It follows from the fact that the entries of $\boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}}$ are sub-exponential, combined with a union bound over the $n p^{2}$ entries.
Lemma A.8 Let $\boldsymbol{\alpha}, \boldsymbol{\beta} \in \mathbb{R}^{p}$ satisfy $\|\boldsymbol{\alpha}\|_{2} \leqslant M$ and $\|\boldsymbol{\beta}\|_{2} \leqslant M$, and let $\boldsymbol{X}_{(i)}$ satisfy (1). Then for all integers $r \geqslant 1$,
$ \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /\left(2 M^{2} K^{2}\right)^{r} \leqslant r ! / 2. $
Proof The proof is the same as that of Lemma 5 of Ref.[3]. Since $\|\boldsymbol{\alpha} / M\|_{2} \leqslant 1$ and $\|\boldsymbol{\beta} / M\|_{2} \leqslant 1$, the sub-Gaussian condition (1) implies
$ \mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}} \leqslant 2 \text { and } \mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}} \leqslant 2. $
By the inequality $a b \leqslant a^{2} / 2+b^{2} / 2$ (for any $a, b \in \mathbb{R}$) and the Cauchy-Schwarz inequality, we have
$ \mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}} \leqslant \mathbb{E}\left[\mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /\left(2(M K)^{2}\right)} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /\left(2(M K)^{2}\right)}\right] \leqslant\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\alpha}\right|^{2} /(M K)^{2}}\right\}^{1 / 2}\left\{\mathbb{E} \mathrm{e}^{\left|\boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{2} /(M K)^{2}}\right\}^{1 / 2} \leqslant 2. $
By the Taylor expansion, we have the inequality
$ 1+\frac{1}{r !} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}}. $
Next, it follows that
$ \begin{array}{c} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\ 2^{r-1} \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /(M K)^{2 r} \leqslant \\ 2^{r-1} r !\left(\mathbb{E} \mathrm{e}^{\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right| /(M K)^{2}}-1\right) \leqslant 2^{r-1} r !=\frac{r !}{2} 2^{r}. \end{array} $
Therefore, we have
$ \mathbb{E}\left|\boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}-\mathbb{E} \boldsymbol{\alpha}^{\mathrm{T}} \boldsymbol{X}_{(i)} \boldsymbol{X}_{(i)}^{\mathrm{T}} \boldsymbol{\beta}\right|^{r} /\left(2 M^{2} K^{2}\right)^{r} \leqslant \frac{r !}{2}. $
References
[1] Lauritzen S L. Graphical models[M]. New York: Oxford University Press, 1996.
[2] Liu W. Gaussian graphical model estimation with false discovery rate control[J]. Ann Statist, 2013, 41(6): 2948-2978.
[3] Janková J, van de Geer S. Confidence intervals for high-dimensional inverse covariance estimation[J]. Electron J Statist, 2015, 9: 1205-1229.
[4] Janková J, van de Geer S. Honest confidence regions and optimality in high-dimensional precision matrix estimation[J]. Test, 2017, 26(1): 143-162. DOI: 10.1007/s11749-016-0503-5.
[5] Bühlmann P. Statistical significance in high-dimensional linear models[J]. Bernoulli, 2013, 19(4): 1212-1242.
[6] Javanmard A, Montanari A. Confidence intervals and hypothesis testing for high-dimensional regression[J]. J Mach Learn Res, 2014, 15: 2869-2909.
[7] van de Geer S, Bühlmann P, Ritov Y, et al. On asymptotically optimal confidence regions and tests for high-dimensional models[J]. Ann Statist, 2014, 42(3): 1166-1202.
[8] Zhang C, Zhang S. Confidence intervals for low dimensional parameters in high dimensional linear models[J]. J R Stat Soc Ser B Stat Methodol, 2014, 76(1): 217-242. DOI: 10.1111/rssb.12026.
[9] Chernozhukov V, Chetverikov D, Kato K. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors[J]. Ann Statist, 2013, 41(6): 2786-2819.
[10] Chernozhukov V, Chetverikov D, Kato K. Comparison and anti-concentration bounds for maxima of Gaussian random vectors[J]. Probab Theory Related Fields, 2015, 162: 47-70. DOI: 10.1007/s00440-014-0565-9.
[11] Chernozhukov V, Chetverikov D, Kato K. Central limit theorems and bootstrap in high dimensions[J]. Ann Probab, 2017, 45(4): 2309-2352.
[12] Zhang X, Cheng G. Simultaneous inference for high-dimensional linear models[J]. J Amer Statist Assoc, 2017, 112: 757-768.
[13] Neykov M, Lu J, Liu H. Combinatorial inference for graphical models[J]. Ann Statist, 2019, 47(6): 795-827.
[14] Cai T, Liu W, Luo X. A constrained l1 minimization approach to sparse precision matrix estimation[J]. J Amer Statist Assoc, 2011, 106(494): 594-607. DOI: 10.1198/jasa.2011.tm10155.
[15] Liu W D, Luo X. Fast and adaptive sparse precision matrix estimation in high dimensions[J]. J Multivariate Anal, 2015, 135: 153-162.
[16] Fan Y, Lv J. Innovated scalable efficient estimation in ultra-large Gaussian graphical models[J]. Ann Statist, 2016, 44(5): 2098-2126.
[17] Vershynin R. Introduction to the non-asymptotic analysis of random matrices[M]//Eldar Y C, Kutyniok G. Compressed Sensing: Theory and Applications. Cambridge: Cambridge University Press, 2012.