Spatial autoregressive model for compositional data
HUANG Tingting1,2, WANG Huiwen1,3, SAPORTA Gilbert4
1. School of Economics and Management, Beihang University, Beijing 100083, China;
2. Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beijing 10008, China;
3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100083, China;
4. Centre d'études et de Recherche en Informatique et Communications, Conservatoire National des Arts et Métiers, Paris 75003, France
Received: 2018-05-03; Accepted: 2018-07-28; Published online: 2018-08-23 10:38
Foundation item: National Natural Science Foundation of China (71420107025)
Corresponding author. WANG Huiwen, E-mail: wanghw@vip.sina.com
Abstract: Existing linear models for compositional data assume that the samples are independent, an assumption that is often violated in practice. To address this, we propose a spatial autoregressive model for compositional data that contains both compositional covariates and scalar predictors, together with a new estimation method. The new model can handle mixed compositional and numerical data and can express dependence among the responses. The parameter estimators are obtained through the isometric logratio (ilr) transformation, which maps compositional data, whose parts are linearly dependent because of the constant-sum constraint, to real vectors with linearly independent coordinates. A Monte Carlo simulation experiment verifies the effectiveness of the proposed estimation method.
Keywords: compositional data; isometric logratio (ilr) transformation; maximum likelihood estimation; spatial dependence; spatial autoregressive model

1 Basic theory

1.1 The simplex

 (1)

 (2)
 (3)

 (4)

 (5)

xy的内积运算〈x, ya定义为

 (6)

 (7)
 (8)
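The basic simplex operations used in this section can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes the standard Aitchison definitions (closure to unit sum, perturbation $\oplus$, powering $\odot$ and the inner product (6)), and it assumes that the norm and distance are the ones induced by that inner product. All function names are hypothetical.

```python
import math


def closure(x, kappa=1.0):
    """Rescale positive parts so that they sum to kappa (closure C)."""
    s = sum(x)
    return [kappa * v / s for v in x]


def perturb(x, y):
    """Perturbation x ⊕ y: component-wise product, then closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(alpha, x):
    """Powering alpha ⊙ x: component-wise power, then closure."""
    return closure([v ** alpha for v in x])


def aitchison_inner(x, y):
    """<x, y>_a = 1/(2d) * sum_i sum_j ln(x_i/x_j) * ln(y_i/y_j)."""
    d = len(x)
    total = sum(math.log(x[i] / x[j]) * math.log(y[i] / y[j])
                for i in range(d) for j in range(d))
    return total / (2 * d)


def aitchison_norm(x):
    """Norm induced by the Aitchison inner product (assumed)."""
    return math.sqrt(aitchison_inner(x, x))


def aitchison_distance(x, y):
    """Distance ||x ⊖ y||_a with x ⊖ y = x ⊕ ((-1) ⊙ y) (assumed)."""
    return aitchison_norm(perturb(x, power(-1.0, y)))


x = closure([1.0, 2.0, 3.0])
y = closure([3.0, 2.0, 1.0])
print(aitchison_inner(x, y), aitchison_distance(x, y))
```

Because every operation acts on log-ratios, rescaling a composition (for example, using $(2,4,6)$ instead of $(1,2,3)$) leaves the inner product unchanged.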

1.2 Isometric logratio transformation

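The ilr transformation was proposed by Egozcue et al.[26]. It maps the $d$-part simplex $S^d$ onto the $(d-1)$-dimensional Euclidean space $\mathbb{R}^{d-1}$; the resulting real vectors are free of the collinearity among the parts of the original composition and can be used directly for modeling. Exploiting the orthogonality and unit length of an orthonormal basis, the transformation replaces a composition by its coefficients with respect to that basis, which are easy to work with. Let $\{e_k\}_{k=1}^{d-1}$, $e_k=(e_{k1},e_{k2},\dots,e_{kd})^{\mathrm T}$, be an orthonormal basis. Then any composition $x$ can be written as $x=(\langle x,e_1\rangle_a\odot e_1)\oplus(\langle x,e_2\rangle_a\odot e_2)\oplus\cdots\oplus(\langle x,e_{d-1}\rangle_a\odot e_{d-1})$, and accordingly the ilr coordinates $\operatorname{ilr}(x)$ of $x$ are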

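$$\operatorname{ilr}(x)=\left(\langle x,e_1\rangle_a,\langle x,e_2\rangle_a,\dots,\langle x,e_{d-1}\rangle_a\right)^{\mathrm T} \tag{9}$$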

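Egozcue et al.[26] proved that the ilr transformation preserves inner products; that is, for any two $d$-part compositions $x$ and $y$,

$$\langle x,y\rangle_a=\langle \operatorname{ilr}(x),\operatorname{ilr}(y)\rangle \tag{10}$$

where the right-hand side is the ordinary Euclidean inner product in $\mathbb{R}^{d-1}$.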

 (11)

Here $\Psi$ is a $(d-1)\times d$ matrix determined by the chosen orthonormal basis $\{e_k\}$.
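The ilr coordinates in (9) can be illustrated with one concrete orthonormal basis. The sketch below assumes a Helmert-type basis whose $k$-th row of $\Psi$ is $\sqrt{k/(k+1)}\,(1/k,\dots,1/k,-1,0,\dots,0)$; this is one common choice and not necessarily the $\Psi$ used by the authors. It also assumes $\operatorname{ilr}(x)=\Psi\ln x$ and inverts it as $\mathcal{C}(\exp(\Psi^{\mathrm T}z))$, which is the basis expansion of $x$ written in coordinates, then checks numerically that the Euclidean inner product of ilr coordinates equals the Aitchison inner product. Function names are hypothetical.

```python
import math


def helmert_psi(d):
    """Hypothetical (d-1) x d ilr matrix with orthonormal, zero-sum rows.

    Row k (1-based) is sqrt(k/(k+1)) * (1/k, ..., 1/k, -1, 0, ..., 0).
    """
    psi = []
    for k in range(1, d):
        c = math.sqrt(k / (k + 1))
        psi.append([c / k] * k + [-c] + [0.0] * (d - k - 1))
    return psi


def ilr(x, psi):
    """ilr(x) = Psi ln(x); zero row sums make this equal to Psi clr(x)."""
    logs = [math.log(v) for v in x]
    return [sum(p * l for p, l in zip(row, logs)) for row in psi]


def ilr_inv(z, psi):
    """Back-transform: clr = Psi^T z, then x = C(exp(clr)) with unit sum."""
    d = len(psi[0])
    clr = [sum(psi[k][j] * z[k] for k in range(len(z))) for j in range(d)]
    e = [math.exp(v) for v in clr]
    s = sum(e)
    return [v / s for v in e]


def aitchison_inner(x, y):
    """<x, y>_a = 1/(2d) * sum_i sum_j ln(x_i/x_j) * ln(y_i/y_j)."""
    d = len(x)
    return sum(math.log(x[i] / x[j]) * math.log(y[i] / y[j])
               for i in range(d) for j in range(d)) / (2 * d)


psi = helmert_psi(3)
x, y = [0.2, 0.3, 0.5], [0.6, 0.1, 0.3]
zx, zy = ilr(x, psi), ilr(y, psi)
print(zx)                                  # two real coordinates for d = 3
print(sum(a * b for a, b in zip(zx, zy)))  # Euclidean inner product of ilr coordinates
print(aitchison_inner(x, y))               # agrees with the line above up to rounding
```

Any other orthonormal basis gives different coordinates but the same inner products, since the coordinates differ only by an orthogonal rotation.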

2 Proposed model

 (12)

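When $\rho=0$, model (12) reduces to the ordinary linear model for compositional data. In this sense, model (12) is more flexible than the classical compositional linear model: it can also capture dependence among the responses.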
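A small simulation helps show what $\rho$ adds. Purely for illustration, the sketch assumes a scalar response generated as $y_i=\rho\,(Wy)_i+\alpha+\langle x_i,\beta\rangle_a+\gamma z_i+\varepsilon_i$, with a 3-part compositional covariate $x_i$, a compositional coefficient $\beta$, one scalar predictor $z_i$ and a hypothetical ring-shaped, row-standardised weight matrix $W$. This form, the parameter values and all names are assumptions, not the paper's equation (12). With $\rho=0$ the response is just the non-spatial linear predictor plus noise, mirroring the reduction described above.

```python
import math
import random


def clr(x):
    """Centred logratio: ln(x) minus the mean of ln(x)."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison_inner(x, y):
    """<x, y>_a as the Euclidean inner product of clr coordinates."""
    return sum(a * b for a, b in zip(clr(x), clr(y)))


def ring_weights(n):
    """Hypothetical row-standardised W: two ring neighbours, weight 0.5 each."""
    assert n >= 3
    return [[0.5 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)]
            for i in range(n)]


def simulate_sar(W, rho, alpha, beta, gamma, sigma, seed=0):
    """Draw data from the illustrative model
    y_i = rho * (W y)_i + alpha + <x_i, beta>_a + gamma * z_i + eps_i."""
    assert abs(rho) < 1
    rng = random.Random(seed)
    xs, zs, u = [], [], []
    for _ in range(len(W)):
        parts = [math.exp(rng.gauss(0, 1)) for _ in beta]
        total = sum(parts)
        x = [p / total for p in parts]  # random composition in S^d
        z = rng.gauss(0, 1)             # scalar predictor
        xs.append(x)
        zs.append(z)
        u.append(alpha + aitchison_inner(x, beta) + gamma * z + sigma * rng.gauss(0, 1))
    # Solve y = rho W y + u by fixed-point iteration: a contraction because
    # W is non-negative with unit row sums and |rho| < 1.
    y = list(u)
    while True:
        y_new = [rho * sum(w * v for w, v in zip(row, y)) + ui
                 for row, ui in zip(W, u)]
        converged = max(abs(a - b) for a, b in zip(y_new, y)) < 1e-12
        y = y_new
        if converged:
            return xs, zs, u, y


W = ring_weights(50)
beta = [0.2, 0.3, 0.5]  # hypothetical compositional coefficient in S^3
_, _, u0, y0 = simulate_sar(W, rho=0.0, alpha=1.0, beta=beta, gamma=2.0, sigma=1.0, seed=1)
_, _, u5, y5 = simulate_sar(W, rho=0.5, alpha=1.0, beta=beta, gamma=2.0, sigma=1.0, seed=1)
print(max(abs(a - b) for a, b in zip(y0, u0)))  # prints 0.0: rho = 0 gives no spatial feedback
```

Both calls use the same seed, so they share the same covariates and shocks; only the spatial feedback through $\rho W y$ differs.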

3 Estimation method

 (13)

 (14)

 (15)

 (16)
 (17)

 (18)

 (19)
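The keywords indicate that the parameters are estimated by maximum likelihood after ilr-transforming the compositional covariates. As a hedged illustration of that general route, and not a reconstruction of the authors' equations (13) to (19), the sketch below applies the standard concentrated-likelihood approach for spatial autoregressive models (cf. Ord[21], Lee[22]): for fixed $\rho$ the coefficients and $\sigma^2$ have closed forms, so only $\ln|I-\rho W|-\tfrac{n}{2}\ln\hat\sigma^2(\rho)$ is maximised, here by grid search. It assumes a hypothetical ring-shaped $W$ (each unit's two neighbours weighted 0.5), whose eigenvalues $\cos(2\pi k/n)$ give $\ln|I-\rho W|$ in closed form, a 3-part compositional covariate entered through Helmert-type ilr coordinates, one scalar predictor and an intercept. All names and parameter values are hypothetical.

```python
import math
import random


def ring_lag(y):
    """W y for a hypothetical ring W: average of each unit's two neighbours."""
    n = len(y)
    return [0.5 * (y[i - 1] + y[(i + 1) % n]) for i in range(n)]


def ilr3(x):
    """Helmert-type ilr coordinates of a 3-part composition (assumed basis)."""
    return [math.sqrt(1 / 2) * math.log(x[0] / x[1]),
            math.sqrt(2 / 3) * math.log(math.sqrt(x[0] * x[1]) / x[2])]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small system a x = b."""
    p = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):
        x[r] = (m[r][p] - sum(m[r][k] * x[k] for k in range(r + 1, p))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def simulate(n, rho, coef, sigma, seed):
    """Hypothetical DGP: y = rho W y + X coef + eps, rows of X = [1, ilr(x_i), z_i]."""
    rng = random.Random(seed)
    X, u = [], []
    for _ in range(n):
        parts = [math.exp(rng.gauss(0, 1)) for _ in range(3)]
        total = sum(parts)
        row = [1.0] + ilr3([v / total for v in parts]) + [rng.gauss(0, 1)]
        X.append(row)
        u.append(sum(c * v for c, v in zip(coef, row)) + sigma * rng.gauss(0, 1))
    y = list(u)
    while True:  # fixed point of y = rho W y + u (a contraction for |rho| < 1)
        y_new = [rho * wy + v for wy, v in zip(ring_lag(y), u)]
        done = max(abs(a - b) for a, b in zip(y_new, y)) < 1e-12
        y = y_new
        if done:
            return X, y


def estimate(X, y, step=0.001):
    """Concentrated ML: profile out coef and sigma^2, grid-search rho."""
    n = len(y)
    wy = ring_lag(y)
    b0, bl = ols(X, y), ols(X, wy)  # coef(rho) = b0 - rho * bl
    e0 = [v - sum(b * x for b, x in zip(b0, row)) for v, row in zip(y, X)]
    el = [v - sum(b * x for b, x in zip(bl, row)) for v, row in zip(wy, X)]
    eig = [math.cos(2 * math.pi * k / n) for k in range(n)]  # eigenvalues of W

    def s2(rho):
        return sum((a - rho * b) ** 2 for a, b in zip(e0, el)) / n

    def loglik(rho):  # ln|I - rho W| - (n/2) ln sigma^2(rho), constants dropped
        return sum(math.log(1 - rho * w) for w in eig) - 0.5 * n * math.log(s2(rho))

    grid = [-0.99 + k * step for k in range(round(1.98 / step) + 1)]
    rho_hat = max(grid, key=loglik)
    coef_hat = [a - rho_hat * b for a, b in zip(b0, bl)]
    return rho_hat, coef_hat, s2(rho_hat)


X, y = simulate(n=300, rho=0.5, coef=[1.0, 1.0, -0.5, 2.0], sigma=1.0, seed=1)
rho_hat, coef_hat, s2_hat = estimate(X, y)
print(round(rho_hat, 3), [round(c, 3) for c in coef_hat], round(s2_hat, 3))
```

The ilr step is what makes the ordinary least-squares sub-problems well posed: the raw parts of a composition sum to a constant and would be collinear with the intercept, whereas the ilr coordinates are not. For a general $W$, the closed-form eigenvalues would be replaced by a numerically computed log-determinant.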

4 Numerical simulation

 (20)

 (21)

Fig. 1 Sample deviation of … and …
Fig. 2 Standard deviations of … and …, and total variance of …
Fig. 3 Boxplots of the deviation of … for different values of n and ρ

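1) The sample means of the parameter estimates deviate only slightly from the true values under every parameter setting. Fig. 1 shows scatter plots of the deviations of the different parameter estimates under nine settings, Set1 to Set9, whose values of $(n,\rho)$ are $(300,0)$, $(300,0.5)$, $(300,0.8)$, $(500,0)$, $(500,0.5)$, $(500,0.8)$, $(900,0)$, $(900,0.5)$ and $(900,0.8)$, respectively. The absolute deviations do not exceed 0.015, and the deviations of the three components of the compositional coefficient are all small relative to that of ….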

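2) The sample standard deviations of … and the total variance of … decrease as the sample size increases. Fig. 2 shows that, whatever the value of ρ, the standard-deviation and total-variance curves of the estimators decline as n increases.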

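3) For a fixed sample size, the sample standard deviation of … decreases as ρ increases. Fig. 3 shows that, for fixed n, the boxes become narrower as ρ increases from 0 to 0.8.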
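The summaries reported above (deviation of the sample mean from the true value, sample standard deviation, and total variance of a compositional estimate) can be computed from the Monte Carlo replicates as sketched below. The total variance is assumed here to be the usual compositional total variance, the sum of the sample variances of the clr coordinates, which may not match the authors' exact definition; function names are hypothetical. The settings list encodes Set1 to Set9 as $(n,\rho)$ pairs, as listed in item 1).

```python
import math

# Set1..Set9 from the simulation study, as (n, rho) pairs.
SETTINGS = [(n, rho) for n in (300, 500, 900) for rho in (0.0, 0.5, 0.8)]


def deviation(estimates, true_value):
    """Sample mean of the replicate estimates minus the true value."""
    return sum(estimates) / len(estimates) - true_value


def sample_sd(estimates):
    """Sample standard deviation with the n - 1 denominator."""
    m = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - m) ** 2 for e in estimates) / (len(estimates) - 1))


def clr(x):
    """Centred logratio coordinates of one composition."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def total_variance(compositions):
    """Assumed definition: sum of the sample variances of the clr coordinates,
    equivalently 1/(2d) * sum_i sum_j var(ln(x_i/x_j))."""
    coords = [clr(c) for c in compositions]
    return sum(sample_sd([row[j] for row in coords]) ** 2
               for j in range(len(coords[0])))


print(SETTINGS[1])  # Set2
print(deviation([0.49, 0.52], 0.5))
print(sample_sd([0.4, 0.6]))
```

Working in clr coordinates keeps the total variance invariant to how each replicate estimate is scaled, which is the natural notion of spread for a compositional coefficient.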

5 Conclusions

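1) The proposed model can handle compositional and ordinary numerical data simultaneously and can also express dependence among the responses; in particular, it can handle dependence in geographic space.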

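2) The proposed estimators are consistent: as the sample size increases, the standard deviations of the estimates decrease steadily. Moreover, the estimation method is simple to apply and can be implemented directly in R.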

[1] RAMSAY J O, SILVERMAN B W. Functional data analysis[M]. Berlin: Springer, 1997.
[2] RAMSAY J O, SILVERMAN B W. Applied functional data analysis: Methods and case studies[M]. Berlin: Springer, 2002.
[3] VIEU P, FERRATY F. Nonparametric functional data analysis[M]. Berlin: Springer, 2006.
[4] PAWLOWSKY-GLAHN V, BUCCIANTI A. Compositional data analysis: Theory and applications[M]. Chichester: Wiley-Blackwell, 2011.
[5] BILLARD L, DIDAY E. Symbolic regression analysis[M]//JAJUGA K, SOKOLOWSKI A, BOCK H. Classification, clustering, and data analysis. Berlin: Springer, 2002: 281-288.
[6] BILLARD L, DIDAY E. Regression analysis for interval-valued data[M]. Berlin: Springer, 2000: 369-374.
[7] FRY J M, FRY T R L, MCLAREN K R. Compositional data analysis and zeros in micro data[J]. Applied Economics, 2000, 32(8): 953-959. DOI:10.1080/000368400322002
[8] PAWLOWSKY-GLAHN V, EGOZCUE J J. Exploring compositional data with the CoDa-dendrogram[J]. Austrian Journal of Statistics, 2011, 40(1 & 2): 103-113.
[9] PAWLOWSKY-GLAHN V, EGOZCUE J J, TOLOSANA-DELGADO R. Modelling and analysis of compositional data[M]. Hoboken: John Wiley & Sons, Ltd., 2015: 152-154.
[10] AITCHISON J. The statistical analysis of compositional data[M]. Berlin: Springer, 1986.
[11] AITCHISON J. The statistical analysis of compositional data[J]. Journal of the Royal Statistical Society Series B, 1982, 44(2): 139-177.
[12] HRON K, FILZMOSER P, THOMPSON K. Linear regression with compositional explanatory variables[J]. Journal of Applied Statistics, 2012, 39(5): 1115-1128. DOI:10.1080/02664763.2011.644268
[13] AITCHISON J, SHEN S M. Logistic-normal distributions: Some properties and uses[J]. Biometrika, 1980, 67(2): 261-272.
[14] WANG H, SHANGGUAN L, WU J, et al. Multiple linear regression modeling for compositional data[J]. Neurocomputing, 2013, 122: 490-500. DOI:10.1016/j.neucom.2013.05.025
[15] TOLOSANA-DELGADO R, EYNATTEN H V. Simplifying compositional multiple regression: Application to grain size controls on sediment geochemistry[J]. Computers & Geosciences, 2010, 36(5): 577-589.
[16] ANSELIN L. Spatial econometrics: Methods and models[M]. Berlin: Springer, 1988.
[17] LIN G P, LONG Z H, WU M. A spatial investigation of σ-convergence in China[J]. The Journal of Quantitative & Technical Economics, 2006, 23(4): 14-21. DOI:10.3969/j.issn.1000-3894.2006.04.002 (in Chinese)
[18] GUO J L, WANG H W. Study on the regional capital flows and regional economic differences in China[J]. Management World, 2003(7): 45-58. (in Chinese)
[19] TOPA G. Social interactions, local spillovers and unemployment[J]. Review of Economic Studies, 2001, 68(2): 261-295.
[20] BAICKER K. The spillover effects of state spending[J]. Journal of Public Economics, 2005, 89(2-3): 529-544. DOI:10.1016/j.jpubeco.2003.11.003
[21] ORD K. Estimation methods for models of spatial interaction[J]. Journal of the American Statistical Association, 1975, 70(349): 120-126. DOI:10.1080/01621459.1975.10480272
[22] LEE L F. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models[J]. Econometrica, 2004, 72(6): 1899-1925. DOI:10.1111/ecta.2004.72.issue-6
[23] KELEJIAN H, PRUCHA I R. A generalized moments estimator for the autoregressive parameter in a spatial model[J]. International Economic Review, 1999, 40(2): 509-533. DOI:10.1111/iere.1999.40.issue-2
[24] LEE L F. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models[J]. Journal of Econometrics, 2007, 137(2): 489-514. DOI:10.1016/j.jeconom.2005.10.004
[25] LESAGE J P, PACE R K. Introduction to spatial econometrics[M]. New York: CRC Press, 2009: 513-514.
[26] EGOZCUE J J, PAWLOWSKY-GLAHN V, MATEU-FIGUERAS G, et al. Isometric logratio transformations for compositional data analysis[J]. Mathematical Geology, 2003, 35(3): 279-300. DOI:10.1023/A:1023818214614
[27] QU X, LEE L F. Estimating a spatial autoregressive model with an endogenous spatial weight matrix[J]. Journal of Econometrics, 2015, 184(2): 209-232. DOI:10.1016/j.jeconom.2014.08.008

#### Article information

HUANG Tingting, WANG Huiwen, SAPORTA Gilbert

Spatial autoregressive model for compositional data

Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(1): 93-98
http://dx.doi.org/10.13700/j.bh.1001-5965.2018.0253