Acta Automatica Sinica, 2017, Vol. 43, Issue (8): 1402-1411

Estimating Spatial Layout of Cluttered Rooms by Using Object Prior and Spatial Constraints
YAO Tuo-Zhong1, ZUO Wen-Hui2, SONG Jia-Tao1, YING Hong-Wei1
1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016;
2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027
Manuscript received: January 21, 2016; accepted: July 28, 2016.
Foundation Item: Supported by Zhejiang Provincial Natural Science Foundation (LQ15F020004), Zhejiang Provincial Public Welfare Technology Research Project (2016C33255), and Ningbo Natural Science Foundation (2015A610132, 2013A610113)
Corresponding author: YAO Tuo-Zhong, Lecturer at the School of Electronic and Information Engineering, Ningbo University of Technology. He received his Ph.D. degree from Zhejiang University in 2011. His research interests cover computer vision and machine learning.
Recommended by Associate Editor JIA Yun-De
Abstract: Estimating the spatial layout of a structured indoor scene is an active research topic in computer vision. However, most current solutions do not work robustly in cluttered rooms because the objects inside occlude the room structure. In this paper, a new algorithm that integrates the geometric and semantic relations between the room and the objects is proposed to recover the spatial layout of a cluttered room. The algorithm parametrically represents both the room and the objects as 3D volumes and uses multiple high-level image semantics to obtain object priors. Furthermore, spatial constraints such as spatial exclusion and containment are used to jointly optimize the room layout estimate and to provide significant cues for object recognition and localization. The algorithm has low computational complexity, and experimental results demonstrate that it works more robustly in cluttered rooms than several classic algorithms.
Key words: spatial layout estimation; object prior; spatial constraint; combinatorial optimization

1 Related work

2 Algorithm description

Figure 1 The flowchart of our algorithm

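1) The algorithm extracts the line segments in the room and estimates three mutually orthogonal dominant vanishing points. These vanishing points define the principal orientations of the planes in the room (e.g., walls facing different directions, the ceiling, and the floor) and provide spatial constraints on the floor, walls, and ceiling inside the room.

2) Combining this geometric information with multiple high-level image semantics, initial structure hypotheses are generated for the room and for the objects, each represented as a cuboid.

3) Based on the room and object structure hypotheses, a set of candidate scene configuration hypotheses (a room hypothesis plus object hypotheses) is generated.

4) Because not every room or object structure hypothesis satisfies the constraints of a scene configuration, simple 3D spatial reasoning is used to enforce these constraints: each "room-object" hypothesis pair and each "object-object" hypothesis pair undergoes a spatial compatibility test, and the scene configurations that pass are selected.

5) In the final inference over scene configuration hypotheses, a classical combinatorial optimization method is used to sample the optimal scene configuration, which effectively reduces the computational complexity of searching over configuration hypotheses.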
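The compatibility test of step 4 can be illustrated with a minimal sketch. It assumes, hypothetically, that all cuboids are axis-aligned in a room frame whose axes follow the three dominant vanishing directions, that containment means every object cuboid lies inside the room cuboid, and that exclusion means object cuboids share no volume. The exhaustive search in `best_configuration` is only a brute-force stand-in for the combinatorial optimization of step 5 and is usable on toy inputs only; `Box`, `compatible`, `best_configuration` and the `score` callback are illustrative names, not the paper's.

```python
from itertools import product
from math import prod
from typing import Callable, NamedTuple, Optional

class Box(NamedTuple):
    """Hypothetical axis-aligned cuboid in a room frame aligned with the vanishing directions."""
    lo: tuple[float, float, float]  # minimum corner (x, y, z)
    hi: tuple[float, float, float]  # maximum corner (x, y, z)

def volume(b: Box) -> float:
    return prod(h - l for l, h in zip(b.lo, b.hi))

def contains(outer: Box, inner: Box) -> bool:
    """Containment: `inner` lies entirely inside `outer` (touching faces is allowed)."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for o_lo, o_hi, i_lo, i_hi in zip(outer.lo, outer.hi, inner.lo, inner.hi))

def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two cuboids (0 if they are disjoint or only touch)."""
    vol = 1.0
    for a_lo, a_hi, b_lo, b_hi in zip(a.lo, a.hi, b.lo, b.hi):
        side = min(a_hi, b_hi) - max(a_lo, b_lo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def compatible(room: Box, objects: list[Box]) -> bool:
    # "room-object" pairs: every object must be contained in the room
    if not all(contains(room, obj) for obj in objects):
        return False
    # "object-object" pairs: objects must not share volume (spatial exclusion)
    return all(overlap_volume(a, b) == 0.0
               for i, a in enumerate(objects) for b in objects[i + 1:])

def best_configuration(rooms: list[Box], object_slots: list[list[Box]],
                       score: Callable[[Box, list[Box]], float]):
    """Brute-force stand-in for step 5: combine every room hypothesis with at most one
    hypothesis per object slot, keep the compatible configurations, return the best."""
    best: Optional[tuple[Box, list[Box]]] = None
    best_score = float("-inf")
    for room in rooms:
        # None in a slot means "this object is absent from the configuration"
        for choice in product(*[[None, *slot] for slot in object_slots]):
            objects = [o for o in choice if o is not None]
            if not compatible(room, objects):
                continue
            s = score(room, objects)
            if s > best_score:
                best, best_score = (room, objects), s
    return best, best_score

# Toy example with a hypothetical score: total object volume.
room = Box((0, 0, 0), (5, 3, 4))
small_room = Box((0, 0, 0), (2, 3, 2))
sofa = Box((0.5, 0, 0.5), (2.5, 1, 1.5))
table = Box((2, 0, 1), (3, 0.8, 2))  # intersects the sofa
shelf = Box((4, 0, 3), (5, 2, 4))
best, best_score = best_configuration(
    [room, small_room], [[sofa], [table, shelf]],
    score=lambda r, objs: sum(volume(o) for o in objs))
print(best_score)  # 4.0 (room + sofa + shelf)
```

Full enumeration grows exponentially with the number of object slots, which is why a search-based optimizer is preferable to this brute-force loop on real scenes.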

3 Generation of room structure hypotheses

Figure 2 Definitions of the angle distance and the line segment groups
3.1 Orientation estimation and parametric representation of the room structure

Figure 3 Cuboid-based room hypothesis

Figure 4 Candidate room hypothesis set
3.2 Confidence estimation for candidate room structure hypotheses

$$y^* = \arg\max_{y} f(x, y, w) \tag{1}$$

$$
\begin{aligned}
&\min_{w, \xi}\ \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \\
&\ {\rm s.t.}\quad \xi_i \ge 0, \quad \forall i \\
&\qquad\ w^{\rm T}F(x_i, y_i) - w^{\rm T}F(x_i, y) \ge D(y_i, y) - \xi_i, \quad \forall i,\ \forall y \in Y \setminus \{y_i\}
\end{aligned}
\tag{2}
$$
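Eqs. (1) and (2) have the form of a margin-rescaling structured SVM. The sketch below assumes the usual linear scoring $f(x, y, w) = w^{\rm T}F(x, y)$ (not stated explicitly in this excerpt) and represents each candidate room hypothesis only by its feature vector; the loss $D$ is passed in as a function, and the 0-1 loss in the example is a hypothetical choice. For fixed $w$, the smallest slack satisfying the constraints of Eq. (2) is $\xi_i = \max\bigl(0, \max_{y \neq y_i}[D(y_i, y) - w^{\rm T}F(x_i, y_i) + w^{\rm T}F(x_i, y)]\bigr)$, which is what `slack` computes. The code evaluates the objective and performs inference; it does not train $w$.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def infer(w: Vector, candidate_features: Sequence[Vector]) -> int:
    """Eq. (1), assuming f(x, y, w) = w^T F(x, y): index of the highest-scoring hypothesis."""
    return max(range(len(candidate_features)), key=lambda k: dot(w, candidate_features[k]))

def slack(w: Vector, candidate_features: Sequence[Vector], gt: int,
          loss: Callable[[int, int], float]) -> float:
    """Smallest xi_i that satisfies every constraint of Eq. (2) for one training image."""
    s_gt = dot(w, candidate_features[gt])
    violations = [loss(gt, k) - (s_gt - dot(w, candidate_features[k]))
                  for k in range(len(candidate_features)) if k != gt]
    return max([0.0, *violations])

def objective(w: Vector, C: float,
              training_set: Sequence[tuple[Sequence[Vector], int]],
              loss: Callable[[int, int], float]) -> float:
    """Objective of Eq. (2) with each xi_i set to its smallest feasible value."""
    return 0.5 * dot(w, w) + C * sum(slack(w, feats, gt, loss) for feats, gt in training_set)

def zero_one(gt: int, k: int) -> float:
    """Hypothetical loss D(y_i, y): 0 for the ground truth, 1 otherwise."""
    return 0.0 if gt == k else 1.0

# Toy example: three candidate hypotheses for one image; the ground truth is index 0.
w = [1.0, 2.0]
feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
print(infer(w, feats))                            # 2
print(slack(w, feats, 0, zero_one))               # 1.5
print(objective(w, 1.0, [(feats, 0)], zero_one))  # 4.0
```

In the example, the current $w$ ranks hypothesis 2 above the ground truth, so the slack is positive; training would adjust $w$ to reduce it.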

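$F(x_i, y)$ is the feature vector extracted from room structure hypothesis $y$; it is computed from the groups of line segments whose orientations agree with the dominant vanishing directions. In this paper, $F(x_i, y)$ consists of two parts: a geometry-based low-level feature $F_g$ and a semantics-based high-level feature $F_s$. For each plane $S_j$, the unweighted geometric line-segment-group feature $f_l$ is defined in Eq. (3), where $L_j$ is the set of line segments lying in $S_j$, $R_j$ is the set of line segments in $S_j$ whose orientations agree with the two vanishing points $VP_1$ and $VP_2$, and $|l|$ denotes the length of line segment $l$. Finally, $F_g = \{f_l(S_1), f_l(S_2), f_l(S_3), f_l(S_4), f_l(S_5)\}$.

$$f_l(S_j) = \frac{\sum_{l_i \in R_j} |l_i|}{\sum_{l_i \in L_j} |l_i|} \tag{3}$$

Once each plane of a room structure hypothesis has been parameterized by the vanishing points $VP_1$ and $VP_2$, most line segments in the plane are assigned to one of these two vanishing points according to their orientation. However, some line segments lying on objects do not follow this pattern. For example, some of the blue line segments on the sofa in Figure 2 (b) should correspond to the horizontal vanishing point, yet their orientation clearly deviates from the horizontal direction. Therefore, this paper also uses the confidence $p(l_i)$ that a line segment does not fall inside an object region, obtained through high-level image semantic reasoning, as a weight when computing the line segment groups. The resulting semantics-based weighted line-segment-group feature $f_s$ is defined in Eq. (4), and $F_s = \{f_s(S_1), f_s(S_2), f_s(S_3), f_s(S_4), f_s(S_5)\}$.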

$$f_s(S_j) = \frac{\sum_{l_i \in L_j} p(l_i) \times |l_i|}{\sum_{l_i \in L_j} |l_i|} \tag{4}$$
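Eqs. (3) and (4) are simple length ratios. In the sketch below, each line segment is a hypothetical `Segment` record holding its length $|l_i|$, whether its orientation agrees with $VP_1$ or $VP_2$ of the plane (that is, whether $l_i \in R_j$), and the weight $p(l_i)$. How segments are assigned to planes and how $p(l_i)$ is inferred are outside the sketch, and concatenating $F_g$ and $F_s$ into one list is an assumption about the layout of $F(x_i, y)$.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    length: float          # |l_i|
    agrees_with_vps: bool  # True if l_i is oriented toward VP_1 or VP_2 (l_i in R_j)
    p_free: float          # p(l_i): confidence that l_i does not fall inside an object region

def f_l(plane: list[Segment]) -> float:
    """Eq. (3): fraction of the total segment length in S_j that belongs to R_j."""
    total = sum(s.length for s in plane)
    if total == 0:
        return 0.0  # assumption: a plane without segments gets feature 0
    return sum(s.length for s in plane if s.agrees_with_vps) / total

def f_s(plane: list[Segment]) -> float:
    """Eq. (4): segment length in S_j weighted by p(l_i), over the unweighted total length."""
    total = sum(s.length for s in plane)
    if total == 0:
        return 0.0  # same assumption as in f_l
    return sum(s.p_free * s.length for s in plane) / total

def room_features(planes: list[list[Segment]]) -> list[float]:
    """F = (F_g, F_s) for the five planes S_1..S_5 of one room hypothesis."""
    if len(planes) != 5:
        raise ValueError("expected five planes S_1..S_5")
    return [f_l(p) for p in planes] + [f_s(p) for p in planes]

plane = [Segment(4.0, True, 1.0), Segment(1.0, False, 0.5), Segment(3.0, True, 0.25)]
print(f_l(plane), f_s(plane))  # 0.875 0.65625
```

In the example, 7 of the 8 length units agree with the plane's vanishing points, while the down-weighting by $p(l_i)$ lowers $f_s$ for segments that probably lie on objects.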
4 Generation of object structure hypotheses

4.1 Object localization based on high-level image semantics

Figure 5 Object localization based on different high-level image semantics

4.2 Confidence estimation for candidate object structure hypotheses

$$c\,\bar{x} = KR\bar{X} \tag{5}$$
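The symbols of Eq. (5) are not defined in this excerpt. A common reading, assumed here, is a pinhole projection with the camera centre at the origin: $K$ is the 3×3 intrinsic matrix, $R$ the rotation from the room frame to the camera frame, $\bar{X}$ a 3D point or direction in the room frame, $\bar{x} = (u, v, 1)^{\rm T}$ its homogeneous image point, and $c$ a scale factor. The sketch solves for $\bar{x}$ and $c$; the intrinsic values are made up.

```python
Matrix = list[list[float]]

def mat_mul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(M: Matrix, v: list[float]) -> list[float]:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def project(K: Matrix, R: Matrix, X: list[float]) -> tuple[tuple[float, float], float]:
    """Solve c * x = K R X for the pixel (u, v) of x = (u, v, 1) and the scale c."""
    h = mat_vec(mat_mul(K, R), X)  # h = c * x in homogeneous coordinates
    c = h[2]
    if c == 0:
        raise ValueError("third homogeneous coordinate is 0: the image point is at infinity")
    return (h[0] / c, h[1] / c), c

K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]          # camera aligned with the room
print(project(K, I, [1.0, 2.0, 4.0]))  # ((445.0, 490.0), 4.0)
print(project(K, I, [0.0, 0.0, 1.0]))  # ((320.0, 240.0), 1.0)
```

Projecting the unit vector of the $k$-th room axis gives a point proportional to $K r_k$, where $r_k$ is the $k$-th column of $R$; this is the vanishing point of that axis, which ties Eq. (5) to the three dominant vanishing points used for the room hypotheses.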

Figure 6 Candidate object hypothesis generation

$$\mathrm{scr}(\bar{c}) = w_1 \frac{\sum_i v_i \max_{f_i \in N(f_i)} s(f_i)}{\sum_i v_i} + w_2\, v(\bar{c}) \tag{6}$$
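Eq. (6) combines a normalized, weighted sum of per-term maxima with a hypothesis-level term. Because $v_i$, $N(f_i)$, $s(\cdot)$ and $v(\bar{c})$ are not defined in this excerpt, the sketch below only evaluates the formula as written. It assumes the $v_i$ are non-negative weights that do not sum to zero and that, for each $i$, the scores $s$ of the candidates in $N(f_i)$ have already been computed. The name `hypothesis_score`, the weights $w_1$, $w_2$, and all numbers in the example are hypothetical.

```python
from typing import Sequence

def hypothesis_score(v: Sequence[float], neighbour_scores: Sequence[Sequence[float]],
                     v_c: float, w1: float, w2: float) -> float:
    """Literal evaluation of Eq. (6) for one candidate object hypothesis c.

    v[i]                -- weight v_i of term i (assumed non-negative, not all zero)
    neighbour_scores[i] -- the values s(f) over the candidates in N(f_i)
    v_c                 -- the hypothesis-level term v(c)
    """
    if len(v) != len(neighbour_scores):
        raise ValueError("need one candidate list per weight v_i")
    total_weight = sum(v)
    if total_weight <= 0:
        raise ValueError("the weights v_i must sum to a positive value")
    # sum_i v_i * max over N(f_i) of s
    weighted = sum(vi * max(scores) for vi, scores in zip(v, neighbour_scores))
    return w1 * weighted / total_weight + w2 * v_c

print(hypothesis_score([1.0, 3.0], [[0.25, 0.75], [0.5, 0.125]], v_c=0.5, w1=0.5, w2=0.5))  # 0.53125
```

Normalizing by $\sum_i v_i$ keeps the first term on the same scale as the individual scores $s$, so hypotheses with different numbers or weights of terms remain comparable.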

6 Experimental results and analysis

6.1 Test image sets

6.2 Experiments on scene spatial layout inference

Figure 8 Spatial layout estimation of indoor scenes

6.3 Analysis of room structure hypotheses

Figure 10 The pixel error and object recognition rate of different high-level image semantics in object structure hypotheses
7 Conclusion
