Acta Automatica Sinica  2017, Vol. 43 Issue (8): 1402-1411

Estimating Spatial Layout of Cluttered Rooms by Using Object Prior and Spatial Constraints
YAO Tuo-Zhong1, ZUO Wen-Hui2, SONG Jia-Tao1, YING Hong-Wei1
1. School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo 315016;
2. College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027
Manuscript received: January 21, 2016; accepted: July 28, 2016.
Foundation Item: Supported by Zhejiang Provincial Natural Science Foundation (LQ15F020004), Zhejiang Provincial Public Welfare Technology Research Project (2016C33255), and Ningbo Natural Science Foundation (2015A610132, 2013A610113)
Corresponding author: YAO Tuo-Zhong, lecturer at the School of Electronic and Information Engineering, Ningbo University of Technology. He received his Ph.D. degree from Zhejiang University in 2011. His research interests cover computer vision and machine learning.
Recommended by Associate Editor JIA Yun-De
Abstract: Estimating the spatial layout of a structured indoor scene is an active research topic in computer vision. However, most current solutions do not work robustly in cluttered rooms because of occlusion by the objects inside. In this paper, a new algorithm that integrates geometric and semantic relations between the room and its objects is proposed to recover the spatial layout of a cluttered room. The algorithm parametrically represents the 3D volumes of both the room and the objects and uses multiple high-level image semantics to obtain object priors. Furthermore, several spatial constraints, such as spatial exclusion and containment, are used; they simultaneously optimize the spatial layout estimate of the room and provide significant information for object recognition and localization. One advantage of the algorithm is its low computational complexity, and experimental results also demonstrate that it works more robustly in cluttered rooms than several classic algorithms.
Key words: spatial layout estimation, object prior, spatial constraint, combinatorial optimization

1 Related Work

2 Algorithm Description

Figure 1 The flowchart of our algorithm

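1) The algorithm extracts the straight line segments in the room and estimates three mutually orthogonal dominant vanishing points. These vanishing points define the principal directions of the planes in the room (for example, walls of different orientations, the ceiling, and the floor) and provide spatial constraints for the floor, walls, and ceiling.

2) Combining this geometric information with multiple high-level image semantics, initial structure hypotheses are generated separately for the room and for the objects, each represented as a cuboid.

3) Based on the room and object structure hypotheses, a set of candidate scene configuration hypotheses (room hypothesis + object hypotheses) is generated.

4) Not all room and object structure hypotheses satisfy the constraints of a scene configuration hypothesis. Simple 3D spatial reasoning is therefore used to enforce these constraints: every "room-object" hypothesis pair and every "object-object" hypothesis pair is tested for spatial compatibility, and only the scene configurations that pass are kept.

5) In the final inference over scene configuration hypotheses, a classic combinatorial optimization method is used to sample the optimal scene configuration, which reduces the computational complexity of the search.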
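Steps 4) and 5) above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the paper: cuboids are axis-aligned boxes given as `(xmin, ymin, zmin, xmax, ymax, zmax)`, each hypothesis carries a single scalar score, and the best configuration is found by exhaustive enumeration rather than by the combinatorial optimization method the paper uses. The function names are hypothetical.

```python
from itertools import combinations

def contains(room, obj, tol=1e-9):
    """Containment test: the object box lies entirely inside the room box."""
    lower_ok = all(room[i] - tol <= obj[i] for i in range(3))
    upper_ok = all(obj[i + 3] <= room[i + 3] + tol for i in range(3))
    return lower_ok and upper_ok

def overlaps(a, b, tol=1e-9):
    """True if two boxes share interior volume, which violates exclusion."""
    return all(a[i] < b[i + 3] - tol and b[i] < a[i + 3] - tol for i in range(3))

def compatible(room, objects):
    """A configuration passes if every object is inside the room
    ("room-object" test) and no two objects overlap ("object-object" test)."""
    if not all(contains(room, o) for o in objects):
        return False
    return not any(overlaps(a, b) for a, b in combinations(objects, 2))

def best_configuration(room_hyps, object_hyps):
    """Exhaustive search over (room, object subset) pairs.

    room_hyps and object_hyps are lists of (box, score) tuples. The summed
    score is an assumed stand-in for the paper's configuration score."""
    best, best_score = None, float("-inf")
    for room, room_score in room_hyps:
        for k in range(len(object_hyps) + 1):
            for subset in combinations(object_hyps, k):
                boxes = [box for box, _ in subset]
                if not compatible(room, boxes):
                    continue
                total = room_score + sum(score for _, score in subset)
                if total > best_score:
                    best, best_score = (room, boxes), total
    return best, best_score
```

Exhaustive enumeration grows exponentially with the number of object hypotheses, which is the cost that the sampling strategy in step 5) is meant to avoid.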

3 Generation of Room Structure Hypotheses

Figure 2 The definitions of the angular distance and straight line segment groups
3.1 Orientation Estimation and Parametric Representation of the Room Structure

Figure 3 The cuboid-based room hypothesis

Figure 4 Candidate room hypothesis set
3.2 Confidence Estimation of Candidate Room Structure Hypotheses

$${y^*} = \arg \mathop {\max }\limits_y f(x, y, w)$$ (1)

$$\begin{aligned} &\mathop {\min }\limits_{w, \xi } \frac{1}{2}\|w\|^2 + C\sum\limits_i {\xi _i} \\ &{\rm s.t.} \quad {\xi _i} \ge 0, \quad \forall i, \\ &\qquad\ {w^{\rm T}}F({x_i}, {y_i}) - {w^{\rm T}}F({x_i}, y) \ge D({y_i}, y) - {\xi _i}, \quad \forall i, ~\forall y \in Y\backslash {y_i} \end{aligned}$$ (2)
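To make Eqs. (1) and (2) concrete, the sketch below evaluates the prediction rule and the slack of one training example over a finite candidate set. It assumes the linear form $f(x, y, w) = {w^{\rm T}}F(x, y)$, which the constraints in Eq. (2) suggest but this section does not state outright. The names `predict`, `slack` and `objective` are hypothetical, and `feature(y)` stands for $F({x_i}, y)$ with the image ${x_i}$ fixed. The sketch only evaluates the objective; it does not solve the optimization.

```python
def dot(w, f):
    """Inner product w^T f of two equal-length sequences."""
    return sum(wi * fi for wi, fi in zip(w, f))

def predict(w, candidates, feature):
    """Eq. (1) under the assumed linear score: argmax_y w^T F(x, y)."""
    return max(candidates, key=lambda y: dot(w, feature(y)))

def slack(w, y_true, candidates, feature, loss):
    """Smallest xi_i >= 0 that satisfies every constraint of Eq. (2)
    for a single training example."""
    true_score = dot(w, feature(y_true))
    worst = 0.0
    for y in candidates:
        if y == y_true:
            continue  # constraints range over Y \ y_i only
        # Constraint: true_score - score(y) >= D(y_i, y) - xi_i
        violation = loss(y_true, y) - (true_score - dot(w, feature(y)))
        worst = max(worst, violation)
    return worst

def objective(w, examples, C):
    """Value of 0.5 * ||w||^2 + C * sum_i xi_i for a given w.

    examples is a list of (y_true, candidates, feature, loss) tuples."""
    return 0.5 * dot(w, w) + C * sum(slack(w, *ex) for ex in examples)
```

With a 0/1 loss, a hypothesis that beats the ground truth by any margin incurs a slack greater than 1, while a runner-up that trails by less than the loss still contributes a smaller positive slack.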

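${F({x_i}, y)}$ is the feature vector extracted from room structure hypothesis ${y}$; it is computed from the groups of straight line segments whose orientations agree with the dominant vanishing points. In this paper, ${F({x_i}, y)}$ consists of two parts: a geometry-based low-level feature ${F_g}$ and a semantics-based high-level feature ${F_s}$. For each plane ${S_j}$, the geometry-based unweighted line-group feature ${f_l}$ is defined in Eq. (3), where ${L_j}$ is the set of straight line segments lying in ${S_j}$, ${R_j}$ is the set of segments in ${S_j}$ whose orientations agree with the two vanishing points ${VP_1}$ and ${VP_2}$, and ${|l|}$ denotes the length of segment ${l}$. Finally, ${F_g} = \{ {f_l}({S_1}), {f_l}({S_2}), {f_l}({S_3}), {f_l}({S_4}), {f_l}({S_5})\}$.

$${f_l}({S_j}) = \frac{ \sum\limits_{{l_i} \in {R_j}} {|{l_i}|}}{ \sum\limits_{{l_i} \in {L_j}} {|{l_i}|} }$$ (3)

Once each plane of a room structure hypothesis has been parameterized by the vanishing points ${VP_1}$ and ${VP_2}$, the vast majority of the line segments in that plane can be assigned, by orientation, to one of these two vanishing points. Some line segments that lie on objects, however, do not behave this way. In Figure 2 (b), for example, some of the blue line segments on the sofa should correspond to the horizontal vanishing point, yet their orientation is clearly not horizontal. For this reason, the confidence ${p({l_i})}$ that a line segment does not fall inside an object region, which can be obtained by inference from high-level image semantics, is also used as a weight when computing the line groups. Finally, the semantics-based weighted line-group feature ${f_s}$ is defined in Eq. (4), where ${F_s}= \{ {f_s}({S_1}), {f_s}({S_2}), {f_s}({S_3}), {f_s}({S_4}), {f_s}({S_5})\}$.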

$${f_s}({S_j}) = \frac{\sum\limits_{{l_i} \in {L_j}} {p({l_i}) \times |{l_i}|} }{\sum\limits_{{l_i} \in {L_j}} {|{l_i}|}}$$ (4)
4 Generation of Object Structure Hypotheses
4.1 Object Localization Based on High-level Image Semantics
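A small sketch of the two per-plane features in Eqs. (3) and (4) follows. The segment representation, a tuple `(length, aligned, p)`, is an assumption made for illustration: `aligned` marks membership in ${R_j}$ and `p` plays the role of ${p({l_i})}$. Returning 0.0 for a plane with no segments is also an assumed convention, not something this section specifies.

```python
def unweighted_feature(segments):
    """Eq. (3): total length of vanishing-point-aligned segments (R_j)
    divided by the total length of all segments in the plane (L_j)."""
    total = sum(length for length, _, _ in segments)
    if total == 0:
        return 0.0  # assumed convention for an empty plane
    aligned_len = sum(length for length, aligned, _ in segments if aligned)
    return aligned_len / total

def weighted_feature(segments):
    """Eq. (4): segment lengths weighted by p(l_i), the confidence that
    the segment does not fall inside an object region."""
    total = sum(length for length, _, _ in segments)
    if total == 0:
        return 0.0  # assumed convention for an empty plane
    return sum(p * length for length, _, p in segments) / total

def room_features(planes):
    """Concatenate F_g and F_s over the planes S_1..S_5 of one hypothesis."""
    f_g = [unweighted_feature(s) for s in planes]
    f_s = [weighted_feature(s) for s in planes]
    return f_g + f_s

# One plane with three segments: two aligned and likely on the room
# structure, one misaligned and likely on an object.
plane = [(10.0, True, 0.9), (5.0, True, 0.8), (5.0, False, 0.2)]
f_l = unweighted_feature(plane)  # 15 / 20 = 0.75
f_s = weighted_feature(plane)    # (9 + 4 + 1) / 20, about 0.7
```

Both features are normalized by the same denominator. The weighted version lowers the contribution of the misaligned segment according to its confidence rather than excluding it by orientation.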

Figure 5 Object localization based on different high-level image semantics

4.2 Confidence Estimation of Candidate Object Structure Hypotheses

$$c\bar{x} = KR\bar{X}$$ (5)
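Eq. (5) has the form of a standard pinhole projection. The definitions of its symbols are not included in this section, so the sketch below rests on the usual reading as an assumption: $K$ is a 3×3 camera intrinsic matrix, $R$ a 3×3 rotation, $\bar{X}$ a 3D point, $\bar{x}$ the homogeneous image point, and $c$ a scale factor. The function names and the numeric values are hypothetical.

```python
def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def project(K, R, X):
    """Compute c * x_bar = K R X_bar and return the pixel (u, v)
    obtained by dividing out the scale factor c."""
    h = matvec(K, matvec(R, X))
    c = h[2]
    if abs(c) < 1e-12:
        raise ValueError("point projects to infinity (scale factor is zero)")
    return (h[0] / c, h[1] / c)

# Assumed example values: focal length 500 px, principal point (320, 240),
# and an identity rotation.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
pixel = project(K, R, [1.0, 2.0, 10.0])  # (370.0, 340.0)
```

Under this reading, projecting the eight corners of a cuboid hypothesis would give its image-plane outline.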

Figure 6 Candidate object hypothesis generation

$$scr(\bar{c} ) = {w_1}\frac{{\sum\limits_i {{v_i} \times \mathop {\max }\limits_{{f_i} \in N({f_i})} s({f_i})} }}{{\sum\limits_i {{v_i}} }} + {w_2}v(\bar{c} )$$ (6)
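The arithmetic of Eq. (6) can be traced with a short sketch. The meanings of ${v_i}$, $s({f_i})$, $N({f_i})$ and $v(\bar{c})$ are not defined in this section, so the code treats them purely as numeric inputs: a list of weights, a list of neighborhood score lists, and a scalar. The function name and the example numbers are hypothetical.

```python
def hypothesis_score(v, neighbor_scores, w1, w2, v_c):
    """Evaluate Eq. (6) for one candidate.

    v               : weights v_i, one per term i
    neighbor_scores : neighbor_scores[i] holds the values s(f) for f in N(f_i)
    w1, w2          : mixing weights
    v_c             : the scalar term v(c_bar)
    """
    denom = sum(v)
    if denom == 0:
        weighted_avg = 0.0  # assumed convention when all weights are zero
    else:
        weighted_avg = sum(
            vi * max(scores) for vi, scores in zip(v, neighbor_scores)
        ) / denom
    return w1 * weighted_avg + w2 * v_c

# Three terms; the third has zero weight and so does not affect the average.
score = hypothesis_score(
    v=[1, 1, 0],
    neighbor_scores=[[0.2, 0.6], [0.9], [0.1]],
    w1=0.7, w2=0.3, v_c=0.5,
)
# weighted average = (0.6 + 0.9) / 2 = 0.75, so score is about 0.675
```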

6 Experimental Results and Analysis
6.1 Test Image Set

6.2 Scene Spatial Layout Estimation Experiments

Figure 8 Spatial layout estimation of indoor scenes

6.3 Analysis of Room Structure Hypotheses

Figure 10 The pixel error and object recognition rate of different high-level image semantics in object structure hypotheses
7 Conclusion

