
 CAAI Transactions on Intelligent Systems  2019, Vol. 14 Issue (6): 1144-1151  DOI: 10.11992/tis.201905041

Cite this article

SHAN Yi, YANG Jinfu, WU Suishuo, et al. Skip feature pyramid network with a global receptive field for small object detection[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1144-1151. DOI: 10.11992/tis.201905041.



Skip feature pyramid network with a global receptive field for small object detection
SHAN Yi 1,2, YANG Jinfu 1,2, WU Suishuo 1,2, XU Bingbing 1,2
1. Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China;
2. Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China
Abstract: With the development of deep learning, objects can be detected with high accuracy and efficiency. However, the detection of small objects remains challenging. The main reason for this is that the relationship between high-level semantic information and low-level feature maps is not fully utilized. To solve this problem, we propose a novel detection framework, called the skip feature pyramid network with a global receptive field, to improve the ability to detect small objects. Unlike previous detection architectures, the skip feature pyramid architecture fuses high-level semantic information with low-level feature maps to obtain detailed information. To extract global information from a network, we apply a global receptive field (GRF) with convolution kernels of different sizes and different dilated convolution steps. The experimental results on PASCAL VOC and MS COCO datasets show that the proposed approach realizes significant improvements over other comparable detection models.
Key words: skip feature pyramid network; global receptive field; object detection; deep learning; feature extraction; convolutional neural network; dilated convolution; image processing

1 Algorithm model

 Fig. 1 Skip feature pyramid network with global receptive field for object detection

1.1 Skip feature pyramid

 $o = \left\lfloor \frac{i - f + 2p}{s} \right\rfloor + 1$ (1)
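Equation (1) can be checked numerically with a small helper. This is a minimal sketch that assumes the usual reading of the symbols, which the surviving text does not define: `i` is the input side length, `f` the kernel size, `p` the padding, `s` the stride, and the bracket denotes rounding down. The function name `conv_output_size` is hypothetical.

```python
def conv_output_size(i: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output side length from Eq. (1): floor((i - f + 2p) / s) + 1.

    Assumed meaning of the symbols (not stated in the text):
    i = input size, f = kernel size, p = padding, s = stride.
    """
    if s <= 0:
        raise ValueError("stride must be positive")
    # Integer floor division implements the bracket in Eq. (1).
    return (i - f + 2 * p) // s + 1


# A 3x3 kernel with padding 1 keeps the size at stride 1
# and roughly halves it at stride 2.
print(conv_output_size(300, 3, p=1, s=1))
print(conv_output_size(300, 3, p=1, s=2))
```

The input size of 300 is only an illustrative value.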

 Fig. 3 The detailed structure of skip feature pyramid
1.2 Global receptive field module

 Fig. 4 The network of global receptive field
1.3 Bounding box settings
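The abstract describes the GRF as combining convolution kernels of different sizes with different dilation settings. A minimal sketch of how dilation enlarges the area covered by a kernel follows. The branch configuration in `branches` is hypothetical and is not taken from the paper; only the formula for the effective extent of a dilated kernel, $d(k - 1) + 1$, is standard.

```python
def effective_kernel(k: int, d: int = 1) -> int:
    """Side length covered by a k x k kernel with dilation rate d."""
    return d * (k - 1) + 1


def same_padding(k: int, d: int = 1) -> int:
    """Padding that keeps the spatial size unchanged at stride 1 (odd k)."""
    return (effective_kernel(k, d) - 1) // 2


# Hypothetical (kernel size, dilation rate) pairs, for illustration only.
branches = [(1, 1), (3, 1), (3, 3), (3, 5)]
for k, d in branches:
    print(f"k={k} d={d} extent={effective_kernel(k, d)} pad={same_padding(k, d)}")
```

With padding chosen this way, every branch returns a feature map of the same spatial size, so the branch outputs can be concatenated or summed.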

1.4 Loss function

 $\begin{gathered} L(\{ p_i \},\{ x_i \},\{ c_i \},\{ t_i \}) = \frac{1}{N_{\rm conv}}\left( \sum\limits_i l_b(p_i,[l_i^* \geqslant 1]) + \sum\limits_i [l_i^* \geqslant 1]\, l_r(x_i,g_i^*) \right) \\ + \frac{1}{N_p}\left( \sum\limits_i l_m(c_i,l_i^*) + \sum\limits_i [l_i^* \geqslant 1]\, l_r(t_i,g_i^*) \right) \end{gathered}$ (2)
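The structure of Eq. (2) can be sketched as a plain combination of per-anchor terms. The surviving text does not define the symbols, so this sketch makes loud assumptions: the per-anchor values of $l_b$, $l_m$ and $l_r$ are taken as already computed and passed in as lists, $l_i^*$ is the ground-truth label of anchor $i$ (0 for background), and $N_{\rm conv}$ and $N_p$ are positive normalizers. The function name `two_stage_loss` and its argument names are hypothetical.

```python
from typing import Sequence


def two_stage_loss(
    lb: Sequence[float],    # assumed: per-anchor l_b(p_i, [l_i* >= 1])
    lr_x: Sequence[float],  # assumed: per-anchor l_r(x_i, g_i*)
    lm: Sequence[float],    # assumed: per-anchor l_m(c_i, l_i*)
    lr_t: Sequence[float],  # assumed: per-anchor l_r(t_i, g_i*)
    labels: Sequence[int],  # assumed: l_i*, with 0 meaning background
    n_conv: float,
    n_p: float,
) -> float:
    """Combine per-anchor terms following the layout of Eq. (2)."""
    # Indicator [l_i* >= 1]: regression terms count only for positive anchors.
    pos = [1.0 if label >= 1 else 0.0 for label in labels]
    first = sum(lb) + sum(m * v for m, v in zip(pos, lr_x))
    second = sum(lm) + sum(m * v for m, v in zip(pos, lr_t))
    return first / n_conv + second / n_p


# Three anchors; the first is background, so its regression values are ignored.
print(two_stage_loss(
    lb=[0.1, 0.2, 0.3], lr_x=[9.0, 0.5, 0.5],
    lm=[0.2, 0.4, 0.6], lr_t=[9.0, 0.25, 0.25],
    labels=[0, 2, 1], n_conv=2, n_p=2,
))
```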

 $l_r(x, g^*, l^*) = \sum\limits_{i \in {\rm Pos}}^{N} \sum\limits_{m \in \{ cx,cy,w,h\} } [l^* \geqslant 1]\, {\rm smooth}_{L1}(x_i^m - \widehat g_j^m)$ (3)
 $\widehat g_j^{cx} = (g_j^{cx} - d_i^{cx})/d_i^w, \quad \widehat g_j^{cy} = (g_j^{cy} - d_i^{cy})/d_i^h$ (4)
 $\widehat g_j^w = \log \left( \frac{g_j^w}{d_i^w} \right), \quad \widehat g_j^h = \log \left( \frac{g_j^h}{d_i^h} \right)$ (5)
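Equations (3) to (5) can be illustrated for a single matched pair of boxes. This is a minimal sketch under stated assumptions: `g` is a ground-truth box and `d` a default box, both given as `(cx, cy, w, h)`, and the smooth L1 function takes its commonly used form ($0.5z^2$ for $|z| < 1$, $|z| - 0.5$ otherwise), which the text itself does not spell out. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def encode_box(g: Box, d: Box) -> Box:
    """Regression targets of Eqs. (4) and (5) for ground truth g and default box d."""
    g_cx, g_cy, g_w, g_h = g
    d_cx, d_cy, d_w, d_h = d
    return (
        (g_cx - d_cx) / d_w,   # Eq. (4), centre x offset scaled by default width
        (g_cy - d_cy) / d_h,   # Eq. (4), centre y offset scaled by default height
        math.log(g_w / d_w),   # Eq. (5), log width ratio
        math.log(g_h / d_h),   # Eq. (5), log height ratio
    )


def smooth_l1(z: float) -> float:
    """Commonly used smooth L1: quadratic near zero, linear elsewhere (assumed form)."""
    a = abs(z)
    return 0.5 * z * z if a < 1.0 else a - 0.5


def localization_loss(pred: Sequence[float], g: Box, d: Box) -> float:
    """Inner sum of Eq. (3) over m in {cx, cy, w, h} for one positive match."""
    target = encode_box(g, d)
    return sum(smooth_l1(x - t) for x, t in zip(pred, target))


g = (0.5, 0.5, 0.2, 0.4)   # illustrative ground-truth box
d = (0.4, 0.6, 0.2, 0.2)   # illustrative default box
print(encode_box(g, d))
print(localization_loss((0.0, 0.0, 0.0, 0.0), g, d))
```

Summing `localization_loss` over all positive matches gives the outer sum in Eq. (3).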

2 Experimental results and analysis

2.1 PASCAL VOC

 Fig. 5 The visual comparison of experimental results on VOC2007 test
2.2 MS COCO

3 Conclusion
