Acta Automatica Sinica, 2018, Vol. 44, Issue 1: 99-105


Image Retrieval with Enhanced Visual Dictionary and Query Expansion
KE Sheng-Cai1,2, LI Bi-Cheng3, CHEN Gang1, ZHAO Yong-Wei1, WEI Han1
1. Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450001;
2. Unit 75830, Guangzhou 510000;
3. College of Computer Science and Technology, Huaqiao University, Xiamen 361021
Manuscript received: January 29, 2016; accepted: August 15, 2016.
Foundation Item: Supported by the National Natural Science Foundation of China (60872142) and the Scientific Research Funds of Huaqiao University.
Corresponding author: LI Bi-Cheng, Professor at the College of Computer Science and Technology, Huaqiao University. His research interests cover text analysis and understanding, speech/image/video processing and recognition, and information fusion.
Recommended by Associate Editor LIU Yue-Hu
Abstract: The most popular approach to image retrieval is based on the bag of visual words (BoVW) model. However, several fundamental problems restrict the performance of this method, such as low time efficiency, weak discriminative power of visual words, and poor robustness. To address these problems, an image retrieval method with an enhanced visual dictionary and query expansion is proposed. First, clustering by fast search and find of density peaks is used to generate a group of visual words. Second, uninformative words in the dictionary are eliminated with a Chi-square model to improve the discriminative ability of the visual dictionary. Finally, an efficient graph-based visual reranking method is introduced to refine the initial search results. Experimental results on the Oxford5K and Paris6K datasets indicate that the expressive ability of the visual dictionary is effectively improved and that the method outperforms state-of-the-art image retrieval methods.
Key words: bag of visual words (BoVW); density-based clustering; Chi-square model; query expansion

1 Image retrieval based on enhanced visual dictionary and query expansion

 Figure 1 The flow chart of image retrieval based on enhanced visual dictionary and query expansion
1.1 Visual dictionary group based on density clustering

 ${\rho _i} = \mathop \sum \limits_j \chi ({d_{ij}} - {d_c})$ (1)

 ${\delta _i} = \left\{ {\begin{array}{*{20}{c}} {\mathop {\min }\limits_{j:{\rho _j} > {\rho _i}} ({d_{ij}})}, &{{\rho _i} < {\rho _{\max }}}\\ {\mathop {\max }\limits_j ({d_{ij}})}, &{{\rho _i} = {\rho _{\max }}} \end{array}} \right.$ (2)
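Equations (1) and (2) can be illustrated with a minimal sketch. It assumes Euclidean distances between feature vectors, takes $\chi(x) = 1$ for $x < 0$ and $0$ otherwise (the convention of the density peaks algorithm of Rodriguez and Laio), and excludes $j = i$ from the sum in Eq. (1). The function name `density_peaks` and the toy points are hypothetical and are not taken from the paper.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point, following Eqs. (1) and (2).

    points: list of equal-length coordinate tuples (assumed Euclidean space).
    d_c:    distance threshold (cutoff distance).
    """
    n = len(points)
    # Pairwise Euclidean distances d_ij.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Eq. (1): rho_i counts the points closer than d_c to point i
    # (the point itself is excluded, which is an assumption of this sketch).
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    # Eq. (2): delta_i is the distance to the nearest point of higher density;
    # a point with the maximum density takes its largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Hypothetical toy data: a cluster of three points and a cluster of two.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
rho, delta = density_peaks(toy_points, d_c=1.5)
```

Points that combine a large $\rho_i$ with a large $\delta_i$ are the candidates for cluster centres in the density peaks approach; in the toy data the three points of the denser cluster share the maximum density and therefore all receive the "largest distance" branch of Eq. (2).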

1.2 Visual word filtering

 $x_i^2 = \sum\limits_{k = 1}^2 {\sum\limits_{j = 1}^m {\frac{{{{(N \cdot {n_{kj}} - {n_{k + }} \cdot {n_{ + j}})}^2}}}{{N \cdot {n_{k + }} \cdot {n_{ + j}}}}} }$ (3)

 $\tilde x_i^2 = \frac{{x_i^2}}{{tf{\rm{(}}{w_i}{\rm{)}}}}$ (4)
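Equations (3) and (4) can be sketched as follows. The sketch assumes that $n_{kj}$ is a $2 \times m$ contingency table for one visual word $w_i$, with the two rows counting images that do and do not contain the word and the $m$ columns corresponding to image categories, and that $tf(w_i)$ is the frequency of the word. This reading of the symbols, the function names, and the toy counts are assumptions made for illustration.

```python
def chi_square(table):
    """Chi-square statistic of Eq. (3) for a 2 x m table of counts n_kj."""
    row_totals = [sum(row) for row in table]        # n_{k+}
    col_totals = [sum(col) for col in zip(*table)]  # n_{+j}
    total = sum(row_totals)                         # N
    stat = 0.0
    for k, row in enumerate(table):
        for j, n_kj in enumerate(row):
            denom = total * row_totals[k] * col_totals[j]
            if denom == 0:
                # Skip empty rows or columns to avoid division by zero.
                continue
            stat += (total * n_kj - row_totals[k] * col_totals[j]) ** 2 / denom
    return stat


def normalized_chi_square(table, tf):
    """Eq. (4): chi-square statistic divided by the word frequency tf(w_i)."""
    return chi_square(table) / tf


# Hypothetical counts for two categories.
independent = [[10, 10], [10, 10]]   # word spread evenly over the categories
dependent = [[20, 0], [0, 20]]       # word occurs in one category only
score_independent = chi_square(independent)
score_dependent = chi_square(dependent)
score_normalized = normalized_chi_square(dependent, tf=20)
```

A word whose occurrence is independent of the category scores zero and is a candidate for removal, while a word concentrated in one category receives a high score.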

1.3 Graph-based query expansion

 Figure 2 The flow chart of query expansion based on graph structure

 ${R_k}(i, i') = \{ (i, i')|i \in {N_k}(i'), i' \in {N_k}(i)\}$ (5)

 $w(i, i') = \left\{ {\begin{array}{*{20}{l}} {\dfrac{{|{N_k}(i) \cap {N_k}(i')|}}{k}}, &{\mbox{if}~(i, i') \in {R_k}(i, i')}\\ 0, &\mbox{otherwise} \end{array}} \right.$ (6)
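The reciprocal-neighbor relation of Eq. (5) and the edge weight of Eq. (6) can be sketched as below. The sketch assumes that the $k$-nearest-neighbor sets $N_k(\cdot)$ have already been computed from an initial retrieval step and are passed in as Python sets; the function name `edge_weight` and the toy neighbor sets are hypothetical.

```python
def edge_weight(neighbors, i, i2, k):
    """Edge weight w(i, i') of Eq. (6).

    neighbors: dict mapping an image id to the set N_k of its k nearest
               neighbors (assumed to be precomputed).
    """
    # Eq. (5): i and i' are reciprocal neighbors if each one appears in the
    # k-nearest-neighbor set of the other.
    reciprocal = i in neighbors[i2] and i2 in neighbors[i]
    if not reciprocal:
        return 0.0
    # Eq. (6): share of common neighbors, normalised by k.
    return len(neighbors[i] & neighbors[i2]) / k


# Hypothetical neighbor sets for four images with k = 2.
toy_neighbors = {0: {1, 2}, 1: {0, 2}, 2: {0, 3}, 3: {2, 0}}
w_01 = edge_weight(toy_neighbors, 0, 1, k=2)
w_02 = edge_weight(toy_neighbors, 0, 2, k=2)
w_12 = edge_weight(toy_neighbors, 1, 2, k=2)
```

In the toy data, images 0 and 1 are reciprocal neighbors with one shared neighbor, images 0 and 2 are reciprocal but share no neighbor, and images 1 and 2 are not reciprocal, so only the first pair receives a non-zero weight.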

 ${s_i} = \min \left\{ {\beta ^n}\frac{{\left\| {{f_i} - {f_n}} \right\|_2^2}}{{\sigma _n^2}}|n = 1, 2, \cdots, {N_c}\right\}$ (7)

2 Experimental setup and performance evaluation
2.1 Experimental setup

2.1.1 Experimental performance analysis

 Figure 3 The effect of the distance threshold $d_c$ on MAP

 Figure 4 The effect of vocabulary size on MAP

 Figure 5 The effect of the number of removed stop words on MAP

 Figure 6 MAP of different methods on the Oxford5K and Oxford5K+Paris6K databases

 Figure 7 Image retrieval results of EVD+GBQE on the Oxford5K+Paris6K database
3 Conclusion
