Acta Automatica Sinica, 2018, Vol. 44, Issue 6: 1086-1095


3D Human Body Pose Reconstruction via $L_{1/2}$ Regularization
HONG Jin-Hua$^1$, ZHANG Rong$^1$, GUO Li-Jun$^1$
1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211
Manuscript received: April 14, 2017; accepted: February 7, 2018.
Foundation Item: Supported by Zhejiang Provincial Natural Science Foundation (LY17F030002), Zhejiang Provincial Public Welfare Technology Research Project (LGF18F020007)
Corresponding author: GUO Li-Jun, Professor at the Faculty of Electrical Engineering and Computer Science, Ningbo University. His research interests cover machine learning, computer vision, and pattern recognition.
Recommended by Associate Editor YANG Jian
Abstract: To estimate 3D shape from given 2D feature points in a single image, we present a convex relaxation approach that combines the shape space model with $L_{1/2}$ regularization and the properties of the spectral norm, transforming the non-convex optimization problem of the shape space model into a convex program. To solve this convex program with the ADMM algorithm, we further propose a spectral-norm proximal operator that enforces the orthogonality and sparsity constraints on the solution. Using the proposed optimization algorithm, we conduct 3D human body pose reconstruction experiments on the CMU motion capture dataset with both the shape space model and the 3D deformable shape model. Qualitative and quantitative comparisons show that the proposed algorithm outperforms existing optimization algorithms, which validates its effectiveness.
Key words: 3D reconstruction, sparse representation, $L_{1/2}$ regularization, convex program

1 Related work

2 Proposed algorithm

2.1 Problem description

 $$$\label{eq1} W=\Pi S$$$ (1)

 $$$\label{eq2}S=\sum\limits_{i = 1}^{k}{{c}_i}{B}_i$$$ (2)
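Equations (1) and (2) can be checked numerically with a small sketch. It assumes, because the surviving text does not define them, that $\Pi$ is a $2\times 3$ orthographic camera matrix (the first two rows of a rotation), that each basis shape $B_i$ is a $3\times p$ matrix of $p$ landmarks, and that $W$ is the $2\times p$ matrix of 2D feature points. The function names and sizes are illustrative only.

```python
import numpy as np

def compose_shape(c, B):
    """Eq. (2): S = sum_i c_i * B_i, with B of shape (k, 3, p)."""
    return np.tensordot(c, B, axes=1)  # result has shape (3, p)

def project(Pi, S):
    """Eq. (1): W = Pi @ S, with Pi assumed to be a 2x3 camera matrix."""
    return Pi @ S  # result has shape (2, p)

rng = np.random.default_rng(0)
k, p = 4, 15                          # hypothetical number of bases / landmarks
B = rng.standard_normal((k, 3, p))    # random stand-in for a learned basis
c = np.array([0.7, 0.0, 0.3, 0.0])    # sparse coefficients

# Assumed camera: first two rows of a rotation about the z-axis.
theta = np.pi / 6
Pi = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0]])

S = compose_shape(c, B)   # 3D shape
W = project(Pi, S)        # 2D feature points
```

Only two of the four coefficients are nonzero here, which is the kind of sparse combination the regularized formulations below are meant to favour.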

 \begin{align} \label{eq10} \min\limits_{\pmb C, M}\frac{{1}}{2}{{\left\|W-{\sum\limits_{i=1}^k}M_iB_i\right\|}^2_F+\lambda{\|\pmb C\|}_{0}}\nonumber \\ {\rm s.\, t.}~~M_i{M_i}^{\rm T}=c_i^2{I_2}, \forall {i}\in\left[1, ~k\right] \end{align} (10)

 \begin{align} \label{eq11} \min\limits_{\pmb C}{\|\pmb C\|}_{0}\quad\quad\quad\quad\quad\quad\quad\quad \nonumber \\{\rm s.\, t.}~~W=\sum\limits_{i = 1}^{k}{M_i}{B}_i, \|M_i\|_2\leq|c_i|, \forall {i}\in\left[1, ~k\right] \end{align} (11)

 \begin{align} \label{eq12} &\min\limits_{\pmb C}{\|\pmb C\|}_{0}\nonumber \\&{\rm s.\, t.}~~{\left\|W-\sum\limits_{i = 1}^{k}{M_i}B_i\right\|^2_F}<\varepsilon, \|M_i\|_2\leq|c_i|, \nonumber \\& \forall {i}\in\left[1, ~k\right] \end{align} (12)

 \begin{align} \label{eq13} \min\limits_{\pmb C}{\|\pmb C\|}^{1/2}_{1/2}\quad\quad\quad\quad\quad\quad\quad\quad \nonumber \\{\rm s.\, t.}~~{\left\|W-\sum\limits_{i = 1}^{k}{M_i}B_i\right\|^2_F}<\varepsilon, \|\pmb C\|^{1/2}_{1/2}=\sum\limits_{i = 1}^{k}|c_i|^{1/2}, \nonumber \\ \|M_i\|_2\leq|c_i|, \forall i\in\left[1, ~k\right]\quad\quad\quad\quad\quad\quad\quad\quad \end{align} (13)
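The step from the orthogonality constraint in (10) to the spectral-norm constraint in (11) rests on a simple fact: $M_iM_i^{\rm T}=c_i^2I_2$ forces both singular values of $M_i$ to equal $|c_i|$, so $\|M_i\|_2=|c_i|$, and (11) relaxes this equality to $\|M_i\|_2\leq|c_i|$. The sketch below checks that fact numerically and evaluates the two penalties used in (10) to (13). It assumes $M_i=c_i\bar{R}$ with $\bar{R}$ a $2\times 3$ matrix with orthonormal rows; the helper names are hypothetical.

```python
import numpy as np

def l_half_penalty(c):
    """||C||_{1/2}^{1/2} = sum_i |c_i|^{1/2}, as defined in (13)."""
    return float(np.sum(np.sqrt(np.abs(c))))

def l0_penalty(c, tol=1e-12):
    """||C||_0: number of nonzero coefficients, as in (10)-(12)."""
    return int(np.sum(np.abs(c) > tol))

# Assumed rotation part: a 2x3 matrix with orthonormal rows.
R_bar = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
c = np.array([0.81, 0.0, -0.25, 0.0])

for c_i in c:
    M_i = c_i * R_bar
    # Constraint of (10): M_i M_i^T = c_i^2 I_2 ...
    assert np.allclose(M_i @ M_i.T, c_i**2 * np.eye(2))
    # ... which implies the spectral norm equals |c_i|, the boundary case of (11).
    assert np.isclose(np.linalg.norm(M_i, 2), abs(c_i))

print(l0_penalty(c))      # number of active basis shapes
print(l_half_penalty(c))  # sqrt(0.81) + sqrt(0.25)
```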

2.3.2 3D shape reconstruction based on the 3D deformable shape model

 $$$\label{eq16} \min\limits_{c_1, \cdots, c_k, \bar {R}}{\|M_i-c_i\bar {R}\|}^2_F \quad {\rm s.\, t.}~~ \bar {R}\bar {R}^{\rm T}=I_2$$$ (16)

1) Initialize $\pmb C, \bar{R}$; /* image projection registration algorithm */

2) Optimize $\pmb C, \bar{R}$; /* alternating iterative minimization method */

3) $S=\sum_{i = 1}^{k}{c_i}B_i$;

4) End of algorithm.
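The steps above leave the update rules implicit. The following is a minimal sketch of one possible alternating scheme for (16), not the authors' implementation. It assumes that the objective is summed over $i$ and that the matrices $M_i$ (each $2\times 3$) are already available. For fixed $\bar{R}$, each $c_i$ has the closed form ${\rm tr}(M_i\bar{R}^{\rm T})/2$; for fixed coefficients, $\bar{R}$ is an orthogonal Procrustes problem solved by an SVD. The initialization used here is a stand-in for the image projection registration algorithm named in step 1), which is not reproduced.

```python
import numpy as np

def nearest_orthonormal_rows(A):
    """Closest 2x3 matrix with orthonormal rows to A (Procrustes via SVD)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def recover_pose(M, n_iter=20):
    """Alternately minimise sum_i ||M_i - c_i R||_F^2 subject to R R^T = I_2.

    M has shape (k, 2, 3). Returns (c, R).
    """
    # Stand-in initialisation (assumption): polar factor of the largest M_i.
    R = nearest_orthonormal_rows(M[np.argmax(np.linalg.norm(M, axis=(1, 2)))])
    c = np.zeros(M.shape[0])
    for _ in range(n_iter):
        # c-step: least squares per i; ||R||_F^2 = 2 because R R^T = I_2.
        c = np.einsum('ijk,jk->i', M, R) / 2.0
        # R-step: maximise trace(R^T sum_i c_i M_i) over orthonormal rows.
        R = nearest_orthonormal_rows(np.tensordot(c, M, axes=1))
    return c, R

def reconstruct(c, B):
    """Step 3): S = sum_i c_i B_i, with B of shape (k, 3, p)."""
    return np.tensordot(c, B, axes=1)
```

The pair $(c_i, \bar{R})$ is only determined up to a common sign, since $(-c_i)(-\bar{R})=c_i\bar{R}$; the sketch does not resolve this ambiguity.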

3 Model optimization and solution

3.1 Spectral-norm proximal gradient algorithm

 $$$\label{eq17} \min\limits_{M_i}\left(\frac{{1}}{2}{{\|Y-M_i\|}^2_F+\lambda\|M_i\|^{1/2}_2}\right)$$$ (17)

 $$$\label{eq18} M^*_i=D_{\sqrt{\lambda}}{(Y)}=U_Y{\rm diag}\left\{\pmb {\sigma}_Y-\sqrt{\lambda}P_{L_1}(\frac{{\pmb {\sigma}}_Y}{\sqrt{\lambda}})\right\}V^{\rm T}_Y$$$ (18)

 $$$\label{eq19} {\rm prox}_{\lambda F}(Y)=\arg\min\limits_{M_i}\left(\frac{{1}}{2}{{\|Y-M_i\|}^2_F}+\lambda {F(M_i)}\right)$$$ (19)

 $$$\label{eq20} {\rm prox}_{\lambda F}(Y)=U_Y{\rm diag}\{{\rm prox}_{\lambda f}({\pmb \sigma}_Y)\}V^{\rm T}_Y$$$ (20)

 \begin{align} \label{eq21} {\rm prox}_{\lambda f}({\pmb \sigma}_Y)=\, & {\pmb\sigma}_{Y}-{\rm prox}_{(\lambda f)^*}({\pmb \sigma}_Y)=\nonumber \\& {\pmb\sigma}_Y- {{\rm prox}_{\lambda f{(./\lambda)}^*}}({\pmb \sigma}_Y)=\nonumber \\&{\pmb \sigma}_Y-{\rm prox}_{\sqrt{\lambda}f{(./\sqrt{\lambda})}^*}({\pmb\sigma}_Y)=\nonumber \\& {\pmb \sigma}_Y-{\rm prox}_{\frac{\sqrt{\lambda}}{\sqrt{\lambda}}{(\sqrt{\lambda}f)}^*}( {\pmb\sigma}_Y)= \nonumber \\&{\pmb\sigma}_{Y}-\sqrt{\lambda}P_{L_1}(\frac{ {\pmb\sigma}_Y}{\sqrt{\lambda}}) \end{align} (21)
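Equations (18), (20) and (21) reduce the matrix problem to a vector problem on the singular values: from $\pmb\sigma_Y$, subtract $\sqrt{\lambda}$ times the projection of $\pmb\sigma_Y/\sqrt{\lambda}$ onto the $L_1$ ball. The sketch below follows that recipe. It assumes that $P_{L_1}$ denotes Euclidean projection onto the unit $L_1$ ball (implemented here with the standard sort-based method), and the argument `tau` plays the role of $\sqrt{\lambda}$ in (18). Function names are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius}."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    cssv = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (cssv - radius) / j > 0)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def spectral_prox(Y, tau):
    """U diag(sigma - tau * P_L1(sigma / tau)) V^T, the form of eq. (18)."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    shrunk = sigma - tau * project_l1_ball(sigma / tau)
    return U @ np.diag(shrunk) @ Vt

Y = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
M_star = spectral_prox(Y, tau=1.0)
print(np.linalg.svd(M_star, compute_uv=False))  # singular values after shrinkage
```

For this input the largest singular value is lowered from 3 to 2 and the second one is left unchanged; when the sum of the singular values does not exceed `tau`, the operator returns the zero matrix.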

4.2 3D human body pose reconstruction

4.2.1 Reconstruction based on the shape space model

 Figure 1 Qualitative comparison of the results of the three methods

 Figure 2 Reconstruction error comparison of the three methods

 Figure 3 Box plot of the reconstruction error

 Figure 4 Sparsity comparison of the three methods

4.2.2 Reconstruction based on the 3D deformable shape model

 Figure 5 Qualitative comparison of the results of the three methods

 Figure 6 Reconstruction error comparison of the three methods
 Figure 7 Box plot of the reconstruction error
 Figure 8 Sparsity comparison of the three methods
5 Conclusion
