Acta Automatica Sinica, 2017, Vol. 43, Issue (10): 1759-1772


Synchronization of Video Sequences Through 3D Trajectory Reconstruction
WANG Xue1, SHI Jian-Bo2, PARK Hyun-Soo2, WANG Qing1
1. School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China;
2. School of Engineering and Applied Science, University of Pennsylvania, Philadelphia PA 19104, USA
Manuscript received August 10, 2016; accepted March 2, 2017.
Foundation Item: Supported by National Natural Science Foundation of China (61531014)
Corresponding author: WANG Qing, Professor at the School of Computer Science and Engineering, Northwestern Polytechnical University. His research interests cover computer vision, image and video signal processing, light fields, and virtual reality. E-mail: qwang@nwpu.edu.cn
Recommended by Associate Editor HUANG Qing-Ming
Abstract: We present an algorithm for synchronizing an arbitrary number of videos captured by cameras moving independently in a dynamic 3D scene. Assuming that the 3D pose of every camera is known for each frame, we first reconstruct the 3D trajectory of a moving point using a trajectory basis-based method. The trajectory coefficients are computed for each sequence separately. Point correspondences across sequences are not required; different points may even be tracked in different sequences, provided that every 3D point tracked in the second sequence is a linear combination of a subset of the 3D points tracked in the first sequence. We then propose a robust rank constraint on the coefficient matrices to measure the spatio-temporal alignment quality of every feasible pair of video fragments. Finally, the optimal temporal mapping is found using a graph-based approach. Our algorithm can use both short and long feature trajectories and is robust to mild outliers. We verify the robustness and performance of the proposed approach on synthetic data as well as on challenging real video sequences.
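The first step above, recovering a moving point's 3D trajectory from its 2D track when every frame's camera is known, can be posed as a linear least-squares problem. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a truncated DCT trajectory basis with $K$ coefficients per coordinate, a known 3×4 projection matrix per frame (pose plus intrinsics), and an unweighted solve; the function names are hypothetical. Each of the $F$ frames contributes two equations and there are $3K$ unknowns, so at least $2F \geq 3K$ is needed for an overdetermined system.

```python
import numpy as np

def dct_basis(F, K):
    """F x K matrix whose k-th column is the k-th (unnormalized) DCT-II basis vector."""
    t = np.arange(F)[:, None]
    k = np.arange(K)[None, :]
    return np.cos(np.pi * (2 * t + 1) * k / (2 * F))

def reconstruct_trajectory(cams, pts2d, K):
    """Least-squares trajectory coefficients of one moving 3D point.

    cams:  (F, 3, 4) known projection matrices, one per frame (assumed input).
    pts2d: (F, 2) tracked image coordinates (u, v) of the point.
    Returns beta of shape (3, K): rows hold the x, y and z coefficients.
    """
    F = len(cams)
    if 2 * F < 3 * K:
        raise ValueError("need 2F >= 3K for an overdetermined system")
    theta = dct_basis(F, K)
    A = np.zeros((2 * F, 3 * K))
    b = np.zeros(2 * F)
    for t in range(F):
        u, v = pts2d[t]
        # First two rows of the cross-product matrix of (u, v, 1):
        # (u, v, 1) x (P_t [X_t; 1]) = 0 gives two independent linear equations.
        S = np.array([[0.0, -1.0, v], [1.0, 0.0, -u]])
        M, p4 = cams[t][:, :3], cams[t][:, 3]
        # X_t = Theta_t @ beta with a block-diagonal 3 x 3K basis row for frame t
        Theta_t = np.kron(np.eye(3), theta[t][None, :])
        A[2 * t:2 * t + 2] = S @ M @ Theta_t
        b[2 * t:2 * t + 2] = -S @ p4
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return beta.reshape(3, K)
```

With noise-free tracks of a trajectory that lies in the basis span and sufficiently varied cameras, the generating coefficients are recovered; in general, camera motion that is too strongly correlated with the point's motion makes this system ill-conditioned.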
Key words: Video synchronization; independently moving cameras; non-rigid structure from motion; trajectory basis; rank constraint

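Let $S_r=\{I_r(1), I_r(2), \cdots, I_r(N_r)\}$ and $S_o=\{I_o(1), I_o(2), \cdots, I_o(N_o)\}$ denote the reference and observed image sequences, respectively, captured by independently moving cameras, where $N_r$ and $N_o$ are the numbers of frames in the two sequences. The integer temporal offset $\Delta$ to be tested ranges over ${R}=[-N_o+F, N_r-F]$.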
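The offset range ${R}$ contains $N_r+N_o-2F+1$ integer candidates. The short sketch below enumerates them and, assuming observed frame $t_o$ is paired with reference frame $t_o+\Delta$ (a sign convention not stated in this section) and reading $F$ as the fragment length, checks that every offset in ${R}$ leaves at least $F$ overlapping frames. Function names are hypothetical.

```python
def candidate_offsets(n_r, n_o, F):
    """All integer offsets Delta in R = [-N_o + F, N_r - F], inclusive."""
    if min(n_r, n_o) < F:
        raise ValueError("both sequences need at least F frames")
    return list(range(-n_o + F, n_r - F + 1))

def overlap(n_r, n_o, delta):
    """Frames shared when observed frame t_o (0-based) is paired with
    reference frame t_o + delta (assumed sign convention)."""
    # valid t_o satisfy 0 <= t_o < n_o and 0 <= t_o + delta < n_r
    return max(0, min(n_o, n_r - delta) - max(0, -delta))
```

For $N_r=100$, $N_o=80$, $F=20$ this gives the 141 offsets from $-60$ to $80$; both endpoints overlap by exactly 20 frames, and one step beyond either endpoint the overlap drops to 19.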

 \begin{align} &\begin{bmatrix} \hat{X}^{(1)}_1&\cdots&\hat{X}^{(1)}_{P_o}\\ \vdots&\ddots&\vdots \\ \hat{X}^{(F)}_1&\cdots & \hat{X}^{(F)}_{P_o}\end{bmatrix}=\nonumber\\ &\qquad\begin{bmatrix} {X}^{(1)}_1&\cdots&{X}^{(1)}_{P_r}\\ \vdots&\ddots&\vdots \\ {X}^{(F)}_1&\cdots & {X}^{(F)}_{P_r}\end{bmatrix}\begin{bmatrix}{\pmb \alpha}_1^{\rm T} \\ \vdots \\ {\pmb \alpha}_{P_o}^{\rm T}\end{bmatrix}^{\rm T} \end{align} (6)

 $$[\hat{\pmb \beta}_1 \ \cdots \ \hat{\pmb \beta}_{P_o}]=[{\pmb \beta}_1 \ \cdots \ {\pmb \beta}_{P_r}][{\pmb \alpha}_1 \ \cdots \ {\pmb \alpha}_{P_o}]$$ (7)
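Because all trajectories are expanded in the same basis, the linear relation (6) between 3D points carries over unchanged to the coefficients in (7). The NumPy sketch below checks this numerically and shows why it supports a rank test. It assumes a DCT basis and assumes that $\overline{M}$ places the reference and observed coefficient vectors side by side as $[{\pmb\beta}_1 \cdots {\pmb\beta}_{P_r}\ \hat{\pmb\beta}_1 \cdots \hat{\pmb\beta}_{P_o}]$ (this section does not define $\overline{M}$). Under that assumption its rank stays at $P_r$ when (6) holds and generically reaches $P_r+P_o$ for unrelated trajectories when $3K\geq P_r+P_o$. All function names are illustrative.

```python
import numpy as np

def dct_basis(F, K):
    """F x K matrix of unnormalized DCT-II basis vectors (assumed basis)."""
    t = np.arange(F)[:, None]
    k = np.arange(K)[None, :]
    return np.cos(np.pi * (2 * t + 1) * k / (2 * F))

def coefficients(X, K):
    """Least-squares basis coefficients (3K x P) of P trajectories.

    X is 3F x P: each column stacks one point's x, y and z coordinates over F frames.
    """
    F = X.shape[0] // 3
    B = np.kron(np.eye(3), dct_basis(F, K))  # 3F x 3K block-diagonal basis
    beta, *_ = np.linalg.lstsq(B, X, rcond=None)
    return beta

def joint_rank(beta_r, beta_o, tol=1e-8):
    """Numerical rank of the (assumed) stacked matrix [beta_r, beta_o]."""
    return int(np.linalg.matrix_rank(np.hstack([beta_r, beta_o]), tol=tol))
```

Applying the same mixing matrix $[{\pmb\alpha}_1 \cdots {\pmb\alpha}_{P_o}]$ to the x, y and z rows of the stacked trajectories reproduces (6); the fitted $\hat{\pmb\beta}$ then equals ${\pmb\beta}$ times that matrix, as in (7).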

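$P_r$ and $P_o$ are determined by the following two rules. 1) Taking the reference sequence as an example, let $P_r(j)$ denote the number of image points tracked continuously throughout the subsequence fragment $f_r(j)$; then $P_r=\min\{P_r(j)\}$, where $\lfloor F/2\rfloor+1\leq j\leq N_r-\lfloor F/2\rfloor$. $P_o$ is determined in the same way. 2) Two inequalities must hold: $2F\geq 3K$ and $3K\geq P_r+P_o$. The former ensures that the linear system for reconstructing the trajectory of a moving point is overdetermined; the latter ensures that the rank of $\overline{M}$ is at most $P_r$.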
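Both rules can be applied mechanically. A minimal sketch, assuming tracks are given as a hypothetical boolean visibility matrix (frames × points) and that fragment $f(j)$ covers the $F$ frames starting at frame $j-\lfloor F/2\rfloor$ (1-based, so it is centered on $j$ when $F$ is odd); the input format and function names are illustrative, not from the paper.

```python
import numpy as np

def fragment_point_counts(visible, F):
    """P(j) for every admissible fragment center j (1-based).

    visible: hypothetical boolean (N, P) matrix, True where point p is tracked in frame t.
    Fragment f(j) is assumed to cover F frames starting at frame j - F//2.
    """
    N = visible.shape[0]
    counts = {}
    for j in range(F // 2 + 1, N - F // 2 + 1):
        start = j - F // 2 - 1  # 0-based index of the fragment's first frame
        # a point counts only if it is tracked in every frame of the fragment
        counts[j] = int(visible[start:start + F].all(axis=0).sum())
    return counts

def choose_point_counts(vis_r, vis_o, F, K):
    """Rule 1): P = min_j P(j); rule 2): check 2F >= 3K and 3K >= P_r + P_o."""
    P_r = min(fragment_point_counts(vis_r, F).values())
    P_o = min(fragment_point_counts(vis_o, F).values())
    if 2 * F < 3 * K:
        raise ValueError("2F >= 3K violated: trajectory system is not overdetermined")
    if 3 * K < P_r + P_o:
        raise ValueError("3K >= P_r + P_o violated")
    return P_r, P_o
```

For example, if one of six reference points is lost in frames 11 to 15 of a 50-frame sequence, every fragment containing those frames counts only five points, so $P_r=5$.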

 Figure 4 Flow chart of the pairwise alignment algorithm
2 Multi-sequence temporal alignment

3 Simulation experiments

 Figure 5 Reconstruction (black) and ground truth (gray) of the simulated data

 $\varepsilon=\frac{1}{N}\sum\limits_{t_o=1}^{N}|\hat{\omega}(t_o)-\omega(t_o)|$ (11)
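Eq. (11) is a mean absolute error measured in frames. A minimal sketch, assuming $\omega(t_o)$ is the ground-truth reference frame index for observed frame $t_o$ and $\hat{\omega}(t_o)$ the estimated one (this section does not define $\omega$ explicitly); the function name is hypothetical.

```python
import numpy as np

def alignment_error(omega_hat, omega):
    """Eq. (11): mean absolute difference between the estimated mapping
    omega_hat(t_o) and the ground truth omega(t_o) over N observed frames."""
    omega_hat = np.asarray(omega_hat, dtype=float)
    omega = np.asarray(omega, dtype=float)
    if omega_hat.shape != omega.shape:
        raise ValueError("both mappings must cover the same N frames")
    return float(np.mean(np.abs(omega_hat - omega)))
```

For a ground-truth offset of 5 frames, $\omega=[6,7,8,9]$, and an estimate $\hat{\omega}=[6,8,8,7]$, the per-frame errors are 0, 1, 0 and 2, so $\varepsilon=0.75$.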
3.1 Robustness

 Figure 6 Robustness with respect to tracking error, missing data, and number of points
3.2 Accuracy

 Figure 7 Comparison of the alignment accuracy of different methods on the simulated dataset under varying tracking-noise levels, and representative cost matrices with the estimated optimal paths superimposed
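The abstract states that the optimal temporal mapping is found with a graph-based approach, and Figure 7 overlays estimated optimal paths on cost matrices. This section does not give the graph construction, so the following is only a sketch under stated assumptions: nodes are cost-matrix cells $(t_r, t_o)$, each observed frame is matched to exactly one reference frame, the matched reference index advances by 0, 1 or 2 frames between consecutive observed frames (an assumed step set that tolerates a range of frame-rate ratios), start and end rows are free, and the shortest path in this directed acyclic graph is found by dynamic programming. The function name and step set are hypothetical.

```python
import numpy as np

def min_cost_monotonic_path(C, steps=(0, 1, 2)):
    """Minimum-cost path through cost matrix C (rows: reference frames,
    columns: observed frames) visiting every column once, with the row
    index advancing by one of `steps` between consecutive columns.
    Start and end rows are free. Returns (row index per column, total cost)."""
    C = np.asarray(C, dtype=float)
    n_r, n_o = C.shape
    D = np.full((n_r, n_o), np.inf)   # best cost of a path ending at cell (i, j)
    back = np.zeros((n_r, n_o), dtype=int)
    D[:, 0] = C[:, 0]                 # a path may start at any reference frame
    for j in range(1, n_o):
        for i in range(n_r):
            for s in steps:
                p = i - s             # predecessor row in column j - 1
                if p >= 0 and D[p, j - 1] + C[i, j] < D[i, j]:
                    D[i, j] = D[p, j - 1] + C[i, j]
                    back[i, j] = p
    i = int(np.argmin(D[:, -1]))      # cheapest end row
    cost = float(D[i, -1])
    path = [i]
    for j in range(n_o - 1, 0, -1):   # follow back-pointers to column 0
        i = int(back[i, j])
        path.append(i)
    return path[::-1], cost
```

For a 10 × 5 cost matrix that is zero only along $t_r=t_o+3$ and one elsewhere, the returned path is [3, 4, 5, 6, 7] with cost 0.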
4 First-person-view data

 Figure 8 3D reconstruction results (from left to right: block building, exercise mat, basketball (#1), basketball (#2), and toy train)

 Figure 9 Synchronization results on the block-building scene (from left to right: sample frames from the reference sequence, followed by the corresponding frames in the second sequence (top) and the third sequence (bottom) found by our method, PDM, BPM, ECM, MFM, and SMM, respectively)
 Figure 10 Synchronization results on the exercise mat scene (layout as in Fig. 9)
 Figure 11 Synchronization results on the basketball scene (#1) (from left to right: sample frames from the reference sequence, followed by the corresponding frames in the second sequence found by our method, PDM, BPM, ECM, MFM, and SMM, respectively)
 Figure 12 Synchronization results on the basketball scene (#2) (layout as in Fig. 11)
 Figure 13 Synchronization results on the toy train scene (layout as in Fig. 11)
 Figure 14 Alignment accuracy for different λ values of the effective rank, and the cost matrices computed with those λ values

 Figure 15 Alignment accuracy for different frame-rate ratios, and cost matrices computed when the frame rate of the observed sequence is 46 fps, 40 fps, and 24 fps, respectively

5 Conclusion
