CAAI Transactions on Intelligent Systems, 2020, Vol. 15, Issue 5: 900-909. DOI: 10.11992/tis.201906054

LIU Dongjingdian, MENG Xuechun, ZHANG Zixin, et al. A behavioral recognition algorithm based on 2D spatiotemporal information extraction[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 900-909. DOI: 10.11992/tis.201906054.

A behavioral recognition algorithm based on 2D spatiotemporal information extraction
LIU Dongjingdian, MENG Xuechun, ZHANG Zixin, YANG Xu, NIU Qiang
College of Computer Science & Technology, China University of Mining and Technology, Xuzhou 221008, China
Abstract: Human behavior recognition based on computer vision is currently a research hotspot. It is widely applied in many areas of social life, such as behavioral detection and video surveillance. Traditional behavior recognition methods are computationally cumbersome and slow. The development of deep learning has greatly improved the accuracy of behavior recognition algorithms; however, compared with the results deep learning achieves in image processing, a gap remains. We introduce a novel behavior recognition algorithm that uses DenseNet as the network architecture and learns spatiotemporal information through 2D convolution. Frames that characterize the behavior are selected from the video, organized into RGB space in spatiotemporal order, and fed into the network for training. We carried out a large number of experiments on the UCF101 dataset, and our method reaches an accuracy of 94.46%.
Key words: behavior recognition; video analysis; neural networks; deep learning; convolutional neural networks; classification; spatiotemporal feature; DenseNet

1 Related work

1.1 Convolutional networks

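ResNet introduced the residual block, which adds a direct connection that passes the current output to later layers while bypassing the nonlinear transformation. Gradients can therefore flow directly to earlier layers, which helps address the vanishing and exploding gradient problems. The drawback of this network, however, is that the output of the previous layer and its convolved output are combined by element-wise addition, which may impede the flow of information in the network [5-6].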

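Building on ResNet, DenseNet proposes a different connection pattern. It establishes dense connections between each layer in a dense block and all subsequent layers, so the input of each layer is the feature maps of all preceding layers. Unlike ResNet, which accumulates values, DenseNet accumulates along the channel dimension; it thus overcomes ResNet's drawback and improves information flow. The DenseNet architecture consists of dense blocks, with a transition layer between every two dense blocks. The structure inside a dense block follows ResNet's bottleneck design, while a transition layer contains a $1\times 1$ convolutional layer and a $2\times 2$ average pooling layer. DenseNet reduces the number of parameters, makes the network narrower, alleviates the vanishing gradient problem, strengthens feature propagation, and encourages feature reuse [6].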
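The difference between the two connection patterns can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: a feature map is modeled as a list of channels, and the names `residual_combine` and `dense_combine` are hypothetical helpers introduced only for this illustration.

```python
# Toy feature map: a list of channels, each channel a flat list of values.
# All names here are hypothetical and chosen for illustration only.

def residual_combine(x, fx):
    """ResNet-style shortcut: element-wise addition; channel count is unchanged."""
    if len(x) != len(fx):
        raise ValueError("addition needs matching channel counts")
    return [[a + b for a, b in zip(cx, cfx)] for cx, cfx in zip(x, fx)]


def dense_combine(x, fx):
    """DenseNet-style connection: concatenate along the channel dimension."""
    return x + fx  # list concatenation keeps every original channel intact


x = [[1.0, 2.0], [3.0, 4.0]]     # 2 channels, 2 values each
fx = [[0.5, 0.5], [-1.0, 1.0]]   # stand-in for a transformed version of x

added = residual_combine(x, fx)  # still 2 channels, values are mixed
stacked = dense_combine(x, fx)   # 4 channels, x is preserved unchanged
print(len(added), len(stacked))
```

After addition the original values of `x` can no longer be separated from `fx`, whereas after concatenation the first channels of `stacked` are exactly `x`, which is one way to read the claim that concatenation improves information flow.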

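1.2 Behavior recognition algorithms

2 2D spatiotemporal convolution design and the organization of spatiotemporal features

2.1 Understanding 2D convolution and feasibility analysis of spatiotemporal feature extraction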

 $d=\frac{2\times w}{k}$ (1)

 $d=\frac{2\times w}{k\times {f}^{n}}$ (2)

 ${A_{n,i,j}} = R\left[ {\begin{array}{*{20}{c}} {{A_{n - 1,i - \frac{{k - 1}}{2},j - \frac{{k - 1}}{2}}}}& \cdots &{{A_{n - 1,i - \frac{{k - 1}}{2},j + \frac{{k - 1}}{2}}}}\\ \vdots & & \vdots \\ {{A_{n - 1,i + \frac{{k - 1}}{2},j - \frac{{k - 1}}{2}}}}& \cdots &{{A_{n - 1,i + \frac{{k - 1}}{2},j + \frac{{k - 1}}{2}}}} \end{array}} \right]$ (3)
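Equation (3) can be read as "take the $k\times k$ neighborhood of layer $n-1$ centered on $(i,j)$ and apply $R$ to it". The sketch below illustrates that reading only. The surviving text does not define $R$, so it is passed in as a callable, and the summation used in the example is a hypothetical stand-in; `window`, `next_activation`, and `total` are illustrative names.

```python
def window(prev, i, j, k):
    """Return the k x k neighbourhood of prev centred on (i, j).

    Assumes k is odd and (i, j) is an interior position, i.e. the window
    does not cross the border of prev.
    """
    h = (k - 1) // 2
    return [row[j - h : j + h + 1] for row in prev[i - h : i + h + 1]]


def next_activation(prev, i, j, k, R):
    """Equation (3): A[n][i][j] = R(window of A[n-1] around (i, j))."""
    return R(window(prev, i, j, k))


def total(win):
    """Hypothetical stand-in for R: sum of all entries in the window."""
    return sum(sum(row) for row in win)


A_prev = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
value = next_activation(A_prev, 1, 1, 3, total)
print(value)
```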

 ${r}_{n}=\left({r}_{n-1}+k-1\right)\times {f}_{n-1}$ (4)
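The recurrence in Equation (4) can be iterated directly. The sketch below implements the formula literally; the starting value $r_0 = 1$, the function name `receptive_sizes`, and the example factors are assumptions made for illustration, since the surviving text does not state them.

```python
def receptive_sizes(k, factors, r0=1):
    """Iterate r_n = (r_{n-1} + k - 1) * f_{n-1}; return [r_0, r_1, ..., r_N].

    k       -- kernel size, taken to be the same for every layer (assumption)
    factors -- factors[n-1] plays the role of f_{n-1} in Equation (4)
    r0      -- assumed starting value r_0
    """
    sizes = [r0]
    for f_prev in factors:
        sizes.append((sizes[-1] + k - 1) * f_prev)
    return sizes


# Hypothetical example: 3 x 3 kernels with factors 1, 1 and 2.
sizes = receptive_sizes(k=3, factors=[1, 1, 2])
print(sizes)
```

With a factor of 1 each layer adds $k-1$ to the previous value, and a factor larger than 1 scales the accumulated value, so $r_n$ grows much faster once such layers appear.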

2.2 Organization of frame selection and stitching

2.3 The flipping operation and its rationale

2.4 Choice of DenseNet

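DenseNet received the Best Paper Award at CVPR 2017. Unlike earlier networks, which improved on width (the Inception structure) or depth (the resblock structure), DenseNet improves the model along the feature dimension: features extracted at different convolutional stages are densely connected along the channel dimension, which preserves richer information. Within a denseblock, DenseNet densely connects each layer to all subsequent layers, so the input of each layer is the feature maps of all preceding layers. The output ${x}_{l}$ of layer $l$ can be written as the following function: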

 ${x}_{l}={H}_{l}\left(\left\{{x}_{0},{x}_{1},\cdots, {x}_{l-1}\right\}\right)$ (5)
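A minimal sketch of Equation (5) in plain Python shows how the input of each layer widens as earlier outputs accumulate. The layer function used here (a channel-wise mean repeated `growth` times) is a hypothetical stand-in for ${H}_{l}$, not the transformation used in the paper; `dense_block` and `make_layer` are illustrative names.

```python
def dense_block(x0, layers):
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) and return [x_0, x_1, ..., x_L]."""
    features = [x0]
    for H in layers:
        # Channel-wise concatenation of every feature map produced so far.
        concatenated = [ch for fmap in features for ch in fmap]
        features.append(H(concatenated))
    return features


def make_layer(growth):
    """Hypothetical H_l: emits `growth` channels, each the mean of its inputs."""
    def H(channels):
        n = len(channels)
        H.seen_channels = n  # record the input width for inspection
        mean = [sum(vals) / n for vals in zip(*channels)]
        return [list(mean) for _ in range(growth)]
    return H


layers = [make_layer(2) for _ in range(3)]
x0 = [[1.0, 3.0], [3.0, 5.0], [5.0, 7.0]]  # 3 input channels, 2 values each
features = dense_block(x0, layers)
input_widths = [H.seen_channels for H in layers]
print(input_widths)
```

With 3 input channels and 2 new channels per layer, successive layers receive 3, 5, and 7 channels, so every earlier feature map stays directly visible to every later layer.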

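2.5 Introducing a spatiotemporal convolution layer to extract spatiotemporal information

3 Experiments

3.1 Verification of the flipping operation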

Fig. 10 Accuracy comparison without and with the flipping operation

3.2 Improvement from the spatiotemporal convolution layer and feature visualization

Fig. 11 Accuracy comparison between 2DSDCN_R and 2DSDCN_D
Fig. 12 Feature visualization of the denseblock and resblock designs

3.3 Verification of model robustness under different frame selection schemes

Fig. 13 Accuracy comparison between 2DSDCN_R and 2DSDCN_D with sampling every 5 frames

3.4 Experimental analysis

4 Conclusion

