Technical Field
The present invention relates to human-computer interaction technology, and in particular to an action recognition method based on a posture sequence finite state machine.
Background Art
In the field of human-computer interaction, action recognition is a prerequisite for somatosensory interaction, and action recognition and behavior understanding have gradually become research hotspots in this field [1-3]. To achieve effective interaction, the different interactive actions, including body movements, gestures, and static postures, must be defined and recognized [4]. In recent years, many action recognition applications based on the Kinect somatosensory technology have been developed. Although these applications can track human motion trajectories effectively [5-7], the actions they recognize are relatively simple and their recognition methods are hard to extend [8-10], so there is an urgent need to research and develop an action recognition model with scalability and generality.
At present, there are many Kinect-based action recognition methods, such as event-triggered methods, template matching methods, and machine learning methods. The flexible action and articulated skeleton toolkit (FAAST) described in [11] is somatosensory middleware between the Kinect development kit and the application program. It mainly performs recognition through triggered events such as angle, distance, and speed; this approach requires little computation and offers good real-time performance and accuracy, but event triggering is inherently limited and has difficulty recognizing continuous actions. Ellis et al. [12] explored the trade-off between recognition accuracy and latency, determining key-frame instances from action data sequences to derive action templates; however, an action template only preserves the shape and model of one class of behavior and ignores variation. Wang et al. [13] proposed an actionlet ensemble method that classifies subsets of joint points with a high recognition rate, but the method operates on data streams that have been segmented in advance and cannot perform online recognition on unsegmented streams. Zhao et al. [14] proposed a structured streaming skeletons (SSS) feature matching method that builds a feature dictionary and gesture model through offline training, assigns a label to every frame of an unknown action data stream, and predicts the action type online by extracting SSS features. This method effectively addresses erroneous segmentation and the shortcomings of template matching and can recognize actions online from unsegmented data streams, but the computation is complex, the recognition feedback time is unstable, and every action requires a feature dictionary; extending it to new action types requires collecting large amounts of action data for offline training, so the recognition of a specific action is tightly coupled to the training set.
References:
[1] Yu Tao. Kinect Application Development and Practice: Dialogue with the Machine in the Most Natural Way [M]. Beijing: China Machine Press, 2012: 46-47 (in Chinese)
[2] Wang J, Xu Z J. STV-based video feature processing for action recognition [J]. Signal Processing, 2013, 93(8): 2151-2168
[3] Xu Guangyou, Cao Yuanyuan. Action recognition and activity understanding: A review [J]. Journal of Image and Graphics, 2009, 14(2): 189-195 (in Chinese)
[4] Van den Bergh M, Carton D, De Nijs R, et al. Real-time 3D hand gesture interaction with a robot for understanding directions from humans [C] // Proceedings of Robot and Human Interactive Communication. Los Alamitos: IEEE Computer Society Press, 2011: 357-362
[5] Zhang Q S, Song X, Shao X W, et al. Unsupervised skeleton extraction and motion capture from 3D deformable matching [J]. Neurocomputing, 2013, 100: 170-182
[6] Shotton J, Sharp T, Kipman A, et al. Real-time human pose recognition in parts from single depth images [J]. Communications of the ACM, 2013, 56(1): 116-124
[7] El-laithy R A, Huang J, Yeh M. Study on the use of Microsoft Kinect for robotics applications [C] // Proceedings of Position Location and Navigation Symposium. Los Alamitos: IEEE Computer Society Press, 2012: 1280-1288
[8] Oikonomidis I, Kyriazis N, Argyros A. Efficient model-based 3D tracking of hand articulations using Kinect [C] // Proceedings of the 22nd British Machine Vision Conference. BMVA Press, 2011: 1-11
[9] Shen Shihong, Li Weiqing. Research on Kinect-based gesture recognition system [C] // Proceedings of the 8th Harmonious Human Machine Environment Conference (HHME2012) CHCI. Beijing: Tsinghua University Press, 2012: 55-62 (in Chinese)
[10] Soltani F, Eskandari F, Golestan S. Developing a gesture-based game for deaf/mute people using Microsoft Kinect [C] // Proceedings of the 2012 Sixth International Conference on Complex, Intelligent and Software Intensive Systems (CISIS). Los Alamitos: IEEE Computer Society Press, 2012: 491-495
[11] Suma E A, Krum D M, Lange B, et al. Adapting user interfaces for gestural interaction with the flexible action and articulated skeleton toolkit [J]. Computers & Graphics, 2013, 37(3): 193-201
[12] Ellis C, Masood S Z, Tappen M F, et al. Exploring the trade-off between accuracy and observational latency in action recognition [J]. International Journal of Computer Vision, 2013, 101(3): 420-436
[13] Wang J, Liu Z, Wu Y, et al. Mining actionlet ensemble for action recognition with depth cameras [C] // Proceedings of Computer Vision and Pattern Recognition (CVPR). Los Alamitos: IEEE Computer Society Press, 2012: 1290-1297
[14] Zhao X, Li X, Pang C, et al. Online human gesture recognition from motion data streams [C] // Proceedings of the 21st ACM International Conference on Multimedia. New York: ACM Press, 2013: 23-32
[15] Biswas K K, Basu S K. Gesture recognition using Microsoft Kinect [C] // Proceedings of the 2011 5th International Conference on Automation, Robotics and Applications (ICARA). Los Alamitos: IEEE Computer Society Press, 2011: 100-103
[16] Zhang Yi, Zhang Shuo, Luo Yuan, et al. Gesture track recognition based on Kinect depth image information and its applications [J]. Application Research of Computers, 2012, 29(9): 3547-3550 (in Chinese)
[17] Chaquet J M, Carmona E J, Fernandez-Caballero A. A survey of video datasets for human action and activity recognition [J]. Computer Vision and Image Understanding, 2013, 117(6): 633-659
Summary of the Invention
The technical problem to be solved by the present invention is to provide an action recognition method based on a posture sequence finite state machine that addresses the deficiencies of the prior art.
The technical solution of the present invention is as follows:
In the posture sequence finite state machine action recognition method, the limb node data obtained by the somatosensory interaction device are first transformed into a user-centered limb node coordinate system. By defining limb node feature vectors, the limb action sequence is sampled and analyzed, regular expressions of the predefined action trajectories are established, and a posture sequence finite state machine is constructed, thereby parsing and recognizing the predefined limb actions.
In the posture sequence finite state machine action recognition method, the limb node coordinate system is established by defining the user space coordinate system as follows: the direction of the user's right hand is the positive x-axis, the direction directly above the head is the positive y-axis, the direction facing the front of the interactive device is the positive z-axis, and the center of the two shoulders is the coordinate origin. The transformation between a coordinate point P(x, y, z) in the Kinect space coordinate system oxyz and the corresponding point P'(x', y', z') in the user space coordinate system o'x'y'z' can be described by formula (1).
In formula (1), O'(x0, y0, z0) denotes the origin of the user space coordinate system o'x'y'z', and θ is the rotation angle of the user relative to the sensor's xoy plane, θ = arctan((xr − xl)/(zr − zl)), where xr > xl and −45° < θ < +45°;
The unit of measurement in the user coordinate system is described as a unit cubic grid. For users of different heights, the proportional relationship between height and limb length must be considered so that limb actions can be described in a uniform way. After the coordinate transformation of formula (1), a spatial grid model specific to the current user is established in the user coordinate system, and the grid model is divided into w^3 three-dimensional cubic cells, where w is the number of divisions per dimension, taken as w = 11;
In the user coordinate system, centered on the origin, the grid in each of the three dimensions is divided proportionally: the ratio of the positive to negative x direction is 1:1, the ratio of the positive to negative y direction is 3:8, and the ratio of the positive to negative z direction is 6:5. By computing the side length d of a unit cell, action types can be described in a uniform way; the cell side length is defined according to the user's relative height, d = h/(w − 1), where h denotes the relative height of the current user in the user coordinate system and w is the number of divisions per dimension;
After the three-dimensional grid model is established, regions in the user coordinate system can be described in the form of cubic cells, ensuring that an independent user limb node coordinate system is always centered on the user and thereby eliminating individual user differences as far as possible.
In the posture sequence finite state machine action recognition method, the limb node feature vector is defined to include the spatial motion vectors of joint points, the motion time intervals of joint points, and the spatial distances between joint points; the limb node feature vector V is defined in formula (2);
In formula (2), T denotes the action type, k (0 ≤ k ≤ 19) denotes the joint point index, i (i = 0, 1, ..., s) denotes the current sampling frame, s denotes the end frame at which the corresponding joint point reaches the next specific sampling point, JkiJki+1 denotes the spatial motion vector of joint point k moving from the current sampling frame i to the next frame i+1, Jki denotes the spatial coordinate point (xki, yki, zki) of joint point k at frame i, Δtks denotes the time interval during which joint point k moves along its trajectory from coordinate point Jk0 to coordinate point Jks, and |PmPn| denotes the spatial distance between specific joint points of the human body, which serves as a proportional feature check quantity in the grid model;
A spatial motion vector JkiJki+1 is defined for each joint point, from which the motion direction and trajectory of the limb node are computed. The time taken to move between sampling points at each step of the action can be described by the time interval Δtks = tks − tk0, where tk0 and tks correspond to the instants of joint point k at the start and end sampling frames of each group, respectively. From the definition in formula (2), Jki = (xki, yki, zki) and Jki+1 = (xki+1, yki+1, zki+1), so the spatial motion vector JkiJki+1 is the componentwise difference of the two coordinate points:
In |PmPn|, Pm and Pn denote the joint points at the two ends of a human limb part, m and n denote the start and end index numbers of the joint point set, where m < n; the point (xj, yj, zj) denotes the spatial coordinates of the corresponding joint point of the limb part, and j (m ≤ j ≤ n−1) denotes the index variable of the corresponding joint point during computation. The fixed spatial distance between limb joint points is then computed by the formula given in Section 1.2 below;
Based on the limb node feature vector parameters defined above, various interactive actions can be defined. Action types are classified and described according to torso part and motion characteristics; the limb node feature vectors of three representative action classes are those of a right-leg side kick, a right-hand circle, and a horizontal spread of both hands;
According to formula (2), the limb node feature vectors V(T, k) of the three representative action classes can be expressed. When joint point k reaches the next specific sampling point, the end frame s is determined and the current feature vector is analyzed as an input parameter of the state transition function; the sampling frame i is then reset to zero, the process waits for the next end frame, analyzes again, and resets again, until the last sampling point;
1) For the right-leg side kick, the general feature data of the right foot joint point (k = 19) are extracted to define the limb node feature vector, in which L is the leg length;
2) For the right-hand circle, the general feature data of the right hand joint point (k = 11) are extracted to define the limb node feature vector, in which D is the arm length;
3) For the horizontal spread of both hands, the general feature data of the left/right hand joint points (k = 7, 11) are extracted to define the limb node feature vector;
By analogy, this method can define limb node feature vectors for other limb actions, and the feature vectors are then analyzed by the posture sequence finite state machine to achieve action recognition.
In the posture sequence finite state machine action recognition method, the posture sequence finite state machine Λ is constructed and defined by the five-tuple in formula (3);
Λ = (S, Σ, δ, s0, F) (3)
In formula (3), S denotes the state set {s0, s1, ..., sn, f0, f1}, which describes each specific posture state of the action; Σ denotes the alphabet of input limb node feature vector sets and constraint parameters, in which the symbol ¬ denotes logical negation; δ is the transition function, defined as S × Σ → S, which transfers the posture sequence finite state machine from the current state to a successor state; s0 denotes the start state; and F = {f0, f1} is the set of final states, denoting the successful recognition state and the invalid recognition state, respectively;
In the alphabet Σ, the variable u denotes the set of all limb node feature vectors V corresponding to a given action type; a feature vector represents the discrete point-domain rules of the action trajectory in the spatial grid, and from these point-domain rules the trajectory regular expression of the action can be constructed;
The path constraint p = {xyz | x ∈ [xmin, xmax], y ∈ [ymin, ymax], z ∈ [zmin, zmax]} controls the range of key points for a specific posture; if at any time the motion leaves the predefined path range, i.e. ¬p is true, the state is marked invalid;
The timestamp t ∈ [tstart, tend] specifies the time allowed for the action to transfer from the current state to a successor state; if a state of the action does not transfer to a valid successor state within the specified time, i.e. ¬t is true, the machine jumps to the invalid state;
Each action consists of several typical static postures; each static posture corresponds to a defined state quantity, and each state quantity is computed from key-point feature data in the spatial grid. An action state transition must satisfy the path constraint p and the timestamp t, so that the action type can be recognized and the user's interaction intent understood. The five-tuple describes the attributes of the posture sequence finite state machine and each step of the transition process. The machine runs as follows: from the initial state s0, a predefined action reaches the first valid state s1; if the posture at the next instant is still within the predefined range, the next valid state sk is reached, and so on, until the success state f0 is reached, i.e. the action is recognized successfully. In the initial state or any valid state, if the behavior exceeds the path constraint or the timestamp range, the action sequence is directly marked invalid, i.e. recognition fails. After any end state is reached, the current posture sequence finite state machine finishes running and is re-initialized for the recognition of the next group of limb actions.
The present invention proposes a posture sequence finite state machine action recognition method that achieves fast recognition of predefined actions. The method uses limb node feature vectors to describe limb action feature data, samples and analyzes predefined limb action sequences, establishes regular expressions of limb action trajectories, and constructs a posture sequence finite state machine to recognize limb actions. The method can describe any action or gesture as a whole, requires no offline training or learning, has strong generality and extensibility, achieves high recognition accuracy for both simple and continuous actions, and offers good real-time performance, meeting the needs of somatosensory interaction applications.
The main advantages of the method of the present invention are: 1) high action recognition accuracy: in tests where 30 users of different heights and body shapes repeated each sample three times, the recognition accuracy was above 94%; 2) fast recognition feedback: the measured recognition feedback time was between 0.060 s and 0.096 s, i.e. less than 0.1 s; 3) no need to collect large amounts of action data for offline training when extending the action types, since a specific action only requires defining the trajectory regular expression of the limb action, giving strong generality and extensibility; 4) the posture sequence finite state machine model proposed by the present invention defines initial and end states and performs real-time analysis on limb node feature vectors, so it can process unsegmented action data streams; 5) the posture sequence finite state machine fits continuous action trajectories with discrete posture sequences and is therefore suitable for recognizing both simple and continuous actions. However, the method is not strong in robustness: any state that fails the predefined rules during recognition is treated as invalid, so posture sequence recognition is rather sensitive and requires users to keep their actions as standard as possible within their personal style.
Description of the Drawings
Fig. 1 shows the framework of the posture sequence finite state machine action recognition method;
Fig. 2 shows the spatial coordinate system transformation: a, the user space coordinate system; b, a top view of the user's rotation in the Kinect coordinate system;
Fig. 3 is a schematic cross-section of the spatial grid division in the xoy plane;
Fig. 4 shows the feature data representation in the user space coordinate system;
Fig. 5 is a schematic diagram of the limb node feature vectors of three representative action classes: a, leg action, right-leg side kick; b, one-hand action, right-hand circle; c, two-hand action, horizontal spread of both hands;
Fig. 6 shows the prototype of the posture sequence finite state machine;
Fig. 7 is a schematic diagram of some action trajectories;
Fig. 8 shows the posture sequence finite state machines of the actions;
Fig. 9 shows the action recognition results: a, leg actions; b, left/right-hand circle; c, left/right hand raised upward; d, left/right hand pressed downward; e, both hands pushed obliquely upward; f, both hands pushed obliquely downward; g, both hands spread horizontally; h, both hands drawn together horizontally; i, left/right hand sliding horizontally;
Detailed Description
The present invention is described in detail below with reference to specific embodiments.
To avoid the shortcomings of previous limb action recognition methods, such as poor extensibility and low recognition efficiency, the present invention proposes a posture sequence finite state machine action recognition method. A specific limb action can be regarded as a group of motion sequences of multiple postures on the time axis, i.e. a posture sequence. The proposed method mainly uses limb node feature vectors to describe limb action feature data; by sampling and analyzing predefined limb action sequences, it establishes trajectory regular expressions of limb actions and constructs the posture sequence finite state machine from these expressions, thereby analyzing and recognizing limb actions.
1 Posture sequence finite state machine action recognition method
The framework of the method is shown in Fig. 1. First, the limb node data obtained by the somatosensory interaction device are transformed into a user-centered limb node coordinate system. By defining limb node feature vectors, the limb action sequences are sampled and analyzed, regular expressions of the predefined action trajectories are established, and the posture sequence finite state machine is constructed, thereby parsing and recognizing the predefined limb actions.
1.1 Establishing the limb node coordinate system
To eliminate individual user differences as far as possible, the spatial description of user actions must be transformed from the device space coordinate system to the user space coordinate system, establishing interactive action features that match the user's individual attributes. The present invention defines the user space coordinate system as follows: the direction of the user's right hand is the positive x-axis, the direction directly above the head is the positive y-axis, the direction facing the front of the interactive device is the positive z-axis, and the center of the two shoulders is the coordinate origin. During action recognition, the forward direction of the user's body is not necessarily perpendicular to the plane of the interactive device, so the acquired user limb node data must be transformed to establish the user limb node coordinate system. The coordinate transformation is shown in Fig. 2: Fig. 2a depicts the user space coordinate system, where O' denotes the origin of the user space coordinate system o'x'y'z'; Fig. 2b is a top view of the user rotating about the y-axis in the Kinect space coordinate system, where L(xl, zl) denotes the mapped coordinate point of the user's left shoulder in the Kinect space coordinate system, R(xr, zr) denotes the mapped coordinate point of the user's right shoulder, and θ denotes the rotation angle of the user relative to the device's xoy plane (−45° < θ < +45°).
Since the acquired limb node data are mirror-symmetric [15], the transformation between a coordinate point P(x, y, z) in the Kinect space coordinate system oxyz and the corresponding point P'(x', y', z') in the user space coordinate system o'x'y'z' can be described by formula (1).
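The body of formula (1) is not reproduced in this text. A plausible reconstruction from the symbol definitions below (translating the origin to O', rotating by θ about the y-axis, and negating the x-coordinate to undo the sensor's mirror symmetry) is the improper rotation

$$
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix} -\cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}
\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix}
\qquad (1)
$$

whose determinant of −1 matches the mirror symmetry noted above; the exact sign conventions cannot be recovered from the surviving text.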
In formula (1), O'(x0, y0, z0) denotes the origin of the user space coordinate system o'x'y'z', and θ is the rotation angle of the user relative to the sensor's xoy plane, θ = arctan((xr − xl)/(zr − zl)), where xr > xl and −45° < θ < +45°.
The unit of measurement in the user coordinate system is described as a unit cubic grid. For users of different heights, the proportional relationship between height and limb length must be considered so that limb actions can be described in a uniform way. After the coordinate transformation of formula (1), a spatial grid model specific to the current user is established in the user coordinate system. In the present invention the grid model is divided into w^3 three-dimensional cubic cells (w is the number of divisions per dimension; the empirical value w = 11 is used). A schematic cross-section of the spatial grid in the xoy plane is shown in Fig. 3.
As shown in Fig. 3, in the user coordinate system, centered on the origin, the grid in each of the three dimensions is divided proportionally: the ratio of the positive to negative x direction is 1:1, the ratio of the positive to negative y direction is 3:8, and the ratio of the positive to negative z direction is 6:5. By computing the side length d of a unit cell, action types can be described in a uniform way. The present invention defines the cell side length according to the user's relative height: d = h/(w − 1), where h denotes the relative height of the current user in the user coordinate system and w is the number of divisions per dimension.
After the three-dimensional grid model is established, regions in the user coordinate system can be described in the form of cubic cells, ensuring that an independent user limb node coordinate system is always centered on the user and thereby eliminating individual user differences as far as possible.
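As an illustrative sketch of Section 1.1 (not code from the patent), the following Python implements the transformation reconstructed above and the grid quantization. The function names, the input layout, and the per-axis negative cell counts are assumptions, since the text does not say how the 1:1, 3:8, and 6:5 splits are rounded to whole cells.

```python
import math

W = 11  # number of grid divisions per dimension (empirical value from the text)

def user_rotation(left_shoulder, right_shoulder):
    """theta = arctan((xr - xl) / (zr - zl)), valid for -45 deg < theta < +45 deg."""
    xl, _, zl = left_shoulder
    xr, _, zr = right_shoulder
    return math.atan((xr - xl) / (zr - zl))

def to_user_coords(point, origin, theta):
    """Map a Kinect-space point into the user space coordinate system.

    Implements formula (1) as reconstructed above: translate to the shoulder
    center, rotate by theta about the y-axis, negate x to undo mirroring.
    """
    x = point[0] - origin[0]
    y = point[1] - origin[1]
    z = point[2] - origin[2]
    return (-x * math.cos(theta) + z * math.sin(theta),
            y,
            x * math.sin(theta) + z * math.cos(theta))

def cell_of(point_user, rel_height, neg_cells=(5, 8, 5)):
    """Quantize a user-space point into the w^3 cubic grid.

    d = h / (w - 1) is the unit cell side length. neg_cells is our assumed
    number of whole cells on the negative side of each axis, approximating
    the stated x 1:1, y 3:8, z 6:5 splits.
    """
    d = rel_height / (W - 1)
    return tuple(int(math.floor(c / d)) + off
                 for c, off in zip(point_user, neg_cells))
```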
1.2 Definition of the limb node feature vector
The state of an action at a given instant is a static posture, and the motion sequence of one or more human joint points in space is a dynamic behavior [16]. Before recognizing actions, the general feature data must be described in the user space coordinate system; these generally include the three-dimensional coordinates of the relative joint points, the spatial motion vectors of the joint points, and the spatial distances between joint points. The limb action feature data are depicted in Fig. 4.
The present invention defines limb node feature vectors to describe action feature data. By computing and analyzing the limb node feature vector parameters, the dynamic sequences formed by combinations of multiple specific postures can be recognized, i.e. limb actions can be recognized. The limb node feature vector includes the spatial motion vectors of joint points, the motion time intervals of joint points, and the spatial distances between joint points; the limb node feature vector V is defined in formula (2).
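Formula (2) itself does not survive in this text; from the component definitions in the next paragraph, a plausible reconstruction is

$$
V(T,k) = \left\{\ \overrightarrow{J_{ki}J_{ki+1}},\ \Delta t_{ks},\ |P_mP_n|\ \right\},\qquad i = 0, 1, \ldots, s \qquad (2)
$$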
In formula (2), T denotes the action type, k (0 ≤ k ≤ 19) denotes the joint point index, i (i = 0, 1, ..., s) denotes the current sampling frame, s denotes the end frame at which the corresponding joint point reaches the next specific sampling point, JkiJki+1 denotes the spatial motion vector of joint point k moving from the current sampling frame i to the next frame i+1, Jki denotes the spatial coordinate point (xki, yki, zki) of joint point k at frame i, Δtks denotes the time interval during which joint point k moves along its trajectory from coordinate point Jk0 to coordinate point Jks, and |PmPn| denotes the spatial distance between specific joint points of the human body, which serves as a proportional feature check quantity in the grid model.
A spatial motion vector JkiJki+1 is defined for each joint point, from which the motion direction and trajectory of the limb node are computed. The time taken to move between sampling points at each step of the action can be described by the time interval Δtks = tks − tk0, where tk0 and tks correspond to the instants of joint point k at the start and end sampling frames of each group, respectively. From the definition in formula (2), Jki = (xki, yki, zki) and Jki+1 = (xki+1, yki+1, zki+1), so the spatial motion vector JkiJki+1 is expressed as:
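$$
\overrightarrow{J_{ki}J_{ki+1}} = \big(x_{ki+1}-x_{ki},\ y_{ki+1}-y_{ki},\ z_{ki+1}-z_{ki}\big)
$$

(the expression is elided in the surviving text; the componentwise difference above follows directly from the definitions of Jki and Jki+1).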
In |PmPn|, Pm and Pn denote the joint points at the two ends of a human limb part, m and n denote the start and end index numbers of the joint point set, where m < n; the point (xj, yj, zj) denotes the spatial coordinates of the corresponding joint point of the limb part, and j (m ≤ j ≤ n−1) denotes the index variable of the corresponding joint point during computation. The fixed spatial distance between limb joint points is then computed by the following formula:
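$$
|P_mP_n| = \sum_{j=m}^{n-1}\sqrt{(x_{j+1}-x_j)^2 + (y_{j+1}-y_j)^2 + (z_{j+1}-z_j)^2}
$$

(reconstructed from the definitions above: the fixed distance is read as the summed length of the segments between the consecutive joints Pm, ..., Pn).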
Based on the limb node feature vector parameters defined above, various interactive actions can be defined. Action types are classified and described according to torso part and motion characteristics. The limb node feature vectors of three representative action classes are shown in Fig. 5: Fig. 5a is a schematic diagram of the limb node feature vector for the right-leg side kick, Fig. 5b for the right-hand circle, and Fig. 5c for the horizontal spread of both hands.
According to formula (2), the limb node feature vectors V(T, k) of the three representative action classes can be expressed. When joint point k reaches the next specific sampling point, the end frame s is determined and the current feature vector is analyzed as an input parameter of the state transition function; the sampling frame i is then reset to zero, the process waits for the next end frame, analyzes again, and resets again, until the last sampling point.
1) For the right-leg side kick in Fig. 5a, the general feature data of the right foot joint point (k = 19) are extracted to define the limb node feature vector, in which L is the leg length.
2) For the right-hand circle in Fig. 5b, the general feature data of the right hand joint point (k = 11) are extracted to define the limb node feature vector, in which D is the arm length.
3) For the horizontal spread of both hands in Fig. 5c, the general feature data of the left/right hand joint points (k = 7, 11) are extracted to define the limb node feature vector.
By analogy, this method can define limb node feature vectors for other limb actions, and the feature vectors are then analyzed by the posture sequence finite state machine to achieve action recognition.
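A minimal Python sketch of these feature quantities, under the reconstructions above; the per-frame input format and all names are illustrative assumptions rather than the patent's implementation:

```python
import math

def motion_vector(j_cur, j_next):
    """Spatial motion vector from J_ki to J_ki+1 (componentwise difference)."""
    return tuple(b - a for a, b in zip(j_cur, j_next))

def limb_distance(chain):
    """|PmPn|: summed segment lengths along a chain of limb joint coordinates;
    used as the proportional feature check quantity in the grid model."""
    return sum(math.dist(a, b) for a, b in zip(chain, chain[1:]))

def feature_vector(action_type, k, frames):
    """Build the quantities of formula (2) for joint k over one sampling group.

    frames: [(t, (x, y, z)), ...] is an assumed input format, one entry per
    sampling frame from the start frame (i = 0) to the end frame (i = s).
    """
    vectors = [motion_vector(p, q)
               for (_, p), (_, q) in zip(frames, frames[1:])]
    dt = frames[-1][0] - frames[0][0]   # Delta t_ks = t_ks - t_k0
    return {"T": action_type, "k": k, "vectors": vectors, "dt": dt}
```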
1.3 Construction of the posture sequence finite state machine
Since natural human interactive actions are diverse and variable [17], a general and efficient action recognition method is required. Each action is formed by the continuous motion trajectory of the corresponding limb joint points; a continuous trajectory can be fitted by discrete key points, each of which corresponds to a specific posture state. By recognizing the transition process of each state, the action can be determined. Based on this idea, the present invention proposes the posture sequence finite state machine method to recognize predefined limb actions. A posture sequence is a group of motion sequences in which an action is described by multiple postures on the time axis; the posture sequence finite state machine describes the finite states of each action and the transitions between them. The present invention defines the posture sequence finite state machine Λ by the five-tuple in formula (3).
Λ = (S, Σ, δ, s0, F) (3)
In formula (3), S denotes the state set {s0, s1, ..., sn, f0, f1}, which describes each specific posture state of the action; Σ denotes the alphabet of input limb node feature vector sets and constraint parameters, in which the symbol ¬ denotes logical negation; δ is the transition function, defined as S × Σ → S, which transfers the posture sequence finite state machine from the current state to a successor state; s0 denotes the start state; and F = {f0, f1} is the set of final states, denoting the successful recognition state and the invalid recognition state, respectively.
In the alphabet Σ, the variable u denotes the set of all limb node feature vectors V corresponding to a given action type; a feature vector represents the discrete point-domain rules of the action trajectory in the spatial grid, and from these point-domain rules the trajectory regular expression of the action can be constructed.
The path constraint p = {xyz | x ∈ [xmin, xmax], y ∈ [ymin, ymax], z ∈ [zmin, zmax]} controls the range of key points for a specific posture; if at any time the motion leaves the predefined path range, i.e. ¬p is true, the state is marked invalid.
The timestamp t ∈ [tstart, tend] specifies the time allowed for the action to transfer from the current state to a successor state; if a state of the action does not transfer to a valid successor state within the specified time, i.e. ¬t is true, the machine jumps to the invalid state.
Each action consists of several typical static postures; each static posture corresponds to a defined state quantity, and each state quantity is computed from key-point feature data in the spatial grid. An action state transition must satisfy the path constraint p and the timestamp t, so that the action type can be recognized and the user's interaction intent understood. The five-tuple describes the attributes of the posture sequence finite state machine and each step of the transition process; the state diagram of the machine's operation is shown in Fig. 6.
From the initial state s0, a predefined action reaches the first valid state s1; if the posture at the next instant is still within the predefined range, the next valid state sk is reached, and so on, until the success state f0 is reached, i.e. the action is recognized successfully. In the initial state or any valid state, if the behavior exceeds the path constraint or the timestamp range, the action sequence is directly marked invalid, i.e. recognition fails. After any end state is reached, the current posture sequence finite state machine finishes running and is re-initialized for the recognition of the next group of limb actions. State transition Table 1 can be derived from the state diagram of the posture sequence finite state machine.
Table 1. State transition table of the posture sequence finite state machine
In the table, k, x = 0, 1, 2, ..., n, with k ≠ k + x ≤ n, where n denotes the total number of intermediate valid states required to recognize the action; the point-domain rule accepted by this posture sequence finite state machine is the action's trajectory regular expression (cf. formula (4)).
Following the definition in formula (3), the posture sequence finite state machine is used to describe the actions of Section 1.2. The input alphabet is Σ = {ai, bi, ci, di, ei, fi, gi, hi, mi}, where i = 0, 1; each letter variable describes the spatial range, i.e. the point domain, associated with a sampling point in the spatial grid model. A motion trajectory can be fitted with multiple point domains, and a point-domain string constitutes a discretized description of an action trajectory. As shown in Fig. 7, point domains in the negative-x half-space are denoted by the subscript i = 0, and point domains in the positive-x half-space by the subscript i = 1.
For the three representative action classes given in Section 1.2, namely the right-leg side kick, the right-hand circle, and the horizontal spread of both hands, schematic trajectories of some actions are shown in Fig. 7, and Table 2 gives the feature vector point-domain strings of the three classes.
Table 2. Feature vector point-domain strings of the three representative action classes
The trajectory regular expression of an action can be extracted from its feature vector point-domain string. In the alphabet Σ, Σi(1−i) = Σi ∧ Σ1−i, where i = 0, 1, meaning that the point domains symmetric about the yoz plane hold simultaneously. After simplification, the trajectory regular expressions of the actions are given in formula (4).
R = ai ci | di e1−i fi gi hi | di(1−i) ei(1−i) hi(1−i) (4)
From the trajectory regular expression (4), the finite state machine diagrams of the three representative action classes and their symmetric action types are obtained, as shown in Fig. 8, where the invalid state f1 is omitted, s0 is the initial state, sk is a transitional valid state, and the f0 state is the success state that accepts the point-domain string. In the initial state or any valid state, if an action posture exceeds the path constraint or the timestamp range, the action series is directly marked as the invalid state f1, i.e. recognition fails. After any end state is reached, the posture sequence finite state machine of the current action finishes running and is re-initialized for the recognition of the next group of actions.
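To make the run semantics of Fig. 8 concrete, the following Python sketch (our illustration, not the patent's implementation) runs one such machine: each valid state carries a predicate over the incoming limb node feature vector, and violating the path constraint p or the timestamp t at any point jumps to the invalid state f1.

```python
class PostureFSM:
    """Posture sequence FSM: valid states s0..sn plus f0 (accept) and f1 (invalid)."""

    def __init__(self, transitions, timeout):
        # transitions[state] = (predicate over a limb node feature vector, next state)
        self.transitions = transitions
        self.timeout = timeout      # allowed time per state transition (timestamp t)
        self.reset()

    def reset(self):
        self.state = "s0"
        self.deadline = None

    def step(self, feature_vector, in_path, now):
        """Feed one analyzed feature vector; return 'f0', 'f1', or None while running."""
        if self.deadline is None:
            self.deadline = now + self.timeout
        if not in_path or now > self.deadline:       # not-p or not-t holds
            self.state = "f1"                        # invalid: recognition fails
        else:
            predicate, nxt = self.transitions[self.state]
            if predicate(feature_vector):            # reached the next key posture
                self.state = nxt
                self.deadline = now + self.timeout
        if self.state in ("f0", "f1"):
            result = self.state
            self.reset()                             # re-initialize for next action
            return result
        return None
```

A machine for the right-hand circle, for instance, would chain predicates testing the successive point domains di, e1−i, fi, gi, hi of formula (4).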
While the posture sequence finite state machine runs, the action set must be optimized synchronously. The steps of the optimization algorithm are as follows (a sketch follows the list):
1) Initialize the set of all possible actions {T} = {"right-leg side kick", "right-hand circle", "horizontal spread of both hands", ...}, where each specific action corresponds to several point-domain string expressions;
2) While the posture sequence finite state machine runs, after each state transition remove all impossible actions from the set and retain all actions that are still possible;
3) When an end state is reached, if the final state is invalid, output nothing and restart; if it is an acceptable success state, the only element remaining in the set is the action type, which is output, and the process jumps back to step 1 for cyclic recognition.
Finally, the user defines the semantics and purpose of the interactive actions, assigning new semantics to the recognized action types: for example, raising the left or right leg triggers scene roaming, sliding the left or right hand turns a document page, and spreading both hands horizontally opens a curtain, thereby implementing somatosensory interaction applications.
2 Experimental results and analysis
2.1 Experimental tests and results
The posture sequence finite state machine model and algorithm proposed by the present invention were implemented to recognize 17 limb actions and were tested on a Windows 7 x64 system with an Intel Xeon X3440 CPU (2.53 GHz) and 4 GB of memory. The limb action definitions are given in Table 3; according to motion characteristics and torso part, the action categories are divided into leg actions, one-hand actions, and two-hand actions.
Table 3. Limb action definition table
Fig. 9 shows the dynamic recognition process of the 17 predefined limb actions; in the figure, a green dot on the chest of the human body indicates a partial state of the recognition process, and a red dot indicates the success state.
Experimental tests were conducted on 30 volunteers of different heights and body shapes; each subject performed three repeated sample tests for each limb action, giving 1530 action instances in total. The confusion matrix of the action recognition test results is given in Table 4, where None indicates that no action was detected.
Table 4. Confusion matrix of the action recognition test results
The test results show that the recognition rate of every action is above 94%, most actions reach 100%, and the average recognition rate is 98%, which meets the needs of somatosensory interaction applications. Among them, the two-foot jump (E) cannot be recognized when there is no obvious jump amplitude; the left/right-hand circle (L, M) may be misrecognized as raising the left/right hand upward (H, I); pushing both hands obliquely upward or downward (N, O) may occasionally go unrecognized; and spreading both hands horizontally (P) may be misjudged as pushing both hands obliquely downward (O). Overall, however, the proposed method recognizes the above limb actions well, with a measured recognition feedback time between 0.060 s and 0.096 s, satisfying the real-time interaction requirement.
2.2 Experimental comparison and analysis
Table 5 compares human action recognition methods. The multi-instance learning method [12], the actionlet ensemble method [13], and the SSS feature matching method [14] use the MSR-Action3D Dataset; the test data set of the present invention consists of 1530 action instances from 30 real users. In the experiments, a recognition accuracy below 0.6 is rated "low", between 0.6 and 0.8 "medium", and between 0.8 and 1.0 "high".
Table 5. Comparison of human action recognition methods
The multi-instance learning method [12] determines key-frame instances from action data sequences to derive action templates, but an action template only preserves the shape and model of one class of behavior and ignores variation, so its robustness and real-time performance are average. The actionlet ensemble method [13] classifies subsets of joint points with a high recognition rate, but it operates on data streams segmented in advance and cannot perform online recognition on unsegmented streams; although its robustness is high, its computation is complex and its real-time performance poor. The SSS feature matching method [14] builds a feature dictionary and gesture model through offline training, assigns a label to every frame of an unknown action data stream, and predicts the action type online by extracting SSS features; it can recognize actions online from unsegmented streams with high robustness, but the computation is complex and the recognition feedback time is unstable (−1.5 s denotes recognition 1.5 s early, +1.5 s denotes recognition 1.5 s late). These three methods are all implemented with machine learning and template matching techniques; such algorithms require a feature dictionary for each recognized action, and extending the action types requires collecting large amounts of action data for offline training, so recognition of a specific action is tightly coupled to the training set and extensibility is average. The FAAST action recognition method [11] performs recognition through triggered events such as angle, distance, and speed; it requires little computation, offers good real-time performance and strong extensibility, and achieves high recognition accuracy for the simple actions it defines, but event triggering is inherently limited, its robustness is low, and recognizing continuous actions is difficult.
It should be understood that those skilled in the art may make improvements or variations in light of the above description, and all such improvements and variations shall fall within the protection scope of the appended claims of the present invention.