Detailed Description
With reference to fig. 1, the invention relates to a non-contact emotion recognition method based on a dual-mode sensor, which comprises the following steps:
step 1, acquiring a vital sign signal from an echo signal acquired by a radar sensor, and extracting a respiratory signal and a heartbeat signal from the signal;
step 2, selecting a face area from a video image acquired by a video sensor, and acquiring a heartbeat signal and an optical flow vector signal according to the face area;
step 3, aiming at the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is carried out to solve the influence of a low-light environment on the heartbeat detection of the video sensor;
step 4, based on the optical flow vector signals obtained by the video sensor, body motion optimization processing is carried out on the respiratory signals obtained by the radar sensor so as to solve the influence of body motion on the radar sensor;
step 5, extracting features of the optimized heartbeat and respiration signals obtained in the step 3 and the step 4;
step 6, carrying out feature selection on the heartbeat features and the respiration features extracted in the step 5, and establishing an emotion recognition model according to the screened features; and recognizing the emotion to be recognized according to the emotion recognition model.
Further preferably, step 1 of obtaining the vital sign signal from the signal collected by the radar sensor specifically includes: obtaining a phase signal from the radar echo signal by using an arc tangent demodulation algorithm, so as to obtain the vital sign signal.
Further preferably, in step 1, the respiratory signal and the heartbeat signal are extracted from the vital sign signal, specifically: the respiratory signal is extracted with a 0.15 Hz-0.7 Hz band-pass filter, a frequency range that covers the normal human respiratory rate of 12-20 breaths/minute (i.e., 0.2-0.33 Hz); the heartbeat signal is extracted with a 0.8 Hz-4 Hz band-pass filter, a frequency range that covers the normal human heart rate of 50-90 beats/minute (i.e., 0.83-1.5 Hz).
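Illustratively, a minimal sketch of this step, assuming the radar front end provides baseband I/Q samples and using SciPy for the band-pass filters; the filter order and all function and variable names are illustrative choices, not taken from the patent:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_vital_signs(i_channel, q_channel, fs):
    """Arctangent demodulation followed by band-pass extraction of respiration and heartbeat.

    i_channel, q_channel: baseband radar I/Q samples (1-D arrays); fs: sampling rate in Hz.
    """
    # The unwrapped arctangent phase is proportional to chest displacement (the vital sign signal).
    phase = np.unwrap(np.arctan2(q_channel, i_channel))

    def bandpass(x, lo, hi, order=4):
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    respiration = bandpass(phase, 0.15, 0.7)   # covers 12-20 breaths/minute
    heartbeat = bandpass(phase, 0.8, 4.0)      # covers 50-90 beats/minute
    return respiration, heartbeat
```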
Further preferably, step 2 of selecting a face region from the image acquired by the video sensor specifically includes: selecting a human face region from the image acquired by the video sensor by using an Adaboost algorithm.
Further, in step 2, a face area is selected from the video image acquired by the video sensor, and a heartbeat signal and an optical flow vector signal are acquired according to the face area, specifically:
step 2-1, detecting and extracting the face region in each frame of the video;
step 2-2, carrying out gray level averaging on pixel points in the face area in the RGB three channels respectively to obtain a gray level mean value;
2-3, acquiring an optical flow vector signal of each pixel point in the face area according to an optical flow method, and storing the optical flow vector signal;
step 2-4, forming the gray average values of all the frame images into a three-channel gray average value sequence signal, and performing L2 detrending and filtering on the signal;
step 2-5, extracting a heartbeat signal H according to the three-channel gray level mean value sequence signal processed in the step 2-4:
H=R-2G+B
in the formula, R is a red channel signal, G is a green channel signal, and B is a blue channel signal.
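Illustratively, a minimal sketch of steps 2-1 to 2-5, assuming OpenCV's Haar-cascade detector (an AdaBoost-based face detector) and substituting a linear detrend for the L2 detrending mentioned above; all names are illustrative:

```python
import cv2
import numpy as np
from scipy.signal import detrend

def face_channel_means(video_path):
    """Per-frame mean R, G, B gray levels over the detected face region (steps 2-1 and 2-2)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    means = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            continue                                  # skip frames without a detected face
        x, y, w, h = faces[0]
        roi = frame[y:y + h, x:x + w].astype(float)   # OpenCV frames are BGR-ordered
        means.append((roi[..., 2].mean(), roi[..., 1].mean(), roi[..., 0].mean()))  # (R, G, B)
    cap.release()
    return np.array(means)                            # shape (frames, 3)

def heartbeat_from_means(means):
    """Steps 2-4 and 2-5: detrend each channel, then combine as H = R - 2G + B."""
    r, g, b = (detrend(means[:, k]) for k in range(3))   # linear detrend as a stand-in
    return r - 2 * g + b                                 # band-pass filtering can follow, as in step 1
```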
Further, with reference to fig. 2, step 3 describes that, for the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is performed to solve the influence of a low-light environment on heartbeat detection of the video sensor, specifically:
step 3-1, performing windowing processing on the heartbeat signals and the respiration signals collected by the radar sensor and the video sensor, and calibrating a corresponding emotion label for each window; wherein the window length is t seconds;
step 3-2, solving the light intensity value corresponding to each window, specifically: taking the average value of the light intensity of all the image frames in the window length range as the light intensity value corresponding to the window;
3-3, acquiring heart rates corresponding to the heartbeat signals respectively measured by the radar sensor and the video sensor in each window;
step 3-4, calculating the accuracy of the heart rate of the heartbeat signal respectively measured by the radar sensor and the video sensor by combining with the reference heart rate; wherein the reference heart rate is a heart rate measured by a contact heart rate measuring device;
step 3-5, obtaining the label value corresponding to each light intensity value in the step 3-2 according to the accuracy: if the accuracy of the heart rate of the heartbeat signal measured by the video sensor is higher than that of the heart rate of the heartbeat signal measured by the radar sensor, the label value corresponding to the light intensity value is 1, otherwise, the label value is 0;
3-6, taking heartbeat signals corresponding to all windows as samples, dividing the samples into K parts, taking 1 part of the samples as a test sample each time, and taking the rest K-1 parts as training samples;
step 3-7, taking each light intensity value in the step 3-2 as a feature, inputting the feature and a label corresponding to the feature into a decision tree classifier for training, obtaining node thresholds of the decision tree in different light intensity ranges, and obtaining a decision tree model;
3-8, testing the test sample according to the decision tree model obtained in the step 3-7, comparing the test sample with the threshold value of each node of the decision tree to obtain a final judgment result, selecting a heartbeat signal measured by the video sensor when the judgment result is 1, and selecting a heartbeat signal of the radar sensor when the judgment result is 0;
step 3-9, repeating steps 3-7 and 3-8 until all K folds are finished.
Illustratively, the window time t in step 3-1 is 60 seconds.
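Illustratively, a minimal sketch of steps 3-1 to 3-9: a decision tree is trained on the per-window light intensity to decide, window by window, whether the video or the radar heartbeat is more reliable, with K-fold cross-validation. The tree depth and all names are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def select_heartbeat_source(light_intensity, labels, k=10):
    """light_intensity: one value per window (steps 3-1/3-2); labels: 1 if the video heart
    rate was more accurate than the radar heart rate for that window, else 0 (step 3-5)."""
    X = np.asarray(light_intensity, dtype=float).reshape(-1, 1)   # single feature per window
    y = np.asarray(labels)
    choices = np.empty_like(y)
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        tree = DecisionTreeClassifier(max_depth=3)                # depth is an illustrative choice
        tree.fit(X[train_idx], y[train_idx])                      # steps 3-6/3-7
        choices[test_idx] = tree.predict(X[test_idx])             # step 3-8: 1 -> video, 0 -> radar
    return choices
```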
Further, with reference to fig. 3, in step 4, based on the optical flow vector signal obtained by the video sensor, body motion optimization processing is performed on the respiratory signal obtained by the radar sensor to solve the influence of the body motion on the radar sensor, specifically:
step 4-1, acquiring, from the optical flow vector signals of each pixel of the face region in each frame obtained in step 2-3, the optical flow vector characteristic parameters used for radial motion detection, which comprise: a first characteristic parameter A_i for distinguishing the stationary state from the moving state, defined as the mean magnitude of the optical flow vectors of the i-th frame; and a second characteristic parameter B_i for distinguishing radial motion from translational motion, defined as the variance of the optical flow vector directions of the i-th frame;
step 4-2, combining the characteristic parameters of the N frames of images in step 4-1 into two groups of feature vectors:
A = [A_1, A_2, A_3, ..., A_N]
B = [B_1, B_2, B_3, ..., B_N]
and performing sliding window processing on both A and B;
step 4-3, carrying out normalization processing on the two groups of feature vectors by using a z-scores method to obtain corresponding threshold values T1 and T2 which are used as dual decision thresholds of radial motion;
step 4-4, double judgment is carried out, and the first layer judgment: if the first characteristic parameter value of a certain frame image is greater than a first judgment threshold T1, judging the frame as moving, otherwise, judging the frame as static; and second-layer judgment: for a certain frame of image which is judged to move, if the second characteristic parameter value of the certain frame of image is larger than a second judgment threshold T2, the certain frame is judged to have radial movement, otherwise, the certain frame of image is translational movement;
and 4-5, recording the starting time and the ending time of the radial motion of the video segment, and deleting the data of the corresponding time period in the respiratory signal measured by the radar.
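Illustratively, a minimal sketch of steps 4-1 to 4-5 using OpenCV's dense (Farneback) optical flow as one possible optical-flow method; the Farneback parameters are illustrative assumptions, while the default thresholds T1 = 0.02 and T2 = 0 follow the values selected in the embodiment below:

```python
import cv2
import numpy as np

def radial_motion_frames(gray_frames, t1=0.02, t2=0.0):
    """gray_frames: list of consecutive grayscale face-region images; returns a boolean array,
    True where radial motion is detected (step 4-5 deletes the matching radar segments)."""
    a_vals, b_vals = [], []
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        a_vals.append(mag.mean())     # A_i: mean optical-flow magnitude
        b_vals.append(ang.var())      # B_i: variance of optical-flow direction
    a = (np.array(a_vals) - np.mean(a_vals)) / np.std(a_vals)    # z-score normalization
    b = (np.array(b_vals) - np.mean(b_vals)) / np.std(b_vals)
    moving = a > t1                   # first-layer decision (threshold T1)
    radial = moving & (b > t2)        # second-layer decision (threshold T2)
    return radial
```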
Further, in step 5, for the optimized heartbeat and respiration signals obtained in steps 3 and 4, feature extraction is performed, specifically:
step 5-1, assuming that the optimized respiratory signal sequence is X, the normalized respiratory signal sequence is X1, N is the sequence length, X_j is the j-th sample of the respiratory sequence, and X1_j is the j-th sample of the normalized respiratory sequence, the features extracted from the respiratory signal include (a sketch of their computation follows this list):
(1) the mean of the sequence, μ_x;
(2) the standard deviation of the sequence, σ_x;
(3) the mean of the absolute values of the first-order differences of the sequence, δ_x;
(4) the mean of the absolute values of the first-order differences of the normalized sequence, δ1_x;
(5) the mean of the absolute values of the second-order differences of the sequence, γ_x;
(6) the mean of the absolute values of the second-order differences of the normalized sequence, γ1_x;
(7) the approximate entropy ApEn(m, r), obtained as follows:
(7-1) reconstructing the respiratory signal sequence X into an m-dimensional phase space of vectors X(q) = [X_q, X_{q+1}, ..., X_{q+m-1}], 1 ≤ q ≤ N-(m-1);
(7-2) defining the distance d_ij between any two vectors X(i) and X(j) in the phase space as:
d_ij = max_k |X(i+k) - X(j+k)|, 0 ≤ k ≤ m-1; 1 ≤ i, j ≤ N-m+1, i ≠ j
(7-3) performing template matching for each X(i) in the space: given a similarity tolerance r, counting C_i^m(r), the fraction of vectors X(j) whose distance d_ij does not exceed r;
(7-4) taking the logarithm of C_i^m(r) and averaging over i to obtain φ_m(r);
(7-5) adding 1 to the dimension m and repeating the above (7-2) to (7-4) to obtain φ_{m+1}(r);
(7-6) from the above (7-4) and (7-5), the approximate entropy ApEn(m, r) of the sequence is obtained; in practice, for a finite N:
ApEn(m, r) = φ_m(r) - φ_{m+1}(r)
(8) geometric features, comprising:
1) the major axis SD_1 and the minor axis SD_2 of the fitted ellipse, computed from γ_EE(s), the autocorrelation function of the respiratory interval series at time delay s, and from the mean of the respiratory signal;
2) SD_12, representing the balance between sympathetic and parasympathetic activity;
3) the area S of the fitted ellipse of the Poincaré plot:
S = π × SD_1 × SD_2
4) the variance SDRR of the entire breath time series.
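Illustratively, a minimal sketch of the time-domain respiratory features and the approximate entropy defined above; the Poincaré-plot features (SD_1, SD_2, SD_12, S) are omitted here, and the tolerance r = 0.2·σ_x is a conventional choice, not taken from the patent:

```python
import numpy as np

def approximate_entropy(x, m, r):
    """Standard ApEn(m, r) following steps (7-1) to (7-6) above (self-matches included)."""
    x = np.asarray(x, dtype=float)
    def phi(dim):
        n = len(x) - dim + 1
        vecs = np.array([x[i:i + dim] for i in range(n)])        # phase-space reconstruction
        # Chebyshev distance between every pair of template vectors
        dists = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=-1)
        c = np.mean(dists <= r, axis=1)                          # C_i(r): fraction within tolerance
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

def respiratory_features(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    x1 = (x - x.mean()) / x.std()                                # normalized sequence X1
    return {
        "mean": x.mean(),                                        # mu_x
        "std": x.std(),                                          # sigma_x
        "abs_diff1": np.mean(np.abs(np.diff(x))),                # delta_x
        "abs_diff1_norm": np.mean(np.abs(np.diff(x1))),          # delta1_x
        "abs_diff2": np.mean(np.abs(np.diff(x, 2))),             # gamma_x
        "abs_diff2_norm": np.mean(np.abs(np.diff(x1, 2))),       # gamma1_x
        "apen": approximate_entropy(x, m, r_factor * x.std()),   # ApEn(m, r); r is an assumed choice
        "sdrr": x.var(),                                         # variance of the breath series
    }
```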
step 5-2, assuming that the optimized heartbeat signal sequence is H, M is the sequence length, and H_l is the l-th sample of the heartbeat sequence; the mean, the standard deviation, the first- and second-order difference absolute-value means of the raw and normalized sequences, and the geometric features are defined as in step 5-1, and the other features are as follows (a sketch of the frequency-domain features follows this list):
(1) the skewness sk of the heartbeat sequence, computed about the mean of the heartbeat signal sequence;
(2) the kurtosis ku of the heartbeat sequence;
(3) the root mean square of successive differences, RMSSD;
(4) the power in the VLF, LF and HF frequency bands: E_VLF, E_LF, E_HF;
(5) the frequencies Peak_VLF, Peak_LF, Peak_HF corresponding to the power maxima in the VLF, LF and HF bands;
(6) the ratios Per_VLF, Per_LF, Per_HF of the power in each of the VLF, LF and HF bands to the sum of the power in the three bands;
(7) the normalized low-frequency power nLF, a quantitative index of sympathetic activity;
(8) the normalized high-frequency power nHF, a quantitative index of parasympathetic activity;
(9) the ratio of low- to high-frequency power, LF/HF, representing the balance of autonomic nervous activity.
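Illustratively, a minimal sketch of the frequency-domain heartbeat features using a Welch periodogram; the VLF/LF/HF band limits are the conventional HRV values, and the definitions of nLF, nHF and LF/HF as ratios of band powers are assumptions, since the patent text does not give these formulas explicitly:

```python
import numpy as np
from scipy.signal import welch

# Conventional HRV band limits (Hz); assumed, not specified in the patent.
BANDS = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}

def heartbeat_frequency_features(h, fs):
    """h: optimized heartbeat signal for one window; fs: sampling rate in Hz."""
    f, pxx = welch(h, fs=fs, nperseg=min(len(h), 4096))
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f < hi)
        if not mask.any():                          # band not resolved at this window length
            feats[f"E_{name}"], feats[f"Peak_{name}"] = 0.0, float("nan")
            continue
        feats[f"E_{name}"] = np.trapz(pxx[mask], f[mask])       # band power
        feats[f"Peak_{name}"] = f[mask][np.argmax(pxx[mask])]   # frequency of the power maximum
    total = feats["E_VLF"] + feats["E_LF"] + feats["E_HF"]
    for name in BANDS:
        feats[f"Per_{name}"] = feats[f"E_{name}"] / total       # share of the three-band power
    feats["nLF"] = feats["E_LF"] / (feats["E_LF"] + feats["E_HF"])   # assumed definition
    feats["nHF"] = feats["E_HF"] / (feats["E_LF"] + feats["E_HF"])   # assumed definition
    feats["LF_HF"] = feats["E_LF"] / feats["E_HF"]
    return feats
```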
Further preferably, in conjunction with fig. 4, step 6 performs feature selection on the extracted heartbeat and respiration features, specifically by using a feature elimination method.
Further, with reference to fig. 4, the step 6 of establishing an emotion recognition model according to the screened features specifically includes: taking the heartbeat and respiration signals corresponding to all windows as samples, and inputting the screened features of each sample, together with the corresponding emotion labels, into a classifier for training to obtain the emotion recognition model.
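Illustratively, a minimal sketch of step 6, using scikit-learn's recursive feature elimination as one realization of the feature elimination method and a generic tree-ensemble classifier; the number of retained features and all names are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def train_emotion_model(features, emotion_labels, n_keep=20):
    """features: (n_windows, n_features) heartbeat + respiration feature matrix."""
    base = RandomForestClassifier(n_estimators=100, random_state=0)
    selector = RFE(base, n_features_to_select=n_keep).fit(features, emotion_labels)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(selector.transform(features), emotion_labels)     # train on the screened features
    return selector, model
```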
Examples
The invention relates to a non-contact emotion recognition method based on a dual-mode sensor, which comprises the following steps:
1. vital sign signals are obtained from radar signals through an arc-tangent demodulation algorithm, then respiratory signals are extracted through a 0.15Hz-0.7Hz band-pass filter, heartbeat signals are extracted through a 0.8Hz-4Hz band-pass filter, and the respiratory signals and the heartbeat signals are respectively shown in fig. 5 and fig. 6.
2. Obtaining a human face region through an Adaboost algorithm, and extracting a heartbeat signal and an optical flow signal from the face region, which specifically comprises the following steps:
2-1, improving the image contrast by introducing a local histogram-based normalization technique, and extracting the face region with the Adaboost algorithm using Haar features as detection features, as shown in fig. 7;
2-2, respectively carrying out gray level averaging on pixel points in the face area in the RGB three channels to obtain a gray level mean value;
2-3, acquiring an optical flow vector signal of each pixel point in the face area according to an optical flow method, and storing the optical flow vector signal;
2-4, forming the gray average values of all the frame images into a three-channel gray average value sequence signal, and performing L2 detrending and filtering on the signal;
2-5, extracting a heartbeat signal H according to the three-channel gray level mean value sequence signal processed by the step 2-4:
H=R-2G+B
in the formula, R is a red channel signal, G is a green channel signal, and B is a blue channel signal.
3. Aiming at the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is carried out so as to solve the influence of a low-light environment on the heartbeat detection of the video sensor; the method specifically comprises the following steps:
3-1, performing windowing processing on the heartbeat signals of the video and the radar, with a window length of 60 s;
3-2, calculating the light intensity average value of all image frames in the window length range, and taking the average value as the light intensity value of the window, wherein the calculation formula is as follows:
I=0.299×R+0.587×G+0.114×B
wherein, R is a red channel, G is a green channel, and B is a blue channel.
3-3, acquiring heart rates corresponding to the heartbeat signals respectively measured by the radar sensor and the video sensor in each window;
3-4, computing, for each window, the accuracy of the radar-measured and video-measured heart rates against the reference heart rate; fig. 8 shows how the heart rate detection accuracy of the radar and video sensors varies with light intensity, from which the dependence of the video heart rate accuracy on light intensity can be seen;
3-5, acquiring a label value corresponding to each light intensity value in the step 3-2 according to the accuracy: if the accuracy of the heart rate of the heartbeat signal measured by the video sensor is higher than that of the heart rate of the heartbeat signal measured by the radar sensor, the label value corresponding to the light intensity value is 1, otherwise, the label value is 0;
3-6, dividing all samples of 7 volunteers into 10 parts, taking 1 part of the samples as a test sample each time, and taking the other 9 parts as training samples;
3-7, taking each light intensity value in the step 3-2 as a feature, inputting the feature and a label corresponding to the feature into a decision tree classifier for training, obtaining node thresholds of the decision tree in different light intensity ranges, and obtaining a decision tree model;
3-8, testing the test sample according to the decision tree model obtained in 3-7, and comparing the test sample with the threshold value of each node of the decision tree to obtain a final judgment result;
3-9, repeating 3-7 and 3-8 until all 10 folds are finished, obtaining the decision model shown in fig. 9;
In order to verify the reliability of the model, this embodiment tests the generality of the model on experimental data from 4 volunteers. Fig. 10 compares the classification accuracy before and after the heartbeat optimization algorithm; the accuracy improves from 54.8% to 66%, a substantial gain, showing that the optimization is effective.
4. Based on the optical flow vector signals obtained by the video sensor, body motion optimization processing is performed on the respiratory signal obtained by the radar sensor to mitigate the influence of body motion on the radar sensor, specifically:
4-1, from the optical flow vector signals of each pixel in the face region obtained in 2-3 (fig. 11 shows the optical flow vector diagrams for three states, where panels (a), (b) and (c) correspond to the stationary, translational and radial-motion states in turn), acquiring the optical flow vector characteristic parameters for radial motion detection, which comprise: a first characteristic parameter A_i for distinguishing the stationary state from the moving state (the optical flow magnitude in the moving state is larger than in the stationary state), defined as the mean magnitude of the optical flow vectors of the i-th frame; and a second characteristic parameter B_i for distinguishing radial motion from translational motion (the direction variance of radial motion is larger than that of translational motion), defined as the variance of the optical flow vector directions of the i-th frame;
4-2, combining the characteristic parameters of the N frames of images from 4-1 into two groups of feature vectors:
A = [A_1, A_2, A_3, ..., A_N]
B = [B_1, B_2, B_3, ..., B_N]
and carrying out sliding window processing on A and B;
4-3, carrying out normalization processing on the two groups of feature vectors by using a z-scores method to obtain corresponding threshold values T1 and T2 which are used as dual decision thresholds of radial motion;
4-4, carrying out double judgment, wherein the first-layer judgment is: if the first characteristic parameter value of a certain frame image is greater than the first judgment threshold T1, the frame is judged as moving, otherwise it is judged as static; the variable P1 is introduced here as a measure of the accuracy of motion detection, where P1 is the ratio of the motion time detected by the optical flow method to the actual motion duration, and fig. 12 plots the accuracy P1 against the threshold T1; a P1 value greater than 1 means the motion time detected by the algorithm exceeds the actual motion time, i.e. non-body-motion segments are misjudged as body motion, which reduces the accuracy of the algorithm, so P1 values no greater than 1 are preferred. It can be observed that when the threshold T1 is 0.02, P1 is highest and the measurement accuracy is the highest, 99.13%; therefore T1 is set to 0.02;
and the second-layer judgment is: for a frame that has been judged to move, if its second characteristic parameter value is greater than the second judgment threshold T2, the frame is judged to contain radial motion, otherwise the motion is translational; a variable P2 is introduced, where P2 is the ratio of the radial motion time detected by the optical flow method to the actual radial motion time, and fig. 13 plots the accuracy P2 against the threshold T2; a P2 value greater than 1 indicates that the algorithm detects translational motion segments as radial motion, which lowers the accuracy of radial motion detection. It can be observed that when the threshold T2 is in the range -1 to 0.2, the detection accuracy of radial motion is high, at 97%; the threshold T2 is therefore chosen as 0;
4-5, recording the starting and ending times of the radial motion in the video segment, locating the body motion segment of the radar waveform in the corresponding time period and deleting it from the radar data; fig. 14 compares the frequency-domain information of the respiratory signal measured by the radar with that measured by a respiration belt, before and after removing the influence of body motion. Fig. 15 compares the classification results of the radar sensor before and after body motion elimination; the accuracy improves from 73.7% to 78.6%, showing a good optimization effect.
5. Respectively extracting respiratory signal and heartbeat signal characteristics;
6. Feature selection is performed on the heartbeat and respiratory features extracted in step 5 using a feature elimination method. In this embodiment, 6 respiratory features are eliminated, including the normalized first-order difference absolute-value mean, the second-order difference absolute-value mean and the normalized second-order difference absolute-value mean; 19 heartbeat features are eliminated, including the standard deviation, the normalized first-order difference absolute-value mean, the kurtosis and the skewness. The screened features and the corresponding emotion labels are input into a Bagged Trees classifier for classification; fig. 16 compares the final result with the classification results of the single sensors, and it can be seen that the accuracy of the proposed method is clearly higher than that of either single sensor.
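Illustratively, a minimal sketch of this classification step, using scikit-learn's BaggingClassifier (whose default base estimator is a decision tree) as a stand-in for the Bagged Trees classifier; the accuracy figures quoted above come from the authors' experiments, not from this sketch:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

def bagged_trees_accuracy(selected_features, emotion_labels, folds=10):
    # The default base estimator of BaggingClassifier is a decision tree, i.e. bagged trees.
    clf = BaggingClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, selected_features, emotion_labels, cv=folds).mean()
```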
The invention collects physiological signals through a video sensor and a radar sensor, optimizes the heartbeat signal of the video and the body-motion-affected signal of the radar through a complementary optimization algorithm, extracts features from the optimized signals, inputs them into a classifier to obtain an emotion classification model, and uses the model to recognize emotion. In conclusion, the non-contact measurement method provided by the invention causes no discomfort, reduces measurement error, achieves high recognition accuracy and good robustness, and has wider applicability.