Detailed Description
With reference to fig. 1, the invention relates to a non-contact emotion recognition method based on a dual-mode sensor, which comprises the following steps:
step 1, acquiring a vital sign signal from an echo signal acquired by a radar sensor, and extracting a respiratory signal and a heartbeat signal from the signal;
step 2, selecting a face area from a video image acquired by a video sensor, and acquiring a heartbeat signal and an optical flow vector signal according to the face area;
step 3, aiming at the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is carried out to solve the influence of a low-light environment on the heartbeat detection of the video sensor;
step 4, based on the optical flow vector signals obtained by the video sensor, body motion optimization processing is carried out on the respiratory signals obtained by the radar sensor so as to solve the influence of body motion on the radar sensor;
step 5, extracting features of the optimized heartbeat and respiration signals obtained in the step 3 and the step 4;
step 6, carrying out feature selection on the heartbeat features and the respiration features extracted in the step 5, and establishing an emotion recognition model according to the screened features; and recognizing the emotion to be recognized according to the emotion recognition model.
Further preferably, step 1 of obtaining the vital sign signal from the signal collected by the radar sensor specifically includes: obtaining a phase signal from the radar echo signal by using an arc tangent demodulation algorithm, so as to obtain the vital sign signal.
Further preferably, in step 1, the respiratory signal and the heartbeat signal are extracted from the vital sign signal, specifically: the respiratory signal is extracted with a 0.15 Hz-0.7 Hz band-pass filter, a frequency range that covers the normal human respiratory rate of 12-20 breaths/minute (i.e., 0.2-0.33 Hz); the heartbeat signal is extracted with a 0.8 Hz-4 Hz band-pass filter, a frequency range that covers the normal human heart rate of 50-90 beats/minute (i.e., 0.83-1.5 Hz).
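Illustratively, a minimal sketch of this step, assuming the radar front end provides baseband I/Q samples and using SciPy for the band-pass filters; the filter order and all function and variable names are illustrative choices, not taken from the patent:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def extract_vital_signs(i_channel, q_channel, fs):
    """Arctangent demodulation followed by band-pass extraction of respiration and heartbeat.

    i_channel, q_channel: baseband radar I/Q samples (1-D arrays); fs: sampling rate in Hz.
    """
    # The unwrapped arctangent phase is proportional to chest displacement (the vital sign signal).
    phase = np.unwrap(np.arctan2(q_channel, i_channel))

    def bandpass(x, lo, hi, order=4):
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, x)

    respiration = bandpass(phase, 0.15, 0.7)   # covers 12-20 breaths/minute
    heartbeat = bandpass(phase, 0.8, 4.0)      # covers 50-90 beats/minute
    return respiration, heartbeat
```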
Further preferably, step 2 of selecting a face region from the image acquired by the video sensor specifically includes: selecting a human face region from the image acquired by the video sensor by using an Adaboost algorithm.
Further, in step 2, a face area is selected from the video image acquired by the video sensor, and a heartbeat signal and an optical flow vector signal are acquired according to the face area, specifically:
step 2-1, detecting and extracting the face region in each frame of the video;
step 2-2, carrying out gray level averaging on pixel points in the face area in the RGB three channels respectively to obtain a gray level mean value;
2-3, acquiring an optical flow vector signal of each pixel point in the face area according to an optical flow method, and storing the optical flow vector signal;
step 2-4, forming the gray average values of all the frame images into a three-channel gray average value sequence signal, and performing L2 detrending and filtering on the signal;
step 2-5, extracting a heartbeat signal H according to the three-channel gray level mean value sequence signal processed in the step 2-4:
H=R-2G+B
in the formula, R is a red channel signal, G is a green channel signal, and B is a blue channel signal.
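Illustratively, a minimal sketch of steps 2-1 to 2-5, assuming OpenCV's Haar-cascade detector (an AdaBoost-based face detector) and substituting a linear detrend for the L2 detrending mentioned above; all names are illustrative:

```python
import cv2
import numpy as np
from scipy.signal import detrend

def face_channel_means(video_path):
    """Per-frame mean R, G, B gray levels over the detected face region (steps 2-1 and 2-2)."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    means = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.1, 5)
        if len(faces) == 0:
            continue                                  # skip frames without a detected face
        x, y, w, h = faces[0]
        roi = frame[y:y + h, x:x + w].astype(float)   # OpenCV frames are BGR-ordered
        means.append((roi[..., 2].mean(), roi[..., 1].mean(), roi[..., 0].mean()))  # (R, G, B)
    cap.release()
    return np.array(means)                            # shape (frames, 3)

def heartbeat_from_means(means):
    """Steps 2-4 and 2-5: detrend each channel, then combine as H = R - 2G + B."""
    r, g, b = (detrend(means[:, k]) for k in range(3))   # linear detrend as a stand-in
    return r - 2 * g + b                                 # band-pass filtering can follow, as in step 1
```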
Further, with reference to fig. 2, step 3 describes that, for the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is performed to solve the influence of a low-light environment on heartbeat detection of the video sensor, specifically:
step 3-1, performing windowing processing on the heartbeat signals and the respiration signals collected by the radar sensor and the video sensor, and calibrating a corresponding emotion label for each window; wherein the window length is t seconds;
step 3-2, solving the light intensity value corresponding to each window, specifically: taking the average value of the light intensity of all the image frames in the window length range as the light intensity value corresponding to the window;
3-3, acquiring heart rates corresponding to the heartbeat signals respectively measured by the radar sensor and the video sensor in each window;
step 3-4, calculating the accuracy of the heart rate of the heartbeat signal respectively measured by the radar sensor and the video sensor by combining with the reference heart rate; wherein the reference heart rate is a heart rate measured by a contact heart rate measuring device;
step 3-5, obtaining the label value corresponding to each light intensity value in the step 3-2 according to the accuracy: if the accuracy of the heart rate of the heartbeat signal measured by the video sensor is higher than that of the heart rate of the heartbeat signal measured by the radar sensor, the label value corresponding to the light intensity value is 1, otherwise, the label value is 0;
3-6, taking heartbeat signals corresponding to all windows as samples, dividing the samples into K parts, taking 1 part of the samples as a test sample each time, and taking the rest K-1 parts as training samples;
step 3-7, taking each light intensity value in the step 3-2 as a feature, inputting the feature and a label corresponding to the feature into a decision tree classifier for training, obtaining node thresholds of the decision tree in different light intensity ranges, and obtaining a decision tree model;
3-8, testing the test sample according to the decision tree model obtained in the step 3-7, comparing the test sample with the threshold value of each node of the decision tree to obtain a final judgment result, selecting a heartbeat signal measured by the video sensor when the judgment result is 1, and selecting a heartbeat signal of the radar sensor when the judgment result is 0;
step 3-9, repeating steps 3-7 and 3-8 until all K folds are finished.
Illustratively, the window time t in step 3-1 is 60 seconds.
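Illustratively, a minimal sketch of steps 3-1 to 3-9: a decision tree is trained on the per-window light intensity to decide, window by window, whether the video or the radar heartbeat is more reliable, with K-fold cross-validation. The tree depth and all names are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def select_heartbeat_source(light_intensity, labels, k=10):
    """light_intensity: one value per window (steps 3-1/3-2); labels: 1 if the video heart
    rate was more accurate than the radar heart rate for that window, else 0 (step 3-5)."""
    X = np.asarray(light_intensity, dtype=float).reshape(-1, 1)   # single feature per window
    y = np.asarray(labels)
    choices = np.empty_like(y)
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        tree = DecisionTreeClassifier(max_depth=3)                # depth is an illustrative choice
        tree.fit(X[train_idx], y[train_idx])                      # steps 3-6/3-7
        choices[test_idx] = tree.predict(X[test_idx])             # step 3-8: 1 -> video, 0 -> radar
    return choices
```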
Further, with reference to fig. 3, in step 4, based on the optical flow vector signal obtained by the video sensor, body motion optimization processing is performed on the respiratory signal obtained by the radar sensor to solve the influence of the body motion on the radar sensor, specifically:
step 4-1, acquiring, from the optical flow vector signals of each pixel of the face region in each frame obtained in step 2-3, the optical flow vector characteristic parameters used for radial motion detection, which comprise: a first characteristic parameter A_i for distinguishing the stationary state from the moving state, defined as the mean magnitude of the optical flow vectors of the i-th frame; and a second characteristic parameter B_i for distinguishing radial motion from translational motion, defined as the variance of the optical flow vector directions of the i-th frame;
step 4-2, combining the characteristic parameters of the N frames of images in step 4-1 into two groups of feature vectors:
A = [A_1, A_2, A_3, ..., A_N]
B = [B_1, B_2, B_3, ..., B_N]
and performing sliding window processing on both A and B;
step 4-3, carrying out normalization processing on the two groups of feature vectors by using a z-scores method to obtain corresponding threshold values T1 and T2 which are used as dual decision thresholds of radial motion;
step 4-4, double judgment is carried out, and the first layer judgment: if the first characteristic parameter value of a certain frame image is greater than a first judgment threshold T1, judging the frame as moving, otherwise, judging the frame as static; and second-layer judgment: for a certain frame of image which is judged to move, if the second characteristic parameter value of the certain frame of image is larger than a second judgment threshold T2, the certain frame is judged to have radial movement, otherwise, the certain frame of image is translational movement;
and 4-5, recording the starting time and the ending time of the radial motion of the video segment, and deleting the data of the corresponding time period in the respiratory signal measured by the radar.
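Illustratively, a minimal sketch of steps 4-1 to 4-5 using OpenCV's dense (Farneback) optical flow as one possible optical-flow method; the Farneback parameters are illustrative assumptions, while the default thresholds T1 = 0.02 and T2 = 0 follow the values selected in the embodiment below:

```python
import cv2
import numpy as np

def radial_motion_frames(gray_frames, t1=0.02, t2=0.0):
    """gray_frames: list of consecutive grayscale face-region images; returns a boolean array,
    True where radial motion is detected (step 4-5 deletes the matching radar segments)."""
    a_vals, b_vals = [], []
    for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        a_vals.append(mag.mean())     # A_i: mean optical-flow magnitude
        b_vals.append(ang.var())      # B_i: variance of optical-flow direction
    a = (np.array(a_vals) - np.mean(a_vals)) / np.std(a_vals)    # z-score normalization
    b = (np.array(b_vals) - np.mean(b_vals)) / np.std(b_vals)
    moving = a > t1                   # first-layer decision (threshold T1)
    radial = moving & (b > t2)        # second-layer decision (threshold T2)
    return radial
```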
Further, in step 5, for the optimized heartbeat and respiration signals obtained in steps 3 and 4, feature extraction is performed, specifically:
step 5-1, assuming that the optimized respiratory signal sequence is X, the normalized respiratory signal sequence is X1, N is the sequence length, X_j is the j-th sample of the respiratory sequence, and X1_j is the j-th sample of the normalized respiratory sequence, the features extracted from the respiratory signal include (a sketch of their computation follows this list):
(1) the mean of the sequence, μ_x;
(2) the standard deviation of the sequence, σ_x;
(3) the mean of the absolute values of the first-order differences of the sequence, δ_x;
(4) the mean of the absolute values of the first-order differences of the normalized sequence, δ1_x;
(5) the mean of the absolute values of the second-order differences of the sequence, γ_x;
(6) the mean of the absolute values of the second-order differences of the normalized sequence, γ1_x;
(7) the approximate entropy ApEn(m, r), obtained as follows:
(7-1) reconstructing the respiratory signal sequence X into an m-dimensional phase space of vectors X(q) = [X_q, X_{q+1}, ..., X_{q+m-1}], 1 ≤ q ≤ N-(m-1);
(7-2) defining the distance d_ij between any two vectors X(i) and X(j) in the phase space as:
d_ij = max_k |X(i+k) - X(j+k)|, 0 ≤ k ≤ m-1; 1 ≤ i, j ≤ N-m+1, i ≠ j
(7-3) performing template matching for each X(i) in the space: given a similarity tolerance r, counting C_i^m(r), the fraction of vectors X(j) whose distance d_ij does not exceed r;
(7-4) taking the logarithm of C_i^m(r) and averaging over i to obtain φ_m(r);
(7-5) adding 1 to the dimension m and repeating the above (7-2) to (7-4) to obtain φ_{m+1}(r);
(7-6) from the above (7-4) and (7-5), the approximate entropy ApEn(m, r) of the sequence is obtained; in practice, for a finite N:
ApEn(m, r) = φ_m(r) - φ_{m+1}(r)
(8) geometric features, comprising:
1) the major axis SD_1 and the minor axis SD_2 of the fitted ellipse, computed from γ_EE(s), the autocorrelation function of the respiratory interval series at time delay s, and from the mean of the respiratory signal;
2) SD_12, representing the balance between sympathetic and parasympathetic activity;
3) the area S of the fitted ellipse of the Poincaré plot:
S = π × SD_1 × SD_2
4) the variance SDRR of the entire breath time series.
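Illustratively, a minimal sketch of the time-domain respiratory features and the approximate entropy defined above; the Poincaré-plot features (SD_1, SD_2, SD_12, S) are omitted here, and the tolerance r = 0.2·σ_x is a conventional choice, not taken from the patent:

```python
import numpy as np

def approximate_entropy(x, m, r):
    """Standard ApEn(m, r) following steps (7-1) to (7-6) above (self-matches included)."""
    x = np.asarray(x, dtype=float)
    def phi(dim):
        n = len(x) - dim + 1
        vecs = np.array([x[i:i + dim] for i in range(n)])        # phase-space reconstruction
        # Chebyshev distance between every pair of template vectors
        dists = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=-1)
        c = np.mean(dists <= r, axis=1)                          # C_i(r): fraction within tolerance
        return np.mean(np.log(c))
    return phi(m) - phi(m + 1)

def respiratory_features(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    x1 = (x - x.mean()) / x.std()                                # normalized sequence X1
    return {
        "mean": x.mean(),                                        # mu_x
        "std": x.std(),                                          # sigma_x
        "abs_diff1": np.mean(np.abs(np.diff(x))),                # delta_x
        "abs_diff1_norm": np.mean(np.abs(np.diff(x1))),          # delta1_x
        "abs_diff2": np.mean(np.abs(np.diff(x, 2))),             # gamma_x
        "abs_diff2_norm": np.mean(np.abs(np.diff(x1, 2))),       # gamma1_x
        "apen": approximate_entropy(x, m, r_factor * x.std()),   # ApEn(m, r); r is an assumed choice
        "sdrr": x.var(),                                         # variance of the breath series
    }
```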
step 5-2, assuming that the optimized heartbeat signal sequence is H, M is the sequence length, and H_l is the l-th sample of the heartbeat sequence; the mean, the standard deviation, the first- and second-order difference absolute-value means of the raw and normalized sequences, and the geometric features are defined as in step 5-1, and the other features are as follows (a sketch of the frequency-domain features follows this list):
(1) the skewness sk of the heartbeat sequence, computed about the mean of the heartbeat signal sequence;
(2) the kurtosis ku of the heartbeat sequence;
(3) the root mean square of successive differences, RMSSD;
(4) the power in the VLF, LF and HF frequency bands: E_VLF, E_LF, E_HF;
(5) the frequencies Peak_VLF, Peak_LF, Peak_HF corresponding to the power maxima in the VLF, LF and HF bands;
(6) the ratios Per_VLF, Per_LF, Per_HF of the power in each of the VLF, LF and HF bands to the sum of the power in the three bands;
(7) the normalized low-frequency power nLF, a quantitative index of sympathetic activity;
(8) the normalized high-frequency power nHF, a quantitative index of parasympathetic activity;
(9) the ratio of low- to high-frequency power, LF/HF, representing the balance of autonomic nervous activity.
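Illustratively, a minimal sketch of the frequency-domain heartbeat features using a Welch periodogram; the VLF/LF/HF band limits are the conventional HRV values, and the definitions of nLF, nHF and LF/HF as ratios of band powers are assumptions, since the patent text does not give these formulas explicitly:

```python
import numpy as np
from scipy.signal import welch

# Conventional HRV band limits (Hz); assumed, not specified in the patent.
BANDS = {"VLF": (0.003, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}

def heartbeat_frequency_features(h, fs):
    """h: optimized heartbeat signal for one window; fs: sampling rate in Hz."""
    f, pxx = welch(h, fs=fs, nperseg=min(len(h), 4096))
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (f >= lo) & (f < hi)
        if not mask.any():                          # band not resolved at this window length
            feats[f"E_{name}"], feats[f"Peak_{name}"] = 0.0, float("nan")
            continue
        feats[f"E_{name}"] = np.trapz(pxx[mask], f[mask])       # band power
        feats[f"Peak_{name}"] = f[mask][np.argmax(pxx[mask])]   # frequency of the power maximum
    total = feats["E_VLF"] + feats["E_LF"] + feats["E_HF"]
    for name in BANDS:
        feats[f"Per_{name}"] = feats[f"E_{name}"] / total       # share of the three-band power
    feats["nLF"] = feats["E_LF"] / (feats["E_LF"] + feats["E_HF"])   # assumed definition
    feats["nHF"] = feats["E_HF"] / (feats["E_LF"] + feats["E_HF"])   # assumed definition
    feats["LF_HF"] = feats["E_LF"] / feats["E_HF"]
    return feats
```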
Further preferably, in conjunction with fig. 4, step 6 performs feature selection on the extracted heartbeat and respiration features, specifically by using a feature elimination method.
Further, with reference to fig. 4, the step 6 of establishing an emotion recognition model according to the screened features specifically includes: taking the heartbeat and respiration signals corresponding to all windows as samples, and inputting the screened features of each sample, together with the corresponding emotion labels, into a classifier for training to obtain the emotion recognition model.
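Illustratively, a minimal sketch of step 6, using scikit-learn's recursive feature elimination as one realization of the feature elimination method and a generic tree-ensemble classifier; the number of retained features and all names are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def train_emotion_model(features, emotion_labels, n_keep=20):
    """features: (n_windows, n_features) heartbeat + respiration feature matrix."""
    base = RandomForestClassifier(n_estimators=100, random_state=0)
    selector = RFE(base, n_features_to_select=n_keep).fit(features, emotion_labels)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(selector.transform(features), emotion_labels)     # train on the screened features
    return selector, model
```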
Examples
The invention relates to a non-contact emotion recognition method based on a dual-mode sensor, which comprises the following steps:
1. vital sign signals are obtained from radar signals through an arc-tangent demodulation algorithm, then respiratory signals are extracted through a 0.15Hz-0.7Hz band-pass filter, heartbeat signals are extracted through a 0.8Hz-4Hz band-pass filter, and the respiratory signals and the heartbeat signals are respectively shown in fig. 5 and fig. 6.
2. Obtaining a human face region through an Adaboost algorithm, and extracting a heartbeat signal and an optical flow signal from the face region, which specifically comprises the following steps:
2-1, improving the image contrast by introducing a local histogram-based normalization technique, and extracting the face region with the Adaboost algorithm using Haar features as detection features, as shown in fig. 7;
2-2, respectively carrying out gray level averaging on pixel points in the face area in the RGB three channels to obtain a gray level mean value;
2-3, acquiring an optical flow vector signal of each pixel point in the face area according to an optical flow method, and storing the optical flow vector signal;
2-4, forming the gray average values of all the frame images into a three-channel gray average value sequence signal, and performing L2 detrending and filtering on the signal;
2-5, extracting a heartbeat signal H according to the three-channel gray level mean value sequence signal processed by the step 2-4:
H=R-2G+B
in the formula, R is a red channel signal, G is a green channel signal, and B is a blue channel signal.
3. Aiming at the heartbeat signals obtained by the radar sensor and the video sensor, heartbeat optimization processing based on a light intensity method is carried out so as to solve the influence of a low-light environment on the heartbeat detection of the video sensor; the method specifically comprises the following steps:
3-1, performing windowing processing on the heartbeat signals of the video and the radar, with a window length of 60 s;
3-2, calculating the light intensity average value of all image frames in the window length range, and taking the average value as the light intensity value of the window, wherein the calculation formula is as follows:
I=0.299×R+0.587×G+0.114×B
wherein, R is a red channel, G is a green channel, and B is a blue channel.
3-3, acquiring heart rates corresponding to the heartbeat signals respectively measured by the radar sensor and the video sensor in each window;
3-4, computing, for each window, the accuracy of the radar-measured and video-measured heart rates against the reference heart rate; fig. 8 shows how the heart rate detection accuracy of the radar and video sensors varies with light intensity, from which the dependence of the video heart rate accuracy on light intensity can be seen;
3-5, acquiring a label value corresponding to each light intensity value in the step 3-2 according to the accuracy: if the accuracy of the heart rate of the heartbeat signal measured by the video sensor is higher than that of the heart rate of the heartbeat signal measured by the radar sensor, the label value corresponding to the light intensity value is 1, otherwise, the label value is 0;
3-6, dividing all samples of 7 volunteers into 10 parts, taking 1 part of the samples as a test sample each time, and taking the other 9 parts as training samples;
3-7, taking each light intensity value in the step 3-2 as a feature, inputting the feature and a label corresponding to the feature into a decision tree classifier for training, obtaining node thresholds of the decision tree in different light intensity ranges, and obtaining a decision tree model;
3-8, testing the test sample according to the decision tree model obtained in 3-7, and comparing the test sample with the threshold value of each node of the decision tree to obtain a final judgment result;
3-9, repeating 3-7 and 3-8 until all 10 folds are finished, obtaining the decision model shown in fig. 9;
In order to verify the reliability of the model, this embodiment tests the generality of the model on experimental data from 4 volunteers. Fig. 10 compares the classification accuracy before and after the heartbeat optimization algorithm; the accuracy improves from 54.8% to 66%, a substantial gain, showing that the optimization is effective.
4. Based on the optical flow vector signals obtained by the video sensor, body motion optimization processing is performed on the respiratory signal obtained by the radar sensor to mitigate the influence of body motion on the radar sensor, specifically:
4-1, from the optical flow vector signals of each pixel in the face region obtained in 2-3 (fig. 11 shows the optical flow vector diagrams for three states, where panels (a), (b) and (c) correspond to the stationary, translational and radial-motion states in turn), acquiring the optical flow vector characteristic parameters for radial motion detection, which comprise: a first characteristic parameter A_i for distinguishing the stationary state from the moving state (the optical flow magnitude in the moving state is larger than in the stationary state), defined as the mean magnitude of the optical flow vectors of the i-th frame; and a second characteristic parameter B_i for distinguishing radial motion from translational motion (the direction variance of radial motion is larger than that of translational motion), defined as the variance of the optical flow vector directions of the i-th frame;
4-2, combining the characteristic parameters of the N frames of images from 4-1 into two groups of feature vectors:
A = [A_1, A_2, A_3, ..., A_N]
B = [B_1, B_2, B_3, ..., B_N]
and carrying out sliding window processing on A and B;
4-3, carrying out normalization processing on the two groups of feature vectors by using a z-scores method to obtain corresponding threshold values T1 and T2 which are used as dual decision thresholds of radial motion;
4-4, carrying out double judgment, wherein the first-layer judgment is: if the first characteristic parameter value of a certain frame image is greater than the first judgment threshold T1, the frame is judged as moving, otherwise it is judged as static; the variable P1 is introduced here as a measure of the accuracy of motion detection, where P1 is the ratio of the motion time detected by the optical flow method to the actual motion duration, and fig. 12 plots the accuracy P1 against the threshold T1; a P1 value greater than 1 means the motion time detected by the algorithm exceeds the actual motion time, i.e. non-body-motion segments are misjudged as body motion, which reduces the accuracy of the algorithm, so P1 values no greater than 1 are preferred. It can be observed that when the threshold T1 is 0.02, P1 is highest and the measurement accuracy is the highest, 99.13%; therefore T1 is set to 0.02;
and the second-layer judgment is: for a frame that has been judged to move, if its second characteristic parameter value is greater than the second judgment threshold T2, the frame is judged to contain radial motion, otherwise the motion is translational; a variable P2 is introduced, where P2 is the ratio of the radial motion time detected by the optical flow method to the actual radial motion time, and fig. 13 plots the accuracy P2 against the threshold T2; a P2 value greater than 1 indicates that the algorithm detects translational motion segments as radial motion, which lowers the accuracy of radial motion detection. It can be observed that when the threshold T2 is in the range -1 to 0.2, the detection accuracy of radial motion is high, at 97%; the threshold T2 is therefore chosen as 0;
4-5, recording the starting and ending times of the radial motion in the video segment, locating the body motion segment of the radar waveform in the corresponding time period and deleting it from the radar data; fig. 14 compares the frequency-domain information of the respiratory signal measured by the radar with that measured by a respiration belt, before and after removing the influence of body motion. Fig. 15 compares the classification results of the radar sensor before and after body motion elimination; the accuracy improves from 73.7% to 78.6%, showing a good optimization effect.
5. Respectively extracting respiratory signal and heartbeat signal characteristics;
6. Feature selection is performed on the heartbeat and respiratory features extracted in step 5 using a feature elimination method. In this embodiment, 6 respiratory features are eliminated, including the normalized first-order difference absolute-value mean, the second-order difference absolute-value mean and the normalized second-order difference absolute-value mean; 19 heartbeat features are eliminated, including the standard deviation, the normalized first-order difference absolute-value mean, the kurtosis and the skewness. The screened features and the corresponding emotion labels are input into a Bagged Trees classifier for classification; fig. 16 compares the final result with the classification results of the single sensors, and it can be seen that the accuracy of the proposed method is clearly higher than that of either single sensor.
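Illustratively, a minimal sketch of this classification step, using scikit-learn's BaggingClassifier (whose default base estimator is a decision tree) as a stand-in for the Bagged Trees classifier; the accuracy figures quoted above come from the authors' experiments, not from this sketch:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

def bagged_trees_accuracy(selected_features, emotion_labels, folds=10):
    # The default base estimator of BaggingClassifier is a decision tree, i.e. bagged trees.
    clf = BaggingClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, selected_features, emotion_labels, cv=folds).mean()
```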
The invention collects physiological signals through a video sensor and a radar sensor, optimizes the heartbeat signal of the video and the body-motion-affected signal of the radar through a complementary optimization algorithm, extracts features from the optimized signals, inputs them into a classifier to obtain an emotion classification model, and uses the model to recognize emotion. In conclusion, the non-contact measurement method provided by the invention causes no discomfort, reduces measurement error, achieves high recognition accuracy and good robustness, and has wider applicability.