Background
It is extremely difficult for human observers to monitor the behavior and activity of a large number of individuals across large camera networks. Abnormal behavior usually occurs in highly crowded urban areas, and intelligent behavior detection technology for dense crowds extracts useful behavior pattern information, which is important for public safety, crowd management, and timely support of key decisions.
Existing research has focused primarily on sparse, mostly staged scenes, with relatively little effort devoted to reliably classifying and understanding human activities in real, very crowded scenes. In full-scale scenes, where the targets (people and objects of interest) are typically very small, research has focused primarily on recognition tasks that detect or track individual people or vehicles, and analyzing crowd behavior, such as characterizing the behavioral interactions of an aggregated crowd, remains challenging.
Currently, researchers have proposed two approaches to analyzing behavior in such complex scenes. The first is holistic: the crowd and the scene objects are considered as a whole. Crowd objects, locations, and scenes are not individually identified or classified in terms of their behavior or interaction, but are processed according to their overall appearance; understanding crowd behavior without knowledge of individual behavior is often beneficial and simple. The second is object-based: each individual (person and/or object) is detected and segmented in order to perform motion and/or behavior analysis. Segmenting and tracking individuals in videos of crowds is a very challenging task.
Disclosure of Invention
The present invention is intended to solve the above technical problems to some extent.
In view of the above, the invention provides a video surveillance crowd behavior recognition method based on a deep residual convolutional neural network, which effectively extracts useful information about crowd behavior categories and can provide timely key decisions and support for public safety and crowd management.
In order to solve the technical problem, the invention provides a video surveillance crowd behavior recognition method based on a deep residual convolutional neural network, which is characterized by comprising the following steps:
S1: a plurality of cameras capture crowd behavior videos, frame images are extracted from the videos, and the extracted images are augmented and standardized to form a data source;
S2: the data source is input into a residual neural network for image processing, overall crowd behavior features are extracted, and the overall crowd behavior features are divided into different classes of sub-crowd behavior features;
S3: subclasses are created within each sub-crowd behavior feature class using PCA, the within-subclass scatter matrix is computed to extract key subclasses, the variances formed within the key subclasses are learned, and the variances are optimized using the Fisher criterion;
S4: the key sub-crowd behavior features are regularized so as to model the optimized variances and obtain modeled eigenvalues;
S5: the abnormality of every subclass is judged based on the eigenvalues obtained for each subclass, and a safety index is output in combination with the crowd behavior of the subclass;
S6: the safety index is sent to a crowd behavior detection module; for crowd subclasses with a low safety index, the detection module retrieves the corresponding extracted frame images and detects whether the video frame images contain abnormal behavior using a background difference method and a regional optical flow method;
S7: a similarity measure between the frame images of two different behavior groups is found using dynamic time warping and a nearest-neighbor classifier on a cosine distance measure, so as to track the abnormal crowd behavior in the video.
Further, in step S4, modeling the optimized variances and obtaining the modeled eigenvalues comprises the following steps:
S41: compute the eigenvalues $\lambda_k^{ws}$ of the within-subclass scatter matrix $S_{ws}$ and the associated eigenvectors, and, using the functional form 1/f, model the eigenspectrum as:
$$\hat{\lambda}_k^{ws} = \frac{\alpha}{k + \beta}, \quad 1 \le k \le r$$
where α and β are two constants, so as to obtain an eigenspectrum that models the initial portion of the real eigenspectrum;
S42: determine α and β by imposing two conditions that equate model eigenvalues with true eigenvalues, so that the resulting model eigenvalues closely match the true eigenvalues.
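As a minimal sketch of the 1/f eigenspectrum model, the following Python snippet fits a model of the form α/(k + β) to a decreasing eigenvalue sequence. Matching the model to the true spectrum at k = 1 and at a second index m is an assumption made for illustration, as are the function names `fit_spectrum` and `model_spectrum` and the toy eigenvalues.

```python
def fit_spectrum(eigvals, m):
    """Fit lam_hat(k) = alpha / (k + beta) so that it equals the true
    eigenvalues at k = 1 and k = m (1-based; the choice of m is assumed)."""
    lam1, lam_m = eigvals[0], eigvals[m - 1]
    if not lam1 > lam_m > 0:
        raise ValueError("expected a positive, strictly decreasing spectrum")
    # Solve lam1 = alpha / (1 + beta) and lam_m = alpha / (m + beta).
    beta = (m * lam_m - lam1) / (lam1 - lam_m)
    alpha = lam1 * (1.0 + beta)
    return alpha, beta


def model_spectrum(alpha, beta, r):
    """Return the modelled eigenvalues for k = 1..r."""
    return [alpha / (k + beta) for k in range(1, r + 1)]


# Toy spectrum that decays roughly like 1/k (made-up numbers).
true_vals = [10.0, 5.2, 3.4, 2.4, 2.0, 1.1, 0.4, 0.05]
alpha, beta = fit_spectrum(true_vals, m=5)
modelled = model_spectrum(alpha, beta, r=len(true_vals))
```

The modelled spectrum decays smoothly, so small and noisy trailing eigenvalues are replaced by values extrapolated from the reliable initial portion.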
Further, in step S3, the scatter matrix is calculated as:
$$S_{ws} = \sum_{i=1}^{E} \frac{p_i}{H_i} \sum_{j=1}^{H_i} \frac{q_{ij}}{G_{ij}} \sum_{k=1}^{G_{ij}} (x_{ijk} - \bar{x}_{ij})(x_{ijk} - \bar{x}_{ij})^{T}$$
where $E$ is the number of sub-crowd behavior feature classes, $H_i$ is the number of subclasses of the i-th class, $G_{ij}$ is the number of samples in the j-th subclass of the i-th class, $x_{ijk}$ is the k-th image vector in the j-th subclass of the i-th class, $\bar{x}_{ij}$ is the sample mean of the j-th subclass of the i-th class, and $p_i$ and $q_{ij}$ are the estimated prior probabilities. The Fisher objective function is expressed in terms of the total subclass scatter matrix.
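The within-subclass scatter computation can be sketched in Python with NumPy as a prior-weighted sum of per-subclass covariance matrices. The function name `within_subclass_scatter`, the uniform default weights, and the random toy features are assumptions made for illustration; the weighting actually used by the method may differ.

```python
import numpy as np


def within_subclass_scatter(classes, weights=None):
    """Prior-weighted sum of per-subclass covariance matrices.

    classes[i][j] is a (G_ij, n) array holding the feature vectors of the
    j-th subclass of the i-th class. weights[i][j] is the prior weight of
    that subclass; uniform weights are an assumption made for this sketch.
    """
    n = classes[0][0].shape[1]
    total = sum(len(subs) for subs in classes)
    scatter = np.zeros((n, n))
    for i, subs in enumerate(classes):
        for j, x in enumerate(subs):
            w = 1.0 / total if weights is None else weights[i][j]
            centered = x - x.mean(axis=0)  # x_ijk minus the subclass mean
            scatter += w * (centered.T @ centered) / x.shape[0]
    return scatter


rng = np.random.default_rng(0)
# Two classes with two subclasses each, 5-dimensional toy features.
toy = [[rng.normal(size=(20, 5)) for _ in range(2)] for _ in range(2)]
s_ws = within_subclass_scatter(toy)
eigvals = np.linalg.eigvalsh(s_ws)[::-1]  # eigenvalues, largest first
```

The eigenvalues of the resulting symmetric matrix form the spectrum that the later regularization step operates on.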
Further, in step S2, after receiving the data source, the residual neural network is fine-tuned and trained on the behavior features of a crowd database.
Further, in step S1, the data augmentation includes Gaussian noise, flipping, and brightness adjustment.
Further, in step S1, the image standardization includes resizing the images to a uniform pixel size and normalizing each channel to zero mean and unit standard deviation.
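The augmentation and per-channel normalization of step S1 might look like the following NumPy sketch. The noise level, brightness range, use of per-image statistics, and the function names `augment` and `standardize` are assumptions; the source does not specify them, and resizing is assumed to have been done beforehand.

```python
import numpy as np


def augment(image, rng, noise_std=5.0, max_brightness=0.2):
    """Return three augmented copies of an (H, W, 3) uint8 image:
    Gaussian noise, horizontal flip, and brightness scaling."""
    img = image.astype(np.float32)
    noisy = img + rng.normal(0.0, noise_std, size=img.shape)
    flipped = img[:, ::-1, :]
    bright = img * (1.0 + rng.uniform(-max_brightness, max_brightness))
    return [np.clip(a, 0, 255).astype(np.uint8) for a in (noisy, flipped, bright)]


def standardize(image, eps=1e-8):
    """Scale each colour channel to zero mean and unit standard deviation."""
    img = image.astype(np.float32)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    return (img - mean) / (std + eps)


rng = np.random.default_rng(0)
# Random stand-in for an already resized frame.
frame = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
samples = [frame] + augment(frame, rng)
batch = np.stack([standardize(s) for s in samples])
```

Here each image is normalized with its own channel statistics; normalizing with statistics computed over the whole data source is an equally plausible reading.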
The technical effects of the invention are as follows: (1) Useful crowd behavior pattern information is effectively extracted, and timely key decisions and support can be provided for public safety and crowd management.
(2) The method places no requirement on the scene type, only on the flow characteristics. A feature sequence expressing crowd behavior information is obtained by making full use of low-level local motion features and combining them with the high-level information of the spatio-temporal cube; with a trained model, uniform dense crowd behaviors can be recognized and abnormal behaviors can be detected. Detection and tracking of individual targets in high-density crowd scenes is sometimes unreliable. Because the method avoids detecting and tracking individual targets, it performs well across a variety of high-density crowd recognition tasks and is suitable for both large and small crowds.
Detailed Description
The present invention is further described with reference to the following drawings and specific examples so that those skilled in the art can better understand the present invention and can practice the present invention, but the examples are not intended to limit the present invention.
As shown in fig. 1, a video surveillance crowd behavior recognition method based on a deep residual convolutional neural network comprises the following steps:
S1: a plurality of cameras capture crowd behavior videos, frame images are extracted from the videos, and the extracted images are augmented and standardized to form a data source;
S2: the data source is input into a residual neural network for image processing, overall crowd behavior features are extracted, and the overall crowd behavior features are divided into different classes of sub-crowd behavior features;
S3: subclasses are created within each sub-crowd behavior feature class using PCA, the within-subclass scatter matrix is computed to extract key subclasses, the variances formed within the key subclasses are learned, and the variances are optimized using the Fisher criterion;
S4: the key sub-crowd behavior features are regularized so as to model the optimized variances and obtain modeled eigenvalues;
S5: the abnormality of every subclass is judged based on the eigenvalues obtained for each subclass, and a safety index is output in combination with the crowd behavior of the subclass;
S6: the safety index is sent to a crowd behavior detection module; for crowd subclasses with a low safety index, the detection module retrieves the corresponding extracted frame images and detects whether the video frame images contain abnormal behavior using a background difference method and a regional optical flow method;
S7: a similarity measure between the frame images of two different behavior groups is found using dynamic time warping and a nearest-neighbor classifier on a cosine distance measure, so as to track the abnormal crowd behavior in the video.
According to a specific embodiment of the invention, the method uses surveillance videos provided by a downtown business district as training data. For each of the 6 camera positions provided, two surveillance videos, one of a daytime scene and one of a nighttime scene, are extracted into frame images. Each video is 10 minutes long, and one frame image is captured every 1 s as a sample. Blurred samples are removed by screening, 4000 images containing crowd gatherings are retained, and the extracted images are augmented and standardized to form the data source;
the data source is input into the residual neural network for image processing, overall crowd behavior features containing human behavior are screened out, and the overall crowd behavior features are divided into different classes of sub-crowd behavior features, wherein the different classes of behavior include framed people, queuing people, dining people, and the like;
subclasses are created in the feature map of each sub-crowd behavior feature class using PCA, the within-subclass scatter matrix is computed to extract key subclasses, the variances formed within the key subclasses are learned, and the variances are optimized using the Fisher criterion; wherein the scatter matrix is calculated as:
$$S_{ws} = \sum_{i=1}^{E} \frac{p_i}{H_i} \sum_{j=1}^{H_i} \frac{q_{ij}}{G_{ij}} \sum_{k=1}^{G_{ij}} (x_{ijk} - \bar{x}_{ij})(x_{ijk} - \bar{x}_{ij})^{T}$$
where $E$ is the number of sub-crowd behavior feature classes, $H_i$ is the number of subclasses of the i-th class, $G_{ij}$ is the number of samples in the j-th subclass of the i-th class, $x_{ijk}$ is the k-th image vector in the j-th subclass of the i-th class, $\bar{x}_{ij}$ is the sample mean of the j-th subclass of the i-th class, and $p_i$ and $q_{ij}$ are the estimated prior probabilities; the Fisher objective function is expressed in terms of the total subclass scatter matrix;
the key sub-crowd behavior features are regularized so as to model the optimized variances and obtain modeled eigenvalues;
the abnormality of every subclass is judged based on the eigenvalues obtained for each subclass, and a safety index is output in combination with the crowd behavior of the subclass;
the safety index is sent to the crowd behavior detection module; for crowd subclasses with a low safety index, the detection module retrieves the corresponding extracted frame images and detects whether the video frame images contain abnormal behavior using the background difference method and the regional optical flow method. A maximum velocity value is used to judge whether the images contain violent or terror-related behavior such as crowding, trampling, and fighting; this discriminates well between normal and abnormal behavior and allows real-time monitoring;
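A background difference step of this kind can be sketched with NumPy as follows. The running-average background model, the `learning_rate` and `threshold` values, and the function name `foreground_masks` are assumptions chosen for illustration; the optical-flow part of the detection is not covered by this sketch.

```python
import numpy as np


def foreground_masks(frames, learning_rate=0.05, threshold=25.0):
    """Yield a boolean foreground mask for each grayscale frame after the first.

    The background is a running average of past frames; a pixel is marked as
    foreground when it differs from the background by more than `threshold`.
    """
    frames = iter(frames)
    background = next(frames).astype(np.float32)
    for frame in frames:
        current = frame.astype(np.float32)
        mask = np.abs(current - background) > threshold
        # Update the background only where no motion was detected.
        blended = (1.0 - learning_rate) * background + learning_rate * current
        background = np.where(mask, background, blended)
        yield mask


# Synthetic clip: an empty first frame, then a bright 10x10 block moving right.
clip = []
for t in range(5):
    frame = np.full((60, 80), 50, dtype=np.uint8)
    if t > 0:
        frame[20:30, 10 * t:10 * t + 10] = 200
    clip.append(frame)

masks = list(foreground_masks(clip))
moving_fraction = [float(m.mean()) for m in masks]
```

The fraction of foreground pixels per frame gives a crude measure of how much of the scene is moving; a velocity estimate would additionally require an optical flow computation on the masked regions.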
a similarity measure between the frame images of two different behavior groups is found using dynamic time warping and a nearest-neighbor classifier on the cosine distance measure. Because the same crowd cannot reproduce the same behavior exactly, a time transformation that allows the two behaviors to be matched is computed to perform temporal alignment and normalization, and the abnormal crowd behavior is thereby tracked.
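The matching step can be sketched in plain Python as dynamic time warping over per-frame feature vectors with a cosine distance, followed by a nearest-neighbor decision. The normalization by combined sequence length, the function names, and the toy sequences and labels are assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors,
    normalised by the combined sequence length."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def nearest_neighbor(query, labelled):
    """Return the label of the reference sequence closest to `query`."""
    return min(labelled, key=lambda item: dtw_distance(query, item[1]))[0]


# Toy per-frame feature vectors; the labels and values are made up.
references = [
    ("queuing", [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2]]),
    ("running", [[0.0, 1.0], [0.2, 1.0], [0.5, 1.0], [1.0, 1.0]]),
]
# Same pattern as "running" but played more slowly (frames repeated).
query = [[0.0, 1.0], [0.0, 1.0], [0.2, 1.0], [0.5, 1.0], [0.5, 1.0], [1.0, 1.0]]
label = nearest_neighbor(query, references)
```

Because the warping path may repeat frames, the slowed-down query still aligns with its reference at near-zero cost, which is the time alignment property the method relies on.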
As shown in fig. 1, in step S4, modeling the optimized variances and obtaining the modeled eigenvalues comprises the following steps:
S41: compute the eigenvalues $\lambda_k^{ws}$ of the within-subclass scatter matrix $S_{ws}$ and the associated eigenvectors, and, using the functional form 1/f, model the eigenspectrum as:
$$\hat{\lambda}_k^{ws} = \frac{\alpha}{k + \beta}, \quad 1 \le k \le r$$
where α and β are two constants, so as to obtain an eigenspectrum that models the initial portion of the real eigenspectrum;
S42: determine α and β by imposing two conditions that equate model eigenvalues with true eigenvalues, so that the resulting model eigenvalues closely match the true eigenvalues.
As shown in fig. 1, in step S2, after receiving the data source, the residual neural network is fine-tuned and trained on the behavior features of a crowd database.
As shown in fig. 1, in step S1, the data augmentation includes Gaussian noise, flipping, and brightness adjustment.
As shown in fig. 1, in step S1, the image standardization includes resizing the images to a uniform pixel size and normalizing each channel to zero mean and unit standard deviation.
According to a specific embodiment of the present invention, all images are resized to 224 × 224 pixels, and each channel is normalized to zero mean and unit standard deviation.
The above embodiments are merely preferred embodiments intended to fully illustrate the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention. The protection scope of the invention is defined by the claims.