CN114372556B - A driving hazard scene recognition method based on lightweight multimodal neural network - Google Patents

A driving hazard scene recognition method based on lightweight multimodal neural network

Info

Publication number
CN114372556B
Authority
CN
China
Prior art keywords
driving
data
video
vehicle
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111551051.XA
Other languages
Chinese (zh)
Other versions
CN114372556A (en)
Inventor
高珍
许靖宁
余荣杰
范鸿飞
孙萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202111551051.XA
Publication of CN114372556A
Application granted
Publication of CN114372556B
Active legal status (Current)
Anticipated expiration

Abstract

Translated from Chinese


The present invention relates to a method for identifying dangerous driving scenarios based on a lightweight multimodal neural network. The method comprises the following steps: obtaining driving video and vehicle data within a current time period; dividing the driving video into three vertically distributed driving zones; vertically averaging the image within each driving zone in each frame of the video to convert it into a row of pixels; then chronologically concatenating the corresponding row of pixels for each frame to form a motion profile for each driving zone; and inputting the motion profile for each driving zone and the vehicle data into a driving risk assessment model to obtain an identification result. The driving risk assessment model is a multimodal neural network comprising a visual data processing layer, a kinematic data processing layer, a data fusion layer, and a prediction layer. Compared with existing technologies, the present invention has the advantages of reducing the amount of running data, simplifying the model calculation process, reducing time consumption, and achieving high accuracy.

Description

Driving hazard scene identification method based on a lightweight multimodal neural network
Technical Field
The invention relates to the field of autonomous driving algorithms, and in particular to a driving hazard scene identification method based on a lightweight multimodal neural network.
Background
Autonomous vehicles are currently being tested worldwide, and safety testing is a primary concern. Unlike conventional vehicles, which are typically tested with distance-based methods, automated vehicles are primarily tested with scenario-based methods. The construction of virtual driving scenarios is therefore a key research problem. Among these, dangerous driving scenarios are generally considered more important than normal scenarios, because they expose potential safety issues more quickly and thereby improve testing efficiency.
To identify dangerous driving scenarios, conventional methods rely primarily on structured data, including radar-acquired kinematic data such as speed and acceleration. For example, classical machine learning classifiers, including kNN, random forest, SVM, decision tree, Gaussian neighborhood and AdaBoost, are applied to kinematic data in "Crash and near-crash prediction from vehicle kinematics data: a SHRP2 naturalistic driving study". However, a major problem with these conventional methods is a high false positive rate, caused by the poor quality of the extracted structured data and an incomplete perception of the driving environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a driving hazard scene identification method based on a lightweight multimodal neural network, which is used to test autonomous driving algorithms and improves the accuracy of autonomous driving testing.
The aim of the invention can be achieved by the following technical scheme:
a driving dangerous scene identification method based on a lightweight multi-mode neural network is characterized by comprising the following steps:
S1, acquiring the driving video and vehicle-mounted data in the current time period;
S2, dividing the picture of the driving video into three vertically distributed driving areas, averaging the image within each driving area of each video frame in the vertical direction to convert it into a row of pixels, and then concatenating the row of pixels corresponding to each frame in chronological order to form a motion profile map of each driving area;
S3, inputting the motion profile map of each driving area and the vehicle-mounted data into a driving risk assessment model to obtain an identification result.
The driving risk assessment model is a multimodal neural network comprising a visual data processing layer, a kinematic data processing layer, a data fusion layer and a prediction layer. The visual data processing layer is a lightweight CNN network that adopts an AlexNet network structure improved by an attention mechanism; the motion profile maps are input into the lightweight CNN network and visual features are output. The kinematic data processing layer is an LSTM network; the vehicle-mounted data are input into the LSTM network and kinematic features are output. The data fusion layer is a fully connected layer; the visual features and kinematic features are input and the identification result is output.
Further, the step S2 specifically includes:
S21, dividing the original video picture into three driving areas according to their distance from the vehicle, based on the camera position, each area being delimited by an upper boundary and a lower boundary;
S22, based on the driving video clip in the current time period [t_a, t_b], sampling each driving area obtained in step S21 and acquiring the RGB pixel values within the rectangular range [y_l, y_u] vertically and [0, w] horizontally in each frame, where w is the video width, y_l is the lower sampling boundary and y_u is the upper sampling boundary;
S23, for each of the R, G, B channels of the image in the rectangular range, taking the pixel mean in the vertical direction, i.e. compressing a ((y_u - y_l) × w) matrix into a (1 × w) matrix, and then stacking the results of the three channels to obtain a (1 × w × 3) row of pixels corresponding to each frame;
S24, concatenating the row of pixels obtained from each frame in chronological order to form a (fps × (t_b - t_a), w, 3) matrix, from which a color motion profile map is generated, where fps is the number of video frames per second.
Further, in step S3, the lightweight CNN network introduces an attention mechanism module after each convolution layer, applies channel attention and spatial attention transformations to the feature map, and reconstructs a new feature map. The channel attention and spatial attention are computed as:
Attention_c = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
Attention_s = σ(Conv([AvgPool(F), MaxPool(F)]))
where Attention_c and Attention_s are the channel attention and spatial attention results respectively, F is the feature map output by a convolution layer, σ is the Sigmoid function, MLP is a multi-layer perceptron network, and Conv is a convolution layer with a single output channel.
Further, the output-side training set of the driving risk assessment model comprises a normal event set and a high-risk event set, acquired as follows:
A1, collecting historical vehicle-mounted data;
A2, detecting and filtering outliers in the historical vehicle-mounted data using the 3σ rule of the normal distribution, and treating the outliers as missing values;
A3, filling the missing values in the historical vehicle-mounted data by linear interpolation to obtain complete vehicle-mounted data;
A4, extracting the vehicle acceleration data a from the complete vehicle-mounted data, plotting and inspecting its distribution curve, and determining the acceleration threshold of obvious deceleration behavior, denoted TH_d;
A5, traversing all vehicle acceleration data in chronological order, collecting the emergency braking moments t_d according to the acceleration condition a ≤ TH_d; for each moment t_d, taking the time segment from d_1 seconds before to d_2 seconds after to form a potential high-risk event segment e_c; eliminating, with the aid of video verification, false alarms caused by data collection errors; and forming the remaining high-risk event segments into the high-risk event set;
A6, randomly sampling several normal non-conflict events from the vehicle acceleration data remaining after step A5, using |d_1 + d_2| as the time window, to serve as the normal event set.
Further, in step A2, each non-empty kinematic characteristic variable of a historical vehicle-mounted data record is checked against the following condition, and values satisfying it are treated as outliers:
|x - μ| > 3σ
where x is the non-empty kinematic characteristic variable, μ is the mean of x and σ is the standard deviation of x.
Further, in step A3, the linear interpolation is computed as:
d_i = d_{i-1} + (t_i - t_{i-1}) / (t_{i+1} - t_{i-1}) × (d_{i+1} - d_{i-1})
where d_i is the missing value, d_{i-1} is the last non-empty nearest neighbor of the missing value, d_{i+1} is the next non-empty nearest neighbor, n is the total number of records, and t_{i-1}, t_i, t_{i+1} are the moments corresponding to d_{i-1}, d_i, d_{i+1}.
Further, the input-side training set of the driving risk assessment model comprises a CNN network training set, acquired as follows:
acquiring a historical driving video;
Dividing the picture of the historical driving video into three driving areas which are distributed up and down, carrying out averaging treatment on the image in each driving area of each frame of picture of the video in the vertical direction, converting the image into one row of pixels, and then splicing the corresponding one row of pixels of each frame together according to time sequence to form a motion profile graph, wherein all motion profiles form a CNN network training set.
Further, the CNN training set is expanded through data enhancement processing, wherein the data enhancement processing comprises the steps of randomly transforming brightness, contrast, saturation and hue, and turning over the motion profile in the horizontal direction with a certain probability.
Compared with the prior art, the invention has the following beneficial effects:
The invention first divides the driving video picture into regions and generates a motion profile map for each region, compressing the data while retaining the image features. Second, a multimodal neural network with a visual data processing layer, a kinematic data processing layer, a data fusion layer and a prediction layer is designed as the driving risk assessment model; a lightweight CNN network is adopted to reduce the amount of computation, and an attention mechanism is introduced to improve the classification performance of the model. Meanwhile, an LSTM network is used in the driving risk assessment model to extract kinematic features during identification, which effectively improves the prediction accuracy of the model. In summary, the method can effectively extract video data, reduce the amount of data to be processed and simplify the model calculation, with low time consumption, high accuracy and good practical value.
Drawings
Fig. 1 is a schematic structural view of the present invention.
Fig. 2 is a schematic representation of a section of driving video of the present invention.
Fig. 3 is a schematic diagram of the motion profile generation of a driving area according to the present invention.
Fig. 4 is a motion profile data enhancement effect diagram of the present invention.
Fig. 5 is a schematic representation of ROC curve comparison of the model of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
As shown in fig. 1, the present embodiment provides a driving hazard scene identification method based on a lightweight multi-modal neural network, which includes the following steps:
S1, acquiring the driving video and vehicle-mounted data in the current time period.
S2, dividing the picture of the driving video into three vertically distributed driving areas, averaging the image within each driving area of each video frame in the vertical direction to convert it into a row of pixels, and concatenating the row of pixels corresponding to each frame in chronological order to form a motion profile map of each driving area.
S3, inputting the motion profile map of each driving area and the vehicle-mounted data into a driving risk assessment model to obtain an identification result. The driving risk assessment model is a multimodal neural network comprising a visual data processing layer, a kinematic data processing layer, a data fusion layer and a prediction layer:
The visual data processing layer is a lightweight CNN network, obtained by lightening the network structure on the basis of AlexNet and improved by introducing an attention mechanism; it outputs visual features after the motion profile maps are input;
the kinematic data processing layer is an LSTM network, which outputs kinematic features after the vehicle-mounted data are input;
the data fusion layer is a fully connected layer, which outputs the identification result after the visual features and kinematic features are input.
The steps can be specifically described by adopting the following six parts:
1. Motion profile generation algorithm:
1) For each segment of forward driving video, as shown in fig. 2, three driving areas are divided from the original video according to their distance from the vehicle, based on the camera position, and each area is delimited by an upper boundary and a lower boundary, as shown in the upper half of fig. 3.
2) Each driving area obtained in step 1) is sampled based on the driving video segment in the time period [t_a, t_b]. Let fps be the number of video frames per second, w the video width, y_l the lower sampling boundary and y_u the upper sampling boundary. The samples of each driving area are processed to finally obtain a motion profile map of length fps × (t_b - t_a) and width w. The specific steps are as follows:
I) Acquire the RGB pixel values within the rectangular range [y_l, y_u] vertically and [0, w] horizontally in each frame, i.e. a (y_u - y_l, w, 3) three-dimensional integer matrix (where '3' denotes the three RGB channels);
II) For each RGB channel within the range, take the mean of the vertical pixels as the pixel value of a point, i.e. average over the first dimension of the (y_u - y_l, w, 3) matrix and arrange the result into a 1 × w row of pixels, i.e. a (1, w, 3) matrix;
III) Concatenate the rows of pixels obtained from each frame in chronological order to form a (fps × (t_b - t_a), w, 3) matrix, from which a color motion profile map is generated, as shown in the lower half of fig. 3, which converts the middle-distance driving area into a motion profile map.
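As an illustration of steps I)-III), the following is a minimal sketch of the motion profile generation for one driving area. It assumes OpenCV and NumPy are available and that the region boundaries y_l and y_u have already been chosen; the function name and parameters are illustrative, not taken from the patent.

```python
# Minimal sketch: build the motion profile of one driving area from a video segment.
import cv2
import numpy as np

def motion_profile(video_path: str, t_a: float, t_b: float,
                   y_l: int, y_u: int) -> np.ndarray:
    """Return a (fps*(t_b-t_a), w, 3) motion profile for one driving area."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.set(cv2.CAP_PROP_POS_MSEC, t_a * 1000)          # seek to the segment start

    rows = []
    for _ in range(int(fps * (t_b - t_a))):
        ok, frame = cap.read()
        if not ok:
            break
        region = frame[y_l:y_u, :, :]                    # (y_u - y_l, w, 3) rectangle
        rows.append(region.mean(axis=0, keepdims=True))  # vertical mean -> (1, w, 3)
    cap.release()
    # Concatenate the per-frame rows in chronological order
    return np.concatenate(rows, axis=0).astype(np.uint8)
```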
2. Constructing a lightweight CNN network with an attention mechanism
The lightweight CNN network is based on the AlexNet network structure: a lightweight convolutional neural network is constructed, comprising convolution layers for extracting local visual features and fully connected layers for global feature processing, and an attention mechanism is introduced to focus the model on the key positions of the motion profile map. When the motion profile maps are input into the lightweight CNN network, the three motion profile maps of the three driving areas are first converted into the matrices m_near, m_mid and m_far respectively, and the three matrices are concatenated along the image channel dimension to obtain a nine-channel matrix m1 as input.
The detailed construction flow of the lightweight CNN network is as follows:
1) First, construct the input layer; for example, a 224 × 224 pixel map is converted into a (224, 224, 9) matrix;
2) Pass m1 through the Conv1 layer, setting the convolution layer parameters (mainly the number, size and stride of the filters and the activation function); for example, convolve with 16 filters of 11 × 11 at stride 4 and apply the ReLU activation function to obtain matrix m2;
3) Pass m2 through the Pool1 layer, setting the pooling layer parameters (mainly the filter size, type and stride); for example, apply 3 × 3 max pooling with stride 2 to obtain matrix m3;
4) Similarly, pass m3 through the Conv2 layer and set the convolution layer parameters; for example, convolve with 32 filters of 5 × 5 at stride 1 with padding 2 and apply ReLU to obtain matrix m4;
5) Similarly, pass m4 through the Pool2 layer and set the pooling layer parameters; for example, apply 3 × 3 max pooling with stride 2 to obtain matrix m5;
6) Similarly, pass m5 through the Conv3 layer and set the convolution layer parameters; for example, convolve with 32 filters of 3 × 3 at stride 1 with padding 1 and apply ReLU to obtain matrix m6;
7) Similarly, pass m6 through the Pool3 layer and set the pooling layer parameters; for example, apply 3 × 3 max pooling with stride 2 to obtain matrix m7;
8) Pass m7 through the AdaptiveAvgPool layer, setting its parameters, e.g. an output size of 3 × 3, to obtain matrix m8;
9) Flatten m8 into a one-dimensional matrix m9 and pass it through the fully connected layer FC4 to output a one-dimensional matrix m10 of r × 1 (e.g. 128 × 1);
10) Pass m10 through the Drop4 layer, dropping a proportion of the neural nodes with a certain drop probability (e.g. 50%) to prevent overfitting, obtaining matrix m11;
11) Pass m11 through the FC5 fully connected layer to output a one-dimensional matrix m12 of r × 1 (e.g. 32 × 1);
12) Pass m12 through the Drop5 layer, dropping a proportion of the neural nodes with a certain drop probability (e.g. 50%) to prevent overfitting, obtaining matrix m13;
13) Pass m13 through the FC6 fully connected layer to output a 2 × 1 matrix, whose two values correspond to the predicted probabilities of the risky and non-risky categories; the predicted values are then processed with Softmax so that the probabilities of the two categories sum to 1. The overall network structure is shown below:
TABLE 1 Multi-modal network structure table
Layer              Input        Output
Conv1              224×224×9    55×55×16
Pool1              55×55×16     27×27×16
Conv2              27×27×16     27×27×32
Pool2              27×27×32     13×13×32
Conv3              13×13×32     13×13×32
Pool3              13×13×32     6×6×32
AdaptiveAvgPool    6×6×32       3×3×32
FC4                3×3×32       128×1
Drop4              128×1        128×1
FC5                128×1        32×1
Drop5              32×1         32×1
FC6                32×1         2×1
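The following is a minimal PyTorch sketch of the backbone in Table 1, without the attention modules described next. The class name, the ReLU placement and the padding values (chosen so that the feature map sizes match the table) are assumptions, not the patent's reference implementation.

```python
# Minimal sketch of the lightweight CNN backbone listed in Table 1.
import torch
import torch.nn as nn

class LightweightCNN(nn.Module):
    def __init__(self, num_outputs: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(9, 16, kernel_size=11, stride=4, padding=2),  # Conv1: 224 -> 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # Pool1: 55 -> 27
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),  # Conv2: 27 -> 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # Pool2: 27 -> 13
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),  # Conv3: 13 -> 13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # Pool3: 13 -> 6
            nn.AdaptiveAvgPool2d((3, 3)),                           # -> 3 x 3 x 32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                  # 3 * 3 * 32 = 288
            nn.Linear(288, 128),           # FC4
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),               # Drop4
            nn.Linear(128, 32),            # FC5
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),               # Drop5
            nn.Linear(32, num_outputs),    # FC6: 2 x 1 output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 9, 224, 224) nine-channel stack of the three motion profiles
        return self.classifier(self.features(x))
```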
Finally, an attention mechanism module is introduced after each convolution layer, applying channel attention and spatial attention transformations to the feature map and reconstructing a new feature map. The channel attention and spatial attention are computed as:
Attention_c = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
Attention_s = σ(Conv([AvgPool(F), MaxPool(F)]))
where Attention_c and Attention_s are the channel attention and spatial attention results respectively, F is the feature map output by a convolution layer, σ is the Sigmoid function, MLP is a multi-layer perceptron network, and Conv is a convolution layer with a single output channel.
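The two formulas correspond to a CBAM-style block. The sketch below is one possible implementation; the module name, the channel reduction ratio and the 7 × 7 spatial convolution kernel are assumptions not stated in the patent.

```python
# Minimal sketch of the channel + spatial attention transformation above.
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Shared MLP for the channel attention branch
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Convolution with a single output channel for the spatial attention branch
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape
        # Attention_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        attn_c = torch.sigmoid(self.mlp(f.mean(dim=(2, 3))) +
                               self.mlp(f.amax(dim=(2, 3)))).view(b, c, 1, 1)
        f = f * attn_c
        # Attention_s = sigmoid(Conv([AvgPool(F), MaxPool(F)])) over the channel axis
        attn_s = torch.sigmoid(self.conv(torch.cat(
            [f.mean(dim=1, keepdim=True), f.amax(dim=1, keepdim=True)], dim=1)))
        return f * attn_s  # reconstructed feature map
```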
3. Data enhancement of motion profile
The training set of the driving risk assessment model includes a CNN network training set for the input side; its acquisition is essentially the same as the motion profile generation algorithm of part 1, and comprises the following steps:
dividing the picture of the historical driving video into three driving areas which are distributed up and down, carrying out averaging treatment on the image in each driving area of each frame of picture of the video in the vertical direction, converting the image into a row of pixels, and then splicing the corresponding row of pixels of each frame together according to time sequence to form a motion profile graph, wherein all motion profiles form a CNN network training set.
In order to improve the generalization capability of the model, the CNN network training set is further expanded through data enhancement processing. The enhancement process includes random transformations of brightness, contrast, saturation and hue, and a horizontal flip of the motion profile with a certain probability. The effect after data enhancement is shown in fig. 4.
Brightness transformation: randomly change the brightness of the motion profile map. Let the original picture be im1 and the brightness factor be factor_b; the transformed image im2 is:
im2 = factor_b × im1
Saturation transformation: randomly change the saturation of the motion profile map. Let the saturation factor be factor_s and let gray2 be the grayscale image corresponding to im2; the transformed image im3 is:
im3 = factor_s × im2 + (1 - factor_s) × gray2
Contrast transformation: randomly change the contrast of the motion profile map. Let the contrast factor be factor_c, convert im3 into its corresponding grayscale image and compute its pixel mean value mean; the transformed image im4 is:
im4 = factor_c × im3 + (1 - factor_c) × mean
Hue transformation: randomly change the hue of the motion profile map. Convert im4 into HSV format to obtain the hue H, shift the hue randomly, and convert the new HSV image back into the original format to obtain im5:
H_new = H_origin + factor_h × 255
Flip transformation: flip im5 horizontally with a certain probability to obtain im6.
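A compact way to realize these random transformations is torchvision's built-in color jitter and horizontal flip, as sketched below; the jitter ranges and the flip probability are illustrative assumptions.

```python
# Minimal sketch of the data enhancement pipeline for motion profile maps.
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4,   # random brightness factor
                           contrast=0.4,     # random contrast factor
                           saturation=0.4,   # random saturation factor
                           hue=0.1),         # random hue shift
    transforms.RandomHorizontalFlip(p=0.5),  # flip the motion profile horizontally
    transforms.ToTensor(),
])

# Usage: augmented = augment(motion_profile_image)  # a PIL.Image motion profile map
```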
4. Calibrating dangerous scenes based on an acceleration threshold with auxiliary video verification
The training set of the driving risk assessment model comprises a normal event set and a high-risk event set for the output side. They are acquired as follows: outliers in the historical vehicle-mounted data collected by the radar are detected and filtered with the 3σ rule of the normal distribution, and missing values are filled by linear interpolation; the acceleration distribution is then obtained from the filled historical vehicle-mounted data, and an acceleration threshold for dangerous driving events is determined so as to recognize obvious avoidance behaviors; potential dangerous driving events are extracted based on this threshold; and the normal event set and the high-risk event set are finally calibrated from the potential dangerous driving events by video verification. The detailed flow is as follows:
1) Collect historical vehicle-mounted data;
2) Most vehicle kinematic characteristic variables in the vehicle-mounted data follow a normal distribution, so outliers are filtered with the 3σ rule, i.e. each non-empty kinematic characteristic variable of a driving record is checked against the following condition, and values satisfying it are treated as outliers and handled as missing values:
|x - μ| > 3σ
where x is the kinematic variable, μ is the mean of x and σ is the standard deviation of x.
3) Because the driving environment is complex and there are many interference sources, the signal strength of the detection equipment can be affected and the driving data always contain missing values, which therefore need to be filled. The missing values are filled by linear interpolation:
d_i = d_{i-1} + (t_i - t_{i-1}) / (t_{i+1} - t_{i-1}) × (d_{i+1} - d_{i-1})
where d_i is the missing value, d_{i-1} is the last non-empty nearest neighbor of the missing value, d_{i+1} is the next non-empty nearest neighbor, n is the total number of records, and t_{i-1}, t_i, t_{i+1} are the moments corresponding to d_{i-1}, d_i, d_{i+1}.
4) Extract the vehicle acceleration data a from the natural driving data, plot and inspect its distribution curve, and determine the acceleration threshold of obvious deceleration behavior, denoted TH_d.
5) Traverse all vehicle acceleration data in chronological order and collect the emergency braking moments t_d according to the acceleration condition a ≤ TH_d. For each moment t_d, take the time segment from d_1 seconds before to d_2 seconds after to form a potential high-risk event segment e_c. Combined with video verification, eliminate false alarms caused by data collection errors, and form the n_conflict_candidate remaining high-risk event segments into the high-risk event set. To avoid event overlap, ensure that adjacent emergency braking moments satisfy t_d[i+1] - t_d[i] ≥ |d_1 + d_2|.
6) Randomly sample n_normal_candidate normal non-conflict events from the remaining vehicle acceleration data, using |d_1 + d_2| as the time window, to serve as the normal event set.
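For illustration, the following sketch combines steps 2)-5) for a single acceleration series. The column names, the threshold TH_d and the window lengths d_1, d_2 are assumptions chosen only for the example.

```python
# Minimal sketch: 3-sigma filtering, interpolation and threshold-based event extraction.
import numpy as np
import pandas as pd

def extract_candidate_events(df: pd.DataFrame, th_d: float = -0.4,
                             d1: float = 3.0, d2: float = 3.0):
    """df has columns 't' (seconds) and 'a' (longitudinal acceleration)."""
    a = df["a"]
    # Step 2): 3-sigma outlier filtering, outliers treated as missing values
    df.loc[(a - a.mean()).abs() > 3 * a.std(), "a"] = np.nan
    # Step 3): linear interpolation of missing values over time
    df["a"] = df.set_index("t")["a"].interpolate(method="index").to_numpy()

    # Steps 4)-5): emergency-braking moments a <= TH_d, kept at least d1+d2 apart
    braking_times = df.loc[df["a"] <= th_d, "t"].to_numpy()
    events, last_t = [], -np.inf
    for td in braking_times:
        if td - last_t >= d1 + d2:
            events.append((td - d1, td + d2))  # potential high-risk segment e_c
            last_t = td
    return events  # candidates, to be confirmed against video before calibration
```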
5. Constructing the overall driving risk assessment model
A conventional LSTM is used to extract kinematic features in the driving risk assessment model; these are fused with the visual features extracted by the lightweight CNN network, further improving the identification accuracy of the model.
1) The vehicle-mounted data from the radar are sampled at fixed time intervals and the LSTM network is used to extract kinematic features; the output of the LSTM network is denoted f_kinematics.
2) The potential high-risk events are processed with the motion profile generation algorithm of part 1 to obtain the corresponding motion profile maps, which are input into the lightweight CNN network to extract visual features; the output of the network is denoted f_vision.
3) f_vision is concatenated with f_kinematics, i.e. [f_vision, f_kinematics] is used as the input of the fully connected layer, which outputs a 2 × 1 matrix whose two values correspond to the predicted probabilities of the risky and non-risky categories; the predicted values are then processed with Softmax so that the probabilities of the two categories sum to 1 (a sketch of this fusion network is given at the end of this part).
4) The normal event set and the high-risk event set from part 4 are each divided into a training set Θ_train and a test set Θ_test in a 3:1 ratio.
5) The model is trained; during training, the motion profiles are augmented as described in part 3, and after n_epoch epochs, when the loss value of the model has converged to a small value, training is stopped and the final model M_VK is saved.
6) For each event in the test set Θ_test (comprising the normal events and high-risk events), the trained M_VK model is called to obtain its predicted classification value; the events predicted as normal and as conflict are counted, and a confusion matrix is generated from the test-set predictions as follows:
TABLE 2 Confusion matrix
                       Predicted high-risk    Predicted normal
Actual high-risk       TP                     FN
Actual normal          FP                     TN
The sensitivity I_sensitivity and specificity I_specificity of the model are then calculated as:
I_sensitivity = TP / (TP + FN)
I_specificity = TN / (FP + TN)
An ROC curve is generated from I_sensitivity and I_specificity to evaluate the prediction performance of the model.
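The following is a minimal sketch of the fusion network referred to in steps 1)-3). The kinematic feature dimension, LSTM hidden size and visual feature size are assumptions; LightweightCNN is the backbone sketched in part 2, reused here as a feature extractor.

```python
# Minimal sketch of the multimodal fusion model (visual CNN + kinematic LSTM + FC fusion).
import torch
import torch.nn as nn

class VKNet(nn.Module):
    def __init__(self, kin_features: int = 6, lstm_hidden: int = 32,
                 vision_features: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(kin_features, lstm_hidden, batch_first=True)
        # LightweightCNN from the sketch in part 2, used here to emit a feature vector
        self.cnn = LightweightCNN(num_outputs=vision_features)
        self.fusion = nn.Linear(lstm_hidden + vision_features, 2)

    def forward(self, profiles: torch.Tensor, kinematics: torch.Tensor) -> torch.Tensor:
        # profiles: (batch, 9, 224, 224); kinematics: (batch, time_steps, kin_features)
        f_vision = self.cnn(profiles)            # visual features
        _, (h_n, _) = self.lstm(kinematics)
        f_kinematics = h_n[-1]                   # last hidden state as kinematic features
        fused = torch.cat([f_vision, f_kinematics], dim=1)
        return torch.softmax(self.fusion(fused), dim=1)  # probabilities of the two classes

# Illustrative forward pass with dummy inputs (shapes are assumptions):
model = VKNet()
probs = model(torch.randn(1, 9, 224, 224), torch.randn(1, 30, 6))
```

During inference (part 6 below), the same forward pass is applied to the current time window, and an alarm can be raised when the predicted high-risk probability exceeds a chosen threshold.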
Fig. 5 compares the ROC curves of the driving risk assessment model: the larger the AUC value, the better the performance. The driving risk assessment model of the present invention (VK-Net) reaches an AUC of 0.95, showing very good accuracy and precision.
6. Identifying dangerous scenes based on the driving risk assessment model
A motion profile map is generated from the continuous current driving video by the motion profile generation algorithm; the motion profile map and the kinematic feature variables extracted from the vehicle-mounted data are used as the inputs of the driving risk assessment model, which computes a prediction of whether the current driving is risky, and an alarm is raised if it is.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (7)

CN202111551051.XA  2021-12-17  2021-12-17  A driving hazard scene recognition method based on lightweight multimodal neural network  Active  CN114372556B (en)

Priority Applications (1)

Application Number     Priority Date    Filing Date    Title
CN202111551051.XA      2021-12-17       2021-12-17     A driving hazard scene recognition method based on lightweight multimodal neural network (CN114372556B (en))

Applications Claiming Priority (1)

Application Number     Priority Date    Filing Date    Title
CN202111551051.XA      2021-12-17       2021-12-17     A driving hazard scene recognition method based on lightweight multimodal neural network (CN114372556B (en))

Publications (2)

Publication Number     Publication Date
CN114372556A (en)      2022-04-19
CN114372556B (en)      2025-09-05

Family

ID=81139796

Family Applications (1)

Application Number     Priority Date    Filing Date    Title
CN202111551051.XA      2021-12-17       2021-12-17     A driving hazard scene recognition method based on lightweight multimodal neural network (Active, CN114372556B (en))

Country Status (1)

Country    Link
CN         CN114372556B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number    Priority date    Publication date    Assignee    Title
CN117724137B (en)*    2023-11-21       2024-08-06          江苏北斗星通汽车电子有限公司    Automobile accident automatic detection system and method based on multimodal sensors

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number    Priority date    Publication date    Assignee    Title
US10210391B1 (en)*    2017-08-07       2019-02-19    Mitsubishi Electric Research Laboratories, Inc.    Method and system for detecting actions in videos using contour sequences
CN109740419B (en)*    2018-11-22       2021-03-02    东南大学    A Video Action Recognition Method Based on Attention-LSTM Network
CN111242015B (en)*    2020-01-10       2023-05-02    同济大学    A Method of Predicting Dangerous Driving Scenes Based on Motion Contour Semantic Map
CN111325203B (en)*    2020-01-21       2022-07-05    福州大学    An American license plate recognition method and system based on image correction
CN112487996B (en)*    2020-12-02       2023-07-28    重庆邮电大学    Driving behavior recognition method based on DenseNet121 network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"A Lightweight VK-Net Based on Motion Profiles for Hazardous Driving Scenario Identification";Zhen Gao;《2021 IEEE 23rd Int Conf on High Performance Computing & Communications》;20211220;第908-913页*
"Predicting Hazardous Driving Events Using Multi-Modal Deep Learning Based on Video Motion Profile and Kinematics Data";Z. Gao;《2018 21st International Conference on Intelligent Transportation Systems》;20181107;第3352-3357页*
"基于深度视觉注意神经网络的端到端自动驾驶模型";胡学敏;童秀迟;郭琳;张若晗;孔力;;《计算机应用》;20200803;第40卷(第07期);第1926-1931页*
"基于自动驾驶系统的轻量型卷积神经网络优化";高秀龙;葛动元;;《计算机系统应用》;20200315;第29卷(第03期);第93-99页*

Also Published As

Publication number    Publication date
CN114372556A (en)     2022-04-19

Similar Documents

Publication    Publication Date    Title
CN111242015B (en) A Method of Predicting Dangerous Driving Scenes Based on Motion Contour Semantic Map
CN110321923B (en) Target detection method, system and medium for fusion of feature layers of different scales of receptive fields
CN108805015B (en) A Crowd Anomaly Detection Method for Weighted Convolutional Autoencoder Long Short-Term Memory Networks
Yao et al. When, where, and what? A new dataset for anomaly detection in driving videos
CN112766195B (en)Electrified railway bow net arcing visual detection method
CN113569756B (en) Abnormal behavior detection and location method, system, terminal equipment and readable storage medium
CN110222604B (en)Target identification method and device based on shared convolutional neural network
CN116342894B (en)GIS infrared feature recognition system and method based on improved YOLOv5
CN111814755A (en)Multi-frame image pedestrian detection method and device for night motion scene
CN104463241A (en)Vehicle type recognition method in intelligent transportation monitoring system
CN111160216A (en)Multi-feature multi-model living human face recognition method
CN107220603A (en)Vehicle checking method and device based on deep learning
CN106529419A (en)Automatic detection method for significant stack type polymerization object in video
CN118135800B (en)Abnormal traffic event accurate identification warning method based on deep learning
CN108345894A (en)A kind of traffic incidents detection method based on deep learning and entropy model
CN118644761B (en) A construction site safety helmet detection method, computer equipment and storage medium
CN119418032A (en) Target detection method based on infrared and visible light feature enhancement and fusion
CN116958786A (en)Dynamic visual identification method for chemical waste residues based on YOLOv5 and ResNet50 neural network
CN114372556B (en) A driving hazard scene recognition method based on lightweight multimodal neural network
Mobahi et al. An improved deep learning solution for object detection in self-driving cars
Huang et al. Self-supervised multi-granularity graph attention network for vision-based driver fatigue detection
Anwer et al. Accident vehicle types classification: a comparative study between different deep learning models
Aqeel et al. Detection of anomaly in videos using convolutional autoencoder and generative adversarial network model
Jeziorek et al. Traffic sign detection and recognition using event camera image reconstruction
CN117671584A (en)Method, device, equipment, medium and product for detecting personnel gathering area

Legal Events

Code    Title
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant
