CN110210378A - Embedded video image analysis method and device based on edge computing - Google Patents

Embedded video image analysis method and device based on edge computing

Info

Publication number
CN110210378A
Authority
CN
China
Prior art keywords
convolution kernel
current
convolutional neural
video
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910461504.6A
Other languages
Chinese (zh)
Other versions
CN110210378B (en)
Inventor
张江辉
马敏
田西兰
赵洪立
蔡红军
王曙光
夏勇
夏鹏
王斌
刘丽莎
吴昭
吴颖
李江涛
孙龙
吴涛
姜欢欢
刘海飞
常沛
张玉营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute
Priority to CN201910461504.6A
Publication of CN110210378A
Application granted
Publication of CN110210378B
Legal status: Active (Current)
Anticipated expiration

Abstract

The invention discloses an embedded video image analysis method and device based on edge computing, applied to parsing camera images in a video surveillance network that contains several cameras connected to a monitoring center. The method comprises: identifying a preset target in the video shot by a camera; for the preset target identified in the video, obtaining the attribute features of the preset target and/or its scene attribute features, where the attribute features of the preset target include its type, its position in the image, its quantity, and so on, and the scene attribute features include one or a combination of the shooting time, shooting location, and shooting angle of the original image; and uploading the obtained attribute features and scene attribute features of the preset target to the monitoring center corresponding to the camera, for constructing a video big data analysis application system. Applying the embodiments of the present invention can save cost.

Description

An embedded video image analysis method and device based on edge computing

Technical Field

The present invention relates to an image recognition method and device, and more particularly to an embedded video image analysis method and device based on edge computing.

Background Art

Images, and video images in particular, carry a wealth of information that other means of information acquisition can hardly match; they are therefore the most intuitive and reliable way for humans to obtain information and have long been a focus of attention and application. With the development of public security technology, surveillance networks built on monitoring video images play an important role in security, transportation, and other fields. At present, large numbers of video surveillance cameras are deployed on urban roads and highways, in shopping malls, at stations, and in many other places, forming a surveillance video network with wide coverage. The huge volume of video images this network produces every day accumulates into a massive surveillance-video big-data resource. However, because an image is itself unstructured information, it cannot be mined directly with big-data techniques, so the information-rich mass of surveillance video cannot be processed in real time or used effectively. Ordinarily, the analysis and interpretation of surveillance video still depend mainly on manual work, which is inefficient and cannot satisfy security-monitoring scenarios with strict timeliness requirements, such as real-time monitoring and regulation of the overall road traffic situation during urban rush hours, or the rapid tracking and correlated investigation of vehicles and persons in serious criminal cases. Such scenarios urgently require real-time analysis and processing of video surveillance images over a wide area to form a relatively complete and clear situational picture, thereby providing accurate and solid intelligence support for traffic-regulation and case-investigation decisions.

To solve the above problems, the conventional approach is to build a "cloud" computing center in the back end and use its powerful computing capability to parse the video images. However, an existing camera can produce tens of megabytes of video data per second, and the raw video generated by the thousands or tens of thousands of deployed camera channels forms massive data that not only poses a huge challenge to the transmission capacity of the video surveillance network but also stretches the computing capability of the "cloud" center beyond what it can handle. This approach therefore requires substantially upgrading the existing video data transmission network and greatly increasing the computing power of the cloud computing center, which leads to high cost.

The prior art therefore suffers from the technical problem that upgrading a traditional surveillance video system is expensive.

Summary of the Invention

The technical problem to be solved by the present invention is to provide an embedded video image analysis method and device based on edge computing; the device is an edge computing device intended to solve the prior-art problem of the high cost of upgrading traditional surveillance video systems.

The present invention solves the above technical problem through the following technical solutions:

An embodiment of the present invention provides an embedded video image analysis method based on edge computing, applied to a camera in a video surveillance network that contains several cameras communicatively connected to a monitoring center. The method includes:

identifying a preset target in the video shot by the camera, where the preset target includes one or a combination of a person, a vehicle, and a building;

for the preset target identified in the video, obtaining the attribute features of the preset target and/or the scene attribute features of the preset target, where the attribute features include: when the preset target is a vehicle, the vehicle's model, body color, license plate, position, and so on; when the preset target is a person, the person's gender, age, clothing, position, and so on; when the preset target is a building, its type, position, and so on; and the scene attribute features include one or a combination of the shooting time, shooting location, and shooting angle of the original image;

uploading the obtained attribute features of the preset target and the scene attribute features of the preset target to the monitoring center corresponding to the camera.
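By way of illustration, the structured result uploaded in this step might look like the following sketch; every field name and value here is hypothetical, since the patent specifies the feature categories but not a message format.

```python
# Hypothetical uplink payload: structured attribute and scene features
# replace the raw video stream on the link to the monitoring center.
payload = {
    "camera_id": "cam-0117",                # illustrative identifier
    "target": {
        "type": "vehicle",                  # person / vehicle / building
        "model": "sedan",
        "body_color": "white",
        "license_plate": "皖A12345",
        "position": [412, 218, 655, 430],   # bounding box in the image
    },
    "scene": {
        "shooting_time": "2019-05-29T08:31:05",
        "shooting_location": "31.82N, 117.23E",
        "shooting_angle": "30 deg downward",
    },
}
```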

Optionally, before identifying the preset target in the video shot by the camera, the method further includes:

acquiring the original images in the video stream data shot by the camera, and treating the original images as the video shot by the camera.

Optionally, acquiring the original images in the video stream data shot by the camera includes:

acquiring the model data of the camera and, according to the model data, looking up the camera's video encoding format in a pre-stored model-data-to-video-encoding-format list;

decoding the video stream data shot by the camera with the decoding method corresponding to the video encoding format, thereby restoring the original images shot by the camera.
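A minimal sketch of this lookup-then-decode step, assuming the PyAV library for decoding and an illustrative model-to-codec table; neither the table entries nor the library choice come from the patent.

```python
import av  # PyAV: ffmpeg-based demuxing/decoding, assumed available

# Hypothetical "model data -> video encoding format" list.
CODEC_TABLE = {"CAM-1080P-A": "h264", "CAM-4K-B": "hevc"}

def restore_original_images(camera_model: str, stream_url: str):
    codec_name = CODEC_TABLE[camera_model]      # look up the encoding format
    container = av.open(stream_url)             # open the camera's stream
    video = container.streams.video[0]
    assert video.codec_context.name == codec_name, "unexpected encoding format"
    for frame in container.decode(video):       # decode -> original images
        yield frame.to_ndarray(format="bgr24")  # one restored image per frame
```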

Optionally, identifying the preset target in the video shot by the camera includes:

using an ARM processor as the main control unit and an FPGA as the core acceleration unit to build a hardware computing architecture for recognizing preset targets; and, based on this hardware architecture, using a pre-built convolutional neural network model to identify the preset targets contained in each original image, where the preset targets include one or a combination of a person, a vehicle, and a building.

Optionally, the pre-built target convolutional neural network is constructed as follows:

building an initial convolutional neural network having an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, and training it;

obtaining a transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network;

obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel;

removing the convolution kernels whose minimized reconstruction error exceeds the preset value range, yielding the constructed target convolutional neural network.

Optionally, obtaining the transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network includes:

obtaining the transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network using the formula

$$Y = (N \times c \times k_h \times k_w)^{-1} \cdot n \times c \times k_h \times k_w,$$

where Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the feature map; k_h × k_w is the kernel size; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.

Optionally, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel includes:

obtaining, from the transformation matrix and the weights of each convolution kernel, the minimized reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where min denotes minimization; β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

Optionally, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel includes:

for each convolution kernel, obtaining, from the transformation matrix and the weights of each convolution kernel, the reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',\ \ \|W_i\|_F=1\ \text{for any } i,$$

where β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; λ is the penalty coefficient; ||·||_1 is the one-norm; the norm constraint holds for any i; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

Optionally, removing the convolution kernels whose minimized reconstruction error exceeds the preset value range to obtain the constructed target convolutional neural network includes:

taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, keeping the kernel's weight matrix fixed and obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

determining whether ||β||_0 has converged;

if so, obtaining the kernel weights that minimize the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until ||β||_0 converges;

taking the updated current network model as the current network model and the next convolutional layer after the current one as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned; then taking the pruned current network model as the target convolutional neural network model.

Optionally, removing the convolution kernels whose minimized reconstruction error exceeds the preset value range to obtain the constructed target convolutional neural network includes:

taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

obtaining the current kernel weights corresponding to the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

determining whether the reconstruction error corresponding to the current value of the channel-selection coefficient vector and the current kernel weights has converged;

if so, taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until the reconstruction error corresponding to the current coefficient value and the current kernel weights converges;

taking the updated current network model as the current network model and the next convolutional layer after the current one as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned; then taking the pruned current network model as the target convolutional neural network model.

Optionally, taking the pruned current network model as the target convolutional neural network model includes:

quantizing the model parameters of the pruned current network model with a linear quantization algorithm, converting the 32-bit floating-point numbers into 8-bit integers;

then encoding the parameter-quantized current network model with the Huffman coding algorithm;

taking the encoded current network model as the target convolutional neural network model.

Optionally, during recognition with the pre-trained convolutional neural network model, an n*m convolution kernel operation is split into n*m multiplications and n*m−1 additions, and,

when n*m is odd, the n*m−1 additions are taken as the current operation set, the operands in the current set are summed in pairs to obtain partial results, the partial results are taken as the new current set, and the pairwise summation step is repeated until the n*m−1 additions are finished, giving the result of the n*m convolution kernel;

when n*m is even, n*m−2 of the additions are taken as the current operation set, the operands in the current set are summed in pairs to obtain partial results, the partial results are taken as the new current set, and the pairwise summation step is repeated until the n*m−2 additions are finished; the sum of these n*m−2 additions is then added to the operand that did not participate, giving the result of the n*m convolution kernel.
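The pairwise summation described above is an adder-tree reduction. Below is a minimal Python sketch of the same dataflow (software standing in for the FPGA multipliers and adders); the kernel is flattened into n*m operand pairs.

```python
def conv_kernel_mac(pixels, weights):
    """Evaluate one n*m kernel: n*m multiplications, then log-depth
    pairwise summation of the products, as in the scheme above."""
    products = [p * w for p, w in zip(pixels, weights)]  # n*m multiplications
    while len(products) > 1:
        nxt = [products[i] + products[i + 1]             # sum in pairs
               for i in range(0, len(products) - 1, 2)]
        if len(products) % 2:                            # odd leftover waits
            nxt.append(products[-1])                     # for the next round
        products = nxt
    return products[0]

# e.g. a 3*3 kernel: 9 products, summed in rounds of 4, 2, and 1 additions
assert conv_kernel_mac([1] * 9, [2] * 9) == 18
```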

An embodiment of the present invention provides an embedded video image analysis device based on edge computing, applied to a camera in a video surveillance network that contains several cameras communicatively connected to a monitoring center. The device includes:

an identification module, configured to identify a preset target in the video shot by the camera, where the preset target includes one or a combination of a person, a vehicle, and a building;

a first acquisition module, configured to obtain, for the preset target identified in the video, the attribute features of the preset target and/or the scene attribute features of the preset target, where the attribute features include: when the preset target is a vehicle, one or a combination of the vehicle's model, body color, license plate, and position; when the preset target is a person, one or a combination of the person's gender, age, clothing, and position; when the preset target is a building, one or a combination of the building's position and type; and the scene attribute features include one or a combination of the shooting time, shooting location, and shooting angle of the original image;

an upload module, configured to upload the obtained attribute features and scene attribute features of the preset target to the monitoring center corresponding to the camera.

Optionally, this embodiment of the present invention further includes a second acquisition module, configured to acquire the original images in the video stream data shot by the camera and treat the original images as the video shot by the camera.

Optionally, the second acquisition module is configured to:

acquire the model data of the camera and, according to the model data, look up the camera's video encoding format in a pre-stored model-data-to-video-encoding-format list;

decode the video stream data shot by the camera with the decoding method corresponding to the video encoding format, thereby restoring the original images shot by the camera.

Optionally, the identification module is configured to:

use an ARM processor as the main control unit and an FPGA as the core acceleration unit to build a hardware computing architecture for recognizing preset targets; and, based on this hardware architecture, use a pre-built convolutional neural network model to identify the preset targets contained in each original image, where the preset targets include one or a combination of a person, a vehicle, and a building.

Optionally, the pre-built target convolutional neural network is constructed as follows:

building an initial convolutional neural network having an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, and training it;

obtaining a transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network;

obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel;

removing the convolution kernels whose minimized reconstruction error exceeds the preset value range, yielding the constructed target convolutional neural network.

Optionally, obtaining the transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network includes:

obtaining the transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network using the formula

$$Y = (N \times c \times k_h \times k_w)^{-1} \cdot n \times c \times k_h \times k_w,$$

where Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the feature map; k_h × k_w is the kernel size; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.

Optionally, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel includes:

obtaining, from the transformation matrix and the weights of each convolution kernel, the minimized reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where min denotes minimization; β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

Optionally, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel includes:

for each convolution kernel, obtaining, from the transformation matrix and the weights of each convolution kernel, the reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',\ \ \|W_i\|_F=1\ \text{for any } i,$$

where β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; λ is the penalty coefficient; ||·||_1 is the one-norm; the norm constraint holds for any i; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

Optionally, removing the convolution kernels whose minimized reconstruction error exceeds the preset value range to obtain the constructed target convolutional neural network includes:

taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, keeping the kernel's weight matrix fixed and obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

determining whether ||β||_0 has converged;

if so, obtaining the kernel weights that minimize the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until ||β||_0 converges;

taking the updated current network model as the current network model and the next convolutional layer after the current one as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned; then taking the pruned current network model as the target convolutional neural network model.

Optionally, removing the convolution kernels whose minimized reconstruction error exceeds the preset value range to obtain the constructed target convolutional neural network includes:

taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

obtaining the current kernel weights corresponding to the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

determining whether the reconstruction error corresponding to the current value of the channel-selection coefficient vector and the current kernel weights has converged;

if so, taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until the reconstruction error corresponding to the current coefficient value and the current kernel weights converges;

taking the updated current network model as the current network model and the next convolutional layer after the current one as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned; then taking the pruned current network model as the target convolutional neural network model.

Optionally, taking the pruned current network model as the target convolutional neural network model includes:

quantizing the model parameters of the pruned current network model with a linear quantization algorithm;

then encoding the parameter-quantized current network model with the Huffman coding algorithm;

taking the encoded current network model as the target convolutional neural network model.

Optionally, during recognition with the pre-trained convolutional neural network model, an n*m convolution kernel operation is split into n*m multiplications and n*m−1 additions, and,

when n*m is odd, the n*m−1 additions are taken as the current operation set, the operands in the current set are summed in pairs to obtain partial results, the partial results are taken as the new current set, and the pairwise summation step is repeated until the n*m−1 additions are finished, giving the result of the n*m convolution kernel;

when n*m is even, n*m−2 of the additions are taken as the current operation set, the operands in the current set are summed in pairs to obtain partial results, the partial results are taken as the new current set, and the pairwise summation step is repeated until the n*m−2 additions are finished; the sum of these n*m−2 additions is then added to the operand that did not participate, giving the result of the n*m convolution kernel.

Compared with the prior art, the present invention has the following advantages:

By applying the embodiments of the present invention, the target attribute features and scene attribute features are extracted from the captured video stream data at the camera end, which avoids the problem of a cloud computing center having to receive, store, and parse thousands of video channels simultaneously, reduces the need to upgrade the transmission bandwidth and computing capacity of the cloud computing center, and thereby saves cost.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the principle of an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of convolutional neural network compression in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the initial convolutional neural network pruning flow in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 5 is another schematic diagram of the initial convolutional neural network pruning flow in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of the data flow before and after network parameter quantization in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 7 is a schematic flowchart of the FPGA implementation in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of the convolution acceleration operation in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 9 is a schematic diagram of the pooling acceleration operation in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an embedded video image analysis device based on edge computing provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The embodiments of the present invention are described in detail below. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiments.

Embodiments of the present invention provide an embedded video image analysis method and device based on edge computing. The embedded video image analysis method based on edge computing is introduced first below.

It should first be noted that the embodiments of the present invention are preferably applicable to parsing the camera image content of existing large video surveillance networks in security, transportation, and similar fields, which typically contain multiple cameras communicatively connected to a monitoring center.

Embodiment 1

FIG. 1 is a schematic flowchart of an embedded video image analysis method based on edge computing provided by an embodiment of the present invention, and FIG. 2 is a schematic diagram of its principle; as shown in FIG. 1 and FIG. 2, the method includes:

S101: identifying a preset target in the video shot by the camera, where the preset target includes one or a combination of a person, a vehicle, and a building.

FIG. 3 is a schematic diagram of the target convolutional neural network compression flow in an embedded video image analysis method based on edge computing provided by an embodiment of the present invention; FIG. 4 is a schematic diagram of the initial convolutional neural network pruning flow in the method; FIG. 5 is another schematic diagram of the initial convolutional neural network pruning flow in the method. As shown in FIG. 3 and FIG. 4,

specifically, this step may include the following steps:

A: building an initial convolutional neural network having an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, and training it.

It will be understood that a convolutional neural network of common structure can be used and then trained on a training set composed of surveillance images; the convolutional neural network automatically adjusts its weight parameters according to the training set, while its hyperparameters are tuned manually, thereby completing the training and yielding the initial convolutional neural network.
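As one concrete illustration, a minimal PyTorch sketch of an initial network containing the five named layer types; the layer sizes, input resolution, and class set are assumptions for the sketch, not values from the patent.

```python
import torch.nn as nn

class InitialCNN(nn.Module):
    """Input -> convolution -> pooling -> fully connected -> output."""
    def __init__(self, num_classes: int = 3):  # e.g. person / vehicle / building
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # convolutional layer
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully connected

    def forward(self, x):                # x: (batch, 3, 224, 224) at the input layer
        return self.classifier(self.features(x).flatten(1))     # output-layer logits
```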

B: obtaining the transformation matrix for the pruning operation from the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network.

Specifically, from the preset number n of convolution kernels in the target convolutional neural network obtained after pruning and the number N of convolution kernels in the constructed initial convolutional neural network, the transformation matrix for the pruning operation can be obtained using the formula

$$Y = (N \times c \times k_h \times k_w)^{-1} \cdot n \times c \times k_h \times k_w,$$

where Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the feature map; k_h × k_w is the kernel size; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.
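If the factors are read as scalar dimension counts, the shared factor c × k_h × k_w cancels, so under that reading the expression reduces to the ratio of kept kernels; a worked instance with assumed sizes (not from the patent):

$$Y=\frac{n\times c\times k_h\times k_w}{N\times c\times k_h\times k_w}=\frac{n}{N},\qquad \text{e.g. } N=64,\ n=32,\ c=16,\ k_h=k_w=3\ \Rightarrow\ Y=\tfrac{1}{2}.$$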

C: obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network from the transformation matrix and the weights of each convolution kernel.

Specifically, step C can be carried out as step C1 or as step C2.

C1: from the transformation matrix Y and the weights of each convolution kernel, obtaining the minimized reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where min denotes minimization; β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

C2: for each convolution kernel, from the transformation matrix Y and the weights of each convolution kernel, obtaining the reconstruction error of each convolution kernel in the initial convolutional neural network using the formula

$$\min_{\beta,W}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',\ \ \|W_i\|_F=1\ \text{for any } i,$$

where β is the channel-selection coefficient vector of length c; β_i is the label of the i-th channel; W is the kernel weight matrix; N is the number of convolution kernels in the initial convolutional neural network; ||·||_F is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ denotes summation; X_i is the slice matrix of the i-th channel; W^T is the transpose of the kernel weight matrix; λ is the penalty coefficient; ||·||_1 is the one-norm; the norm constraint holds for any i; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and ||·||_0 is the zero norm.

In practical applications, this way of obtaining the minimized reconstruction error is also known as the LASSO regression method.
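A minimal numeric sketch of this LASSO step using NumPy and scikit-learn. Recasting the per-channel objective as a standard Lasso over flattened channel contributions is an assumption of the sketch, not the patent's stated procedure; the symbols follow the formulas above (Y the reconstruction target, X_i the channel slices, W_i the kernel weights, c′ the number of channels kept).

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_channels(Y, X, W, c_keep, lam=1e-3):
    """Y: (N, n) reconstruction target; X[i]: (N, k) slice matrix of
    channel i; W[i]: (n, k) kernel weights of channel i; c_keep = c'."""
    # Column i is the flattened contribution of channel i: vec(X_i W_i^T)
    A = np.stack([(Xi @ Wi.T).ravel() for Xi, Wi in zip(X, W)], axis=1)
    beta = Lasso(alpha=lam, fit_intercept=False).fit(A, Y.ravel()).coef_
    kept = np.argsort(-np.abs(beta))[:c_keep]   # keep the c' strongest channels
    return beta, sorted(kept.tolist())          # the remaining channels are pruned
```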

D: removing the convolution kernels whose minimized reconstruction error exceeds the preset value range, yielding the constructed target convolutional neural network.

Specifically, step D can be carried out as step D1 or as step D2.

D1: taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, keeping the kernel's weight matrix fixed and obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

determining whether ||β||_0 has converged;

if so, obtaining the kernel weights that minimize the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until ||β||_0 converges;

D2: taking the initial convolutional neural network as the current network model and, for each convolution kernel in the current convolutional layer of the current network model, removing the kernels whose minimized reconstruction error exceeds the preset value range;

for each convolution kernel remaining after the removal, obtaining the current value of the channel-selection coefficient vector of length c using the formula

$$\hat{\beta}=\arg\min_{\beta}\ \frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t.}\ \|\beta\|_0\le c',$$

where β̂ is the current value of the channel-selection coefficient vector of length c, and argmin returns the argument that minimizes the objective;

obtaining the current kernel weights corresponding to the reconstruction error using the formula

$$\hat{W}=\arg\min_{W}\ \left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{T}\right\|_F^2;$$

determining whether the reconstruction error corresponding to the current value of the channel-selection coefficient vector and the current kernel weights has converged;

if so, taking the current value of the channel-selection coefficient vector of length c and the weights of the kernel minimizing the reconstruction error as that kernel's target selection vector coefficient and target kernel weights, and updating the current network model according to the target selection vector coefficient and the target kernel weights;

if not, updating the penalty coefficient by the preset step size and returning to the step of obtaining the current value of the channel-selection coefficient vector of length c, until the reconstruction error corresponding to the current coefficient value and the current kernel weights converges;

taking the updated network model as the current network model and the next convolutional layer after the current one as the current convolutional layer, and returning to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error exceeds the preset value range, until every convolutional layer of the current network model has been pruned, which simplifies the network model structure and effectively reduces the amount of computation.
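With β fixed, the W-step of this alternation is an ordinary least-squares fit; a minimal NumPy sketch under that reading (array shapes follow the channel-selection sketch after step C above):

```python
import numpy as np

def refit_weights(Y, X_kept):
    """Least-squares W-step: X_kept holds the retained channel slices
    (each (N, k), pre-scaled by its beta_i); solves min ||Y - X' W'^T||_F."""
    Xp = np.concatenate(X_kept, axis=1)           # X': (N, c' * k)
    Wt, *_ = np.linalg.lstsq(Xp, Y, rcond=None)   # Xp @ Wt ~= Y
    return Wt.T                                   # refitted kernel weights W'
```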

Fig. 6 is a schematic diagram of the data flow before and after quantization in an edge-computing-based embedded video image parsing method provided by an embodiment of the present invention. As shown in Fig. 6, a linear quantization algorithm is used to quantize the weight parameters of the pruned current network model; the quantized model parameters are then encoded with the Huffman coding algorithm; and the encoded current network model is taken as the final target convolutional neural network model. The storage footprint of the quantized and encoded network model is reduced, lowering the storage requirements.

In practical applications, the parameters of a trained model are stored as 32-bit floating-point numbers, and the model trained by a large CNN occupies hundreds of megabytes of storage; the model can therefore be compressed by changing the parameter storage format, i.e., a parameter quantization compression algorithm. In practical use of the algorithm, the quantization parameters need to be set according to the characteristics of the network structure. The quantization method may be:

Compute the maximum and minimum of the parameters, subtract the minimum from each parameter and divide by the difference between the maximum and the minimum, then multiply the resulting quotient by 255 to map the values to the interval 0-255, yielding quantized 8-bit parameters and thereby converting the 32-bit floating-point numbers into 8-bit integers. In practical applications, the quantization algorithm may also be a nonlinear quantization algorithm, etc.

The embodiments of the present invention quantize the weights and exploit the uneven distribution of the effective weights through variable-length coding, i.e., Huffman coding, representing the weights with variable-length codes without loss of training accuracy.
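As a rough illustration (not the patent's implementation), the following Python sketch performs the linear 8-bit quantization described above and measures the size gain from Huffman coding; the function names and the random stand-in weights are assumptions of the sketch.

import heapq
from collections import Counter
import numpy as np

def quantize_8bit(w):
    # Linear quantization: subtract the minimum, divide by the range,
    # scale to 0-255 and round, giving 8-bit integer parameters.
    lo, hi = float(w.min()), float(w.max())
    q = np.round((w - lo) / (hi - lo) * 255).astype(np.uint8)
    return q, lo, hi  # lo/hi are kept so the weights can be dequantized

def huffman_code_lengths(symbols):
    # Standard Huffman construction; returns symbol -> code length in bits.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    idx = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, idx,
                              {s: d + 1 for s, d in {**c1, **c2}.items()}))
        idx += 1
    return heap[0][2]

w = np.random.randn(1000).astype(np.float32)   # stand-in for model weights
q, lo, hi = quantize_8bit(w)
lengths = huffman_code_lengths(q.tolist())
bits = sum(lengths[s] for s in q.tolist())
print(f"raw 32-bit: {w.size * 32} bits, Huffman-coded 8-bit: {bits} bits")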

E. Deploy the target convolutional neural network on the embedded hardware platform; the embedded computing platform then recognizes the preset targets according to the content contained in each image of the video images captured by the camera.

In practical applications, the preset targets to be recognized in this step may be targets in a manually configured preset target list; the preset target list may be updated manually by an operator, or key targets may be automatically identified by the system and added automatically.

Typically, key targets refer to persons whose range of motion exceeds a set threshold, persons or vehicles entering a warning area, persons in distinctive clothing, and the like.
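As a rough sketch of how such a preset target list might be applied at inference time (the Detection structure, the run_detector callable and the threshold are assumptions of this illustration, not part of the disclosure):

from dataclasses import dataclass

@dataclass
class Detection:
    label: str      # e.g. "person", "vehicle", "building"
    bbox: tuple     # (x, y, w, h) in image coordinates
    score: float    # detector confidence

# preset_targets stands in for the operator-maintained list described above.
preset_targets = {"person", "vehicle"}

def parse_frame(frame, run_detector, threshold=0.5):
    # Run the (pruned, quantized) detector and keep only preset targets.
    return [d for d in run_detector(frame)
            if d.label in preset_targets and d.score >= threshold]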

As shown in Fig. 4, after a convolution kernel is removed, the redundant neurons corresponding to that kernel should also be removed to simplify the network structure.

By applying the embodiments of the present invention, the weights of the convolutional neural network are compressed by the above method, which simplifies the structure of the convolutional neural network and reduces the storage required for the weight parameters, so that the target convolutional neural network achieves the same operation speed and an equivalent target detection and recognition effect in an embedded environment with limited computing resources and small storage.

S102: For the preset target identified from the video, obtain the attribute features of the preset target and/or the scene attribute features of the preset target to form a description of the scene image content, wherein the attribute features of the preset target include: when the preset target is a vehicle, identifying the vehicle model, body color, license plate and position in the image, etc.; when the preset target is a person, identifying the person's gender, age, clothing and position in the image, etc.; when the preset target is a building, identifying the building type, position in the image, etc. The scene attribute features of the preset target include one or a combination of the shooting time, shooting location and shooting angle of the original image.

In practical applications, the extracted target attribute features and/or scene attribute features of the preset target can be structurally described according to the recognition results, generating a complete information frame that combines the image with its corresponding description information and forming a formatted information code stream conforming to the TCP/IP protocol, which is uploaded to the network.

S103: Upload the acquired attribute features of the preset target and the scene attribute features of the preset target to the monitoring center corresponding to the camera.

The obtained data are transmitted to the monitoring center through the video transmission network, so that the monitoring center can process the received data on the basis of big data technology and complete tasks such as storage, analysis, retrieval and statistics of the video images and their content, meeting the needs of a city or region for rapid analysis, processing and application of massive video content.
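The following Python sketch illustrates one possible way to pack an image and its structured description into an information frame and upload it over TCP. The frame layout, the field names and the Detection objects (from the sketch above) are assumptions of this illustration, not the patent's actual format.

import json
import socket
import struct
import time

def build_info_frame(jpeg_bytes, detections, camera_id):
    # Frame layout (an assumption of this sketch, not the patent's format):
    # 4-byte big-endian JSON length | JSON description | JPEG payload.
    description = {
        "camera_id": camera_id,
        "shot_time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "targets": [{"label": d.label, "bbox": d.bbox, "score": d.score}
                    for d in detections],
    }
    meta = json.dumps(description).encode("utf-8")
    return struct.pack(">I", len(meta)) + meta + jpeg_bytes

def upload(frame_bytes, host, port):
    # One TCP connection per frame keeps the sketch simple; a real
    # implementation would hold a persistent connection to the center.
    with socket.create_connection((host, port)) as s:
        s.sendall(frame_bytes)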

By applying the embodiment shown in Fig. 1 of the present invention, the target attribute features and scene attribute features in the captured video stream data are extracted at the camera end, which avoids the problem of the cloud computing center simultaneously receiving, storing and parsing thousands of video channels, reduces the need to upgrade the transmission bandwidth and computing capacity of the cloud computing center, and thus saves cost.

In addition, a device applying the method of the embodiment shown in Fig. 1 of the present invention can be data-connected to multiple cameras, so that one device parses and uploads the images captured by multiple cameras, reducing the number of deployed devices and thus saving cost.

Embodiment 2

On the basis of Embodiment 1 of the present invention, before step S101, the method further includes:

S104: Acquire the original images in the video stream data captured by the camera.

Specifically, the model data of the camera may be obtained, and the video encoding format of the camera is looked up from a pre-stored model-data-to-video-encoding-format list according to the model data of the camera; the video stream data captured by the camera are then decoded with a decoding method corresponding to that video encoding format, restoring the original images captured by the camera.

The inventors also found that in the security field, because there are many camera manufacturers and construction spans a long period, surveillance video terminal devices differ in specifications, image compression formats and encoding formats. The embodiments of the present invention adopt different decoding strategies according to the encoding formats of surveillance cameras of different specifications, decode and restore the original images and image parameters from the camera output code stream, and then recognize the preset targets, achieving compatibility with existing cameras of different models.
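A minimal sketch of the model-to-codec lookup described above; the table contents are invented examples, and decoding is delegated here to OpenCV's VideoCapture (which lets FFmpeg select the matching decoder) rather than the per-format decoders the embodiments describe.

import cv2  # decoding delegated to OpenCV/FFmpeg in this sketch

# Hypothetical model-to-codec table; real entries would come from the
# pre-stored "model data - video encoding format" list described above.
MODEL_CODEC_TABLE = {
    "HIK-DS2CD": "h264",
    "DAHUA-IPC": "h265",
    "LEGACY-01": "mjpeg",
}

def decode_first_frame(camera_model, stream_url):
    codec = MODEL_CODEC_TABLE.get(camera_model, "h264")  # fallback strategy
    # In the embodiments a per-format decoder is selected by `codec`;
    # here OpenCV lets FFmpeg pick the matching decoder automatically.
    cap = cv2.VideoCapture(stream_url)
    ok, frame = cap.read()
    cap.release()
    return codec, (frame if ok else None)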

Embodiment 3

On the basis of Embodiment 1 of the present invention, when further implementing Embodiment 1, an ARM (Advanced RISC Machine) processor may be used as the main control unit and an FPGA (Field Programmable Gate Array) as the acceleration unit to build the hardware core platform architecture for recognizing preset targets. Based on this hardware architecture, a pre-built convolutional neural network model is used to recognize the preset targets contained in each original image, wherein the preset targets include one or a combination of persons, vehicles and buildings.

Specifically, when performing recognition with the pre-trained convolutional neural network model, an n*m convolution kernel operation may be split into n*m multiplication operations and n*m-1 addition operations, and,

when n*m is odd, take the n*m-1 addition operations as the current operation, sum the operands of the current operation in pairs to obtain the summed results, take the summed results as the current operation, and return to the step of summing the operands of the current operation in pairs, until the n*m-1 addition operations are completed and the operation result of the n*m convolution kernel is obtained;

when n*m is even, take n*m-2 addition operations as the current operation, sum the operands of the current operation in pairs to obtain the summed results, take the summed results as the current operation, and return to the step of summing the operands of the current operation in pairs, until the n*m-2 addition operations are completed; then add the sum of the n*m-2 additions to the operand that did not participate, obtaining the operation result of the n*m convolution kernel.
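The following Python sketch is a software analogue of this pairwise (adder-tree) accumulation; its bookkeeping differs slightly from the odd/even description above but performs the same n*m multiplications and n*m-1 additions.

def adder_tree_sum(products):
    # Pairwise (adder-tree) reduction; an odd count at any level carries
    # its last operand up unchanged, so k values always cost k-1 additions.
    vals = list(products)
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

def conv_kernel(patch, kernel):
    # n*m multiplications followed by n*m-1 additions, mirroring Fig. 8.
    products = [p * k for p, k in zip(patch, kernel)]
    return adder_tree_sum(products)

print(conv_kernel(range(9), [1] * 9))  # 3x3 kernel of ones -> 36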

Exemplarily, Fig. 7 is a schematic flowchart of the FPGA implementation in an edge-computing-based embedded video image parsing method provided by an embodiment of the present invention; Fig. 8 is a schematic diagram of the convolution acceleration operation in such a method; and Fig. 9 is a schematic diagram of the pooling acceleration operation. As shown in Figs. 7-8, the processing method of the embodiments of the present invention can be deployed on the FPGA so as to perform the above operations of the embodiments of the present invention.

A heterogeneous ARM+FPGA processing architecture is adopted, in which the ARM serves as the control unit and mainly completes algorithm scheduling and task management, while the FPGA serves as the core acceleration unit and accelerates the convolution, pooling and other operations that form the bulk of the neural network, improving the running efficiency of the algorithm and meeting real-time processing requirements.

At present, in the first aspect, deep learning algorithms are generally implemented in Python with the help of deep learning frameworks such as Caffe, TensorFlow and PyTorch. Although using a deep learning framework greatly reduces the difficulty of algorithm development, installing such frameworks on an embedded platform would occupy most of the resources, preventing the algorithm from meeting real-time processing requirements. Therefore, the embodiments of the present invention implement the convolutional network in C/C++ and avoid deep learning frameworks, saving embedded platform resources and effectively improving the processing speed of the algorithm.

In the second aspect, for embedded applications there is currently no mature chip available. The present invention selects an ARM+FPGA hardware processing architecture that mimics the CPU (Central Processing Unit) plus GPU (Graphics Processing Unit) architecture, uses the parallel processing capability of the FPGA to accelerate the convolutional neural network computation, and implements it in the Verilog HDL (Hardware Description Language), achieving a "GPU-like" acceleration effect to meet real-time processing requirements.

Since a convolutional neural network involves a large number of convolution and pooling operations, considerable DSP (digital signal processor) processing resources and RAM storage resources are consumed during computation. Therefore, the embodiments of the present invention select the ZYNQ7100 FPGA, which has relatively abundant computing and storage resources, as the core of system processing. As shown in Fig. 4, the chip contains 2020 DSP slices internally; with each multiplication and each addition consuming 2 DSP slices, more than 1000 multiplication or addition operations can be performed in parallel within one clock cycle. The internal storage capacity is 26.5 Mb, meeting the data buffering needs of optical images, convolution templates and feature maps; the internal logic processing units total 444K, providing sufficient logic resources for complex logic operations and control. Table 1 lists FPGA models applicable to the embodiments of the present invention, as shown in Table 1.

Table 1

In practical applications, each multiplication operation in the deep learning network model occupies 2 DSP slices, and each addition also occupies 2 DSP slices. This system takes the representative target detection and recognition algorithm SSD (Single Shot MultiBox Detector) as an example for evaluation. The input image of the first convolutional layer in this network is 300 (image height) × 300 (image width) × 3 (number of image channels), with 64 convolution kernels of size 3×3. The number of multiplications to be computed is 300×300×3×3×3×64 = 155,520,000.

Performing all of the above multiplications in one pass in the FPGA would require more than 300 million DSP slices, which is clearly infeasible in practice. Therefore, the present invention designs a convolution acceleration computing architecture in the FPGA. Taking a 3×3 convolution as an example, as shown in Fig. 8, there are 9 multiplications and 8 additions in total, so one operation occupies 9×2+8×2 = 34 DSP slices. The ZYNQ7100 FPGA has 2020 DSP slices in total, so about 59 such computations can be performed in parallel within one clock cycle. Taking the first convolutional layer as an example, its input is 300×300×3 and there are 64 3×3 convolution kernels, so the 3×3 convolution architecture in the FPGA must be used 300×300×3×64 = 17,280,000 times. Since 59 computations run in parallel per clock cycle, the operation requires 17,280,000/59 = 292,881 clock cycles. The clock frequency of the ZYNQ7100 is 250 MHz, so the theoretical total processing time is 1.17 ms. This figure does not account for latency, data reading time, etc., and represents the theoretical best case.

The deep learning network consists mainly of convolutional layers and pooling layers, and the pooling operation can also be accelerated in the FPGA; the acceleration architecture of the max pooling operation in the FPGA is shown in Fig. 9. The algorithm uses 2×2 max pooling, and each pooling step occupies 3 DSP slices. Since the ZYNQ7100 has 2020 DSP slice units in total, 670 pooling structures can be executed in parallel. Taking pooling layer 1 as an example, its input is 300×300×64, requiring 150×150×64 = 1,440,000 pooling operations, i.e. 1,440,000/670 ≈ 2150 clock cycles in parallel, taking about 0.0086 ms. The vast majority of the computation time therefore lies in the convolution operations.
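As a back-of-the-envelope check of these figures, the following sketch recomputes the clock-cycle and time estimates from the stated DSP-slice costs; the small discrepancies (292,882 vs. 292,881 clocks, 673 vs. 670 parallel pooling units) come only from rounding choices.

import math

CLOCK_HZ = 250e6
DSP_SLICES = 2020

def conv_layer_ms(h, w, cin, cout, slices_per_kernel=9 * 2 + 8 * 2):
    parallel = DSP_SLICES // slices_per_kernel    # ~59 kernels per clock
    ops = h * w * cin * cout                      # 3x3 dot products needed
    clocks = math.ceil(ops / parallel)
    return clocks / CLOCK_HZ * 1e3

def pool_layer_ms(h, w, c, slices_per_pool=3):
    parallel = DSP_SLICES // slices_per_pool      # ~673 (the patent uses 670)
    ops = (h // 2) * (w // 2) * c                 # 2x2 max-pool outputs
    clocks = math.ceil(ops / parallel)
    return clocks / CLOCK_HZ * 1e3

print(f"conv1: {conv_layer_ms(300, 300, 3, 64):.2f} ms")   # ~1.17 ms
print(f"pool1: {pool_layer_ms(300, 300, 64):.4f} ms")      # ~0.0086 ms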

Based on comprehensive analysis, the SSD target detection and recognition algorithm takes about 230 ms to run on this platform, a processing speed of more than 4 frames per second, which meets near-real-time parsing and processing requirements. As technology advances, the neural network algorithm of the present invention can be easily ported to more advanced FPGA hardware platforms or AI chips to further increase the processing speed.

By applying Embodiment 3 of the present invention, accelerated execution of the convolutional neural network can be performed on the FPGA.

In practical applications, when AI (Artificial Intelligence) technology is used to recognize the images generated by a video surveillance system, AI smart cameras may be used in place of traditional cameras, so that the camera can not only "image" but also "understand" image content, realizing the conversion from images to "information" and greatly reducing the load of back-end big data processing. Smart cameras are therefore a future development trend. However, practical applications show that AI smart cameras are expensive: low-end AI smart cameras cost more than 100,000 yuan, and mid-range ones cost hundreds of thousands of yuan. For the large number of widely deployed traditional surveillance video terminals, replacing all of them with AI smart cameras would not only be extremely costly but would also waste resources and duplicate construction, making it not worthwhile.

Therefore, by applying Embodiment 3 of the present invention, the embedded edge computing platform can simultaneously parse the content of video images from multiple existing cameras, recognize the preset targets, and then send the attribute information of the preset targets to the monitoring center through the video surveillance network; distributed processing reduces the pressure of simultaneously parsing massive video images and saves substantial cost.

Corresponding to the embodiment shown in Fig. 1 of the present invention, an embodiment of the present invention further provides an edge-computing-based embedded video image parsing apparatus.

Fig. 10 is a schematic structural diagram of an edge-computing-based embedded video image parsing apparatus provided by an embodiment of the present invention. As shown in Fig. 10, the apparatus is applied to parsing camera image content in a video surveillance network and can simultaneously process images from several cameras communicatively connected to the monitoring center. The apparatus includes:

a recognition module 1001, configured to recognize preset targets from the video captured by the camera, wherein the preset targets include one or a combination of persons, vehicles and buildings;

a first acquisition module 1002, configured to acquire, for the preset target recognized from the video, the attribute features of the preset target and/or the scene attribute features of the preset target, wherein the attribute features of the preset target correspond to the preset target being one of a vehicle, a person or a building, and the scene attribute features of the preset target include one or a combination of the shooting time, shooting location and shooting angle of the original image;

an upload module 1003, configured to upload the acquired attribute features of the preset target and the scene attribute features of the preset target to the monitoring center corresponding to the camera.

By applying the embodiment shown in Fig. 10 of the present invention, the target attribute features and scene attribute features in the captured video stream data are extracted close to the surveillance camera before transmission, without changing the original system architecture, which solves the problem of simultaneously parsing massive video images in the back end and saves the cost of upgrading the surveillance video network.

In a specific implementation of the embodiments of the present invention, on the basis of the embodiment shown in Fig. 10 of the present invention, the apparatus further includes:

a second acquisition module, configured to acquire the original images in the video stream data captured by the camera and take the original images as the video captured by the camera.

In a specific implementation of the embodiments of the present invention, the second acquisition module is configured to:

obtain the model data of the camera, and look up the video encoding format of the camera from a pre-stored model-data-to-video-encoding-format list according to the model data of the camera;

decode the video stream data captured by the camera using a decoding method corresponding to the video encoding format, restoring the original images captured by the camera.

In a specific implementation of the embodiments of the present invention, the recognition module 1001 is configured to:

use an ARM processor as the main control unit and an FPGA as the core acceleration unit to build a hardware computing architecture for recognizing preset targets; and, based on this hardware architecture, use a pre-built convolutional neural network model to recognize the preset targets contained in each original image, wherein the preset targets include one or a combination of persons, vehicles and buildings.

In a specific implementation of the embodiments of the present invention, the recognition module 1001 includes a construction unit configured to:

construct an initial convolutional neural network having an input layer, convolutional layers, pooling layers, a fully connected layer and an output layer, and train it;

obtain a transformation matrix for the pruning operation according to the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network;

obtain the minimized reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weights of each convolution kernel;

remove the convolution kernels whose minimized reconstruction error falls outside the preset value range, obtaining the constructed target convolutional neural network.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

obtain the transformation matrix for the pruning operation according to the preset number of convolution kernels in the target convolutional neural network obtained after pruning and the number of convolution kernels in the constructed initial convolutional neural network, using the formula $Y=(N\times c\times k_h\times k_w)^{-1}\cdot n\times c\times k_h\times k_w$, wherein

Y is the transformation matrix for the pruning operation; N is the number of convolution kernels in the initial convolutional neural network; c is the number of channels of the feature map; $k_h\times k_w$ is the size of the convolution kernel; and n is the number of convolution kernels in the target convolutional neural network obtained after pruning.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

obtain the minimized reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weights of each convolution kernel, using the formula

$\min_{\beta,W}\frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{\mathrm{T}}\right\|_F^2,\quad \text{s.t. } \|\beta\|_0\le c',$

wherein min is the minimum-value evaluation function; β is the selection vector coefficient corresponding to the channels of length c; $\beta_i$ is the mask (batch label) of the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; $\|\cdot\|_F$ is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ is the summation function; $X_i$ is the slice matrix of the i-th channel; $W^{\mathrm{T}}$ is the transpose of the kernel weight matrix; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and $\|\cdot\|_0$ is the zero-norm.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

for each convolution kernel, obtain the reconstruction error of each convolution kernel in the initial convolutional neural network according to the transformation matrix and the weights of each convolution kernel, using the formula

$\hat{\beta}=\operatorname*{argmin}_{\beta}\frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{\mathrm{T}}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t. } \|\beta\|_0\le c',$

wherein β is the selection vector coefficient corresponding to the channels of length c; $\beta_i$ is the mask (batch label) of the i-th channel; W is the weight matrix of the convolution kernel; N is the number of convolution kernels in the initial convolutional neural network; $\|\cdot\|_F$ is the Frobenius norm; Y is the transformation matrix for the pruning operation; Σ is the summation function; $X_i$ is the slice matrix of the i-th channel; $W^{\mathrm{T}}$ is the transpose of the kernel weight matrix; λ is the penalty coefficient; $\|\cdot\|_1$ is the one-norm; ∀i means "for any i"; c′ is the number of channels retained after pruning; c is the number of channels of the feature map; and $\|\cdot\|_0$ is the zero-norm.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

take the initial convolutional neural network as the current network model, and for each convolution kernel in the current convolutional layer of the current network model, remove the convolution kernels whose minimized reconstruction error falls outside the preset value range;

for each convolution kernel remaining after removal, keep the weight matrix of the convolution kernel unchanged and obtain the current values of the selection vector coefficients corresponding to the channels of length c using the formula

$\hat{\beta}=\operatorname*{argmin}_{\beta}\frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{\mathrm{T}}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t. } \|\beta\|_0\le c',$

wherein $\hat{\beta}$ is the current value of the selection vector coefficients corresponding to the channels of length c, and argmin is the function that evaluates the variable minimizing its argument;

determine whether $\|\beta\|_0$ converges;

if so, obtain the kernel weights corresponding to the minimized reconstruction error using the formula $\hat{W}=\operatorname*{argmin}_{W}\left\|Y-\sum_{i=1}^{c}\hat{\beta}_i X_i W_i^{\mathrm{T}}\right\|_F^2$; take the current values of the selection vector coefficients corresponding to the channels of length c, together with the kernel weights corresponding to the minimized reconstruction error, as the target selection vector coefficients and target kernel weights of that convolution kernel, and update the current network model accordingly;

if not, update the penalty coefficient according to the preset step size and return to the step of obtaining the current values of the selection vector coefficients corresponding to the channels of length c, until $\|\beta\|_0$ converges;

take the updated current network model as the current network model and the next convolutional layer of the current convolutional layer as the current convolutional layer, and return to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error falls outside the preset value range, until every convolutional layer of the current network model has been pruned; then take the pruned current network model as the target convolutional neural network model.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

take the initial convolutional neural network as the current network model, and for each convolution kernel in the current convolutional layer of the current network model, remove the convolution kernels whose minimized reconstruction error falls outside the preset value range;

for each convolution kernel remaining after removal, obtain the current values of the selection vector coefficients corresponding to the channels of length c using the formula

$\hat{\beta}=\operatorname*{argmin}_{\beta}\frac{1}{2N}\left\|Y-\sum_{i=1}^{c}\beta_i X_i W_i^{\mathrm{T}}\right\|_F^2+\lambda\|\beta\|_1,\quad \text{s.t. } \|\beta\|_0\le c',$

wherein $\hat{\beta}$ is the current value of the selection vector coefficients corresponding to the channels of length c, and argmin is the function that evaluates the variable minimizing its argument;

obtain the current weights of the convolution kernel corresponding to the reconstruction error using the formula $\hat{W}=\operatorname*{argmin}_{W}\left\|Y-\sum_{i=1}^{c}\hat{\beta}_i X_i W_i^{\mathrm{T}}\right\|_F^2$;

determine whether the reconstruction error corresponding to the current values of the selection vector coefficients and the current weights of the convolution kernel has converged;

if so, take the current values of the selection vector coefficients corresponding to the channels of length c, together with the kernel weights corresponding to the minimized reconstruction error, as the target selection vector coefficients and target kernel weights of that convolution kernel, and update the current network model accordingly;

if not, update the penalty coefficient according to the preset step size and return to the step of obtaining the current values of the selection vector coefficients corresponding to the channels of length c, until the reconstruction error corresponding to the current values of the selection vector coefficients and the current kernel weights converges;

take the updated current network model as the current network model and the next convolutional layer of the current convolutional layer as the current convolutional layer, and return to the step of removing, for each convolution kernel in the current convolutional layer of the current network model, the kernels whose minimized reconstruction error falls outside the preset value range, until every convolutional layer of the current network model has been pruned; then take the pruned current network model as the target convolutional neural network model.

In a specific implementation of the embodiments of the present invention, the construction unit is configured to:

use a quantization algorithm to quantize the model parameters of the pruned current network model;

then encode the current network model with quantized model parameters using the Huffman coding algorithm;

take the encoded current network model as the target convolutional neural network model.

In a specific implementation of the embodiments of the present invention, the recognition module 1001 is configured to:

when performing recognition with the pre-trained convolutional neural network model, split an n*m convolution kernel operation into n*m multiplication operations and n*m-1 addition operations, and,

when n*m is odd, take the n*m-1 addition operations as the current operation, sum the operands of the current operation in pairs to obtain the summed results, take the summed results as the current operation, and return to the step of summing the operands of the current operation in pairs, until the n*m-1 addition operations are completed and the operation result of the n*m convolution kernel is obtained;

when n*m is even, take n*m-2 addition operations as the current operation, sum the operands of the current operation in pairs to obtain the summed results, take the summed results as the current operation, and return to the step of summing the operands of the current operation in pairs, until the n*m-2 addition operations are completed; then add the sum of the n*m-2 additions to the operand that did not participate, obtaining the operation result of the n*m convolution kernel.

The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
