Technical Field
The present application belongs to the technical field of computer image processing, and in particular relates to a video image fusion method, a fusion apparatus, a panoramic monitoring system, and a storage medium.
Background
With the rapid development of electronic information science and technology, video surveillance technology has steadily improved; video surveillance systems are now widely deployed at large public venues. However, a traditional video solution reflects only local information over a short period of time and cannot keep timely, effective control over the global situation, so incidental events cannot be detected and tracked promptly. Compared with a single-viewpoint surveillance system, a panoramic video surveillance system provides more comprehensive information and eliminates monitoring blind spots, enabling efficient and effective surveillance of the target area.
However, under the traditional split-screen video-wall monitoring mode, the spatial relationships among massive video feeds are hard to convey; the pictures are isolated from one another and lack correlation. This makes the monitored area and the monitored targets discontinuous in time and space, so monitoring personnel, working from many separate video feeds, find it difficult to grasp the security situation of the monitored area in a timely, rapid, and effective manner at the macro level.
Three-dimensional panoramic video surveillance fusion systems built on Internet of Things technology therefore emerged. The main principle of such a system is to register and fuse multiple channels of real-time surveillance video, captured at different geographical locations, with a three-dimensional model of the monitored area, generating a wide-area three-dimensional panoramic dynamic monitoring picture and enabling macro-level control of the overall situation of the monitored area. Most 3D panoramic video surveillance systems integrate video coding, network transmission, database, streaming media, and embedded technologies, and can be integrated with existing multimedia, control, and information systems to share data and information. However, as application scenarios and usage requirements grow, new video image fusion methods are urgently needed to achieve 3D panoramic video surveillance with higher-quality pictures.
Summary of the Invention
The present invention proposes a video image fusion method, a fusion apparatus, a panoramic monitoring system, and a storage medium, aiming to solve the problem in the prior art that no new video image fusion method exists to improve picture quality and adapt to new application scenarios and usage requirements.
According to a first aspect of the embodiments of the present application, a video image fusion method is provided, comprising the following steps:
detecting scale-space extreme points, and using the extreme points as feature points;
precisely locating the feature points to obtain the position and scale of each feature point;
determining pixel point pairs at fixed positions within a region of a specific size centered on each feature point;
comparing the gray values of the pixel point pairs one by one, and generating a descriptor according to the binarized comparison results of the gray values;
outputting the descriptor to a feature matching function to obtain a fused image.
According to a second aspect of the embodiments of the present application, a video image fusion apparatus is provided, which specifically comprises:
a detection module, configured to detect scale-space extreme points and use the extreme points as feature points;
a positioning module, configured to precisely locate the feature points and obtain the position and scale of each feature point;
a determination module, configured to determine pixel point pairs at fixed positions within a region of a specific size centered on each feature point;
a comparison module, configured to compare the gray values of the pixel point pairs one by one and generate a descriptor according to the binarized comparison results of the gray values;
an output module, configured to output the descriptor to a feature matching function to obtain a fused image.
According to a third aspect of the embodiments of the present application, a panoramic monitoring system is provided, which specifically comprises:
a front-end video acquisition unit, configured to acquire video information within the monitored area; and
the video fusion apparatus as described above.
According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored; the computer program is executed by a processor to implement the video image fusion method.
With the video image fusion method of the embodiments of the present application, the Brief algorithm improves on the step of the existing Surf-based fusion technique in which a descriptor is obtained by computing Haar wavelet responses for every sub-region: the gray values of pixel point pairs are compared one by one and the descriptor is generated through binarization. This speeds up the computation and further improves the quality of the panoramic fused picture, solving the problem in the prior art that no new video image fusion method exists to improve picture quality and adapt to new application scenarios and usage requirements.
Brief Description of the Drawings
The drawings described here are provided for further understanding of the present application and constitute a part of it; the exemplary embodiments of the present application and their descriptions serve to explain the present application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 shows a model diagram of a panoramic video surveillance system according to an embodiment of the present application;
Fig. 2 shows a flowchart of a prior-art video image fusion method based on the Surf algorithm;
Fig. 3 shows a flowchart of the steps of a video image fusion method according to an embodiment of the present application;
Fig. 4 shows a flowchart of a video image fusion method according to another embodiment of the present application;
Fig. 5 shows a schematic structural diagram of a video image fusion apparatus according to an embodiment of the present application;
Fig. 6 shows a schematic diagram of camera acquisition in a monitored area according to an embodiment of the present application;
Fig. 7 shows a schematic structural diagram of a panoramic monitoring system according to an embodiment of the present application;
Fig. 8 shows a processing flowchart of a panoramic monitoring system according to an embodiment of the present application.
Detailed Description
In the course of realizing the present application, the inventors found that with the rapid development of electronic information science and technology, video surveillance technology has steadily improved, and video surveillance systems are widely deployed at large public venues. Compared with a single-viewpoint surveillance system, a panoramic video surveillance system provides more comprehensive information and eliminates monitoring blind spots, enabling efficient and effective surveillance of the target area. A three-dimensional panoramic video surveillance fusion system can control the security situation of the monitored area in a timely, rapid, and effective manner at the macro level. However, as application scenarios and usage requirements grow, new video image fusion methods are urgently needed to achieve 3D panoramic video surveillance with higher-quality pictures.
In view of the above problems, the embodiments of the present application provide a video image fusion method, a fusion apparatus, a panoramic monitoring system, and a storage medium. The video image fusion method improves on the Surf algorithm of existing video image fusion technology by means of the Brief algorithm: the gray values of pixel point pairs are compared one by one and the descriptor is generated through binarization, which further improves the quality of the panoramic fused picture and solves the problem in the prior art that no new video image fusion method exists to improve picture quality and adapt to new application scenarios and usage requirements.
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not an exhaustive list of all embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
Embodiment 1
Fig. 1 shows a model diagram of a panoramic video surveillance system according to an embodiment of the present application.
The main principle of the 3D panoramic video surveillance fusion system is to register and fuse multiple channels of real-time surveillance video, captured at different geographical locations, with a 3D model of the monitored area, generating a wide-area 3D panoramic dynamic monitoring picture and enabling macro-level control of the overall situation of the monitored area. Using mainly Internet of Things and virtual reality technology, the multi-channel real-time surveillance videos captured at different geographical locations are corrected and registered against the 3D model of the monitored area, and 3D rendering technology dynamically maps the real-time video onto the 3D model, thereby generating a 3D panoramic monitoring picture of a wide monitored area. A 3D panoramic video surveillance system can accept surveillance video of different types and specifications, as well as other sensor information, personnel and equipment positioning information, and geographic information system (GIS) data. Using computer vision and 3D rendering technology, it fuses the real-time video of the monitored area, the 3D model, various sensor readings, and positioning information into a single augmented virtual environment, providing system users with a complete, wide-area surveillance view.
The panoramic video surveillance system mainly comprises the following. A front-end video acquisition device consists of surveillance camera probes of various models deployed at different geographical locations; it captures video of the monitored area and transmits it back to the network center over optical-fiber communication equipment. An image processing unit is mainly responsible for image denoising, enhancement, cropping, registration calibration, fusion preprocessing, and similar operations. A scheduling unit mainly provides storage for policies, features, and surveillance video, monitors the system's operating state such as network traffic and server load, optimizes the allocation and scheduling of fusion processing jobs, and manages the scheduling of fused video sources. A fusion processing unit provides human-computer interaction through a graphical interface, mainly including real-time display of the 3D panoramic surveillance video, maintenance of the fusion policy library, and security situation display; the fused video picture is output to a large-screen display as a VGA or HDMI signal.
The video image fusion technology adopted by the fusion processing unit registers a group of images that are correlated in time and space and fuses them into a single complete, wide-field-of-view new image containing the information of the image sequence.
The main principle of video image registration is as follows: the point transformation between two images with an overlapping region is estimated from the positional relationship of matched feature point pairs in the overlap, and the images are then stitched according to that transformation; the two images are thus resampled into a larger blank image. During resampling, the colors at the boundary where the two images meet are smoothly blended to complete the fusion, forming a new fused image that contains the information of both source images.
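By way of illustration, this registration-and-resampling step might be sketched as follows in Python with OpenCV, assuming matched feature point coordinates are already available; the function and variable names are illustrative and not taken from the application:

```python
import cv2
import numpy as np

def register_and_warp(img_left, img_right, pts_left, pts_right):
    """Estimate the point transformation between two overlapping images
    from matched feature point pairs, then resample into a larger canvas."""
    # Robust 3x3 homography from the matched pairs; RANSAC rejects outliers.
    H, _ = cv2.findHomography(np.float32(pts_right), np.float32(pts_left),
                              cv2.RANSAC, 5.0)
    h, w = img_left.shape[:2]
    # Resample the right image into a generously sized blank canvas.
    canvas = cv2.warpPerspective(img_right, H, (w + img_right.shape[1], h))
    canvas[0:h, 0:w] = img_left  # the left image keeps its original position
    return canvas, H
```

A smooth color transition over the seam, as described below, would replace the hard overwrite of the overlap used in this sketch.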
Video image stitching and fusion technology falls mainly into two types.
The first handles the sequential video stream captured by a single camera: the video sequence of a moving camera is split into frames at a rate of up to 25 frames/s, the individual frames are preprocessed, and then, using the correlation between images in the sequence, the images are registered and stitched to form a panorama with a large field of view.
The second handles multiple video streams captured simultaneously by a camera group fixedly installed in a certain relative arrangement, with adjacent lenses sharing some overlapping scene. The stitching process applies weighted fusion to the overlapping regions of the video streams of each pair of adjacent cameras to produce a panoramic video image.
To reduce the influence of the varying offsets that the mean gray level of the same object exhibits across different images, image preprocessing is usually required before stitching and fusion. Image preprocessing typically uses edge detection to extract the edges of the image and sharpen its contours; for example, the edge extraction operator may be the Sobel operator, the Canny operator, or the Laplacian of Gaussian.
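A minimal sketch of such edge-based preprocessing with OpenCV; the thresholds, kernel sizes, and file name below are assumptions to be tuned per scene:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # illustrative file name

# Canny: the two hysteresis thresholds are assumed values.
edges_canny = cv2.Canny(img, 100, 200)

# Sobel: first-order gradients in x and y, combined into a magnitude map.
gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
edges_sobel = cv2.convertScaleAbs(cv2.magnitude(gx, gy))

# Laplacian of Gaussian: blur first, then take the second-order derivative.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)
edges_log = cv2.convertScaleAbs(cv2.Laplacian(blurred, cv2.CV_32F))
```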
Image registration is an important step in image fusion. When selecting points in the images, corner points, texture feature points, or other salient points may be chosen, and the features used to characterize them may be brightness, color, or texture structure. By matching the features of points in the overlap region, the rotation, translation, scaling, and other transformations between the two images to be fused are found; the two images are then repositioned accordingly, and a pixel-level weighted image fusion method is applied to fuse the overlapping region of the two images.
The fade-in/fade-out weighted fusion method is usually adopted: each pixel value in the overlapping region of the two images is weighted according to its distance from the edge of the overlap, with the weight decreasing toward the edge. For each position in the overlap, the weighted sum of the pixel values registered to the coordinate point (x, y) of the stitched image gives the pixel value F(x, y), which is assigned to that pixel position of the stitched image.
The basic formula for F(x, y) is:
F(x, y) = w1 · F1(x1, y1) + w2 · F2(x2, y2), with w1 + w2 = 1,
where F1(x1, y1) and F2(x2, y2) are the pixel values of the registered corresponding points (x1, y1) and (x2, y2) in the two original images, and w1 and w2 are the weighting coefficients.
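A minimal sketch of this fade-in/fade-out weighting, assuming two already-registered float32 color images whose overlap is a vertical strip of width overlap_w (names are illustrative):

```python
import numpy as np

def feather_blend(left, right, overlap_w):
    """Fade-in/fade-out blend over a horizontal overlap of width overlap_w.
    The last overlap_w columns of 'left' cover the first overlap_w of 'right'."""
    # Weight falls off linearly toward the edge of the overlap.
    alpha = np.linspace(1.0, 0.0, overlap_w)[None, :, None]
    blended = alpha * left[:, -overlap_w:] + (1.0 - alpha) * right[:, :overlap_w]
    return np.concatenate(
        [left[:, :-overlap_w], blended, right[:, overlap_w:]], axis=1)
```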
The above fusion algorithm improves the signal-to-noise ratio of the image in the fused region but reduces its contrast; during target detection and tracking, a target located in the fused region degrades detection and tracking accuracy.
Based on the above technical understanding and an analysis of the requirements of existing video surveillance systems and image stitching/fusion technology, the video image fusion method of the embodiments of the present application is proposed on the basis of the 3D panoramic video surveillance system model. The method mainly improves on the Surf (Speeded-Up Robust Features) image stitching algorithm, which is based on feature point detection, and fuses the video images of the corresponding monitored areas into one panoramic video image.
Fig. 2 shows a flowchart of a prior-art video image fusion method based on the Surf algorithm.
As shown in Fig. 2, the Surf algorithm is a feature point detection algorithm; its main flow comprises:
Detection of scale-space extreme points:
The Surf algorithm finds extreme points as feature points based on the Hessian matrix, and accelerates the extremum search, so it improves computation speed to a certain extent. The Hessian matrix of a point in the image is defined as:
H(x, y, σ) = | Lxx(x, y, σ)  Lxy(x, y, σ) |
             | Lxy(x, y, σ)  Lyy(x, y, σ) |
where Lxx(x, y, σ) is the convolution of the second-order Gaussian partial derivative in x with the image at that pixel, and Lxy(x, y, σ) and Lyy(x, y, σ) are defined analogously.
The Surf algorithm approximates these second-order Gaussian templates with box filters; that is, the templates are discretized and further cropped to 9×9 grids. To detect the feature points of an image, the determinant of the Hessian matrix is computed for every pixel, and the extreme points of this determinant response are taken as the feature points.
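An illustrative determinant-of-Hessian response in Python, using exact Gaussian derivatives instead of Surf's integral-image box filters; the 0.9 factor is the standard correction for the box-filter approximation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant(img, sigma):
    """Determinant-of-Hessian response map of a grayscale image at scale sigma."""
    img = img.astype(np.float32)
    Lxx = gaussian_filter(img, sigma, order=(0, 2))  # d2/dx2 (axis 1 = columns)
    Lyy = gaussian_filter(img, sigma, order=(2, 0))  # d2/dy2 (axis 0 = rows)
    Lxy = gaussian_filter(img, sigma, order=(1, 1))  # mixed derivative
    # 0.9 compensates for Surf's box-filter approximation of Lxy.
    return Lxx * Lyy - (0.9 * Lxy) ** 2
```

Candidate feature points are then the local maxima of this response across position and scale.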
Precise localization of feature points:
At each candidate feature point, Taylor-series interpolation is used to determine the position and scale of the feature point, and the final selection of feature points depends on their stability.
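A sketch of one Newton step of such a Taylor-series fit, shown in two dimensions for brevity (the full method also interpolates along the scale axis; it assumes the Hessian of the response is invertible at a genuine extremum):

```python
import numpy as np

def refine_extremum(D, y, x):
    """Sub-pixel refinement of a candidate extremum of a 2-D response map D."""
    # First derivatives by central differences.
    g = np.array([(D[y + 1, x] - D[y - 1, x]) / 2.0,
                  (D[y, x + 1] - D[y, x - 1]) / 2.0])
    # Second derivatives: the 2x2 Hessian of D at the candidate.
    dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    H = np.array([[D[y + 1, x] - 2 * D[y, x] + D[y - 1, x], dxy],
                  [dxy, D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]]])
    offset = -np.linalg.solve(H, g)  # sub-pixel offset (dy, dx)
    return (y + offset[0], x + offset[1])
```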
Selecting the main direction of each feature point:
Traversing all feature points, the Surf algorithm selects and computes the main direction of each feature point from the statistics of the Haar wavelet features within the feature point's neighborhood.
All feature points are traversed to obtain the maximum scale of each feature point.
Computing the descriptor of each feature point:
First, a square neighborhood with side length 20S (S being the scale) is constructed centered on the feature point, oriented along the feature point's main direction; this neighborhood is then divided into 4×4 sub-regions, and 25 sample pixels are taken at equal intervals within each sub-region.
Then, the Haar wavelet responses of each pixel in each sub-region are computed along the directions horizontal and vertical to the main direction; the Haar wavelet feature of each sub-region consists of the sum of the horizontal responses, the sum of their absolute values, the sum of the vertical responses, and the sum of their absolute values. The Haar response of each pixel is also weighted with a Gaussian of σ = 3.3S. Finally, each sub-region is represented by a four-dimensional vector, so each feature point is represented by a 16×4 = 64-dimensional vector; this 64-dimensional feature vector is the descriptor of the feature point.
On the basis of the Surf algorithm, the video image fusion method of the embodiments of the present application improves the Surf algorithm with the Brief algorithm. The Brief algorithm describes a feature point as a binarized bit string, which makes it faster to measure how similar two descriptors are: the descriptors are XOR-ed bit by bit, and the more 1s in the result, the less similar the two descriptors are.
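A sketch of this bitwise similarity test, assuming descriptors packed into uint8 arrays; in practice OpenCV's BFMatcher with cv2.NORM_HAMMING performs the same comparison:

```python
import numpy as np

def hamming_distance(a, b):
    """XOR the two bit strings and count the 1s: more 1s, less similar."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

def best_match(query, candidates):
    """Index and distance of the most similar candidate descriptor."""
    dists = [hamming_distance(query, c) for c in candidates]
    return int(np.argmin(dists)), min(dists)
```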
The video image fusion method of the embodiments of the present application is described in detail below.
Fig. 3 shows a flowchart of the steps of a video image fusion method according to an embodiment of the present application.
As shown in Fig. 3, the video image fusion method of this embodiment specifically comprises the following steps:
S101: detecting scale-space extreme points and using the extreme points as feature points.
Specifically, detecting the scale-space extreme points comprises:
obtaining the feature points for points in the image through the Hessian matrix, which is defined as:
H(x, y, σ) = | Lxx(x, y, σ)  Lxy(x, y, σ) |
             | Lxy(x, y, σ)  Lyy(x, y, σ) |
where Lxx(x, y, σ) is the convolution of the second-order Gaussian partial derivative in x with the image at that pixel, and Lxy(x, y, σ) and Lyy(x, y, σ) are defined analogously.
S102: precisely locating the feature points to obtain the position and scale of each feature point.
S103: determining pixel point pairs at fixed positions within a region of a specific size centered on each feature point.
Preferably, the region of a specific size centered on each feature point is a circular region of a specific radius centered on the feature point.
Preferably, the region of a specific size centered on each feature point is a rectangular region of a specific length and width centered on the feature point.
S104: comparing the gray values of the pixel point pairs one by one, and generating a descriptor according to the binarized comparison results of the gray values.
S105: outputting the descriptor to a feature matching function to obtain a fused image.
Specifically, steps S103 and S104 are as follows:
After the feature point sequence is obtained, a patch region of size S×S is defined centered on each feature point.
nd pixel point pairs are selected at fixed positions within the region.
The pixel point pairs are compared according to a binarization formula to obtain binarized comparison results, and the binarized comparison results are concatenated into a binary bit string to form the descriptors of all feature points.
The binarization formula is:
τ(Pi, Qi) = 1 if I(Pi) < I(Qi), and τ(Pi, Qi) = 0 otherwise,
where I(Pi) and I(Qi) are the gray values of the two pixels Pi and Qi of the i-th pixel point pair, respectively.
After binarized comparison values amounting to several bytes have been obtained according to the formula, the comparison results of all point pairs within the patch region are concatenated into a binary bit string, which forms the descriptor B of the feature point.
The descriptor B is specifically:
B = τ(P1, Q1) τ(P2, Q2) … τ(Pnd, Qnd).
The descriptors of all feature points are obtained in sequence and arranged into a descriptor sequence for subsequent image matching.
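A sketch of steps S103 and S104 in NumPy; the patch size S, the pair count nd, and the fixed sampling pattern below are illustrative assumptions (the Brief literature also smooths the patch first, which is omitted here for brevity):

```python
import numpy as np

S, ND = 31, 256                      # assumed patch size and number of point pairs
rng = np.random.default_rng(0)
# Fixed positions: the same ND point pairs are reused for every feature point.
PAIRS = rng.integers(-(S // 2), S // 2 + 1, size=(ND, 4))

def brief_descriptor(gray, y, x):
    """Descriptor of the S x S patch centered on (y, x): one bit per pair,
    set when I(P) < I(Q). (y, x) must lie at least S // 2 pixels from the border."""
    bits = np.empty(ND, dtype=np.uint8)
    for i, (py, px, qy, qx) in enumerate(PAIRS):
        bits[i] = gray[y + py, x + px] < gray[y + qy, x + qx]
    return np.packbits(bits)         # ND bits packed into ND // 8 bytes
```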
Fig. 4 shows a flowchart of another video image fusion method.
As shown in Fig. 4, preferably, after the position and scale of each feature point are obtained in step S102, the method further comprises:
traversing all feature points and performing a mean filtering calculation on the region surrounding each feature point to obtain the maximum scale of each feature point.
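One possible reading of this step, as a hedged sketch: mean-filter the detector response around the feature point at each candidate scale and keep the scale whose smoothed response is largest (response_maps could be, for example, the determinant-of-Hessian maps sketched earlier; all names are illustrative):

```python
import cv2
import numpy as np

def max_scale(response_maps, sigmas, y, x, win=5):
    """Mean filtering over the neighborhood of a feature point at every
    candidate scale; the strongest smoothed response picks the maximum scale."""
    smoothed = [cv2.blur(r, (win, win))[y, x] for r in response_maps]
    return sigmas[int(np.argmax(smoothed))]
```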
Preferably, after the position and scale of each feature point are obtained in step S102, the method further comprises:
traversing all feature points and computing the direction of each feature point.
Preferably, the video image fusion method of the embodiments of the present application further comprises the following steps:
S106: performing motion detection on the fused image to generate a motion feature analysis of multiple targets;
S107: extracting a specific target according to the motion feature analysis;
S108: tracking the specific target and adjusting the video acquisition accordingly.
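One way steps S106 and S107 could be realized is with background subtraction on the fused stream; the subtractor parameters and the area threshold below are assumptions:

```python
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def detect_moving_targets(fused_frame, min_area=400):
    """Motion detection on the fused image: foreground blobs whose contours
    are large enough become candidate targets for tracking."""
    mask = subtractor.apply(fused_frame)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)  # drop shadows
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```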
Embodiment 2
Fig. 5 shows a schematic structural diagram of a video image fusion apparatus according to an embodiment of the present application.
As shown in Fig. 5, the video image fusion apparatus specifically comprises:
a detection module 10, configured to detect scale-space extreme points and use the extreme points as feature points;
a positioning module 20, configured to precisely locate the feature points and obtain the position and scale of each feature point;
a determination module 30, configured to determine pixel point pairs at fixed positions within a region of a specific size centered on each feature point;
a comparison module 40, configured to compare the gray values of the pixel point pairs one by one and generate a descriptor according to the binarized comparison results of the gray values;
an output module 50, configured to output the descriptor to a feature matching function to obtain a fused image.
Preferably, the determination module comprises:
a definition unit, configured to define a patch region of size S×S centered on each feature point; and
a selection unit, configured to select nd pixel point pairs at fixed positions within the region.
Preferably, the comparison module comprises:
a comparison unit, configured to compare the pixel point pairs according to the binarization formula to obtain binarized comparison results; and
an arrangement unit, configured to concatenate the binarized comparison results into a binary bit string to form the descriptors of all feature points.
Preferably, the video image fusion apparatus of the embodiments of the present application further comprises:
a motion detection module, configured to perform motion detection on the fused image to generate a motion feature analysis of multiple targets;
a target extraction module, configured to extract a specific target according to the motion feature analysis; and
a tracking module, configured to track the specific target and adjust the video acquisition accordingly.
In the embodiments of the present application, the front-end video acquisition devices are camera probes with different IP addresses connected over the network to a video surveillance fusion server, which transmits the fused image to the control service center. The control service center performs image processing such as motion detection, target extraction, and target tracking, and also includes modules such as video browsing software and pan-tilt-zoom (PTZ) control software.
Fig. 6 shows a schematic diagram of camera acquisition in a monitored area according to an embodiment of the present application.
As shown in Fig. 6, six cameras are used as the camera group; in the monitored areas of cameras 1 to 6, the coverage areas on the two sides of adjacent cameras overlap and intersect. In the present application, adjacent cameras have overlapping fields of view; during installation, maximizing installation precision and keeping the cameras on the same horizontal plane improves matching accuracy.
Embodiment 3
Fig. 7 shows a schematic structural diagram of a panoramic monitoring system according to an embodiment of the present application.
As shown in Fig. 7, this embodiment provides a panoramic monitoring system 400, which specifically comprises:
a front-end video acquisition unit 401, configured to acquire video information within the monitored area; and
a video fusion apparatus 402, being the video fusion apparatus 402 as provided above.
Fig. 8 shows a processing flowchart of a panoramic monitoring system according to an embodiment of the present application.
N cameras monitor and capture video information, and each channel of video is source-encoded separately; the N channels of encoded video data are then aggregated and video image fusion is performed. Finally, motion detection is performed on the stitched and fused image, the motion features of the multiple targets produced by each detection are analyzed, and a target exhibiting specific behavioral features within a specific scene area is taken as the tracking object. According to the target's azimuth and pitch position in the stitched image, the control center sends PTZ control commands that adjust the azimuth and pitch of the pan-tilt head so that the target lies at the center of the PTZ camera's field of view, and adjusts lens parameters such as focal length and aperture so that the target is imaged clearly, facilitating target recognition once the system is expanded.
This embodiment further provides a computer-readable storage medium on which a computer program is stored; the computer program is executed by a processor to implement the video image fusion method provided by any of the foregoing.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art may make various changes and variations to the present application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to encompass them.