CN118037549A

Movatterモバイル変換

Info

Publication number: CN118037549A
Application number: CN202410430364.7A
Authority: CN
Inventors: 郭锴凌; 刘旭彬; 黄寅; 徐向民
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2024-04-11
Filing date: 2024-04-11
Publication date: 2024-05-14
Anticipated expiration: 2044-04-11
Also published as: CN118037549B

Abstract

The invention discloses a video enhancement method and a video enhancement system based on video content understanding, relates to a video processing technology, and provides a scheme for solving the problem of unreasonable distribution of super processing resources in the prior art. Identifying key areas and non-key areas in the low-definition video through key object texts input by a user, and carrying out corresponding super-resolution reconstruction on different areas to obtain the high-definition video; and carrying out interpolation reconstruction on the non-key region, and carrying out deep learning reconstruction on the key region. The method has the advantages that the key areas and the non-key areas in the video are identified and divided according to the preference text input by the user by combining a simple and efficient traditional interpolation algorithm and a deep learning super-division technology with excellent effect. The problems of excessive processing and resource waste caused by deep learning oversubstance of the complete area and poor effect caused by completely using an interpolation algorithm are avoided. On the premise of realizing the super-resolution visual effect, the resource consumption and the processing time are saved.

Description

Translated fromChinese

一种基于视频内容理解的视频增强方法及系统A video enhancement method and system based on video content understanding

技术领域Technical Field

本发明涉及视频处理技术，尤其涉及一种基于视频内容理解的视频增强方法及系统。The present invention relates to video processing technology, and in particular to a video enhancement method and system based on video content understanding.

背景技术Background technique

随着通信、电子技术的发展，网络的传输速度飞速增长，越来越多的显示设备支持显示2K、4K等分辨率的视频，用户对于高清晰度的视频需求日益增加。然而，受限于视频的制作时期及拍摄设备，现有的大量视频仍然是较低分辨率的视频，因此，实现视频超分辨率重建，提升视频的质量，是十分必要的。With the development of communication and electronic technology, the transmission speed of the network has increased rapidly, and more and more display devices support the display of 2K, 4K and other resolution videos. Users' demand for high-definition videos is increasing. However, due to the limitations of video production period and shooting equipment, a large number of existing videos are still low-resolution videos. Therefore, it is very necessary to achieve video super-resolution reconstruction and improve video quality.

视频超分辨率重建是指将分辨率较低的视频通过上采样方法转化为具有高分辨率的高清视频。传统视频超分辨率方法如插值等是利用目标像素的周围像素进行插值计算得到新的像素值，这类方法简单、高效，能快速实现视频上采样，并在一定程度上改善视频的视觉质量，然而由于缺乏对图像整体以及视频不同帧之间信息的考虑，处理后的视频往往在细节和清晰度方面不尽人意。Video super-resolution reconstruction refers to converting a low-resolution video into a high-definition video with high resolution through upsampling methods. Traditional video super-resolution methods such as interpolation use the surrounding pixels of the target pixel to interpolate and calculate the new pixel value. This type of method is simple and efficient, can quickly achieve video upsampling, and improve the visual quality of the video to a certain extent. However, due to the lack of consideration of the overall image and the information between different frames of the video, the processed video is often unsatisfactory in terms of details and clarity.

此外，随着深度学习技术的发展，基于深度学习的视频超分方法已成为目前视频超分辨率重建的主流方法。基于深度学习的超分辨率重建方法旨在通过训练神经网络模型来学习低分辨率视频及与其对应的高分辨率视频之间的映射关系。当训练完成后，该模型可以将低分辨率视频作为输入，从而生成对应的高分辨率视频。基于深度学习的视频超分辨率重建方法能够实现高精度的图像恢复，提供更加清晰和逼真的图像效果，然而，由于神经网络对于整幅图像的推理过程通常需要大量的计算资源，故对硬件设备的性能有较高要求，处理耗时也很长。In addition, with the development of deep learning technology, the video super-resolution method based on deep learning has become the mainstream method of video super-resolution reconstruction. The super-resolution reconstruction method based on deep learning aims to learn the mapping relationship between low-resolution videos and their corresponding high-resolution videos by training a neural network model. After the training is completed, the model can use low-resolution videos as input to generate corresponding high-resolution videos. The video super-resolution reconstruction method based on deep learning can achieve high-precision image restoration and provide clearer and more realistic image effects. However, since the reasoning process of the neural network for the entire image usually requires a lot of computing resources, it has high requirements on the performance of the hardware equipment and the processing time is also long.

由于人的注意力机制，人们只对外界信息进行选择性关注和处理，所以在实际中，人们往往只对视频中的主要部分（例如人物、物体）更加关注，而对背景部分的要求则相对较低。传统插值算法在处理速度上具有优势，但无法充分满足对于视频重点区域的质量要求，而深度学习方法虽然能够实现优异的超分效果，但其对不重要的背景区域进行了过度处理，资源消耗较大，造成了计算资源的浪费，难以在实际中进行广泛应用。Due to the human attention mechanism, people only pay selective attention to and process external information. Therefore, in practice, people tend to pay more attention to the main parts of the video (such as people and objects), while the requirements for the background are relatively low. Traditional interpolation algorithms have advantages in processing speed, but cannot fully meet the quality requirements for key areas of the video. Although deep learning methods can achieve excellent super-resolution effects, they over-process unimportant background areas, consume a lot of resources, and waste computing resources, making it difficult to be widely used in practice.

发明内容Summary of the invention

本发明目的在于提供一种基于视频内容理解的视频增强方法及系统，以解决上述现有技术存在的问题。The present invention aims to provide a video enhancement method and system based on video content understanding to solve the problems existing in the above-mentioned prior art.

本发明中所述一种基于视频内容理解的视频增强方法，通过用户输入的关键物体文本识别低清视频中的关键区域和非关键区域，并对不同区域进行对应的超分辨率重建以得到高清视频；其中，对非关键区域进行插值重建，对关键区域进行深度学习重建。The video enhancement method based on video content understanding described in the present invention identifies key areas and non-key areas in a low-definition video through key object text input by a user, and performs corresponding super-resolution reconstruction on different areas to obtain a high-definition video; wherein, interpolation reconstruction is performed on non-key areas, and deep learning reconstruction is performed on key areas.

具体包括以下步骤：The specific steps include:

S100：将待增强的低清视频进行逐帧裁切处理，得到低清图像序列帧；S100: performing frame-by-frame cropping on the low-definition video to be enhanced to obtain a low-definition image sequence frame;

S200：获取用户输入的关键物体文本；S200: Obtain key object text input by the user;

S300：根据关键物体文本识别出低清图像序列帧的关键区域和非关键区域；S300: identifying key areas and non-key areas of low-definition image sequence frames according to key object texts;

S400：将关键区域输入视频超分网络得到关键区域超分结果；S400: inputting the key region into the video super-resolution network to obtain a key region super-resolution result;

S500：对非关键区域利用插值算法得到非关键区域超分结果；S500: using an interpolation algorithm to obtain a super-resolution result of the non-critical area;

S600：逐帧对关键区域超分结果和非关键区域超分结果进行融合；S600: fusing the super-resolution results of the key area and the super-resolution results of the non-key area frame by frame;

S700：将融合后的超分图像序列帧转化为视频，得到所述高清视频。S700: Convert the fused super-resolution image sequence frames into a video to obtain the high-definition video.

在所述步骤S300中，将低清图像序列帧以及关键物体文本输入至基于语言引导的开集目标检测器网络中，输出得到与低清视频逐帧对应的包含关键物体的矩形边界框；矩形边界框内的区域为关键区域，矩形边界框以外的区域为非关键区域。In step S300, the low-definition image sequence frames and the key object text are input into a language-guided open-set target detector network, and a rectangular bounding box containing the key object corresponding to each frame of the low-definition video is output; the area within the rectangular bounding box is the key area, and the area outside the rectangular bounding box is the non-key area.

在所述步骤S400中，将低清图像序列帧中的每N帧为一组进行分组；在每组低清图像序列帧中遍历N帧图像内的关键区域矩形边界框，找到完全包含此N个矩形边界框的最小矩形边界框；利用所述最小矩形边界框对当前组内的N帧图像序列进行裁剪；将剪后的每组低清图像序列帧输入至视频超分网络中得到所述关键区域超分结果。In the step S400, every N frames in the low-definition image sequence frames are grouped as a group; in each group of low-definition image sequence frames, the key area rectangular bounding boxes in the N frame images are traversed to find the minimum rectangular bounding box that completely contains the N rectangular bounding boxes; the N frame image sequence in the current group is cropped using the minimum rectangular bounding box; each group of low-definition image sequence frames after cropping is input into the video super-resolution network to obtain the key area super-resolution result.

在所述步骤S500中，所述插值算法为双三次插值算法。In the step S500, the interpolation algorithm is a bicubic interpolation algorithm.

在所述步骤S600中，根据上采样倍数得到插值上采样图像中的关键区域位置；然后，逐帧将超分辨率重建后的关键区域图像加权融合至所述插值上采样图像的关键区域位置，得到融合后的超分图像序列帧。In step S600, the key area position in the interpolated upsampled image is obtained according to the upsampling multiple; then, the key area image after super-resolution reconstruction is weightedly fused to the key area position of the interpolated upsampled image frame by frame to obtain a fused super-resolution image sequence frame.

加权融合采用渐变平滑加权融合算法。Weighted fusion uses a gradual smoothing weighted fusion algorithm.

本发明中所述一种基于视频内容理解的视频增强系统，采用如上所述一种基于视频内容理解的视频增强方法进行视频分辨率增强处理。The video enhancement system based on video content understanding described in the present invention adopts the video enhancement method based on video content understanding described above to perform video resolution enhancement processing.

具体包括：Specifically include:

视频关键区域识别模块：用于接收用户输入的关键物体文本，并根据用户输入的偏好文本识别出视频中包含关键物体的关键区域；Video key area recognition module: used to receive key object text input by the user, and recognize the key area containing the key object in the video according to the preference text input by the user;

深度学习视频超分模块：利用基于深度学习的视频超分网络对低清视频中包含关键物体的关键区域进行超分辨率重建得到关键区域超分结果；Deep learning video super-resolution module: Use the deep learning-based video super-resolution network to perform super-resolution reconstruction on the key areas containing key objects in low-definition videos to obtain super-resolution results of key areas;

插值模块：利用插值算法对低清视频进行上采样得到非关键区域超分结果；Interpolation module: Use interpolation algorithm to upsample low-definition video to obtain super-resolution results of non-critical areas;

融合模块：对关键区域超分结果和非关键区域超分结果进行融合得到所述高清视频。Fusion module: fuses the key area super-resolution results and the non-key area super-resolution results to obtain the high-definition video.

本发明中所述一种基于视频内容理解的视频增强方法及系统，其优点在于，结合简单高效的传统插值算法和效果优异的深度学习超分技术，根据用户输入的偏好文本识别并划分出视频中的关键区域和非关键区域。对非关键区域采用传统插值算法，而对用户偏好的关键区域采用效果优异的深度学习超分辨率重建算法，避免了对于完整区域进行基于深度学习超分而造成的过度处理和资源浪费、以及完全使用插值算法而效果不佳的问题。在实现超分视觉效果的前提下，节省了资源消耗以及处理时间。The video enhancement method and system based on video content understanding described in the present invention has the advantage of combining a simple and efficient traditional interpolation algorithm with an excellent deep learning super-resolution technology to identify and divide the key areas and non-key areas in the video according to the preference text input by the user. The traditional interpolation algorithm is used for the non-key areas, while the excellent deep learning super-resolution reconstruction algorithm is used for the key areas preferred by the user, thereby avoiding the problems of excessive processing and resource waste caused by deep learning super-resolution for the entire area, and poor results caused by the complete use of the interpolation algorithm. On the premise of achieving super-resolution visual effects, resource consumption and processing time are saved.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明中所述视频增强方法的整体流程示意图。FIG. 1 is a schematic diagram of the overall flow of the video enhancement method described in the present invention.

图2是本发明中所述视频增强方法的关键区域超分结果产生流程示意图。FIG. 2 is a schematic diagram of a flow chart of a key area super-resolution result generation process of the video enhancement method described in the present invention.

图3是本发明中所述视频增强方法的融合流程示意图。FIG3 is a schematic diagram of the fusion process of the video enhancement method described in the present invention.

图4是本发明中所述视频增强方法的融合权重取值示意图。FIG. 4 is a schematic diagram of fusion weight values of the video enhancement method described in the present invention.

图5是本发明中所述视频增强系统的结构示意图。FIG5 is a schematic diagram of the structure of the video enhancement system of the present invention.

具体实施方式Detailed ways

在本实施例中，设定图像序列共有帧，第/>帧图像为/>，其中/>为通道数，/>为图像高度，视频上采样倍数为/>。In this embodiment, it is assumed that the image sequence has Frame, No./> The frame image is/> , where/> is the number of channels, /> is the image height, the video upsampling factor is/> .

如图1至图4所示，本发明中所述一种基于视频内容理解的视频增强方法包括以下步骤：As shown in FIG. 1 to FIG. 4 , a video enhancement method based on video content understanding described in the present invention comprises the following steps:

S100：将待增强的低清视频进行逐帧裁切处理，得到低清图像序列帧。S100: The low-definition video to be enhanced is cropped frame by frame to obtain a low-definition image sequence frame.

S200：设定用户感兴趣的关键物体文本。S200: Setting key object text that the user is interested in.

S300：根据关键物体文本识别出低清图像序列帧的关键区域和非关键区域。S300: identifying key areas and non-key areas of low-definition image sequence frames according to key object text.

将低清图像序列帧以及关键物体文本输入至预先构建的基于语言引导的开集目标检测器网络中，输出得到与低清视频逐帧对应的包含关键物体的矩形边界框。其中，矩形边界框内的区域为关键区域，矩形边界框以外的区域为非关键区域。The low-definition image sequence frames and key object texts are input into the pre-built language-guided open-set object detector network, and the output is a rectangular bounding box containing the key object corresponding to each frame of the low-definition video. The area inside the rectangular bounding box is the key area, and the area outside the rectangular bounding box is the non-key area.

对关键区域的识别基于公开的预训练的开集目标检测器网络GroundingDINO。矩形边界框以左上角、右下角顶点坐标的形式表示。低清图像序列帧经过开集目标检测器后，输出的第帧图像的包含关键物体的矩形边界框可表示为/>,其中分别代表矩形边界框的左上角顶点和右下角顶点坐标。The recognition of key areas is based on the public pre-trained open set object detector network GroundingDINO. The rectangular bounding box is represented in the form of the coordinates of the upper left corner and the lower right corner. After the low-definition image sequence frame passes through the open set object detector, the output The rectangular bounding box of the frame image containing the key object can be expressed as/> ,in Respectively represent the coordinates of the upper left corner and lower right corner of the rectangular bounding box.

S400：将关键区域输入预先构建的视频超分网络得到关键区域超分结果，具体包括以下子步骤：S400: inputting the key region into a pre-built video super-resolution network to obtain a key region super-resolution result, specifically including the following sub-steps:

S401：将图像序列每N帧为一组进行分组。S401: Grouping the image sequence into groups of N frames.

S402：对于每组图像序列帧，遍历N帧图像内的关键区域矩形边界框，找到完全包含此N个矩形边界框的最小矩形边界框。S402: For each group of image sequence frames, traverse the key area rectangular bounding boxes in the N frames of images to find the smallest rectangular bounding box that completely contains the N rectangular bounding boxes.

S403：利用此最小矩形边界框对此组内的N帧图像序列进行裁剪，得到裁剪后的组内尺寸相同的图像序列帧。S403: Using the minimum rectangular boundary box to crop the N image sequences in the group, to obtain cropped image sequence frames of the same size in the group.

S404：每N帧为一组，将每组裁剪后的组内尺寸相同的图像序列输入至预先构建的视频超分网络中，输出得到上采样的关键区域图像序列帧。S404: Every N frames form a group, and input each group of cropped image sequences with the same size into a pre-built video super-resolution network, and output an up-sampled key area image sequence frame.

在本实施例中，对关键区域的超分辨率重建基于公开的预训练的BasicVSR模型实现。第组图像帧/>为/>，对应的矩形边界框为/>，最小矩形边界框的左上角、右下角顶点坐标/>通过如下方式计算：In this embodiment, super-resolution reconstruction of key areas is implemented based on a public pre-trained BasicVSR model. Group image frames/> For/> , the corresponding rectangular bounding box is/> , the coordinates of the upper left corner and lower right corner of the minimum rectangular bounding box/> It is calculated as follows:

。 .

第组裁剪后的相同尺寸的图像序列帧为：No. The image sequence frames of the same size after group cropping are:

。 .

经过视频超分网络输出的第组视频帧为：/>。After the video super-resolution network output The video frame is:/> .

S500：对非关键区域利用插值算法得到非关键区域超分结果。在本实施例中，对非关键区域插值可利用双线性插值或双三次插值算法，来根据已有像素快速推算出上采样后未知的像素值。S500: Using an interpolation algorithm to obtain a super-resolution result of the non-key region. In this embodiment, the interpolation of the non-key region can use a bilinear interpolation algorithm or a bicubic interpolation algorithm to quickly calculate the unknown pixel values after upsampling based on the existing pixels.

S600：逐帧对关键区域超分结果和非关键区域超分结果融合，具体包括以下子步骤：S600: fusing the super-resolution results of the key area and the super-resolution results of the non-key area frame by frame, specifically including the following sub-steps:

S601：根据上采样倍数得到插值上采样图像中关键区域的位置。S601: Obtaining the position of a key area in the interpolated up-sampled image according to the up-sampling multiple.

S602：逐帧将超分辨率重建后的关键区域图像加权融合至插值上采样图像的关键区域处，得到融合后的图像序列帧。S602: weightedly fuse the key region images after super-resolution reconstruction to the key regions of the interpolated up-sampled images frame by frame to obtain fused image sequence frames.

为了减少融合后关键区域边缘处的边界感，选用渐变平滑加权融合算法。In order to reduce the boundary feeling at the edge of the key area after fusion, the gradient smoothing weighted fusion algorithm is selected.

如图4所示，渐变平滑加权融合是指在对插值上采样图像中关键区域位置图像和关键区域超分后图像进行融合时，设定过渡距离为N个像素。从关键区域边界往中心方向至过渡边界处逐渐增加关键区域超分后图像的融合权重，越靠近关键区域边界，其权重值越小、越接近0，越靠近过渡边界处，其权重值越大、越接近1，而过渡边界处至关键区域中心处的融合权重全部为1。As shown in Figure 4, the gradual smooth weighted fusion means that when the key area position image in the interpolated upsampled image and the key area super-resolution image are fused, the transition distance is set to N pixels. The fusion weight of the key area super-resolution image is gradually increased from the key area boundary to the center to the transition boundary. The closer to the key area boundary, the smaller the weight value and the closer to 0. The closer to the transition boundary, the larger the weight value and the closer to 1. The fusion weights from the transition boundary to the center of the key area are all 1.

设定第帧插值上采样图像为/>,经视频超分网络输出的第/>帧关键区域图像为/>，第/>帧低清图像的关键区域表示为/>，则在上采样后根据上采样倍数/>，第/>帧插值上采样图像中关键区域位置图像可表示为。设定关键区域超分后图像的融合权重所组成的权重矩阵为/>，则第/>帧的输出融合图像/>通过以下方式得到：Set The frame interpolation upsampled image is/> , output by the video super-resolution network/> The key area image of the frame is/> , No./> The key area of the low-definition frame image is represented as/> , then after upsampling according to the upsampling multiple/> , No./> The key area position image in the frame interpolation upsampled image can be expressed as . Set the weight matrix composed of the fusion weights of the key area super-resolution image as/> , then the first/> Output fused image of the frame/> Obtained by:

； ;

其中表示逐像素点运算。in Represents pixel-by-pixel operation.

S700：最后，将融合后的超分图像序列帧转化为视频，得到高清视频。S700: Finally, the fused super-resolution image sequence frames are converted into a video to obtain a high-definition video.

如图5所示，本发明还公开了一种基于视频内容理解的视频增强系统，具体包括：As shown in FIG5 , the present invention further discloses a video enhancement system based on video content understanding, which specifically includes:

视频关键区域识别模块：用于接收用户输入的关键物体文本，并根据用户输入的偏好文本识别出视频中包含关键物体的关键区域。Video key area recognition module: used to receive key object text input by the user, and recognize the key area containing the key object in the video according to the preference text input by the user.

深度学习视频超分模块：利用基于深度学习的视频超分网络对视频中包含关键物体的关键区域进行超分辨率重建。Deep learning video super-resolution module: Use the deep learning-based video super-resolution network to perform super-resolution reconstruction on key areas of the video containing key objects.

插值模块：利用传统插值算法对视频进行上采样得到插值上采样视频。Interpolation module: Use the traditional interpolation algorithm to upsample the video to obtain the interpolated upsampled video.

融合模块：将关键区域的超分辨率重建结果与通过传统插值算法进行上采样的结果进行融合得到所述高清视频。Fusion module: The super-resolution reconstruction result of the key area is fused with the result of upsampling by the traditional interpolation algorithm to obtain the high-definition video.

其中，各模块可以是软件模块或者是硬件模块。当基于视频内容理解的视频增强系统运行时，可以执行本实施例中的视频增强方法。具体地，视频关键区域识别模块执行步骤S300，深度学习超分模块执行步骤S400，插值模块执行步骤S500，融合模块执行步骤S600，从而实现基于视频内容理解的视频增强处理。Among them, each module can be a software module or a hardware module. When the video enhancement system based on video content understanding is running, the video enhancement method in this embodiment can be executed. Specifically, the video key area recognition module executes step S300, the deep learning super-resolution module executes step S400, the interpolation module executes step S500, and the fusion module executes step S600, thereby realizing video enhancement processing based on video content understanding.

对于本领域的技术人员来说，可根据以上描述的技术方案以及构思，做出其它各种相应的改变以及形变，而所有的这些改变以及形变都应该属于本发明权利要求的保护范围之内。For those skilled in the art, various other corresponding changes and deformations can be made according to the technical solutions and concepts described above, and all of these changes and deformations should fall within the protection scope of the claims of the present invention.

Claims

Translated fromChinese

1.一种基于视频内容理解的视频增强方法，其特征在于，通过用户输入的关键物体文本识别低清视频中的关键区域和非关键区域，并对不同区域进行对应的超分辨率重建以得到高清视频；1. A video enhancement method based on video content understanding, characterized in that the key area and non-key area in the low-definition video are identified by the key object text input by the user, and the corresponding super-resolution reconstruction is performed on different areas to obtain a high-definition video;

其中，对非关键区域进行插值重建，对关键区域进行深度学习重建；Among them, interpolation reconstruction is performed on non-critical areas, and deep learning reconstruction is performed on critical areas;

具体包括以下步骤：The specific steps include:

S700：将融合后的超分图像序列帧转化为视频，得到所述高清视频；S700: Converting the fused super-resolution image sequence frames into a video to obtain the high-definition video;

在所述步骤S300中，将低清图像序列帧以及关键物体文本输入至基于语言引导的开集目标检测器网络中，输出得到与低清视频逐帧对应的包含关键物体的矩形边界框；矩形边界框内的区域为关键区域，矩形边界框以外的区域为非关键区域；In the step S300, the low-definition image sequence frames and the key object text are input into the language-guided open set object detector network, and a rectangular bounding box containing the key object corresponding to each frame of the low-definition video is output; the area within the rectangular bounding box is the key area, and the area outside the rectangular bounding box is the non-key area;

在所述步骤S400中，将低清图像序列帧中的每N帧为一组进行分组；在每组低清图像序列帧中遍历N帧图像内的关键区域矩形边界框，找到完全包含此N个矩形边界框的最小矩形边界框；利用所述最小矩形边界框对当前组内的N帧图像序列进行裁剪；将剪后的每组低清图像序列帧输入至视频超分网络中得到所述关键区域超分结果；In the step S400, each N frames in the low-definition image sequence frames are grouped as a group; in each group of low-definition image sequence frames, the key area rectangular bounding boxes in the N frames are traversed to find the minimum rectangular bounding box that completely contains the N rectangular bounding boxes; the N frame image sequence in the current group is cropped using the minimum rectangular bounding box; each group of low-definition image sequence frames after cropping is input into the video super-resolution network to obtain the key area super-resolution result;

2.根据权利要求1所述一种基于视频内容理解的视频增强方法，其特征在于，在所述步骤S500中，所述插值算法为双三次插值算法。2. According to the video enhancement method based on video content understanding in claim 1, it is characterized in that in the step S500, the interpolation algorithm is a bicubic interpolation algorithm.

3.根据权利要求1所述一种基于视频内容理解的视频增强方法，其特征在于，加权融合采用渐变平滑加权融合算法。3. According to the video enhancement method based on video content understanding described in claim 1, it is characterized in that weighted fusion adopts a gradual smoothing weighted fusion algorithm.

4.一种基于视频内容理解的视频增强系统，其特征在于，采用如权利要求1-7任一项所述一种基于视频内容理解的视频增强方法进行视频分辨率增强处理。4. A video enhancement system based on video content understanding, characterized in that video resolution enhancement processing is performed using a video enhancement method based on video content understanding as described in any one of claims 1 to 7.

5.根据权利要求4所述一种基于视频内容理解的视频增强系统，其特征在于，包括：5. The video enhancement system based on video content understanding according to claim 4, characterized in that it comprises: