CN115018706A - Method for converting horizontal-screen video material into vertical-screen video material - Google Patents

Method for converting horizontal-screen video material into vertical-screen video material

Info

Publication number
CN115018706A
Authority
CN
China
Prior art keywords
frame
video material
video
screen video
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210612330.0A
Other languages
Chinese (zh)
Other versions
CN115018706B (en)
Inventor
王灿进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhua Fusion Media Technology Development Beijing Co ltd
Xinhua Zhiyun Technology Co ltd
Original Assignee
Xinhua Zhiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhua Zhiyun Technology Co ltd
Priority to CN202210612330.0A
Publication of CN115018706A
Application granted
Publication of CN115018706B
Status: Active
Anticipated expiration

Abstract




The present application relates to a method for converting horizontal-screen video material into vertical-screen video material. By identifying at least one transition time node in the horizontal-screen material, each transition position can be located automatically. A subject-recognition algorithm based on moving-object saliency identifies the picture subject in each short horizontal-screen material, automatically selecting salient subjects with the help of motion information. Long-term frame-by-frame tracking of the picture subject ensures that, even after the subject moves out of the picture, it can be re-identified and tracked when it enters a second time. Finally, short vertical-screen materials are generated from the tracking results and spliced into one complete vertical-screen video material, so that the final vertical-screen material retains all of the subject information of the horizontal-screen material without losing any of it.


Description

Method of converting horizontal-screen video material into vertical-screen video material

Technical Field

The present application relates to the technical field of video processing, and in particular to a method for converting horizontal-screen video material into vertical-screen video material.

Background

With the popularization of mobile terminals such as mobile phones and tablet computers, demand for vertical-screen video material keeps growing. However, the material shot by video cameras and SLR cameras is horizontal-screen material with a 16:9 aspect ratio. Efficiently converting horizontal-screen video material into vertical-screen material, so as to improve the viewing experience on mobile terminals, has become an urgent need in the fields of video processing and video editing.

Traditional solutions generally rely on a video editor to convert horizontal-screen video material into vertical-screen video material.

The horizontal-to-vertical function in existing video editors is basically manual: the picture is rotated directly, or cropped at its center. This causes part of the subject content to be lost, so the resulting vertical-screen material cannot retain all of the subject information in the video.

Summary of the Invention

In view of this, it is necessary to provide a method for converting horizontal-screen video material into vertical-screen video material that addresses the problem of traditional conversion methods: they lose part of the subject content, so the resulting vertical-screen material cannot retain all of the subject information in the video.

The present application provides a method for converting horizontal-screen video material into vertical-screen video material, the method comprising:

obtaining horizontal-screen video material and identifying at least one transition time node in it;

dividing the horizontal-screen video material into multiple short horizontal-screen video materials according to the at least one transition time node;

in each short horizontal-screen video material, identifying the picture subject using a subject-recognition algorithm based on moving-object saliency;

tracking the picture subject in each short horizontal-screen video material frame by frame;

generating a short vertical-screen video material from the frame-by-frame tracking result of the picture subject in each short horizontal-screen video material, obtaining multiple short vertical-screen video materials; and

splicing all short vertical-screen video materials into one complete vertical-screen video material.

The present application thus relates to a method for converting horizontal-screen video material into vertical-screen video material. Identifying at least one transition time node makes it possible to locate each transition position in the material automatically. The subject-recognition algorithm based on moving-object saliency identifies the picture subject in each short horizontal-screen material, automatically selecting salient subjects with the help of motion information. Long-term frame-by-frame tracking ensures that a subject can be re-identified and tracked even after it moves out of the picture and enters a second time. Finally, short vertical-screen materials are generated from the tracking results and spliced into one complete vertical-screen video material, which therefore retains all of the subject information of the horizontal-screen material without losing any of it.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a method for converting horizontal-screen video material into vertical-screen video material according to an embodiment of the present application.

FIG. 2 is a schematic diagram of the positional relationship among the picture subject, the preset search radius, and the local search area in horizontal-screen video material according to an embodiment of the present application.

FIG. 3 is a schematic diagram of the selection of different local sub-search areas around the picture subject in horizontal-screen video material according to an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present application, not to limit it.

The present application provides a method for converting horizontal-screen video material into vertical-screen video material. It should be noted that the method is applicable to horizontal-screen video material shot by any shooting device.

In addition, the method does not restrict its executing entity. Optionally, the executing entity may be a horizontal-to-vertical processing terminal.

As shown in FIG. 1, in an embodiment of the present application, the method includes the following steps S100 to S600.

S100: acquire horizontal-screen video material and identify at least one transition time node in it.

Specifically, each horizontal-screen video material to be processed may be spliced together from short horizontal-screen materials shot by different cameras or lenses, so the picture content is discontinuous between the short materials and the picture subject may change accordingly. For example, a video may be shot with 5 cameras set up at 5 different positions, in which case the final horizontal-screen material is interspersed and spliced from the footage of those 5 cameras. Alternatively, the same camera may be used with 5 different lenses in succession.

Therefore, this embodiment first needs to identify the transition positions in the horizontal-screen video material, i.e. the transition time nodes, so that the material can be divided into multiple short horizontal-screen video materials.

S200: divide the horizontal-screen video material into multiple short horizontal-screen video materials according to the at least one transition time node.

Specifically, dividing the material in this way guarantees that the picture within each short horizontal-screen video material is continuous.

S300: in each short horizontal-screen video material, identify the picture subject using a subject-recognition algorithm based on moving-object saliency.

Specifically, the picture subject is the most important element of the picture. It is not limited to people, animals, or plants; it can be any target that is visually salient. Optionally, in this application the picture subject takes the form of a rectangular box together with the video picture displayed inside it.

S400: track the picture subject in each short horizontal-screen video material frame by frame.

Specifically, the frame-by-frame tracking in this step ensures that even if the picture subject moves out of the picture and then enters a second time, the method can track it again.

S500: generate a short vertical-screen video material from the frame-by-frame tracking result of the picture subject in each short horizontal-screen video material, obtaining multiple short vertical-screen video materials.

Specifically, this step is equivalent to extracting the picture subject from each short horizontal-screen video material and converting it into a short vertical-screen video material.

S600: splice all short vertical-screen video materials into one complete vertical-screen video material.

Specifically, when splicing the short vertical-screen video materials, they are also joined in chronological order.

In this embodiment, identifying at least one transition time node in the horizontal-screen video material makes it possible to locate each transition position automatically; the subject-recognition algorithm based on moving-object saliency identifies the picture subject in each short horizontal-screen material, automatically selecting salient subjects with the help of motion information; long-term frame-by-frame tracking ensures that a subject can be re-identified and tracked even after it moves out of the picture and enters a second time; finally, short vertical-screen materials are generated from the tracking results and spliced into one complete vertical-screen video material, so that the final material retains all of the subject information of the horizontal-screen material without losing any of it.

In an embodiment of the present application, S100 includes the following steps S110 to S145.

S110: parse the horizontal-screen video material to obtain multiple video frames.

Specifically, a piece of horizontal-screen video material consists of multiple video frames.

S121: perform a difference calculation on every two consecutive video frames to obtain the difference image of each pair.

Specifically, the difference calculation subtracts, pixel by pixel, the values of co-located pixels in the two video frames; the resulting pixel differences form a new image, the difference image. A difference image shows where two frames are similar, highlights where they change, and can reveal the outline of a moving object.

For example, suppose video frames A and B are two consecutive frames, each with 10 pixels (the count is reduced for convenience of explanation; a real frame has far more than 10 pixels). Subtract the value of pixel 1 of frame B from pixel 1 of frame A, then pixel 2 from pixel 2, and so on, until pixel 10 has been subtracted from pixel 10. The resulting 10 pixel differences form an image: the difference image.
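The difference calculation above can be sketched as follows (an illustrative sketch, not the patent's implementation: frames are modeled as flat lists of 8-bit gray values, and absolute differences are assumed, since the text does not specify a sign convention):

```python
def difference_image(frame_a, frame_b):
    """Pixel-wise difference of two equally sized frames, modeled as
    flat lists of 8-bit gray values; absolute values make the result
    independent of subtraction order (an assumption)."""
    return [abs(a - b) for a, b in zip(frame_a, frame_b)]

# Two consecutive 10-pixel frames in which a bright object (value 200)
# moves two pixels to the right; the difference image highlights the
# object's old and new positions and is zero elsewhere.
frame_a = [10, 10, 10, 10, 10, 200, 200, 10, 10, 10]
frame_b = [10, 10, 10, 10, 10, 10, 10, 200, 200, 10]
diff = difference_image(frame_a, frame_b)
# -> [0, 0, 0, 0, 0, 190, 190, 190, 190, 0]
```

The nonzero entries trace the moving object's outline, which is the property the later steps exploit.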

S122: calculate the sum of the values of all pixels in each difference image to obtain the pixel sum of that difference image.

Specifically, for each difference image, the values of all of its pixels are added together to obtain its pixel sum.

S123: perform a difference calculation on the pixel sums of every two adjacent difference images, and define the absolute value of the resulting difference as the second difference sum.

Specifically, suppose for example that the horizontal-screen material has 16 video frames. The difference calculation on frames 1 and 2 yields difference image A, and the calculation on frames 2 and 3 yields difference image B; A and B are then two adjacent difference images.
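Steps S121 to S123 can be sketched end to end as follows (again an illustrative sketch with frames as flat lists of gray values and absolute pixel differences assumed):

```python
def difference_image(frame_a, frame_b):
    """S121: pixel-wise absolute difference of two consecutive frames."""
    return [abs(a - b) for a, b in zip(frame_a, frame_b)]

def second_difference_sums(frames):
    """S121-S123: build the difference image of every pair of
    consecutive frames, sum each difference image's pixels (S122),
    then take the absolute difference of adjacent pixel sums (S123)."""
    diffs = [difference_image(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    pixel_sums = [sum(d) for d in diffs]
    return [abs(pixel_sums[i + 1] - pixel_sums[i])
            for i in range(len(pixel_sums) - 1)]
```

A hard cut produces one large pixel sum flanked by small ones, so both second difference sums around the cut are large, whereas smooth camera motion yields similar adjacent pixel sums and therefore small second difference sums.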

S131: select a second difference sum and determine whether it is greater than a preset second-difference-sum threshold.

Specifically, if the second difference sum is less than or equal to the preset second-difference-sum threshold, it is discarded.

S132: if the second difference sum is greater than the preset second-difference-sum threshold, take the time node corresponding to that second difference sum as a candidate transition time node.

Specifically, if, for example, the second difference sum of difference images A and B is greater than the preset threshold, the time node corresponding to that second difference sum is taken as a candidate transition time node.

The time node corresponding to the second difference sum of difference images A and B is defined as follows.

First compare the pixel sum of difference image A with that of difference image B. If A's pixel sum is larger, take frame 2, the later of the two frames (frames 1 and 2) from which A was computed, as the node corresponding to the second difference sum; that is, the time node of frame 2 becomes the candidate transition time node.

If B's pixel sum is larger, take frame 3, the later of the two frames (frames 2 and 3) from which B was computed; that is, the time node of frame 3 becomes the candidate transition time node.

In other words, the node is chosen from the pair of frames behind the difference image with the larger pixel sum, and, within that pair, from the later of the two frames.
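The selection rule of S132 can be written compactly (a sketch with 0-based frame indices; `pixel_sums[i]` denotes the pixel sum of the difference image built from frames i and i+1, and resolving ties toward the earlier pair is an assumption, since the text does not cover ties):

```python
def candidate_frame(i, pixel_sums):
    """Given adjacent difference images i and i+1 (built from frame
    pairs (i, i+1) and (i+1, i+2)), return the candidate transition
    frame: the later frame of the pair behind the larger pixel sum."""
    if pixel_sums[i] >= pixel_sums[i + 1]:
        return i + 1  # later frame of the pair (i, i+1)
    return i + 2      # later frame of the pair (i+1, i+2)
```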

S133: return to S131 until every second difference sum has been examined, obtaining multiple candidate transition time nodes.

Specifically, the resulting transition time node can never be the time node of the first video frame, but it may be the time node of the last video frame.

S141: sort the candidate transition time nodes in chronological order.

S142: take the first candidate transition time node as the time anchor.

S143: select a preset time period after the time anchor.

S144: search all candidate transition time nodes within the preset time period after the time anchor, and select the candidate with the largest second difference sum as a transition time node.

S145: take that transition time node as the new time anchor and return to S143 until all transition time nodes have been obtained.

The working principle of S141 to S144 is as follows. Suppose we obtain 5 candidate transition time nodes, arrange them in chronological order, take candidate node 1 as the time anchor, and set the preset time period to 2 seconds. Searching all candidate nodes within 2 seconds after node 1 finds candidate nodes 2 and 3; since the second difference sum of node 3 is greater than that of node 2, node 3 is taken as the first transition time node.

This is because the larger the second difference sum, the greater the change in picture content at the candidate transition time node.

Node 3 is then used as the new time anchor, all candidate nodes within 2 seconds after it are searched, and the above steps are repeated until all transition time nodes have been obtained.
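Steps S141 to S145 amount to a greedy suppression over the candidates. A sketch (times in seconds; keeping an isolated candidate, one with no neighbors inside its window, as a transition itself is an assumption, since the text does not cover that case):

```python
def select_transitions(candidates, window=2.0):
    """S141-S145: candidates is a list of (time, second_diff_sum)
    pairs. Starting from the earliest candidate as the time anchor,
    the candidate with the largest second difference sum inside the
    window after the anchor becomes the next transition node and the
    new anchor; an isolated candidate is kept as-is (assumed)."""
    remaining = sorted(candidates)          # chronological order (S141)
    transitions = []
    anchor = remaining[0][0] if remaining else 0.0   # S142
    while remaining:
        in_window = [c for c in remaining
                     if anchor < c[0] <= anchor + window]   # S143
        if in_window:
            t = max(in_window, key=lambda c: c[1])[0]       # S144
        else:
            t = remaining[0][0]             # assumed fallback
        transitions.append(t)
        anchor = t                          # S145
        remaining = [c for c in remaining if c[0] > t]
    return transitions
```

With candidates at 0 s, 1 s, and 1.5 s (sums 5, 7, 9) and a 2-second window, only the 1.5 s node survives the first window, matching the node-3 example above.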

To sum up, this embodiment describes a way of identifying transition time nodes by using a second-order difference: the first difference is S121 and the second is S123.

The second-order difference method effectively filters out fast camera motion: fast movement during shooting blurs and smears the picture, and the second-order difference filters out such blurred, smeared frames.

When selecting the preset time period after the time anchor, it can be set to 1 second, 1.5 seconds, or 2 seconds.

In an embodiment of the present application, before S131, S110 further includes:

S124: calculate the average of all second difference sums to obtain the mean second difference sum.

Specifically, this embodiment describes one way of setting the second-difference-sum threshold. First, the average of all second difference sums is computed.

S125: obtain a preset threshold multiplier, and take the product of the multiplier and the mean second difference sum as the preset second-difference-sum threshold.

Specifically, the threshold multiplier can be set to 1.2.
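Steps S124 and S125 reduce to a one-liner (a sketch; 1.2 is the multiplier suggested above):

```python
def second_diff_threshold(second_diff_sums, multiplier=1.2):
    """S124-S125: preset threshold = multiplier x mean of all
    second difference sums."""
    return multiplier * sum(second_diff_sums) / len(second_diff_sums)
```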

In an embodiment of the present application, S100 includes the following steps S150 to S180.

S150: set a preset time length.

S160: traverse the horizontal-screen video material with a sliding window of the preset time length, decomposing the material into multiple video segments.

S170: input each video segment into a video classification model, run the model, and determine whether the segment contains a transition, obtaining multiple video segments that contain transitions.

S180: extract the transition time node from each video segment that contains a transition.

Specifically, this embodiment describes another way of identifying transition time nodes, implemented with a video classification model: a deep learning model that must be trained in advance.

The training procedure is as follows: collect transition-effect clips that are familiar in the industry and label them 1, collect a comparable number of non-transition clips and label them 0, then feed both sets to the video classification model as training data. The trained model acquires the ability to recognize transition clips.

Optionally, the video classification model uses one or more of TSM, SlowFast, and X3D as the classification network framework and cross-entropy as the network loss function. Training ends when the loss falls below a preset value.

In S150, the preset time length of the sliding window is set first. The preset time length may be 16 frames.

In an embodiment of the present application, S160 includes:

S161: use a sliding window of a first preset time length K to capture frames 1 through K of the horizontal-screen video material, taking frames 1 through K as the first video segment.

S162: move the sliding window back by a second preset time length L, taking frames W-L through W+L as the second video segment, where W is the index of the last frame of the previous segment and L is less than K.

S163: repeat S162 until the horizontal-screen video material has been traversed, obtaining multiple video segments.

It can be understood that, because the window moves back by only L each time, adjacent segments overlap, which prevents key segments from being lost.

For example, with K = 16 and L = 8, the first video segment is frames 1 through 16 and the second is frames 8 through 24. Compared with moving the window back by the full first preset time length, moving it back by the smaller second preset time length clearly minimizes the risk of losing key segments.
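The window placement of S161 to S163 can be sketched as follows (1-based frame indices as in the example; clamping the final window to the clip length is an assumption, since the text does not say how the last, possibly short, window is handled):

```python
def sliding_windows(num_frames, k=16, l=8):
    """S161-S163: (start, end) frame ranges, 1-based inclusive. The
    first window covers frames 1..K; each later window spans
    W-L..W+L, where W is the last frame of the previous window, so
    adjacent windows overlap by roughly L frames."""
    if num_frames <= k:
        return [(1, num_frames)]
    windows = [(1, k)]
    while windows[-1][1] < num_frames:
        w = windows[-1][1]
        windows.append((w - l, min(w + l, num_frames)))
    return windows
```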

In an embodiment of the present application, S180 includes the following steps S181 to S186.

S181: select a video segment that contains a transition.

S182: extract every video frame of that segment.

S183: calculate the color histogram of each video frame in the segment.

S184: calculate the inter-frame distance between the color histograms of every two adjacent video frames.

S185: select the time node of the video frame with the largest inter-frame distance as the transition time node of that segment.

S186: return to S181 until the transition time node of every segment containing a transition has been obtained.

Specifically, the previous embodiment only identifies which video segments contain a transition; it does not locate the transition time node within a segment. This embodiment describes how to find that node from the inter-frame distance of color histograms.

A color histogram reflects the distribution of colors in an image: which colors appear and with what probability. Computing the inter-frame distance between the color histograms of two adjacent video frames therefore measures the degree of color change between them; the larger the distance, the greater the change. The time node of the video frame with the largest inter-frame distance can be defined as the transition time node.
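Steps S183 to S185 can be sketched as follows (an illustrative sketch: frames are flat lists of 8-bit gray values, the 8-bin histogram is a simplification of a per-channel color histogram, and the L1 distance is an assumed metric, since the text does not name one):

```python
def color_histogram(frame, bins=8):
    """S183: normalized histogram of 8-bit gray values (simplified
    stand-in for a per-channel color histogram)."""
    hist = [0] * bins
    for v in frame:
        hist[v * bins // 256] += 1
    return [h / len(frame) for h in hist]

def transition_frame(frames):
    """S184-S185: 0-based index of the frame whose histogram is
    farthest (L1 distance, assumed) from its predecessor's."""
    hists = [color_histogram(f) for f in frames]
    dists = [sum(abs(a - b) for a, b in zip(hists[i], hists[i + 1]))
             for i in range(len(hists) - 1)]
    return dists.index(max(dists)) + 1
```

On a 4-frame segment that cuts from black to white between frames 2 and 3 (1-based), only that adjacent pair has a nonzero histogram distance, so the frame after the cut is returned.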

In an embodiment of the present application, S200 includes:

S210,将所有转场时间节点按时间先后顺序排序。S210: Sort all transition time nodes in chronological order.

S220,选取第一个转场时间节点,截取横屏视频素材的起始时间节点和第一个转场时间节点之间的所有视频帧,作为一个短横屏视频素材。S220: Select the first transition time node, and capture all video frames between the start time node of the horizontal screen video material and the first transition time node, as a short horizontal screen video material.

S230,选取最后一个转场时间节点,截取最后一个转场时间节点和横屏视频素材的末尾时间节点之间的所有视频帧,作为一个短横屏视频素材。S230, select the last transition time node, and intercept all video frames between the last transition time node and the end time node of the horizontal screen video material, as a short horizontal screen video material.

S240,截取每两个相邻转场时间节点之间的所有视频帧,作为一个短横屏视频素材。S240, intercept all video frames between every two adjacent transition time nodes as a short horizontal screen video material.

S250,输出所有的短横屏视频素材。S250, output all short horizontal screen video materials.

具体地,当转场时间节点只有一个时,那么它既是第一个转场时间节点,又是最后一个转场时间节点,那么最终执行完步骤后,会得到两个短横屏视频素材。Specifically, when there is only one transition time node, then it is both the first transition time node and the last transition time node. After the final steps are performed, two short horizontal screen video materials will be obtained.
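S210 to S250 amount to cutting the frame sequence at every transition node. A minimal sketch, illustrative only, with the material represented as a plain Python list of frames:

```python
def split_by_transitions(frames, nodes):
    """S210-S250: cut the horizontal material at every transition time node.
    With a single node the material is split into exactly two short clips."""
    bounds = [0] + sorted(nodes) + [len(frames)]
    # Each consecutive pair of boundaries delimits one short clip.
    return [frames[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
```

With frames 0..9 and one transition node at frame 4, this yields the two clips [0..3] and [4..9], matching the single-node case described above.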

In an embodiment of the present application, S300 includes the following S310 to S390:

S310, select a short horizontal screen video material.

S321, select the first frame image of the short horizontal screen video material.

S322, perform optical flow prediction on the first and second frame images based on FlowNet, obtaining the optical flow image of the first and second frame images.

Specifically, an optical flow image is computed from two adjacent frames. This application obtains only the optical flow image of the first and second frame images, uses it to locate the picture subject, and then tracks the subject in the subsequent S400. Optical flow prediction is therefore not needed for every frame, which keeps the method simple, avoids heavy computation, and makes it efficient.

Each pixel of the optical flow image carries a displacement in the x direction and a displacement in the y direction.

S323, for each pixel of the optical flow image of the first and second frame images, compute the sum of the absolute x-direction value and the absolute y-direction value, obtaining the optical-flow absolute sum.

Specifically, each pixel carries two pieces of information. One is the absolute x-direction value (i.e. the absolute x displacement), which expresses the pixel's speed of motion in the x direction and can also be understood as its motion trend in the x direction. The other is the absolute y-direction value (i.e. the absolute y displacement), which expresses the pixel's speed of motion in the y direction and can also be understood as its motion trend in the y direction.

S324, binarize the optical-flow absolute sums and, according to the binarization result, divide the optical flow image of the first and second frame images into multiple optical flow connected domains, obtaining the set of optical flow connected domains {F1, F2, ..., Fi}, where i is the index of an optical flow connected domain.

Specifically, one way to binarize the optical-flow absolute sums is to compare the optical-flow absolute sum of each pixel with a preset optical-flow absolute-sum threshold.

If the optical-flow absolute sum of a pixel is greater than or equal to the preset threshold, it is recorded as 1; if it is smaller than the preset threshold, it is recorded as 0. In this way every pixel of the optical flow image can be expressed as binary data, 0 or 1.

The binary data then make it much simpler to divide the optical flow image into multiple optical flow connected domains.
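The binarization of S323/S324 followed by connected-domain extraction can be sketched as below. The 4-connectivity rule and the representation of the flow as a 2-D grid of (dx, dy) tuples are assumptions made for illustration; the embodiment does not fix a connectivity rule.

```python
from collections import deque

def flow_connected_domains(flow, threshold):
    """S323-S324: binarize the per-pixel |dx| + |dy| sums and extract the
    4-connected domains {F1, ..., Fi} as lists of (y, x) pixel coordinates."""
    h, w = len(flow), len(flow[0])
    # Binarization: 1 where the optical-flow absolute sum reaches the threshold.
    mask = [[abs(dx) + abs(dy) >= threshold for dx, dy in row] for row in flow]
    seen = [[False] * w for _ in range(h)]
    domains = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one domain
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                domains.append(comp)
    return domains
```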

S331, perform static saliency segmentation on the first frame image.

Specifically, the division into optical flow connected domains is a dynamic division, whereas static saliency segmentation is a static one. Using both divisions detects the salient objects in the short horizontal screen video material more comprehensively.

S332, according to the static saliency segmentation result, divide the first frame image into multiple static saliency connected domains, obtaining the set of static saliency connected domains {S1, S2, ..., Sm}, where m is the index of a static saliency connected domain.

Specifically, m is also the total number of static saliency connected domains.

S340, compute the intersection-over-union between each optical flow connected domain in {F1, F2, ..., Fi} and each static saliency connected domain in {S1, S2, ..., Sm}, obtaining multiple intersection-over-union values.

Specifically, the intersection-over-union (IoU) is the ratio of the intersection to the union of an optical flow connected domain and a static saliency connected domain.

When computing the IoU in this step, every optical flow connected domain in {F1, F2, ..., Fi} is paired with every static saliency connected domain in {S1, S2, ..., Sm}: F1 computes one IoU with each of S1, S2, ..., Sm, then F2 computes one IoU with each of S1, S2, ..., Sm, and so on.
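With connected domains represented as pixel sets, the pairwise IoU of S340 can be sketched as:

```python
def iou(domain_a, domain_b):
    """S340: intersection-over-union of two pixel sets,
    e.g. an optical flow connected domain F_i against a static saliency domain S_m."""
    a, b = set(domain_a), set(domain_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

In S340 this function would be evaluated for every (F_i, S_m) pair, i * m evaluations in total.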

S350, select one IoU value and determine whether it is greater than a preset IoU threshold.

Specifically, if the IoU is less than or equal to the preset IoU threshold, that IoU is discarded, together with the corresponding pair of optical flow connected domain and static saliency connected domain. The preset IoU threshold may be set to 0.5.

S360, if the IoU is greater than the preset IoU threshold, obtain the circumscribed rectangle of the figure formed by the union of the corresponding optical flow connected domain and static saliency connected domain, and take that circumscribed rectangle as a candidate picture subject.

Specifically, this step first takes the figure formed by the union of the optical flow connected domain and the static saliency connected domain corresponding to the IoU. Since the figure formed by that union is irregular, the second step takes its circumscribed rectangle as the candidate picture subject, which simplifies the subsequent tracking computation.
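The circumscribed rectangle of S360 can be sketched as below, with pixels as (y, x) tuples; this coordinate convention is chosen here for illustration only.

```python
def candidate_subject_box(flow_domain, saliency_domain):
    """S360: circumscribed rectangle of the union of a matched F_i / S_m pair.
    Returns (x_min, y_min, x_max, y_max) over the union's pixels."""
    union = set(flow_domain) | set(saliency_domain)
    xs = [x for _, x in union]
    ys = [y for y, _ in union]
    return (min(xs), min(ys), max(xs), max(ys))
```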

S370, return to S350 until every IoU has been selected once, obtaining at least one candidate picture subject.

S380, compute the optical-flow absolute sum of every candidate picture subject, sort all candidates by that sum in descending order, and take the candidate with the largest sum as the picture subject of the short horizontal screen video material.

Specifically, the larger the optical-flow absolute sum, the stronger the motion trend; conversely, the smaller the sum, the weaker the motion trend.
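Subject selection in S380 can be sketched as below; the dictionary representations of the candidates and of the per-pixel flow are assumptions made for illustration.

```python
def pick_subject(candidates, flow):
    """S380: pick the candidate with the largest optical-flow absolute sum.
    `candidates` maps a candidate name to its pixel set;
    `flow` maps each pixel to its (dx, dy) displacement."""
    def flow_sum(pixels):
        return sum(abs(dx) + abs(dy) for dx, dy in (flow[p] for p in pixels))
    return max(candidates, key=lambda name: flow_sum(candidates[name]))
```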

S390, return to S310 until the picture subject of every short horizontal screen video material has been obtained.

Specifically, this embodiment adopts a long-term tracking strategy based on optical flow connected domains and static saliency segmentation, which guarantees that when the picture subject moves out of the picture and then re-enters it, the subject can still be recognized and tracked again.

In an embodiment of the present application, S400 includes:

S310, select a short horizontal screen video material.

S321, obtain the n-th frame of the short horizontal screen video material and the position of the picture subject in the n-th frame, with n initially set to 1.

S322, obtain the (n+1)-th frame of the short horizontal screen video material.

S323, taking the position in the (n+1)-th frame of the center of the picture subject of the n-th frame as the circle center, draw a circle in the (n+1)-th frame with the preset search radius as radius, and take the area covered by the circumscribed rectangle of that circle as the local search area. The preset search radius is initially defined as the local search radius.

Specifically, the center of the picture subject in the n-th frame is its physical center, i.e. the intersection of its two diagonals. Because the picture subject is a rectangular box, its center is the intersection of the two diagonals of that box; every "center" in this application refers to the intersection of the two diagonals of a rectangular box.

The preset search radius is initially defined as the local search radius so that the first iteration runs smoothly.

FIG. 2 is a schematic diagram, provided by an embodiment of this application, of the positional relationship among the picture subject, the preset search radius, and the local search area in the horizontal screen video material; this step can be understood with reference to FIG. 2.
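The circumscribed rectangle of the search circle in S323 reduces to a square of side 2r around the subject center. A sketch follows; clipping the square to the frame border is an assumption, since the embodiment does not state how the border case is handled.

```python
def local_search_region(center, radius, frame_w, frame_h):
    """S323: circumscribed square of the search circle, clipped to the frame.
    `center` is the subject center (cx, cy); returns (x0, y0, x1, y1)."""
    cx, cy = center
    x0, y0 = max(0, cx - radius), max(0, cy - radius)
    x1 = min(frame_w - 1, cx + radius)
    y1 = min(frame_h - 1, cy + radius)
    return (x0, y0, x1, y1)
```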

S324, cut out from the local search area multiple local sub-search areas with the same shape and the same coverage area as the picture subject of the n-th frame.

Specifically, the coverage area of the local search area is set much larger than the coverage area of the picture subject in the n-th frame, so that there is a sufficiently wide range of choices. This is achieved by adjusting the size of the preset search radius. Optionally, the local search radius may be set larger than the longest diagonal of the rectangular box of the picture subject in the first frame, for example to twice the length of that diagonal.

As shown in FIG. 3, four local sub-search areas at different positions are cut out of the local search area; they all have the same shape and the same coverage area.

S325, search for the position in the (n+1)-th frame of the picture subject of the n-th frame.

S326, in the (n+1)-th frame, compute the similarity between the picture subject of the n-th frame and each local sub-search area, obtaining multiple similarities.

Specifically, the similarity of two rectangular patches can be computed with various image comparison methods, for example a difference method; any image comparison method can realize the similarity comparison between the picture subject of the n-th frame and a local sub-search area.
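As one possible difference method for S326 (the embodiment allows any image comparison method), a mean-absolute-difference similarity over flattened 8-bit patches can be sketched as:

```python
def patch_similarity(patch_a, patch_b):
    """One possible difference method for S326: 1 minus the mean absolute
    pixel difference, scaled to [0, 1] for 8-bit grayscale values.
    Both patches are flat lists of equal length."""
    assert len(patch_a) == len(patch_b)
    diff = sum(abs(a - b) for a, b in zip(patch_a, patch_b))
    return 1.0 - diff / (255.0 * len(patch_a))
```

Identical patches score 1.0 and maximally different patches score 0.0, so the 92% threshold mentioned below maps directly onto this scale.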

S327, obtain the maximum of the similarities and determine whether it is greater than a similarity threshold.

Specifically, the similarity can be expressed as a percentage, and the similarity threshold may be set to 92%.

S331, if the maximum similarity is greater than the similarity threshold, take the position of the local sub-search area corresponding to the maximum similarity as the position of the picture subject in the (n+1)-th frame, completing the positioning of the picture subject in the (n+1)-th frame.

Specifically, this step effectively "updates" the position of the picture subject.

S332, increase n by 1, define the preset search radius as the local search radius, and return to S322 until the picture subject has been positioned in every video frame of the short horizontal screen video material.

S333, return to S310 until the picture subject has been positioned in every video frame of every short horizontal screen video material.

Specifically, when the maximum similarity is greater than the similarity threshold, this embodiment uses a local search, which indicates that the picture subject has not yet moved out of the picture.

In an embodiment of the present application, after S327, S400 further includes:

S341, if the maximum similarity is less than or equal to the similarity threshold, take the position of the picture subject in the n-th frame as its position in the (n+1)-th frame, completing the positioning of the picture subject in the (n+1)-th frame.

S342, increase n by 1, define the preset search radius as the global search radius, and return to S322. The global search radius is greater than the local search radius.

Specifically, when the maximum similarity is less than or equal to the similarity threshold, the picture subject has moved out of the picture, and the next frame must switch to a global search. When the maximum similarity rises above the similarity threshold again, the picture subject has returned to the picture, and the next frame switches back to a local search. The global search radius may be set to twice the local search radius.
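The switch between local search (S331/S332) and global search (S341/S342) can be sketched as a one-line radius policy; the factor of 2 follows the optional setting mentioned above and is not mandatory.

```python
def next_search_radius(max_similarity, threshold, local_radius):
    """S332 / S342: use the local radius while the subject is matched;
    fall back to a global radius (assumed here to be 2x local) once the
    subject has moved out of the picture."""
    return local_radius if max_similarity > threshold else 2 * local_radius
```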

This embodiment follows the same principle as the previous one and is not repeated; the difference is that the preset search radius is defined as the global search radius rather than the local search radius. It can likewise be understood with reference to FIG. 2, so no additional figure is drawn.

In an embodiment of the present application, S500 includes:

S510, select a short horizontal screen video material.

S520, obtain the center position of the picture subject in each frame of the short horizontal screen video material after frame-by-frame tracking.

S530, taking the center position of the picture subject of each frame as the center, crop out of each frame a rectangular video picture of width w and height h.

S540, taking each rectangular video picture as one vertical screen video frame, splice all the rectangular video pictures in chronological order to obtain one short vertical screen video material.

S550, return to S510 until all short horizontal screen video materials have been converted into short vertical screen video materials.
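The w-by-h crop of S530 can be sketched as below; shifting the box so that it stays inside the frame when the subject is near a border is an assumption, since the embodiment does not state how that case is handled.

```python
def crop_box(center, w, h, frame_w, frame_h):
    """S530: w x h crop centered on the subject center (cx, cy),
    shifted as needed to remain inside the frame. Assumes w <= frame_w
    and h <= frame_h. Returns (x0, y0, x1, y1)."""
    cx, cy = center
    x0 = min(max(0, cx - w // 2), frame_w - w)
    y0 = min(max(0, cy - h // 2), frame_h - h)
    return (x0, y0, x0 + w, y0 + h)
```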

Optionally, before S530, the center position of the picture subject of each frame may additionally be filtered to obtain a filtered center position for each frame, and the rectangular video picture of width w and height h is then cropped around that filtered position.

In S530, the crop of width w and height h is then centered not directly on the raw center position of the picture subject of each frame, but on its filtered center position.

The purpose of the filtering is to smooth the tracking points, eliminating temporal jitter of the w-by-h cropping box and making the final cropped video smoother. Optionally, the filtering computes a weighted average of the subject's center position over every three consecutive frames. For example, the filtered center position of frame 2 is the weighted average of the center positions of frames 1, 2, and 3, and the filtered center position of frame 3 is the weighted average of the center positions of frames 2, 3, and 4.
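The optional three-frame weighted filtering can be sketched as below; the weights (0.25, 0.5, 0.25) and the handling of the first and last frame are assumptions, since the embodiment specifies neither.

```python
def smooth_centers(centers, weights=(0.25, 0.5, 0.25)):
    """Weighted three-frame moving average of the subject center positions.
    The first and last frame keep their raw position (an assumption;
    the embodiment does not specify the boundary handling)."""
    out = list(centers)
    wa, wb, wc = weights
    for i in range(1, len(centers) - 1):
        (xa, ya), (xb, yb), (xc, yc) = centers[i - 1], centers[i], centers[i + 1]
        out[i] = (wa * xa + wb * xb + wc * xc, wa * ya + wb * yb + wc * yc)
    return out
```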

The technical features of the above embodiments can be combined arbitrarily, and the method steps are not limited to any order of execution. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features contains no contradiction, it should be regarded as falling within the scope of this specification.

The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of this application, and these all fall within its scope of protection. Therefore, the scope of protection of this application shall be determined by the appended claims.

Claims (10)

CN202210612330.0A, filed 2022-05-31: Method for converting horizontal screen video material into vertical screen video material (Active; granted as CN115018706B)

Priority Applications (1)

- CN202210612330.0A (granted as CN115018706B), priority date 2022-05-31, filing date 2022-05-31: Method for converting horizontal screen video material into vertical screen video material


Publications (2)

- CN115018706A, published 2022-09-06
- CN115018706B, granted 2024-11-01

Family ID: 83070964


Citations (6)

* Cited by examiner, † Cited by third party

- US20110081043A1* (published 2011-04-07, Sabol Bruce M): Using video-based imagery for automated detection, tracking, and counting of moving objects, in particular those objects having image characteristics similar to background
- CN110610150A* (published 2019-12-24): Tracking method, device, computing equipment and medium of target moving object
- CN112995666A* (published 2021-06-18): Video horizontal and vertical screen conversion method and device combined with scene switching detection
- CN113077470A* (published 2021-07-06): Method, system, device and medium for cutting horizontal and vertical screen conversion picture
- CN113438436A* (published 2021-09-24): Video playing method, video conference method, live broadcasting method and related equipment
- CN114155255A* (published 2022-03-08): A method for converting video from horizontal to vertical based on the spatiotemporal trajectory of a specific character


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party

- Zhang Saiyu; Zhu Xiaoling; Wang Yanguang; Ye Jia; Ma Guohong: "Recognition and tracking of moving molten droplets based on combining the frame-difference method with the Mean-shift algorithm", Journal of Shanghai Jiao Tong University, no. 10, 28 October 2016*



Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- TA01: Transfer of patent application right. Effective date of registration: 2022-11-08. Address (before and after): Room 430, cultural center, 460 Wenyi West Road, Xihu District, Hangzhou City, Zhejiang Province, 310012. Applicant before: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd. Applicants after: XINHUA ZHIYUN TECHNOLOGY Co.,Ltd.; Xinhua fusion media technology development (Beijing) Co.,Ltd.
- GR01: Patent grant
