CN114025232A - Video material cutting method, device, terminal device and readable storage medium - Google Patents

Video material cutting method, device, terminal device and readable storage medium

Info

Publication number
CN114025232A
Authority
CN
China
Prior art keywords
video
target video
target
segments
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111232378.0A
Other languages
Chinese (zh)
Other versions
CN114025232B (en)
Inventor
王传鹏
张昕玥
张婷
孙尔威
李腾飞
周惠存
陈春梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hard Link Network Technology Co ltd
Original Assignee
Shanghai Hard Link Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hard Link Network Technology Co ltd
Priority to CN202111232378.0A
Publication of CN114025232A
Application granted
Publication of CN114025232B
Status: Active
Anticipated expiration

Abstract

The invention discloses a video material cutting method, a video material cutting device, a terminal device and a readable storage medium, wherein the method comprises the following steps: acquiring a plurality of video segments to be clipped; identifying whether a target video element exists in the video segments to be clipped; determining clipping points of the video segments to be clipped according to the target video element, so as to obtain a plurality of first target video segments containing the target video element and a plurality of second target video segments not containing the target video element; matching the first target video segments with the second target video segments, and taking the first and second target video segments that satisfy a preset matching relationship as candidate segments to be spliced; and splicing the candidate segments to be spliced to obtain a third target video segment. The invention can find, among a plurality of video segments and at the fastest speed, the segments that meet a preset splicing requirement (for example, segments showing the same character, the same plot or the same gameplay), cut them out, and splice them into a more coherent result.

Description

Video material cutting method and device, terminal equipment and readable storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for cutting a video material, a terminal device, and a readable storage medium.
Background
Video editing selects, trims, decomposes and splices the large amount of material shot during film production into a coherent, smooth work with clear meaning, a vivid theme and artistic appeal. At present, the cropping before video splicing is performed manually: a user must identify similar elements in the video material (such as characters, scenario segments sharing the same scene, and gameplay segments) and then crop accordingly. Since the amount of video material multiplies as published video accumulates over time, manually cropping the material required for splicing involves a very large workload and consumes considerable time and labor.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for cutting video material, a terminal device, and a readable storage medium, which can automatically find, among a plurality of video clips and at the fastest speed, the clips meeting a preset splicing requirement and cut them out, so that the cut video clips splice together more coherently and the cutting and splicing of video material is more efficient.
An embodiment of the present invention provides a method for cutting a video material, including:
acquiring a plurality of video segments to be clipped;
identifying whether a target video element exists in a video segment to be clipped;
determining clipping points of video segments to be clipped according to the target video elements to obtain a plurality of first target video segments containing the target video elements and a plurality of second target video segments not containing the target video elements;
matching the first target video clip with the second target video clip, and taking the first target video clip and the second target video clip which meet the preset matching relationship as candidate clips to be spliced;
and splicing the candidate segments to be spliced to obtain a third target video segment.
As an improvement of the above solution, the determining, according to the target video element, clipping points of the video segments to be clipped, to obtain a plurality of first target video segments containing the target video element and a plurality of second target video segments not containing the target video element, includes:
and if the target video element exists, forming a plurality of first target video segments from the consecutive preset numbers of frames containing the target video element, and forming a plurality of second target video segments from the remaining consecutive video frames.
As an improvement of the above scheme, the matching the first target video segment and the second target video segment, and taking the first target video segment and the second target video segment that satisfy the preset matching relationship as candidate segments to be spliced, includes:
identifying characters and scenes of the first target video clips and the second target video clips to obtain character tags and scene tags corresponding to the first target video clips and the second target video clips;
and taking the first target video segment and the second target video segment which have the same character label and scene label as candidate segments to be spliced.
As an improvement of the above scheme, the splicing the candidate segments to be spliced to obtain a third target video segment includes:
judging whether the number of the candidate segments to be spliced is greater than a preset number threshold value or not;
if so, selecting a first target video clip and a second target video clip with the highest matching degree from the candidate clips to be spliced for splicing to obtain a third target video clip;
and if not, splicing the candidate segments to be spliced to obtain a third target video segment.
As an improvement of the above solution, the acquiring a plurality of video segments to be clipped includes:
determining scene transition points corresponding to different scenes of a video material to be cut according to the color features and the structural features of each video frame in the video material to be cut; wherein different scenes correspond to different camera angles;
and segmenting the video material to be cut according to the scene transition points to obtain a plurality of video segments to be clipped.
As an improvement of the above scheme, the determining scene transition points corresponding to different scenes of the video material to be cut according to the color feature and the structural feature of each video frame in the video material to be cut includes:
respectively extracting color features and structural features of each video frame in a video material to be cut;
calculating the feature similarity of any two adjacent video frames in the video material to be cut according to the color feature and the structural feature of each video frame;
and determining a frame node as a scene transition point when the feature similarity of two adjacent video frames in the video material to be cut meets a preset condition.
As an improvement of the above scheme, the splicing the candidate segments to be spliced to obtain a third target video segment includes:
carrying out overall optimization on the color indexes of the candidate segments to be spliced;
and splicing the candidate segments to be spliced after the color indexes are optimized to obtain a third target video segment.
Another embodiment of the present invention correspondingly provides a video material cropping device, including:
the video clip acquisition module is used for acquiring a plurality of video clips to be clipped;
the video element identification module is used for identifying whether a target video element exists in a video segment to be clipped;
the video clip cutting module is used for determining the clipping points of the video clips to be clipped according to the target video elements to obtain a plurality of first target video clips containing the target video elements and a plurality of second target video clips not containing the target video elements;
the video clip matching module is used for matching the first target video clip with the second target video clip and taking the first target video clip and the second target video clip which meet the preset matching relationship as candidate clips to be spliced;
and the video segment splicing module is used for splicing the candidate segments to be spliced to obtain a third target video segment.
Another embodiment of the present invention provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements the video material cutting method according to the above embodiment of the present invention.
Another embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to execute the video material cutting method according to the above embodiment of the present invention.
Compared with the prior art, the video material cutting method, the video material cutting device, the terminal equipment and the computer readable storage medium disclosed by the embodiment of the invention have the following advantages:
target video element identification is carried out on a plurality of video segments to be clipped, and clipping points of the video segments to be clipped are determined according to the target video element, so that a plurality of first target video segments containing the target video element and a plurality of second target video segments not containing the target video element are obtained; then, the first target video segments are matched with the second target video segments, and the first and second target video segments that satisfy the preset matching relationship are taken as candidate segments to be spliced; finally, the candidate segments to be spliced are spliced to obtain a third target video segment. Based on this method, the video segments meeting a preset splicing requirement (for example, segments showing the same character, the same plot or the same gameplay) can be found among a large number of video segments at the fastest speed and cut out; the cut segments splice together more coherently, and the cutting and splicing of video material is more efficient.
Drawings
Fig. 1 is a schematic flow chart of a video material cropping method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of cutting a 25-frame video material to be cut according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a video material cropping device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flowchart of a video material cropping method according to an embodiment of the present invention, including the following steps:
and S1, acquiring a plurality of video segments to be clipped.
In this embodiment, the plurality of video segments to be clipped are pre-cut from the same video material to be cut (they may, of course, also be pre-cut from a plurality of video materials). The video material to be cut may be one video that needs to be clipped into a work, or several videos that need to be clipped and then spliced into one complete work. If the video material contains only one scene, no preliminary scene segmentation is necessary. If it contains a plurality of scenes, preliminary scene segmentation is needed, and the expected segmentation positions are frame nodes where the shot direction changes or the color or structural features change greatly. It will be appreciated that a color feature is a global feature describing the surface properties of the scene corresponding to an image or image region; it may be represented by a color histogram, color moments, or the like. A structural feature represents the positional relationship among the parts of the image content and may be represented by a pixel-matrix feature. It should be noted that "a plurality" means two, three or more, and the present invention is applicable to video materials with many scenes.
And S2, identifying whether the video segment to be clipped has the target video element.
In this embodiment, some of the video segments to be clipped contain the target video element (e.g., a target game character or a target game prop) and the others do not. As an example, the target video element is a piece in a match-three (sanxiao) game: the video frames of a segment to be clipped that contain the pieces form a gameplay segment, while the video frames that do not contain the pieces form other types of segments, such as game scenario segments.
Specifically, the video frame content of the video segments to be clipped is identified through a pre-trained network model, and it is determined whether the identified content is a predetermined target video element (for example, a target game character or a target game prop).
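As an illustration of this step, the following minimal sketch classifies each frame of a segment with a pre-trained image classifier. The patent only states that a pre-trained network model is used; the ResNet-style classifier interface, the preprocessing, and the convention that class index 1 means "target element present" are all assumptions.

```python
import cv2
import torch
import torchvision.transforms as T

def detect_target_element(video_path, model, device="cpu"):
    """Return one boolean per frame: True if the frame is classified
    as containing the target video element (e.g. a match-three piece)."""
    preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])
    flags = []
    cap = cv2.VideoCapture(video_path)
    model.eval()
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            logits = model(preprocess(rgb).unsqueeze(0).to(device))
            # Assumption: class index 1 = "target element present".
            flags.append(int(logits.argmax(dim=1).item()) == 1)
    cap.release()
    return flags
```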
S3, determining clipping points of the video clips to be clipped according to the target video elements, and obtaining a plurality of first target video clips containing the target video elements and a plurality of second target video clips not containing the target video elements.
Specifically, the step S3 includes:
and S30, if the target video element exists, forming a plurality of first target video segments from the consecutive preset numbers of frames containing the target video element, and forming a plurality of second target video segments from the remaining consecutive video frames.
Specifically, video frames containing the same target video element have high feature similarity between them. Therefore, the content of each video frame in each video segment to be clipped can be identified through an artificial intelligence algorithm (for example, a trained image classification model), the target video element in each segment can be judged from the identification result, and the video frames containing the same target video element can thus be recognised, so as to determine the corresponding clipping points: the first frame and the last frame of a consecutive preset number of frames containing the target video element are determined as clipping points of the video segment to be clipped, so that this part is cut out as a first target video segment, while the remaining consecutive video frames of the segment constitute a second target video segment.
For ease of understanding, an example is presented here. Many games combine match-three (sanxiao) gameplay with an RPG, and for such games and their users, advertisement works edited together from match-three gameplay segments and scenario segments usually attract the most attention. As shown in fig. 2, take a video material of 10 video frames as an example, where the target video element is a piece of the match-three game. First, it is identified whether each video frame of the video segment to be clipped contains the image content "match-three piece": video frames 3-5 and 8-10 contain this target video element, while video frames 1-2 and 6-7 do not. The clipping points corresponding to the match-three pieces are therefore frames 3, 5, 8 and 10: the segment formed by video frames 1-2 is a second target video segment, the segment formed by video frames 3-5 is a first target video segment, the segment formed by video frames 6-7 is a second target video segment, and the segment formed by video frames 8-10 is a first target video segment.
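A minimal sketch of this grouping logic, using the 10-frame example above: given one detection flag per frame, runs of flagged frames become first target video segments and the remaining runs become second target video segments. The (start, end) tuple representation is an assumption for illustration.

```python
from itertools import groupby

def split_by_element(flags):
    """flags: one boolean per frame (True = target element present).
    Returns (first_segments, second_segments) as 1-based inclusive
    (start_frame, end_frame) ranges."""
    first, second, pos = [], [], 1
    for has_element, run in groupby(flags):
        length = len(list(run))
        (first if has_element else second).append((pos, pos + length - 1))
        pos += length
    return first, second

# Frames 3-5 and 8-10 contain the match-three pieces:
flags = [False, False, True, True, True, False, False, True, True, True]
print(split_by_element(flags))
# -> ([(3, 5), (8, 10)], [(1, 2), (6, 7)])
```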
S4, matching the first target video clip with the second target video clip, and taking the first target video clip and the second target video clip which meet the preset matching relationship as candidate clips to be spliced.
As an example, if the preset matching relationship is that the labels of the video segments are the same (under label matching, identical labels can be understood as a matching degree of 100%, i.e., the highest matching degree), the step S4 includes the following sub-steps:
s40, identifying the characters and scenes of the first target video clips and the second target video clips to obtain the character labels and scene labels corresponding to the first target video clips and the second target video clips;
and S41, taking the first target video clip and the second target video clip with the same character tag and scene tag as candidate clips to be spliced.
Specifically, the characters and scenes of the plurality of first target video segments and the plurality of second target video segments are identified through a pre-trained network model, and character tags and scene tags (for example, target game scene tags or target game character tags) corresponding to each of the first target video segments and the second target video segments are obtained.
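A minimal sketch of sub-steps S40-S41, under the assumption that each segment carries a character tag and a scene tag produced by the recognition model:

```python
def match_by_tags(first_segments, second_segments):
    """Each segment is assumed to be a dict with 'character' and 'scene'
    tags. Returns (first, second) pairs whose tags both match, i.e. the
    candidate segments to be spliced."""
    return [
        (f, s)
        for f in first_segments
        for s in second_segments
        if (f["character"], f["scene"]) == (s["character"], s["scene"])
    ]
```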
As another example, the preset matching relationship is that the matching degree of the first target video segment and the second target video segment is greater than a preset matching threshold. The matching degree of the two can be calculated through an algorithm such as SIFT feature matching.
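A possible way to compute such a matching degree with OpenCV SIFT features and Lowe's ratio test is sketched below; normalising the score by the smaller keypoint count is an assumption, since the patent does not specify how the matching degree is defined.

```python
import cv2

def sift_matching_degree(frame_a, frame_b, ratio=0.75):
    """Return a rough matching degree in [0, 1] between two frames."""
    sift = cv2.SIFT_create()
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    kp_a, des_a = sift.detectAndCompute(gray_a, None)
    kp_b, des_b = sift.detectAndCompute(gray_b, None)
    if des_a is None or des_b is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps only unambiguous correspondences.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(min(len(kp_a), len(kp_b)), 1)
```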
And S5, splicing the candidate segments to be spliced to obtain a third target video segment.
To sum up, in the video material cutting method provided by the embodiment of the present invention, target video element identification is first performed on a plurality of video segments to be clipped, and clipping points of the video segments to be clipped are determined according to the target video element, so as to obtain a plurality of first target video segments containing the target video element and a plurality of second target video segments not containing the target video element; then, the first target video segments are matched with the second target video segments, and the first and second target video segments that satisfy the preset matching relationship are taken as candidate segments to be spliced; finally, the candidate segments to be spliced are spliced to obtain a third target video segment. Based on this method, the video segments meeting a preset splicing requirement (for example, segments showing the same character, the same plot or the same gameplay) can be found among a large number of video segments at the fastest speed and cut out; the cut segments splice together more coherently, and the cutting and splicing of video material is more efficient.
In the above-described embodiment, as an example, the step S5 includes:
s50, judging whether the number of the candidate segments to be spliced is larger than a preset number threshold value;
s51, if yes, selecting a first target video clip and a second target video clip with the highest matching degree from the candidate clips to be spliced for splicing to obtain a third target video clip;
and S52, if not, splicing the candidate segments to be spliced to obtain a third target video segment.
In this embodiment, when the number of candidate segments to be spliced of the same type is too large, the first target video segment and the second target video segment with the highest matching degree may be selected from the candidate segments to be spliced for splicing, so that it may be ensured that the spliced content of the video material is not too redundant.
In a specific embodiment, the step S1 includes the following sub-steps:
S11, determining scene transition points corresponding to different scenes of the video material to be cut according to the color features and the structural features of each video frame in the video material to be cut; wherein different scenes correspond to different camera angles;
S12, segmenting the video material to be cut according to the scene transition points to obtain a plurality of video segments to be clipped.
The video material to be cut is detected frame by frame, and the color feature and the structural feature of each video frame are extracted in order to judge whether the scene changes: if the color feature and the structural feature change greatly between two adjacent video frames, it is judged that the camera angle has changed, i.e., the scene of the video has changed, and the frame node between the two adjacent frames is taken as a scene transition point; if the color feature and the structural feature do not change greatly between two adjacent video frames, the camera angle has not changed, i.e., the scene has not changed, and the frame node between the two adjacent frames need not be taken as a scene transition point. Therefore, after the scene transition points corresponding to the different scenes of the video material to be cut are determined, preliminary scene segmentation can be performed on the material according to these points, thereby obtaining a plurality of single-scene video segments to be clipped.
Illustratively, as shown in fig. 2(a), the video material to be cut is a complete segment of 25 video frames. First, this 25-frame material is detected frame by frame, and the color feature and the structural feature of each video frame are determined. Suppose there are 4 frame nodes at which the color and structural features of two adjacent video frames change greatly, located in turn between the 2nd and 3rd frames, the 8th and 9th frames, the 15th and 16th frames, and the 23rd and 24th frames; the corresponding 4 scene transition points are then scene transition points (I), (II), (III) and (IV) in turn. According to these 4 scene transition points, the 25-frame material is automatically cut into 5 single-scene video segments to be clipped, namely video segment 1 to be cut, video segment 2 to be cut, video segment 3 to be cut, video segment 4 to be cut and video segment 5 to be cut.
Specifically, the step S11 includes the following sub-steps:
s110, respectively extracting color features and structural features of each video frame in the video material to be cut;
s111, calculating the feature similarity of any two adjacent video frames in the video material to be cut according to the color feature and the structural feature of each video frame;
and S112, determining a frame node as a scene transition point when the feature similarity of two adjacent video frames in the video material to be cut meets a preset condition.
In this embodiment, the content of the entire video material is detected frame by frame, the color feature and the structural feature of each frame are extracted, the feature similarity between two adjacent video frames is calculated, and if the feature similarity is judged to be smaller than a preset threshold value, it is judged that the scenes of the two adjacent video frames are switched; and if the feature similarity is judged to be larger than or equal to the preset threshold value, judging that the scenes of the two adjacent video frames are not switched.
In a more specific embodiment, the feature similarity of two adjacent video frames is used to find the positions where the inter-frame difference exhibits a local peak and satisfies the rule requirement, and scene segmentation is performed there; that is, when the difference curve of two adjacent video frames reaches a local maximum, the corresponding frame node is determined as a scene transition point.
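The following sketch combines steps S110-S112 under stated assumptions: the color feature is an HSV histogram compared by correlation, the structural feature is a downscaled grayscale pixel matrix, the two are weighted equally, and a scene transition point is declared where the inter-frame difference curve has a local maximum above a threshold. All of these concrete choices are illustrative, not prescribed by the patent.

```python
import cv2
import numpy as np

def frame_difference(f1, f2):
    # Colour feature: HSV histogram compared by correlation.
    h1 = cv2.calcHist([cv2.cvtColor(f1, cv2.COLOR_BGR2HSV)], [0, 1], None,
                      [50, 60], [0, 180, 0, 256])
    h2 = cv2.calcHist([cv2.cvtColor(f2, cv2.COLOR_BGR2HSV)], [0, 1], None,
                      [50, 60], [0, 180, 0, 256])
    color_sim = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
    # Structural feature: difference of small grayscale pixel matrices.
    g1 = cv2.resize(cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY), (32, 32)) / 255.0
    g2 = cv2.resize(cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY), (32, 32)) / 255.0
    struct_diff = float(np.mean(np.abs(g1 - g2)))
    return 0.5 * (1.0 - color_sim) + 0.5 * struct_diff

def scene_transition_points(frames, threshold=0.3):
    diffs = [frame_difference(a, b) for a, b in zip(frames, frames[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        left = diffs[i - 1] if i > 0 else -1.0
        right = diffs[i + 1] if i + 1 < len(diffs) else -1.0
        if d > threshold and d >= left and d >= right:  # local maximum
            cuts.append(i + 1)  # cut falls between frame i and frame i+1
    return cuts
```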
Specifically, the person recognition of the plurality of first target video segments and the plurality of second target video segments in the step S40 may be performed as follows:
it can be understood that the number of people in the target video segment may be 0, 1, 2, or more than 2, for example, if all video frames in a certain target video segment do not contain people, the number of people in the target video segment is 0; for another example, if some video frames in a certain target video segment do not include people, and other video frames include 1, 2, or more than 2 people, the people in the target video segment correspond to 1, 2, or more than 2 people.
In this embodiment, the number of persons contained in each target video segment and the role category corresponding to each person can be identified through the preset target recognition model. The target person of each target video segment is then determined according to a first preset rule. It is understood that the target person may be a single person or a group of persons, such as two or three persons.
In one example, the first preset rule includes: (1) determining the person who appears most frequently in the corresponding target video segment, such as the hero, as the target person of that segment; (2) determining one or more specific persons, selected by system default or by user definition, as the target persons of the target video segment.
Correspondingly, the target person of each target video segment is determined according to whichever first preset rule is adopted.
If the first preset rule (1) is adopted, the frequency of appearance of each person in the corresponding target video segment must be counted in order to determine the hero and the hero's role category.
Specifically, in the recognition stage, the preset target recognition model extracts character sub-images from the video frames of each target video segment, classifies them into the specified categories according to the features of the character sub-images, and applies overall category frequency statistics and rules to reach the final role-category judgment.
More specifically, video frames near the clipping points may be selected; after persons are detected in the whole image of each video frame, sub-images are cut out, and features are then computed and category classification performed. For example, within a target video segment, the number of times a certain person appears, the number of frames in which the person appears, and the percentage of those frames in the total frames of the segment are summarized, and the hero and role categories of the segment, such as "boy in school uniform", "Ada" or "zombie", are determined in combination with the "find the hero" target rule.
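A minimal sketch of the "find the hero" rule (1): tally how often each recognised role category appears across the frames of a segment and take the most frequent one. The detect_roles callback stands in for the preset target recognition model and is hypothetical, as is the minimum-share threshold.

```python
from collections import Counter

def find_hero(segment_frames, detect_roles, min_ratio=0.2):
    """detect_roles(frame) -> list of role-category labels in that frame.
    Returns the dominant role, or None if nobody appears often enough."""
    counts = Counter()
    for frame in segment_frames:
        counts.update(set(detect_roles(frame)))  # count once per frame
    if not counts:
        return None  # 0 persons in this target video segment
    role, n_frames = counts.most_common(1)[0]
    # Require the candidate hero to appear in a minimum share of frames.
    return role if n_frames / len(segment_frames) >= min_ratio else None
```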
If the first preset rule (2) is adopted, specific persons in each target video clip need to be directly identified and determined as target persons.
After the target person of each target video segment is determined, the consistency of persons across successive target video segments is optimized as follows. It is judged whether the target persons in all target video segments are the same. If not, the final target person of all segments is determined according to a second preset rule, and the video frames in which the final target person appears are screened out of every segment, yielding a uniform final target person and ensuring person consistency across segments. If yes, there is no need to screen out the video frames of the final target person.
In one example, the second preset rule includes: (1) determining the target person with the highest frequency across all target video segments as the final target person of all segments; (2) determining the target person selected by system default or by user definition as the final target person of all target video segments.
Illustratively, take 3 target video segments as an example. Persons in each segment are identified based on the preset target recognition model, and the target person of each segment is then determined according to the first preset rule: person 1 appears most frequently in target video segment 1, so its target person is person 1; person 2 appears most frequently in target video segment 2, so its target person is person 2; and person 1 appears most frequently in target video segment 3, so its target person is person 1. Because the target persons of the 3 segments are not all the same, the final target person is determined according to the second preset rule to be person 1, the person with the highest overall frequency, and the video frames in which person 1 appears are screened out of all segments, yielding target video segment 1 and target video segment 3, in which person 1 appears continuously.
In a specific embodiment, the preset target recognition model must be obtained in advance by extracting character sub-images from original videos and training on their categories; the corresponding training process is as follows:
in the training stage, firstly, extracting the position of a character subgraph in a designated game according to a target recognition model (such as a yolo model), further selecting and classifying features after intercepting the character subgraph, and then defining the role category aiming at the feature points expected to be matched, wherein for example, a male role comprises a police image, an equipment image and the like which appear once; in order to improve the total amount of the data set and balance the data amount in different categories (the appearance ratio of each role in the material is greatly different), the data set is subjected to data enhancement, and the quality of the data set is improved by methods of stretching, overturning, adding noise points and the like; an image classification model (e.g., resnet50 model) is then trained and tuned based on the created data set.
As a supplementary case for the matching-degree calculation: suppose the image foreground content of the frames of a first target video segment and a second target video segment is substantially the same, but their image scenes (backgrounds) are not all the same. When the two segments are matched not in the manner described above but by computing the matching degree of the image foreground features, for example when feature matching succeeds on the video foreground content (such as persons) of both segments and candidate segments to be spliced are obtained, it is also possible to further judge whether the scene changes between two adjacent candidate segments to be spliced.
Specifically, the lengths of the candidate segments to be spliced are judged, and the transition duration is inferred accordingly; a window around each splicing point, for example 5 seconds before and after it, covers the two adjacent candidate segments. The scene within 5 seconds before the splicing point is compared with the scene within 5 seconds after it: if the scene change is judged to be large, the two adjacent candidate segments involve a transition; if the scenes are judged to be similar, they do not. The scene change before and after the cut point of the next target video segment is then detected in the same way, until the scene changes before and after all splicing points have been judged.
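A sketch of this transition check, reusing frame_difference() from the scene-segmentation sketch above; the 5-second window, the 25 fps frame rate and the threshold value are assumptions.

```python
def needs_transition(frames, splice_idx, fps=25, window_s=5, threshold=0.3):
    """frames: frames around a splicing point; splice_idx: index of the
    first frame after the splice. Returns True if the scene changes
    enough that a preset video joining segment should be inserted."""
    before = frames[max(0, splice_idx - window_s * fps):splice_idx]
    after = frames[splice_idx:splice_idx + window_s * fps]
    if not before or not after:
        return False
    # Compare a representative frame from each side of the splicing point.
    return frame_difference(before[len(before) // 2],
                            after[len(after) // 2]) > threshold
```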
For two adjacent candidate segments whose scene changes, preset video joining segments are added between them, so that the scenes of the two adjacent candidate segments containing the final target person connect continuously; candidate segments to be spliced (i.e., all the target video segments to be spliced) with a continuous final target person and continuous scenes are thus obtained. The video joining segments are preset transition videos of various kinds and serve to join candidate segments from two different scenes. It should be noted that the scene of a candidate segment may be identified with an image background feature identification algorithm (such as a background detection algorithm in OpenCV), so as to obtain the scene tag of the candidate segment.
In a specific embodiment, the step S5 includes the following steps:
s50, carrying out overall optimization on the color indexes of the candidate segments to be spliced;
and S51, splicing the candidate segments to be spliced after the color indexes are optimized to obtain a third target video segment.
In this embodiment, the color indexes of the candidate segments are optimized as a whole so that their display look and feel is more consistent. The color indexes of a candidate segment include, but are not limited to, the brightness, color temperature and contrast of its images.
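A minimal sketch of such overall optimization, treating the "color index" as the mean and standard deviation of the LAB luminance channel and matching every candidate segment to the average across segments; this single crude global adjustment is assumed for illustration.

```python
import cv2
import numpy as np

def harmonize_segments(segments):
    """segments: list of lists of equally sized BGR frames.
    Returns segments adjusted toward a common brightness/contrast."""
    stats = []
    for seg in segments:
        lum = [cv2.cvtColor(f, cv2.COLOR_BGR2LAB)[:, :, 0] for f in seg]
        stats.append((np.mean(lum), np.std(lum)))
    target_mean = np.mean([m for m, _ in stats])
    target_std = np.mean([s for _, s in stats])
    adjusted = []
    for seg, (m, s) in zip(segments, stats):
        gain = target_std / max(s, 1e-6)   # contrast correction
        bias = target_mean - gain * m      # brightness correction
        adjusted.append([
            np.clip(f.astype(np.float32) * gain + bias, 0, 255)
              .astype(np.uint8) for f in seg
        ])
    return adjusted
```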
Referring to fig. 3, fig. 3 is a schematic structural diagram of a video material cropping device according to another embodiment of the present invention, including:
the video clip acquisition module 1 is used for acquiring a plurality of video clips to be clipped;
the video element identification module 2 is used for identifying whether a target video element exists in a video segment to be clipped;
the video clip cutting module 3 is configured to determine a cutting point of a video clip to be cut according to a target video element, and obtain a plurality of first target video clips including the target video element and a plurality of second target video clips not including the target video element;
the video segment matching module 4 is used for matching the first target video segment with the second target video segment and taking the first target video segment and the second target video segment which meet the preset matching relationship as candidate segments to be spliced;
and the video segment splicing module 5 is used for splicing the candidate segments to be spliced to obtain a third target video segment.
In the embodiment, a plurality of first target video segments containing target video elements and a plurality of second target video segments not containing target video elements are obtained by performing target video element identification on a plurality of video segments to be clipped and determining clipping points of the video segments to be clipped according to the target video elements; then, matching the first target video clip with the second target video clip, and taking the first target video clip and the second target video clip which meet the preset matching relationship as candidate clips to be spliced; and finally, splicing the candidate segments to be spliced to obtain a third target video segment. Based on the method, the video clips (such as the video clips of characters, identical plot clips and identical playing methods) meeting the preset splicing requirements can be found out from the video clips in a large number of video clips at the fastest speed and are cut out, splicing of the cut video clips is more consistent, and the efficiency of cutting and splicing the video materials is higher.
Another embodiment of the present invention provides a terminal device. The terminal device of this embodiment includes: a processor, a memory and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps in the video material cutting method embodiments described above, such as steps S1-S3 shown in fig. 1. Alternatively, the processor implements the functions of the modules in the above-mentioned video material cutting device when executing the computer program.
Another embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, an apparatus on which the computer-readable storage medium is located is controlled to execute the steps in the above-described embodiments of the video material cutting method, such as steps S1-S3 shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which describe the execution of the computer program in the terminal device. For example, the computer program may be divided into a video clip acquisition module 1, a video element identification module 2, a video clip cutting module 3, a video segment matching module 4, and a video segment splicing module 5, with the specific functions of each module as follows:
the video clip acquisition module 1 is used for acquiring a plurality of video clips to be clipped; the video element identification module 2 is used for identifying whether a target video element exists in a video segment to be clipped; the video clip cutting module 3 is configured to determine a cutting point of a video clip to be cut according to a target video element, and obtain a plurality of first target video clips including the target video element and a plurality of second target video clips not including the target video element; the video segment matching module 4 is used for matching the first target video segment with the second target video segment and taking the first target video segment and the second target video segment which meet the preset matching relationship as candidate segments to be spliced; and the video segment splicing module 5 is used for splicing the candidate segments to be spliced to obtain a third target video segment.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of a terminal device, and may include more or less components than those shown, or combine certain components, or different components, for example, the terminal device may also include input output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is the control center of the terminal device and connects the various parts of the whole terminal device using various interfaces and lines.
The memory may be used for storing the computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a Flash memory device, or other non-volatile solid-state storage device.
Wherein, the module integrated with the terminal device can be stored in a computer readable storage medium if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

CN202111232378.0A | Priority: 2021-10-22 | Filed: 2021-10-22 | Video material cutting method, device, terminal equipment and readable storage medium | Active | CN114025232B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111232378.0A | 2021-10-22 | 2021-10-22 | Video material cutting method, device, terminal equipment and readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111232378.0A | 2021-10-22 | 2021-10-22 | Video material cutting method, device, terminal equipment and readable storage medium

Publications (2)

Publication Number | Publication Date
CN114025232A (en) | 2022-02-08
CN114025232B (en) | 2024-06-21

Family

ID=80057206

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111232378.0A (Active; granted as CN114025232B (en)) | Video material cutting method, device, terminal equipment and readable storage medium | 2021-10-22 | 2021-10-22

Country Status (1)

Country | Link
CN | CN114025232B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN1716415A (en)* | 2004-06-30 | 2006-01-04 | 深圳市朗科科技有限公司 | Digital video frequency playing device and its program backing method
US20130343727A1 (en)* | 2010-03-08 | 2013-12-26 | Alex Rav-Acha | System and method for semi-automatic video editing
CN106021496A (en)* | 2016-05-19 | 2016-10-12 | 海信集团有限公司 | Video search method and video search device
CN107517406A (en)* | 2017-09-05 | 2017-12-26 | 语联网(武汉)信息技术有限公司 | A kind of video clipping and the method for translation
US20180254064A1 (en)* | 2017-03-02 | 2018-09-06 | Ricoh Company, Ltd. | Decomposition of a Video Stream into Salient Fragments
CN109740499A (en)* | 2018-12-28 | 2019-05-10 | 北京旷视科技有限公司 | Video segmentation method, video action recognition method, device, equipment and medium
CN109996011A (en)* | 2017-12-29 | 2019-07-09 | 深圳市优必选科技有限公司 | Video editing device and method
CN110225369A (en)* | 2019-07-16 | 2019-09-10 | 百度在线网络技术(北京)有限公司 | Video selection playback method, device, equipment and readable storage medium
CN110519655A (en)* | 2018-05-21 | 2019-11-29 | 优酷网络技术(北京)有限公司 | Video clipping method and device
CN111131884A (en)* | 2020-01-19 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Video clipping method, related device, equipment and storage medium
CN113329261A (en)* | 2021-08-02 | 2021-08-31 | 北京达佳互联信息技术有限公司 | Video processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈佳; 滕东兴; 杨海燕; 马翠霞; 王宏安: "A sketch-based method for generating video summaries" (一种草图形式的视频摘要生成方法), Journal of Image and Graphics (中国图象图形学报), no. 08 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115103135A (en)* | 2022-06-17 | 2022-09-23 | 商汤集团有限公司 | Video clipping method, apparatus, electronic device, storage medium, and program product
CN115412764A (en)* | 2022-08-30 | 2022-11-29 | 上海硬通网络科技有限公司 | Video editing method, device, equipment and storage medium
CN115412764B (en)* | 2022-08-30 | 2023-09-29 | 上海硬通网络科技有限公司 | Video editing method, device, equipment and storage medium
CN115460459A (en)* | 2022-09-02 | 2022-12-09 | 百度时代网络技术(北京)有限公司 | Video generation method and device based on AI (Artificial Intelligence) and electronic equipment
CN115460459B (en)* | 2022-09-02 | 2024-02-27 | 百度时代网络技术(北京)有限公司 | Video generation method and device based on AI and electronic equipment
CN116017094A (en)* | 2022-12-29 | 2023-04-25 | 空间视创(重庆)科技股份有限公司 | Short video intelligent generation system and method based on user requirements
CN116016817A (en)* | 2023-01-29 | 2023-04-25 | 北京达佳互联信息技术有限公司 | Video editing method, device, electronic device and storage medium
CN116112621A (en)* | 2023-02-09 | 2023-05-12 | 百度时代网络技术(北京)有限公司 | Video generation method, training method, device and equipment for deep learning model
CN116600105A (en)* | 2023-05-25 | 2023-08-15 | 广州盈风网络科技有限公司 | Color label extraction method, device, equipment and medium for video material
CN116600105B (en)* | 2023-05-25 | 2023-10-17 | 广州盈风网络科技有限公司 | Color label extraction method, device, equipment and medium for video material

Also Published As

Publication number | Publication date
CN114025232B (en) | 2024-06-21

Similar Documents

Publication | Title
CN114025232A (en) | Video material cutting method, device, terminal device and readable storage medium
US10275655B2 (en) | Intelligent video thumbnail selection and generation
CN111739027B (en) | Image processing method, device, equipment and readable storage medium
CN109635680B (en) | Multitask attribute identification method and device, electronic equipment and storage medium
US12243561B2 (en) | Method and apparatus for generating video with 3D effect, method and apparatus for playing video with 3D effect, and device
CN113627402B (en) | Image identification method and related device
CN111127307A (en) | Image processing method, apparatus, electronic device, and computer-readable storage medium
WO2012071696A1 (en) | Method and system for pushing individual advertisement based on user interest learning
CN111901536B (en) | Video editing method, system, device and storage medium based on scene recognition
CN109977779B (en) | A method for identifying advertisements inserted in video creatives
WO2019007020A1 (en) | Method and device for generating video summary
KR101606760B1 (en) | Apparatus and Method of Transforming Emotion of Image based on Object in Image
CN111009041A (en) | Drawing creation method and device, terminal equipment and readable storage medium
CN115967823B (en) | Video cover generation method, device, electronic device and readable medium
CN114187558A (en) | Video scene recognition method and device, computer equipment and storage medium
CN115379290A (en) | Video processing method, device, equipment and storage medium
Ding et al. | A dual-stream framework guided by adaptive gaussian maps for interactive image segmentation
Bugeau et al. | Influence of color spaces for deep learning image colorization
JP7057324B2 (en) | Recognition device, program and construction device
CN112598694A (en) | Video image processing method, electronic device and storage medium
CN116563304B (en) | Image processing method and device and training method and device of image processing model
CN117992631A (en) | Image retrieval method, device, electronic equipment and readable storage medium
US12198311B2 (en) | Method and electronic device for managing artifacts of image
CN114445750A (en) | Video object segmentation method, device, storage medium and program product

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
