CN113596574A

Movatterモバイル変換

Info

Publication number: CN113596574A
Application number: CN202110878236.5A
Authority: CN
Inventors: 徐桃
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-11-02

Abstract

Translated fromChinese

本申请公开了一种视频处理方法、视频处理装置、电子设备和可读存储介质，属于视频处理技术领域。该视频处理方法包括：获取至少两个视频文件；确定所述至少两个视频文件中包含目标视频片段的主视频文件和辅视频文件，其中，所述目标视频片段为同一场景下拍摄的视频片段；获取所述辅视频文件的目标视频片段中的目标对象；将所述目标对象添加至所述主视频文件的目标视频片段中，生成目标视频文件。

The present application discloses a video processing method, a video processing apparatus, an electronic device and a readable storage medium, which belong to the technical field of video processing. The video processing method includes: acquiring at least two video files; determining a main video file and an auxiliary video file containing a target video clip in the at least two video files, wherein the target video clip is a video clip shot in the same scene obtaining the target object in the target video segment of the auxiliary video file; adding the target object to the target video segment of the main video file to generate a target video file.

Description

Video processing method, video processing apparatus, electronic device, and readable storage medium

Technical Field

The present application belongs to the field of video processing technologies, and in particular, to a video processing method, a video processing apparatus, an electronic device, and a readable storage medium.

Background

With the explosion of short videos, taking videos has become a current trend. Meanwhile, users have strong requirements for editing videos. It should be noted that the video editing technology is a technology for processing a video file, and editing operations such as cutting, splicing, adding characters, adding pictures, adding sound effects, and the like can be performed on the video file generally.

At present, a common video editing technology is video splicing, and a plurality of sections of different video files are spliced front and back, so that a plurality of sections of videos are edited into one video.

However, as new user needs arise, current video editing has been unable to meet the new needs.

Disclosure of Invention

An object of the embodiments of the present application is to provide a video processing method, a video processing apparatus, an electronic device, and a readable storage medium, which can solve the problem that the video processing method in the prior art cannot meet new user requirements.

In order to solve the technical problem, the present application is implemented as follows:

in a first aspect, an embodiment of the present application provides a video processing method, where the video processing method includes:

acquiring at least two video files;

determining a main video file and an auxiliary video file containing target video clips in the at least two video files, wherein the target video clips are video clips shot in the same scene;

acquiring a target object in a target video clip of the auxiliary video file;

and adding the target object into a target video clip of the main video file to generate a target video file.

In a second aspect, an embodiment of the present application provides a video processing apparatus, including:

the first acquisition module is used for acquiring at least two video files;

the determining module is used for determining a main video file and an auxiliary video file which comprise target video clips in the at least two video files, wherein the target video clips are video clips shot in the same scene;

the second acquisition module is used for acquiring a target object in a target video clip of the auxiliary video file;

and the processing module is used for adding the target object to a target video clip of the main video file to generate a target video file.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the video processing method according to the first aspect.

In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the video processing method according to the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the video processing method according to the first aspect.

In the embodiment of the application, a main video file and an auxiliary video file including a target video clip can be determined from at least two video files, wherein the target video clip is a video clip shot in the same scene. And by automatically identifying scene characteristics, the main video file and the auxiliary video file comprising the same target scene are used as the objects of video processing. Here, the target scene is the same scene in the target video clips of the main video file and the auxiliary video file. And then acquiring a target object in a target video clip of the auxiliary video file, adding the target object into the target video clip of the main video file, and generating the target video file, so that the picture content in the target scene in the target video file not only comprises the picture content in the target scene in the main video file, but also comprises part of the picture content in the target scene in the auxiliary video file, and the requirement of fusing different picture contents shot in the same scene by a user is met.

Drawings

Fig. 1 is a flowchart illustrating steps of a video processing method according to an embodiment of the present disclosure;

fig. 2 is a schematic processing diagram of a video processing method according to an embodiment of the present application;

fig. 3 is a second schematic processing procedure of the video processing method according to the embodiment of the present application;

fig. 4 is a block diagram of a video processing apparatus according to an embodiment of the present application;

fig. 5 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present disclosure;

fig. 6 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.

The video processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.

As shown in fig. 1, a video processing method provided in an embodiment of the present application includes:

step 101: at least two video files are obtained.

In this step, at least two video files are selected by the user according to the requirement of the user. That is, when a user needs to process a plurality of video files at the same time, the user can select the plurality of video files to be processed. Specifically, video files locally available to the electronic device may be displayed for selection by a user, and then at least two video files that need to be processed are determined based on the selection by the user. For example, when the electronic device is a mobile phone, the local video files of the mobile phone are displayed through the album, and after the user selects at least two video files, the subsequent steps can be triggered through the target control displayed on the current page.

Step 102: and determining a main video file and an auxiliary video file containing the target video clip in the at least two video files.

In this step, the target video clips are video clips shot in the same scene. That is, the main video file and the sub video file include video clips shot in the same scene. The scene can be understood as the background of the video clip shooting, and when two pieces of video are shot with the same building as the background, the scenes in the two pieces of video can be considered to be the same. It is understood that a person, an animal, or the like appearing in a captured picture during capturing is generally regarded as a captured object rather than a background. Therefore, when two videos are shot with the same building as the background, even if the persons appearing in the shot pictures are different, the scenes in the two videos can be considered to be the same.

The target video clip in the main video file may be a part or all of the main video file, and similarly, the target video clip in the auxiliary video file may also be a part or all of the auxiliary video file. The primary video file typically comprises one video file and the secondary video file may be one or at least two video files. Preferably, in the case where the main video file and the auxiliary video file do not exist in the at least two video files, a prompt message may be displayed for prompting the user to reselect the video file.

Step 103: and acquiring a target object in a target video clip of the auxiliary video file.

In this step, the target object may be any object appearing in the video frame in the target video clip, and the number of the target objects may be one or at least two. For example, if two person objects exist in the video frame of the target video clip, the target object may include one of the persons or may include both of the persons. And under the condition that the number of the auxiliary video files is at least two, respectively acquiring the target object in each auxiliary video file. Wherein, the target objects in different auxiliary video files can be the same or different. It can be understood that, when acquiring the target object in the video segment, the image content of the target object in each video frame can be scratched for each video frame in the video segment, so as to obtain a large number of target objects.

Step 104: and adding the target object into the target video clip of the main video file to generate a target video file.

In this step, a target object is added for each video frame in a target video clip of the main video file, where the target objects added for different video frames are continuous images of the same object in a series of behaviors. So that the target object is also performing a series of actions in the target video segment of the main video file. For example, the target video segment of the secondary video file includes a series of images of person a when operated, and the target video segment of the primary video file includes a series of images of person B when operated. The person a is taken as a target object, and after the target object is added to the target video clip of the main video file, the user sees a picture that the person a and the person B operate together from a playing picture of the target video file.

In the embodiment of the application, a main video file and an auxiliary video file including a target video clip can be determined from at least two video files, wherein the target video clip is a video clip shot in the same scene. Thereby, the main video file and the auxiliary video file including the same target scene are taken as the objects of video processing. Here, the target scene is the same scene in the target video clips of the main video file and the auxiliary video file. And then acquiring a target object in a target video clip of the auxiliary video file, adding the target object into the target video clip of the main video file, and generating the target video file, so that the picture content in the target scene in the target video file not only comprises the picture content in the target scene in the main video file, but also comprises part of the picture content in the target scene in the auxiliary video file, and the requirement of fusing different picture contents shot in the same scene by a user is met.

Optionally, the step 102: determining a primary video file and a secondary video file containing a target video clip in at least two video files may include:

the environmental factors in the background images in the at least two video files are acquired.

In this step, a scene is determined based on the background image, that is, the background images in the two video files are the same, and then the scenes in the two video files are the same, and the two video files are video files shot in the same scene. The background images are identical, which means that the background images are completely identical or the background images are very similar. The environmental factors in the background image may be understood as buildings, streets, sky, geographical coordinates, etc. in the scene, which may be one or at least two of them. Whether the two scenes are the same or not can be determined by buildings, streets, sky, geographical coordinates, etc. in the scenes, i.e., whether the two background images are the same or not can be determined by environmental factors in the background images. Preferably, the environmental factors of the background image in the video file can be identified through an AI (Artificial Intelligence) scene identification technology, and the identified environmental factors are labeled and stored.

And matching the environmental factors of at least two video files based on the environmental factors to determine a target video segment, wherein the matching coefficient between the target video segments of different video files is higher than a target threshold value, and the matching coefficient is the ratio of the same environmental factor to all the environmental factors in the background image.

In this step, each of the at least two video files can be respectively matched with the remaining video files for matching the environmental factors, so that any two of the at least two video files can be matched with the environmental factors, and omission does not occur. The matching of the environmental factors is the comparison of the environmental factors, and whether the environmental factors are the same or not is compared. Here, in judging whether or not the two environmental factors are the same, different judgment criteria are provided for the environmental factors that are not different. For example, when the environmental factor is a building, when two buildings have the same "building label" and the similarity of the images of the two buildings is high, the environmental factor is considered to be the same. When the environmental factor is a geographic location, the environmental factor is considered to be the same when the two geographic locations have the same "location tag" and the distance between the two geographic locations is smaller than a preset threshold, wherein the preset threshold is usually smaller, so that the two geographic locations can be regarded as the same geographic location.

When a video clip exists in each of the two video files and the matching coefficient between the two video clips is higher than the target threshold, the two video clips are both the target video clip. For example, if the background image of the first video segment of video 1 includes 8 different environmental factors, the background image of the second video segment ofvideo 2 includes 5 different environmental factors, and 4 of the 5 different environmental factors ofvideo 2 are the same as the environmental factors in video 1, the matching coefficient of the first video segment and the second video segment is 4/5. Assuming that the target threshold is 1/2, the first video segment and the second video segment may be considered both target video segments. It is understood that the target threshold may be set on its own as desired.

One of the video files containing the target video clip is used as a main video file, and part or all of the rest of the video files are used as auxiliary video files.

In this step, in the case that the number of video files containing the target video clip is at least two, which is the main video file and which is the auxiliary video file or files can be determined according to the selection of the user. For example, a selection control may be displayed for video files containing a target video clip, and a user may select one of the video files as a primary video file first and then select again with the selected one or more video files as secondary video files through the selection control.

Fig. 2 is a schematic processing diagram of a video processing method according to an embodiment of the present disclosure;

the shooting scene of the video 1 is a tourist attraction A, and the shooting picture comprises a character A. The shooting scene of thevideo 2 is a tourist attraction a', and the shooting picture includes a character B. The environmental factors of the tourist attraction A in video 1 are matched with the environmental factors of the tourist attraction A' invideo 2. And if the matching coefficient is higher than the target threshold value, the tourist spot A is considered as the tourist spot A'. The video 1 can be used as a main video file, thevideo 2 can be used as an auxiliary video file, and the person B is added into the video 1 to generate the video 3, so that the video 3 not only includes the picture content of the person a in the tourist attraction a, but also includes the picture content of the person B, and a user can see the picture content of the person a and the person B in the tourist attraction a simultaneously by watching the video 3.

In the embodiment of the application, the background image in the video file is regarded as the scene when the video file is shot, and whether the video file contains the video clip shot in the same scene is judged by utilizing the environmental factors in the background image, so that the main video file and the auxiliary video file are determined.

Optionally, the step 103: acquiring a target object in a target video segment of a secondary video file may include:

a first target interface is displayed that includes M object controls.

In this step, each object control indicates an object in a target video clip of the secondary video file, and M is a positive integer. Here, the object in the target video segment is an object in a captured picture of the target video segment. For example, human, animal, environmental factors, etc. Specifically, the shot picture can be identified by using an AI scene identification technology, and the identified shot picture is an object in the shot picture. The specific number of M is related to the number of objects in the target video segment of the secondary video file, with M being less than or equal to the number of objects.

Receiving first input of a user to N object controls in the M object controls, wherein N is a positive integer.

In this step, N is less than or equal to M, and the first input may include a click, a slide, a long press, or the like.

And responding to the first input, acquiring N objects in the target video clip of the secondary video file indicated by the N object controls, and taking the N objects as target objects.

In this step, the N objects indicated by the N object controls are all target objects, and when a target object is added to a target video clip of the main video file, the N objects are added to the target video clip of the main video file, respectively. As shown in fig. 3, which is a second schematic processing procedure diagram of the video processing method provided in the embodiment of the present application, a video 1 is an auxiliary video file, which is a video file shot at a tourist attraction a, and the video content includes not only an image of the tourist attraction a, but also an image of a person D and an image of an environmental factor 1 (e.g., a rainbow). The main video file is also a video file shot at the tourist attraction a, and the video content includes not only the image of the tourist attraction a but also the image of the person B. The user feels that the rainbow is beautiful, and the person D is a favorite star, and wants to add these two elements to the main video file, the user can select to input the first object control and the second object control in the first target interface, where the first object control indicates the person D in the video 1, and the second object control indicates the environmental factor 1 (e.g., rainbow) in the video 1, and then the generated target video file (video 2) will include the tourist spot a, the person D, the person B, and the environmental factor 1 (e.g., rainbow).

In the embodiment of the application, the user can freely select the object in the target video clip of the auxiliary video file based on the self requirement, so that more requirements of the user can be met.

Optionally, in the step 104: after the target object is added to the target video segment of the main video file and the target video file is generated, the method may further include:

displaying a second target interface comprising K editing controls, wherein K is a positive integer, and each editing control corresponds to a section of video clip in the target video file except the target video clip;

in this step, there is no overlap between the video segments corresponding to different editing controls. Preferably, the video segment corresponding to each editing control is a video segment in the same scene, that is, each editing control corresponds to a video segment in a different scene. It can be understood that the target video file generally includes video clips in multiple scenes, the target video clip is only a video clip in one scene, and for the video clips in the remaining scenes, a corresponding editing control is generated for the user to operate.

Receiving second input of a user to L editing controls in the K editing controls, wherein L is a positive integer;

in this step, L is less than or equal to K, and the second input may include a click, a slide, a long press, and the like.

And responding to the second input, intercepting the video segments corresponding to the L editing controls and the target video segment, and outputting the intercepted video segments.

In this step, the target video clip can be understood as the most basic requirement of the user, and the video clips corresponding to the L editing controls can be understood as the personalized requirement of the user. So that the output video clip includes not only the target video clip but also the video clip selected by the user himself. Here, if the video segments corresponding to the L editing controls and the target video segment are discontinuous video segments, the video segments may be spliced according to positions of the video segments in the target video file to form a continuous video segment.

In the embodiment of the application, a user can freely select any video clip except the target video clip in the target video file to be output together with the target video clip according to the personalized requirements of the user.

displaying a third target interface including an output control.

A third input to the output control by the user is received.

In this step, the third input may include operations such as clicking, sliding, long pressing, and the like.

In response to a third input, a target video segment in the target video file is intercepted, and the intercepted target video segment is output.

In this step, the intercepted target video clip is a target video clip added with a target object, which may be a part or all of the target video file. Here, the complete target video file may also be output based on different operations of the user.

In the embodiment of the application, the user can only select the output target video clip based on own requirements.

Optionally, in the step 104: after the target object is added to the target video segment of the main video file and the target video file is generated, the method further comprises the following steps:

and performing picture compensation on the target video frame according to the target object so that the target video frame contains the target object.

In this step, the target video frame includes a video frame of the target object not included in the target video clip of the target video file. The main video file and the sub video file may be video files having the same FPS (Frames Per Second transmission), or may be video files having different FPSs. It will be appreciated that two pieces of video having the same FPS include the same number of frames in the same duration. Under the condition that the FPS of the main video file and the FPS of the auxiliary video file are the same, if each video frame in the target video clip of the auxiliary video file contains a target object and the target video clip of the main video file and the video clip of the auxiliary video file have the same duration, each video frame of the target video clip of the target video file also contains the target object after the target object is added to the target video clip of the main video file. When the FPS of the auxiliary video file is smaller than that of the main video file, target video frames not containing the target object exist in video frames contained in each second in a target video clip of the target video file, and picture compensation is performed on the target video frames, so that the target video frames contain the target object. For example, the FPS of the main video file is 60, that is, each second includes 60 frames of images, the FPS of the auxiliary video file is 40, that is, each second includes 40 frames of images, the time lengths of the target video clips of the main video file and the auxiliary video file are both 60 seconds, and the target object appears in the 1 st second to 30 th seconds of the target video clip of the auxiliary video file, the target object of the 1 st second to 30 th seconds of the target video clip of the auxiliary video file is acquired and added to the 1 st second to 30 th seconds of the target video clip of the main video file to obtain the target video file, specifically, the target object acquired from the N th second of the target video clip of the auxiliary video file is added to the N th second, N e [1, 30], of the target video clip of the main video file. Since the FPS of the main video file and the sub video file are different, only 40 target objects can be acquired for each second, and 40 target objects need to be added to the 60-frame image. At this time, 40 target objects may be sequentially added to the first 40 frames of images in the 60 frames of images, and the remaining 20 frames of images are subjected to picture compensation, that is, any one target object of the 40 target objects may be added to the remaining 20 frames of images, and preferably, the last target object of the 40 target objects is added to each of the remaining 20 frames of images, so that the 60 frames of images all include the target object. Of course, it is also possible to perform picture compensation for 40 target objects, compensate 40 target objects to 60 target objects with consecutive actions according to the consistency between actions of adjacent target objects, and further add 60 target objects to 60 frames of images in sequence.

When the FPS of the auxiliary video file is greater than the FPS of the main video file, for example, the FPS of the main video file is 40, the FPS of the auxiliary video file is 60, the time lengths of the target video clips of the main video file and the auxiliary video file are both 60 seconds, and the target object appears in the 1 st second to 30 th seconds of the target video clip of the auxiliary video file, the target objects of the 1 st second to 30 th seconds of the target video clip of the auxiliary video file are acquired and added to the 1 st second to 30 th seconds of the target video clip of the main video file, so as to obtain the target video file, since the FPS of the main video file and the FPS of the auxiliary video file are different, 60 target objects can be acquired for each second, and 40 objects can be selected from the 60 target objects and added to the 40-frame image. When selecting the target object, the selection may be performed by any selection method, for example, the selection may be performed randomly, sequentially from front to back, or the like. Of course, the FPS of the target video segment of the main video file can be adjusted to 60 in a frame interpolation manner, so that 60 target objects can be added to the 60-frame image.

In the embodiment of the application, through carrying out picture compensation on the target video frame, the pause feeling of the card caused by the target object in the playing process of the target video file can be relieved, so that the action of the target object is smoother and more coherent.

and performing blurring processing on the scene conversion part in the target video file or adding a preset video frame.

In this step, a plurality of scenes may exist in the video content of the target video file, the target video clip is only one of the scenes, and when the target video file is converted from another scene to a scene corresponding to the target video clip, or the scene corresponding to the target video clip is converted from another scene to another scene, the scene conversion may not be smooth and natural due to the addition of the target object in the target video clip. The scene change can be performed at the time of scene change by blurring the scene change portion or adding a preset video frame. The adding of the preset video frame can be adding transition animation, and the specific content of the transition animation can be determined according to the requirements of users.

In the embodiment of the application, the target video file is more smooth and natural in the scene conversion process by blurring the scene conversion part in the target video file or adding the preset video frame.

and acquiring a first shooting parameter value of the main video file and a second shooting parameter value of the auxiliary video file.

In this step, the first shooting parameter value and the second shooting parameter value correspond to the same shooting parameter. Here, the photographing parameters include, but are not limited to, a light parameter, a focal length parameter, a lens parameter, and the like.

And carrying out optimization fitting based on the first shooting parameter value and the second shooting parameter value to obtain a target shooting parameter value.

In this step, through the optimized fitting mode, the shooting parameter value when the playing effect of the target video file is good, that is, the target shooting parameter value, can be found. Preferably, the target photographing parameter value is located between the first photographing parameter value and the second photographing parameter value, but is not limited thereto.

And adjusting the target video file based on the target shooting parameter value.

In the step, through the target video file after the target shooting parameter value is adjusted, all picture contents in the target video clip look more harmonious and consistent.

In the embodiment of the application, the shooting parameters of the target video file are adjusted, so that the nationwide picture content of the target video file looks very harmonious and consistent, and the influence caused by inconsistent shooting parameters of the main video file and the auxiliary video file is reduced.

Optionally, the at least two video files include a video file being recorded locally by the electronic device and a video file obtained from a target server over a network. The server stores a video file uploaded by a user in advance, for example, a user A finishes shooting a video 1 in a certain scene, the system prompts whether the user A is willing to share the video 1 or not, the user A selects the video to be shared, and selects a specific marker (such as a landmark building) to label the video in the scene. Here, the content of the video 1 may be analyzed through an AI algorithm, buildings, streets, geographical position tags, lights, background objects, people, and the like in the video 1 may be identified, and the identified objects may be labeled and stored to form a material library. After the video annotation is completed, the video and the related annotation content can be sent to a server for cloud storage. When other users use the electronic device to shoot videos, if it is determined that the same marker in the video 1 exists in the current video file, the current user is reminded, the current user can browse the video 1 stored in the cloud and other video files stored in the cloud, after the user selects one or more of the video files, the video file selected by the user is downloaded from the cloud to the local electronic device, and thesteps 102 to 104 are executed, so that the video file is edited.

In the embodiment of the application, shooting elements of other people in the same scene can be cited when a user shoots a video, so that the video shooting is more vivid and interesting, the step of later-stage editing can be omitted, and the efficiency is improved.

It should be noted that, in the video processing method provided in the embodiment of the present application, the execution subject may be a video processing apparatus, or a control module in the video processing apparatus for executing the video processing method. In the embodiment of the present application, a video processing apparatus executing a video processing method is taken as an example, and the video processing apparatus provided in the embodiment of the present application is described.

As shown in fig. 4, an embodiment of the present application further provides a video processing apparatus, including:

a first obtainingmodule 41, configured to obtain at least two video files;

a determiningmodule 42, configured to determine a main video file and an auxiliary video file that include a target video clip in at least two video files, where the target video clip is a video clip shot in the same scene;

a second obtainingmodule 43, configured to obtain a target object in a target video clip of the secondary video file;

and theprocessing module 44 is configured to add the target object to the target video segment of the main video file, and generate a target video file.

Optionally, the determiningmodule 42 includes:

the device comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring environmental factors in background images in at least two video files;

the matching unit is used for matching the environmental factors of at least two video files based on the environmental factors to determine a target video clip, wherein the matching coefficient between the target video clips of different video files is higher than a target threshold value, and the matching coefficient is the ratio of the same environmental factor in the background image to all the environmental factors;

and the determining unit is used for taking one of the video files containing the target video clip as a main video file and taking part or all of the rest video files as auxiliary video files.

Optionally, the second obtainingmodule 43 includes:

the display unit is used for displaying a first target interface comprising M object controls; each object control indicates one object in a target video clip of the auxiliary video file, and M is a positive integer;

the input unit is used for receiving first input of a user to N object controls in the M object controls, wherein N is a positive integer;

and the response unit is used for responding to the first input, acquiring N objects in the target video clip of the secondary video file indicated by the N object controls, and taking the N objects as target objects.

Optionally, the apparatus further comprises:

the first display module is used for displaying a second target interface comprising K editing controls, wherein K is a positive integer, and each editing control corresponds to a section of video clip in the target video file except the target video clip;

the first input module is used for receiving second input of a user to L editing controls in the K editing controls, and L is a positive integer;

and the first response module is used for responding to the second input, intercepting the video segments corresponding to the L editing controls and the target video segment, and outputting the intercepted video segments.

Optionally, the apparatus further comprises:

the second display module is used for displaying a third target interface comprising an output control;

the second input module is used for receiving a third input of the user to the output control;

and the second response module is used for responding to the third input, intercepting the target video clip in the target video file and outputting the intercepted target video clip.

Optionally, the apparatus further comprises:

and the picture compensation module is used for carrying out picture compensation on the target video frame according to the target object so as to enable the target video frame to contain the target object, wherein the target video frame comprises a video frame which does not contain the target object in a target video segment of the target video file.

Optionally, the apparatus further comprises:

and the frame inserting module is used for performing blurring processing on the scene conversion part in the target video file or adding a preset video frame.

Optionally, the apparatus further comprises:

the first shooting parameter module is used for acquiring a first shooting parameter value of the main video file and a second shooting parameter value of the auxiliary video file;

the second shooting parameter module is used for carrying out optimization fitting on the basis of the first shooting parameter value and the second shooting parameter value to obtain a target shooting parameter value;

and the third shooting parameter module is used for adjusting the target video file based on the target shooting parameter value.

In the embodiment of the application, a main video file and an auxiliary video file including a target video clip can be determined from at least two video files, wherein the target video clip is a video clip shot in the same scene. By automatically identifying scene characteristics, a main video file and an auxiliary video file both including the same target scene are used as objects of video processing. Here, the target scene is the same scene in the target video clips of the main video file and the auxiliary video file. And then acquiring a target object in a target video clip of the auxiliary video file, adding the target object into the target video clip of the main video file, and generating the target video file, so that the picture content in the target scene in the target video file not only comprises the picture content in the target scene in the main video file, but also comprises part of the picture content in the target scene in the auxiliary video file, and the requirement of fusing different picture contents shot in the same scene by a user is met.

The video processing apparatus in the embodiment of the present application may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.

The video processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.

The video processing apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to fig. 3, and is not described herein again to avoid repetition.

Optionally, as shown in fig. 5, anelectronic device 500 is further provided in this embodiment of the present application, and includes aprocessor 501, amemory 502, and a program or an instruction stored in thememory 502 and executable on theprocessor 501, where the program or the instruction is executed by theprocessor 501 to implement each process of the above-mentioned video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.

Fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.

Theelectronic device 600 includes, but is not limited to: a radio frequency unit 601, anetwork module 602, anaudio output unit 603, aninput unit 604, asensor 605, adisplay unit 606, auser input unit 607, aninterface unit 608, amemory 609, aprocessor 610, and the like.

Those skilled in the art will appreciate that theelectronic device 600 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to theprocessor 610 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 6 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.

Amemory 609 for retrieving at least two video files;

theprocessor 610 is configured to determine a main video file and an auxiliary video file including a target video clip in at least two video files, where the target video clip is a video clip shot in the same scene;

theprocessor 610 is further configured to obtain a target object in a target video clip of the secondary video file;

theprocessor 610 is further configured to add the target object to the target video segment of the main video file, and generate a target video file.

It is to be understood that, in the embodiment of the present application, theinput Unit 604 may include a Graphics Processing Unit (GPU) 6041 and amicrophone 6042, and theGraphics Processing Unit 6041 processes image data of a still picture or a video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. Thedisplay unit 606 may include adisplay panel 6061, and thedisplay panel 6061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. Theuser input unit 607 includes atouch panel 6071 andother input devices 6072. Atouch panel 6071, also referred to as a touch screen. Thetouch panel 6071 may include two parts of a touch detection device and a touch controller.Other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. Thememory 609 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. Theprocessor 610 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into theprocessor 610.

The embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above video processing method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.

It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.