Disclosure of Invention
Embodiments of the present application aim to provide a video processing method, a video processing apparatus, an electronic device, and a storage medium, which are used to edit video reasonably during video processing.
In one aspect, an embodiment of the present application provides a method for video processing, including:
receiving a video sent by a video acquisition device;
identifying and cropping the video to obtain a plurality of video clips;
screening out, from the video clips, video clips that meet a set theme condition; and
synthesizing the video clips that meet the set theme condition into a target video.
In one embodiment, identifying and cropping the video to obtain a plurality of video clips includes:
if there are multiple video acquisition devices, identifying and cropping each video separately to obtain the plurality of video clips.
In one embodiment, identifying and cropping the video to obtain a plurality of video clips includes:
performing face recognition and cropping on the video to obtain video clips corresponding to each designated person;
or performing event identification and cropping on the video to obtain video clips corresponding to each designated event;
or performing animal identification and cropping on the video to obtain video clips corresponding to each designated animal.
In one embodiment, synthesizing the video clips that meet the set theme condition into a target video includes:
if there are multiple set theme conditions, synthesizing a corresponding target video based on the video clips corresponding to each set theme condition.
In one embodiment, the method further includes:
if there are multiple target videos, playing the target video selected by the user according to a video review instruction of the user.
In one aspect, an embodiment of the present application provides a method for video processing, including:
displaying a video display interface in response to a page view instruction of a user for videos, where at least one video category and the target videos corresponding to each video category are displayed in the video display interface; and
playing the target video selected by the user in response to a play operation of the user on any target video.
In one embodiment, the video category is determined based on at least one of a person, an animal, and an event.
In one aspect, an embodiment of the present application provides a video processing apparatus, including:
a receiving unit, configured to receive a video sent by a video acquisition device;
an obtaining unit, configured to identify and crop the video to obtain a plurality of video clips;
a screening unit, configured to screen out, from the video clips, video clips that meet a set theme condition; and
a synthesizing unit, configured to synthesize the video clips that meet the set theme condition into a target video.
In one embodiment, the obtaining unit is configured to:
if there are multiple video acquisition devices, identify and crop each video separately to obtain the plurality of video clips.
In one embodiment, the obtaining unit is configured to:
perform face recognition and cropping on the video to obtain video clips corresponding to each designated person;
or perform event identification and cropping on the video to obtain video clips corresponding to each designated event;
or perform animal identification and cropping on the video to obtain video clips corresponding to each designated animal.
In one embodiment, the synthesizing unit is configured to:
if there are multiple set theme conditions, synthesize a corresponding target video based on the video clips corresponding to each set theme condition.
In one embodiment, the synthesizing unit is further configured to:
if there are multiple target videos, play the target video selected by the user according to a video review instruction of the user.
In one aspect, an embodiment of the present application provides a video processing apparatus, including:
a display unit, configured to display a video display interface in response to a page view instruction of a user for videos, where at least one video category and the target videos corresponding to each video category are displayed in the video display interface; and
a playing unit, configured to play the target video selected by the user in response to a play operation of the user on any target video.
In one embodiment, the video category is determined based on at least one of a person, an animal, and an event.
In one aspect, an embodiment of the present application provides an electronic device, including:
a processor; and
a memory storing computer instructions, the computer instructions being used to cause the processor to perform the steps of the method provided in the various optional implementations of video processing described above.
In one aspect, an embodiment of the present application provides a storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the steps of the method provided in any of the various optional implementations of video processing described above.
In the video processing method, a video sent by a video acquisition device is received; the video is identified and cropped to obtain a plurality of video clips; video clips that meet a set theme condition are screened out from the video clips; and the video clips that meet the set theme condition are synthesized into a target video. In this way, multiple video clips of the same theme are automatically integrated, reasonable editing of the video is achieved, and the otherwise cumbersome operations of subsequent video review by the user are simplified.
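As a rough, non-authoritative illustration only (not part of the claims), the four steps can be sketched as follows. The recognition/cropping step is stubbed out (each incoming video is assumed to already carry labeled clips), and a synthesized target video is represented simply as a time-ordered list of clip ids; all field names are assumptions.

```python
def process_videos(videos, theme):
    # Step 1: videos received from the acquisition device(s)
    clips = []
    for video in videos:
        # Step 2: identify and crop (stubbed: labels assumed precomputed)
        clips.extend(video["clips"])
    # Step 3: screen out clips meeting the set theme condition
    matching = [c for c in clips if theme in c["labels"]]
    # Step 4: synthesize into a target video (time-ordered clip ids)
    return [c["id"] for c in sorted(matching, key=lambda c: c["start"])]
```

A real implementation would replace steps 2 and 4 with actual recognition models and video concatenation.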
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the application are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the scope of the present application. In addition, the technical features of the different embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.
Some of the terms involved in the embodiments of the present application will be described first to facilitate understanding by those skilled in the art.
The terminal device may be a mobile terminal, a fixed terminal, or a portable terminal, such as a mobile handset, a station, a unit, a device, a multimedia computer, a multimedia tablet, an internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, a tablet computer, a personal communication system device, a personal navigation device, a personal digital assistant, an audio/video player, a digital camera/camcorder, a positioning device, a television receiver, a radio broadcast receiver, an electronic book device, a game device, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. The terminal device may also support any type of interface to the user (e.g., a wearable device).
The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
The technical idea of the present application will be described below.
In a video review scenario, a server generally acquires captured video through a video acquisition device and intercepts video clips of a number of key events from the video, so that a user can quickly search for and review the video clips of the key events.
However, if there are multiple video clips meeting the same user requirement, the user usually needs to click on each video clip separately, which makes the user's video review operations cumbersome.
For example, the server may collect video in the user's home through the user's home camera, and perform person recognition and cropping on the video to obtain a plurality of video clips corresponding to an elderly person and a plurality of video clips corresponding to a child. If the user needs to review all the video clips of the child, the user has to click on each video clip corresponding to the child separately; the operation steps are cumbersome and the user experience is poor.
Further, if there are multiple video capturing devices, a corresponding video list is generally established for the video clips of each video capturing device, so the user has to switch between video lists to view the corresponding video clips; the operation steps are even more cumbersome and the user experience is worse.
Therefore, a scheme capable of automatically and reasonably editing video is needed, to simplify the otherwise cumbersome operations of subsequent video review by the user and improve the user experience.
In view of the above defects of the related art, the embodiments of the present application provide a video processing method, apparatus, electronic device, and storage medium, which aim to edit videos reasonably during video processing.
The embodiment of the present application provides a video processing method, which can be applied to an electronic device. The type of the electronic device is not limited in the present application; it may be any device type suitable for implementation, such as a terminal device or a server, and details are not repeated here.
Referring to fig. 1, an exemplary architecture diagram of a video processing system is shown; the system includes a server, camera devices, and a terminal device. Optionally, the server may be a cloud server or a local server; there may be at least one camera device (fig. 1 illustrates a plurality of camera devices as an example); and the terminal device may be a user's mobile phone.
The camera device is used to collect video, convert it into video data as a digital signal, and transmit the video data to the server through a wired or wireless network. Optionally, the network transmission technologies may include Wi-Fi, Bluetooth, 4G/5G, and so on.
The server is used to process and analyze the video data transmitted by the camera device using computer vision technology, and to crop and integrate the video according to the analysis result; for example, face recognition, behavior recognition, anomaly detection, and the like may be performed on the video.
The terminal device is installed with a video client, which is used to display, to the user through a web interface or a mobile application interface, a video list corresponding to the video collected by the camera device, i.e., a list of the target videos generated after editing. The user can perform video query and video review through the video client, and can also manage and control the video capture angle, video capture time, video switch, and the like through the video client.
Referring to fig. 2, a flowchart of a video processing method according to an embodiment of the present application is shown. The method is applied to the server in fig. 1 and is described below with reference to fig. 1. The specific implementation flow of the method is as follows:
Step 200, receiving the video sent by the video acquisition device.
Optionally, there may be one or more video capturing devices; if there is more than one, a plurality of videos may be obtained. In one embodiment, videos respectively sent by at least one video acquisition device corresponding to a target user are received.
Further, a correspondence between users and video capturing devices may be pre-established, so that each video capturing device corresponding to the target user can be determined according to the correspondence. In this way, each video of the target user can be obtained.
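The pre-established correspondence above can be sketched as a simple lookup table; the user ids and device ids below are purely illustrative assumptions.

```python
# Hypothetical user-to-device correspondence, established in advance.
USER_DEVICE_MAP = {
    "user_a": ["camera_livingroom", "camera_door"],
    "user_b": ["camera_garage"],
}

def devices_for_user(user_id, mapping=USER_DEVICE_MAP):
    """Return every video capture device registered to the target user."""
    return mapping.get(user_id, [])
```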
Step 201, identifying and cropping the video to obtain a plurality of video clips.
In one embodiment, if there are multiple video capturing devices, each video is identified and cut out separately to obtain multiple video clips.
At least one of the following manners may be adopted to identify and crop each video:
In a first manner, face recognition and cropping are performed on the video to obtain the video clips corresponding to each designated person.
Since a video may contain multiple persons, the designated persons may be all persons, or may be one or more persons designated by the user.
If there are multiple designated persons, video cropping can be performed for each designated person to obtain at least one video clip corresponding to each designated person.
As one example, face recognition is performed on a video and it is determined that the video contains a child and a stranger; if the designated person is the child, at least one video clip of the child is cut from the video.
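One way this kind of cropping could be sketched (an assumption, not the claimed implementation): given per-frame sets of recognized labels, extract the contiguous index ranges in which the designated person appears.

```python
def crop_segments(detections, target):
    """Return (start, end) index ranges (end exclusive) of contiguous
    frames whose recognized labels contain the designated target.
    A sketch; real systems would operate on timestamps, not indices."""
    segments, start = [], None
    for i, labels in enumerate(detections):
        if target in labels:
            if start is None:
                start = i          # a segment of the target begins
        elif start is not None:
            segments.append((start, i))   # the segment ends here
            start = None
    if start is not None:
        segments.append((start, len(detections)))
    return segments
```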
In a second manner, event identification and cropping are performed on the video to obtain the video clips corresponding to each designated event.
Optionally, the designated events may be all events, a specific event, or all events of a certain type.
For example, the designated event may be a user coming home, a user going out, a person pressing the doorbell, and so on. All events of a certain type may be, for example, actions of a set type, abnormal events, or screen changes for a designated person. An abnormal event may be an event such as an elderly person or a child falling, a stranger ringing the doorbell, or an explosion; it may also be a user distress event determined through speech-to-text recognition of the video, or a user emotional-abnormality event determined through speech and action analysis of the video.
In practical applications, the designated events and the set types can be configured according to the actual application scenario, which is not limited herein.
In a third manner, animal identification and cropping are performed on the video to obtain the video clips corresponding to each designated animal.
The designated animals may be all animals in the video, or one or more designated animals.
If there are multiple designated animals, each designated animal can be identified, and at least one video clip corresponding to each designated animal can be cut out.
In practical applications, the video cropping manner can be configured according to the actual application scenario; for example, cropping can be performed according to a clip-duration threshold set by the user, or the cropping rules for video clips can be updated according to the user's video review content and review frequency. This is not limited herein.
Step 202, screening out video clips meeting the set theme conditions from the video clips.
In one embodiment, labels can be set for the video clips according to their identification results, and the video clips meeting a set theme condition can then be screened out according to those labels.
Further, if there are multiple set theme conditions, the video clips meeting each set theme condition can be screened out.
Optionally, the set theme condition may be selecting the video clips of a designated person, selecting the video clips of a designated pet, or selecting the video clips of a designated event.
The set theme condition may also be defined according to both labels and time.
It should be noted that the video clips corresponding to different set theme conditions may be the same or different. In practical applications, the set theme conditions can be configured according to the actual application scenario; for example, they can be set according to labels and events, or set theme conditions meeting user requirements can be personalized according to the user's own settings. This is not limited herein.
Thus, each video clip can be filtered and divided.
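A minimal sketch of the label-based screening described above, under the assumption that a set theme condition can be modeled as a single required label; note that, as stated above, the same clip may satisfy several conditions.

```python
def screen_by_themes(clips, themes):
    """Screen the clips against each set theme condition (modeled,
    as an assumption, as a required label per condition)."""
    return {t: [c for c in clips if t in c["labels"]] for t in themes}
```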
Step 203, synthesizing the video clips meeting the set theme conditions into a target video.
In one embodiment, if there are multiple set theme conditions, a corresponding target video may be synthesized based on the video clips corresponding to each set theme condition.
Thus, the video clips of different time periods can be combined to obtain one or more longer target videos.
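The combining of clips from different time periods can be sketched as ordering by start time; the returned id list merely stands in for the real pixel-level concatenation, which a server might perform with a tool such as ffmpeg (an assumption, not stated in the source).

```python
def synthesize_target(clips):
    """Order the screened clips by start time and join them into
    one (longer) target video, represented here as clip ids."""
    return [c["id"] for c in sorted(clips, key=lambda c: c["start"])]
```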
Further, if the target videos are multiple, the target video selected by the user can be played according to the video review instruction of the user.
Furthermore, the target videos can be classified, and the video categories of the target videos can be determined.
Optionally, the video category is determined based on at least one of a person, an animal, and an event. In practical application, the video category may be set according to a practical application scenario, which is not described herein in detail.
Further, the target video can be displayed and played through a video client in the user equipment.
In one embodiment, when step 203 is performed, the following steps may be employed:
and S2031, the video client side responds to a page view instruction of a user for the video to display a video display interface.
The video display interface displays at least one video category and target videos corresponding to the video categories respectively, the target videos are displayed in groups according to the video categories, and the target videos are generated based on video clips meeting set theme conditions.
In one embodiment, a video client responds to a page view instruction of a user for videos, sends a video page acquisition request to a server, receives at least one target video and a video category corresponding to the target video returned by the video page request, divides each target video according to the video category of each target video, displays each video category in a video display interface, and displays corresponding target videos under each video category respectively.
As one example, the target video for each video category may be presented in the form of a video list.
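The per-category grouping that produces these video lists can be sketched as follows; the category names and video names are illustrative assumptions.

```python
from collections import defaultdict

def group_by_category(target_videos):
    """Group target videos by video category, for display as
    per-category video lists in the video display interface."""
    groups = defaultdict(list)
    for video in target_videos:
        groups[video["category"]].append(video["name"])
    return dict(groups)
```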
S2032: the video client plays the target video selected by the user in response to a play operation of the user on any target video.
In one embodiment, in response to a play operation of the user on any target video, the video client sends a video data acquisition request to the server, receives the video data of the selected target video returned by the server, and plays the selected target video based on that video data.
The above embodiment is described below with reference to fig. 3, which shows an exemplary diagram of video clipping. In fig. 3, a plurality of home cameras collect videos and send the collected videos to a cloud server. The cloud server stores the received videos of all the home cameras, identifies and crops each video to obtain a plurality of video clips, and screens and synthesizes the video clips to obtain a plurality of target videos for different theme conditions.
Further, the server can also divide the target videos into categories, determine the video category of each target video, and display each video category and the corresponding target video thereof through the video client so as to facilitate the user to view and play the video.
Further, the video client may also filter the target videos according to filtering conditions input by the user (such as time, video category, or video name) and display each filtered target video.
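The client-side filtering just described can be sketched as below; the field names (`time`, `category`, `name`) and the string-comparable date format are assumptions for illustration.

```python
def filter_videos(videos, start=None, end=None, category=None, name=None):
    """Filter target videos by an optional time period, video
    category, and name substring, all user-supplied."""
    result = []
    for v in videos:
        if start is not None and v["time"] < start:
            continue
        if end is not None and v["time"] > end:
            continue
        if category is not None and v["category"] != category:
            continue
        if name is not None and name not in v["name"]:
            continue
        result.append(v)
    return result
```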
Referring to fig. 4, an exemplary view of a video review time interface is shown. Referring to fig. 5, an exemplary view of a video review list interface (i.e., the video display interface) is shown. A user may select a video review time period through the video review time interface shown in fig. 4, for example, 07.01-07.08. Then, based on the video review time period (the filtering condition) selected by the user, each video category and its corresponding target videos within that period may be displayed in the video review list interface shown in fig. 5. For example, the video categories include events and people/animals. The target videos under events (a video category) are, for example, an all-review video, a character-movement video, and a screen-change video. The target videos under people/animals (a video category) are, for example, a pet micro-recording (Vlog), a son Vlog, and so on.
In the embodiment of the application, after the video is cropped, the video clips that meet the same set theme condition are synthesized into a longer target video, so that automatic editing of the video is realized; the user no longer needs to screen manually and click to view clips one by one, which simplifies the otherwise cumbersome operations of video review. Further, if there are multiple video acquisition devices, the videos of all the camera devices are automatically edited into the same target video, which avoids the need for the user to switch between the video lists of the individual camera devices, further simplifies the video review operations, and improves the user experience.
Based on the same inventive concept, the embodiment of the present application further provides a video processing apparatus. Since the principle by which the apparatus solves the problem is similar to that of the video processing method, the implementation of the apparatus may refer to the implementation of the method, and repeated description is omitted. The apparatus can be applied to an electronic device; the type of the electronic device is not limited in the present application and may be any device type suitable for implementation, such as a terminal device or a server, which is not repeated here.
Referring to fig. 6, a block diagram of an apparatus for video processing according to an embodiment of the present application is shown. In some embodiments, an apparatus for video processing according to an example of the present application includes:
a receiving unit 601, configured to receive a video sent by a video acquisition device;
an obtaining unit 602, configured to identify and crop the video to obtain a plurality of video clips;
a screening unit 603, configured to screen out, from the video clips, video clips that meet a set theme condition; and
a synthesizing unit 604, configured to synthesize the video clips that meet the set theme condition into a target video.
In one embodiment, the obtaining unit 602 is configured to:
if there are multiple video acquisition devices, identify and crop each video separately to obtain the plurality of video clips.
In one embodiment, the obtaining unit 602 is configured to:
perform face recognition and cropping on the video to obtain video clips corresponding to each designated person;
or perform event identification and cropping on the video to obtain video clips corresponding to each designated event;
or perform animal identification and cropping on the video to obtain video clips corresponding to each designated animal.
In one embodiment, the synthesizing unit 604 is configured to:
if there are multiple set theme conditions, synthesize a corresponding target video based on the video clips corresponding to each set theme condition.
In one embodiment, the synthesizing unit 604 is further configured to:
if there are multiple target videos, play the target video selected by the user according to a video review instruction of the user.
Referring to fig. 7, a block diagram of another video processing apparatus according to an embodiment of the present application is shown. In some embodiments, an apparatus for video processing according to an example of the present application includes:
a display unit 701, configured to display a video display interface in response to a page view instruction of a user for videos, where at least one video category and the target videos corresponding to each video category are displayed in the video display interface, the target videos being displayed in groups according to video category; and
a playing unit 702, configured to play the target video selected by the user in response to a play operation of the user on any target video.
In one embodiment, the video category is determined based on at least one of a person, an animal, and an event.
In the video processing method, a video sent by a video acquisition device is received; the video is identified and cropped to obtain a plurality of video clips; video clips that meet a set theme condition are screened out from the video clips; and the video clips that meet the set theme condition are synthesized into a target video. In this way, multiple video clips of the same theme are automatically integrated, reasonable editing of the video is achieved, and the otherwise cumbersome operations of subsequent video review by the user are simplified.
In an embodiment of the present application, there is provided an electronic device, including:
a processor; and
a memory storing computer instructions, the computer instructions being used to cause the processor to perform the method of any of the embodiments described above.
In an embodiment of the present application, a storage medium is provided, storing computer instructions, where the computer instructions are used to cause a computer to perform the method of any of the foregoing embodiments.
Fig. 7 shows a schematic structural diagram of an electronic device 7000. Referring to fig. 7, the electronic device 7000 includes a processor 7010 and a memory 7020, and optionally a power supply 7030, a display unit 7040, and an input unit 7050.
The processor 7010 is a control center of the electronic device 7000, connects the respective components using various interfaces and lines, and performs various functions of the electronic device 7000 by running or executing software programs and/or data stored in the memory 7020, thereby performing overall monitoring of the electronic device 7000.
In an embodiment of the present application, the processor 7010 executes the steps of the above embodiment when it invokes a computer program stored in the memory 7020.
The processor 7010 may optionally include one or more processing units. Preferably, the processor 7010 may integrate an application processor and a modem processor, where the application processor primarily handles the operating system, user interfaces, applications, and the like, and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor may alternatively not be integrated into the processor 7010. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented separately on separate chips.
The memory 7020 may mainly include a storage program area and a storage data area, where the storage program area may store the operating system, various applications, and the like, and the storage data area may store data created according to the use of the electronic device 7000, and the like. In addition, the memory 7020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The electronic device 7000 also includes a power supply 7030 (e.g., a battery) for powering the various components, which can be logically connected to the processor 7010 via a power management system to perform functions such as managing charge, discharge, and power consumption via the power management system.
The display unit 7040 may be used to display information input by a user or information provided to the user, the various menus of the electronic device 7000, and the like; in the embodiment of the present application, it is mainly used to display the display interface of each application in the electronic device 7000, and objects such as text and pictures shown in that interface. The display unit 7040 may include a display panel 7041. The display panel 7041 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
The input unit 7050 may be used to receive information such as numbers or characters input by a user. The input unit 7050 may include a touch panel 7051 and other input devices 7052. Among other things, the touch panel 7051, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 7051 or thereabout using any suitable object or accessory such as a finger, stylus, etc.).
Specifically, the touch panel 7051 may detect a touch operation by a user, detect a signal resulting from the touch operation, convert the signal into a touch point coordinate, transmit the touch point coordinate to the processor 7010, and receive and execute a command transmitted from the processor 7010. In addition, the touch panel 7051 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Other input devices 7052 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, on-off keys, etc.), a trackball, mouse, joystick, etc.
Of course, the touch panel 7051 may overlay the display panel 7041; when the touch panel 7051 detects a touch operation on or near it, the operation is transferred to the processor 7010 to determine the type of the touch event, and the processor 7010 then provides a corresponding visual output on the display panel 7041 according to the type of the touch event. Although in fig. 7 the touch panel 7051 and the display panel 7041 are two separate components implementing the input and output functions of the electronic device 7000, in some embodiments the touch panel 7051 may be integrated with the display panel 7041 to implement the input and output functions of the electronic device 7000.
The electronic device 7000 may also include one or more sensors, such as a pressure sensor, a gravitational acceleration sensor, a proximity light sensor, and the like. Of course, the electronic device 7000 may also include other components such as a camera, as needed in a specific application, and are not shown in fig. 7 and will not be described in detail since these components are not the components that are important in the embodiments of the present application.
It will be appreciated by those skilled in the art that fig. 7 is merely an example of an electronic device and does not constitute a limitation; more or fewer components than shown may be included, certain components may be combined, or different components may be used.
For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
It should be apparent that the above embodiments are merely examples given for clarity of illustration and are not intended to limit the implementations. Other variations or modifications will be apparent to those of ordinary skill in the art in light of the above description; it is neither necessary nor possible to exhaustively list all implementations here. Obvious variations or modifications derived therefrom remain within the protection scope of the application.