CN112165635A - Video conversion method, device, system and storage medium - Google Patents

Video conversion method, device, system and storage medium

Info

Publication number
CN112165635A
Authority
CN
China
Prior art keywords
video
information
cropping
frame
user interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011086867.5A
Other languages
Chinese (zh)
Inventor
宋玉岩 (Song Yuyan)
徐宁 (Xu Ning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011086867.5A
Publication of CN112165635A
Priority to PCT/CN2021/106338 (published as WO2022077977A1)
Legal status: Pending (current)


Abstract

The present disclosure provides a video conversion method, apparatus, system, and storage medium. The video conversion method may include the steps of: acquiring a first video in a first orientation and cropping information for converting the first video into a second video in a second orientation; generating and displaying, based on the cropping information, a user interface for adjusting the cropping information; receiving, via the user interface, user input for adjusting the cropping information; and generating the second video according to the adjusted cropping information.

Description

Video conversion method, device, system and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a method, an apparatus, a system, and a storage medium for video conversion.
Background
Currently, most video and film works are captured at wide aspect ratios (i.e., landscape), such as 4:3 or 16:9. Video or similar media recorded at a wide aspect ratio is designed to be viewed on a desktop display or in landscape orientation. Therefore, when a user watches a landscape video on a mobile terminal, the terminal screen is generally rotated to landscape to play the video, in order to obtain a good visual experience.
However, more and more users, particularly mobile phone users, are accustomed to watching tall-aspect-ratio (i.e., portrait) video. Vertically oriented media has become a popular format for viewing and displaying media in many applications. The common solution is to shrink the landscape video, at its original aspect ratio, to fit the portrait screen; this leaves large unused screen areas above and below the video, and the picture becomes smaller, resulting in a poor visual experience.
Disclosure of Invention
The present disclosure provides a video conversion method, apparatus, system, and storage medium, to at least solve the problem that a user cannot further adjust the cropped region after automatic video cropping conversion.
According to a first aspect of embodiments of the present disclosure, there is provided a video conversion method, which may include: acquiring a first video in a first orientation and cropping information for converting the first video into a second video in a second orientation; generating and displaying, based on the cropping information, a user interface for adjusting the cropping information; receiving, via the user interface, user input for adjusting the cropping information; and generating the second video according to the adjusted cropping information.
Optionally, the cropping information may include a cropping window for cropping the first video into the second video.
Optionally, the step of generating and displaying, based on the cropping information, a user interface for adjusting the cropping information may include: for a frame of the first video, displaying on the frame a cropping window for cropping the frame into the corresponding frame of the second video.
Optionally, the video conversion method may include: determining at least one key frame of the first video, wherein generating and displaying the user interface for adjusting the cropping information may include: generating and displaying a user interface for adjusting the cropping information of each of the at least one key frame.
Optionally, the step of generating the second video according to the adjusted cropping information may include: adaptively adjusting the cropping window of the first video according to the adjusted cropping information; and cropping the first video with the adaptively adjusted cropping window to obtain the second video.
Optionally, the step of obtaining the cropping information for converting the first video into the second video in the second orientation may include: analyzing each frame of the first video to determine at least one type of information for each frame; generating and displaying, for each frame based on the analysis result, another user interface for adjusting the weight of the at least one type of information in the video orientation conversion; receiving, through the other user interface, user input for adjusting the weight of the at least one type of information; and generating the cropping information based on the weight-adjusted information.
Optionally, the step of generating the cropping information based on the weight-adjusted information may include: generating an annotation map of the corresponding frame based on the weight-adjusted information; obtaining the focus of the corresponding frame by calculating the moments of the annotation map; and generating a cropping window from the focus and a specified aspect ratio. The annotation map is an information distribution map.
According to a second aspect of the embodiments of the present disclosure, there is provided a video conversion apparatus, which may include: an interface module configured to receive a first video in a first orientation; an analysis module configured to obtain cropping information for converting the first video to a second video in a second orientation, and to generate and display a user interface for adjusting the cropping information based on the cropping information; a display module configured to display the user interface, wherein a user input to adjust the cropping information is received via the user interface; and an editing module configured to generate a second video according to the adjusted cropping information.
Optionally, the cropping information may include a cropping window for cropping the first video into the second video.
Optionally, for a frame of the first video, the analysis module causes a cropping window to be displayed over the frame for cropping the frame into a corresponding frame of the second video.
Optionally, the analysis module may be configured to determine at least one key frame of the first video, and generate and display a user interface for adjusting cropping information of each of the at least one key frame.
Optionally, the editing module may be configured to adaptively adjust a cropping window of the first video according to the adjusted cropping information, and crop the first video with the adaptively adjusted cropping window to obtain the second video.
Optionally, the analysis module may be configured to analyze each frame of the first video to determine at least one type of information for each frame, generate and display, for each frame based on the analysis result, another user interface for adjusting the weight of the at least one type of information in the video orientation conversion, receive, through the other user interface, user input for adjusting the weight, and generate the cropping information based on the weight-adjusted information.
Optionally, the analysis module may be configured to generate an annotation map of the corresponding frame based on the weight-adjusted information, obtain the focus of the corresponding frame by calculating the moments of the annotation map, and generate the cropping window according to the focus and the specified aspect ratio.
According to a third aspect of embodiments of the present disclosure, there is provided a video conversion apparatus, which may include: a display; a transceiver for receiving a first video in a first orientation; and a processor configured to: acquire cropping information for converting the first video into a second video in a second orientation, generate, based on the cropping information, a user interface for adjusting the cropping information, control the display to display the user interface, control the transceiver to receive user input for adjusting the cropping information via the user interface, and generate the second video according to the adjusted cropping information.
Optionally, the cropping information may include a cropping window for cropping the first video into the second video.
Alternatively, for a frame of the first video, the processor may cause a cropping window to be displayed over the frame for cropping the frame into a corresponding frame of the second video.
Optionally, the processor may determine at least one key frame of the first video and generate and display a user interface for adjusting cropping information of each of the at least one key frame.
Optionally, the processor may adaptively adjust a cropping window of the first video according to the adjusted cropping information, and crop the first video with the adaptively adjusted cropping window to obtain the second video.
Alternatively, the processor may analyze each frame of the first video to determine at least one type of information for each frame, generate and display, for each frame based on the analysis result, another user interface for adjusting the weight of the at least one type of information in the video orientation conversion, receive, through the other user interface, user input for adjusting the weight, and generate the cropping information based on the weight-adjusted information.
Alternatively, the processor may generate an annotation map of the corresponding frame based on the weight-adjusted information, obtain the focus of the corresponding frame by calculating the moments of the annotation map, and generate the cropping window according to the focus and the specified aspect ratio.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic apparatus, which may include: at least one processor; at least one memory storing computer executable instructions, wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to perform the video conversion method as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a video conversion method as described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer program product, instructions of which are executed by at least one processor in an electronic device to perform the video conversion method as described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects:
After the video is cropped, the cropping information is presented to the user through a user interface, so that the user can further adjust the final cropping result through that interface and achieve a more satisfactory cropping effect.
In addition, a user interface allows the weight of each information stream in the converted video to be adjusted according to the user's needs, so that user-defined important information is preserved during cropping and the cropping effect the user expects is achieved.
In addition, calculating the focus of each frame highlights the distribution of key information within the frame; fitting the trajectory of the per-frame focuses provides better cropping information, increases continuity between frames, and improves the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram of an application environment for transitioning a video from one orientation to another provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of a video conversion method according to an embodiment of the present disclosure;
FIG. 3 is a diagram of a user interface for adjusting a cropping window, according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating the process of obtaining cropping window information for a single frame according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a labeling area according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a user interface for adjusting information weights according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a video conversion device according to an embodiment of the present disclosure;
FIG. 8 is a flow chart of a video conversion method according to another embodiment of the present disclosure;
FIG. 9 is a block diagram of a video conversion device according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the disclosure.
Throughout the drawings, it should be noted that the same reference numerals are used to designate the same or similar elements, features and structures.
Detailed Description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of the embodiments of the disclosure as defined by the claims and their equivalents. Various specific details are included to aid understanding, but these are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" covers three parallel cases: "any one of the items", "any combination of several of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Video cropping in the related art is fully automatic: the automatically cropped video may not achieve the cropping effect the user expects, yet the user cannot make further adjustments to the final cropping result. In addition, during automatic video cropping the user cannot adjust the importance of each information stream in the video scene, so the cropped scene may likewise fail to meet the user's expectations.
The present disclosure provides users with parameter adjustment before the video cropping process and adjustment of the cropped region after it, so that users can obtain video cropping results they are satisfied with.
Hereinafter, according to various embodiments of the present disclosure, a method, an apparatus, and a system of the present disclosure will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram of an application environment for transitioning a video from one orientation to another, provided in accordance with an embodiment of the present disclosure. In the present disclosure, orientation means landscape or portrait relative to the device/apparatus.
Referring to fig. 1, the application environment 100 includes a terminal 110 and a media server system 120.
The terminal 110 is the terminal where a user is located, and may be at least one of a smart phone, a tablet computer, a portable computer, a desktop computer, and the like. Although the present embodiment shows only one terminal 110 for illustration, those skilled in the art will appreciate that the number of terminals may be two or more. The embodiments of the present disclosure place no limit on the number of terminals or the type of device.
The terminal 110 may be installed with a target application for providing the video to be cropped and converted to the media server system 120; the target application may be a multimedia application, a social application, an information application, or the like. For example, the terminal 110 may be a terminal used by a user, with the user's account registered in an application running on the terminal 110.
The terminal 110 may be connected to the media server system 120 through a wireless or wired network, enabling data interaction between the terminal 110 and the media server system 120. For example, the network may comprise a Local Area Network (LAN), a Wide Area Network (WAN), a telephone network, a wireless link, an intranet, the Internet, a combination thereof, or the like.
The media server system 120 may be a server system for crop-converting video. For example, the media server system 120 may include one or more processors and memory. The memory may include one or more programs for performing the above video conversion method. The media server system 120 may also include a power component configured to perform power management of the media server system 120, a wired or wireless network interface configured to connect the media server system 120 to a network, and an input/output (I/O) interface. The media server system 120 may operate based on an operating system stored in memory, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. However, the devices included in the above-described media server system 120 are merely exemplary, and the present disclosure is not limited thereto.
The media server system 120 may crop and convert the input video, and then send the converted video to the terminal 110 or distribute it to a media platform via a wireless or wired network.
Further, the media server system 120 may obtain cropping information for converting the first video into a second video in a second orientation, generate and display, based on the cropping information, a user interface for adjusting the cropping information, receive user input for adjusting the cropping information via the user interface, and then adjust the previously cropped video again according to the adjusted cropping information.
Optionally, the terminal 110 may be installed with an application program implementing the video conversion method of the present disclosure, so that the terminal 110 itself can crop-convert the video. For example, the memory of the terminal 110 may store one or more programs for executing the above video conversion method, and the processor of the terminal 110 may crop-convert the video by running the relevant programs/algorithms. The terminal 110 may then upload the crop-converted video to the media server system 120 via a wireless or wired network, or store the converted video in its memory.
As an example, the terminal 110 may transmit a landscape video acquired locally or externally to the media server system 120 via a wireless or wired network; the media server system 120 may crop the landscape video into a portrait video according to the video conversion method of the present disclosure, and then send the converted portrait video to the terminal 110 via the wireless or wired network.
As another example, the terminal 110 may convert a landscape video acquired locally or externally into a portrait video according to the video conversion method of the present disclosure, and then upload the portrait video to the media server system 120 via a wireless or wired network. The media server system 120 may distribute the portrait video to other electronic devices.
Although the embodiment illustrates the conversion of a landscape video into a portrait video, the disclosed method may similarly be employed to convert a portrait video into a landscape video by cropping the portrait video.
Fig. 2 is a flow chart of a video conversion method according to an embodiment of the present disclosure. The video conversion method of this embodiment may be performed by the media server system 120 or by an electronic device having a video crop-conversion function.
In step S201, a first video in a first orientation and cropping information for converting the first video into a second video in a second orientation are acquired. The cropping information may include a cropping window for cropping the first video into the second video. Here, the first video in the first orientation may be, for example, a landscape video.
An intelligent video cropping tool, such as Google AutoFlip, may be used to directly obtain cropping information for converting a video in one orientation into a video in another orientation. That is, the cropping information for cropping the first video may be obtained from an existing intelligent video cropping tool.
According to an embodiment of the present disclosure, the cropping information may be acquired by: analyzing each frame of the first video to determine at least one type of information for the frame, generating an annotation map of the corresponding frame based on the at least one type of information, obtaining the focus of the corresponding frame by calculating the moments of the annotation map, using the focus as the center of the cropping window for cropping the frame, and generating the cropping window according to the focus and the specified aspect ratio.
According to another embodiment of the present disclosure, the cropping information may be acquired by: analyzing each frame of the first video to determine at least one type of information for the frame, generating and displaying, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information in the video orientation conversion, receiving user input for adjusting the weight through the user interface, generating an annotation map of the corresponding frame based on the weight-adjusted information, obtaining the focus of the corresponding frame by calculating the moments of the annotation map, taking the focus as the center of the cropping window for cropping the frame, and generating the cropping window according to the focus and the specified aspect ratio.
According to embodiments of the present disclosure, before the cropping information is acquired, a user interface can be provided so that the weight of each information stream in the converted video can be adjusted according to the user's needs, and user-defined important information is preserved during cropping.
In addition, calculating the focus of each frame highlights the distribution of key information within the frame; fitting the trajectory of the per-frame focuses provides better cropping information, increases continuity between frames, and improves the user experience.
However, the above ways of acquiring the cropping information are merely exemplary, and the present disclosure is not limited thereto.
In the present disclosure, the obtained cropping information may be the result information of a completed cropping pass on the video, or it may be the cropping window information calculated during information analysis of the video frames. That is, the acquired cropping information may be post-cropping information or pre-analysis information produced before the video cropping process.
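For illustration, the per-frame cropping information in either case can be carried as a small record. The following Python sketch shows one possible shape; the field names are illustrative assumptions, not something the disclosure prescribes.

```python
# A hypothetical container for per-frame cropping information;
# field names are illustrative only.
from dataclasses import dataclass

@dataclass
class CropInfo:
    frame_index: int
    center_x: float  # focus / window center, in source-frame pixels
    center_y: float
    width: int       # window size at the specified aspect ratio
    height: int
```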
In step S202, a user interface for adjusting the cropping information is generated and displayed based on the cropping information. In the user interface, for a frame of the first video, a cropping window for cropping the frame into the corresponding frame of the second video may be displayed on the frame. For example, refer to fig. 3.
In step S203, a user input for adjusting the cropping information is received via the user interface. Here, the user input may be one of a touch input, a key input, a hover input, and the like. Different types of user input may be implemented depending on the capabilities of the display device.
In step S204, a second video is generated from the adjusted cropping information. The cropping window of the first video can be adaptively adjusted according to the adjusted cropping information, and the first video is then cropped with the adaptively adjusted cropping window to obtain the second video. Adaptive adjustment provides better cropping information and increases continuity between frames.
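As a rough sketch of step S204, the following Python/OpenCV code crops every frame of a landscape source with its (already adjusted) window and writes the portrait result. The `windows` mapping from frame index to `(x0, y0, w, h)` is an assumption standing in for the adjusted cropping information; a real implementation would also carry over audio and handle codec settings.

```python
import cv2

def render_second_video(src_path, dst_path, windows, fps=30.0):
    """Crop each frame of src_path with its window and write the result."""
    cap = cv2.VideoCapture(src_path)
    writer = None
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x0, y0, w, h = windows[idx]          # adjusted window for this frame
        crop = frame[y0:y0 + h, x0:x0 + w]
        if writer is None:                   # open the writer on first frame
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(dst_path, fourcc, fps, (w, h))
        writer.write(crop)
        idx += 1
    cap.release()
    if writer is not None:
        writer.release()
```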
In one possible implementation, at least one key frame of the first video may be determined, and a user interface for adjusting the cropping information of each of the at least one key frame is then generated and displayed. After the cropping window of a key frame of the first video is adjusted, the cropping windows of the related frames can be adaptively adjusted automatically; once the whole video has been adjusted, the user can export the cropped video.
According to the embodiments of the present disclosure, the user gains a more comprehensive view of the whole cropping flow both before and after the video cropping process, and finally obtains a cropping result that satisfies them.
In addition, the video conversion method can better handle video scene changes, changes in user-specified regions, or lost scenes.
FIG. 3 is a diagram of a user interface for adjusting a cropping window, according to an embodiment of the present disclosure. The user interface of fig. 3 may be displayed in a partial area of the display of, e.g., a terminal or a server, or displayed full-screen.
According to the embodiments of the present disclosure, after the automatic cropping process, the cropping information of each frame can be provided to the user and reflected in a user interface for subsequently adjusting the cropping window.
Referring to fig. 3, a user may adjust the cropping window of a frame through the user interface 301. The user can move the cropping window up, down, left, and right to the region the user cares about. When the user interface 301 is displayed on a touch screen, the user may touch and drag the cropping window to move it accordingly; alternatively, the cropping window may be dragged to the region of interest with a mouse, keyboard, or the like. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
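A minimal sketch of the drag interaction, assuming a window represented as `(x0, y0, w, h)`: the drag delta shifts the window, and clamping keeps the crop inside the picture.

```python
def move_window(window, dx, dy, frame_w, frame_h):
    """Shift a cropping window by a drag delta, clamped to the frame bounds."""
    x0, y0, w, h = window
    x0 = max(0, min(x0 + dx, frame_w - w))
    y0 = max(0, min(y0 + dy, frame_h - h))
    return x0, y0, w, h
```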
The user may selectively adjust the cropping window for particular frames. For example, the user may select a frame of interest by dragging the video's slider in the user interface 301 and then adjust that frame's cropping window. Alternatively, a "next frame" button (not shown) may be provided on the user interface 301; after adjusting the cropping window of the current frame, the user can switch to the adjustment interface for the next frame by clicking it.
In addition, after the entire video adjustment is completed, the user can export the cut video by clicking an export button (not shown) on the user interface. The above button examples are only exemplary, and buttons with different functions can be arranged on the user interface according to actual requirements.
Alternatively, a cropping window for each key frame may be displayed on the user interface 301, so that the user adjusts the cropping windows of the video's key frames; after doing so, the user may export the adjusted video by clicking an "export" button on the user interface.
The user interface according to the embodiments of the disclosure is simple and easy to operate, improving the efficiency with which users adjust the cropping information.
Fig. 4 is a schematic flowchart of acquiring cropping window information of a single frame according to an embodiment of the present disclosure.
Referring to FIG. 4, after image 401 is acquired, image 401 is analyzed to determine M types of information for image 401, M being a positive integer. A corresponding analysis method may be used for each type of information; that is, image 401 may be analyzed using M analysis methods to determine the M types of information. For example, a face analysis method may be used to analyze the face information of image 401.
Analyzing the M types of information generates M corresponding labeled regions; that is, each analyzed type of information yields an information distribution map corresponding to image 401. For example, when the face information is analyzed, a pixel-based labeled region of the face information of image 401 is generated, and the pixel-based labeled region is then converted into a labeled region representing an information distribution.
The user can assign weights to the M labeled regions according to the user's own needs, so as to highlight the parts the user cares about. For example, if the emphasis is on protecting the face from being cropped out, the weight of the labeled region of the face information can be increased and the weights of the labeled regions of other information decreased.
The overall labeled region of image 401 is calculated from the weighted M labeled regions. For example, the overall labeled region of image 401 can be obtained by summing the M weighted labeled regions.
After the overall labeled region is obtained, an annotation map for image 401 can be generated from it. Since each labeled region was weighted beforehand, the annotation map reflects the importance of each labeled region.
The focus of the image 401 is obtained by computing the moments of the annotation map. For example, the focus of the image 401 can be obtained by calculating the geometric center point of the annotation map. A cropping window is generated using the position of the focal point and the specified aspect ratio.
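A minimal sketch of this pipeline in Python/NumPy, assuming the M analyses have already produced per-pixel float masks (the labeled regions); the helper names, the fall-back to the frame center, and the full-height portrait window are illustrative choices, not requirements of the disclosure.

```python
import numpy as np

def focus_from_annotation_map(annotation_map):
    """Focus = centroid of the map: first-order moments over the zeroth."""
    h, w = annotation_map.shape
    total = annotation_map.sum()
    if total == 0:
        return w / 2.0, h / 2.0               # fall back to the frame center
    ys, xs = np.mgrid[0:h, 0:w]
    cx = (xs * annotation_map).sum() / total  # M10 / M00
    cy = (ys * annotation_map).sum() / total  # M01 / M00
    return cx, cy

def crop_window(labeled_regions, weights, frame_w, frame_h, aspect=9 / 16):
    """Weighted-sum the labeled regions, find the focus, center a window on it."""
    annotation_map = sum(w * r for w, r in zip(weights, labeled_regions))
    cx, cy = focus_from_annotation_map(annotation_map)
    win_h = frame_h                           # portrait window at full height
    win_w = int(round(win_h * aspect))
    x0 = int(round(cx - win_w / 2))
    x0 = max(0, min(x0, frame_w - win_w))     # clamp window inside the frame
    return x0, 0, win_w, win_h
```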
However, the above example is merely exemplary; cropping window information for converting a video in one orientation into a video in another orientation may also be obtained from an intelligent video cropping tool such as Google AutoFlip. After obtaining cropping information from other cropping tools or software, cropping information such as the center position, size, and aspect ratio of each frame's cropping window can be obtained in a manner similar to that described above.
FIG. 5 is a schematic diagram of a labeled region according to an embodiment of the present disclosure.
Referring to fig. 5, (a) of fig. 5 is a frame of the first video, and (b) of fig. 5 shows the labeled region of important information (e.g., motion information) in that frame; the white area in (b) is the labeled region. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
FIG. 6 is a schematic diagram of a user interface for adjusting information weights according to an embodiment of the present disclosure. After analyzing the various information of a frame, a user interface associated with the various information may be displayed accordingly.
Referring to fig. 6, in the user interface 601, one slider bar may be configured for each type of information (such as the first information, the second information, etc.), and the slider bar can be used to adjust the weight of the corresponding information. For example, the range of the slider bar may be set to [0, 1]. After setting the weight for each type of information, the user clicks the "OK" button to complete the weight setting for the information streams in one frame. For example, upon clicking the "OK" button, the weight information entered by the user may be transmitted to a processor of the electronic device for the subsequent crop conversion. Alternatively, the corresponding cropping window may be presented on the corresponding frame after the "OK" button is clicked, to show the user where the cropping window will crop the frame.
However, the user interface of FIG. 6 is merely exemplary, and elements in the user interface may be presented in other forms.
Alternatively, one text input box may be configured for each kind of information, and the user may give a weight to the corresponding information through the text input box. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
The user interface may be displayed in a partial area of the display of an electronic device (such as the terminal 110 or the media server system 120), or displayed full-screen; those skilled in the art may configure the display according to actual needs.
According to embodiments of the present disclosure, a user is allowed to adjust the information stream weight of each frame before a video cropping process, so that user-defined important information is preserved in the cropping process.
Fig. 7 is a block diagram of a video conversion device according to an embodiment of the present disclosure. The video conversion device 700 may be implemented as the terminal 110, the media server system 120, or any other device.
Referring to fig. 7, the video conversion apparatus 700 may include a transceiver 701, a display 702, and a processor 703.
The transceiver 701 may receive a first video in a first orientation.
The processor 703 may use an intelligent video cropping tool, such as Google AutoFlip, to obtain cropping window information for converting a video in one orientation into a video in another orientation. Alternatively, the processor may use a cropping-information acquisition algorithm of an embodiment of the present disclosure (e.g., the method shown in fig. 4) to obtain the cropping information for converting the first video into a second video in a second orientation.
The processor 703 may generate, based on the cropping information, a user interface for adjusting the cropping information, and control the display 702 to display the user interface. For example, the user interface shown in FIG. 3 may be displayed.
The user interface may include graphics, text, icons, video, and any combination thereof associated with the analysis information. When the display 702 is a touch screen, the display 702 can also capture touch signals on or over its surface. A touch signal may be input to the processor 703 as a control signal for processing. In this case, the display 702 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 702, disposed on the front panel of the video conversion device 700; in other embodiments, there may be at least two displays 702, each disposed on a different surface of the video conversion device 700 or in a folded design; in still other embodiments, the display 702 may be a flexible display screen disposed on a curved or folded surface of the video conversion device 700. The display 702 can be made with an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
The processor 703 may control the transceiver 701 to receive user input for adjusting the cropping information via the user interface; after the cropping window is adjusted, the processor 703 may automatically adjust the cropping windows of the related frames to ensure continuity between frames.
As an example, the processor 703 may adaptively adjust a cropping window of the first video according to the adjusted cropping information, and then crop the first video with the adaptively adjusted cropping window to obtain the second video. After the final second video is obtained, the second video may be output to other devices via the transceiver 701.
By providing options for adjusting the cropping windows of the cropped video frames, the user can further adjust the final cropping result.
According to the embodiments of the present disclosure, the user can be given both the ability to adjust the cropped region after the video cropping process and the ability to adjust parameters before it, so that the user can obtain a satisfactory cropping result.
The processor 703 may analyze each frame of the first video to determine at least one type of information for the frame, and generate, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information in the video orientation conversion. For example, the user interface shown in FIG. 6 may be displayed.
The processor 703 may control the transceiver 701 to receive, through the user interface of fig. 6, user input for adjusting the weight of the at least one type of information for each frame, and generate the cropping window information for each frame based on the weight-adjusted information. After generating the cropping window information for each frame, the processor 703 may generate a user interface from that information to visually show the user how each frame will be cropped.
In one possible implementation, the processor 703 may generate, based on the analysis of the at least one type of information, the labeled regions of the corresponding frame, one per type of information, a labeled region being a region representing the distribution of that information, wherein each labeled region of the corresponding frame is given a weight input by the user.
In one possible implementation, for each frame of the first video, the processor 703 may calculate the overall labeled region of the corresponding frame from the weight-adjusted labeled regions, calculate the focus of the corresponding frame based on the overall labeled region, and generate the cropping window of the corresponding frame based on the focus and a specified aspect ratio. Further, the size of the cropping window may be set in advance or adaptively adjusted.
In one possible implementation, the processor 703 may obtain a fitted focus for each frame by fitting the focuses of the corresponding frames, and then generate the cropping window of each frame based on the fitted focus and the specified aspect ratio.
In one possible implementation, the processor may generate an annotation map for the corresponding frame based on the overall labeled region and obtain the focus of the corresponding frame by calculating the moments of the annotation map.
In some embodiments, the video conversion device 700 may include a memory that stores the raw input video and the converted video. Further, the memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory is used to store at least one instruction for execution by the processor 703.
In some embodiments, the video conversion apparatus 700 may further include a peripheral interface and at least one peripheral. The processor 703 and the peripheral interface may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. In particular, the peripherals may include at least one of a radio frequency circuit, a touch display screen, a camera, an audio circuit, a positioning component, a power supply, and the like.
In some embodiments, the video conversion device 700 may also include one or more sensors, including but not limited to acceleration sensors, gyroscope sensors, pressure sensors, fingerprint sensors, optical sensors, and proximity sensors. For example, the processor 703 may receive an indication of a change in orientation from one or more sensors and recommend a video of the corresponding orientation to the user.
Fig. 8 is a flowchart of a video conversion method according to another embodiment of the present disclosure.
Referring to fig. 8, in step S801, a first video of a first orientation is acquired. For example, the first video of the first orientation may be a landscape video.
At step S802, at least one type of information of each frame of the first video is analyzed.
Here, the at least one type of information of each frame may include key region information, and for example, may include at least one of face information, body information, main object information, moving scene information, video boundary information, and the like. The face information may include face recognition information, face tracking information, and the like, and the main object information may include object recognition information, object tracking information, and the like. However, the above examples are merely exemplary, and the present disclosure may analyze any number and kind of information in one frame.
Analysis of the information contained within the frames may be accomplished by pre-storing analysis algorithms for the primary information, key information, or information of interest to the user. For example, face recognition algorithms may be used to analyze face information in a frame, and optical flow algorithms may be used to analyze motion scene information in a frame. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
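As one hedged example of such a pre-stored analysis, motion scene information could be approximated with dense optical flow: the per-pixel flow magnitude between consecutive grayscale frames then serves as the labeled region for motion. The exact algorithm is an assumption; the text only names optical flow as one option.

```python
import cv2
import numpy as np

def motion_annotation_region(prev_gray, cur_gray):
    """Flow magnitude between two grayscale frames as a motion distribution."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)   # per-pixel motion strength
    if magnitude.max() > 0:
        magnitude /= magnitude.max()           # normalize to [0, 1]
    return magnitude
```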
In step S803, the labeled regions corresponding to the at least one type of information of the corresponding frame are generated based on the analysis of that information. Here, a labeled region refers to an area indicating an information distribution. A frame may contain several types of information; each analysis of one type of information in the frame generates one information distribution map corresponding to the frame, so if several types of information in the frame are analyzed, a corresponding number of labeled regions are generated.
As an example, when the face information in a frame is analyzed, a pixel-based labeled region (mask) corresponding to the frame's face information may be generated, and the pixel-based labeled region may then be converted into a labeled region representing an information distribution.
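A minimal sketch of this step for face information, assuming OpenCV's bundled Haar cascade as the face analyzer: detected boxes form the pixel-based mask, and a Gaussian blur turns the hard mask into a soft information-distribution region. The detector choice and blur radius are illustrative.

```python
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_annotation_region(frame_bgr):
    """Pixel-based face mask, softened into an information distribution."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    mask = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        mask[y:y + h, x:x + w] = 1.0          # pixel-based labeled region
    return cv2.GaussianBlur(mask, (0, 0), sigmaX=25)  # distribution region
```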
In step S804, a user interface for adjusting the weight each labeled region occupies in the video cropping is generated based on the analysis result and displayed. The user interface may include a slider bar or text input box for adjusting the weight of each type of information.
After the information contained within a frame is analyzed, a user interface may be generated for the frame, including controls for adjusting the weight of each type of information contained in the frame, for example a slider bar or a text input box per type. However, the above examples are merely exemplary, and the present disclosure is not limited thereto.
In step S805, user input for adjusting the weight of each labeled region is received through the user interface. Each labeled region of the corresponding frame may be given a weight input by the user. Users can set the weights of the information they want preserved according to their own needs; for example, a user who wants to keep the face from being cropped out may increase the weight of the labeled region of the face information and decrease the weights of the labeled regions of other information. The user can adjust the weighting parameters interactively. Weighting the labeled regions highlights the information/regions the user cares most about.
Here, each type of information corresponds to one labeled region, and weighting a type of information can be read as weighting its labeled region.
By providing this user interface for each frame, the user controls the weight each information stream in a frame carries in the subsequent crop-conversion operation.
In step S806, for each frame of the first video, the overall labeled region of the corresponding frame is calculated from the weight-adjusted labeled regions. For example, the weighted regions can be summed to obtain the frame's overall labeled region.
In step S807, an annotation map for the corresponding frame is generated based on the overall labeled region. Here, the annotation map is an information distribution image over the labeled regions.
In step S808, the focus of the corresponding frame is obtained by calculating the moments of the annotation map. Here, the focus reflects the distribution of important information in the frame. For example, the geometric center point of the annotation map can be computed as the frame's focus.
In step S809, a cropping window for the corresponding frame is generated based on the focus and a specified aspect ratio. For example, after the focus of a frame is obtained, the focus is taken as the center of the cropping window, and the position and size of the cropping window are set according to the specified aspect ratio. Here, the aspect ratio of the second video may be taken as the specified aspect ratio, although the present disclosure is not limited thereto.
In one possible implementation, the fitted focus of each frame may be obtained by fitting the focuses of the corresponding frames, and the cropping window of each frame is generated based on the fitted focus and the specified aspect ratio. By fitting the cropping region of the current scene to the focuses of a series of frames in that scene, the method achieves a smoother cropping effect between frames.
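One way to realize this fitting, as a sketch: fit a low-order polynomial to the x-coordinates of the raw per-frame focuses within a scene and evaluate it back at every frame, so the window glides instead of jittering. The polynomial form and degree are assumptions; any smoothing fit would serve.

```python
import numpy as np

def fit_focus_track(focus_xs, degree=2):
    """Smooth per-frame focus x-coordinates with a polynomial fit."""
    t = np.arange(len(focus_xs))
    deg = min(degree, max(len(focus_xs) - 1, 0))
    coeffs = np.polyfit(t, focus_xs, deg)
    return np.polyval(coeffs, t)  # fitted focus x for every frame of the scene
```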
In step S810, the cropping information for converting the first video into a second video in a second orientation is acquired. For example, after the cropping window information of each frame is obtained according to steps S802 to S809, the cropping window information of all frames is collected for the further cropping-window adjustment that follows.
In step S811, a user interface for adjusting the cropping information is generated and displayed based on the cropping information. In the user interface, for a frame of the first video, a cropping window for cropping the frame into the corresponding frame of the second video may be displayed on the frame. For example, refer to fig. 3.
In step S812, a user input for adjusting the cropping information is received via the user interface.
In step S813, the cropping window of the first video may be adaptively adjusted according to the adjusted cropping information. For example, after the user further adjusts the video frames, the further-adjusted cropping windows may be refitted to make the final video smoother.
In step S814, the first video is cropped using the adaptively adjusted cropping window to obtain a further adjusted second video.
According to the embodiments of the present disclosure, users can be given both parameter adjustment before the video cropping process and cropped-region adjustment after it, so that they have a more comprehensive grasp of the whole cropping flow before and after the process and finally obtain a satisfactory cropping result.
Fig. 9 is a block diagram of a video conversion device according to an embodiment of the present disclosure.
Referring to fig. 9, the video conversion apparatus 900 may include an interface module 901, an analysis module 902, a display module 903, and an editing module 904. Each module in the video conversion apparatus 900 may be implemented by one or more modules, and the names of the corresponding modules may vary according to their types. In various embodiments, some modules in the video conversion apparatus 900 may be omitted, or additional modules may be included. Furthermore, modules/elements according to various embodiments of the present disclosure may be combined into a single entity that performs the functions of the respective modules/elements equivalently.
The interface module 901 may be configured to receive a first video in a first orientation and a user input.
The analysis module 902 may be configured to analyze each frame of the first video to determine at least one type of information for the frame, and generate, for each frame based on the analysis result, a user interface for adjusting the weight of the at least one type of information in the video orientation conversion.
In one possible implementation, the at least one type of information may include key region information.
In one possible implementation, the key region information may include at least one of face information, human body information, main object information, motion scene information, and video boundary information.
The display module 903 may be configured to display a user interface for adjusting the weight of the at least one type of information.
In one possible implementation, the user interface may include a control for adjusting the weight of each of the at least one type of information.
The editing module 904 may be configured to generate cropping window information for cropping the first video based on the weight-adjusted information, and to generate a second video in a second orientation based on the cropped first video.
In one possible implementation, the analysis module 902 may generate, based on the analysis of the at least one type of information, the labeled regions of the corresponding frame, one per type of information, a labeled region being a region representing the distribution of that information, wherein each labeled region of the corresponding frame is given a weight input by the user.
In one possible implementation, for each frame of the first video, the editing module 904 may calculate the overall labeled region of the corresponding frame from the weight-adjusted labeled regions, calculate the focus of the corresponding frame based on the overall labeled region, and generate the cropping window of the corresponding frame based on the focus and a specified aspect ratio.
In one possible implementation, the editing module 904 may obtain a fitted focus for each frame by fitting the focuses of the corresponding frames, and generate the cropping window of each frame based on the fitted focus and the specified aspect ratio.
In one possible implementation, the editing module 904 can generate an annotation map for the corresponding frame based on the overall labeled region and obtain the focus of the corresponding frame by calculating the moments of the annotation map.
In addition, the video conversion apparatus 900 may provide the user with the ability to adjust the cropped region after the video cropping process, so that the user can obtain a cropping result that satisfies them.
The analysis module 902 may obtain cropping information for converting the first video to a second video in a second orientation and generate and display a user interface for adjusting the cropping information based on the cropping information. User input for adjusting the cropping information may be received via the user interface.
In one possible implementation, the cropping information may include a cropping window for cropping the first video into the second video.
In one possible implementation, for a frame of the first video, analysis module 902 may cause a cropping window to be displayed over the frame for cropping the frame into a corresponding frame of the second video.
As an example, after the weight of each type of information of each frame is adjusted, the video may be cropped according to the adjusted information, and the cropping information from that cropping pass may then be presented to the user again, so that the user can adjust the cropping windows of the cropped video once more. Alternatively, after the weights are adjusted, the video is not cropped immediately; instead, the cropping information generated from the adjusted information is presented to the user through the user interface, the user adjusts the cropping windows over the whole video, and the cropping process is then performed with the finally adjusted cropping windows.
In one possible implementation, the analysis module 902 may determine at least one key frame of the first video and generate and display a user interface for adjusting cropping information of each of the at least one key frame.
In one possible implementation, the editing module 904 may adaptively adjust a cropping window of the first video according to the adjusted cropping information, and crop the first video with the adaptively adjusted cropping window to obtain the second video.
For the video conversion apparatus of this embodiment, the implementation principles and technical effects of performing video conversion with the above modules are the same as those of the related method embodiments; refer to those embodiments for details, which are not repeated here.
According to an embodiment of the present disclosure, an electronic device may be provided. Fig. 10 is a block diagram of an electronic device according to an embodiment of the disclosure, theelectronic device 1000 including at least one memory 1002 and at least one processor 1001, the at least one memory 1002 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 1001, perform a video conversion method according to an embodiment of the disclosure.
By way of example, the electronic device 1000 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device 1000 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or in combination. The electronic device 1000 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote systems (e.g., via wireless transmission).
In the electronic device 1000, the processor 1001 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor 1001 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
The processor 1001 may execute instructions or code stored in a memory, where the memory may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 1002 may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the memory may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device that may be used by a database system. The memory and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the memory.
In addition, the electronic device 1000 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, or touch input device). All components of the electronic device 1000 may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a video conversion method according to the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above computer-readable storage medium can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server; furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product including instructions that are executable by a processor of a computer device to perform the above video conversion method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)
