Disclosure of Invention
An exemplary embodiment of the present disclosure is directed to providing an image processing method, an image processing apparatus, an electronic device, and a storage medium, which address at least one of the problems in the related art described above. The technical solutions of the present disclosure are as follows:
According to a first aspect of embodiments of the present disclosure, there is provided an image processing method, including: acquiring head pose data of a current video frame, wherein the head pose data of the video frame describes a head pose of a target object in the video frame; acquiring body pose data that matches the head pose data of the video frame; and driving parts corresponding to an avatar according to the head pose data and the body pose data, so as to generate an animation frame of the avatar corresponding to the video frame.
Optionally, the image processing method further includes: forming a continuous animation frame sequence based on the animation frames corresponding to respective video frames and the timing of the video frames.
Optionally, the step of acquiring body pose data that matches the head pose data of the video frame includes: determining the body pose data that matches the head pose data of the video frame according to a predetermined mapping relationship between head pose data and body pose data.
Optionally, in a case where the head pose data includes Euler angles corresponding to the head pose, the determining, according to the predetermined mapping relationship between head pose data and body pose data, the body pose data that matches the head pose data of the video frame includes: taking the product of the head pose data of the video frame and a predetermined mapping coefficient vector as the body pose data that matches the head pose data of the video frame, wherein the mapping coefficient vector represents the mapping relationship between head pose data and body pose data.
Optionally, the mapping coefficient vector is predetermined by: acquiring a time sequence of head pose data of an animation sequence sample and a time sequence of bone pose data of each body bone of the animation sequence sample; and performing linear fitting on the acquired time sequence of head pose data and the time sequence of bone pose data of each body bone to obtain the mapping coefficient vector.
Optionally, the step of driving the parts corresponding to the avatar according to the head pose data and the body pose data includes: fusing the head pose data with head pose data of a preset animation frame sequence to obtain fused head pose data; fusing the body pose data with body pose data of the preset animation frame sequence to obtain fused body pose data; and driving the parts corresponding to the avatar according to the fused head pose data and the fused body pose data.
Optionally, the step of fusing the head pose data with the head pose data of the preset animation frame sequence to obtain the fused head pose data includes: superimposing the head pose data on the head pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused head pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
Optionally, the step of fusing the body pose data with the body pose data of the preset animation frame sequence to obtain the fused body pose data includes: superimposing the body pose data on the body pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused body pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
Optionally, the image processing method further includes: constructing a corresponding avatar according to an object image input or selected by a user; wherein the step of driving the parts corresponding to the avatar according to the head pose data and the body pose data includes: driving parts corresponding to the constructed avatar according to the head pose data and the body pose data.
Optionally, the body pose data comprises: bone pose data for each of a plurality of body bones.
According to a second aspect of embodiments of the present disclosure, there is provided an image processing apparatus, including: a head pose data acquisition unit configured to acquire head pose data of a current video frame, wherein the head pose data of the video frame describes a head pose of a target object in the video frame; a body pose data acquisition unit configured to acquire body pose data that matches the head pose data of the video frame; and a driving unit configured to drive parts corresponding to an avatar according to the head pose data and the body pose data, so as to generate an animation frame of the avatar corresponding to the video frame.
Optionally, the driving unit is further configured to form a continuous animation frame sequence based on the animation frames corresponding to respective video frames and the timing of the video frames.
Optionally, the body pose data acquisition unit is configured to determine the body pose data that matches the head pose data of the video frame according to a predetermined mapping relationship between head pose data and body pose data.
Optionally, in a case where the head pose data includes Euler angles corresponding to the head pose, the body pose data acquisition unit is configured to take the product of the head pose data of the video frame and a predetermined mapping coefficient vector as the body pose data that matches the head pose data of the video frame, wherein the mapping coefficient vector represents the mapping relationship between head pose data and body pose data.
Optionally, the mapping coefficient vector is predetermined by: acquiring a time sequence of head pose data of an animation sequence sample and a time sequence of bone pose data of each body bone of the animation sequence sample; and performing linear fitting on the acquired time sequence of head pose data and the time sequence of bone pose data of each body bone to obtain the mapping coefficient vector.
Optionally, the driving unit is configured to: fuse the head pose data with head pose data of a preset animation frame sequence to obtain fused head pose data; fuse the body pose data with body pose data of the preset animation frame sequence to obtain fused body pose data; and drive the parts corresponding to the avatar according to the fused head pose data and the fused body pose data.
Optionally, the driving unit is configured to superimpose the head pose data on the head pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused head pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
Optionally, the driving unit is configured to superimpose the body pose data on the body pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused body pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
Optionally, the image processing apparatus further includes: an avatar construction unit configured to construct a corresponding avatar according to an object image input or selected by a user; wherein the driving unit is configured to drive parts corresponding to the constructed avatar according to the head pose data and the body pose data.
Optionally, the body pose data comprises: bone pose data for each of a plurality of body bones.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, including: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the image processing method as described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the image processing method as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement the image processing method as described above.
According to the image processing method, apparatus, electronic device, and storage medium of the exemplary embodiments of the present disclosure, the head pose can be used to drive the body of an avatar in real time in a natural, consistent way. This way of driving the avatar requires little computation and no manual operation by the user, which improves real-time interactivity and user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any plural ones of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. For another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Fig. 1 illustrates a flowchart of an image processing method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step S101, head pose data of a current video frame is acquired.
Head pose data for a video frame is used to describe a head pose of a target object in the video frame.
As an example, the current video frame may be a video frame containing a user's face, and the head pose of the target object in the current video frame may be the head pose of that face.
As an example, the current video frame may be a video frame of the user captured in real time, or a video frame in a video sequence input by the user.
As an example, the head pose data may include Euler angles corresponding to the head pose. For example, the Euler angles may include at least one of: a pitch angle α, a yaw angle β, and a roll angle γ. It should be understood that the head pose may be described in other suitable ways, and the present disclosure does not limit the specific description of the head pose, i.e., the specific form of the head pose data.
As an example, referring to fig. 2, the Euler angles corresponding to the head pose of the target object in the current video frame may be calculated, the calculated Euler angles may then be filtered, and the smoothed Euler angles may be used as the head pose data of the current video frame. For example, the filtering may be an exponential moving average.
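For illustration only, a minimal sketch of such exponential-moving-average smoothing is given below (in Python with NumPy; the function name and the smoothing factor are assumptions made for this sketch, not part of the disclosed method):

```python
import numpy as np

def smooth_euler_angles(raw_angles, prev_smoothed=None, smoothing=0.5):
    """Exponential moving average over per-frame Euler angles [pitch, yaw, roll].

    raw_angles: Euler angles computed for the current video frame.
    prev_smoothed: smoothed angles of the previous frame, or None for the first frame.
    smoothing: weight of the current frame in (0, 1]; smaller values smooth more.
    """
    raw = np.asarray(raw_angles, dtype=float)
    if prev_smoothed is None:
        return raw  # first frame: nothing to average with yet
    prev = np.asarray(prev_smoothed, dtype=float)
    return smoothing * raw + (1.0 - smoothing) * prev
```

Each frame's smoothed output would be fed back as prev_smoothed for the next frame, yielding the smoother input values mentioned above.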
In step S102, body pose data matching the head pose data of the video frame is acquired.
In other words, from the head pose data of the current video frame, data of a body pose adapted to the head pose of the target object in the current video frame is determined.
As an example, the body pose data may include: bone pose data for each of a plurality of body bones. For example, the plurality of body bones may be pre-designated.
Body pose data that matches the head pose data of the current video frame may be obtained using any suitable means. As an example, the body pose data that matches the head pose data of the current video frame may be determined according to a predetermined mapping relationship of the head pose data to the body pose data. Body pose data matching the head pose data of the video frame can thus be acquired with low computational effort.
As an example, the product of the head pose data of the current video frame and a predetermined mapping coefficient vector may be taken as the body pose data that matches the head pose data of the current video frame. The mapping coefficient vector represents the mapping relationship between head pose data and body pose data.
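As a non-limiting sketch of the product described above (assuming, purely for illustration, that the head pose data is the vector of three Euler angles and that each body bone's pose is represented by a single value obtained from a per-bone coefficient vector; see the fitting sketch further below for one way such vectors could be obtained):

```python
import numpy as np

def body_pose_from_head_pose(head_euler, mapping_vectors):
    """Map head pose data to matching body pose data via per-bone coefficient vectors.

    head_euler: filtered Euler angles [pitch, yaw, roll] of the current video frame.
    mapping_vectors: {bone_name: length-3 coefficient vector}, fitted in advance.
    Returns {bone_name: pose value} describing the matching body pose.
    """
    h = np.asarray(head_euler, dtype=float)
    return {bone: float(h @ np.asarray(w, dtype=float))
            for bone, w in mapping_vectors.items()}
```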
As an example, the mapping coefficient vector may be predetermined by: acquiring a time sequence of head pose data of an animation sequence sample and a time sequence of bone pose data of each body bone of the animation sequence sample; and then performing linear fitting on the acquired time sequence of head pose data and the time sequence of bone pose data of each body bone to obtain the mapping coefficient vector.
Specifically, referring to fig. 2, the head pose data of the animation frames in the animation sequence sample, ordered by the timing of those frames, form a time sequence of head pose data, and the bone pose data of the jth body bone in the animation frames, ordered by the timing of those frames, form a time sequence of bone pose data of the jth body bone. The head pose data of each animation frame describes the head pose of an avatar template (e.g., a template character) in that animation frame, and the bone pose data of the jth body bone of each animation frame describes the pose of the jth body bone of the avatar template in that animation frame.
As an example, a reasonable animation sequence may be produced for the avatar template and used as the animation sequence sample. A time sequence of head pose data (one entry H_i per animation frame i) and a time sequence of bone pose data of the jth body bone are extracted from the animation sequence sample, and a mapping coefficient vector between the head pose and the body pose can then be obtained by linear fitting. The fitting operation learns reasonable head-pose-to-body-pose mapping parameters from the manually produced animation sequence sample, so that driving the body pose from the head pose produces a natural effect.
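A possible realisation of this linear fitting, again under the simplifying assumption that each bone pose in the sample is a single value per frame, is an ordinary least-squares fit per body bone (the function below is an illustrative sketch, not the only way to perform the fitting):

```python
import numpy as np

def fit_mapping_vectors(head_sequence, bone_sequences):
    """Fit one mapping coefficient vector per body bone from an animation sequence sample.

    head_sequence: array of shape (T, 3) -- Euler angles of the template's head
        for the T animation frames of the sample, in frame order.
    bone_sequences: {bone_name: array of shape (T,)} -- pose value of each body bone
        over the same T frames.
    Returns {bone_name: length-3 coefficient vector} minimising ||H w - b||^2.
    """
    H = np.asarray(head_sequence, dtype=float)           # (T, 3)
    vectors = {}
    for bone, series in bone_sequences.items():
        b = np.asarray(series, dtype=float)              # (T,)
        w, *_ = np.linalg.lstsq(H, b, rcond=None)        # linear least squares
        vectors[bone] = w
    return vectors
```

Because the fit is linear and the input is only three angles per frame, both the fitting and the later per-frame mapping involve very few parameters, which is consistent with the low computational cost noted in this disclosure.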
In step S103, parts corresponding to an avatar are driven according to the head pose data and the body pose data to generate an animation frame of the avatar corresponding to the video frame. Specifically, the head of the avatar may be driven according to the head pose data, and the body of the avatar may be driven according to the body pose data, so that the head and the body of the avatar are driven synchronously.
According to the embodiments of the present disclosure, the head of the avatar may be driven according to the head pose of the target object in the current video frame, and the body of the avatar may be driven with a body pose that conforms to that head pose; in other words, the avatar may be driven to make a body motion that matches the head pose of the target object in the current video frame, thereby forming an animation frame of the avatar.
As an example, the parts corresponding to the avatar may be driven by setting the head pose data of the avatar as the acquired head pose data and setting the body pose data of the avatar as the acquired body pose data.
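Purely as an illustration of "setting" the avatar's pose data in this way (the avatar object and its attributes below are hypothetical; a real engine or renderer would expose its own rig API):

```python
def drive_avatar(avatar, head_euler, body_pose):
    """Apply the acquired head pose data and body pose data to the avatar for one frame.

    avatar: hypothetical rig object with a `head` part and named body `bones`;
        the attribute and method names are assumptions for this sketch only.
    head_euler: filtered Euler angles [pitch, yaw, roll] for the head.
    body_pose: {bone_name: pose value} matched to the head pose.
    """
    avatar.head.set_rotation(*head_euler)            # drive the avatar's head
    for bone_name, value in body_pose.items():
        avatar.bones[bone_name].set_rotation(value)  # drive each body bone
```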
As an example, the image processing method according to an exemplary embodiment of the present disclosure may further include: forming a continuous animation frame sequence based on the animation frames corresponding to respective video frames and the timing of the video frames. In this way, the user at the client sees a live, animated avatar.
As an example, the current video frame may be a video frame captured in real time, or a video frame in a received video. For example, after the avatar is driven based on the current video frame (e.g., the video frame captured at the current moment) in steps S101-S103, the video frame captured at the next moment may be taken as the new current video frame and steps S101-S103 may be performed again, so that the avatar is driven continuously and an animation frame sequence of the avatar is formed. Alternatively, steps S101-S103 may be performed with each video frame in the received video taken in turn as the current video frame, and a continuous animation frame sequence of the avatar may be formed based on the generated animation frames corresponding to the video frames and the timing of those video frames.
As an example, the image processing method according to an exemplary embodiment of the present disclosure may further include: constructing a corresponding avatar according to an object image input or selected by a user, that is, constructing an avatar corresponding to the object. For example, the object image may be a character image or the like. Accordingly, as an example, parts corresponding to the constructed avatar may be driven according to the head pose data and the body pose data. According to this embodiment, the user can conveniently select a desired avatar, which improves the user experience.
Fig. 3 illustrates a flowchart of a method of driving a portion corresponding to an avatar according to an exemplary embodiment of the present disclosure.
As shown in fig. 3, in step S201, the head pose data and the head pose data of the preset animation frame sequence are fused to obtain fused head pose data.
As an example, the fusion process may be a superposition process, and it should be understood that other forms of fusion processes are also possible, and the disclosure is not limited thereto.
As an example, the head pose data and the head pose data of a corresponding animation frame in the preset animation frame sequence may be superimposed to obtain the fused head pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
In step S202, the body pose data and the body pose data of the preset animation frame sequence are fused to obtain fused body pose data.
As an example, the body pose data and the body pose data of a corresponding animation frame in the preset animation frame sequence may be superimposed to obtain the fused body pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
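A minimal sketch of the superposition used in steps S201 and S202, assuming that pose data is stored either as an array of angles or as a per-bone dictionary (other fusion schemes are equally possible, as noted above):

```python
import numpy as np

def fuse_with_preset(live_pose, preset_pose):
    """Superimpose live pose data onto the pose data of the corresponding preset frame.

    live_pose:   head or body pose data acquired for the current video frame.
    preset_pose: pose data of the animation frame in the preset sequence (e.g., a
                 breathing animation) whose timing corresponds to the video frame.
    Both arguments must share the same structure (array, or dict keyed by bone name).
    """
    if isinstance(live_pose, dict):
        return {k: np.asarray(v, dtype=float) + np.asarray(preset_pose[k], dtype=float)
                for k, v in live_pose.items()}
    return np.asarray(live_pose, dtype=float) + np.asarray(preset_pose, dtype=float)
```

The corresponding preset frame could, for example, be selected as preset_sequence[i % len(preset_sequence)] for the ith video frame so that a short preset sequence loops while video frames keep arriving; this indexing rule is an assumption made for illustration.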
In step S203, the parts corresponding to the avatar are driven according to the fused head pose data and the fused body pose data.
As an example, the preset animation frame sequence may be pre-specified or selected by a user.
By way of example, the preset animation frame sequence may be an animation frame sequence related to a breathing state; it should be understood that other animation frame sequences are also possible, and the disclosure is not limited thereto. For example, the head pose data of such a breathing animation frame sequence may include a time sequence of head pose data describing the head poses of a subject as the subject breathes in the animation frame sequence, and the body pose data of the breathing animation frame sequence may include a time sequence of body pose data describing the body poses of the subject as the subject breathes in the animation frame sequence.
For example, a first animation frame in the preset animation frame sequence corresponds to a first video frame in a video where the current video frame is located, a second animation frame in the preset animation frame sequence corresponds to a second video frame in the video where the current video frame is located, and so on.
According to this embodiment of the disclosure, fusing the head pose data and the body pose data with a fixed animation frame sequence is supported, so that richer head and body motion effects are obtained and the user experience is improved.
Fig. 4 illustrates an example of an image processing method according to an exemplary embodiment of the present disclosure.
As shown in FIG. 4, when a two-dimensional image input by a user is received, reconstruction may be performed automatically, including, for example, automatic bone binding and skinning, to obtain a reconstructed two-dimensional character. With the image processing method according to the foregoing exemplary embodiments, the head and body of the two-dimensional character can be driven in real time according to face images input by the user. For example, according to a face video sequence input by the user, the Euler angles corresponding to the face pose are calculated in real time; the real-time Euler angles α, β, and γ are then filtered with an exponential moving average to obtain smoother input values, and the head of the avatar is driven to make corresponding motions according to these smoothed values; the bone pose data of the body bones are then obtained from the mapping coefficient vectors and the filtered α, β, and γ, and the body of the avatar is driven to make corresponding motions according to the bone pose data. In other words, the driving can be performed from the input face head pose alone.
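Tying the preceding sketches together, one frame of the pipeline of FIG. 4 could look roughly as follows (all names and the smoothing factor are illustrative assumptions, not a prescribed implementation):

```python
import numpy as np

def process_frame(raw_euler, prev_smoothed, mapping_vectors, smoothing=0.5):
    """One iteration: smooth the face Euler angles, then map them to body bone poses.

    Returns (smoothed_euler, body_pose); the caller applies both to the reconstructed
    two-dimensional character and feeds smoothed_euler back in as prev_smoothed.
    """
    raw = np.asarray(raw_euler, dtype=float)
    smoothed = raw if prev_smoothed is None else (
        smoothing * raw + (1.0 - smoothing) * np.asarray(prev_smoothed, dtype=float))
    body_pose = {bone: float(smoothed @ np.asarray(w, dtype=float))
                 for bone, w in mapping_vectors.items()}
    return smoothed, body_pose
```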
According to the embodiments of the present disclosure, rich and reasonable head and body motion effects can be obtained by driving from the head pose in real time or by fusing with a fixed animation frame sequence; the head and body of the avatar can be driven in real time without the user needing any knowledge of drawing or animation, which reduces the cost of use and improves the user experience; and since body pose estimation depends only on the head pose, it requires only a small number of parameters and little computation, and can run effectively in real time.
Fig. 5 illustrates a block diagram of a structure of an image processing apparatus according to an exemplary embodiment of the present disclosure.
As shown in fig. 5, the image processing apparatus 10 according to an exemplary embodiment of the present disclosure includes: a head pose data acquisition unit 101, a body pose data acquisition unit 102, and a driving unit 103.
Specifically, the head pose data acquisition unit 101 is configured to acquire head pose data of a current video frame, wherein the head pose data of the video frame describes a head pose of a target object in the video frame.
The body pose data acquisition unit 102 is configured to acquire body pose data that matches the head pose data of the video frame.
The driving unit 103 is configured to drive parts corresponding to an avatar according to the head pose data and the body pose data to generate an animation frame of the avatar corresponding to the video frame.
As an example, the driving unit 103 may be further configured to form a continuous animation frame sequence based on the animation frames corresponding to respective video frames and the timing of the video frames.
As an example, the body pose data acquisition unit 102 may be configured to determine body pose data that matches the head pose data of the video frame according to a predetermined mapping relationship between head pose data and body pose data.
As an example, in a case where the head pose data includes Euler angles corresponding to the head pose, the body pose data acquisition unit 102 may be configured to take the product of the head pose data of the video frame and a predetermined mapping coefficient vector as the body pose data that matches the head pose data of the video frame, wherein the mapping coefficient vector represents the mapping relationship between head pose data and body pose data.
As an example, the mapping coefficient vector may be predetermined by: acquiring a time sequence of head pose data of an animation sequence sample and a time sequence of bone pose data of each body bone of the animation sequence sample; and performing linear fitting on the acquired time sequence of head pose data and the time sequence of bone pose data of each body bone to obtain the mapping coefficient vector.
As an example, the driving unit 103 may be configured to: fuse the head pose data with head pose data of a preset animation frame sequence to obtain fused head pose data; fuse the body pose data with body pose data of the preset animation frame sequence to obtain fused body pose data; and drive the parts corresponding to the avatar according to the fused head pose data and the fused body pose data.
As an example, the driving unit 103 may be configured to superimpose the head pose data on the head pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused head pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
As an example, the driving unit 103 may be configured to superimpose the body pose data on the body pose data of a corresponding animation frame in the preset animation frame sequence to obtain the fused body pose data, wherein the corresponding animation frame is an animation frame whose timing in the preset animation frame sequence corresponds to the timing of the video frame.
As an example, the image processing apparatus 10 may further include an avatar construction unit (not shown) configured to construct a corresponding avatar according to an object image input or selected by a user, wherein the driving unit 103 may be configured to drive parts corresponding to the constructed avatar according to the head pose data and the body pose data.
As an example, the body pose data may include: bone pose data for each of a plurality of body bones.
With regard to the image processing apparatus 10 in the above-described embodiments, the specific manner in which each unit performs its operations has been described in detail in the embodiments related to the method, and is not repeated here.
Further, it should be understood that each unit in the image processing apparatus 10 according to the exemplary embodiments of the present disclosure may be implemented as a hardware component and/or a software component. For example, each unit may be implemented using a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC), depending on the processing it performs.
Fig. 6 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure. Referring to fig. 6, the electronic device 20 includes: at least one memory 201 and at least one processor 202, the at least one memory 201 storing a set of computer-executable instructions which, when executed by the at least one processor 202, cause the at least one processor 202 to perform the image processing method described in the above exemplary embodiments.
By way of example, the electronic device 20 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device 20 need not be a single electronic device; it can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), individually or jointly. The electronic device 20 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with a local or remote system (e.g., via wireless transmission).
In the electronic device 20, the processor 202 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor 202 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
The processor 202 may execute instructions or code stored in the memory 201, and the memory 201 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 201 may be integrated with the processor 202, for example, with RAM or flash memory arranged within an integrated circuit microprocessor or the like. Further, the memory 201 may comprise a stand-alone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory 201 and the processor 202 may be operatively coupled, or may communicate with each other, for example, through an I/O port or a network connection, so that the processor 202 can read files stored in the memory.
In addition, the electronic device 20 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the electronic device 20 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the image processing method described in the above exemplary embodiments. Examples of the computer-readable storage medium here include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed on a computer device such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product in which instructions are executable by at least one processor to perform the image processing method as described in the above exemplary embodiment.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.