
Motion capture method, motion capture device, electronic device and storage medium

Info

Publication number
CN112614214A
Authority
CN
China
Prior art keywords
image
sample
upper body
captured object
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011507523.7A
Other languages
Chinese (zh)
Other versions
CN112614214B (en)
Inventor
徐屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011507523.7A
Publication of CN112614214A
Application granted
Publication of CN112614214B
Legal status: Active
Anticipated expiration

Abstract

The present disclosure relates to a motion capture method, apparatus, electronic device, and storage medium. The method comprises: acquiring upper body motion information of a captured object in a world coordinate system based on an acquired upper body image of the captured object; acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model; and generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information. With this scheme, whole body motion data of the captured object is generated from an upper body image alone. Compared with conventional motion capture, in which a whole body image must be acquired, the space required for image acquisition is greatly reduced, breaking through the space constraint on image acquisition in three-dimensional animation production.

Description

Motion capture method, motion capture device, electronic device and storage medium
Technical Field
The present disclosure relates to machine vision recognition technologies, and in particular, to a motion capture method and apparatus, an electronic device, and a storage medium.
Background
With the development of deep learning, human body key point detection has made great progress. Building on this progress, purely visual motion capture (based on an ordinary USB camera, with no professional equipment worn by the user) is increasingly applied in low-cost 3D animation production.
However, the field of view of an ordinary camera is limited and generally does not exceed 90°. To capture whole body motion, a sufficiently large space is required between the camera and the user, usually a distance of 2 to 3 meters. For an ordinary user, a clear depth of 2 to 3 meters in front of the desk is generally not available, so current purely visual motion capture methods are difficult to apply because of this space constraint. How to break through the space constraint so that an ordinary user can display whole body 3D motion using a single terminal device is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a motion capture method, apparatus, electronic device and storage medium, to at least solve the problem in the related art that pure visual motion capture is difficult to implement due to large space constraints. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a motion capture method, comprising:
acquiring upper body motion information of a captured object in a world coordinate system based on an upper body image of the captured object acquired by an image acquisition device;
acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
In one embodiment, the upper body image has a plurality of frame images; the acquiring the motion information of the lower body of the captured object according to the motion information of the upper body and a preset motion detection model comprises: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
In one embodiment, the motion detection model is obtained by the following training method: acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object; for any frame sample image, inputting the upper body motion sample information, the root node coordinates and the orientation of the sample object marked in the frame sample image and the lower body sample motion information of the sample object marked in the previous frame sample image adjacent to the frame sample image into the neural network to obtain the lower body motion information of the any frame sample image output by the neural network; determining a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image; and training the neural network according to the loss value to obtain the action detection model.
In one embodiment, the acquiring of the upper body image of the captured object by the image acquisition device comprises: acquiring the upper body image of the captured object with a single-view image acquisition apparatus; or acquiring the upper body image of the captured object with a dual-view image acquisition apparatus arranged as follows: the dual-view image acquisition apparatus comprises a first image acquisition device and a second image acquisition device; taking the line connecting the first image acquisition device and the second image acquisition device as a first edge, the first image acquisition device is placed at a first placement angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placement angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to a preset distance.
In one embodiment, before acquiring the upper body image of the captured object by using the dual-view image acquisition device, the method further comprises: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
In one embodiment, the obtaining upper body motion information of the captured object in the world coordinate system includes: acquiring an upper body image of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device; respectively identifying upper body images at two visual angles, and acquiring first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; based on a perspective projection method, projecting the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located to obtain a second three-dimensional joint point coordinate under a corresponding visual angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding visual angle; and performing joint optimization solution according to a conversion relation between the coordinate system of the image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object under the world coordinate system.
In one embodiment, the performing joint optimization solution according to a transformation relationship between a coordinate system of an image capturing device and a world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates to obtain upper body motion information of a captured object in the world coordinate system includes: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In one embodiment, the preset joint points include a vertex joint, a neck joint, a shoulder joint, an elbow joint, a wrist joint, a crotch joint, and a trunk root joint.
According to a second aspect of embodiments of the present disclosure, there is provided a motion capture apparatus comprising:
an upper body motion information acquisition module configured to execute acquiring upper body motion information of the captured object in a world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquiring module configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, and the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
a whole body motion data generation module configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
In one embodiment, the upper body image has a plurality of frame images; the lower body motion information acquisition module is configured to perform: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
In one embodiment, the lower body motion information acquiring module further includes: a sample image action data acquiring unit configured to execute acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with the upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object; a network prediction unit configured to execute, for any frame sample image, inputting into the neural network the upper body action sample information, root node coordinates and orientation of the sample object marked in the frame sample image, together with the lower body sample action information of the sample object marked in the previous frame sample image adjacent to the frame sample image, and obtaining the lower body motion information of the frame sample image output by the neural network; a loss value determination unit configured to determine a loss value according to the lower body motion information of the frame sample image and the lower body sample action information marked in the frame sample image; and a network training unit configured to execute training of the neural network according to the loss value to obtain the motion detection model.
In one embodiment, the apparatus further comprises a single-view image acquisition device or a dual-view image acquisition apparatus: the single-view image capture device configured to capture an upper body image of the captured object; the double-view image acquisition device is arranged in the following way and acquires the upper body image of the captured object: the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
In one embodiment, the apparatus further includes a dual view image acquisition apparatus calibration module configured to perform: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
In one embodiment, the upper body motion information acquiring module includes: an upper body image acquiring unit configured to perform acquisition of an upper body image of the captured subject synchronously acquired by the first image capturing device and the second image capturing device; the image data identification unit is configured to respectively identify the upper body images at two visual angles, and acquire first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; the projection unit is configured to perform perspective projection based on a perspective projection method, project the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located, and obtain a second three-dimensional joint point coordinate under a corresponding view angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding view angle; and the upper body motion information acquisition unit is configured to execute joint optimization solving according to a conversion relation between the image acquisition device coordinate system and the world coordinate system at two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object at the world coordinate system.
In one embodiment, the upper body motion information acquiring unit is configured to perform: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In one embodiment, the preset joint points include a vertex node, a shoulder node, an elbow node, a wrist node, a crotch node, and a trunk root node.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to cause the electronic device to perform the motion capture method as described in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method described in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the motion capture method described in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiments of the present disclosure brings at least the following beneficial effects: upper body motion information of a captured object in a world coordinate system is acquired from an acquired upper body image of the captured object, lower body motion information of the captured object is acquired according to the upper body motion information and a preset motion detection model, and whole body motion data of the captured object is generated according to the upper body motion information and the lower body motion information. With this scheme, whole body motion data of the captured object is generated from an upper body image alone. Compared with conventional motion capture, in which a whole body image must be acquired, the space required for image acquisition is greatly reduced, breaking through the space constraint on image acquisition in three-dimensional animation production.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 2 is a flowchart illustrating the training steps of a motion detection model according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a structure of a neural network, according to an example embodiment.
Fig. 4 is a schematic diagram of an application scenario of a single-view image capture device according to an exemplary embodiment.
Fig. 5 is a schematic view of an application scenario of a dual-view image capturing apparatus according to an exemplary embodiment.
FIG. 6 is a flowchart illustrating a calibration procedure for an image capture device according to an exemplary embodiment.
FIG. 7 is a diagram illustrating preset human key points according to an exemplary embodiment.
Fig. 8 is a flowchart illustrating steps of upper body motion information according to an example embodiment.
FIG. 9 is a flowchart illustrating steps for jointly optimizing solution for upper body motion information in accordance with an exemplary embodiment.
FIG. 10 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 11 is a block diagram illustrating a motion capture device, according to an example embodiment.
Fig. 12 is an internal block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The maximum lateral field of view of an ordinary camera is generally 90°, and the longitudinal field of view is about 50° to 60°, so a user who wants to capture whole body motion needs to stand 2 to 3 meters from the camera; in an ordinary household, a clear depth of 2 to 3 meters in front of the computer desk is generally not available. If only the upper body of the user is captured, however, the user only needs to be 1 to 1.5 meters from the camera, a distance most users can accommodate. It should be noted that enlarging the longitudinal field of view by changing the placement of the camera is not feasible: for example, rotating the camera by 90° would raise the longitudinal field of view to 90°, but the lateral field of view would then be too small, and as soon as the user raises the arms sideways the hands easily leave the frame.
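As a rough check of these distances (an illustrative back-of-the-envelope calculation under an ideal pinhole model, not taken from the patent text): to fit a subject of vertical extent h within a longitudinal field of view θ, the camera must be at least d = h / (2·tan(θ/2)) away. With θ = 55° and h ≈ 1.8 m for a standing whole body, d ≈ 1.7 m, and allowing margin for movement this quickly reaches 2 to 3 meters; for an upper body of roughly h ≈ 0.9 m, d ≈ 0.9 m, consistent with the 1 to 1.5 meter range stated above.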
Based on this, the present disclosure provides a motion capture method, which is applied to a terminal for illustration, and it is understood that the method may also be applied to a server, and may also be applied to a system including a terminal and a server, and is implemented through interaction between the terminal and the server. In this embodiment, as shown in fig. 1, the method includes the following steps:
in step S110, upper body motion information of the captured object in the world coordinate system is acquired based on the upper body image of the captured object captured by the image capturing apparatus.
The captured object is the object whose motion is to be captured. The upper body motion information of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the upper body image of the captured object. Since motion capture produces a three-dimensional animation by estimating the position of the captured object in three-dimensional space from its two-dimensional images, in the present embodiment the two-dimensional image data of the captured object acquired by the image acquisition device is used when motion capture is to be performed. However, because acquiring a whole body image of the captured object is subject to a large space constraint, the present embodiment only needs to acquire the upper body image of the captured object, obtain the corresponding three-dimensional coordinate vectors in the world coordinate system from that upper body image, and obtain the lower body motion information of the captured object in the subsequent steps. Compared with the conventional technique of directly acquiring a whole body image of the captured object, this requires much less space.
In step S120, lower body motion information of the captured object is acquired based on the upper body motion information and a preset motion detection model.
The lower body motion information of the captured object includes first three-dimensional joint point coordinates of the lower body of the captured object, and the first three-dimensional joint point coordinates are three-dimensional coordinate vectors in the world coordinate system corresponding to the lower body image of the captured object predicted based on the upper body motion information of the captured object. The motion detection model is obtained by training a neural network based on the collected sample image motion data, and the predicted lower body motion information of the captured object can be obtained from the motion detection model and the upper body motion information of the captured object. The sample image motion data is labeled with sample information of the upper body motion, the root node coordinates and orientation of the sample object, and sample information of the lower body motion of the sample object. Specifically, the sample information of the upper body motion of the sample object is the three-dimensional coordinate vector of the upper body image of the sample object in the world coordinate system, the sample information of the lower body motion of the sample object is the three-dimensional coordinate vector of the lower body image of the sample object in the world coordinate system, and the root node coordinate and the orientation refer to the root node coordinate of the sample object in the sample image motion data and the orientation of the sample object.
In the present embodiment, a motion detection model is obtained by training a neural network based on the sample image motion data, and lower body motion information of the object to be captured is obtained from the upper body motion information of the object to be captured and the motion detection model. In order to overcome the space constraint of image capturing in three-dimensional animation production, the present disclosure captures only a partial image of a captured object (i.e., an upper body image) when capturing image data, and predicts lower body motion information corresponding to a partial image of the captured object (i.e., a lower body image) that is not captured, based on upper body motion information corresponding to the captured object partial image that is captured and a motion detection model trained in advance.
In step S130, whole body motion data of the object to be captured is generated based on the upper body motion information and the lower body motion information of the object to be captured.
The whole body motion data of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the whole body of the captured object, covering both the acquired upper body image and the lower body that was not imaged. In the present embodiment, the whole body motion data of the captured object, that is, the complete animation data of the captured object, is obtained by fusing the upper body motion information corresponding to the acquired upper body image with the lower body motion information predicted for the lower body that was not imaged.
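A minimal sketch of this fusion step (illustrative only; the joint counts, ordering, and array layout are assumptions, not specified by the patent): the upper body joints recovered from the images and the lower body joints predicted by the motion detection model are simply stacked into one whole body pose per frame.

```python
import numpy as np

def fuse_whole_body(upper_body_joints: np.ndarray, lower_body_joints: np.ndarray) -> np.ndarray:
    """Combine captured upper body joints with predicted lower body joints.

    upper_body_joints: (9, 3) world-frame coordinates solved from the upper body image
    lower_body_joints: (4, 3) world-frame coordinates predicted by the motion detection model
    returns:           (13, 3) whole body joint coordinates for one frame of animation data
    """
    return np.concatenate([upper_body_joints, lower_body_joints], axis=0)
```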
The motion capture method acquires the upper body motion information of the captured object in the world coordinate system from the acquired upper body image of the captured object, acquires the lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, and then generates the whole body motion data of the captured object from the upper body motion information and the lower body motion information. Whole body motion data is thus generated while only the upper body image of the captured object is acquired, occupying little space. Compared with the conventional approach of capturing a whole body image, this greatly reduces the space required for image acquisition and thereby breaks through the space constraint on image acquisition in three-dimensional animation production.
In an exemplary embodiment, as shown in fig. 2, the motion detection model is obtained by the following training method:
in step S210, sample image motion data is acquired.
The sample image motion data comprises a plurality of continuous frame sample images, and each frame sample image is marked with the upper body motion sample information, root node coordinates, orientation and lower body motion sample information of the sample object. Specifically, the upper body motion sample information includes first three-dimensional sample coordinates of the upper body of the sample object, and the lower body motion sample information includes second three-dimensional sample coordinates of the lower body of the sample object. In this embodiment, taking the captured object as a human being as an example, a large amount of sample image motion data containing three-dimensional human body motion may be collected to train the neural network: specifically, video frames of an actor's motions together with the three-dimensional coordinates of each human body joint point collected by inertial or optical capture equipment, covering the motion types of most animation scenarios, in which the motions of the upper body and the lower body are closely coupled. The amount of sample image motion data is preferably 100,000 frames or more (about 1 hour of continuous video).
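One possible way to organize a single annotated training frame, consistent with the labels described above (the field names and array shapes below are assumptions made for illustration):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabeledSampleFrame:
    """One frame of sample image motion data with its annotations."""
    upper_body_joints: np.ndarray   # (9, 3) first three-dimensional sample coordinates (upper body)
    root_position: np.ndarray       # (3,)   root node coordinates of the sample object
    root_orientation: np.ndarray    # (3,)   orientation of the sample object in space
    lower_body_joints: np.ndarray   # (4, 3) second three-dimensional sample coordinates (lower body)
```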
In step S220, for any frame sample image, the upper body motion sample information, root node coordinates and orientation of the sample object marked in the frame sample image, together with the lower body motion sample information of the sample object marked in the previous frame sample image adjacent to the frame sample image, are input to the neural network, and the lower body motion information of the frame sample image output by the neural network is obtained.
The neural network may adopt a network structure comprising an input layer, an output layer and hidden layers. In practical applications, the number and width of the hidden layers can be increased as needed to improve performance, at the cost of a larger amount of computation. In this embodiment, as shown in fig. 3, the neural network may adopt a structure with three hidden layers. When the neural network is trained on any frame sample image, a 45-dimensional vector, composed of the upper body motion sample information, root node coordinates and orientation marked in the frame sample image and the lower body motion sample information marked in the adjacent previous frame sample image, is input to the neural network, and a 12-dimensional vector, namely the lower body motion information of the frame sample image, is output. Specifically, the input upper body motion sample information of the frame sample image may be a 27-dimensional vector of three-dimensional coordinates of 9 upper body joint points (head, two shoulders, two elbows, two wrists and two crotch joints); the root node coordinates and orientation may together form a 6-dimensional vector (a three-dimensional root node position representing the whole body and a three-dimensional spatial orientation of the whole body); the input lower body motion sample information of the previous frame sample image may be a 12-dimensional vector of three-dimensional coordinates of 4 lower body joint points (two knees and two ankles); and the output is likewise a 12-dimensional vector, namely the predicted three-dimensional coordinates of the 4 lower body joint points of the frame sample image.
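A minimal PyTorch sketch of such a network (the hidden layer width and the ReLU activation are assumptions; the description above only fixes a 45-dimensional input, a 12-dimensional output, and three hidden layers):

```python
import torch
import torch.nn as nn

class LowerBodyPredictor(nn.Module):
    """Maps a 45-D vector (27-D upper body joints + 6-D root position/orientation of the
    current frame + 12-D lower body joints of the previous frame) to the 12-D lower body
    joint coordinates of the current frame."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(45, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),   # three hidden layers in total
            nn.Linear(hidden_dim, 12),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```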
It is understood that, when the motion detection model obtained by training the neural network is actually used, the image data contains a plurality of frame images. When the lower body motion information of the captured object in the t-th frame image is to be predicted, the upper body motion information (a 27-dimensional vector) and the root node coordinates and orientation (a 6-dimensional vector) of the captured object in the t-th frame, solved from the acquired image data, are spliced together with the lower body motion information (a 12-dimensional vector) output by the model for the (t-1)-th frame into one 45-dimensional vector (for the 1st frame, the lower body may simply be assumed to be in an upright state); this vector is input to the model, which outputs the lower body motion information of the captured object in the t-th frame (a 12-dimensional vector).
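Sketched as a frame-by-frame inference loop (a rough illustration using the LowerBodyPredictor above; the upright-pose initializer and tensor layout are assumptions):

```python
import torch

def capture_lower_body(model, upper_body_seq, root_seq, upright_lower_body):
    """Roll the motion detection model over an image sequence.

    upper_body_seq:     (T, 27) solved upper body joint coordinates per frame
    root_seq:           (T, 6)  root node coordinates and orientation per frame
    upright_lower_body: (12,)   lower body pose assumed before the 1st frame
    returns:            (T, 12) predicted lower body joint coordinates per frame
    """
    prev_lower = upright_lower_body
    outputs = []
    with torch.no_grad():
        for t in range(upper_body_seq.shape[0]):
            x = torch.cat([upper_body_seq[t], root_seq[t], prev_lower])  # 45-D input
            prev_lower = model(x)                                        # 12-D output
            outputs.append(prev_lower)
    return torch.stack(outputs)
```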
In step S230, a loss value is determined based on the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image.
Specifically, the neural network is trained on the acquired sample image motion data: for each set of input data, the neural network performs inference, the obtained result (the predicted lower body motion information of the frame sample image) is compared with the ground truth (the correspondingly labeled lower body sample motion information), and the loss value is calculated.
In step S240, the neural network is trained according to the loss value, and a motion detection model is obtained.
In this embodiment, the network parameters are updated through a back propagation algorithm based on the loss values calculated in the above steps, and the trained motion detection model is obtained upon convergence.
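A condensed training loop in this spirit (a sketch only: the mean squared error loss, Adam optimizer and per-frame updates are assumptions; the description above only states that a loss between the predicted and labeled lower body motion is computed and back-propagated until convergence, with the 45-D inputs and 12-D targets assembled per frame as in the inference sketch above):

```python
import torch
import torch.nn as nn

def train_motion_detection_model(model, samples, epochs: int = 100, lr: float = 1e-3):
    """samples: list of (x, target) pairs in temporal order, where x is the 45-D input
    for one frame and target is that frame's labeled 12-D lower body coordinates."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for x, target in samples:
            optimizer.zero_grad()
            loss = criterion(model(x), target)   # compare prediction with labeled truth
            loss.backward()                      # iterate parameters by back propagation
            optimizer.step()
    return model
```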
In the above-described embodiment, the motion detection model is obtained by training the neural network, and the upper body motion information corresponding to the current frame image data and the lower body motion information corresponding to the previous frame image data are input to the motion detection model, so that the lower body motion information corresponding to the current frame image data is obtained, and the three-dimensional coordinate vector corresponding to the partial image (i.e., the lower body image) of the captured object that is not captured is predicted by capturing the partial image (i.e., the upper body image) of the captured object, thereby breaking through the spatial constraint of image capturing at the time of three-dimensional animation production and making the image capturing space required at the time of motion capturing smaller.
In an exemplary embodiment, the upper body image of the captured object may be acquired with a single-view image acquisition apparatus or with a dual-view image acquisition apparatus. Fig. 4 is a schematic diagram of image acquisition with a single-view image acquisition apparatus, that is, a single image acquisition device. Since only the upper body image of the captured object is acquired in the present disclosure, complete whole body motion data of the captured object can be generated from the acquired upper body image using the method above, so much less space is needed than in the conventional technique of directly acquiring the whole body image of the user. For example, with a fixed field of view, a user would need to stand 2 to 3 meters from the image acquisition device to capture a whole body image, whereas only 1 to 1.5 meters is needed to capture the upper body image. The present disclosure can therefore break through the space constraint on image acquisition and has broad application prospects for ordinary users.
Fig. 5 is a schematic diagram of image acquisition corresponding to a dual-view image acquisition apparatus, that is, two image acquisition devices are used for image acquisition. When animation is performed, the position of a human body joint point in a three-dimensional space needs to be estimated, namely motion capture, but under the scheme of single-view motion capture, the position estimation accuracy of the human body along the depth direction is generally low. For example, when the human body extends in the depth direction, there is a great ambiguity in the view angle of the image capturing device, such as when the palm is placed in front of the chest, it is difficult to see the distance between the palm and the chest at the view angle of the image capturing device, and the error can reach more than ten centimeters. Based on this, the present disclosure further improves the accuracy of motion capture while saving image capture space by employing a dual-view image capture device for image capture. The motion capture method of the present disclosure is further explained below by using a dual-view image capture scene as shown in fig. 5, wherein the two image capture devices may be arranged in a manner as shown in fig. 5, that is, the two image capture devices are respectively arranged at two sides of the terminal at a certain angle. Specifically, the terminal may be, but is not limited to, various devices for animation, such as a personal computer, a notebook computer, a smart phone, and a tablet computer. Specifically, the dual-view image capturing apparatus includes a first image capturing device (image capturing device 1 in the figure) and a second image capturing device (image capturing device 2 in the figure), wherein, with a connecting line of the first image capturing device and the second image capturing device as a first side of the placing angle, the first image capturing device is placed at a first placing angle toward the captured object based on the first side, the second image capturing device is placed at a second placing angle toward the captured object based on the first side, and a distance between the first image capturing device and the second image capturing device is set to be a preset distance.
In an exemplary embodiment, as shown in fig. 6, before the capturing of the upper body image of the captured object by the dual-view image capturing device, the method further comprises the following steps:
in step S610, a motion image of the calibration object synchronously acquired by the first image acquisition device and the second image acquisition device is acquired.
In step S620, the motion images at two viewing angles are respectively identified, and position information of a preset joint point in the motion image at the corresponding viewing angle is obtained.
In step S630, the position information of the preset joint point in the motion image is mapped to the coordinate system where the image capturing device at the corresponding view angle is located, so as to obtain the initial three-dimensional coordinates of the motion image at the corresponding view angle.
In step S640, external parameter initial values of the first image capturing device and the second image capturing device are obtained by aligning the initial three-dimensional coordinates of the motion images at the two viewing angles, so as to complete calibration of the first image capturing device and the second image capturing device.
The above process is the calibration of the first image acquisition device and the second image acquisition device, that is, the computation of the positions of the image acquisition devices in the world coordinate system. After calibration is completed, the image acquisition devices are generally not moved during use; if they are moved, they need to be calibrated again. The calibration can be realized by solving the extrinsic parameter matrix based on the intrinsic parameter matrices of the two image acquisition devices: the intrinsic matrix of an image acquisition device is usually a known quantity, and given the known intrinsic matrices, the extrinsic matrix can be obtained by checkerboard calibration or other methods.
Specifically, in this embodiment, a human body may also be used directly as the calibration object. That is, for the two image acquisition devices placed as shown in fig. 5, during calibration the calibration object stands at the "user standing position" in fig. 5 and performs some arbitrary motions. From each frame of the synchronized videos acquired by the two image acquisition devices, the positions of the preset joint points in the image can be detected, and a rough three-dimensional human body model can be estimated in each image acquisition device's coordinate system. An initial estimate of the extrinsic parameters of the image acquisition devices is then obtained by aligning the three-dimensional human body models under the two viewing angles, which completes the position calibration of the image acquisition devices. To obtain a more accurate estimate, the projections under the two viewing angles can be jointly optimized over multiple frames of the three-dimensional human body model, yielding an accurate estimate of the camera extrinsic parameters.
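One standard way to realize the alignment step described here is a rigid (rotation plus translation) fit between the two per-camera estimates of the same joints, for example the Kabsch/SVD procedure sketched below (an illustrative choice; the patent does not name a particular alignment algorithm):

```python
import numpy as np

def estimate_extrinsics(joints_cam1: np.ndarray, joints_cam2: np.ndarray):
    """Initial extrinsics of camera 2 relative to camera 1 by rigidly aligning the same
    calibration joints expressed in each camera's coordinate system.

    joints_cam1, joints_cam2: (N, 3) initial three-dimensional coordinates of the preset
    joint points accumulated over the calibration motion, one row per joint observation.
    Returns (R, t) such that joints_cam1 ≈ joints_cam2 @ R.T + t.
    """
    c1, c2 = joints_cam1.mean(axis=0), joints_cam2.mean(axis=0)
    H = (joints_cam2 - c2).T @ (joints_cam1 - c1)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c1 - R @ c2
    return R, t
```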
In this embodiment, the preset body joint point positions may include 11 joint points, namely, a vertex joint point, a neck joint point, a shoulder joint point, an elbow joint point, a wrist joint point, a crotch joint point, and a trunk root joint point, as shown in fig. 7, and these joint points are also referred to as two-dimensional joint points on the image. The method for detecting the preset human body joint points in the image can be realized by building a neural network model, namely training a basic network through a large amount of image data marked with the preset joint point positions, and detecting the joint point positions on the given image containing the human body by the trained model.
In an exemplary embodiment, as shown in fig. 8, acquiring upper body motion information of a captured object in a world coordinate system specifically includes:
in step S810, an upper body image of the captured object synchronously captured by the first image capturing apparatus and the second image capturing apparatus is acquired.
The captured object is the object whose motion is to be captured. Since motion capture produces a three-dimensional animation by estimating the position of the captured object in three-dimensional space from its two-dimensional images, in the present embodiment, in order to improve the accuracy of motion capture, the upper body images of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device may be used, and motion capture is then performed based on these synchronously acquired upper body images.
In step S820, the upper body images at two viewing angles are respectively identified, and the first two-dimensional joint coordinates in the upper body image at the corresponding viewing angle are obtained.
The first two-dimensional joint point coordinates are the two-dimensional coordinates of the upper body joint points of the captured object in the image data. Specifically, when the captured object is a human body, the upper body joint points include the vertex, shoulder, elbow, wrist, crotch and trunk root joint points. In this embodiment, image recognition is performed on the upper body images of the captured object acquired in the above steps to obtain the first two-dimensional joint point coordinates of the upper body of the captured object under the coordinate system of the image acquisition device at the corresponding viewing angle.
In step S830, based on the perspective projection method, the first two-dimensional joint coordinate is projected into the three-dimensional space where the coordinate system of the corresponding image capturing device is located, so as to obtain a second three-dimensional joint coordinate at the corresponding view angle.
And the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate corresponding to the coordinate system of the image acquisition equipment under the corresponding visual angle. In general, the relationship between two-dimensional coordinates and three-dimensional coordinates in the same coordinate system can be obtained by projection through a perspective projection method. Specifically, based on a perspective projection method, the first two-dimensional joint point coordinates are projected into a three-dimensional space where the corresponding image acquisition device coordinate system is located, so that the corresponding second three-dimensional joint point coordinates are obtained.
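A minimal back-projection sketch under a pinhole model (illustrative: the intrinsic matrix K and a per-joint depth estimate are assumed to be available; the patent does not spell out the projection formulas):

```python
import numpy as np

def back_project(joints_2d: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift first two-dimensional joint coordinates into the image acquisition device's
    three-dimensional coordinate system by inverse perspective projection.

    joints_2d: (N, 2) pixel coordinates of the upper body joints
    depths:    (N,)   assumed depth of each joint along the optical axis
    K:         (3, 3) camera intrinsic matrix
    returns:   (N, 3) second three-dimensional joint coordinates in camera space
    """
    ones = np.ones((joints_2d.shape[0], 1))
    homogeneous = np.hstack([joints_2d, ones])      # (N, 3) homogeneous pixel coordinates
    rays = (np.linalg.inv(K) @ homogeneous.T).T     # normalized camera rays with z = 1
    return rays * depths[:, None]                   # scale each ray to its depth
```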
In step S840, a joint optimization solution is performed according to a transformation relationship between the coordinate systems of the image capturing device and the world coordinate system at two preset viewing angles and the coordinates of the second three-dimensional joint point, so as to obtain the upper body motion information of the captured object in the world coordinate system.
The upper body motion information of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the upper body image of the captured object in the image data. In general, if the position of an image acquisition device in the world coordinate system is known, the conversion relationship between that image acquisition device's coordinate system and the world coordinate system can be obtained from this positional relationship. Therefore, based on the conversion relationship between each image acquisition device coordinate system and the world coordinate system, the second three-dimensional joint point coordinates of the human body joint points under each image acquisition device coordinate system can be converted into the common world coordinate system, and a joint optimization solution is then performed: a solution for the human body joint points in the world coordinate system (the three-dimensional coordinate vectors of the joint points in the world coordinate system) is sought that is closest to the second three-dimensional joint point coordinates under both image acquisition device coordinate systems, thereby obtaining the upper body motion information of the captured object in the world coordinate system.
In the above embodiment, the upper body image of the captured object is obtained by the image capturing devices with different viewing angles, the upper body image of the captured object is respectively identified, the first two-dimensional joint point coordinates in the upper body image at the corresponding viewing angle are obtained, the first two-dimensional joint point coordinates are projected into the three-dimensional space where the corresponding image capturing device coordinate system is located based on the perspective projection method, the second three-dimensional joint point coordinates at the corresponding viewing angle are obtained, and joint optimization solution is performed according to the preset conversion relationship between each image capturing device coordinate system and the world coordinate system and the second three-dimensional joint point coordinates, so as to obtain the upper body motion information of the captured object at the world coordinate system, that is, the motion capture of the captured object is completed. By adopting the scheme of the disclosure, the upper body image is collected through the image collecting devices with different visual angles, so that the precision of motion capture is greatly improved, and the occupied space is smaller compared with the traditional technology for collecting the whole body image of the captured object.
In an exemplary embodiment, as shown in fig. 9, in step S840, performing joint optimization solution according to a transformation relationship between the coordinate system of the image capturing device and the world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates, to obtain the upper body motion information of the captured object in the world coordinate system, which may be specifically implemented by the following steps:
in step S842, a predicted three-dimensional model of the captured object in the world coordinate system is obtained according to the transformation relationship between the image capturing device coordinate system and the world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates.
Specifically, based on the conversion relationship between each image acquisition device coordinate system and the world coordinate system, and assuming that the position of each image acquisition device in the world coordinate system is known, the second three-dimensional joint point coordinates obtained in each image acquisition device coordinate system can be transformed into the world coordinate system. Because the human body has great ambiguity along the depth direction, for example when the palm is placed in front of the chest it is difficult to tell the distance between the palm and the chest, several predicted three-dimensional models of the captured object in the world coordinate system may be obtained when the second three-dimensional joint point coordinates are transformed into the world coordinate system; the subsequent steps then perform a joint optimization solution to determine the target three-dimensional model of the captured object.
In step S844, the predicted three-dimensional model is projected to the coordinate systems of the image capturing device at two viewing angles based on the preset joint point, so as to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model corresponding to the two viewing angles.
The second two-dimensional joint point coordinates are the two-dimensional coordinates, at the corresponding viewing angle, obtained by projecting the preset joint points of the predicted three-dimensional model in the world coordinate system into each image acquisition device coordinate system. In this embodiment, the second two-dimensional joint point coordinates corresponding to the preset joint points of the predicted three-dimensional model in each image acquisition device coordinate system are obtained by projection.
In step S846, a target three-dimensional model of the captured object is determined according to distances between the second two-dimensional joint coordinates at the two viewing angles and the first two-dimensional joint coordinates at the corresponding viewing angle, respectively.
Specifically, the idea of the joint optimization solution is to find the model whose projected two-dimensional coordinates on the image (i.e., the second two-dimensional joint point coordinates) are closest to the joint point coordinates detected in the acquired upper body image of the captured object (i.e., the first two-dimensional joint point coordinates). Therefore, in the present embodiment, the target three-dimensional model of the captured object is determined based on the distance between the second two-dimensional joint point coordinates, obtained by projecting each predicted three-dimensional model into the image acquisition device coordinate systems, and the first two-dimensional joint point coordinates corresponding to the upper body image of the captured object. That is, the plurality of predicted three-dimensional models obtained above are each projected into the image acquisition device coordinate systems to obtain the corresponding second two-dimensional joint point coordinates, the distance between the second and first two-dimensional joint point coordinates is calculated for each predicted model, and the predicted model whose second two-dimensional joint point coordinates are closest is determined as the target three-dimensional model of the captured object.
In step S848, target three-dimensional coordinates in the target three-dimensional model, at which the preset joint points correspond to the world coordinate system, are obtained, and the target three-dimensional coordinates are determined as the upper body motion information of the captured object.
Specifically, based on the determined target three-dimensional model, the target three-dimensional coordinates corresponding to the preset joint points in the target three-dimensional model in the world coordinate system are obtained according to the preset joint points, and the target three-dimensional coordinates are determined as upper body motion information corresponding to the upper body image of the captured object in the image data, where the upper body motion information is motion data corresponding to the upper body image of the captured object in the image data in the world coordinate system, and can reflect motions corresponding to the upper body image of the captured object.
In the above embodiment, the predicted three-dimensional model of the captured object in the world coordinate system is obtained according to the conversion relationship between the preset image capturing device coordinate system and the world coordinate system and the second three-dimensional joint point coordinate, the predicted three-dimensional model is projected to the image capturing device coordinate system based on the preset joint point, the second two-dimensional joint point coordinate of the preset joint point in the predicted three-dimensional model is obtained, the target three-dimensional model of the captured object is determined according to the distance between the second two-dimensional joint point coordinate and the first two-dimensional joint point coordinate, the target three-dimensional coordinate of the preset joint point in the target three-dimensional model corresponding to the world coordinate system is obtained, and the target three-dimensional coordinate is determined as the upper body movement information of the captured object. According to the method, the upper body motion information corresponding to the upper body image of the captured object is obtained based on a joint optimization solution method according to the acquired upper body image of the captured object, so that the motion capturing precision is improved.
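As an illustrative sketch only (Python with numpy and a standard pinhole projection are assumed; none of the names below are mandated by this disclosure), the selection of the target three-dimensional model by reprojection distance described in steps S842 to S848 can be written as:

```python
import numpy as np

def project_points(joints_world, K, R, t):
    """Pinhole projection of (J, 3) world-frame joints into (J, 2) pixel coordinates."""
    cam = joints_world @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                        # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]         # perspective divide

def pick_target_model(candidates, detections, cameras):
    """candidates: list of (J, 3) predicted three-dimensional models in world coordinates.
    detections:  list of (J, 2) first two-dimensional joint coordinates, one per view.
    cameras:     list of (K, R, t) intrinsics/extrinsics, one per view."""
    def reprojection_error(joints):
        return sum(np.linalg.norm(project_points(joints, K, R, t) - det, axis=1).sum()
                   for det, (K, R, t) in zip(detections, cameras))
    return min(candidates, key=reprojection_error)
```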
The following further describes the motion capture method of the present disclosure based on the dual-view image capturing device shown in fig. 5, as shown in fig. 10, the method specifically includes the following steps:
step 1002, calibrating the position of the image acquisition device.
Specifically, the method shown in fig. 6 may be implemented, and this is not described in detail in this embodiment.
And step 1004, acquiring image data acquired by the image acquisition equipment.
The image data includes an upper body image of a captured object, and the captured object is a corresponding object for motion capture.
Step 1006, identify the image data, and obtain the first two-dimensional joint coordinates in the image data.
In the present embodiment, the upper body image of the captured object acquired in the above step is recognized to obtain the first two-dimensional joint point coordinates, that is, the coordinates of the upper body joint points of the captured object in each image acquisition device coordinate system. Specifically, the two image acquisition devices acquire image data synchronously to obtain a synchronized video stream, and for each frame of the synchronized video stream, two-dimensional joint point detection is first performed for the views of the two image acquisition devices to obtain the first two-dimensional joint point coordinates of the joint points in that frame.
And step 1008, projecting the first two-dimensional joint point coordinate to a three-dimensional space where the image acquisition equipment coordinate system is located based on a perspective projection method to obtain a second three-dimensional joint point coordinate.
Specifically, based on a perspective projection method, the first two-dimensional joint point coordinates are projected into a three-dimensional space where the image acquisition device coordinate system is located, so that corresponding second three-dimensional joint point coordinates are obtained.
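A minimal sketch of this back-projection step (Python with numpy assumed; a depth value per joint must be assumed or estimated, and the disclosure does not fix how it is chosen):

```python
import numpy as np

def backproject(uv: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift (J, 2) first two-dimensional joint coordinates to (J, 3) points in the
    image acquisition device coordinate system, one assumed depth per joint."""
    ones = np.ones((uv.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([uv, ones]).T).T   # unit-depth viewing rays
    return rays * depths[:, None]                           # scale each ray by its depth
```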
Step 1010, performing joint optimization solution according to a preset conversion relation between the coordinate system of the image acquisition device and the world coordinate system and the coordinates of the second three-dimensional joint point to obtain the upper body motion information of the captured object in the world coordinate system.
For example, based on the first two-dimensional joint point coordinates obtained above for the joint points in each frame of image, a human body three-dimensional model is roughly estimated. Since the image acquisition devices are calibrated, each joint point of the human body three-dimensional model to be solved can be projected into the views of the two image acquisition devices to obtain the corresponding second two-dimensional joint point coordinates. These are compared with the first two-dimensional joint point coordinates of the detected joint points, and the human body three-dimensional model whose projected second two-dimensional joint point coordinates on the image are closest to the detected first two-dimensional joint point coordinates is the required solution.
Generally, this is a mathematical optimization problem, and there are many implementation forms, which are not limited in this embodiment, and the following examples are only for illustrating the implementation principle of this solution and are not intended to limit the scope of the present disclosure. The optimization objectives adopted by this embodiment are as follows:
E = w_data·E_data + w_prior·E_prior, where w_data and w_prior are weights determined from empirical values and can be understood as constants. E_data describes the difference between the second two-dimensional joint point coordinates, obtained by projecting the three-dimensional joint points into each view, and the corresponding first two-dimensional joint point coordinates:

E_data = Σ_{n=1..N} Σ_{k=1..K} ‖ Π(K_n, P_n, J_k(θ)) − w_{n,k} ‖²,

and E_prior describes prior knowledge about the three-dimensional joint points, ensuring that the solved three-dimensional joint point positions do not differ too much from a common action:

E_prior = Σ_{k=1..K} ‖ J_k(θ) − J̄_k ‖².

Here N = 2 represents the two image acquisition devices; K = 11 represents the 11 two-dimensional key points; K_n is the intrinsic parameter matrix of the nth camera and P_n its extrinsic parameter matrix; Π(·) denotes perspective projection with these camera parameters; J_k(θ) is the coordinate of the kth three-dimensional joint point of the parameterized human body three-dimensional model; w_{n,k} is the detected first two-dimensional joint point coordinate of the kth joint point in the image captured by the nth image acquisition device; and J̄_k is the kth joint point of the model in a common (or average) posture (which posture is selected is not limited in this embodiment). The desired three-dimensional joint point coordinates are obtained by minimizing the objective E over the parameter θ. It should be noted that the parameterized human body three-dimensional model J(θ) is controlled by the parameter θ, which in turn determines the three-dimensional position coordinates of the human body joint points. The specific form of the model is not limited in this embodiment; a common example is the SMPL (Skinned Multi-Person Linear) model, which describes a human three-dimensional mesh M(θ, β, r), where θ is a vector of length 72 representing the axis-angle rotations of the 24 joints of the model, β is a vector of length 10 representing the blendshape coefficients, and r = [r_x, r_y, r_z] represents the three-dimensional coordinates of the model root node in the world coordinate system.
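For illustration, the objective can be sketched as follows (Python with numpy and scipy assumed; fk, the forward function of the parameterized body model such as SMPL that maps θ to the three-dimensional joint points, is supplied by the caller and is not implemented here):

```python
import numpy as np
from scipy.optimize import minimize

def make_objective(detections, cameras, prior_joints, fk, w_data=1.0, w_prior=0.1):
    """Build E(theta) = w_data * E_data + w_prior * E_prior.

    detections:   list of (K, 2) detected first two-dimensional joint coordinates, one per view.
    cameras:      list of (K_n, R_n, t_n) intrinsic/extrinsic parameters, one per view.
    prior_joints: (K, 3) joint coordinates of a common (average) posture.
    fk:           callable theta -> (K, 3) three-dimensional joints of the body model.
    """
    def project(joints, K, R, t):
        cam = joints @ R.T + t
        uv = cam @ K.T
        return uv[:, :2] / uv[:, 2:3]

    def energy(theta):
        joints = fk(theta)
        e_data = sum(np.sum((project(joints, K, R, t) - det) ** 2)
                     for det, (K, R, t) in zip(detections, cameras))
        e_prior = np.sum((joints - prior_joints) ** 2)
        return w_data * e_data + w_prior * e_prior

    return energy

# Example use: theta = minimize(make_objective(dets, cams, prior, fk), np.zeros(72)).x
```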
Step 1012, acquiring lower body motion information of the captured object based on the upper body motion information of the captured object and a preset motion detection model.
In the present embodiment, a motion detection model is obtained by training a neural network on the sample image motion data, and the lower body motion information of the captured object is obtained from the upper body motion information of the captured object and the motion detection model.
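The disclosure does not prescribe a particular network architecture for the motion detection model; purely as an assumed example, a fully connected network in PyTorch with the inputs and outputs described above (11 upper body joints and 8 lower body joints are illustrative counts) might look like this:

```python
import torch
import torch.nn as nn

class LowerBodyPredictor(nn.Module):
    """Assumed stand-in for the motion detection model: maps the upper body motion
    information, root node coordinates, orientation and the previous frame's lower
    body motion information to the current frame's lower body joint coordinates."""

    def __init__(self, n_upper=11, n_lower=8, hidden=256):
        super().__init__()
        in_dim = n_upper * 3 + 3 + 3 + n_lower * 3   # upper joints + root + orientation + previous lower joints
        self.n_lower = n_lower
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_lower * 3),
        )

    def forward(self, upper, root, orient, prev_lower):
        # upper: (B, n_upper, 3); root, orient: (B, 3); prev_lower: (B, n_lower, 3)
        x = torch.cat([upper.flatten(1), root, orient, prev_lower.flatten(1)], dim=1)
        return self.net(x).view(-1, self.n_lower, 3)
```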
Step 1014, generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
In the present embodiment, the whole body motion data of the captured object is obtained by fusing the upper body motion information, which corresponds to the acquired upper body image of the captured object, with the lower body motion information, which corresponds to the lower body of the captured object whose image is not acquired.
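A minimal sketch of this fusion step (a plain concatenation of the two joint arrays is assumed; the disclosure does not limit the fusion to this form):

```python
import numpy as np

def fuse_whole_body(upper_joints: np.ndarray, lower_joints: np.ndarray) -> np.ndarray:
    """Stack the captured upper body joints and the predicted lower body joints
    into one whole-body joint array."""
    return np.concatenate([upper_joints, lower_joints], axis=0)
```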
In the above embodiment, image data are acquired by image acquisition devices at two viewing angles, which improves the precision of motion capture; and because only the upper body image of the captured object needs to be acquired while complete whole body motion data of the captured object are still generated, the space constraint on image acquisition in three-dimensional animation production is overcome, giving the method strong practicability.
It should be understood that although the various steps in the flowcharts of fig. 1-10 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1-10 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
FIG. 11 is a block diagram illustrating a motion capture apparatus, according to an exemplary embodiment. Referring to fig. 11, the apparatus includes an upper body motion information acquisition module 1102, a lower body motion information acquisition module 1104, and a whole body motion data generation module 1106.
An upper body motion information acquisition module 1102 configured to execute acquiring upper body motion information of the captured object in the world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquisition module 1104 configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object includes first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is labeled with upper body motion sample information, root node coordinates and orientation of a sample object and lower body motion sample information of the sample object;
a whole body motion data generation module 1106 configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
In an exemplary embodiment, the upper body image has a plurality of frame images; the lower body motion information acquisition module is configured to perform: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
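A sketch of this frame-by-frame use of the model (PyTorch assumed; the model is any network with the interface sketched earlier, and the lower body information for the first frame must be seeded, for example with a default posture):

```python
import torch

@torch.no_grad()
def run_sequence(model, frames, init_lower):
    """frames: time-ordered list of dicts with 'upper', 'root', 'orient' tensors.
    The lower body output of frame t-1 is fed back as input for frame t."""
    lower, outputs = init_lower, []
    for f in frames:
        lower = model(f['upper'], f['root'], f['orient'], lower)
        outputs.append(lower)
    return outputs
```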
In an exemplary embodiment, the lower body motion information acquiring module further includes: a sample image action data acquiring unit configured to execute acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with sample object upper body action sample information, root node coordinates, orientation and lower body action sample information, the sample object upper body action sample information comprises a first three-dimensional sample coordinate of the sample object upper body, and the sample object lower body action sample information comprises a second three-dimensional sample coordinate of the sample object lower body; a network prediction unit configured to perform, for an arbitrary frame sample image, inputting, to the neural network, upper body motion sample information, root node coordinates, an orientation, and lower body sample motion information of a sample object labeled in a previous frame sample image adjacent to the frame sample image, of the sample object labeled in the frame sample image, and obtaining lower body motion information of the arbitrary frame sample image output by the neural network; a loss value determination unit configured to determine a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image; a network training unit configured to perform training of the neural network according to the loss value, resulting in the motion detection model.
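As an assumed example of the training scheme performed by these units (the Adam optimizer and an MSE loss are illustrative choices, not requirements of this disclosure):

```python
import torch
import torch.nn as nn

def train_motion_detection_model(model, sample_frames, epochs=10, lr=1e-3):
    """sample_frames: time-ordered list of dicts with 'upper', 'root', 'orient' and
    'lower' tensors (each with a leading batch dimension). For every frame t the
    network sees frame t's upper body sample information, root node coordinates and
    orientation plus frame t-1's lower body sample information, and is trained to
    reproduce frame t's lower body sample information."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for prev, cur in zip(sample_frames[:-1], sample_frames[1:]):
            pred = model(cur['upper'], cur['root'], cur['orient'], prev['lower'])
            loss = mse(pred, cur['lower'])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```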
In an exemplary embodiment, the apparatus further comprises a single-view image capturing device or a dual-view image capturing apparatus: the single-view image capture device configured to capture an upper body image of the captured object; the double-view image acquisition device is arranged in the following way and acquires the upper body image of the captured object: the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
In an exemplary embodiment, the apparatus further includes a dual view image acquisition apparatus calibration module configured to perform: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
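One plausible way to "align the initial three-dimensional coordinates" of the two views and obtain initial values of the extrinsic parameters is a rigid (Kabsch) point-set alignment; the sketch below (Python with numpy assumed) is illustrative only:

```python
import numpy as np

def align_point_sets(p_src: np.ndarray, p_dst: np.ndarray):
    """Rigid alignment of two (N, 3) point sets: returns R, t such that
    p_dst ≈ p_src @ R.T + t, usable as an initial extrinsic estimate between
    the two image acquisition device coordinate systems."""
    mu_s, mu_d = p_src.mean(axis=0), p_dst.mean(axis=0)
    H = (p_src - mu_s).T @ (p_dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - mu_s @ R.T
    return R, t
```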
In an exemplary embodiment, the upper body motion information acquiring module includes: an upper body image acquiring unit configured to perform acquisition of an upper body image of the captured subject synchronously acquired by the first image capturing device and the second image capturing device; the image data identification unit is configured to respectively identify the upper body images at two visual angles, and acquire first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; the projection unit is configured to perform perspective projection based on a perspective projection method, project the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located, and obtain a second three-dimensional joint point coordinate under a corresponding view angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding view angle; and the upper body motion information acquisition unit is configured to execute joint optimization solving according to a conversion relation between the image acquisition device coordinate system and the world coordinate system at two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object at the world coordinate system.
In an exemplary embodiment, the upper body motion information acquiring unit is configured to perform: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In an exemplary embodiment, the preset joint points include a vertex node, a shoulder node, an elbow node, a wrist node, a crotch node, and a trunk root node.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 12 is a block diagram illustrating an electronic device Z00 for motion capture in accordance with an exemplary embodiment. For example, electronic device Z00 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 12, electronic device Z00 may include one or more of the following components: a processing component Z02, a memory Z04, a power component Z06, a multimedia component Z08, an audio component Z10, an interface for input/output (I/O) Z12, a sensor component Z14 and a communication component Z16.
The processing component Z02 generally controls the overall operation of the electronic device Z00, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component Z02 may include one or more processors Z20 to execute instructions to perform all or part of the steps of the method described above. Further, the processing component Z02 may include one or more modules that facilitate interaction between the processing component Z02 and other components. For example, the processing component Z02 may include a multimedia module to facilitate interaction between the multimedia component Z08 and the processing component Z02.
The memory Z04 is configured to store various types of data to support operations at the electronic device Z00. Examples of such data include instructions for any application or method operating on electronic device Z00, contact data, phonebook data, messages, pictures, videos, and the like. The memory Z04 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component Z06 provides power to the various components of the electronic device Z00. The power component Z06 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device Z00.
The multimedia component Z08 comprises a screen providing an output interface between the electronic device Z00 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component Z08 includes a front facing camera and/or a rear facing camera. When the electronic device Z00 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component Z10 is configured to output and/or input an audio signal. For example, the audio component Z10 includes a Microphone (MIC) configured to receive external audio signals when the electronic device Z00 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory Z04 or transmitted via the communication component Z16. In some embodiments, the audio component Z10 further includes a speaker for outputting audio signals.
The I/O interface Z12 provides an interface between the processing component Z02 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly Z14 includes one or more sensors for providing status assessment of various aspects to the electronic device Z00. For example, the sensor assembly Z14 may detect the open/closed state of the electronic device Z00, the relative positioning of the components, such as the display and keypad of the electronic device Z00, the sensor assembly Z14 may also detect a change in the position of one component of the electronic device Z00 or the electronic device Z00, the presence or absence of user contact with the electronic device Z00, the orientation or acceleration/deceleration of the electronic device Z00, and a change in the temperature of the electronic device Z00. The sensor assembly Z14 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly Z14 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly Z14 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component Z16 is configured to facilitate wired or wireless communication between the electronic device Z00 and other devices. The electronic device Z00 may have access to a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component Z16 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component Z16 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device Z00 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, for example a memory Z04 comprising instructions executable by the processor Z20 of the electronic device Z00 to perform the above method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A motion capture method, comprising:
acquiring upper body motion information of a captured object in a world coordinate system based on an upper body image of the captured object acquired by an image acquisition device;
acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
2. The method according to claim 1, wherein the upper body image has a plurality of frame images; the acquiring the motion information of the lower body of the captured object according to the motion information of the upper body and a preset motion detection model comprises:
inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
3. The method of claim 2, wherein the motion detection model is obtained by the following training method:
acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object;
for any frame sample image, inputting the upper body motion sample information, the root node coordinates and the orientation of the sample object marked in the frame sample image and the lower body sample motion information of the sample object marked in the previous frame sample image adjacent to the frame sample image into the neural network to obtain the lower body motion information of the any frame sample image output by the neural network;
determining a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image;
and training the neural network according to the loss value to obtain the action detection model.
4. The method of claim 1, wherein the acquiring of the upper body image of the captured object by the image acquisition device comprises:
acquiring an upper body image of the captured object by adopting a single-view image acquisition device; or,
acquiring an upper body image of the captured object by adopting a double-view-angle image acquisition device which is arranged in the following way:
the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
5. The method of claim 4, further comprising, prior to acquiring the upper body image of the captured object with a dual-view image acquisition device:
acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment;
respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles;
mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle;
and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
6. The method of claim 4, wherein the obtaining upper body motion information of the captured object in the world coordinate system comprises:
acquiring an upper body image of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device;
respectively identifying upper body images at two visual angles, and acquiring first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images;
based on a perspective projection method, projecting the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located to obtain a second three-dimensional joint point coordinate under a corresponding visual angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding visual angle;
and performing joint optimization solution according to a conversion relation between the coordinate system of the image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object under the world coordinate system.
7. A motion capture device, comprising:
an upper body motion information acquisition module configured to execute acquiring upper body motion information of the captured object in a world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquiring module configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, and the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
a whole body motion data generation module configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of any of claims 1 to 6.
9. A computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of any of claims 1-6.
10. A computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the motion capture method of any of claims 1 to 6.