
Motion capture method, motion capture device, electronic device and storage medium

Info

Publication number
CN112614214A
Authority
CN
China
Prior art keywords
image
sample
upper body
captured object
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011507523.7A
Other languages
Chinese (zh)
Other versions
CN112614214B (en)
Inventor
徐屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011507523.7A
Publication of CN112614214A
Application granted
Publication of CN112614214B
Legal status: Active
Anticipated expiration

Abstract

The present disclosure relates to a motion capture method, apparatus, electronic device, and storage medium. The method comprises: acquiring upper body motion information of a captured object in a world coordinate system based on an acquired upper body image of the captured object; acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model; and generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information. With this scheme, whole body motion data of the captured object is generated from an upper body image alone. Compared with conventional motion capture, in which a whole body image must be acquired, the space required for image acquisition is greatly reduced, breaking through the space constraint on image acquisition in three-dimensional animation production.

Description

Motion capture method, motion capture device, electronic device and storage medium
Technical Field
The present disclosure relates to machine vision recognition technologies, and in particular, to a motion capture method and apparatus, an electronic device, and a storage medium.
Background
With the development of deep learning, human body key point detection has made great progress. Building on this progress, purely visual motion capture (based on an ordinary USB camera, with no professional equipment worn by the user) is increasingly applied in low-cost 3D animation production.
However, the field of view of an ordinary camera is limited and generally does not exceed 90°. To capture whole body motion, a sufficiently large space is required between the camera and the user, usually a distance of 2 to 3 meters. For an ordinary user, a clear depth of 2 to 3 meters in front of the desk is generally not available, so current purely visual motion capture methods are difficult to apply because of this space constraint. How to break through the space constraint so that an ordinary user can display whole body 3D motion using a single terminal device is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a motion capture method, apparatus, electronic device and storage medium, to at least solve the problem in the related art that pure visual motion capture is difficult to implement due to large space constraints. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a motion capture method, comprising:
acquiring upper body motion information of a captured object in a world coordinate system based on an upper body image of the captured object acquired by an image acquisition device;
acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
In one embodiment, the upper body image has a plurality of frame images; the acquiring the motion information of the lower body of the captured object according to the motion information of the upper body and a preset motion detection model comprises: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
In one embodiment, the motion detection model is obtained by the following training method: acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object; for any frame sample image, inputting the upper body motion sample information, the root node coordinates and the orientation of the sample object marked in the frame sample image and the lower body sample motion information of the sample object marked in the previous frame sample image adjacent to the frame sample image into the neural network to obtain the lower body motion information of the any frame sample image output by the neural network; determining a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image; and training the neural network according to the loss value to obtain the action detection model.
In one embodiment, the acquiring of the upper body image of the captured object by the image acquisition device comprises: acquiring the upper body image of the captured object with a single-view image acquisition apparatus; or acquiring the upper body image of the captured object with a dual-view image acquisition apparatus arranged as follows: the dual-view image acquisition apparatus comprises a first image acquisition device and a second image acquisition device; taking the line connecting the first image acquisition device and the second image acquisition device as a first edge, the first image acquisition device is placed at a first placement angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placement angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to a preset distance.
In one embodiment, before acquiring the upper body image of the captured object by using the dual-view image acquisition device, the method further comprises: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
In one embodiment, the obtaining upper body motion information of the captured object in the world coordinate system includes: acquiring an upper body image of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device; respectively identifying upper body images at two visual angles, and acquiring first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; based on a perspective projection method, projecting the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located to obtain a second three-dimensional joint point coordinate under a corresponding visual angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding visual angle; and performing joint optimization solution according to a conversion relation between the coordinate system of the image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object under the world coordinate system.
In one embodiment, the performing joint optimization solution according to a transformation relationship between a coordinate system of an image capturing device and a world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates to obtain upper body motion information of a captured object in the world coordinate system includes: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In one embodiment, the preset joint points include a vertex joint, a neck joint, a shoulder joint, an elbow joint, a wrist joint, a crotch joint, and a trunk root joint.
According to a second aspect of embodiments of the present disclosure, there is provided a motion capture apparatus comprising:
an upper body motion information acquisition module configured to execute acquiring upper body motion information of the captured object in a world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquiring module configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, and the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
a whole body motion data generation module configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
In one embodiment, the upper body image has a plurality of frame images; the lower body motion information acquisition module is configured to perform: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
In one embodiment, the lower body motion information acquiring module further includes: a sample image action data acquiring unit configured to execute acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with the upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object; a network prediction unit configured to execute, for any frame sample image, inputting into the neural network the upper body action sample information, root node coordinates and orientation of the sample object marked in the frame sample image, together with the lower body sample action information of the sample object marked in the previous frame sample image adjacent to the frame sample image, and obtaining the lower body motion information of the frame sample image output by the neural network; a loss value determination unit configured to determine a loss value according to the lower body motion information of the frame sample image and the lower body sample action information marked in the frame sample image; and a network training unit configured to execute training of the neural network according to the loss value to obtain the motion detection model.
In one embodiment, the apparatus further comprises a single-view image acquisition device or a dual-view image acquisition apparatus: the single-view image capture device configured to capture an upper body image of the captured object; the double-view image acquisition device is arranged in the following way and acquires the upper body image of the captured object: the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
In one embodiment, the apparatus further includes a dual view image acquisition apparatus calibration module configured to perform: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
In one embodiment, the upper body motion information acquiring module includes: an upper body image acquiring unit configured to perform acquisition of an upper body image of the captured subject synchronously acquired by the first image capturing device and the second image capturing device; the image data identification unit is configured to respectively identify the upper body images at two visual angles, and acquire first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; the projection unit is configured to perform perspective projection based on a perspective projection method, project the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located, and obtain a second three-dimensional joint point coordinate under a corresponding view angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding view angle; and the upper body motion information acquisition unit is configured to execute joint optimization solving according to a conversion relation between the image acquisition device coordinate system and the world coordinate system at two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object at the world coordinate system.
In one embodiment, the upper body motion information acquiring unit is configured to perform: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In one embodiment, the preset joint points include a vertex node, a shoulder node, an elbow node, a wrist node, a crotch node, and a trunk root node.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to cause the electronic device to perform the motion capture method as described in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method described in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the motion capture method described in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiments of the present disclosure brings at least the following beneficial effects: upper body motion information of a captured object in a world coordinate system is acquired from an acquired upper body image of the captured object, lower body motion information of the captured object is acquired according to the upper body motion information and a preset motion detection model, and whole body motion data of the captured object is generated according to the upper body motion information and the lower body motion information. With this scheme, whole body motion data of the captured object is generated from an upper body image alone. Compared with conventional motion capture, in which a whole body image must be acquired, the space required for image acquisition is greatly reduced, breaking through the space constraint on image acquisition in three-dimensional animation production.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 2 is a flowchart illustrating the training steps of a motion detection model according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a structure of a neural network, according to an example embodiment.
Fig. 4 is a schematic diagram of an application scenario of a single-view image capture device according to an exemplary embodiment.
Fig. 5 is a schematic view of an application scenario of a dual-view image capturing apparatus according to an exemplary embodiment.
FIG. 6 is a flowchart illustrating a calibration procedure for an image capture device according to an exemplary embodiment.
FIG. 7 is a diagram illustrating preset human key points according to an exemplary embodiment.
Fig. 8 is a flowchart illustrating steps of upper body motion information according to an example embodiment.
FIG. 9 is a flowchart illustrating steps for jointly optimizing solution for upper body motion information in accordance with an exemplary embodiment.
FIG. 10 is a flow diagram illustrating a method of motion capture in accordance with an exemplary embodiment.
FIG. 11 is a block diagram illustrating a motion capture device, according to an example embodiment.
Fig. 12 is an internal block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The maximum lateral field of view of an ordinary camera is generally 90°, and the longitudinal field of view is about 50° to 60°, so a user who wants to capture whole body motion needs to stand 2 to 3 meters from the camera; in an ordinary household, a clear depth of 2 to 3 meters in front of the computer desk is generally not available. If only the upper body of the user is captured, however, the user only needs to be 1 to 1.5 meters from the camera, a distance most users can accommodate. It should be noted that enlarging the longitudinal field of view by changing the placement of the camera is not feasible: for example, rotating the camera by 90° would raise the longitudinal field of view to 90°, but the lateral field of view would then be too small, and as soon as the user raises the arms sideways the hands easily leave the frame.
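As a rough check of these distances (an illustrative back-of-the-envelope calculation under an ideal pinhole model, not taken from the patent text): to fit a subject of vertical extent h within a longitudinal field of view θ, the camera must be at least d = h / (2·tan(θ/2)) away. With θ = 55° and h ≈ 1.8 m for a standing whole body, d ≈ 1.7 m, and allowing margin for movement this quickly reaches 2 to 3 meters; for an upper body of roughly h ≈ 0.9 m, d ≈ 0.9 m, consistent with the 1 to 1.5 meter range stated above.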
Based on this, the present disclosure provides a motion capture method, which is applied to a terminal for illustration, and it is understood that the method may also be applied to a server, and may also be applied to a system including a terminal and a server, and is implemented through interaction between the terminal and the server. In this embodiment, as shown in fig. 1, the method includes the following steps:
in step S110, upper body motion information of the captured object in the world coordinate system is acquired based on the upper body image of the captured object captured by the image capturing apparatus.
The captured object is the object whose motion is to be captured. The upper body motion information of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the upper body image of the captured object. Since motion capture produces a three-dimensional animation by estimating the position of the captured object in three-dimensional space from its two-dimensional images, in the present embodiment the two-dimensional image data of the captured object acquired by the image acquisition device is used when motion capture is to be performed. However, because acquiring a whole body image of the captured object is subject to a large space constraint, the present embodiment only needs to acquire the upper body image of the captured object, obtain the corresponding three-dimensional coordinate vectors in the world coordinate system from that upper body image, and obtain the lower body motion information of the captured object in the subsequent steps. Compared with the conventional technique of directly acquiring a whole body image of the captured object, this requires much less space.
In step S120, lower body motion information of the captured object is acquired based on the upper body motion information and a preset motion detection model.
The lower body motion information of the captured object includes first three-dimensional joint point coordinates of the lower body of the captured object, and the first three-dimensional joint point coordinates are three-dimensional coordinate vectors in the world coordinate system corresponding to the lower body image of the captured object predicted based on the upper body motion information of the captured object. The motion detection model is obtained by training a neural network based on the collected sample image motion data, and the predicted lower body motion information of the captured object can be obtained from the motion detection model and the upper body motion information of the captured object. The sample image motion data is labeled with sample information of the upper body motion, the root node coordinates and orientation of the sample object, and sample information of the lower body motion of the sample object. Specifically, the sample information of the upper body motion of the sample object is the three-dimensional coordinate vector of the upper body image of the sample object in the world coordinate system, the sample information of the lower body motion of the sample object is the three-dimensional coordinate vector of the lower body image of the sample object in the world coordinate system, and the root node coordinate and the orientation refer to the root node coordinate of the sample object in the sample image motion data and the orientation of the sample object.
In the present embodiment, a motion detection model is obtained by training a neural network based on the sample image motion data, and lower body motion information of the object to be captured is obtained from the upper body motion information of the object to be captured and the motion detection model. In order to overcome the space constraint of image capturing in three-dimensional animation production, the present disclosure captures only a partial image of a captured object (i.e., an upper body image) when capturing image data, and predicts lower body motion information corresponding to a partial image of the captured object (i.e., a lower body image) that is not captured, based on upper body motion information corresponding to the captured object partial image that is captured and a motion detection model trained in advance.
In step S130, whole body motion data of the object to be captured is generated based on the upper body motion information and the lower body motion information of the object to be captured.
The whole body motion data of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the whole body of the captured object, covering both the acquired upper body image and the lower body that was not imaged. In the present embodiment, the whole body motion data of the captured object, that is, the complete animation data of the captured object, is obtained by fusing the upper body motion information corresponding to the acquired upper body image with the lower body motion information predicted for the lower body that was not imaged.
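A minimal sketch of this fusion step (illustrative only; the joint counts, ordering, and array layout are assumptions, not specified by the patent): the upper body joints recovered from the images and the lower body joints predicted by the motion detection model are simply stacked into one whole body pose per frame.

```python
import numpy as np

def fuse_whole_body(upper_body_joints: np.ndarray, lower_body_joints: np.ndarray) -> np.ndarray:
    """Combine captured upper body joints with predicted lower body joints.

    upper_body_joints: (9, 3) world-frame coordinates solved from the upper body image
    lower_body_joints: (4, 3) world-frame coordinates predicted by the motion detection model
    returns:           (13, 3) whole body joint coordinates for one frame of animation data
    """
    return np.concatenate([upper_body_joints, lower_body_joints], axis=0)
```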
The motion capture method acquires the upper body motion information of the captured object in the world coordinate system from the acquired upper body image of the captured object, acquires the lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, and then generates the whole body motion data of the captured object from the upper body motion information and the lower body motion information. Whole body motion data is thus generated while only the upper body image of the captured object is acquired, occupying little space. Compared with the conventional approach of capturing a whole body image, this greatly reduces the space required for image acquisition and thereby breaks through the space constraint on image acquisition in three-dimensional animation production.
In an exemplary embodiment, as shown in fig. 2, the motion detection model is obtained by the following training method:
in step S210, sample image motion data is acquired.
The sample image motion data comprises a plurality of continuous frame sample images, and each frame sample image is marked with the upper body motion sample information, root node coordinates, orientation and lower body motion sample information of the sample object. Specifically, the upper body motion sample information includes first three-dimensional sample coordinates of the upper body of the sample object, and the lower body motion sample information includes second three-dimensional sample coordinates of the lower body of the sample object. In this embodiment, taking the captured object as a human being as an example, a large amount of sample image motion data containing three-dimensional human body motion may be collected to train the neural network: specifically, video frames of an actor's motions together with the three-dimensional coordinates of each human body joint point collected by inertial or optical capture equipment, covering the motion types of most animation scenarios, in which the motions of the upper body and the lower body are closely coupled. The amount of sample image motion data is preferably 100,000 frames or more (about 1 hour of continuous video).
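One possible way to organize a single annotated training frame, consistent with the labels described above (the field names and array shapes below are assumptions made for illustration):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabeledSampleFrame:
    """One frame of sample image motion data with its annotations."""
    upper_body_joints: np.ndarray   # (9, 3) first three-dimensional sample coordinates (upper body)
    root_position: np.ndarray       # (3,)   root node coordinates of the sample object
    root_orientation: np.ndarray    # (3,)   orientation of the sample object in space
    lower_body_joints: np.ndarray   # (4, 3) second three-dimensional sample coordinates (lower body)
```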
In step S220, for any frame sample image, the upper body motion sample information, root node coordinates and orientation of the sample object marked in the frame sample image, together with the lower body motion sample information of the sample object marked in the previous frame sample image adjacent to the frame sample image, are input to the neural network, and the lower body motion information of the frame sample image output by the neural network is obtained.
The neural network may adopt a network structure comprising an input layer, an output layer and hidden layers. In practical applications, the number and width of the hidden layers can be increased as needed to improve performance, at the cost of a larger amount of computation. In this embodiment, as shown in fig. 3, the neural network may adopt a structure with three hidden layers. When the neural network is trained on any frame sample image, a 45-dimensional vector, composed of the upper body motion sample information, root node coordinates and orientation marked in the frame sample image and the lower body motion sample information marked in the adjacent previous frame sample image, is input to the neural network, and a 12-dimensional vector, namely the lower body motion information of the frame sample image, is output. Specifically, the input upper body motion sample information of the frame sample image may be a 27-dimensional vector of three-dimensional coordinates of 9 upper body joint points (head, two shoulders, two elbows, two wrists and two crotch joints); the root node coordinates and orientation may together form a 6-dimensional vector (a three-dimensional root node position representing the whole body and a three-dimensional spatial orientation of the whole body); the input lower body motion sample information of the previous frame sample image may be a 12-dimensional vector of three-dimensional coordinates of 4 lower body joint points (two knees and two ankles); and the output is likewise a 12-dimensional vector, namely the predicted three-dimensional coordinates of the 4 lower body joint points of the frame sample image.
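A minimal PyTorch sketch of such a network (the hidden layer width and the ReLU activation are assumptions; the description above only fixes a 45-dimensional input, a 12-dimensional output, and three hidden layers):

```python
import torch
import torch.nn as nn

class LowerBodyPredictor(nn.Module):
    """Maps a 45-D vector (27-D upper body joints + 6-D root position/orientation of the
    current frame + 12-D lower body joints of the previous frame) to the 12-D lower body
    joint coordinates of the current frame."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(45, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),   # three hidden layers in total
            nn.Linear(hidden_dim, 12),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```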
It is understood that, when the motion detection model obtained by training the neural network is actually used, the image data contains a plurality of frame images. When the lower body motion information of the captured object in the t-th frame image is to be predicted, the upper body motion information (a 27-dimensional vector) and the root node coordinates and orientation (a 6-dimensional vector) of the captured object in the t-th frame, solved from the acquired image data, are spliced together with the lower body motion information (a 12-dimensional vector) output by the model for the (t-1)-th frame into one 45-dimensional vector (for the 1st frame, the lower body may simply be assumed to be in an upright state); this vector is input to the model, which outputs the lower body motion information of the captured object in the t-th frame (a 12-dimensional vector).
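Sketched as a frame-by-frame inference loop (a rough illustration using the LowerBodyPredictor above; the upright-pose initializer and tensor layout are assumptions):

```python
import torch

def capture_lower_body(model, upper_body_seq, root_seq, upright_lower_body):
    """Roll the motion detection model over an image sequence.

    upper_body_seq:     (T, 27) solved upper body joint coordinates per frame
    root_seq:           (T, 6)  root node coordinates and orientation per frame
    upright_lower_body: (12,)   lower body pose assumed before the 1st frame
    returns:            (T, 12) predicted lower body joint coordinates per frame
    """
    prev_lower = upright_lower_body
    outputs = []
    with torch.no_grad():
        for t in range(upper_body_seq.shape[0]):
            x = torch.cat([upper_body_seq[t], root_seq[t], prev_lower])  # 45-D input
            prev_lower = model(x)                                        # 12-D output
            outputs.append(prev_lower)
    return torch.stack(outputs)
```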
In step S230, a loss value is determined based on the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image.
Specifically, the neural network is trained on the acquired sample image motion data: for each set of input data, the neural network performs inference, the obtained result (the predicted lower body motion information of the frame sample image) is compared with the ground truth (the correspondingly labeled lower body sample motion information), and the loss value is calculated.
In step S240, the neural network is trained according to the loss value, and a motion detection model is obtained.
In this embodiment, the network parameters are updated through a back propagation algorithm based on the loss values calculated in the above steps, and the trained motion detection model is obtained upon convergence.
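A condensed training loop in this spirit (a sketch only: the mean squared error loss, Adam optimizer and per-frame updates are assumptions; the description above only states that a loss between the predicted and labeled lower body motion is computed and back-propagated until convergence, with the 45-D inputs and 12-D targets assembled per frame as in the inference sketch above):

```python
import torch
import torch.nn as nn

def train_motion_detection_model(model, samples, epochs: int = 100, lr: float = 1e-3):
    """samples: list of (x, target) pairs in temporal order, where x is the 45-D input
    for one frame and target is that frame's labeled 12-D lower body coordinates."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for x, target in samples:
            optimizer.zero_grad()
            loss = criterion(model(x), target)   # compare prediction with labeled truth
            loss.backward()                      # iterate parameters by back propagation
            optimizer.step()
    return model
```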
In the above-described embodiment, the motion detection model is obtained by training the neural network, and the upper body motion information corresponding to the current frame image data and the lower body motion information corresponding to the previous frame image data are input to the motion detection model, so that the lower body motion information corresponding to the current frame image data is obtained, and the three-dimensional coordinate vector corresponding to the partial image (i.e., the lower body image) of the captured object that is not captured is predicted by capturing the partial image (i.e., the upper body image) of the captured object, thereby breaking through the spatial constraint of image capturing at the time of three-dimensional animation production and making the image capturing space required at the time of motion capturing smaller.
In an exemplary embodiment, the upper body image of the captured object may be acquired with a single-view image acquisition apparatus or with a dual-view image acquisition apparatus. Fig. 4 is a schematic diagram of image acquisition with a single-view image acquisition apparatus, that is, a single image acquisition device. Since only the upper body image of the captured object is acquired in the present disclosure, complete whole body motion data of the captured object can be generated from the acquired upper body image using the method above, so much less space is needed than in the conventional technique of directly acquiring the whole body image of the user. For example, with a fixed field of view, a user would need to stand 2 to 3 meters from the image acquisition device to capture a whole body image, whereas only 1 to 1.5 meters is needed to capture the upper body image. The present disclosure can therefore break through the space constraint on image acquisition and has broad application prospects for ordinary users.
Fig. 5 is a schematic diagram of image acquisition corresponding to a dual-view image acquisition apparatus, that is, two image acquisition devices are used for image acquisition. When animation is performed, the position of a human body joint point in a three-dimensional space needs to be estimated, namely motion capture, but under the scheme of single-view motion capture, the position estimation accuracy of the human body along the depth direction is generally low. For example, when the human body extends in the depth direction, there is a great ambiguity in the view angle of the image capturing device, such as when the palm is placed in front of the chest, it is difficult to see the distance between the palm and the chest at the view angle of the image capturing device, and the error can reach more than ten centimeters. Based on this, the present disclosure further improves the accuracy of motion capture while saving image capture space by employing a dual-view image capture device for image capture. The motion capture method of the present disclosure is further explained below by using a dual-view image capture scene as shown in fig. 5, wherein the two image capture devices may be arranged in a manner as shown in fig. 5, that is, the two image capture devices are respectively arranged at two sides of the terminal at a certain angle. Specifically, the terminal may be, but is not limited to, various devices for animation, such as a personal computer, a notebook computer, a smart phone, and a tablet computer. Specifically, the dual-view image capturing apparatus includes a first image capturing device (image capturing device 1 in the figure) and a second image capturing device (image capturing device 2 in the figure), wherein, with a connecting line of the first image capturing device and the second image capturing device as a first side of the placing angle, the first image capturing device is placed at a first placing angle toward the captured object based on the first side, the second image capturing device is placed at a second placing angle toward the captured object based on the first side, and a distance between the first image capturing device and the second image capturing device is set to be a preset distance.
In an exemplary embodiment, as shown in fig. 6, before the capturing of the upper body image of the captured object by the dual-view image capturing device, the method further comprises the following steps:
in step S610, a motion image of the calibration object synchronously acquired by the first image acquisition device and the second image acquisition device is acquired.
In step S620, the motion images at two viewing angles are respectively identified, and position information of a preset joint point in the motion image at the corresponding viewing angle is obtained.
In step S630, the position information of the preset joint point in the motion image is mapped to the coordinate system where the image capturing device at the corresponding view angle is located, so as to obtain the initial three-dimensional coordinates of the motion image at the corresponding view angle.
In step S640, external parameter initial values of the first image capturing device and the second image capturing device are obtained by aligning the initial three-dimensional coordinates of the motion images at the two viewing angles, so as to complete calibration of the first image capturing device and the second image capturing device.
The above process is the calibration of the first image acquisition device and the second image acquisition device, that is, the computation of the positions of the image acquisition devices in the world coordinate system. After calibration is completed, the image acquisition devices are generally not moved during use; if they are moved, they need to be calibrated again. The calibration can be realized by solving the extrinsic parameter matrix based on the intrinsic parameter matrices of the two image acquisition devices: the intrinsic matrix of an image acquisition device is usually a known quantity, and given the known intrinsic matrices, the extrinsic matrix can be obtained by checkerboard calibration or other methods.
Specifically, in this embodiment, a human body may also be used directly as the calibration object. That is, for the two image acquisition devices placed as shown in fig. 5, during calibration the calibration object stands at the "user standing position" in fig. 5 and performs some arbitrary motions. From each frame of the synchronized videos acquired by the two image acquisition devices, the positions of the preset joint points in the image can be detected, and a rough three-dimensional human body model can be estimated in each image acquisition device's coordinate system. An initial estimate of the extrinsic parameters of the image acquisition devices is then obtained by aligning the three-dimensional human body models under the two viewing angles, which completes the position calibration of the image acquisition devices. To obtain a more accurate estimate, the projections under the two viewing angles can be jointly optimized over multiple frames of the three-dimensional human body model, yielding an accurate estimate of the camera extrinsic parameters.
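One standard way to realize the alignment step described here is a rigid (rotation plus translation) fit between the two per-camera estimates of the same joints, for example the Kabsch/SVD procedure sketched below (an illustrative choice; the patent does not name a particular alignment algorithm):

```python
import numpy as np

def estimate_extrinsics(joints_cam1: np.ndarray, joints_cam2: np.ndarray):
    """Initial extrinsics of camera 2 relative to camera 1 by rigidly aligning the same
    calibration joints expressed in each camera's coordinate system.

    joints_cam1, joints_cam2: (N, 3) initial three-dimensional coordinates of the preset
    joint points accumulated over the calibration motion, one row per joint observation.
    Returns (R, t) such that joints_cam1 ≈ joints_cam2 @ R.T + t.
    """
    c1, c2 = joints_cam1.mean(axis=0), joints_cam2.mean(axis=0)
    H = (joints_cam2 - c2).T @ (joints_cam1 - c1)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c1 - R @ c2
    return R, t
```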
In this embodiment, the preset body joint point positions may include 11 joint points, namely, a vertex joint point, a neck joint point, a shoulder joint point, an elbow joint point, a wrist joint point, a crotch joint point, and a trunk root joint point, as shown in fig. 7, and these joint points are also referred to as two-dimensional joint points on the image. The method for detecting the preset human body joint points in the image can be realized by building a neural network model, namely training a basic network through a large amount of image data marked with the preset joint point positions, and detecting the joint point positions on the given image containing the human body by the trained model.
In an exemplary embodiment, as shown in fig. 8, acquiring upper body motion information of a captured object in a world coordinate system specifically includes:
in step S810, an upper body image of the captured object synchronously captured by the first image capturing apparatus and the second image capturing apparatus is acquired.
The captured object is the object whose motion is to be captured. Since motion capture produces a three-dimensional animation by estimating the position of the captured object in three-dimensional space from its two-dimensional images, in the present embodiment, in order to improve the accuracy of motion capture, the upper body images of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device may be used, and motion capture is then performed based on these synchronously acquired upper body images.
In step S820, the upper body images at two viewing angles are respectively identified, and the first two-dimensional joint coordinates in the upper body image at the corresponding viewing angle are obtained.
The first two-dimensional joint point coordinates are the two-dimensional coordinates of the upper body joint points of the captured object in the image data. Specifically, when the captured object is a human body, the upper body joint points include the vertex, shoulder, elbow, wrist, crotch and trunk root joint points. In this embodiment, image recognition is performed on the upper body images of the captured object acquired in the above steps to obtain the first two-dimensional joint point coordinates of the upper body of the captured object under the coordinate system of the image acquisition device at the corresponding viewing angle.
In step S830, based on the perspective projection method, the first two-dimensional joint coordinate is projected into the three-dimensional space where the coordinate system of the corresponding image capturing device is located, so as to obtain a second three-dimensional joint coordinate at the corresponding view angle.
And the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate corresponding to the coordinate system of the image acquisition equipment under the corresponding visual angle. In general, the relationship between two-dimensional coordinates and three-dimensional coordinates in the same coordinate system can be obtained by projection through a perspective projection method. Specifically, based on a perspective projection method, the first two-dimensional joint point coordinates are projected into a three-dimensional space where the corresponding image acquisition device coordinate system is located, so that the corresponding second three-dimensional joint point coordinates are obtained.
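A minimal back-projection sketch under a pinhole model (illustrative: the intrinsic matrix K and a per-joint depth estimate are assumed to be available; the patent does not spell out the projection formulas):

```python
import numpy as np

def back_project(joints_2d: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift first two-dimensional joint coordinates into the image acquisition device's
    three-dimensional coordinate system by inverse perspective projection.

    joints_2d: (N, 2) pixel coordinates of the upper body joints
    depths:    (N,)   assumed depth of each joint along the optical axis
    K:         (3, 3) camera intrinsic matrix
    returns:   (N, 3) second three-dimensional joint coordinates in camera space
    """
    ones = np.ones((joints_2d.shape[0], 1))
    homogeneous = np.hstack([joints_2d, ones])      # (N, 3) homogeneous pixel coordinates
    rays = (np.linalg.inv(K) @ homogeneous.T).T     # normalized camera rays with z = 1
    return rays * depths[:, None]                   # scale each ray to its depth
```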
In step S840, a joint optimization solution is performed according to a transformation relationship between the coordinate systems of the image capturing device and the world coordinate system at two preset viewing angles and the coordinates of the second three-dimensional joint point, so as to obtain the upper body motion information of the captured object in the world coordinate system.
The upper body motion information of the captured object is the set of three-dimensional coordinate vectors, in the world coordinate system, corresponding to the upper body image of the captured object in the image data. In general, if the position of an image acquisition device in the world coordinate system is known, the conversion relationship between that image acquisition device's coordinate system and the world coordinate system can be obtained from this positional relationship. Therefore, based on the conversion relationship between each image acquisition device coordinate system and the world coordinate system, the second three-dimensional joint point coordinates of the human body joint points under each image acquisition device coordinate system can be converted into the common world coordinate system, and a joint optimization solution is then performed: a solution for the human body joint points in the world coordinate system (the three-dimensional coordinate vectors of the joint points in the world coordinate system) is sought that is closest to the second three-dimensional joint point coordinates under both image acquisition device coordinate systems, thereby obtaining the upper body motion information of the captured object in the world coordinate system.
In the above embodiment, the upper body image of the captured object is obtained by the image capturing devices with different viewing angles, the upper body image of the captured object is respectively identified, the first two-dimensional joint point coordinates in the upper body image at the corresponding viewing angle are obtained, the first two-dimensional joint point coordinates are projected into the three-dimensional space where the corresponding image capturing device coordinate system is located based on the perspective projection method, the second three-dimensional joint point coordinates at the corresponding viewing angle are obtained, and joint optimization solution is performed according to the preset conversion relationship between each image capturing device coordinate system and the world coordinate system and the second three-dimensional joint point coordinates, so as to obtain the upper body motion information of the captured object at the world coordinate system, that is, the motion capture of the captured object is completed. By adopting the scheme of the disclosure, the upper body image is collected through the image collecting devices with different visual angles, so that the precision of motion capture is greatly improved, and the occupied space is smaller compared with the traditional technology for collecting the whole body image of the captured object.
In an exemplary embodiment, as shown in fig. 9, in step S840, performing joint optimization solution according to a transformation relationship between the coordinate system of the image capturing device and the world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates, to obtain the upper body motion information of the captured object in the world coordinate system, which may be specifically implemented by the following steps:
in step S842, a predicted three-dimensional model of the captured object in the world coordinate system is obtained according to the transformation relationship between the image capturing device coordinate system and the world coordinate system at two preset viewing angles and the second three-dimensional joint coordinates.
Specifically, based on the conversion relationship between each image acquisition device coordinate system and the world coordinate system, and assuming that the position of each image acquisition device in the world coordinate system is known, the second three-dimensional joint point coordinates obtained in each image acquisition device coordinate system can be transformed into the world coordinate system. Because the human body has great ambiguity along the depth direction, for example when the palm is placed in front of the chest it is difficult to tell the distance between the palm and the chest, several predicted three-dimensional models of the captured object in the world coordinate system may be obtained when the second three-dimensional joint point coordinates are transformed into the world coordinate system; the subsequent steps then perform a joint optimization solution to determine the target three-dimensional model of the captured object.
In step S844, the predicted three-dimensional model is projected to the coordinate systems of the image capturing device at two viewing angles based on the preset joint point, so as to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model corresponding to the two viewing angles.
The second two-dimensional joint point coordinates are the two-dimensional coordinates, at the corresponding viewing angle, obtained by projecting the preset joint points of the predicted three-dimensional model in the world coordinate system into each image acquisition device coordinate system. In this embodiment, the second two-dimensional joint point coordinates corresponding to the preset joint points of the predicted three-dimensional model in each image acquisition device coordinate system are obtained by projection.
In step S846, a target three-dimensional model of the captured object is determined according to distances between the second two-dimensional joint coordinates at the two viewing angles and the first two-dimensional joint coordinates at the corresponding viewing angle, respectively.
Specifically, the idea of the joint optimization solution is to find the model whose projected two-dimensional coordinates on the image (i.e., the second two-dimensional joint point coordinates) are closest to the joint point coordinates detected in the acquired upper body image of the captured object (i.e., the first two-dimensional joint point coordinates). Therefore, in the present embodiment, the target three-dimensional model of the captured object is determined based on the distance between the second two-dimensional joint point coordinates, obtained by projecting each predicted three-dimensional model into the image acquisition device coordinate systems, and the first two-dimensional joint point coordinates corresponding to the upper body image of the captured object. That is, the plurality of predicted three-dimensional models obtained above are each projected into the image acquisition device coordinate systems to obtain the corresponding second two-dimensional joint point coordinates, the distance between the second and first two-dimensional joint point coordinates is calculated for each predicted model, and the predicted model whose second two-dimensional joint point coordinates are closest is determined as the target three-dimensional model of the captured object.
In step S848, target three-dimensional coordinates in the target three-dimensional model, at which the preset joint points correspond to the world coordinate system, are obtained, and the target three-dimensional coordinates are determined as the upper body motion information of the captured object.
Specifically, based on the determined target three-dimensional model, the target three-dimensional coordinates corresponding to the preset joint points in the target three-dimensional model in the world coordinate system are obtained according to the preset joint points, and the target three-dimensional coordinates are determined as upper body motion information corresponding to the upper body image of the captured object in the image data, where the upper body motion information is motion data corresponding to the upper body image of the captured object in the image data in the world coordinate system, and can reflect motions corresponding to the upper body image of the captured object.
In the above embodiment, the predicted three-dimensional model of the captured object in the world coordinate system is obtained according to the conversion relationship between the preset image capturing device coordinate system and the world coordinate system and the second three-dimensional joint point coordinate, the predicted three-dimensional model is projected to the image capturing device coordinate system based on the preset joint point, the second two-dimensional joint point coordinate of the preset joint point in the predicted three-dimensional model is obtained, the target three-dimensional model of the captured object is determined according to the distance between the second two-dimensional joint point coordinate and the first two-dimensional joint point coordinate, the target three-dimensional coordinate of the preset joint point in the target three-dimensional model corresponding to the world coordinate system is obtained, and the target three-dimensional coordinate is determined as the upper body movement information of the captured object. According to the method, the upper body motion information corresponding to the upper body image of the captured object is obtained based on a joint optimization solution method according to the acquired upper body image of the captured object, so that the motion capturing precision is improved.
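As an illustrative sketch only (Python with numpy and a standard pinhole projection are assumed; none of the names below are mandated by this disclosure), the selection of the target three-dimensional model by reprojection distance described in steps S842 to S848 can be written as:

```python
import numpy as np

def project_points(joints_world, K, R, t):
    """Pinhole projection of (J, 3) world-frame joints into (J, 2) pixel coordinates."""
    cam = joints_world @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                        # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]         # perspective divide

def pick_target_model(candidates, detections, cameras):
    """candidates: list of (J, 3) predicted three-dimensional models in world coordinates.
    detections:  list of (J, 2) first two-dimensional joint coordinates, one per view.
    cameras:     list of (K, R, t) intrinsics/extrinsics, one per view."""
    def reprojection_error(joints):
        return sum(np.linalg.norm(project_points(joints, K, R, t) - det, axis=1).sum()
                   for det, (K, R, t) in zip(detections, cameras))
    return min(candidates, key=reprojection_error)
```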
The following further describes the motion capture method of the present disclosure based on the dual-view image capturing device shown in fig. 5, as shown in fig. 10, the method specifically includes the following steps:
step 1002, calibrating the position of the image acquisition device.
Specifically, the method shown in fig. 6 may be implemented, and this is not described in detail in this embodiment.
And step 1004, acquiring image data acquired by the image acquisition equipment.
The image data includes an upper body image of a captured object, and the captured object is a corresponding object for motion capture.
Step 1006, identify the image data, and obtain the first two-dimensional joint coordinates in the image data.
In the present embodiment, the upper body image of the captured object acquired in the above step is recognized to obtain the first two-dimensional joint point coordinates, that is, the coordinates of the upper body joint points of the captured object in each image acquisition device coordinate system. Specifically, the two image acquisition devices acquire image data synchronously to obtain a synchronized video stream, and for each frame of the synchronized video stream, two-dimensional joint point detection is first performed for the views of the two image acquisition devices to obtain the first two-dimensional joint point coordinates of the joint points in that frame.
And step 1008, projecting the first two-dimensional joint point coordinate to a three-dimensional space where the image acquisition equipment coordinate system is located based on a perspective projection method to obtain a second three-dimensional joint point coordinate.
Specifically, based on a perspective projection method, the first two-dimensional joint point coordinates are projected into a three-dimensional space where the image acquisition device coordinate system is located, so that corresponding second three-dimensional joint point coordinates are obtained.
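A minimal sketch of this back-projection step (Python with numpy assumed; a depth value per joint must be assumed or estimated, and the disclosure does not fix how it is chosen):

```python
import numpy as np

def backproject(uv: np.ndarray, depths: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift (J, 2) first two-dimensional joint coordinates to (J, 3) points in the
    image acquisition device coordinate system, one assumed depth per joint."""
    ones = np.ones((uv.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([uv, ones]).T).T   # unit-depth viewing rays
    return rays * depths[:, None]                           # scale each ray by its depth
```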
Step 1010, performing joint optimization solution according to a preset conversion relation between the coordinate system of the image acquisition device and the world coordinate system and the coordinates of the second three-dimensional joint point to obtain the upper body motion information of the captured object in the world coordinate system.
For example, based on the first two-dimensional joint point coordinates obtained above for the joint points in each frame of image, a human body three-dimensional model is roughly estimated. Since the image acquisition devices are calibrated, each joint point of the human body three-dimensional model to be solved can be projected into the views of the two image acquisition devices to obtain the corresponding second two-dimensional joint point coordinates. These are compared with the first two-dimensional joint point coordinates of the detected joint points, and the human body three-dimensional model whose projected second two-dimensional joint point coordinates on the image are closest to the detected first two-dimensional joint point coordinates is the required solution.
Generally, this is a mathematical optimization problem, and there are many implementation forms, which are not limited in this embodiment, and the following examples are only for illustrating the implementation principle of this solution and are not intended to limit the scope of the present disclosure. The optimization objectives adopted by this embodiment are as follows:
E = w_data·E_data + w_prior·E_prior, where w_data and w_prior are weights determined from empirical values and can be understood as constants. E_data describes the difference between the second two-dimensional joint point coordinates, obtained by projecting the three-dimensional joint points into each view, and the corresponding first two-dimensional joint point coordinates:

E_data = Σ_{n=1..N} Σ_{k=1..K} ‖ Π(K_n, P_n, J_k(θ)) − w_{n,k} ‖²,

and E_prior describes prior knowledge about the three-dimensional joint points, ensuring that the solved three-dimensional joint point positions do not differ too much from a common action:

E_prior = Σ_{k=1..K} ‖ J_k(θ) − J̄_k ‖².

Here N = 2 represents the two image acquisition devices; K = 11 represents the 11 two-dimensional key points; K_n is the intrinsic parameter matrix of the nth camera and P_n its extrinsic parameter matrix; Π(·) denotes perspective projection with these camera parameters; J_k(θ) is the coordinate of the kth three-dimensional joint point of the parameterized human body three-dimensional model; w_{n,k} is the detected first two-dimensional joint point coordinate of the kth joint point in the image captured by the nth image acquisition device; and J̄_k is the kth joint point of the model in a common (or average) posture (which posture is selected is not limited in this embodiment). The desired three-dimensional joint point coordinates are obtained by minimizing the objective E over the parameter θ. It should be noted that the parameterized human body three-dimensional model J(θ) is controlled by the parameter θ, which in turn determines the three-dimensional position coordinates of the human body joint points. The specific form of the model is not limited in this embodiment; a common example is the SMPL (Skinned Multi-Person Linear) model, which describes a human three-dimensional mesh M(θ, β, r), where θ is a vector of length 72 representing the axis-angle rotations of the 24 joints of the model, β is a vector of length 10 representing the blendshape coefficients, and r = [r_x, r_y, r_z] represents the three-dimensional coordinates of the model root node in the world coordinate system.
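For illustration, the objective can be sketched as follows (Python with numpy and scipy assumed; fk, the forward function of the parameterized body model such as SMPL that maps θ to the three-dimensional joint points, is supplied by the caller and is not implemented here):

```python
import numpy as np
from scipy.optimize import minimize

def make_objective(detections, cameras, prior_joints, fk, w_data=1.0, w_prior=0.1):
    """Build E(theta) = w_data * E_data + w_prior * E_prior.

    detections:   list of (K, 2) detected first two-dimensional joint coordinates, one per view.
    cameras:      list of (K_n, R_n, t_n) intrinsic/extrinsic parameters, one per view.
    prior_joints: (K, 3) joint coordinates of a common (average) posture.
    fk:           callable theta -> (K, 3) three-dimensional joints of the body model.
    """
    def project(joints, K, R, t):
        cam = joints @ R.T + t
        uv = cam @ K.T
        return uv[:, :2] / uv[:, 2:3]

    def energy(theta):
        joints = fk(theta)
        e_data = sum(np.sum((project(joints, K, R, t) - det) ** 2)
                     for det, (K, R, t) in zip(detections, cameras))
        e_prior = np.sum((joints - prior_joints) ** 2)
        return w_data * e_data + w_prior * e_prior

    return energy

# Example use: theta = minimize(make_objective(dets, cams, prior, fk), np.zeros(72)).x
```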
Step 1012, acquiring lower body motion information of the captured object based on the upper body motion information of the captured object and a preset motion detection model.
In the present embodiment, a motion detection model is obtained by training a neural network on the sample image motion data, and the lower body motion information of the captured object is obtained from the upper body motion information of the captured object and the motion detection model.
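The disclosure does not prescribe a particular network architecture for the motion detection model; purely as an assumed example, a fully connected network in PyTorch with the inputs and outputs described above (11 upper body joints and 8 lower body joints are illustrative counts) might look like this:

```python
import torch
import torch.nn as nn

class LowerBodyPredictor(nn.Module):
    """Assumed stand-in for the motion detection model: maps the upper body motion
    information, root node coordinates, orientation and the previous frame's lower
    body motion information to the current frame's lower body joint coordinates."""

    def __init__(self, n_upper=11, n_lower=8, hidden=256):
        super().__init__()
        in_dim = n_upper * 3 + 3 + 3 + n_lower * 3   # upper joints + root + orientation + previous lower joints
        self.n_lower = n_lower
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_lower * 3),
        )

    def forward(self, upper, root, orient, prev_lower):
        # upper: (B, n_upper, 3); root, orient: (B, 3); prev_lower: (B, n_lower, 3)
        x = torch.cat([upper.flatten(1), root, orient, prev_lower.flatten(1)], dim=1)
        return self.net(x).view(-1, self.n_lower, 3)
```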
Step 1014, generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
In the present embodiment, the whole body motion data of the captured object is obtained by fusing the upper body motion information, which corresponds to the acquired upper body image of the captured object, with the lower body motion information, which corresponds to the lower body of the captured object whose image is not acquired.
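A minimal sketch of this fusion step (a plain concatenation of the two joint arrays is assumed; the disclosure does not limit the fusion to this form):

```python
import numpy as np

def fuse_whole_body(upper_joints: np.ndarray, lower_joints: np.ndarray) -> np.ndarray:
    """Stack the captured upper body joints and the predicted lower body joints
    into one whole-body joint array."""
    return np.concatenate([upper_joints, lower_joints], axis=0)
```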
In the above embodiment, image data are acquired by image acquisition devices at two viewing angles, which improves the precision of motion capture; and because only the upper body image of the captured object needs to be acquired while complete whole body motion data of the captured object are still generated, the space constraint on image acquisition in three-dimensional animation production is overcome, giving the method strong practicability.
It should be understood that although the various steps in the flowcharts of fig. 1-10 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1-10 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
FIG. 11 is a block diagram illustrating a motion capture apparatus, according to an exemplary embodiment. Referring to fig. 11, the apparatus includes an upper body motion information acquisition module 1102, a lower body motion information acquisition module 1104, and a whole body motion data generation module 1106.
An upper body motion information acquisition module 1102 configured to execute acquiring upper body motion information of the captured object in the world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquisition module 1104 configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object includes first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is labeled with upper body motion sample information, root node coordinates and orientation of a sample object and lower body motion sample information of the sample object;
a whole body motion data generation module 1106 configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
In an exemplary embodiment, the upper body image has a plurality of frame images; the lower body motion information acquisition module is configured to perform: inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
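A sketch of this frame-by-frame use of the model (PyTorch assumed; the model is any network with the interface sketched earlier, and the lower body information for the first frame must be seeded, for example with a default posture):

```python
import torch

@torch.no_grad()
def run_sequence(model, frames, init_lower):
    """frames: time-ordered list of dicts with 'upper', 'root', 'orient' tensors.
    The lower body output of frame t-1 is fed back as input for frame t."""
    lower, outputs = init_lower, []
    for f in frames:
        lower = model(f['upper'], f['root'], f['orient'], lower)
        outputs.append(lower)
    return outputs
```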
In an exemplary embodiment, the lower body motion information acquiring module further includes: a sample image action data acquiring unit configured to execute acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with sample object upper body action sample information, root node coordinates, orientation and lower body action sample information, the sample object upper body action sample information comprises a first three-dimensional sample coordinate of the sample object upper body, and the sample object lower body action sample information comprises a second three-dimensional sample coordinate of the sample object lower body; a network prediction unit configured to perform, for an arbitrary frame sample image, inputting, to the neural network, upper body motion sample information, root node coordinates, an orientation, and lower body sample motion information of a sample object labeled in a previous frame sample image adjacent to the frame sample image, of the sample object labeled in the frame sample image, and obtaining lower body motion information of the arbitrary frame sample image output by the neural network; a loss value determination unit configured to determine a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image; a network training unit configured to perform training of the neural network according to the loss value, resulting in the motion detection model.
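As an assumed example of the training scheme performed by these units (the Adam optimizer and an MSE loss are illustrative choices, not requirements of this disclosure):

```python
import torch
import torch.nn as nn

def train_motion_detection_model(model, sample_frames, epochs=10, lr=1e-3):
    """sample_frames: time-ordered list of dicts with 'upper', 'root', 'orient' and
    'lower' tensors (each with a leading batch dimension). For every frame t the
    network sees frame t's upper body sample information, root node coordinates and
    orientation plus frame t-1's lower body sample information, and is trained to
    reproduce frame t's lower body sample information."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for prev, cur in zip(sample_frames[:-1], sample_frames[1:]):
            pred = model(cur['upper'], cur['root'], cur['orient'], prev['lower'])
            loss = mse(pred, cur['lower'])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```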
In an exemplary embodiment, the apparatus further comprises a single-view image capturing device or a dual-view image capturing apparatus: the single-view image capture device configured to capture an upper body image of the captured object; the double-view image acquisition device is arranged in the following way and acquires the upper body image of the captured object: the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
In an exemplary embodiment, the apparatus further includes a dual view image acquisition apparatus calibration module configured to perform: acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment; respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles; mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle; and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
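One plausible way to "align the initial three-dimensional coordinates" of the two views and obtain initial values of the extrinsic parameters is a rigid (Kabsch) point-set alignment; the sketch below (Python with numpy assumed) is illustrative only:

```python
import numpy as np

def align_point_sets(p_src: np.ndarray, p_dst: np.ndarray):
    """Rigid alignment of two (N, 3) point sets: returns R, t such that
    p_dst ≈ p_src @ R.T + t, usable as an initial extrinsic estimate between
    the two image acquisition device coordinate systems."""
    mu_s, mu_d = p_src.mean(axis=0), p_dst.mean(axis=0)
    H = (p_src - mu_s).T @ (p_dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - mu_s @ R.T
    return R, t
```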
In an exemplary embodiment, the upper body motion information acquiring module includes: an upper body image acquiring unit configured to perform acquisition of an upper body image of the captured subject synchronously acquired by the first image capturing device and the second image capturing device; the image data identification unit is configured to respectively identify the upper body images at two visual angles, and acquire first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images; the projection unit is configured to perform perspective projection based on a perspective projection method, project the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located, and obtain a second three-dimensional joint point coordinate under a corresponding view angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding view angle; and the upper body motion information acquisition unit is configured to execute joint optimization solving according to a conversion relation between the image acquisition device coordinate system and the world coordinate system at two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object at the world coordinate system.
In an exemplary embodiment, the upper body motion information acquiring unit is configured to perform: acquiring a predicted three-dimensional model of a captured object in a world coordinate system according to a conversion relation between a coordinate system of image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate; projecting the predicted three-dimensional model to image acquisition equipment coordinate systems under two visual angles respectively based on a preset joint point to obtain second two-dimensional joint point coordinates of the preset joint point in the predicted three-dimensional model under the two visual angles respectively; determining a target three-dimensional model of the captured object according to the distances between the second two-dimensional joint point coordinates under the two visual angles and the first two-dimensional joint point coordinates under the corresponding visual angles respectively; and acquiring target three-dimensional coordinates of a preset joint point in the target three-dimensional model corresponding to the world coordinate system, and determining the target three-dimensional coordinates as the upper body motion information of the captured object.
In an exemplary embodiment, the preset joint points include a vertex node, a shoulder node, an elbow node, a wrist node, a crotch node, and a trunk root node.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 12 is a block diagram illustrating an electronic device Z00 for motion capture in accordance with an exemplary embodiment. For example, electronic device Z00 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 12, electronic device Z00 may include one or more of the following components: a processing component Z02, a memory Z04, a power component Z06, a multimedia component Z08, an audio component Z10, an interface for input/output (I/O) Z12, a sensor component Z14 and a communication component Z16.
The processing component Z02 generally controls the overall operation of the electronic device Z00, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component Z02 may include one or more processors Z20 to execute instructions to perform all or part of the steps of the method described above. Further, the processing component Z02 may include one or more modules that facilitate interaction between the processing component Z02 and other components. For example, the processing component Z02 may include a multimedia module to facilitate interaction between the multimedia component Z08 and the processing component Z02.
The memory Z04 is configured to store various types of data to support operations at the electronic device Z00. Examples of such data include instructions for any application or method operating on electronic device Z00, contact data, phonebook data, messages, pictures, videos, and the like. The memory Z04 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component Z06 provides power to the various components of the electronic device Z00. The power component Z06 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device Z00.
The multimedia component Z08 comprises a screen providing an output interface between the electronic device Z00 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component Z08 includes a front facing camera and/or a rear facing camera. When the electronic device Z00 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component Z10 is configured to output and/or input an audio signal. For example, the audio component Z10 includes a Microphone (MIC) configured to receive external audio signals when the electronic device Z00 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory Z04 or transmitted via the communication component Z16. In some embodiments, the audio component Z10 further includes a speaker for outputting audio signals.
The I/O interface Z12 provides an interface between the processing component Z02 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly Z14 includes one or more sensors for providing status assessment of various aspects to the electronic device Z00. For example, the sensor assembly Z14 may detect the open/closed state of the electronic device Z00, the relative positioning of the components, such as the display and keypad of the electronic device Z00, the sensor assembly Z14 may also detect a change in the position of one component of the electronic device Z00 or the electronic device Z00, the presence or absence of user contact with the electronic device Z00, the orientation or acceleration/deceleration of the electronic device Z00, and a change in the temperature of the electronic device Z00. The sensor assembly Z14 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly Z14 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly Z14 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component Z16 is configured to facilitate wired or wireless communication between the electronic device Z00 and other devices. The electronic device Z00 may have access to a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component Z16 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component Z16 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device Z00 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, for example a memory Z04 comprising instructions executable by the processor Z20 of the electronic device Z00 to perform the above method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A motion capture method, comprising:
acquiring upper body motion information of a captured object in a world coordinate system based on an upper body image of the captured object acquired by an image acquisition device;
acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
generating whole body motion data of the captured object based on the upper body motion information and the lower body motion information of the captured object.
2. The method according to claim 1, wherein the upper body image has a plurality of frame images; the acquiring the motion information of the lower body of the captured object according to the motion information of the upper body and a preset motion detection model comprises:
inputting the upper body motion information of the captured object in the current frame image and the lower body motion information of the captured object in the previous frame image of the current frame into the motion detection model, and obtaining the lower body motion information of the captured object in the current frame image.
3. The method of claim 2, wherein the motion detection model is obtained by the following training method:
acquiring sample image action data, wherein the sample image action data comprises a plurality of continuous frame sample images, each frame sample image is marked with upper body action sample information, root node coordinates, orientation and lower body action sample information of a sample object, the upper body action sample information comprises first three-dimensional sample coordinates of the upper body of the sample object, and the lower body action sample information comprises second three-dimensional sample coordinates of the lower body of the sample object;
for any frame sample image, inputting the upper body motion sample information, the root node coordinates and the orientation of the sample object marked in the frame sample image and the lower body sample motion information of the sample object marked in the previous frame sample image adjacent to the frame sample image into the neural network to obtain the lower body motion information of the any frame sample image output by the neural network;
determining a loss value according to the lower body motion information of the arbitrary frame sample image and the lower body sample motion information labeled in the arbitrary frame sample image;
and training the neural network according to the loss value to obtain the action detection model.
4. The method of claim 1, wherein the acquiring of the upper body image of the captured object by the image acquisition device comprises:
acquiring an upper body image of the captured object by adopting a single-view image acquisition device; or,
acquiring an upper body image of the captured object by adopting a double-view-angle image acquisition device which is arranged in the following way:
the double-view-angle image acquisition device comprises a first image acquisition device and a second image acquisition device, wherein a connecting line of the first image acquisition device and the second image acquisition device in the double-view-angle image acquisition device is a first edge of a placing angle, the first image acquisition device is placed at the first placing angle towards the captured object based on the first edge, the second image acquisition device is placed at a second placing angle towards the captured object based on the first edge, and the distance between the first image acquisition device and the second image acquisition device is set to be a preset distance.
5. The method of claim 4, further comprising, prior to acquiring the upper body image of the captured object with a dual-view image acquisition device:
acquiring action images of the calibration object synchronously acquired by the first image acquisition equipment and the second image acquisition equipment;
respectively identifying motion images at two visual angles, and acquiring position information of a preset joint point in the motion images at the corresponding visual angles;
mapping the position information of a preset joint point in the action image to a coordinate system of the image acquisition equipment at a corresponding visual angle to obtain an initial three-dimensional coordinate of the action image at the corresponding visual angle;
and acquiring external parameter initial values of the first image acquisition device and the second image acquisition device by aligning the initial three-dimensional coordinates of the action images under two visual angles so as to finish the calibration of the first image acquisition device and the second image acquisition device.
6. The method of claim 4, wherein the obtaining upper body motion information of the captured object in the world coordinate system comprises:
acquiring an upper body image of the captured object synchronously acquired by the first image acquisition device and the second image acquisition device;
respectively identifying upper body images at two visual angles, and acquiring first two-dimensional joint point coordinates in the upper body images at the corresponding visual angles, wherein the first two-dimensional joint point coordinates are two-dimensional joint point coordinates of the upper body of a captured object in the upper body images;
based on a perspective projection method, projecting the first two-dimensional joint point coordinate to a three-dimensional space where a corresponding image acquisition equipment coordinate system is located to obtain a second three-dimensional joint point coordinate under a corresponding visual angle, wherein the second three-dimensional joint point coordinate is a three-dimensional coordinate of the first two-dimensional joint point coordinate under the image acquisition equipment coordinate system corresponding to the corresponding visual angle;
and performing joint optimization solution according to a conversion relation between the coordinate system of the image acquisition equipment and the world coordinate system under two preset visual angles and the second three-dimensional joint point coordinate to obtain the upper body motion information of the captured object under the world coordinate system.
7. A motion capture device, comprising:
an upper body motion information acquisition module configured to execute acquiring upper body motion information of the captured object in a world coordinate system based on an upper body image of the captured object acquired by the image acquisition device;
a lower body motion information acquiring module configured to execute acquiring lower body motion information of the captured object according to the upper body motion information and a preset motion detection model, wherein the lower body motion information of the captured object comprises first three-dimensional joint point coordinates of the lower body of the captured object, and the motion detection model is obtained by training a neural network through collected sample image motion data, and the sample image motion data is marked with upper body motion sample information, root node coordinates and orientation of a sample object and the lower body motion sample information of the sample object;
a whole body motion data generation module configured to execute generating whole body motion data of the captured object according to the upper body motion information and the lower body motion information of the captured object.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the motion capture method of any of claims 1 to 6.
9. A computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the motion capture method of any of claims 1-6.
10. A computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, causing the device to perform the motion capture method of any of claims 1 to 6.