CN113449570A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN113449570A
CN113449570A
Authority
CN
China
Prior art keywords
human body
model
result
value
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010231605.7A
Other languages
Chinese (zh)
Inventor
甄海洋
周维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rainbow Software Co ltd
ArcSoft Corp Ltd
Original Assignee
Rainbow Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rainbow Software Co Ltd
Priority to CN202010231605.7A
Priority to KR1020227037422A
Priority to PCT/CN2021/080280
Priority to JP2022558577A
Publication of CN113449570A
Legal status: Pending

Abstract

(Translated from Chinese)

Figure 202010231605

The invention discloses an image processing method and device. The method includes: acquiring an original image; performing human body detection on the original image to obtain a human body image; processing the human body image with a trained first model to obtain a processing result of the human body image, wherein the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear (SMPL) model; and generating a human body model according to the processing result of the human body image. The invention solves the technical problem in the related art of low recognition accuracy for two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction.


Description

Image processing method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to an image processing method and device.
Background
Human-body-related techniques in the industry today include human body detection, two-dimensional and three-dimensional joint point localization, segmentation, and the like. For tasks such as two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction, the following schemes are currently adopted: 1) First, the image is processed by a deep learning detection scheme, a human body region is cropped after detection, two-dimensional joint points are estimated by a deep learning network, and three-dimensional joint points, human body posture and shape parameters are then estimated from the two-dimensional joint points. However, when three-dimensional joint points are estimated from two-dimensional joint points, motion ambiguity may arise: for example, two-dimensional joint points in the same state may correspond to three-dimensional joint points in different positions. Moreover, the recognition accuracy of the three-dimensional joint points depends on that of the two-dimensional joint points, resulting in low recognition accuracy for the three-dimensional joint points. 2) First, the image is processed by a deep learning detection scheme, a human body region is cropped after detection, and three-dimensional joint points are directly predicted by a deep learning network: the joint points are represented on a three-dimensional voxel grid, and the likelihood of each joint in each voxel is inferred, on which training and prediction are performed. However, because samples of three-dimensional joint points are difficult to obtain, most training samples are collected in a laboratory environment, so robustness to outdoor scenes is not high; furthermore, predicting over a voxel grid is computationally heavy, so real-time performance is low.
3) First, a human body is detected; human segmentation or human parsing is then performed on the detected picture, and a human body model is estimated by an optimization method using the segmentation and parsing results. However, this places very high demands on the segmentation and parsing quality, and deviations in those results degrade the human body reconstruction.
No effective solution to the problems in the above schemes has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides an image processing method and device, which at least solve the technical problem in the related art of low recognition accuracy for positioning two-dimensional and three-dimensional joint points and reconstructing human body parameters.
According to an aspect of an embodiment of the present invention, there is provided an image processing method including: acquiring an original image; carrying out human body detection on the original image to obtain a human body image; processing the human body image by using a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points and a skinned multi-person linear (SMPL) model; and generating a human body model according to the processing result of the human body image.
Optionally, the method further comprises: obtaining a plurality of groups of training samples, wherein each group of training samples comprises: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model; training a preset model by using the plurality of groups of training samples, and obtaining a target loss value of the preset model; stopping training the preset model when the target loss value is smaller than a preset value, and determining the preset model as the first model; and when the target loss value is larger than the preset value, continuing to train the preset model with the plurality of groups of training samples until the target loss value is smaller than the preset value.
Optionally, training the preset model by using the plurality of groups of training samples and obtaining the target loss value of the preset model includes: inputting the plurality of groups of training samples into the preset model and obtaining an output result of the preset model, wherein the output result comprises: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result; obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result; obtaining a third loss value of the SMPL model based on the parameter values and the third result; and obtaining the target loss value based on the first loss value, the second loss value and the third loss value.
Optionally, the parameter value of the SMPL model is real data acquired by the acquisition device, or adjustment data obtained by adjusting the parameter value acquired by the acquisition device.
Optionally, deriving the third loss value of the SMPL model based on the parameter value and the third result comprises: obtaining the third loss value based on the parameter value and the third result when the parameter value is a real value acquired by the acquisition device; and, when the parameter value is an adjusted value obtained by adjusting a value acquired by the acquisition device, obtaining three-dimensional joint points based on the parameter value, projecting the three-dimensional joint points onto a two-dimensional plane to obtain two-dimensional joint points, obtaining a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determining the fourth loss value as the third loss value.
Optionally, the method further comprises: processing the parameter value of the third result by using a discriminator to obtain a classification result of the parameter value of the third result, wherein the classification result is used for representing whether the parameter value of the third result is a real value acquired by an acquisition device; and determining whether to stop training the preset model or not based on the classification result and the target loss value.
Optionally, the discriminator is trained with a generative adversarial network (GAN).
Optionally, the human body detection is performed on the original image, and obtaining the human body image includes: processing the original image by using the trained second model to obtain the position information of the human body in the original image; and cutting and normalizing the original image based on the position information to obtain a human body image.
Optionally, the first model employs an hourglass network structure or a feature pyramid network (FPN) structure.
According to another aspect of the embodiments of the present invention, there is also provided an image processing apparatus including: an acquisition module, configured to acquire an original image; a detection module, configured to perform human body detection on the original image to obtain a human body image; a processing module, configured to process the human body image with a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points and a skinned multi-person linear (SMPL) model; and a generating module, configured to generate a human body model according to the processing result of the human body image.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein when the program is executed, an apparatus in which the storage medium is located is controlled to execute the above-mentioned image processing method.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the image processing method described above.
In the embodiment of the invention, after the original image is obtained, human body detection is first performed on it to obtain the human body image, and the trained first model then processes the human body image to obtain its processing result, so that human body detection, two-dimensional and three-dimensional joint point positioning, and SMPL model establishment are realized at the same time, and a human body model can further be generated. Notably, because the two-dimensional joint points, the three-dimensional joint points and the SMPL model are obtained simultaneously from a single model, the three-dimensional joint points need not be estimated from the two-dimensional joint points. This improves image recognition accuracy and thereby solves the technical problem in the related art of low recognition accuracy for positioning two-dimensional and three-dimensional joint points and reconstructing human body parameters.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an image processing method according to an embodiment of the invention;
FIG. 2 is a schematic illustration of an alternative human body image according to an embodiment of the invention;
FIG. 3 is a schematic illustration of an alternative mannequin according to an embodiment of the present invention;
FIG. 4a is a schematic illustration of an alternative average shaped mannequin in accordance with an embodiment of the present invention;
FIG. 4b is a schematic diagram of an alternative human model generated with shape parameters added thereto according to an embodiment of the present invention;
FIG. 4c is a schematic diagram of an alternative human model generated after adding shape parameters and pose parameters according to an embodiment of the invention;
FIG. 4d is a schematic diagram of an alternative human model generated from detected human motion according to an embodiment of the invention;
FIG. 5 is a flow diagram of an alternative image processing method according to an embodiment of the invention;
fig. 6 is a schematic diagram of an alternative GAN network in accordance with an embodiment of the present invention; and
fig. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present invention, there is provided an image processing method, it should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system such as a set of computer executable instructions, and that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be executed in an order different from that here.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, obtaining an original image;
specifically, the original image may be an image obtained by capturing input video stream data, or may be an image directly obtained, where the original image includes a human body.
Step S104, carrying out human body detection on the original image to obtain a human body image;
specifically, the human body image may be a minimum image extracted from the original image and including a complete human body region, as shown in fig. 2.
In an alternative embodiment, the human body detection can be performed with a deep learning model, for example using detection frameworks such as Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLO (You Only Look Once), SSD (Single Shot Detector), and their variants. As known to those skilled in the art, different detection frameworks can be selected for different devices and application scenarios to implement human body detection quickly and accurately and obtain the human body image.
Optionally, performing human body detection on the original image to obtain the human body image includes: processing the original image with the trained deep learning model to obtain the position information of the human body in the original image; and cropping and normalizing the original image based on the position information to obtain the human body image. The human body position can be represented by the minimum bounding rectangle containing the complete human body region in the original image, expressed as the two-dimensional coordinates (left, top, bottom, right).
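As a rough illustration of the crop-and-normalize step just described, the following sketch cuts the detected (left, top, bottom, right) bounding box out of the image and resizes the patch to a fixed network input size. The function name, the nearest-neighbour resize, and the toy image are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: crop the detected human region and normalize it
# to a fixed input size using nearest-neighbour resampling.

def crop_and_normalize(image, box, out_w, out_h):
    """image: 2-D list of pixel rows; box: (left, top, bottom, right)."""
    left, top, bottom, right = box
    crop = [row[left:right] for row in image[top:bottom]]
    ch, cw = len(crop), len(crop[0])
    # Nearest-neighbour resize to the network's fixed input size.
    return [
        [crop[i * ch // out_h][j * cw // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]

# Toy 8x8 "image" whose pixel value encodes its (row, col) position.
img = [[10 * r + c for c in range(8)] for r in range(8)]
patch = crop_and_normalize(img, (2, 2, 6, 6), 4, 4)
```

The same routine would be applied with a real detector's box and the first model's expected input resolution.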
Step S106, processing the human body image by using the trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and SMPL (Skinned Multi-Person Linear) models.
Alternatively, the first model may adopt an hourglass network structure or an FPN (Feature Pyramid Network) structure. For example, when the input is a w × h image, the output feature map may be of size w × h or w/4 × h/4.
Specifically, the joint points described above are the position coordinates of the joints of the human body, such as the wrist and elbow, as shown in fig. 2.
The two-dimensional joint points can be expressed in the form of a heat map or a coordinate vector. In the heat map form, each joint point is represented as a feature map: assuming the input human body image is w × h, the output feature map is an image of the same size or scaled in equal proportion, with value 1 at the joint point position and 0 elsewhere. In one example, when the human body has 16 two-dimensional joint points, 16 feature maps of size w × h, w/2 × h/2, or smaller may be used to represent them.
The three-dimensional joint points likewise have heat map and coordinate vector representations; in the heat map form, compared with a two-dimensional joint point, a three-dimensional joint point adds z-axis information, extending the heat map into a cuboid (a volumetric grid).
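The heat-map encoding described above can be sketched as follows: one map per joint, with value 1 at the joint location and 0 elsewhere, decoded back to coordinates by taking the arg-max (a three-dimensional joint would extend the same idea to a w × h × d cuboid). The helper names are hypothetical.

```python
# Sketch of the one-hot heat-map representation of a 2-D joint point.

def joint_heatmap(w, h, joint_xy):
    x, y = joint_xy
    return [[1.0 if (col == x and row == y) else 0.0
             for col in range(w)] for row in range(h)]

def heatmap_to_joint(hm):
    # Decode by taking the arg-max of the map.
    best = max((v, col, row) for row, r in enumerate(hm)
               for col, v in enumerate(r))
    return (best[1], best[2])

hm = joint_heatmap(64, 64, (12, 30))
```

In practice the peak is usually smoothed (e.g. a small Gaussian) rather than a single 1, but the encode/decode round trip is the same.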
In an alternative embodiment, the human body image may be processed by using the first model to obtain a parameter value of the SMPL model; and then obtaining two-dimensional joint points or three-dimensional joint points based on the parameter values.
And step S108, generating a human body model according to the processing result of the human body image.
As shown in fig. 3, the SMPL model may include shape parameters and pose parameters, and the human body model generated from them may include a plurality of vertices and three-dimensional joint points, each of which is a three-dimensional vector of (x, y, z) coordinates. Fig. 4a to 4c show the process of generating a human body model from the shape and pose parameters: fig. 4a shows a human body model of average shape, fig. 4b shows the model generated after adding shape parameters to the average shape, and fig. 4c shows the model generated after adding both shape and pose parameters. Fig. 4d shows a human body model generated from the detected human motion, on top of the model generated in fig. 4c. Comparing fig. 4b and 4c, the difference between the two is not large; therefore, in some applications, human body modeling can be achieved by generating the model from the shape parameters alone.
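As a highly simplified stand-in for the SMPL formulation (which additionally involves pose blend shapes, a joint regressor and linear blend skinning), the sketch below shows only the linear shape blend: each vertex of a template mesh is displaced by a linear combination of shape directions weighted by the shape coefficients. All values are toy data, not real SMPL matrices.

```python
# Toy linear shape blend: vertices = template + shape_dirs . betas.

def apply_shape(template, shape_dirs, betas):
    # Each vertex moves by a linear combination of shape blend shapes.
    out = []
    for v_idx, (x, y, z) in enumerate(template):
        dx = dy = dz = 0.0
        for b_idx, beta in enumerate(betas):
            sx, sy, sz = shape_dirs[v_idx][b_idx]
            dx += beta * sx
            dy += beta * sy
            dz += beta * sz
        out.append((x + dx, y + dy, z + dz))
    return out

template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # two toy vertices
shape_dirs = [[(0.1, 0.0, 0.0)], [(0.0, 0.2, 0.0)]]  # one blend shape
shaped = apply_shape(template, shape_dirs, [2.0])
```

The real SMPL mesh has 6890 vertices and typically 10 shape coefficients; the pose then rotates the shaped vertices about the skeleton joints.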
According to the embodiment of the invention, after the original image is obtained, human body detection is first performed on it to obtain the human body image, and the trained first model then processes the human body image to obtain its processing result, so that human body detection, two-dimensional and three-dimensional joint point positioning, and SMPL model establishment are realized at the same time, and a human body model can further be generated. Because the two-dimensional joint points, the three-dimensional joint points and the SMPL model are obtained simultaneously from a single model, the three-dimensional joint points need not be estimated from the two-dimensional joint points, which improves image recognition accuracy and solves the technical problem in the related art of low recognition accuracy for positioning two-dimensional and three-dimensional joint points and reconstructing human body parameters.
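The steps S102 to S108 above can be sketched end-to-end with stub components. `detect()` and `first_model()` are hypothetical placeholders standing in for the trained detection and multi-task networks, not the patent's actual implementations.

```python
# End-to-end pipeline sketch: detect -> crop -> multi-task model.

def detect(image):
    # Stub detector: pretend the person occupies the whole image.
    return (0, 0, len(image), len(image[0]))  # (left, top, bottom, right)

def first_model(human_image):
    # Stub multi-task head: 2-D joints, 3-D joints and SMPL parameters
    # are all produced by the one model, as described above.
    return {"joints2d": [(1, 1)],
            "joints3d": [(1, 1, 0)],
            "smpl": {"shape": [0.0], "pose": [0.0]}}

def process(original_image):
    left, top, bottom, right = detect(original_image)        # step S104
    human = [row[left:right] for row in original_image[top:bottom]]
    result = first_model(human)                              # step S106
    # Step S108 would generate the human body model from result["smpl"].
    return result

out = process([[0] * 4 for _ in range(4)])
```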
In a first application scenario, human body motion can be detected in real time to drive a human animation (avatar) model: for example, the motion is captured based on the two-dimensional and three-dimensional joint points so that the avatar model performs the corresponding motion along with the human body, enabling real-time interaction.
In a second application scenario, the two-dimensional and three-dimensional joint points in the processing result can be used to edit the human body, such as slimming: for example, image pixels at the corresponding arm, leg and torso positions of the human body image are processed to achieve arm-, leg- and waist-slimming effects.
Optionally, in the above embodiment of the present invention, the image processing method further includes: obtaining a plurality of groups of training samples, wherein each group of training samples comprises: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model; training a preset model with the plurality of groups of training samples and obtaining a target loss value of the preset model; stopping training the preset model and determining it as the first model when the target loss value is smaller than a preset value; and continuing to train the preset model with the plurality of groups of training samples when the target loss value is larger than the preset value, until the target loss value is smaller than the preset value. The smaller the target loss value, the higher the recognition accuracy; the preset value can be set in advance according to the required recognition accuracy and efficiency, and determines when the model has finished training.
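The stop-when-below-threshold training schedule described above can be sketched as follows; the geometric loss decay is a stand-in for one training pass over the groups of samples, and the cap on epochs is an added safeguard not stated in the text.

```python
# Sketch of the threshold-based stopping rule for training the preset model.

def train_until(threshold, max_epochs=100):
    loss, epochs = 1.0, 0
    while loss >= threshold and epochs < max_epochs:
        loss *= 0.5          # stand-in for one pass over the samples
        epochs += 1
    return loss, epochs

final_loss, n = train_until(0.1)
```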
Optionally, training the preset model with the plurality of groups of training samples and obtaining its target loss value includes: inputting the plurality of groups of training samples into the preset model and obtaining an output result of the preset model, wherein the output result comprises: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result; obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result; obtaining a third loss value of the SMPL model based on the parameter values and the third result; and obtaining the target loss value based on the first loss value, the second loss value and the third loss value.
Optionally, in the foregoing embodiment of the present invention, the image processing method further includes labeling the training sample with first label information of a two-dimensional joint point and second label information of a three-dimensional joint point.
In an alternative embodiment, for the two-dimensional joint points, the first loss value may be obtained from the predicted heat map (the first result) and the labeled heat map (the first label information), from the predicted coordinate vector and the labeled coordinate vector, or from combined heat map and coordinate vector information.
For the three-dimensional joint points, likewise, the second loss value may be obtained from the predicted heat map (the second result) and the labeled heat map (the second label information), from the predicted coordinate vector and the labeled coordinate vector, or from combined heat map and coordinate vector information.
Compared with the heat map form, the coordinate vector form is computationally more convenient.
Alternatively, the parameter values of the SMPL model may be real values acquired by the acquisition device, or adjusted values obtained by adjusting the acquired values. In an alternative embodiment, the parameter values of the SMPL model may be supervised with real values weighted more heavily and adjusted values weighted less heavily.
Specifically, the above-mentioned acquisition device may be a camera or a sensor disposed at a plurality of fixed positions in a laboratory environment or an outdoor environment.
Only data collected in a laboratory environment yields accurate, real parameter values for the SMPL model; accurate values cannot be obtained from data collected outdoors. Therefore, in actual calculation, the third loss value may be computed in different ways depending on the type of parameter value. Optionally, when the parameter value is a real value acquired by the acquisition device, the third loss value can be computed by direct regression, that is, obtained from the parameter value and the third result. When the parameter value is an adjusted value obtained by adjusting the acquired value, three-dimensional joint points can be derived from the SMPL parameter values and projected onto a two-dimensional plane to obtain two-dimensional joint points; a fourth loss value of the two-dimensional joint points is then computed from the projected two-dimensional joint points and the first label information, and this loss value is used as the third loss value and passed back to the parameter space of the SMPL model to update its parameter values.
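The projection-based branch described above can be sketched as follows. An orthographic projection (simply dropping z) is used for simplicity since the text does not fix a camera model, and the joint values are toy data.

```python
# Sketch of the fourth loss: project 3-D joints to 2-D, compare to labels.

def project(joints3d):
    # Orthographic projection: drop the z coordinate.
    return [(x, y) for x, y, z in joints3d]

def l2_loss(pred2d, label2d):
    return sum((px - lx) ** 2 + (py - ly) ** 2
               for (px, py), (lx, ly) in zip(pred2d, label2d))

joints3d = [(1.0, 2.0, 5.0), (3.0, 4.0, 5.0)]   # from SMPL parameters
labels2d = [(1.0, 2.0), (3.0, 5.0)]             # first label information
fourth_loss = l2_loss(project(joints3d), labels2d)
```

A real implementation would use the camera intrinsics for a perspective projection and back-propagate this loss into the SMPL parameter space.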
In the training process, the target loss value is the combination of the first loss value, the second loss value and the third loss value, and can be calculated as their weighted sum.
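A minimal sketch of combining the three component losses as a weighted sum follows; the weights are illustrative, as the text does not specify their values.

```python
# Target loss as a weighted sum of the 2-D, 3-D and SMPL losses.

def target_loss(l1, l2, l3, w=(1.0, 1.0, 0.5)):
    return w[0] * l1 + w[1] * l2 + w[2] * l3

t = target_loss(0.2, 0.3, 0.4)   # toy loss values
```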
In an optional embodiment, during model training, the two-dimensional joint points, the three-dimensional joint points and the SMPL model parameters may be learned at the same time, with the model regressing them as a whole. In addition, as shown in fig. 5, an SMPL model discriminator may be used to discriminate the parameter values of the SMPL model, determining whether a value was randomly generated by the network or is a collected real value, thereby improving the realism of the model output. Specifically, the SMPL model discriminator processes the parameter values of the third result (i.e., the SMPL model output by the preset model) to obtain a classification result characterizing whether those parameter values are real values acquired by the acquisition device; whether to stop training the preset model is then determined based on the classification result and the target loss value. The D discriminator of a Generative Adversarial Network (GAN) can be adopted as the SMPL model discriminator.
In an alternative embodiment, since data collected outdoors cannot yield precise SMPL parameter values and may therefore produce abnormal ones, the embodiment of the present invention adds a GAN to train the SMPL model discriminator (the D discriminator). As shown in fig. 6, the GAN comprises a G generator and a D discriminator. The D discriminator is a binary classification network: it receives randomly generated values from the G generator and collected real values, and outputs a label indicating the authenticity of the data. For example, when a real value is received, the output is pushed toward the positive label (usually set to 1); when a randomly generated value from the G generator is received, the output is pushed toward the negative label (usually set to 0). The D discriminator thus captures the difference between the randomly generated values and the real values; the weights of the G generator are then updated according to this difference, so that its randomly generated values become closer to the real values while the D discriminator's ability to distinguish the two improves.
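A minimal sketch of the D-discriminator idea described above: a binary classifier scores parameter values, pushed toward 1 for collected real values and toward 0 for generated ones. A single logistic unit with a hand-rolled gradient step stands in for the real network; the data, learning rate and step count are toy assumptions.

```python
# Toy D discriminator: one logistic unit trained on a real vs. fake value.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_discriminator(real, fake, steps=200, lr=0.5):
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x, label in [(real, 1.0), (fake, 0.0)]:
            p = sigmoid(w * x + b)
            grad = p - label          # d(log-loss)/d(logit)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

w, b = train_discriminator(real=1.0, fake=-1.0)
```

After training, the unit scores the real value near 1 and the generated value near 0, which is the classification signal fed back alongside the target loss.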
Example 2
According to an embodiment of the present invention, an image processing apparatus is provided, which can execute the image processing method described in embodiment 1, and preferred embodiments and application scenarios in this embodiment are the same as those in embodiment 1, and are not described herein again.
Fig. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus including:
an obtaining module 72, configured to obtain an original image;
the detection module 74 is used for performing human body detection on the original image to obtain a human body image;
a processing module 76, configured to process the human body image by using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and parameter values of the SMPL model;
and a generating module 78, configured to generate a human body model according to the processing result of the human body image.
Optionally, in the above embodiment of the present invention, the apparatus further includes: the obtaining module is further configured to obtain a plurality of sets of training samples, where each set of training samples includes: the method comprises the following steps of (1) obtaining a human body image, first marking information of a two-dimensional joint point, second marking information of a three-dimensional joint point and a parameter value of an SMPL model; the training module is used for training the preset model by utilizing a plurality of groups of training samples and acquiring a target loss value of the preset model; the training stopping module is used for stopping training the preset model and determining the preset model as a first model under the condition that the target loss value is smaller than the preset value; the training module is further used for continuing to train the preset model by using the multiple groups of training samples under the condition that the target loss value is larger than the preset value until the target loss value is smaller than the preset value.
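The stop/continue rule used by the training and training-stopping modules above can be sketched as a simple loop. The helper `train_step` and the iteration cap are illustrative assumptions; the text only specifies the comparison of the target loss value against the preset value.

```python
def train_until_converged(train_step, preset_value=0.01, max_iters=1000):
    # train_step() is assumed to run one training pass over the groups of
    # training samples and return the current target loss value.
    loss = float("inf")
    for _ in range(max_iters):
        loss = train_step()
        if loss < preset_value:
            # Target loss below the preset value: stop training; the preset
            # model is then taken as the first model.
            break
    return loss
```

Each iteration corresponds to continuing training with the multiple groups of training samples while the target loss value remains above the preset value.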
Optionally, the training module comprises: the obtaining unit is used for inputting a plurality of groups of training samples into a preset model and obtaining an output result of the preset model, wherein the output result comprises: a first result of a two-dimensional joint point, a second result of a three-dimensional joint point, and a third result of an SMPL model; the first processing unit is used for obtaining a first loss value of the two-dimensional joint point based on the first mark information and the first result; the second processing unit is used for obtaining a second loss value of the three-dimensional joint point based on the second marking information and the second result; a third processing unit, configured to obtain a third loss value of the SMPL model based on the parameter value and the third result; and the fourth processing unit is used for obtaining a target loss value based on the first loss value, the second loss value and the third loss value.
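Combining the first, second and third loss values into a target loss can be sketched as a weighted sum. The use of mean-squared error and the unit weights are assumptions for illustration; the text only states that the target loss value is obtained from the three component losses.

```python
import numpy as np

def target_loss(pred_2d, gt_2d, pred_3d, gt_3d, pred_smpl, gt_smpl,
                w_2d=1.0, w_3d=1.0, w_smpl=1.0):
    # First loss: 2D joint points vs. the first label information.
    first_loss = np.mean((pred_2d - gt_2d) ** 2)
    # Second loss: 3D joint points vs. the second label information.
    second_loss = np.mean((pred_3d - gt_3d) ** 2)
    # Third loss: predicted SMPL parameter values vs. the sample's values.
    third_loss = np.mean((pred_smpl - gt_smpl) ** 2)
    return float(w_2d * first_loss + w_3d * second_loss + w_smpl * third_loss)
```

With identical predictions and labels the target loss is zero, and any per-output error raises it in proportion to that output's weight.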
Optionally, the third processing unit is further configured to obtain a third loss value based on the parameter value and the third result when the parameter value is acquired by the acquisition device; and under the condition that the parameter value is obtained by adjusting the parameter value acquired by the acquisition device, acquiring a three-dimensional joint point based on the parameter value, projecting the three-dimensional joint point onto a two-dimensional plane to obtain a two-dimensional joint point, acquiring a fourth loss value of the two-dimensional joint point based on the projected two-dimensional joint point and the first mark information, and determining the fourth loss value as a third loss value.
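The projection step used for adjusted parameter values — obtaining 3D joint points, projecting them onto a 2D plane, and comparing against the first label information — can be sketched as follows. The weak-perspective camera model (drop depth, then scale and translate) is an assumption; the text only states that the 3D joint points are projected onto a 2D plane.

```python
import numpy as np

def project_to_2d(joints_3d, scale=1.0, trans=(0.0, 0.0)):
    # Weak-perspective projection: keep x and y, discard depth, then apply
    # a scale factor and a 2D translation.
    return scale * joints_3d[:, :2] + np.asarray(trans, dtype=float)

def reprojection_loss(joints_3d, gt_2d, scale=1.0, trans=(0.0, 0.0)):
    # Fourth loss value: error between the projected 2D joint points and the
    # 2D labels; this value then stands in for the third loss.
    proj = project_to_2d(joints_3d, scale, trans)
    return float(np.mean((proj - gt_2d) ** 2))
```

If the ground-truth 2D labels coincide with the projected joints, the fourth loss value is zero.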
Optionally, the apparatus further comprises: the processing module is further used for processing the parameter value of the third result by using the discriminator to obtain a classification result of the parameter value of the third result, wherein the classification result is used for representing whether the parameter value of the third result is a real value acquired by the acquisition device; the training stopping module is further used for determining whether to stop training the preset model or not based on the classification result and the target loss value.
Optionally, the training module is further configured to train the discriminator with the generation of the countermeasure network.
Optionally, the detection module comprises: the detection unit is used for processing the original image by using the trained second model to obtain the position information of the human body in the original image; and the fifth processing unit is used for cutting and normalizing the original image based on the position information to obtain a human body image.
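The cropping and normalization performed by the fifth processing unit can be sketched as below. The bounding-box format `(x, y, w, h)`, the fixed output size, the nearest-neighbour resize, and the scaling of pixel values to [0, 1] are all illustrative assumptions; the text only specifies cropping and normalizing based on the detected position information.

```python
import numpy as np

def crop_and_normalize(image, box, out_size=(224, 224)):
    # box = (x, y, w, h) from the human body detector, clipped to the image.
    x, y, w, h = box
    img_h, img_w = image.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(img_w, x + w), min(img_h, y + h)
    crop = image[y0:y1, x0:x1]
    # Nearest-neighbour resize to the model's fixed input size (a real
    # pipeline would use proper interpolation), then scale pixels to [0, 1].
    rows = np.linspace(0, crop.shape[0] - 1, out_size[0]).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, out_size[1]).astype(int)
    resized = crop[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```

The result is a fixed-size human body image with normalized pixel values, suitable as input to the first model.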
Example 3
According to an embodiment of the present invention, there is provided a storage medium including a stored program, wherein an apparatus in which the storage medium is located is controlled to execute the image processing method in embodiment 1 described above when the program is executed.
Example 4
According to an embodiment of the present invention, there is provided a processor configured to execute a program, where the program executes the image processing method in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (12)

1. An image processing method, characterized by comprising:
acquiring an original image;
performing human body detection on the original image to obtain a human body image;
processing the human body image by using a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points and a skinned multi-person linear (SMPL) model;
generating a human body model according to the processing result of the human body image.
2. The method according to claim 1, characterized in that the method further comprises:
acquiring multiple groups of training samples, wherein each group of training samples comprises: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model;
training a preset model by using the multiple groups of training samples, and acquiring a target loss value of the preset model;
in a case that the target loss value is smaller than a preset value, stopping training the preset model, and determining the preset model as the first model;
in a case that the target loss value is greater than the preset value, continuing to train the preset model by using the multiple groups of training samples until the target loss value is smaller than the preset value.
3. The method according to claim 2, characterized in that training the preset model by using the multiple groups of training samples and acquiring the target loss value of the preset model comprises:
inputting the multiple groups of training samples into the preset model, and acquiring an output result of the preset model, wherein the output result comprises: a first result of the two-dimensional joint points, a second result of the three-dimensional joint points, and a third result of the SMPL model;
obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result;
obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result;
obtaining a third loss value of the SMPL model based on the parameter values and the third result;
obtaining the target loss value based on the first loss value, the second loss value and the third loss value.
4. The method according to claim 3, characterized in that the parameter values of the SMPL model are real values acquired by an acquisition device, or adjusted values obtained by adjusting the parameter values acquired by the acquisition device.
5. The method according to claim 4, characterized in that obtaining the third loss value of the SMPL model based on the parameter values and the third result comprises:
in a case that the parameter values are real values acquired by the acquisition device, obtaining the third loss value based on the parameter values and the third result;
in a case that the parameter values are adjusted values obtained by adjusting the parameter values acquired by the acquisition device, obtaining three-dimensional joint points based on the parameter values, projecting the three-dimensional joint points onto a two-dimensional plane to obtain two-dimensional joint points, obtaining a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determining the fourth loss value as the third loss value.
6. The method according to claim 3, characterized in that the method further comprises:
processing the parameter values of the third result by using a discriminator to obtain a classification result of the parameter values of the third result, wherein the classification result is used for representing whether the parameter values of the third result are real values acquired by an acquisition device;
determining, based on the classification result and the target loss value, whether to stop training the preset model.
7. The method according to claim 6, characterized in that the discriminator is trained by using a generative adversarial network.
8. The method according to claim 1, characterized in that performing human body detection on the original image to obtain the human body image comprises:
processing the original image by using a trained second model to obtain position information of a human body in the original image;
cropping and normalizing the original image based on the position information to obtain the human body image.
9. The method according to claim 1, characterized in that the first model adopts an hourglass network structure or a feature pyramid network (FPN) structure.
10. An image processing apparatus, characterized by comprising:
an acquisition module, configured to acquire an original image;
a detection module, configured to perform human body detection on the original image to obtain a human body image;
a processing module, configured to process the human body image by using a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points and a skinned multi-person linear (SMPL) model;
a generation module, configured to generate a human body model according to the processing result of the human body image.
11. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to execute the image processing method according to any one of claims 1 to 9.
12. A processor, characterized in that the processor is configured to run a program, wherein the image processing method according to any one of claims 1 to 9 is executed when the program runs.
CN202010231605.7A2020-03-272020-03-27Image processing method and devicePendingCN113449570A (en)

Priority Applications (4)

Application NumberPriority DateFiling DateTitle
CN202010231605.7ACN113449570A (en)2020-03-272020-03-27Image processing method and device
KR1020227037422AKR20220160066A (en)2020-03-272021-03-11 Image processing method and apparatus
PCT/CN2021/080280WO2021190321A1 (en)2020-03-272021-03-11Image processing method and device
JP2022558577AJP7448679B2 (en)2020-03-272021-03-11 Image processing method and device

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202010231605.7ACN113449570A (en)2020-03-272020-03-27Image processing method and device

Publications (1)

Publication NumberPublication Date
CN113449570Atrue CN113449570A (en)2021-09-28

Family

ID=77808126

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202010231605.7APendingCN113449570A (en)2020-03-272020-03-27Image processing method and device

Country Status (4)

CountryLink
JP (1)JP7448679B2 (en)
KR (1)KR20220160066A (en)
CN (1)CN113449570A (en)
WO (1)WO2021190321A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
KR20230076966A (en)*2021-11-232023-06-01한국공학대학교산학협력단Method and apparatus for generating gesture data of human-friendly robot

Families Citing this family (13)

Publication numberPriority datePublication dateAssigneeTitle
CN114140515B (en)*2021-11-292024-07-16新拓三维技术(深圳)有限公司Three-dimensional human body dimension measuring method, system and computer readable storage medium
CN114299204B (en)*2021-12-222023-04-18深圳市海清视讯科技有限公司Three-dimensional cartoon character model generation method and device
CN114157526B (en)*2021-12-232022-08-12广州新华学院Digital image recognition-based home security remote monitoring method and device
CN114445653A (en)*2021-12-242022-05-06天翼云科技有限公司Abnormal target prediction method, device, medium and electronic equipment
CN114612612B (en)*2022-03-042025-05-06Oppo广东移动通信有限公司 Human body posture estimation method and device, computer readable medium, and electronic device
CN114863013B (en)*2022-03-282025-09-26网易(杭州)网络有限公司 A method for reconstructing a three-dimensional model of a target object
CN115131817B (en)*2022-04-292025-05-09腾讯科技(深圳)有限公司 A method, device, equipment and storage medium for multi-person posture estimation
CN115482557B (en)*2022-10-092023-11-17中国电信股份有限公司Human body image generation method, system, equipment and storage medium
WO2024124485A1 (en)*2022-12-152024-06-20中国科学院深圳先进技术研究院Three-dimensional human body reconstruction method and apparatus, device, and storage medium
CN115775300B (en)*2022-12-232024-06-11北京百度网讯科技有限公司Human body model reconstruction method, human body model reconstruction training method and device
CN116863078B (en)*2023-07-142025-04-04中国电信股份有限公司技术创新中心 Three-dimensional human body model reconstruction method, device, electronic device and readable medium
CN117351432B (en)*2023-12-042024-02-23环球数科集团有限公司 A training system for multi-object recognition models for tourists in scenic spots
CN117745978B (en)*2024-02-202024-04-30四川大学华西医院 A simulation quality control method, device and medium based on human body three-dimensional reconstruction algorithm

Citations (6)

Publication numberPriority datePublication dateAssigneeTitle
CN108053469A (en)*2017-12-262018-05-18清华大学Complicated dynamic scene human body three-dimensional method for reconstructing and device under various visual angles camera
CN109285215A (en)*2018-08-282019-01-29腾讯科技(深圳)有限公司A kind of human 3d model method for reconstructing, device and storage medium
CN109615582A (en)*2018-11-302019-04-12北京工业大学 A face image super-resolution reconstruction method based on attribute description generative adversarial network
CN109859296A (en)*2019-02-012019-06-07腾讯科技(深圳)有限公司Training method, server and the storage medium of SMPL parametric prediction model
CN110020633A (en)*2019-04-122019-07-16腾讯科技(深圳)有限公司Training method, image-recognizing method and the device of gesture recognition model
CN110298916A (en)*2019-06-212019-10-01湖南大学A kind of 3 D human body method for reconstructing based on synthesis depth data

Family Cites Families (7)

Publication numberPriority datePublication dateAssigneeTitle
US20140204013A1 (en)*2013-01-182014-07-24Microsoft CorporationPart and state detection for gesture recognition
JP6373026B2 (en)*2014-03-202018-08-15株式会社東芝 Image processing apparatus, image processing system, image processing method, and program
JP6912215B2 (en)*2017-02-092021-08-04国立大学法人東海国立大学機構 Detection method and detection program to detect the posture of an object
CN108345869B (en)*2018-03-092022-04-08南京理工大学 Driver gesture recognition method based on depth image and virtual data
JP2020030613A (en)*2018-08-222020-02-27富士通株式会社Information processing device, data calculating program and data calculating method
CN109702741B (en)*2018-12-262020-12-18中国科学院电子学研究所 Robotic arm visual grasping system and method based on self-supervised learning neural network
CN110188598B (en)*2019-04-132022-07-05大连理工大学 A Real-time Hand Pose Estimation Method Based on MobileNet-v2

Patent Citations (6)

Publication numberPriority datePublication dateAssigneeTitle
CN108053469A (en)*2017-12-262018-05-18清华大学Complicated dynamic scene human body three-dimensional method for reconstructing and device under various visual angles camera
CN109285215A (en)*2018-08-282019-01-29腾讯科技(深圳)有限公司A kind of human 3d model method for reconstructing, device and storage medium
CN109615582A (en)*2018-11-302019-04-12北京工业大学 A face image super-resolution reconstruction method based on attribute description generative adversarial network
CN109859296A (en)*2019-02-012019-06-07腾讯科技(深圳)有限公司Training method, server and the storage medium of SMPL parametric prediction model
CN110020633A (en)*2019-04-122019-07-16腾讯科技(深圳)有限公司Training method, image-recognizing method and the device of gesture recognition model
CN110298916A (en)*2019-06-212019-10-01湖南大学A kind of 3 D human body method for reconstructing based on synthesis depth data

Non-Patent Citations (2)

Title
A. KANAZAWA ET AL.: "End-to-End Recovery of Human Shape and Pose", 《 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》*
MATTHEW LOPER ET AL.: "SMPL: a skinned multi-person linear model", 《ACM TRANSACTIONS ON GRAPHICS》*

Cited By (2)

Publication numberPriority datePublication dateAssigneeTitle
KR20230076966A (en)*2021-11-232023-06-01한국공학대학교산학협력단Method and apparatus for generating gesture data of human-friendly robot
KR102669074B1 (en)2021-11-232024-05-24한국공학대학교산학협력단Method and apparatus for generating gesture data of human-friendly robot

Also Published As

Publication numberPublication date
WO2021190321A1 (en)2021-09-30
KR20220160066A (en)2022-12-05
JP2023519012A (en)2023-05-09
JP7448679B2 (en)2024-03-12

Similar Documents

PublicationPublication DateTitle
CN113449570A (en)Image processing method and device
Patel et al.AGORA: Avatars in geography optimized for regression analysis
CN113706699B (en)Data processing method and device, electronic equipment and computer readable storage medium
Balan et al.Detailed human shape and pose from images
Ballan et al.Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes
Hu et al.3DBodyNet: Fast reconstruction of 3D animatable human body shape from a single commodity depth camera
JP4951498B2 (en) Face image recognition device, face image recognition method, face image recognition program, and recording medium recording the program
Liu et al.4d human body capture from egocentric video via 3d scene grounding
CN112401369B (en)Body parameter measurement method, system, device, chip and medium based on human body reconstruction
US10776978B2 (en)Method for the automated identification of real world objects
JP2019096113A (en)Processing device, method and program relating to keypoint data
CN110555908A (en)three-dimensional reconstruction method based on indoor moving target background restoration
WO2019225547A1 (en)Object tracking device, object tracking method, and object tracking program
JP2014085933A (en)Three-dimensional posture estimation apparatus, three-dimensional posture estimation method, and program
Shen et al.Exemplar-based human action pose correction
JP4761670B2 (en) Moving stereo model generation apparatus and method
Wang et al.Digital twin: Acquiring high-fidelity 3D avatar from a single image
JP7318814B2 (en) DATA GENERATION METHOD, DATA GENERATION PROGRAM AND INFORMATION PROCESSING DEVICE
Lu et al.Parametric shape estimation of human body under wide clothing
Darujati et al.Facial motion capture with 3D active appearance models
DaiModeling and simulation of athlete’s error motion recognition based on computer vision
Liu et al.Learning 3-D Human Pose Estimation from Catadioptric Videos.
Wang et al.Im2fit: Fast 3d model fitting and anthropometrics using single consumer depth camera and synthetic data
Cordea et al.3-D head pose recovery for interactive virtual reality avatars
Pan et al.LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-timestamp 3D Human Pose Estimation

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
