
CN113297973B - Key point detection method, device, equipment and computer readable medium - Google Patents

Key point detection method, device, equipment and computer readable medium

Info

Publication number
CN113297973B
Authority
CN
China
Prior art keywords
key point
image
heat map
coordinate
regression network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110570018.5A
Other languages
Chinese (zh)
Other versions
CN113297973A (en)
Inventor
蔚栋
安山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202110570018.5A
Publication of CN113297973A
Application granted
Publication of CN113297973B
Active
Anticipated expiration


Abstract

Translated from Chinese


The embodiments of the present disclosure disclose a key point detection method, device, electronic device and computer-readable medium. A specific implementation of the method includes: extracting features from the image to be detected to obtain image features; inputting the image features into a pre-trained heat map regression network to obtain a key point heat map; inputting the output result of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and generating key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set. This implementation simultaneously satisfies the requirement on accuracy and the requirement on the association relationships between key points.

Description

Key point detection method, device, equipment and computer readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, an apparatus, a device, and a computer readable medium for detecting a key point.
Background
Key point detection is widely used in a variety of computer vision tasks. Related key point detection techniques include coordinate regression and heat map regression.
However, when the above approaches are used for key point detection, the following technical problems often arise:
When the coordinate regression method is used, the obtained key point coordinates are not accurate enough. When the heat map regression method is used, the association relationships between key points are weak. That is, the related detection methods cannot simultaneously meet the requirements on accuracy and on the association relationships between key points.
Disclosure of Invention
This Summary is provided to introduce, in a simplified form, concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Some embodiments of the present disclosure propose a key point detection method, apparatus, electronic device, and computer-readable medium to solve one or more of the technical problems mentioned in the Background section above.
In a first aspect, some embodiments of the present disclosure provide a key point detection method, including: extracting features from an image to be detected to obtain image features; inputting the image features into a pre-trained heat map regression network to obtain a key point heat map; inputting an output result of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and generating key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set.
In a second aspect, some embodiments of the present disclosure provide a key point detection apparatus, including: an extraction unit configured to perform feature extraction on an image to be detected to obtain image features; a heat map generating unit configured to input the image features into a pre-trained heat map regression network to obtain a key point heat map; a coordinate generating unit configured to input an output result of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and a position information generating unit configured to generate key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set.
In a third aspect, some embodiments of the present disclosure provide an electronic device comprising one or more processors, and storage means having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.
In a fourth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following beneficial effect: the key point detection method of some embodiments of the present disclosure combines the advantages of both the heat map regression network and the coordinate regression network. Specifically, the key point coordinates predicted by the heat map regression network are accurate, and the association relationships between the key points predicted by the coordinate regression network are strong, so the requirements on accuracy and on the association relationships between key points can be met simultaneously.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of a keypoint detection method of some embodiments of the present disclosure;
FIG. 2 is a flow chart of some embodiments of a keypoint detection method in accordance with the present disclosure;
FIG. 3 illustrates hand keypoint detection results obtained using a heat map regression network;
FIG. 4 illustrates hand keypoint detection results using a coordinate regression network;
FIG. 5 is a flow chart of further embodiments of a keypoint detection method in accordance with the present disclosure;
FIG. 6 is a schematic structural view of some embodiments of a keypoint detection device according to the present disclosure;
Fig. 7 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. The embodiments of the present disclosure and the features of the embodiments may be combined with each other provided that they do not conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of a keypoint detection method of some embodiments of the present disclosure.
The execution body of the key point detection method may be any computing device. The computing device may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When the computing device is software, it may be installed in the hardware devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
In the application scenario of Fig. 1, the computing device may first input the image 101 to be detected into the feature extraction network 102 for feature extraction. On this basis, the resulting image features may be input into a pre-trained heat map regression network 103 to obtain a key point heat map 104. In addition, the computing device may input the output of an intermediate layer of the heat map regression network 103 (for example, the third layer in Fig. 1) into a pre-trained coordinate regression network 105 to obtain a key point coordinate set 106. The computing device may then generate key point position information 107 corresponding to the image to be detected based on the key point heat map 104 and the key point coordinate set 106. For ease of illustration, the key point position information 107 may be displayed visually on the image to be detected, as shown at 108.
With continued reference to fig. 2, a flow 200 of some embodiments of a keypoint detection method according to the present disclosure is shown. The key point detection method comprises the following steps:
Step 201, extracting features from the image to be detected to obtain image features.
In some embodiments, the execution body of the key point detection method may perform feature extraction on the image to be detected using various feature extraction algorithms to obtain image features. For example, the image to be detected may be input into a convolutional neural network to obtain the image features. As another example, image features may be extracted by algorithms such as color histograms and color correlograms. The image to be detected can be any image. For example, in a gesture recognition scenario, it may be the image currently captured by a camera, or an image obtained by preprocessing the captured image. As another example, it may be an image from a gallery specified by the user.
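As a minimal sketch of the color-histogram option mentioned above, per-channel histograms can be concatenated into a fixed-length feature vector. The function name, the bin count, and the per-channel normalization are assumptions for illustration, not details taken from the disclosure.

```python
import numpy as np


def color_histogram_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Hypothetical helper: concatenate per-channel histograms of an H x W x 3 uint8 image.

    Each channel's histogram is normalized to sum to 1, so the feature
    length (3 * bins) does not depend on the image size.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    features = []
    for channel in range(3):
        # Bins cover the full 8-bit range [0, 256).
        counts, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
        features.append(counts / counts.sum())
    return np.concatenate(features)
```

A hand-crafted feature like this is far weaker than CNN features for key point detection; it is shown only because the text names it as an alternative.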
In some optional implementations of some embodiments, before extracting features from the image to be detected to obtain the image features, the method may further include: performing target part detection on an original image to be detected to obtain an image area displaying the target part; and scaling the image area to a target size to obtain the image to be detected. In practice, the original image to be detected may contain a lot of content. For example, in a hand key point detection scenario, the original image may show legs, a torso, a head, and so on in addition to the hands. Such content can interfere with the detection of key points on the target part. Therefore, the target part can be detected first to obtain the image area displaying it. On this basis, for uniform processing, the image area can be scaled to the target size to obtain the image to be detected.
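The cropping and scaling step can be sketched as follows, assuming the target-part detector (not shown) has already returned a pixel bounding box. The function name `crop_and_resize`, the `(x0, y0, x1, y1)` box convention, and the use of nearest-neighbour resampling (chosen only to keep the example dependency-free) are all hypothetical.

```python
import numpy as np


def crop_and_resize(image: np.ndarray, box, target_size) -> np.ndarray:
    """Hypothetical helper: crop and rescale the detected target-part region.

    box = (x0, y0, x1, y1) in pixels, x1/y1 exclusive (assumed convention).
    target_size = (width, height) of the image to be detected.
    Works for H x W and H x W x C arrays.
    """
    x0, y0, x1, y1 = box
    region = image[y0:y1, x0:x1]
    h, w = region.shape[:2]
    tw, th = target_size
    # For each target pixel, pick the source pixel that contains its centre.
    rows = np.minimum(((np.arange(th) + 0.5) * h / th).astype(int), h - 1)
    cols = np.minimum(((np.arange(tw) + 0.5) * w / tw).astype(int), w - 1)
    return region[rows[:, None], cols[None, :]]
```

In practice bilinear interpolation would usually be preferred; the important point is that the crop box and target size must be kept so that predictions can later be mapped back to the original image.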
In some optional implementations of some embodiments, generating the key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set includes: generating key point position information corresponding to the original image to be detected based on the key point heat map and the key point coordinate set. In these optional implementations, since the image to be detected is derived from the original image to be detected, key point position information corresponding to the original image can be generated as required.
In some optional implementations of some embodiments, mapping the key point heat map and the key point coordinate set to the image to be detected, respectively, to obtain a target image containing a heat-map-mapped key point set and a coordinate-mapped key point set, includes: mapping the key point heat map and the key point coordinate set to the original image to be detected, respectively, to obtain a target image containing the heat-map-mapped key point set and the coordinate-mapped key point set.
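Generating positions in the original image amounts to inverting the crop-and-scale preprocessing. A minimal sketch, assuming a hypothetical crop box `(x0, y0, x1, y1)` and target size `(width, height)`, and ignoring half-pixel offsets:

```python
import numpy as np


def crop_to_original(points, box, target_size) -> np.ndarray:
    """Hypothetical helper: map (x, y) points on the resized crop back to the original image.

    Divides by the per-axis scale factor of the resize, then adds the
    crop's top-left offset.
    """
    x0, y0, x1, y1 = box
    tw, th = target_size
    scale = np.array([tw / (x1 - x0), th / (y1 - y0)])
    return np.asarray(points, dtype=float) / scale + np.array([x0, y0])
```

For example, with box `(50, 20, 150, 70)` resized to `(50, 25)`, both axes are scaled by 0.5, so the point `(10, 5)` on the crop maps to `(70, 30)` in the original image.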
Step 202, inputting the image features into a pre-trained heat map regression network to obtain a key point heat map.
In some embodiments, the execution body may input the image features into a pre-trained heat map regression network to obtain the key point heat map. The heat map regression network predicts a two-dimensional heat map. The principle is that, for training, a target heat map is generated from the ground-truth coordinate position of each key point using a two-dimensional Gaussian function; the position with the highest activation value in the predicted heat map is then taken as the final key point coordinate. As an example, the heat map regression network may include a plurality of (e.g., 3) deconvolution layers and an output layer. In addition, the heat map regression network may include network structures such as residual blocks if necessary.
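The Gaussian target and the highest-activation read-out described above can be sketched with NumPy. The function names and the `sigma` value are assumptions for illustration; the disclosure does not specify them.

```python
import numpy as np


def gaussian_heatmap(center, shape, sigma=2.0) -> np.ndarray:
    """Training target: a 2-D Gaussian centred on a ground-truth key point.

    center is (x, y) in heat-map pixels; shape is (height, width).
    sigma is a hypothetical spread; the peak value is 1.0 at the centre.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def argmax_decode(heatmap: np.ndarray):
    """Read-out: return (x, y) of the highest activation in a 2-D heat map."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)
```

For a target built with `gaussian_heatmap((12, 5), (32, 32))`, `argmax_decode` returns `(12, 5)`. Note that the argmax read-out is quantized to whole heat-map pixels.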
In practice, the heat map regression network predicts with higher accuracy but cannot predict the association relationships between key points well. Fig. 3 illustrates hand key point detection results obtained using a heat map regression network. In practice, to facilitate subsequent processing such as gesture detection using the key point detection results, the key points are generally numbered. In this example, the 3 key points on the index finger are numbered 1 to 3 in turn, and the 3 key points on the middle finger are numbered 4 to 6 in turn, as shown at 301 in Fig. 3. However, as shown at 302 in Fig. 3, in the hand key point detection result obtained using the heat map regression network, the positions of key points No. 2 and No. 4 are swapped. Although the key point positions are relatively accurate as a whole, connecting the key points in numbered order shows intuitively that the connecting lines between them have changed significantly. The connecting lines between key points can represent the association relationships between the key points, and subsequent gesture detection needs to use these association relationships. For example, determining whether the user is making a "V" (peace) sign requires the relative positional relationship between the connecting lines of the middle finger and index finger key points.
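To illustrate how numbered key points become connecting lines, the following sketch links consecutive key points on each finger using the numbering of the example above (index finger 1 to 3, middle finger 4 to 6). The `FINGER_CHAINS` table and the `links` helper are hypothetical; a swap of labels No. 2 and No. 4 leaves the set of detected positions unchanged but alters the links.

```python
import numpy as np

# Hypothetical numbering following the example: index finger 1-3, middle finger 4-6.
FINGER_CHAINS = [(1, 2, 3), (4, 5, 6)]


def links(points):
    """Connect key points in numbered order along each finger.

    points maps key point number -> (x, y). Returns a list of
    ((number_a, number_b), segment_length) pairs, one per connecting line.
    """
    result = []
    for chain in FINGER_CHAINS:
        for a, b in zip(chain, chain[1:]):
            length = float(np.linalg.norm(np.subtract(points[b], points[a])))
            result.append(((a, b), length))
    return result
```

With the index finger at x = 0 and the middle finger at x = 10 (key points 10 pixels apart), the correct labelling yields four links of length 10, while swapping the positions of No. 2 and No. 4 turns link (2, 3) into a segment of length about 22.4 that cuts across both fingers.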
Step 203, inputting the output result of the intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set.
In some embodiments, the execution body may input the output result of the intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain the key point coordinate set. The output of any intermediate layer may be selected as the input to the coordinate regression network. As an example, the coordinate regression network may include a plurality of convolution layers, a pooling layer, and a reshape (Reshape) layer. In practice, the initial coordinate regression network and the initial heat map regression network may be trained in advance with a training sample set by machine learning methods. For example, they may be trained using back-propagation and stochastic gradient descent, and the coordinate regression network is obtained once a training stop condition is satisfied. The initial coordinate regression network and the initial heat map regression network may be trained separately or jointly.
In practice, the coordinate regression network predicts with lower accuracy but can predict the association relationships between key points well. Fig. 4 illustrates hand key point detection results obtained using a coordinate regression network. In this example, as in Fig. 3, the 3 key points on the index finger are numbered 1 to 3 and the 3 key points on the middle finger are numbered 4 to 6, as shown at 401. As shown at 402, in the hand key point detection result obtained using the coordinate regression network, the connecting lines between key points do not change significantly, but the position coordinates of some key points (for example, key point No. 4) have a large prediction offset.
Step 204, generating the key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set.
In some embodiments, the execution body may generate the key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set. As an example, the execution body may first decode coordinates from the key point heat map: specifically, it obtains the position coordinate with the highest activation value in the key point heat map, and then computes the average of this position coordinate and the corresponding coordinate in the key point coordinate set. On this basis, the resulting average may be used as the key point position information corresponding to the image to be detected.
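A minimal NumPy sketch of such a coordinate regression head is shown below. It assumes 1x1 convolutions with ReLU between them, global average pooling as the pooling layer, and a final reshape into `(x, y)` pairs; the function name, layer sizes, and weights are hypothetical placeholders, whereas in the described method the weights would be learned.

```python
import numpy as np


def coordinate_head(features: np.ndarray, conv_weights, num_keypoints: int) -> np.ndarray:
    """Hypothetical coordinate regression head: 1x1 convs -> global average pooling -> reshape.

    features: (C, H, W) output of an intermediate layer of the heat map network.
    conv_weights: list of (C_out, C_in) matrices, each applied as a 1x1 convolution;
    the last one must have 2 * num_keypoints output channels.
    """
    x = features
    for i, w in enumerate(conv_weights):
        x = np.einsum("oc,chw->ohw", w, x)  # 1x1 convolution = per-pixel channel mixing
        if i < len(conv_weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU between convolutions
    pooled = x.mean(axis=(1, 2))  # global average pooling -> (2K,)
    return pooled.reshape(num_keypoints, 2)  # Reshape layer -> K rows of (x, y)
```

For example, a `(64, 16, 16)` intermediate feature map with weight matrices of shapes `(32, 64)` and `(42, 32)` yields a `(21, 2)` coordinate array (21 key points is a hypothetical choice).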
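The averaging strategy can be sketched as follows. The function name is hypothetical, and the sketch assumes the regressed coordinates are already expressed in the same pixel frame as the heat maps, which the disclosure leaves open.

```python
import numpy as np


def fuse_keypoints(heatmaps: np.ndarray, coords) -> np.ndarray:
    """Average each heat map's highest-activation position with the regressed coordinate.

    heatmaps: (K, H, W) key point heat maps.
    coords: (K, 2) regressed (x, y), assumed to be in the heat-map pixel frame.
    Returns (K, 2) fused (x, y) positions.
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.divmod(flat_idx, w)  # row-major flat index -> (row, column)
    heatmap_xy = np.stack([xs, ys], axis=1).astype(float)
    return (heatmap_xy + np.asarray(coords, dtype=float)) / 2.0
```

For instance, a heat-map peak at `(5, 3)` combined with a regressed coordinate `(7, 3)` gives the fused position `(6, 3)`.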
The methods provided by some embodiments of the present disclosure combine the heat map regression network and the coordinate regression network, and thereby combine the advantages of both approaches, so the requirements on accuracy and on the association relationships between key points can be met at the same time. Moreover, compared with feeding the image features into two separate branch networks, using the output of the intermediate layer of the heat map regression network for further coordinate regression facilitates fusing the capabilities of the two networks, which helps meet both requirements.
With further reference to FIG. 5, a flow 500 of further embodiments of a keypoint detection method is illustrated. The process 500 of the keypoint detection method includes the steps of:
Step 501, extracting features from an image to be detected to obtain image features.
In some embodiments, the execution body of the key point detection method (e.g., the computing device shown in Fig. 1) may extract features from the image to be detected to obtain image features.
Step 502, inputting the image features into a pre-trained heat map regression network to obtain a key point heat map.
Step 503, inputting the output result of the intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set.
In some embodiments, for the specific implementation of steps 501 to 503 and the technical effects thereof, reference may be made to the embodiments corresponding to Fig. 2; details are not repeated here.
Step 504, mapping the key point heat map and the key point coordinate set to the image to be detected, respectively, to obtain a target image containing a heat-map-mapped key point set and a coordinate-mapped key point set.
In some embodiments, for the key point heat map, the execution body of the key point detection method may take at least one position with the highest activation value in the heat map and, based on it, obtain the heat-map-mapped key point in the image to be detected through a certain mapping. For each key point coordinate in the key point coordinate set, a coordinate-mapped key point can likewise be obtained through a certain mapping. The mapping may include a matrix transformation, multiplication by fixed coefficients, and so on, according to actual needs. It will be appreciated that the target image is obtained by mapping the key point heat map and the key point coordinate set to the image to be detected.
Step 505, selecting one key point from each key point group in the target image as a target key point to obtain a target key point set, where each key point group includes a corresponding heat-map-mapped key point and coordinate-mapped key point.
In some embodiments, both the heat-map-mapped key point set and the coordinate-mapped key point set are predictions of the key points in the image to be detected. Thus, for the same position (e.g., the tip of the thumb), there is one corresponding heat-map-mapped key point and one corresponding coordinate-mapped key point, which together form a key point group; the two key points in the group correspond to each other. For each key point group, one key point can be selected as the key point for that position, i.e., the target key point. Since the target image contains a plurality of positions, there are a plurality of key point groups, and a target key point set is thus obtained. As an example, one key point may be selected at random as the target key point. In this way, the two networks are combined as a whole.
In some optional implementations of some embodiments, one key point is selected from each key point group as the target key point based on the distance between the two key points in the group. As an example, the distance between the two key points may be the Euclidean distance.
As an example, if the distance is determined to be less than or equal to a preset threshold, the predictions of the two networks are relatively close. That is, whichever network's prediction is selected, the key point position is relatively accurate. In this case, the coordinate-mapped key point in the group is preferentially determined as the target key point; since coordinate-mapped key points also satisfy the association requirement, the dual requirements of accuracy and of the association relationships between key points can be met at the same time. If the distance is determined to be greater than the preset threshold, the heat-map-mapped key point in the group is determined as the target key point. In these implementations, key point selection is implemented by setting a threshold, which further balances accuracy against the association relationships between key points.
Step 506, determining the position information of each target key point in the target key point set as the key point position information corresponding to the image to be detected.
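Two simple mappings of the "multiplication by fixed coefficients" kind can be sketched as follows. Both conventions and both function names are assumptions: heat-map pixels are scaled by the image-to-heat-map size ratio, and the coordinate regression network is assumed to output coordinates normalized to `[0, 1]`. Sizes are given as `(width, height)`.

```python
import numpy as np


def map_heatmap_points(heatmap_xy, heatmap_size, image_size) -> np.ndarray:
    """Hypothetical mapping: heat-map pixel (x, y) -> image pixel via a fixed per-axis coefficient."""
    coeff = np.array(image_size, dtype=float) / np.array(heatmap_size, dtype=float)
    return np.asarray(heatmap_xy, dtype=float) * coeff


def map_regressed_points(normalized_xy, image_size) -> np.ndarray:
    """Hypothetical mapping: assumes the coordinate network outputs (x, y) normalized to [0, 1]."""
    return np.asarray(normalized_xy, dtype=float) * np.array(image_size, dtype=float)
```

A matrix (for example, affine) transformation would take the place of these multiplications when the preprocessing involves rotation or shifting as well as scaling.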
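The threshold rule can be sketched as follows, assuming the two mapped key point sets are aligned so that row *i* of each belongs to the same key point group. The function name is hypothetical, and the threshold value is a free parameter not given in the disclosure.

```python
import numpy as np


def select_target_keypoints(heatmap_pts, coord_pts, threshold: float) -> np.ndarray:
    """Pick one key point per group (same row index = same position).

    Distance <= threshold: the predictions agree, keep the coordinate-mapped
    point (better association). Otherwise keep the heat-map-mapped point
    (better accuracy).
    """
    heatmap_pts = np.asarray(heatmap_pts, dtype=float)
    coord_pts = np.asarray(coord_pts, dtype=float)
    dist = np.linalg.norm(heatmap_pts - coord_pts, axis=1)  # Euclidean distance per group
    use_coord = dist <= threshold
    return np.where(use_coord[:, None], coord_pts, heatmap_pts)
```

With a threshold of 3, a group whose predictions are 1 pixel apart keeps the coordinate-mapped point, while a group whose predictions are 10 pixels apart falls back to the heat-map-mapped point.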
In some optional implementations of some embodiments, the heat map regression network and the coordinate regression network are obtained by: training an initial heat map regression network at a first learning rate until a convergence condition is met, to obtain an intermediate heat map regression network; and jointly training the intermediate heat map regression network and an initial coordinate regression network at a second learning rate until a training end condition is met, to obtain the heat map regression network and the coordinate regression network, where the second learning rate is less than the first learning rate.
In practice, the heat map regression network is more sensitive to key point positions, whereas the coordinate regression network is more sensitive to the associations between key points. Therefore, the two converge in different directions during training. If the two networks were trained together directly, this would amount to converging in two directions at once, which would inevitably affect the overall convergence speed and the accuracy of the network's predictions.
Furthermore, the learning rate, an important hyperparameter in deep learning, determines whether and when the objective function can converge to a local minimum. An appropriate learning rate enables the objective function to converge to a local minimum within an appropriate time.
Based on this, in these implementations, the initial heat map regression network may first be trained at the larger first learning rate so that it converges quickly and learns the key point position information first. On this basis, the intermediate heat map regression network and the initial coordinate regression network are jointly trained at the smaller second learning rate, so that the association relationships between key points can be learned while searching for a local optimum, thereby meeting the requirements on accuracy and on the association relationships between key points.
Compared with the description of some embodiments corresponding to Fig. 2, the flow 500 of the key point detection method in some embodiments corresponding to Fig. 5 obtains the target key points by selecting one key point from each key point group. Overall, the resulting target key points generally include some derived from the key point heat map and some derived from the key point coordinate set, so the advantages of the two networks can be combined and the requirements on accuracy and on the association relationships between key points can be met.
With further reference to Fig. 6, as an implementation of the methods shown in the foregoing figures, the present disclosure provides some embodiments of a key point detection apparatus. These apparatus embodiments correspond to the method embodiments shown in Fig. 2, and the apparatus can be applied to various electronic devices.
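The two-stage schedule can be illustrated with a toy example in which each network is replaced by a single scalar parameter and quadratic stand-in losses. Everything here (the losses, the learning rates, and the gradient-magnitude convergence test) is hypothetical and only shows the structure: stage 1 updates the heat map parameter alone at the first learning rate, and stage 2 updates both parameters jointly at the smaller second learning rate.

```python
def two_stage_training(lr1=0.4, lr2=0.05, tol=1e-6, max_steps=10_000):
    """Toy two-stage schedule with quadratic stand-in losses (hypothetical).

    h stands in for the heat map network, c for the coordinate network.
    """
    if lr2 >= lr1:
        raise ValueError("second learning rate must be smaller than the first")
    h, c = 0.0, 0.0
    # Stage 1: heat map loss (h - 3)^2 only, until the gradient is ~0.
    for _ in range(max_steps):
        grad_h = 2.0 * (h - 3.0)
        if abs(grad_h) < tol:
            break
        h -= lr1 * grad_h
    # Stage 2: joint loss (h - 3)^2 + (c - 2h)^2; the coordinate term depends on h,
    # mimicking a coordinate head that consumes heat map features.
    for _ in range(max_steps):
        grad_h = 2.0 * (h - 3.0) - 4.0 * (c - 2.0 * h)
        grad_c = 2.0 * (c - 2.0 * h)
        if max(abs(grad_h), abs(grad_c)) < tol:
            break
        h -= lr2 * grad_h
        c -= lr2 * grad_c
    return h, c
```

The returned values approach the joint minimum `h = 3, c = 6`.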
As shown in fig. 6, the keypoint detection apparatus 600 of some embodiments includes an extraction unit 601, a thermodynamic diagram generation unit 602, a coordinate generation unit 603, and a position information generation unit 604. The extracting unit 601 is configured to perform feature extraction on an image to be detected, so as to obtain image features. Thermodynamic diagram generation unit 602 is configured to input image features into a pre-trained thermodynamic diagram regression network to derive a keypoint thermodynamic diagram. The coordinate generation unit 603 is configured to input the output result of the middle layer of the thermodynamic diagram regression network into the pre-trained coordinate regression network, resulting in a set of key point coordinates. The position information generating unit 604 is configured to generate the keypoint position information corresponding to the image to be detected based on the keypoint thermodynamic diagram and the set of keypoint coordinates.
In an alternative implementation manner of some embodiments, the location information generating unit 604 is further configured to map the thermodynamic diagram of the keypoint and the coordinate set of the keypoint to the image to be detected to obtain a target image including the thermodynamic diagram mapping keypoint set and the coordinate mapping keypoint set, select one keypoint from each set of keypoints in the target image as a target keypoint to obtain a target keypoint set, wherein each set of keypoints includes the corresponding thermodynamic diagram mapping keypoint and the coordinate mapping keypoint, and determine location information of each target keypoint in the target keypoint set as the keypoint location information corresponding to the image to be detected.
In an alternative implementation of some embodiments, the location information generating unit 604 is further configured to select one keypoint from the keypoint groups as the target keypoint based on the distance between the two keypoints in each keypoint group.
In an alternative implementation of some embodiments, the location information generating unit 604 is further configured to determine thermodynamic map keypoints of the set of keypoints as target keypoints in response to determining that the distance is less than or equal to a preset threshold, and to determine coordinate map keypoints of the set of keypoints as target keypoints in response to determining that the distance is greater than the preset threshold.
In an optional implementation of some embodiments, the heat map regression network includes a plurality of deconvolution layers, and the coordinate generation unit 603 is further configured to input the output of the last of these deconvolution layers into the coordinate regression network to obtain the key point coordinate set.
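The last deconvolution layer determines the spatial resolution of the tensor handed to the coordinate regression network. As a hedged illustration, the sketch below applies the standard transposed-convolution output-size formula (the one documented for PyTorch's `ConvTranspose2d`) to a stack of three layers with kernel 4, stride 2, and padding 1. That layer configuration and the 8 x 8 input are assumptions borrowed from common deconvolution-head pose estimators, not values stated in the patent; the function names are hypothetical.

```python
from typing import List


def deconv_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1,
                       output_padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of one transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def deconv_stack_sizes(input_size: int, num_layers: int = 3) -> List[int]:
    """Sizes after each deconvolution layer; the last entry is what the coordinate head receives."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(deconv_output_size(sizes[-1]))
    return sizes


print(deconv_stack_sizes(8))  # [8, 16, 32, 64]
```

With these settings each layer doubles the resolution, so tapping the last layer gives the coordinate branch the same resolution as the heat map itself.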
In an optional implementation of some embodiments, the heat map regression network and the coordinate regression network are trained as follows: an initial heat map regression network is trained at a first learning rate until a convergence condition is met, yielding an intermediate heat map regression network; the intermediate heat map regression network and an initial coordinate regression network are then jointly trained at a second learning rate until a training end condition is met, yielding the heat map regression network and the coordinate regression network. The second learning rate is less than the first learning rate.
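The two-stage schedule can be illustrated with a deliberately tiny toy. Everything numeric here is invented for illustration: two scalars stand in for the networks' parameters, quadratic functions stand in for the heat map and coordinate losses, and plain gradient descent stands in for the optimizer, none of which the text specifies. What the toy does reproduce is the structure: stage 1 trains only the heat map branch at `lr1` until a convergence condition, and stage 2 trains both branches jointly at a smaller `lr2` until an end condition (here, a fixed step budget).

```python
from typing import Tuple


def train_two_stage(lr1: float = 0.1, lr2: float = 0.01, tol: float = 1e-6,
                    max_stage1_steps: int = 10_000,
                    stage2_steps: int = 2_000) -> Tuple[float, float, float]:
    """Toy two-stage schedule on scalar stand-ins for the two networks' parameters."""
    if not lr2 < lr1:
        raise ValueError("the joint stage must use a smaller learning rate than stage 1")
    a = 0.0  # stand-in for the heat map regression network's parameters
    b = 0.0  # stand-in for the coordinate regression network's parameters

    # Stage 1: train only the heat map branch on the toy loss (a - 3)^2
    # until the convergence condition |grad| < tol is met.
    for _ in range(max_stage1_steps):
        grad_a = 2.0 * (a - 3.0)
        if abs(grad_a) < tol:
            break
        a -= lr1 * grad_a
    a_after_stage1 = a  # the "intermediate" heat map network

    # Stage 2: jointly train both branches at the smaller rate lr2 on the
    # toy loss (a - 3)^2 + (a + b - 5)^2, ending after a fixed step budget.
    for _ in range(stage2_steps):
        residual = a + b - 5.0
        grad_a = 2.0 * (a - 3.0) + 2.0 * residual
        grad_b = 2.0 * residual
        a -= lr2 * grad_a
        b -= lr2 * grad_b
    return a_after_stage1, a, b


print([round(v, 3) for v in train_two_stage()])  # [3.0, 3.0, 2.0]
```

Starting the joint stage from an already converged heat map branch and using a smaller rate keeps that branch near its solution while the coordinate branch catches up.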
In an optional implementation of some embodiments, the apparatus 600 further includes a detection unit and a scaling unit. The detection unit is configured to perform target part detection on an original image to be detected to obtain an image area showing the target part. The scaling unit is configured to scale the image area to a target size to obtain the image to be detected. The position information generation unit 604 is further configured to generate, based on the key point heat map and the key point coordinate set, key point position information corresponding to the original image to be detected.
In an optional implementation of some embodiments, the position information generation unit 604 is further configured to map the key point heat map and the key point coordinate set respectively to the original image to be detected, to obtain a target image containing a heat-map-mapped key point set and a coordinate-mapped key point set.
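Mapping key points back to the original image amounts to undoing the crop-and-resize performed by the detection and scaling units. The sketch below assumes the detection unit returns an axis-aligned box `(x0, y0, x1, y1)` in original-image pixels and that scaling is a plain resize to `target_size` with no padding or aspect-ratio preservation; neither detail is specified in the text, and the function name is hypothetical. The same function would be applied to both the heat-map-mapped and the coordinate-mapped key points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in original-image pixels


def to_original_image(points: Sequence[Point], box: Box,
                      target_size: Tuple[int, int]) -> List[Point]:
    """Undo 'crop box, then resize to target_size' for points in the resized crop."""
    x0, y0, x1, y1 = box
    target_w, target_h = target_size
    scale_x = (x1 - x0) / target_w  # original pixels per resized pixel, horizontally
    scale_y = (y1 - y0) / target_h  # same, vertically (aspect ratio not preserved)
    return [(x0 + x * scale_x, y0 + y * scale_y) for x, y in points]


# A 128 x 128 detected region at (100, 50), resized to 256 x 256:
print(to_original_image([(128.0, 64.0)], box=(100, 50, 228, 178), target_size=(256, 256)))
# [(164.0, 82.0)]
```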
It will be appreciated that the units described in the apparatus 600 correspond to the steps of the method described with reference to fig. 2. Thus, the operations, features, and resulting benefits described above for the method apply equally to the apparatus 600 and the units it contains, and are not repeated here.
Referring now to fig. 7, a schematic diagram of an electronic device 700 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 7 is only one example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 7, the electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 707 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 708 including, for example, magnetic tape and hard disk; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although fig. 7 shows the electronic device 700 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may be implemented or provided instead. Each block shown in fig. 7 may represent one device or multiple devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communications device 709, or from storage 708, or from ROM 702. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing means 701.
It should be noted that, the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device, or may exist separately without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: perform feature extraction on an image to be detected to obtain image features; input the image features into a pre-trained heat map regression network to obtain a key point heat map; input the output of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and generate, based on the key point heat map and the key point coordinate set, key point position information corresponding to the image to be detected.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as comprising an extraction unit, a heat map generation unit, a coordinate generation unit, and a position information generation unit. In some cases the names of these units do not limit the units themselves; for example, the extraction unit may also be described as "a unit that performs feature extraction on an image to be detected".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
The foregoing description covers only preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (11)

1. A key point detection method, comprising:
performing feature extraction on an image to be detected to obtain image features;
inputting the image features into a pre-trained heat map regression network to obtain a key point heat map;
inputting an output result of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and
generating, based on the key point heat map and the key point coordinate set, key point position information corresponding to the image to be detected, comprising:
obtaining, through mapping, a heat-map-mapped key point set and a coordinate-mapped key point set; selecting key points according to the relationship between a preset threshold and the distance between the two key points in each key point group; and determining the key point position information corresponding to the image to be detected according to the position information of the selected key points, wherein each key point group comprises a corresponding heat-map-mapped key point and coordinate-mapped key point.

2. The method according to claim 1, wherein generating the key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set comprises:
mapping the key point heat map and the key point coordinate set respectively to the image to be detected, to obtain a target image containing a heat-map-mapped key point set and a coordinate-mapped key point set;
selecting one key point from each key point group in the target image as a target key point, to obtain a target key point set, wherein each key point group comprises a corresponding heat-map-mapped key point and coordinate-mapped key point; and
determining the position information of each target key point in the target key point set as the key point position information corresponding to the image to be detected.

3. The method according to claim 2, wherein selecting one key point from each key point group in the target image as the target key point comprises:
selecting, based on the distance between the two key points in each key point group, one key point from the key point group as the target key point.

4. The method according to claim 3, wherein selecting, based on the distance between the two key points in each key point group, one key point from the key point group as the target key point comprises:
in response to determining that the distance is less than or equal to a preset threshold, determining the heat-map-mapped key point in the key point group as the target key point; and
in response to determining that the distance is greater than the preset threshold, determining the coordinate-mapped key point in the key point group as the target key point.

5. The method according to claim 1, wherein the heat map regression network comprises a plurality of deconvolution layers; and
inputting the output result of the intermediate layer of the heat map regression network into the coordinate regression network to obtain the key point coordinate set comprises:
inputting the output result of the last deconvolution layer of the plurality of deconvolution layers into the coordinate regression network to obtain the key point coordinate set.

6. The method according to claim 1, wherein the heat map regression network and the coordinate regression network are obtained through the following training steps:
training an initial heat map regression network at a first learning rate until a convergence condition is met, to obtain an intermediate heat map regression network; and
jointly training the intermediate heat map regression network and an initial coordinate regression network at a second learning rate until a training end condition is met, to obtain the heat map regression network and the coordinate regression network, wherein the second learning rate is less than the first learning rate.

7. The method according to claim 2, wherein, before performing feature extraction on the image to be detected to obtain the image features, the method further comprises:
performing target part detection on an original image to be detected to obtain an image area showing the target part; and
scaling the image area to a target size to obtain the image to be detected; and
generating the key point position information corresponding to the image to be detected based on the key point heat map and the key point coordinate set comprises:
generating, based on the key point heat map and the key point coordinate set, key point position information corresponding to the original image to be detected.

8. The method according to claim 7, wherein mapping the key point heat map and the key point coordinate set respectively to the image to be detected, to obtain the target image containing the heat-map-mapped key point set and the coordinate-mapped key point set, comprises:
mapping the key point heat map and the key point coordinate set respectively to the original image to be detected, to obtain a target image containing a heat-map-mapped key point set and a coordinate-mapped key point set.

9. A key point detection apparatus, comprising:
an extraction unit configured to perform feature extraction on an image to be detected to obtain image features;
a heat map generation unit configured to input the image features into a pre-trained heat map regression network to obtain a key point heat map;
a coordinate generation unit configured to input an output result of an intermediate layer of the heat map regression network into a pre-trained coordinate regression network to obtain a key point coordinate set; and
a position information generation unit configured to generate, based on the key point heat map and the key point coordinate set, key point position information corresponding to the image to be detected, including:
obtaining, through mapping, a heat-map-mapped key point set and a coordinate-mapped key point set; selecting key points according to the relationship between a preset threshold and the distance between the two key points in each key point group; and determining the key point position information corresponding to the image to be detected according to the position information of the selected key points, wherein each key point group comprises a corresponding heat-map-mapped key point and coordinate-mapped key point.

10. An electronic device, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 8.

11. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110570018.5A (CN113297973B) | 2021-05-25 | 2021-05-25 | Key point detection method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110570018.5A (CN113297973B) | 2021-05-25 | 2021-05-25 | Key point detection method, device, equipment and computer readable medium

Publications (2)

Publication Number | Publication Date
CN113297973A | 2021-08-24
CN113297973B | 2025-02-25

Family ID: 77324643

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110570018.5A (Active, CN113297973B) | Key point detection method, device, equipment and computer readable medium | 2021-05-25 | 2021-05-25

Country Status (1)

Country | Link
CN (1) | CN113297973B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114332977A * | 2021-10-14 | 2022-04-12 | 北京百度网讯科技有限公司 | Key point detection method, device, electronic device and storage medium
CN114627311A * | 2022-03-16 | 2022-06-14 | 北京金山云网络技术有限公司 | Object attribute information generation method, object attribute information generation device, electronic device, and medium
CN116935367A * | 2022-03-29 | 2023-10-24 | 北京嘀嘀无限科技发展有限公司 | A key point detection method, device, equipment, storage medium and product
CN116777899B * | 2023-07-28 | 2025-03-14 | 超音速人工智能科技股份有限公司 | Industrial image key point detection method, system and platform based on regression model
CN120411664A * | 2024-02-01 | 2025-08-01 | 宁德时代新能源科技股份有限公司 | Inspection method, device, equipment and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110084221A * | 2019-05-08 | 2019-08-02 | 南京云智控产业技术研究院有限公司 | Serialized face key point detection method with relay supervision based on deep learning
CN110516642A * | 2019-08-30 | 2019-11-29 | 电子科技大学 | A lightweight face 3D key point detection method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US10783393B2 (en)*2017-06-202020-09-22Nvidia CorporationSemi-supervised learning for landmark localization
US10929654B2 (en)*2018-03-122021-02-23Nvidia CorporationThree-dimensional (3D) pose estimation from a monocular camera
CN109508681B (en)*2018-11-202021-11-30北京京东尚科信息技术有限公司Method and device for generating human body key point detection model
CN110532981B (en)*2019-09-032022-03-15北京字节跳动网络技术有限公司Human body key point extraction method and device, readable storage medium and equipment
CN110991319B (en)*2019-11-292021-10-19广州市百果园信息技术有限公司Hand key point detection method, gesture recognition method and related device
CN111914782A (en)*2020-08-102020-11-10河南威虎智能科技有限公司Human face and detection method and device of feature points of human face, electronic equipment and storage medium

Also Published As

Publication number | Publication date
CN113297973A | 2021-08-24

Similar Documents

Publication | Title
CN113297973B | Key point detection method, device, equipment and computer readable medium
CN110991319B | Hand key point detection method, gesture recognition method and related device
CN110532981B | Human body key point extraction method and device, readable storage medium and equipment
JP2023547917A | Image segmentation method, device, equipment and storage medium
CN111402122A | Image mapping processing method and device, readable medium and electronic equipment
CN113378773B | Gesture recognition method, gesture recognition device, gesture recognition apparatus, gesture recognition storage medium, and gesture recognition program product
WO2022033111A1 | Image information extraction method, training method and apparatus, medium, and electronic device
EP3968131A1 | Object interaction method, apparatus and system, computer-readable medium, and electronic device
US20210158031A1 | Gesture Recognition Method, and Electronic Device and Storage Medium
CN111368668A | Three-dimensional hand recognition method and device, electronic equipment and storage medium
CN112966592B | Hand key point detection method, device, equipment and medium
CN111601129B | Control method, control device, terminal and storage medium
CN112200183B | Image processing method, device, apparatus and computer readable medium
CN114495173A | A gesture recognition method, apparatus, electronic device and computer readable medium
CN111310595B | Method and apparatus for generating information
CN109410121B | Human image beard generation method and device
CN115880719A | Gesture depth information generation method, device, device and computer readable medium
CN113703704B | Interface display method, head-mounted display device, and computer-readable medium
CN111968030B | Information generation method, apparatus, electronic device and computer readable medium
WO2023025181A1 | Image recognition method and apparatus, and electronic device
CN111292365B | Method, apparatus, electronic device and computer readable medium for generating depth map
CN115761412A | Detection frame processing method and device, electronic equipment and computer readable medium
CN110263743B | Method and device for recognizing images
CN116740775A | Key point detection methods and equipment
CN114758368B | Face key point information generation method, device, equipment and computer readable medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
