Background
Key point detection is a technology that has recently attracted increasing attention. For example, generating face key point information corresponding to a main target person from a single picture is basic and important for many applications, such as face recognition, face attribute analysis, living body judgment, and the like.
Most existing methods perform key point detection on an image of RGB (red, green, blue) data, and train a model on the RGB data by building a network model including Convolution, Pooling, Concat, FullConcat, and ReLU (an activation function) layers.
Such a network model needs the support of calculation functions such as Convolution, Pooling, Concat, FullConcat, and ReLU, and has a large data processing amount, which places a high requirement on hardware and results in poor supportability across different hardware.
Disclosure of Invention
The embodiment of the application provides a key point detection method, which is used to simplify the data processing process.
The embodiment of the application provides a key point detection method, which comprises the following steps:
acquiring target original image data in a RAW format;
taking the target original image data as the input of a trained key point detection model;
and carrying out convolution, splicing and rounding operations on the target original image data according to the network architecture and the network parameters of the key point detection model to obtain the key point coordinates output by the key point detection model.
In an embodiment, the keypoint detection model is composed of a plurality of convolution splicing modules and at least one convolution decoding module which are connected in sequence.
In an embodiment, the convolution splicing module includes a first convolution layer, a first splicing layer, and a first convolution rounding block connected in sequence.
In an embodiment, the convolution decoding module includes a second convolution layer, a second splicing layer, a second convolution rounding block, and a third convolution rounding block, which are connected in sequence.
In an embodiment, before taking the target original image data as the input of the trained key point detection model, the method further comprises:
acquiring a target BGR format image with known key point coordinates;
performing inverse image signal processing on the target BGR format image to obtain target original sample data in a RAW format with known key point coordinates;
inputting the target original sample data of the known key point coordinates into a neural network for training until the neural network converges, and taking the converged neural network as the key point detection model.
In one embodiment, performing the inverse image signal processing on the target BGR format image includes:
carrying out inverse gamma transformation on the target BGR format image to obtain target BGR format data;
converting the target BGR format data into target RGB format data;
carrying out inverse color correction matrix transformation on the target RGB format data to obtain a target RGB format image;
adding mosaic to the target RGB format image to convert it into a target image in an RGGB format;
and performing black level processing on the target image in the RGGB format to obtain target original sample data in the RAW format.
In an embodiment, the target raw image data is face raw image data.
The embodiment of the application further provides a method for training the key point detection model, which comprises the following steps:
acquiring a target BGR format image with known key point coordinates;
performing inverse image signal processing on the target BGR format image to obtain target original sample data in a RAW format with known key point coordinates;
inputting the target original sample data of the known key point coordinates into a neural network for training until the neural network converges, and taking the converged neural network as a key point detection model.
In one embodiment, the neural network is composed of a plurality of convolution splicing modules and at least one convolution decoding module which are connected in sequence.
In an embodiment, the convolution splicing module includes a first convolution layer, a first splicing layer, and a first convolution rounding block connected in sequence.
In an embodiment, the convolution decoding module includes a second convolution layer, a second splicing layer, a second convolution rounding block, and a third convolution rounding block, which are connected in sequence.
In an embodiment, the performing inverse image signal processing on the target BGR format image to obtain target original sample data in RAW format with known key point coordinates includes:
carrying out inverse gamma transformation on the target BGR format image to obtain target BGR format data;
converting the target BGR format data into target RGB format data;
carrying out inverse color correction matrix transformation on the target RGB format data to obtain a target RGB format image;
adding mosaic to the target RGB format image to convert it into a target image in an RGGB format;
and performing black level processing on the target image in the RGGB format to obtain target original sample data in the RAW format.
The embodiment of the present application further provides a key point detection device, including:
the data acquisition module is used for acquiring target original image data in a RAW format;
the data input module is used for taking the target original image data as the input of a trained key point detection model;
and the result output module is used for performing convolution, splicing and rounding operations on the target original image data according to the network architecture and the network parameters of the key point detection model to obtain the key point coordinates output by the key point detection model.
The embodiment of the present application further provides a training device for a key point detection model, including:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a target BGR format image with known key point coordinates;
the inverse processing module is used for performing inverse image signal processing on the target BGR format image to obtain target original sample data of an RAW format with known key point coordinates;
and the model training module is used for inputting the target original sample data of the known key point coordinates into a neural network for training until the neural network is converged, and the converged neural network is used as a key point detection model.
An embodiment of the present application further provides an electronic device, where the electronic device includes:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above-described method of keypoint detection or method of training a keypoint detection model.
An embodiment of the present application further provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is executable by a processor to perform the above-mentioned method for detecting a keypoint or the above-mentioned method for training a keypoint detection model.
According to the technical scheme provided by the embodiment of the application, target original image data in the RAW format is acquired for key point detection, which can solve the problems of information loss in RGB images under dim light, backlight, and similar conditions, and of the time consumed by the image signal processing process. The key point detection model involves only convolution and splicing and uses rounding in place of a ReLU activation function, so the data processing amount is greatly reduced, the requirement on hardware is lowered, and the method can be applied to devices with different hardware conditions.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a schematic view of an application scenario of a keypoint detection method provided in an embodiment of the present application. The application scenario may include a server 10 and a plurality of clients 20. The client 20 may be a camera, a smartphone, a tablet, a laptop, or a desktop computer. The server 10 may be a server, a server cluster, or a cloud computing center. The client 20 and the server 10 are connected through a wired or wireless network.
The client 20 may acquire target original image data in a RAW format and send it to the server 10, and the server 10 obtains the key point coordinates of the target by using the key point detection method provided in the embodiment of the present application. The target may be a human face, a vehicle, or another object. For example, the key point coordinates may be the coordinates of the eyes, nose, mouth, eyebrows, and other elements. The regions of interest differ for different targets, so the key points of a target can be set in advance. For example, the key points of a human face may include the eyes, nose, mouth, and eyebrows, while the key points of a vehicle may be its contour.
In an embodiment, the server 10 may further execute a method for training a keypoint detection model provided in the following embodiments of the present application, obtain the keypoint detection model through training, and then identify the coordinates of the key points from the target original image data by using the keypoint detection model.
According to the requirement, the server 10 may also send the trained keypoint detection model to the client 20, and after the client 20 collects target original image data in the RAW format, the method provided in the embodiment of the present application may also be adopted to obtain the keypoint coordinates of the target.
Fig. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device can be used for executing the training method of the key point detection model and the key point detection method provided by the embodiment of the application. As shown in fig. 2, the electronic apparatus includes: one or more processors 102, and one or more memories 104 storing processor-executable instructions. Wherein the processor is configured to execute the key point detection method or the training method of the key point detection model provided in the embodiments described below in the present application.
The processor 102 may be a gateway, or may be an intelligent terminal, or may be a device including a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit having data processing capability and/or instruction execution capability, and may process data of other components in the electronic device 100, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement a method of training a keypoint detection model or a method of keypoint detection as described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
In one embodiment, the electronic device 100 shown in FIG. 2 may also include an input device 106, an output device 108, and a data acquisition device 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 2 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like. The data acquisition device 110 may acquire an image of a subject and store the acquired image in the memory 104 for use by other components. Illustratively, the data acquisition device 110 may be a camera.
In an embodiment, the components in the example electronic device for implementing the method for training the keypoint detection model and the method for detecting the keypoint may be integrally disposed or disposed in a decentralized manner, such as integrally disposing the processor 102, the memory 104, the input device 106 and the output device 108, and disposing the data acquisition device 110 separately.
In an embodiment, the example electronic device for implementing the training method of the keypoint detection model and the keypoint detection method of the embodiment of the present application may be implemented as an intelligent terminal, such as a smart phone, a tablet computer, a smart watch, an in-vehicle device, and the like.
Fig. 3 is a schematic flowchart of a key point detection method according to an embodiment of the present application. As shown in fig. 3, the method may include the following steps S310 to S330.
Step S310: target original image data in a RAW format is acquired.
RAW is an unprocessed and uncompressed format, referring to the raw data obtained when an image sensor converts the captured light signal into a digital signal. The target raw image data may be face raw image data, i.e., raw data of a face captured by an image sensor. The target raw image data may also be vehicle raw image data, i.e., raw data of a vehicle captured by an image sensor.
At present, key point detection is mostly performed on an RGB (red, green, blue) image, but the information loss of an RGB image is severe under dim light, backlight, and similar conditions, so key point detection under dim light is very inaccurate. In the present application, target original image data in the RAW format is used for key point detection; because the RAW format is unprocessed and uncompressed, complete target information is retained, which alleviates the problem of information loss under dim light, backlight, and similar conditions and improves the accuracy of key point detection. In addition, using target original image data in the RAW format avoids the time-consuming conversion into an RGB image.
Step S320: and taking the target original image data as the input of the trained key point detection model.
The key point detection model can be obtained by training in advance before detection; the method of training the key point detection model is described below. In one embodiment, the key point detection model may be obtained by training a neural network. Fig. 4 is a schematic network architecture diagram of a key point detection model according to an embodiment of the present application. As shown in fig. 4, the key point detection model is composed of a plurality of convolution splicing modules and at least one convolution decoding module which are connected in sequence. In one embodiment, the key point detection model may include four convolution splicing modules connected in sequence, followed by a convolution decoding module. In one embodiment, a convolution layer may be connected before the first convolution splicing module, so that a convolution calculation is performed on the target original image data before the result is input into the first convolution splicing module. The convolution splicing module may comprise a first convolution layer, a first splicing layer, and a first convolution rounding block which are sequentially connected.
As shown in fig. 4, the convolution kernel of the first convolution layer of the convolution splicing module may be of size 3 × 3, and the first splicing layer is used to splice the input (i.e., the input of the convolution splicing module) with the result output by the first convolution layer. The first convolution rounding block is used to perform a convolution calculation on the result output by the first splicing layer (the convolution kernel size may be 3 × 3) and to round the calculation result before output. As shown in fig. 4, the output of the first convolution splicing module is used as the input of the second convolution splicing module, the output of the second convolution splicing module as the input of the third, the output of the third as the input of the fourth, and the output of the fourth convolution splicing module as the input of the convolution decoding module; the output of the convolution decoding module is the key point coordinates.
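For illustration, the data flow of one convolution splicing module described above can be sketched in NumPy; the single-channel feature maps, the 4 × 4 input size, and the shared toy kernels are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def conv3x3(x, k):
    """'Same' 3x3 convolution of a single-channel 2-D map, zero-padded;
    a toy stand-in for the convolution layers described above."""
    H, W = x.shape
    p = np.pad(x, 1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * k)
    return out

def conv_splice_module(x, k1, k2):
    """One convolution splicing module: convolution -> splicing (channel
    concatenation with the module input) -> convolution -> rounding."""
    y = conv3x3(x, k1)                          # first convolution layer
    spliced = np.stack([x, y])                  # first splicing layer
    z = sum(conv3x3(c, k2) for c in spliced)    # conv over spliced channels
    return np.round(z)                          # rounding replaces ReLU

x = np.ones((4, 4))
k = np.full((3, 3), 0.1)
out = conv_splice_module(x, k, k)   # 4 x 4 map of integer-valued floats
```

The output of such a module would feed the next module's input, matching the chain shown in fig. 4.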
As shown in fig. 4, the convolution decoding module may include a second convolution layer, a second splicing layer, a second convolution rounding block, and a third convolution rounding block, which are sequentially connected. The terms first, second, and third are used only to distinguish the different convolution layers and convolution rounding blocks.
As shown in fig. 4, the convolution kernel size of the second convolution layer of the convolution decoding module may be 1 × 1, and the second splicing layer is used to splice the input (i.e., the input of the convolution decoding module) with the output of the second convolution layer and then output the spliced result. The second convolution rounding block is used to perform a convolution calculation on the result output by the second splicing layer (the convolution kernel size may be 1 × 1) and to round the calculation result before output. The third convolution rounding block is used to perform a convolution calculation on the result output by the second convolution rounding block (the convolution kernel size may be 1 × 1) and to round the calculation result before output. The result output by the third convolution rounding block is the key point coordinates.
Step S330: and carrying out convolution, splicing and rounding operations on the target original image data according to the network architecture and the network parameters of the key point detection model to obtain the key point coordinates output by the key point detection model.
The network architecture is used for determining a processing procedure of the target raw image data, and for example, the network architecture may be as shown in fig. 4. The network parameters refer to values in the convolution kernel. After the keypoint detection model is trained, the network architecture and network parameters are known. Therefore, according to the network architecture and the network parameters, the convolution, splicing and rounding operations can be performed on the target original image data. And the output of the key point detection model is the key point coordinates of the target original image data.
In an embodiment, according to the network architecture shown in fig. 4, a convolution operation may first be performed on the target original image data to output result 1. Result 1 is input to the first convolution splicing module, which multiplies result 1 by a 3 × 3 convolution kernel to output result 2, splices result 1 and result 2 into result 3, multiplies result 3 by a 3 × 3 convolution kernel to output result 4, and rounds result 4 to obtain result 5. The first convolution splicing module outputs result 5 to the second convolution splicing module. Like the first convolution splicing module, the second convolution splicing module performs convolution, splicing, convolution, and rounding on result 5 and outputs result 6; the third convolution splicing module performs the same operations on result 6 and outputs result 7; and the fourth convolution splicing module performs the same operations on result 7 and outputs result 8. The convolution decoding module multiplies result 8 by a 1 × 1 convolution kernel to output result 9, splices result 8 and result 9 to obtain result 10, multiplies result 10 by a 1 × 1 convolution kernel and rounds to obtain result 11, and multiplies result 11 by a 1 × 1 convolution kernel and rounds to obtain result 12. Result 12 is the key point coordinates.
It should be noted that the key point detection model in the embodiment of the present application involves only convolution and splicing, and replaces the ReLU activation function with rounding; it does not include other operations such as pooling (Pooling), element-wise product (Elemwise), full connection (FullConcat), or activation functions. The data processing amount is therefore greatly reduced, the requirement on hardware is lowered, and the model can be applied to devices with different hardware conditions.
It should be explained that the curves of rounding and of the ReLU activation function are shown in FIG. 5. On the positive x half-axis, rounding can be regarded as a discretized version of the ReLU, and the negative half-axis can be regarded as a reversed continuation of the positive half-axis. The ReLU activation function essentially introduces a non-linear change, and the rounding operation adopted in the above embodiments of the present application is likewise a non-linear change in essence, so replacing the ReLU activation function with the rounding operation can still achieve the effect of a non-linear change.
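The non-linearity of the rounding operation can be checked numerically; a minimal NumPy sketch (the sample values are illustrative only):

```python
import numpy as np

x = np.array([-2.6, -1.2, -0.4, 0.4, 1.2, 2.6])

relu = np.maximum(x, 0.0)   # the ReLU activation
rounded = np.round(x)       # rounding, used here in place of ReLU

# Both transforms are non-linear: f(a + b) != f(a) + f(b) in general.
a, b = 0.4, 0.4
assert np.round(a + b) != np.round(a) + np.round(b)
```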
According to the technical scheme provided by the embodiment of the application, target original image data in the RAW format is acquired for key point detection, which can solve the problems of information loss in RGB images under dim light, backlight, and similar conditions, and of the time consumed by the image signal processing process. The key point detection model involves only convolution and splicing and uses rounding in place of a ReLU activation function, so the data processing amount is greatly reduced, the requirement on hardware is lowered, and the method can be applied to devices with different hardware conditions.
Fig. 6 is a schematic flowchart of a method for training a keypoint detection model according to an embodiment of the present disclosure. The training of the keypoint detection model may be performed before step S320, and the trained keypoint detection model may be used in the embodiment corresponding to fig. 3 to perform keypoint detection on the target raw image data. As shown in fig. 6, the method for training the keypoint detection model includes the following steps: s610-step S630.
Step S610: and acquiring a target BGR format image with known key point coordinates.
The target BGR format image is image data of a target in the BGR (blue, green, red) format with known coordinates of key points. As above, the target may be a human face, a vehicle, or the like. RAW format data is inconvenient to label, and most historically accumulated public Internet data sets are BGR format images. To make full use of existing BGR format image data, the embodiment of the application can use target BGR format images with known key point coordinates to train the key point detection model.
Step S620: and carrying out inverse image signal processing on the target BGR format image to obtain target original sample data of an RAW format with known key point coordinates.
The inverse image signal processing refers to an inverse process of a camera ISP (image signal processing) process. Because the target original image data in the RAW format is acquired in the detection stage, an inverse ISP processing mode is adopted in the embodiment of the application, and the inverse ISP is used for converting the target BGR format image into the RAW format, so that the image data in the RAW format can be used for training the key point detection model.
In order to distinguish from the target original image data to be detected, the target original image data in the RAW format obtained by the inverse ISP processing is called target original sample data. Since the key point coordinates of the target BGR format image are known quantities, the key point coordinates of the target original sample data are known quantities.
Step S630: inputting the target original sample data of the known key point coordinates into a neural network for training until the neural network converges, and taking the converged neural network as the key point detection model.
The network architecture of the neural network may be as shown in fig. 4. The neural network is composed of a plurality of convolution splicing modules and at least one convolution decoding module which are connected in sequence. The convolution splicing module comprises a first convolution layer, a first splicing layer, and a first convolution rounding block which are sequentially connected. The convolution decoding module comprises a second convolution layer, a second splicing layer, a second convolution rounding block, and a third convolution rounding block which are sequentially connected. See above for details, which are not repeated here.
The target original sample data is used as the input of the neural network, and the network parameters are adjusted until the neural network converges, that is, until the error between the key point coordinates of the target original sample data predicted by the neural network and the known key point coordinates is minimized.
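As one possible formulation of that error, a mean squared error between predicted and known key point coordinates can be used; MSE here is an illustrative assumption, since the description only requires the error to be minimized at convergence:

```python
import numpy as np

def keypoint_loss(pred, known):
    """Mean squared error between predicted and known key point
    coordinates (each given as a list of (x, y) pairs)."""
    pred = np.asarray(pred, dtype=np.float64)
    known = np.asarray(known, dtype=np.float64)
    return float(np.mean((pred - known) ** 2))

loss = keypoint_loss([[10.0, 12.0]], [[10.0, 14.0]])  # -> 2.0
```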
Because the number of available pictures in the BGR format is large, the technical scheme provided by the embodiment of the application performs inverse image signal processing on the target BGR format image with known key point coordinates to obtain target original sample data in the RAW format, and then trains on the target original sample data in the RAW format to obtain the key point detection model, which improves the utilization rate of existing data.
Fig. 7 is a detailed flowchart of step S620 provided in the embodiment of the present application. As shown in fig. 7, the above step S620 includes the following steps S621 to S625.
Step S621: and carrying out inverse gamma change on the target BGR format image to obtain target BGR format data.
The inverse gamma transformation normalizes the target BGR format image from the gray value interval 0-255 to the interval 0-1, and then raises the result to the power of 2.2. For differentiation, the result of the inverse gamma transformation is referred to as target BGR format data.
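Assuming the step is the mapping v ↦ (v/255)^2.2 described above, it can be sketched as follows:

```python
import numpy as np

def inverse_gamma(bgr_u8, gamma=2.2):
    """Normalize 8-bit values from [0, 255] to [0, 1], then raise to the
    power gamma; gamma = 2.2 follows the description above."""
    return (bgr_u8.astype(np.float64) / 255.0) ** gamma

img = np.array([[[0, 128, 255]]], dtype=np.uint8)   # one BGR pixel
lin = inverse_gamma(img)   # 0 -> 0.0, 255 -> 1.0, mid-tones pushed down
```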
Step S622: and converting the target BGR format data into target RGB format data.
The target BGR format data means that the color data is ordered from high to low as B (blue), G (green), and R (red). The target RGB format data means that the color data is ordered from high to low as R (red), G (green), and B (blue).
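The conversion is a simple reordering of the last axis; a sketch:

```python
import numpy as np

def bgr_to_rgb(bgr):
    """Reverse the channel axis of an H x W x 3 array: (B, G, R) -> (R, G, B)."""
    return bgr[..., ::-1]

pixel = np.array([[[10, 20, 30]]])   # B = 10, G = 20, R = 30
rgb = bgr_to_rgb(pixel)              # channel order becomes [30, 20, 10]
```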
Step S623: and carrying out inverse color correction matrix change on the target RGB format data to obtain a target RGB format image.
Wherein the inverse color correction matrix transformation means multiplying the target RGB format data by a preset matrix, for example, [[0.7666, 0.2493, -0.0179], [0.1113, 1.0118, -0.1311], [0.0162, 0.4098, 0.5707]]; the result of the inverse color correction matrix transformation is referred to as the target RGB format image.
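A sketch of this step with the example matrix above; the multiplication convention (matrix times per-pixel column vector) is an assumption, since the description only says the data is multiplied by the preset matrix:

```python
import numpy as np

# Example inverse color correction matrix from the description above.
INV_CCM = np.array([[0.7666, 0.2493, -0.0179],
                    [0.1113, 1.0118, -0.1311],
                    [0.0162, 0.4098,  0.5707]])

def inverse_ccm(rgb):
    """Multiply every RGB pixel (treated as a column vector) by INV_CCM."""
    return rgb @ INV_CCM.T

out = inverse_ccm(np.array([[[1.0, 0.0, 0.0]]]))   # pure red input
# result is the first column of INV_CCM: [0.7666, 0.1113, 0.0162]
```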
Step S624: adding mosaic to the target RGB format image, and converting the target RGB format image into an RGGB format target image.
The target image in the RGGB format means that each 2 × 2 block of the target image carries the intensity values of the four channels R (red), G (green), G (green), and B (blue). In one embodiment, the target RGB format image may be converted into the RGGB format target image using the following formulas, while adding a mosaic.
out[0]=img[0:H:2,0:W:2,0]/rgb_gain[0]
out[1]=img[0:H:2,1:W:2,1]/rgb_gain[1]
out[2]=img[1:H:2,0:W:2,1]/rgb_gain[1]
out[3]=img[1:H:2,1:W:2,2]/rgb_gain[2]
out[0], out[1], out[2], and out[3] represent the values of the four channels R, G, G, and B, and rgb_gain[0], rgb_gain[1], and rgb_gain[2] represent random numbers used for adding the mosaic. img denotes the target RGB format image, whose size may be expressed as H × W × 3.
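The four formulas above can be collected into one function; a NumPy sketch assuming H and W are even and rgb_gain holds the three random gains:

```python
import numpy as np

def rgb_to_rggb(img, rgb_gain):
    """Sample an H x W x 3 RGB image into four (H/2) x (W/2) RGGB planes,
    following the formulas above."""
    H, W, _ = img.shape
    return [
        img[0:H:2, 0:W:2, 0] / rgb_gain[0],   # R plane
        img[0:H:2, 1:W:2, 1] / rgb_gain[1],   # G plane (even rows)
        img[1:H:2, 0:W:2, 1] / rgb_gain[1],   # G plane (odd rows)
        img[1:H:2, 1:W:2, 2] / rgb_gain[2],   # B plane
    ]

img = np.arange(4 * 4 * 3, dtype=np.float64).reshape(4, 4, 3)
planes = rgb_to_rggb(img, [1.0, 1.0, 1.0])   # four 2 x 2 planes
```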
Step S625: and performing black level processing on the target image in the RGGB format to obtain target original sample data in the RAW format.
The black level processing refers to multiplying the target image in the RGGB format by 255, converting the result into the range 0-255, and outputting it as the target original sample data in the RAW format. The target original sample data in the RAW format may be used to train the key point detection model in step S630.
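A sketch of this final step; the clipping to the valid range is an added safeguard not stated in the description:

```python
import numpy as np

def black_level(plane):
    """Scale normalized [0, 1] values back to the 0-255 range."""
    return np.clip(plane * 255.0, 0, 255)

raw = black_level(np.array([0.0, 0.5, 1.0]))   # -> [0.0, 127.5, 255.0]
```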
The following are embodiments of the apparatus of the present application, which can be used to implement the above-mentioned key point detection method and the embodiments of the method for training the key point detection model of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.
Fig. 8 is a block diagram of a keypoint detection apparatus according to an embodiment of the present application. As shown in fig. 8, the apparatus includes: a data acquisition module 810, a data input module 820, and a result output module 830.
The data acquisition module 810 is configured to acquire target original image data in a RAW format.
The data input module 820 is configured to take the target original image data as the input of the trained keypoint detection model.
The result output module 830 is configured to perform convolution, splicing, and rounding operations on the target original image data according to the network architecture and the network parameters of the keypoint detection model, to obtain the keypoint coordinates output by the keypoint detection model.
In an embodiment, the keypoint detection model includes a plurality of convolution splicing modules and at least one convolution decoding module connected in sequence.
In an embodiment, the convolution splicing module includes a first convolution layer, a first splicing layer, and a first convolution rounding block connected in sequence.
In an embodiment, the convolution decoding module includes a second convolution layer, a second splicing layer, a second convolution rounding block, and a third convolution rounding block, which are connected in sequence.
In an embodiment, as shown in fig. 9, the keypoint detection apparatus provided in the above embodiment of the present application may further include: a sample acquisition module 910, an inverse processing module 920, and a model training module 930.
The sample acquisition module 910 is configured to obtain a target BGR format image with known keypoint coordinates;
the inverse processing module 920 is configured to perform inverse image signal processing on the target BGR format image to obtain target original sample data in the RAW format with known keypoint coordinates;
and the model training module 930 is configured to input the target original sample data with the known keypoint coordinates into a neural network for training until the neural network converges, where the converged neural network serves as the keypoint detection model.
In an embodiment, theinverse processing module 920 includes:
the inverse gamma transformation unit is configured to perform an inverse gamma transformation on the target BGR format image to obtain target BGR format data;
the format conversion unit is configured to convert the target BGR format data into target RGB format data;
the inverse color transformation unit is configured to perform an inverse color correction matrix transformation on the target RGB format data to obtain a target RGB format image;
the mosaic unit is configured to mosaic the target RGB format image to obtain a target image in the RGGB format;
and the black level unit is configured to perform black level processing on the target image in the RGGB format to obtain the target original sample data in the RAW format.
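The five units above form an inverse ISP pipeline, which can be sketched end to end in numpy. This is a minimal sketch under stated assumptions: the gamma value of 2.2 and the identity color correction matrix are illustrative placeholders (a real pipeline would use the camera's own gamma curve and CCM), and the input is assumed to be an H x W x 3 BGR image with even dimensions and values in [0, 1].

```python
import numpy as np

def inverse_isp(bgr: np.ndarray, gamma: float = 2.2, ccm=None) -> np.ndarray:
    """Hypothetical inverse image signal processing: BGR image (H x W x 3,
    values in [0, 1], H and W even) -> RAW-format RGGB sample data."""
    # 1. Inverse gamma transformation (undo display gamma encoding)
    linear = np.power(bgr, gamma)
    # 2. BGR -> RGB by reversing the channel order
    rgb = linear[..., ::-1]
    # 3. Inverse color correction matrix (identity used as a placeholder)
    if ccm is None:
        ccm = np.eye(3)
    rgb = rgb @ np.linalg.inv(ccm).T
    # 4. Mosaic to an RGGB Bayer pattern (keep one channel value per pixel)
    h, w, _ = rgb.shape
    raw = np.empty((h, w))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B
    # 5. Black level processing: scale into the 0-255 range
    return np.clip(raw * 255.0, 0.0, 255.0)
```

A BGR image with known keypoint coordinates passed through this function yields RAW-format training samples whose keypoint coordinates are unchanged, since every step operates per pixel.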
In an embodiment, the target raw image data is face raw image data.
Fig. 9 is a block diagram of a training apparatus for a keypoint detection model according to an embodiment of the present application. As shown in fig. 9, the apparatus includes: a sample acquisition module 910, an inverse processing module 920, and a model training module 930.
The sample acquisition module 910 is configured to obtain a target BGR format image with known keypoint coordinates;
the inverse processing module 920 is configured to perform inverse image signal processing on the target BGR format image to obtain target original sample data in the RAW format with known keypoint coordinates;
and the model training module 930 is configured to input the target original sample data with the known keypoint coordinates into a neural network for training until the neural network converges, where the converged neural network serves as the keypoint detection model.
In one embodiment, the neural network comprises a plurality of convolution splicing modules and at least one convolution decoding module which are connected in sequence.
In an embodiment, the convolution splicing module includes a first convolution layer, a first splicing layer, and a first convolution fetching block connected in sequence.
In an embodiment, the convolution decoding module includes a second convolution layer, a second splicing layer, a second convolution fetching block, and a third convolution fetching block, which are connected in sequence.
In an embodiment, the inverse processing module 920 includes:
the inverse gamma transformation unit is configured to perform an inverse gamma transformation on the target BGR format image to obtain target BGR format data;
the format conversion unit is configured to convert the target BGR format data into target RGB format data;
the inverse color transformation unit is configured to perform an inverse color correction matrix transformation on the target RGB format data to obtain a target RGB format image;
the mosaic unit is configured to mosaic the target RGB format image to obtain a target image in the RGGB format;
and the black level unit is configured to perform black level processing on the target image in the RGGB format to obtain the target original sample data in the RAW format.
The implementation processes of the functions and actions of the modules in the device are specifically detailed in the implementation processes of the corresponding steps in the key point detection method and the training method of the key point detection model, and are not repeated here.
In the embodiments provided in the present application, the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program codes.