CN119672766A - Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment - Google Patents

Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment

Info

Publication number
CN119672766A
CN119672766A (application CN202510186010.7A)
Authority
CN
China
Prior art keywords
hand
hand joint
joint detection
detection model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510186010.7A
Other languages
Chinese (zh)
Inventor
曹卫
韩瑞峰
史春苓
陈科科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Qiuguo Planning Technology Co ltd
Original Assignee
Hangzhou Qiuguo Planning Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Qiuguo Planning Technology Co ltd
Priority to CN202510186010.7A
Publication of CN119672766A
Legal status: Pending

Abstract

Translated from Chinese


The present application relates to the field of visual detection technology, and specifically provides a method, storage medium, electronic device and product for detecting hand joints in an eyewear device, the method may include: obtaining the confidence of a hand area corresponding to a current frame image; when confirming that the confidence is greater than a preset threshold, processing the hand area to obtain an image of the hand area to be detected; inputting the image of the hand area to be detected into a target hand joint detection model to obtain a hand joint detection result. Some embodiments of the present application can improve the accuracy of hand joint detection.

Description

Method, storage medium, electronic device and product for detecting hand joints in an eyewear device
Technical Field
The present application relates to the technical field of visual detection, and in particular to a method for detecting hand joints in an eyewear device, a storage medium, an electronic device and a computer program product.
Background
With the continuous development of computer vision technology, visual detection technology is widely applied to detection scenarios. For example, gesture recognition techniques combine computer vision, machine learning and image processing to detect user gestures. Specifically, gesture recognition can be realized by detecting the user's hand joints. In the prior art, hand joint detection first roughly determines a heatmap of the hand joints and then performs fitting estimation to obtain the hand joint positions. Because the hand contains many joints, the accuracy of hand joint positions obtained by such fitting estimation is low.
Therefore, how to provide a highly accurate method for detecting hand joints in an eyewear device is a technical problem to be solved.
Disclosure of Invention
It is an object of some embodiments of the application to provide a method for hand joint detection in an eyewear device, a storage medium, an electronic device and a product. According to the technical solution provided by the embodiments of the application, the accuracy of hand joint detection can be improved, and the practicability is high.
In a first aspect, some embodiments of the present application provide a method for detecting hand joints in an eyewear device, which includes: obtaining the confidence of a hand region corresponding to a current frame image; when the confidence is confirmed to be greater than a preset threshold, processing the hand region to obtain a hand region image to be detected; and inputting the hand region image to be detected into a target hand joint detection model to obtain a hand joint detection result.
According to some embodiments of the application, when the confidence of the hand region is greater than a preset threshold, the hand region image to be detected, obtained by processing the hand region, is input into a target hand joint detection model to obtain a hand joint detection result. Through the trained target hand joint detection model, accurate detection of hand joints can be realized with high detection efficiency.
In some embodiments, before the hand region image to be detected is input into the target hand joint detection model, the method further comprises: constructing a hand sample data set, where the hand sample data set comprises the hand position region and hand joint standard data of each of a plurality of hand samples, and the hand joint standard data comprises the three-dimensional coordinates and an expected value of each hand joint point; and training an initial hand joint detection model using the hand sample data set to determine the target hand joint detection model.
In some embodiments of the application, the initial hand joint detection model is trained on the constructed hand sample data set to obtain the target hand joint detection model, providing model support for subsequent accurate hand joint detection.
In some embodiments, the three-dimensional coordinates of each hand joint point characterize a two-dimensional position coordinate of each hand joint and a distance of each hand joint to the middle finger root region, and the expected value characterizes whether each hand joint point is within a boundary region of each hand sample.
In some embodiments of the application, the position of each hand joint point is represented by three-dimensional coordinates, enabling accurate localization of the hand joint points.
In some embodiments, constructing the hand sample data set comprises: normalizing the acquired hand image data to obtain a processed image; standardizing the hand joints in the processed image to obtain the hand joint standard data; and performing a transformation operation on the hand position regions in the processed image to obtain the plurality of hand samples, wherein the transformation operation comprises offset and/or expansion.
Through normalization and standardization of the hand image data and transformation of the hand position regions, standard and rich sample data can be obtained, providing rich data support for subsequent model training.
In some embodiments, the training of the initial hand joint detection model includes a first training stage and a second training stage, wherein the first training stage uses a hand sample data set that has not undergone blurring and noise processing, and the second training stage uses a hand sample data set that has undergone blurring and noise processing.
Some embodiments of the application may improve the accuracy and stability of model training by training using different hand sample data sets during different training phases.
In some embodiments, training the initial hand joint detection model using the hand sample data set to determine the target hand joint detection model includes: in the first training stage, converting the hand position region of each hand sample into a gray image of a preset size, inputting the gray image into the initial hand joint detection model, and outputting hand joint point data; comparing the hand joint point data with the hand joint standard data and, after determining that the hand joint detection model of this round of training meets a preset convergence condition, entering the second training stage; and executing the second training stage on the hand joint detection model of this round of training to obtain the target hand joint detection model.
In some embodiments, the hand region is obtained as follows: if it is determined that the hand joint detection result of the previous frame image of the current frame image contains no less than a preset number of pieces of hand joint information, the hand region of the current frame image is obtained by feature point matching; if it is determined that the hand joint detection result of the previous frame image contains fewer than the preset number of pieces of hand joint information, the hand region of the current frame image is detected using a hand joint detection model.
By analyzing the hand joint information in the hand joint detection result to determine how the hand region of the current frame image is obtained, computational overhead can be effectively reduced and the efficiency of hand joint detection across consecutive frames improved.
In a second aspect, some embodiments of the present application provide a device for detecting hand joints in an eyewear device, which includes: an acquisition module configured to acquire the confidence of the hand region corresponding to the current frame image; a processing module configured to process the hand region to obtain a hand region image to be detected when the confidence is determined to be greater than a preset threshold; and a detection module configured to input the hand region image to be detected into a target hand joint detection model and acquire a hand joint detection result.
In a third aspect, some embodiments of the application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs a method according to any of the embodiments of the first aspect.
In a fourth aspect, some embodiments of the application provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to implement a method according to any of the embodiments of the first aspect when executing the program.
In a fifth aspect, some embodiments of the application provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor, is adapted to carry out the method according to any of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be construed as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is a system diagram of hand joint detection in an eyeglass apparatus provided by some embodiments of the present application;
FIG. 2 is a flow chart of a method for obtaining a target hand joint detection model according to some embodiments of the present application;
FIG. 3 is one of the flow charts of the method of hand joint detection in an eyeglass apparatus provided in some embodiments of the present application;
FIG. 4 is a second flowchart of a method for detecting a hand joint in an eyeglass apparatus according to some embodiments of the present application;
FIG. 5 is a block diagram of an apparatus for hand joint detection in an eyeglass device in accordance with some embodiments of the present application;
fig. 6 is a schematic diagram of an electronic device according to some embodiments of the present application.
Detailed Description
The technical solutions of some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
In the related art, with the development of augmented reality (AR) and virtual reality (VR) technologies, various types of smart glasses devices are continuously emerging. For example, MR glasses (as a specific example of an eyewear device) are smart devices that combine AR and VR technologies and can seamlessly blend the virtual world with the real world, providing an entirely new visual experience. MR glasses capture real-world objects and actions through technologies such as built-in ultrasonic sensors, cameras and LED screens, and display virtual images in the glasses, so that users can see virtual objects in the real environment and interact with them. Gesture recognition is currently performed mainly by panoramic cameras in eyewear devices, but the accuracy of such gesture recognition is poor. The applicant therefore proposes identifying user gestures through hand joint detection; however, the accuracy of hand joint detection in the prior art is low.
In view of this, some embodiments of the present application provide a method for detecting hand joints in an eyewear device: a hand region is first determined from the current frame image, and then, if the confidence of the hand region is confirmed to be greater than a preset threshold, hand joint detection is performed using a trained target hand joint detection model to obtain a hand joint detection result. Analyzing the confidence of the hand region supports the accuracy of subsequent hand joint detection, and performing hand joint detection only on hand regions that meet the confidence requirement allows computational overhead and hardware resources to be used reasonably.
The overall construction of a system for hand joint detection in an eyeglass apparatus according to some embodiments of the present application is described below by way of example with reference to fig. 1.
As shown in fig. 1, some embodiments of the present application provide a system for hand joint detection deployed in an eyewear device. The system may include a camera 100 and a detector 200. The camera 100 collects video frames or image frames containing the hands of the target object. After the detector 200 receives the current frame image transmitted from the camera 100, it identifies the current frame image to obtain the hand region, and then calculates and analyzes the confidence of the hand region to determine whether to perform hand joint detection. When hand joint detection is confirmed, the target hand joint detection model pre-deployed inside the detector 200 detects the hand region image to be detected corresponding to the hand region and obtains a hand joint detection result, from which gesture recognition or gesture analysis can then be carried out.
In some embodiments of the present application, the target hand joint detection model is pre-trained and deployed in a glasses device, which may be AR glasses, VR glasses, MR glasses, or the like, and embodiments of the present application are not limited in detail herein.
It can be appreciated that in order to achieve accurate detection of the hand joint of the current frame image, a target hand joint detection model needs to be acquired first. The following is an exemplary description of the implementation of the acquisition of a target hand joint detection model provided in some embodiments of the present application in conjunction with fig. 2. The following method for obtaining the target hand joint detection model may be performed by the detector 200 or may be performed by a server disposed outside the glasses apparatus, and embodiments of the present application are not limited herein.
Referring to fig. 2, fig. 2 is a flowchart of a method for obtaining a target hand joint detection model according to some embodiments of the present application, where the method for obtaining the target hand joint detection model may include:
S210, constructing a hand sample data set, wherein the hand sample data set comprises hand position areas in each hand sample in a plurality of hand samples and hand joint standard data, and the hand joint standard data comprises three-dimensional coordinates and expected values of each hand joint point. The three-dimensional coordinates of each hand joint point represent the two-dimensional position coordinates of each hand joint and the distance from each hand joint to the middle finger root area, and the expected value represents whether each hand joint point is in the boundary area of each hand sample.
For example, in some embodiments of the present application, rich training data is provided for subsequent model training by constructing a hand sample data set containing 3D coordinates (x, y, z) (as a specific example of three-dimensional coordinates) and valid values (as a specific example of expected values) for each hand joint point (referred to as a hand joint for short). Here x and y in (x, y, z) are the position coordinates of the hand joint in the hand sample, and the z value of each hand joint point is mapped to its relative depth (i.e., distance) from the middle finger root region. In addition, each hand sample image has an image boundary, and the valid value indicates whether each hand joint lies within that boundary: 1 if it does, 0 otherwise. It should be noted that a normal hand has 27 joints, and typically 21 hand joints are marked in a hand sample. In practical applications, the number of hand joints in a hand sample can be set according to actual requirements to train a target hand joint detection model with the corresponding number of outputs; embodiments of the present application do not specifically limit the number of hand joints.
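To make the annotation layout concrete, here is a minimal Python sketch of one such record; the class and field names (HandSample, bbox, joints, valid) are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HandSample:
    """One annotated hand sample, mirroring the description above."""
    image_path: str                           # source image for this sample
    bbox: Tuple[int, int, int, int]           # hand position region: (left, top, width, height)
    joints: List[Tuple[float, float, float]]  # (x, y, z); z is the relative depth to the middle finger root
    valid: List[int]                          # 1 if the joint lies inside the image boundary, else 0

    def __post_init__(self) -> None:
        assert len(self.joints) == len(self.valid), "one valid flag per joint"

# Example: a sample annotated with 21 joints, all inside the image.
sample = HandSample(
    image_path="hand_000001.png",
    bbox=(40, 60, 180, 200),
    joints=[(0.0, 0.0, 0.0)] * 21,
    valid=[1] * 21,
)
```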
In some embodiments of the present application, S210 may include:
S211, carrying out normalization processing on the acquired hand image data to obtain a processed image.
For example, in some embodiments of the application, during the data collection phase, hand image data may be acquired by at least one of the following three means:
First, public image data appropriate to the eyewear device scene is gathered from public data sets, e.g., INTERHAND, EPIC-KITCHENS, 3DPose, etc. For example, hand image data captured at a similar viewing angle is collected, because the palm-to-lens distance in such data is relatively similar to that of the camera 100 in the eyewear device.
Second, generating hand image data: hand 3D models in various forms are obtained using 3D texture mapping. Using published hand joint motion sequence data, the hand 3D model is driven joint by joint and projected onto a 2D plane according to the camera intrinsics of the eyewear device, and the projected image is fused with a large number of real images containing no hands, generating abundant hand image data with rich backgrounds, hands and gestures. The hand image data includes information such as the hand rectangle position and the hand joint positions. Note that the hand joint position definition of the hand 3D model here is identical to that mentioned above.
Third, acquiring real hand image data: a large amount of hand image data containing hands and a small number of pure background samples containing no hands are acquired with the target device. Hand position samples are constructed using open-source hand detection models (such as yolo, monado and mediapipe) in a multi-model voting mode, and samples with inconsistent voting results are discarded. Images within the hand position rectangle (i.e., the hand position) are then detected with existing open-source joint detection models, joint position samples are likewise constructed by multi-model voting, samples with inconsistent voting results are again discarded, and higher-quality hand image data is thus selected.
After the hand image data is obtained in the above manner, operations such as random rotation and perspective transformation (as specific examples of normalization processing) are performed on it, obtaining richer hand image data and increasing the richness of the samples. For a hand joint that falls outside the image boundary after transformation, its 3D coordinates are set to (0, 0, 0) and the corresponding valid value is set to 0. A hand joint within the image boundary is marked valid = 1, and its 3D coordinates are determined according to its position in the image.
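As a rough illustration of this augmentation step, the sketch below applies a random rotation with OpenCV and zeroes out joints that leave the image, as described above; the angle range and function name are assumptions.

```python
import cv2
import numpy as np

def rotate_sample(image, joints, valid, max_angle=30.0):
    """Randomly rotate an image and its joint labels; joints pushed outside
    the image boundary get coordinates (0, 0, 0) and valid = 0, per the
    description above. max_angle is an illustrative choice."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))

    new_joints, new_valid = [], []
    for (x, y, z), v in zip(joints, valid):
        nx, ny = M @ np.array([x, y, 1.0])   # same 2D transform as the image
        if v and 0 <= nx < w and 0 <= ny < h:
            new_joints.append((float(nx), float(ny), z))
            new_valid.append(1)
        else:
            new_joints.append((0.0, 0.0, 0.0))  # out of bounds after transform
            new_valid.append(0)
    return rotated, new_joints, new_valid
```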
In addition, random blurring and noise-adding processes (such as Gaussian blur, Gaussian noise and salt-and-pepper noise) may be applied to part of the hand image data, while another part is left without blurring and noise processing, so that the hand sample data set can subsequently be applied to the first and second training stages, respectively.
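A possible implementation of this degradation step, with assumed parameter values (kernel size, noise sigma, salt-and-pepper density), might look like:

```python
import cv2
import numpy as np

def degrade(image):
    """Apply Gaussian blur, Gaussian noise and salt-and-pepper noise to a
    training image, as described above. All magnitudes are illustrative."""
    out = cv2.GaussianBlur(image, (5, 5), 1.5)
    out = out.astype(np.float32) + np.random.normal(0.0, 8.0, out.shape)  # Gaussian noise
    out = np.clip(out, 0, 255).astype(np.uint8)
    salt = np.random.random(out.shape[:2]) < 0.005    # salt-and-pepper noise
    pepper = np.random.random(out.shape[:2]) < 0.005
    out[salt] = 255
    out[pepper] = 0
    return out
```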
And S212, normalizing the hand joint in the processed image, and obtaining the hand joint standard data.
For example, in some embodiments of the present application, the definitions of the hand joints and of the hand joint ordering in the hand image data acquired by the different means are uniformly mapped into the standard definition, that is, the three-dimensional coordinate form mentioned above. For a missing hand joint or a hand joint beyond the image boundary, its position is set to (0, 0, 0) and its valid value is set to 0.
S213, performing transformation operation on the hand position area in the processed image to obtain the plurality of hand samples, wherein the transformation operation comprises offset and/or expansion.
For example, in some embodiments of the present application, to cope with hand position offset, operations such as random offset, random expansion and random shrinkage may be performed on the hand position region in the processed image, improving the robustness and stability of the subsequently trained target hand joint detection model. For example, an offset direction and offset distance are set to shift the hand position region and obtain an offset hand sample, or an area expansion value is set to expand the hand position region to the corresponding area. Embodiments of the present application are not specifically limited here.
Specifically, when performing random offset, random expansion and similar operations on the hand position region, the hand position region in the processed image is first marked to obtain a hand position rectangular frame. Let left, top, width and height denote the left edge, top edge, width and height of this rectangle, respectively. Random offset is applied to (left, top, width, height): for example, random noise e and f, each in the range [-10, 10], is added to left and top, giving updated values left = left + e and top = top + f. The rectangle is randomly expanded by adding random noise g and h, each in the range [-10, 10], to the width and height, giving width = width + g and height = height + h. It can be appreciated that adding random noise in this way realizes random offset, random expansion or random shrinkage of the hand position rectangle, yielding a rich hand sample data set for subsequent training. The value range of the random noise may be chosen according to the practical situation; embodiments of the present application are not specifically limited here.
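The jitter described here reduces to a few lines; the sketch below follows the [-10, 10] noise range given above, with clamping added as a defensive assumption.

```python
import random

def jitter_bbox(left, top, width, height, noise=10):
    """Randomly offset and expand/shrink the hand position rectangle:
    left + e, top + f, width + g, height + h, with e, f, g, h drawn from
    [-noise, noise] as described above."""
    e, f = random.randint(-noise, noise), random.randint(-noise, noise)
    g, h = random.randint(-noise, noise), random.randint(-noise, noise)
    return left + e, top + f, max(1, width + g), max(1, height + h)
```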
In addition, for hand gestures that are difficult to detect because fingers are occluded (such as a clenched fist), the proportion of high-quality hand sample pictures of such gestures in the training hand sample data set is kept as high as possible, and more hand samples are generated using the corresponding joint motion sequences. Whether occlusion exists can be judged from the distances and intersections of finger joint positions.
It should be noted that, in order to obtain a rich hand sample data set, various image processing manners may be used for the hand image data, for example, clipping into a uniform size, etc., and the embodiments of the present application are not limited to the above operations.
S220, training an initial hand joint detection model by using the hand sample data set, and determining the target hand joint detection model.
For example, in some embodiments of the present application, after a rich hand sample data set is constructed by the above embodiments, it may be used to train the initial hand joint detection model to obtain the target hand joint detection model.
In some embodiments of the application, the training of the initial hand joint detection model comprises a first training stage and a second training stage, wherein the first training stage uses a hand sample data set that has not undergone blurring and noise processing, and the second training stage uses a hand sample data set that has undergone the blurring and noise processing.
For example, in some embodiments of the present application, to ensure convergence of the model, no random blurring or noise-adding is applied in the initial training phase (as a specific example of the first training stage); that is, training uses the hand sample data set without blurring and noise processing. After the model converges and gradually stabilizes, the hand sample data set with blurring and noise processing is added for training. It should be noted that samples after blurring and noise processing are close in quality to the images actually output by a real eyewear device.
The above-described process is exemplarily set forth below.
In some embodiments of the present application, S220 may include: in the first training stage, converting the hand position region of each hand sample into a gray image of a preset size, inputting it into the initial hand joint detection model, and outputting hand joint point data; comparing the hand joint point data with the hand joint standard data and, after determining that the hand joint detection model of this round of training meets a preset convergence condition, entering the second training stage; and executing the second training stage on the trained hand joint detection model to obtain the target hand joint detection model.
For example, in some embodiments of the present application, during model training, the image of the hand position rectangular frame region (as a specific example of a hand position region) in the hand sample data set is stretched into a 128×128 (as a specific example of a preset size) gray image. The gray image is then input into the initial hand joint detection model, which outputs hand joint point data, for example the (x, y, z) values of 21 hand joint points and 21 valid values. The output hand joint point data is compared with the hand joint standard data in the hand sample data set to confirm whether the hand joint detection model of this round meets the preset convergence condition. If it does, the second training stage is entered; after the second training stage is completed, the target hand joint detection model is obtained. The preset size may be set flexibly; embodiments of the present application are not limited here.
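The patent does not disclose the network architecture, so the following PyTorch sketch is only a stand-in with the stated input/output shapes: a 1×128×128 grayscale crop in, 21 (x, y, z) triplets plus 21 validity logits out.

```python
import torch
import torch.nn as nn

class HandJointNet(nn.Module):
    """Illustrative model matching the interface described above; the
    backbone layers are placeholders, not the patent's architecture."""
    def __init__(self, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.coords = nn.Linear(128, num_joints * 3)  # (x, y, z) per joint
        self.valid = nn.Linear(128, num_joints)       # validity logit per joint

    def forward(self, x):
        feats = self.backbone(x)
        return self.coords(feats).view(-1, self.num_joints, 3), self.valid(feats)

# One 128x128 grayscale crop in; 21 joint triplets and 21 logits out.
coords, valid = HandJointNet()(torch.randn(1, 1, 128, 128))
```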
When entering the second training stage, training may continue directly from the hand joint detection model of the current round, or the model may first be adjusted after a loss is computed between the hand joint point data and the hand joint standard data; embodiments of the present application are not specifically limited here.
In addition, whether the hand joint detection model of this round meets the preset convergence condition can be judged by computing an accuracy from the hand joint point data and the hand joint standard data: if the accuracy is higher than a set value, the preset convergence condition is considered met. Alternatively, an error value may be set, the model's error rate computed from the hand joint point data and the hand joint standard data, and the condition considered met if the error rate is below the set value. The criterion for completing the second training stage may be that the number of training iterations reaches a set number, that the model accuracy is no less than a maximum value, that the model error rate is below a minimum value, or the like.
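As one concrete (assumed) instantiation of the accuracy-based convergence condition above, a joint might count as correct when its 2D prediction lies within a pixel tolerance of the standard data:

```python
def meets_convergence(pred, gt, valid, pix_tol=5.0, acc_target=0.95):
    """Hypothetical accuracy check: the patent leaves the exact metric
    and thresholds open, so pix_tol and acc_target are assumptions."""
    correct = total = 0
    for (px, py, _), (gx, gy, _), v in zip(pred, gt, valid):
        if not v:
            continue                      # skip joints outside the image
        total += 1
        if ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5 <= pix_tol:
            correct += 1
    return total > 0 and correct / total >= acc_target
```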
It should be understood that the end conditions of the first training phase and the second training phase may be set according to the actual application scenario, and embodiments of the present application are not limited thereto.
The specific process of hand joint detection in an eyeglass apparatus performed by the detector 200 provided by some embodiments of the present application is described below by way of example in connection with fig. 3.
Referring to fig. 3, fig. 3 is a flowchart of a method for detecting a hand joint in an eyeglass device according to some embodiments of the present application, where the method for detecting a hand joint in an eyeglass device may include:
s310, obtaining the confidence coefficient of the hand area corresponding to the current frame image.
For example, in some embodiments of the present application, after receiving a rectangular frame region (as a specific example of a hand region) of the hand position of the current frame image, the corresponding confidence thereof is calculated.
In some embodiments of the present application, S310 may include: if it is determined that the hand joint detection result of the previous frame image of the current frame image contains no less than a preset number of pieces of hand joint information, obtaining the hand region of the current frame image by feature point matching; and if it is determined that the hand joint detection result of the previous frame image contains fewer than the preset number of pieces of hand joint information, detecting the hand region of the current frame image using a hand joint detection model.
For example, in some embodiments of the present application, the hand position rectangular frame of the current frame image may be detected by a hand detection model, or obtained by feature point matching against the hand position rectangular frame in the previous frame image. Which mode is used depends on the number of hand joints (as a specific example of hand joint information) output for the previous frame image.
If a previous frame exists for the current frame image, it is judged whether the number of hand joints detected in that previous frame image by the target hand joint detection model is greater than 10 (as a specific example of the preset number). If so, the hand position rectangular area in the current frame image is obtained by feature point matching; otherwise, it is detected using the hand joint detection model. If no previous frame exists, the hand joint detection model is used for detection. The value of the preset number can be set flexibly according to actual requirements, for example 9 or 11; embodiments of the present application are not limited to this.
It is understood that detecting the hand position by feature point matching under these conditions effectively reduces the amount of computation and saves hardware cost.
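Putting this decision rule into code, a per-frame region lookup might read as follows; match_features and detect_hand are assumed callables (a feature tracker and a hand detection model), neither of which is specified by the patent.

```python
def get_hand_region(curr_frame, prev_joint_count, prev_region,
                    match_features, detect_hand, preset_number=10):
    """Choose the cheap tracking path when the previous frame produced
    more than preset_number hand joints; otherwise (or on the first
    frame) run full detection, as described above."""
    if prev_region is not None and prev_joint_count > preset_number:
        return match_features(curr_frame, prev_region)  # feature point matching
    return detect_hand(curr_frame)                      # hand joint detection model
```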
And S320, processing the hand region to obtain a hand region image to be detected under the condition that the confidence coefficient is confirmed to be larger than a preset threshold value.
For example, in some embodiments of the present application, if the confidence is greater than 0.95 (as a specific example of a preset threshold), the rectangular box region is stretched into a 128×128 gray image (as a specific example of a hand region image to be detected) using the camera parameters and distortion parameters. If the confidence is not greater than 0.95, detection of the hand position is considered to have failed; hand joint detection is not performed, and the next frame is processed directly.
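A simplified version of this gate-and-crop step is sketched below; undistortion with the camera and distortion parameters is omitted for brevity, and the threshold and size defaults follow the examples above.

```python
import cv2

def preprocess_region(frame, bbox, confidence, threshold=0.95, size=128):
    """Skip low-confidence regions; otherwise crop the hand rectangle and
    resize it to a size x size grayscale image, per the description above.
    (The full pipeline would also apply camera/distortion correction.)"""
    if confidence <= threshold:
        return None                        # hand position detection failed
    left, top, w, h = (int(v) for v in bbox)
    crop = frame[top:top + h, left:left + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (size, size))
```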
S330, inputting the hand region image to be detected into a target hand joint detection model, and obtaining a hand joint detection result.
For example, in some embodiments of the present application, a 128×128 gray image is input into the target hand joint detection model trained by the above embodiments to perform joint detection, so as to obtain a hand joint detection result.
In an application scenario, if the hand joint detection result of the current frame image contains more hand joint information than a preset threshold, the hand position rectangular area of the next frame image can be determined by feature point matching, and whether to perform hand joint detection is then decided by confidence calculation. This operation is performed in a loop until all frame images have been detected.
The following is an exemplary description of a specific procedure for hand joint detection in an eyeglass apparatus provided in accordance with some embodiments of the present application in conjunction with fig. 4.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for detecting a hand joint in an eyeglass device according to some embodiments of the present application.
S410, constructing a hand sample data set.
S420, training the initial hand joint detection model by using the hand sample data set, and determining a target hand joint detection model.
S430, identifying the current frame image and acquiring a hand area.
S440, obtaining the confidence coefficient of the hand area corresponding to the current frame image.
And S450, processing the hand region to obtain a hand region image to be detected under the condition that the confidence coefficient is greater than the preset threshold value.
S460, inputting the hand area image to be detected into a target hand joint detection model, and obtaining a hand joint detection result.
It should be noted that the above embodiments illustrate the implementation of hand joint detection using a single frame image as an example; in practical applications, each frame image in a video stream may be detected continuously. Whether a single frame is detected or continuous tracking detection is performed, the specific implementation of S410 to S460 may refer to the method embodiments provided above; detailed descriptions are omitted here to avoid repetition.
Referring to fig. 5, fig. 5 is a block diagram illustrating an apparatus for detecting a hand joint in an eyeglass device according to some embodiments of the present application. It should be understood that the apparatus for detecting a hand joint in the eyeglass device corresponds to the above-described method embodiment, and is capable of performing the respective steps involved in the above-described method embodiment, and specific functions of the apparatus for detecting a hand joint in the eyeglass device may be referred to the above description, and detailed descriptions thereof are omitted herein as appropriate to avoid redundancy.
The device for detecting hand joints in an eyewear device of fig. 5 includes at least one software functional module that can be stored in a memory in the form of software or firmware, or solidified in the device. The device includes: an acquisition module 510 for acquiring the confidence of the hand region corresponding to the current frame image; a processing module 520 for processing the hand region to obtain a hand region image to be detected when the confidence is confirmed to be greater than a preset threshold; and a detection module 530 for inputting the hand region image to be detected into a target hand joint detection model and acquiring a hand joint detection result.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedure in the foregoing method for the specific working procedure of the apparatus described above, and this will not be repeated here.
Some embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the operations of the method according to any of the embodiments provided above.
Some embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the operations of the method according to any of the embodiments provided above.
As shown in fig. 6, some embodiments of the present application provide an electronic device 600, the electronic device 600 comprising a memory 610, a processor 620 and a computer program stored on the memory 610 and executable on the processor 620, wherein the processor 620 can implement the method of any of the embodiments described above when reading the program from the memory 610 and executing the program via a bus 630.
The processor 620 may process the digital signals and may include various computing structures. Such as a complex instruction set computer architecture, a reduced instruction set computer architecture, or an architecture that implements a combination of instruction sets. In some examples, the processor 620 may be a microprocessor.
Memory 610 may be used for storing instructions to be executed by processor 620 or data related to execution of the instructions. Such instructions and/or data may include code to implement some or all of the functions of one or more of the modules described in embodiments of the present application. The processor 620 of the disclosed embodiments may be configured to execute instructions in the memory 610 to implement the methods shown above. Memory 610 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (9)

Translated from Chinese

1. A method for detecting hand joints in an eyewear device, comprising: obtaining the confidence of the hand region corresponding to a current frame image; when it is confirmed that the confidence is greater than a preset threshold, processing the hand region to obtain a hand region image to be detected; and inputting the hand region image to be detected into a target hand joint detection model to obtain a hand joint detection result, wherein the target hand joint detection model is obtained by training on a hand sample data set; the three-dimensional coordinates of each hand joint point in the hand sample training data set represent the two-dimensional position coordinates of each hand joint and the distance from each hand joint to the middle finger root region, and the expected value represents whether each hand joint point lies within the boundary region of each hand sample.
2. The method according to claim 1, wherein before inputting the hand region image to be detected into the target hand joint detection model, the method further comprises: constructing a hand sample data set, the hand sample data set comprising the hand position region and hand joint standard data of each of a plurality of hand samples, wherein the hand joint standard data comprises the three-dimensional coordinates and the expected value of each hand joint point; and training an initial hand joint detection model using the hand sample data set to determine the target hand joint detection model.
3. The method according to claim 2, wherein constructing the hand sample data set comprises: normalizing the acquired hand image data to obtain a processed image; standardizing the hand joints in the processed image to obtain the hand joint standard data; and performing a transformation operation on the hand position region in the processed image to obtain the plurality of hand samples, wherein the transformation operation comprises offset and/or expansion.
4. The method according to claim 2, wherein the training of the initial hand joint detection model comprises a first training stage and a second training stage, wherein the first training stage uses a hand sample data set that has not undergone blurring and noise processing for training, and the second training stage uses a hand sample data set that has undergone the blurring and noise processing for training.
5. The method according to claim 4, wherein training the initial hand joint detection model using the hand sample data set to determine the target hand joint detection model comprises: in the first training stage, converting the hand position region of each hand sample into a grayscale image of a preset size, inputting it into the initial hand joint detection model, and outputting hand joint point data; comparing the hand joint point data with the hand joint standard data and, after determining that the hand joint detection model of this round of training meets a preset convergence condition, entering the second training stage; and executing the second training stage on the hand joint detection model of this round of training to obtain the target hand joint detection model.
6. The method according to any one of claims 1-2 and 5, wherein the hand region is obtained as follows: if it is determined that the hand joint detection result of the previous frame image of the current frame image contains no less than a preset number of pieces of hand joint information, obtaining the hand region of the current frame image by feature point matching; if it is determined that the hand joint detection result of the previous frame image contains fewer than the preset number of pieces of hand joint information, detecting the hand region of the current frame image using a hand joint detection model.
7. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the method according to any one of claims 1-6.
8. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the computer program, when executed by the processor, performs the method according to any one of claims 1-6.
9. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, performs the method according to any one of claims 1-6.
CN202510186010.7A | 2025-02-20 | Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment | Pending | CN119672766A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510186010.7A | 2025-02-20 | 2025-02-20 | Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510186010.7A | 2025-02-20 | 2025-02-20 | Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment

Publications (1)

Publication Number | Publication Date
CN119672766A | 2025-03-21

Family

Family ID: 94992291

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510186010.7A (pending as CN119672766A) | Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment | 2025-02-20 | 2025-02-20

Country Status (1)

Country | Link
CN | CN119672766A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109635630A (en) * | 2018-10-23 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Hand joint point detecting method, device and storage medium
CN109800676A (en) * | 2018-12-29 | 2019-05-24 | 上海易维视科技股份有限公司 | Gesture identification method and system based on depth information
CN110807410A (en) * | 2019-10-30 | 2020-02-18 | 北京百度网讯科技有限公司 | Key point positioning method and device, electronic equipment and storage medium
US20210174519A1 (en) * | 2019-12-10 | 2021-06-10 | Google Llc | Scalable Real-Time Hand Tracking
CN111368768A (en) * | 2020-03-10 | 2020-07-03 | 浙江理工大学桐乡研究院有限公司 | Human body key point-based employee gesture guidance detection method
CN116263622A (en) * | 2021-12-13 | 2023-06-16 | 北京字跳网络技术有限公司 | Gesture recognition method, gesture recognition device, electronic equipment, gesture recognition medium and gesture recognition program product
CN114373191A (en) * | 2022-01-04 | 2022-04-19 | 北京沃东天骏信息技术有限公司 | Hand condyle positioning method and device
CN115220574A (en) * | 2022-06-17 | 2022-10-21 | Oppo广东移动通信有限公司 | Pose determination method and device, computer readable storage medium and electronic equipment
CN115862067A (en) * | 2022-12-05 | 2023-03-28 | 上海高德威智能交通系统有限公司 | Hand gesture recognition method, device, equipment and storage medium
CN118192787A (en) * | 2022-12-06 | 2024-06-14 | 北京眼神智能科技有限公司 | Hand joint point detection method, device, computer readable storage medium and equipment
CN116434279A (en) * | 2023-04-24 | 2023-07-14 | 摩尔线程智能科技(北京)有限责任公司 | Gesture detection method and device, electronic equipment and storage medium
CN118968540A (en) * | 2023-05-15 | 2024-11-15 | 宝马股份公司 | Method and system for gesture recognition
CN117789256A (en) * | 2024-02-27 | 2024-03-29 | 湖北星纪魅族集团有限公司 | Gesture recognition method, device, equipment and computer readable medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120014714A (en) * | 2025-04-21 | 2025-05-16 | 杭州秋果计划科技有限公司 | Hand gesture recognition method, head mounted display device and storage medium
CN120014714B (en) * | 2025-04-21 | 2025-07-11 | 杭州秋果计划科技有限公司 | Hand gesture recognition method, head mounted display device and storage medium

Similar Documents

Publication | Title
US11928800B2 (en) | Image coordinate system transformation method and apparatus, device, and storage medium
CN111862296B (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium
CN109934847B (en) | Method and device for estimating posture of weak texture three-dimensional object
US11651581B2 (en) | System and method for correspondence map determination
US10225473B2 (en) | Threshold determination in a RANSAC algorithm
JP5243529B2 (en) | Camera pose estimation apparatus and method for extended reality image
JP5538617B2 (en) | Methods and configurations for multi-camera calibration
CN109376631B (en) | Loop detection method and device based on neural network
CN112790758A (en) | A computer vision-based human motion measurement method, system and electronic device
CN110443148A (en) | A kind of action identification method, system and storage medium
CN119672766A (en) | Method, storage medium, electronic device and product for detecting hand joints in eyewear equipment
CN109215131B (en) | Driving method and device for virtual face
CN117894072B (en) | A method and system for hand detection and three-dimensional posture estimation based on diffusion model
JP2017123087A (en) | Program, apparatus and method for calculating normal vector of planar object reflected in continuous captured images
CN111476812A (en) | Map segmentation method and device, pose estimation method and equipment terminal
WO2022237048A1 (en) | Pose acquisition method and apparatus, and electronic device, storage medium and program
CN112233161B (en) | Hand image depth determination method and device, electronic equipment and storage medium
CN111680573B (en) | Face recognition method, device, electronic equipment and storage medium
JP5643147B2 (en) | Motion vector detection apparatus, motion vector detection method, and motion vector detection program
CN115546876B (en) | Pupil tracking method and device
CN117315635A (en) | Automatic reading method for inclined pointer type instrument
JP7326965B2 (en) | Image processing device, image processing program, and image processing method
JP2018097795A (en) | Normal line estimation device, normal line estimation method, and normal line estimation program
JP4942197B2 (en) | Template creation apparatus, facial expression recognition apparatus and method, program, and recording medium
Wang et al. | 3D-2D spatiotemporal registration for sports motion analysis

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
