Detailed Description
Virtual display devices such as AR and VR devices generally refer to head-mounted display devices (abbreviated as head displays or headsets, such as VR glasses and AR glasses) that have an independent processor and can run, accept input, and produce output independently. A virtual display device can be externally connected with a handle, and the user controls the virtual picture displayed by the virtual display device by operating the handle, realizing conventional interaction. Therefore, handles are often sold bundled with virtual display devices such as AR and VR devices.
Taking a game scene as an example, fig. 1 is a schematic diagram of an application scene of a VR device and a handle provided in an embodiment of the present application. As shown in fig. 1, the virtual game picture of the VR device is cast onto a television to exploit the large screen of the television, which makes the game more entertaining. The player controls the game picture of the VR head display through the handle and reacts physically to changes in the game scene, thereby enjoying an immersive, lifelike experience that increases the fun of the game.
In the game scene shown in fig. 1, during interaction, the relative pose of the handle and the virtual display device (AR, VR, etc.) is calculated by a handle positioning algorithm, so that three-dimensional interaction of the virtual display device in three-dimensional space is realized and the immersive experience is improved.
At present, commonly used handles are divided by the pose they output into 3DOF handles and 6DOF handles. A 3DOF handle outputs a 3D rotation pose, while a 6DOF handle outputs both a 3D translation position and a 3D rotation pose. Compared with a 3DOF handle, a 6DOF handle supports more complex game actions and is more entertaining.
Fig. 2A is a schematic diagram of a 3DOF handle according to an embodiment of the present application. The 3DOF handle provides a 3DOF rotation pose for a 3DOF virtual display device (VR, AR, etc.) using an internal IMU (inertial measurement unit). Specifically, based on measurement data collected by the IMU, a stable rotation pose is output using algorithms such as complementary filtering or Kalman filtering, so that basic functions such as clicking and dragging in the virtual world are realized based on the rotation pose of the 3DOF handle. However, the accumulated error of integrating the IMU for translation is large, so the translation position of the 3DOF handle is not used; consequently, when the 3DOF handle is used, complex game actions cannot be performed due to the lack of a translation position, and the experience is poor.
Fig. 2B is a schematic diagram of a 6DOF handle provided in an embodiment of the present application. The 6DOF handle provides a 6DOF pose (including a 3D rotation pose and a 3D translation position) for a 6DOF virtual display device (AR, VR, etc.) using visual positioning technology. As shown in fig. 2B, the 6DOF handle has an LED lamp ring that the ordinary 3DOF handle lacks, and each white spot hole is the position of an LED lamp. Meanwhile, as shown in fig. 2C, a virtual display device such as an AR or VR device contains multiple cameras, typically 2 or 4. Images of the LED lamp ring are acquired by the multi-camera system on the virtual display device, and a visual positioning algorithm outputs the 6DOF pose of the 6DOF handle relative to the virtual display device; after the 6DOF pose is converted into the virtual world, unconstrained 6DOF interaction can be realized.
Fig. 2D is a schematic diagram of another 6DOF handle provided in an embodiment of the present disclosure. This 6DOF handle emits infrared light. Since infrared light is generally difficult to observe, the multi-camera of the virtual display device (AR, VR, etc.) is generally set to a mode that cyclically switches between short exposure and natural exposure to observe the position of the 6DOF handle, thereby outputting a 6DOF pose.
Although the 6DOF handle outputs a translation position that the 3DOF handle lacks, its structural design and circuit control are more complex than those of the 3DOF handle, its cost is higher, and its power consumption is larger, which does not help reduce the cost of using VR and AR devices and limits their popularization and application.
In view of this, the embodiments of the present application provide a method and a device for determining a 6DOF pose based on 3D gesture recognition, which use a 3DOF handle to output a 6DOF pose through 3D gesture recognition technology, thereby reducing the cost and power consumption of obtaining a 6DOF pose. In the embodiments of the present application, the four fingertips other than the thumb are selected as marking points for positioning the 3DOF handle: the four fingertips are fixed at preset position points of the 3DOF handle, while the thumb remains free to operate the keys, so that positioning accuracy is ensured without hindering operation of the 3DOF handle. The 3D coordinates of the four fingertips in the virtual display device coordinate system are calculated directly through 3D gesture recognition and, combined with the 3D coordinates of the four fingertips in the 3DOF handle coordinate system, a 3D-3D coordinate alignment is performed. After the alignment, the measurement data of the IMU of the 3DOF handle are integrated to obtain the relative 6DOF pose of the 3DOF handle and the head display, and this relative pose is converted to the aligned reference coordinate system before being output. In this way, a low-cost, low-power, structurally simple 3DOF handle is used to output a high-accuracy 6DOF pose. Meanwhile, to ensure stable and accurate fingertip 3D coordinates, a hand gesture correction module is added to the 3D gesture recognition pipeline: using a standard hand model gripping the handle as a reference, the recognition result is corrected and optimized so that accurate fingertip 3D coordinates are output and the accuracy of the 6DOF pose is improved.
Referring to fig. 3, an overall scheme diagram of outputting a 6DOF pose with a 3DOF handle is provided in an embodiment of the present application. As shown in fig. 3, the virtual display device has a multi-camera system that collects hand images of the hand holding the 3DOF handle and sends them to the processor of the virtual display device. The processor performs gesture recognition on the received hand images, extracts the four fingertips other than the thumb that are fixed at preset position points of the 3DOF handle, and calculates the 3D coordinates of the four fingertips in the virtual display device coordinate system. The preset position points where the four fingertips rest can be determined from the structure of the 3DOF handle itself, so the 3D coordinates of the four fingertips in the 3DOF handle coordinate system can be obtained by means of these preset position points. By aligning the 3D coordinates in the virtual display device coordinate system with the 3D coordinates in the 3DOF handle coordinate system, the relative pose relationship between the virtual display device and the 3DOF handle is determined, which unifies the reference coordinate systems of the virtual display device and the 3DOF handle. Further, from the alignment time point of the reference coordinate system, the measurement data of the IMU of the 3DOF handle are integrated to determine the initial 6DOF pose of the 3DOF handle in the reference coordinate system. Considering that IMU integration accumulates a large error in the translation position while the rotation pose remains accurate, the 3DOF handle is also positioned visually from the hand images acquired by the multi-camera to obtain a visual 6DOF pose. Because the visual positioning error is smaller, the initial 6DOF pose of the 3DOF handle can be updated with it to reduce the accumulated error of IMU integration in the translation position, improving the availability of the translation position output with the 3DOF handle and yielding a 6DOF pose comprising both a translation position and a rotation pose.
The embodiments of the present application consider that the thumb must control the handle keys and is therefore unsuitable for 3D pose calculation, so only the other four fingertips, fixed at preset position points of the 3DOF handle, are selected for 6DOF positioning; structurally, ergonomic grooves are designed for the four fingertips, which makes the 3DOF handle comfortable to hold and facilitates 3D pose calculation. On the other hand, the embodiments of the present application use the hand images acquired by the multi-camera to position the 3DOF handle visually and use the visual positioning result to optimize the initial 6DOF pose of the 3DOF handle, thereby improving the precision of the 6DOF pose.
Based on the overall scheme shown in fig. 3 and applied to a 3DOF handle, an embodiment of the present application provides a flowchart of a method for determining a 6DOF pose based on 3D gesture recognition. Referring to fig. 4, the flow is executed by a virtual display device that is connected with the 3DOF handle and has an independent processor, and mainly includes the following steps:
S401, recognizing the gesture in hand images acquired by the multi-camera of the virtual display device, and determining the first 3D coordinates, in the virtual display device coordinate system, of the four fingertips other than the thumb that are fixed at preset position points of the 3DOF handle.
Generally, to make the handle easy to hold, grooves (i.e., preset position points) are provided on the structure of the handle, and the user can place the fingertips at the groove positions. Considering the difference between the thumb and the other four fingers, and the need for the thumb to control the keys of the handle, the four fingertips other than the thumb are placed in the grooves of the 3DOF handle; that is, the four fingertips other than the thumb are fixed at preset position points of the 3DOF handle.
Referring to fig. 5, a schematic diagram of the process of extracting the first 3D coordinates of the four fingertips in the virtual display device coordinate system through gesture recognition is provided in an embodiment of the present application. As shown in fig. 5, the process mainly consists of gesture area detection, hand joint point extraction, and first 3D coordinate determination; the specific implementation flow, shown in fig. 6, mainly includes the following steps:
S4011, detecting a gesture area holding the 3DOF handle from hand images acquired by the multi-camera.
The outer surface of the virtual display device carries multiple cameras with different orientations, which can acquire hand images from different angles. Each hand image is input into a pre-trained object detection model, and the gesture area holding the 3DOF handle is detected by the object detection model, as shown in fig. 5.
The embodiment of the application places no limiting requirement on the object detection model; for example, a traditional machine learning algorithm (such as a support vector machine (Support Vector Machine, SVM)) may be adopted, or a deep learning algorithm (such as a convolutional neural network (Convolutional Neural Networks, CNN) or the YOLOv3 network) may be adopted.
S4012, performing gesture estimation on the gesture area, and extracting hand joint points.
When S4012 is executed, a pre-trained hand joint point detection model is adopted to perform pose estimation on each gesture area, and 21 hand joint points are extracted. Referring to fig. 7, a schematic diagram of the 21 hand joint points is provided in an embodiment of the present application, where each hand joint point corresponds to a unique identifier. Techniques for extracting hand joint points are mature and are not the focus of the present application, so they are not described here.
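The embodiment does not prescribe a particular hand joint point detection model. As an illustrative sketch only, the off-the-shelf MediaPipe Hands detector happens to output the same 21-landmark convention as fig. 7 and could stand in for this step:

```python
import cv2
import mediapipe as mp

# Assumption: MediaPipe Hands stands in for the embodiment's pre-trained
# hand joint point detection model; it also outputs 21 landmarks.
mp_hands = mp.solutions.hands

def extract_hand_joints(bgr_image):
    """Return the 21 hand landmarks as (x, y, z) tuples, or None if no hand."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    return [(lm.x, lm.y, lm.z)
            for lm in result.multi_hand_landmarks[0].landmark]
```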
S4013, determining the first 3D coordinates of each hand joint point in the virtual display device coordinate system through a multi-view matching algorithm.
In the embodiment of the application, the depth information of each hand joint point can be determined from the hand joint points extracted in the gesture areas corresponding to the multiple cameras, and the first 3D coordinates of each hand joint point in the virtual display device coordinate system can be determined by combining the pre-calibrated intrinsic parameters of each camera. The specific implementation process, shown in fig. 8, mainly includes the following steps:
S4013_1, matching the hand joint points extracted in the gesture area corresponding to the main camera with the hand joint points extracted in the gesture areas corresponding to the other cameras.
Because the multiple cameras shoot hand images from different angles, one camera is selected from them as the main camera according to the richness of hand information contained in its hand image, with the other cameras serving as auxiliary cameras; the hand joint points extracted from the gesture area corresponding to the main camera are then matched with the hand joint points extracted from the gesture areas corresponding to the other cameras.
S4013_2, determining the depth information of each hand joint point according to each matching result.
From the matching results between the hand joint points extracted in the gesture area corresponding to the main camera and those extracted in the gesture areas corresponding to the other cameras, the distance between each hand joint point and the corresponding camera is calculated. Since the cameras are mounted on the virtual display device, this distance can be taken as the distance between the hand joint point and the virtual display device, thereby yielding the depth information of each hand joint point.
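As an illustration of this step: for a rectified stereo pair, the depth of a matched joint follows directly from its disparity. A minimal sketch, where the focal length, baseline, and pixel coordinates are hypothetical parameters:

```python
def stereo_depth(focal_px: float, baseline_m: float,
                 u_main: float, u_aux: float) -> float:
    """Depth of a matched hand joint from two rectified views: Z = f * B / d."""
    disparity = u_main - u_aux        # horizontal pixel offset between the views
    if disparity <= 0:
        raise ValueError("joint match is invalid or at infinity")
    return focal_px * baseline_m / disparity
```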
S4013_3, determining the first 3D coordinates of each hand joint point in the virtual display device coordinate system according to the pre-calibrated intrinsic parameters of the multi-camera, the depth information of each hand joint point, and the image coordinates of each hand joint point in the corresponding gesture area.
In S4013_3, the image coordinates of each hand joint point in the gesture area can be read directly; taking the depth information as the Z axis perpendicular to the virtual display device and combining the pre-calibrated intrinsic parameters of the multi-camera, the first 3D coordinates of each hand joint point in the virtual display device coordinate system can be determined.
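A minimal sketch of this back-projection, assuming a standard pinhole model with a pre-calibrated intrinsic matrix K (names are illustrative):

```python
import numpy as np

def backproject(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Lift an image point (u, v) with depth Z into the camera/device frame."""
    fx, fy = K[0, 0], K[1, 1]        # focal lengths in pixels
    cx, cy = K[0, 2], K[1, 2]        # principal point
    return np.array([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])
```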
S4014, acquiring, from the hand joint points, the first 3D coordinates of the four fingertips fixed at preset position points of the 3DOF handle.
For example, still taking fig. 7 as an example: according to the identifiers of the 21 hand joint points, the four fingertips other than the thumb carry identifiers 8, 12, 16 and 20. Therefore, from the first 3D coordinates of the 21 hand joint points, the first 3D coordinates of the four fingertips fixed at the preset position points of the 3DOF handle can be obtained by identifier.
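Selecting the four fingertips from the 21 joints then reduces to an index lookup (a sketch; joints_3d is assumed to be the (21, 3) array of first 3D coordinates):

```python
import numpy as np

FINGERTIP_IDS = [8, 12, 16, 20]   # index, middle, ring, little fingertips (fig. 7)

def select_fingertips(joints_3d: np.ndarray) -> np.ndarray:
    """Return the (4, 3) first 3D coordinates of the non-thumb fingertips."""
    return joints_3d[FINGERTIP_IDS]
```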
Considering that hand joint points extracted by gesture recognition may deviate due to hand shake or the way the handle is held, which increases the error of their first 3D coordinates, some embodiments further include a 3D gesture correction step before the first 3D coordinates of the four fingertips in the virtual display device coordinate system are determined, as shown in fig. 9. In implementation, the detected gesture is optimized with a least squares method against a pre-established standard gesture reference model, reducing the error in the first 3D coordinates of the four fingertips caused by hand shake or an incorrect grip and improving the accuracy of 6DOF pose determination.
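One plausible realization of this correction, offered only as a sketch: treat the standard grip-pose hand model (assumed already registered to the device frame) as a prior and solve a regularized least-squares problem. The weighting scheme and the scipy solver are illustrative choices, not prescribed by the application:

```python
import numpy as np
from scipy.optimize import least_squares

def correct_gesture(detected: np.ndarray, standard: np.ndarray,
                    prior_weight: float = 0.5) -> np.ndarray:
    """Pull noisy detected joints (21, 3) toward the standard grip-pose model."""
    def residuals(x):
        joints = x.reshape(21, 3)
        data_term = (joints - detected).ravel()                    # stay near detection
        prior_term = prior_weight * (joints - standard).ravel()    # respect the model
        return np.concatenate([data_term, prior_term])

    sol = least_squares(residuals, detected.ravel())
    return sol.x.reshape(21, 3)
```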
S402, acquiring second 3D coordinates of four fingertips in a 3DOF handle coordinate system.
In the embodiment of the application, the preset position points of the 3DOF handle where the four fingertips rest can be determined from the structure of the 3DOF handle itself, so the second 3D coordinates of the four fingertips in the 3DOF handle coordinate system can be obtained by means of these preset position points.
When the virtual display device and the 3DOF handle are used, they move independently as two separate devices; therefore, each has its own reference coordinate system, and the two must be aligned. For the specific procedure, see S403.
S403, determining the relative pose relationship between the virtual display device and the 3DOF handle according to the first 3D coordinates and the second 3D coordinates, so as to align the reference coordinate systems of the virtual display device and the 3DOF handle.
Assume that the first 3D coordinates of the four fingertips in the virtual display device coordinate system are $P_1, P_2, P_3, P_4$, and that the second 3D coordinates in the 3DOF handle coordinate system are $Q_1, Q_2, Q_3, Q_4$. The relative pose relationship $T = [R \mid t]$ between the virtual display device and the 3DOF handle is determined by aligning the first 3D coordinates in the virtual display device coordinate system with the second 3D coordinates in the 3DOF handle coordinate system. The formula is as follows:

$$T = \arg\min_{R,\,t} \sum_{i=1}^{4} \left\| P_i - (R\,Q_i + t) \right\|^2 \quad (1)$$
Through the relative pose relationship $T$, the pose of the 3DOF handle in its first reference coordinate system can be converted into the second reference coordinate system of the virtual display device, and the pose of the virtual display device in the second reference coordinate system can be converted into the first reference coordinate system of the 3DOF handle, thereby realizing the alignment of the reference coordinate systems.
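Formula (1) is a standard 3D-3D rigid alignment, for which a closed-form SVD (Kabsch) solution exists; a minimal sketch, assuming P and Q hold the four corresponding fingertip coordinates:

```python
import numpy as np

def align_3d_3d(P: np.ndarray, Q: np.ndarray):
    """Least-squares R, t such that P_i ≈ R @ Q_i + t; P, Q are (4, 3) arrays."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - cQ).T @ (P - cP)          # cross-covariance of the centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # reject a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cP - R @ cQ
    return R, t
```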
S404, integrating measurement data of the IMU of the 3DOF handle from the alignment time point of the reference coordinate system to determine the initial 6DOF pose of the 3DOF handle in the reference coordinate system.
After the reference coordinate systems of the virtual display device and the 3DOF handle are aligned, the poses of both can be determined in the same reference coordinate system (e.g., the second reference coordinate system of the virtual display device). The 6DOF pose of the virtual display device in the reference coordinate system can be read directly from the positioning device in the virtual display device. For the 3DOF handle, starting from the alignment time point of the reference coordinate system, Kalman filtering is used for prediction; that is, the measurement data of the IMU of the 3DOF handle are integrated in the aligned reference coordinate system, which completes the initial positioning.
The process of determining the pose of the 3DOF handle in the aligned reference coordinate system is shown in fig. 10, and mainly includes the following steps:
S4041, acquiring the acceleration measurement values of the accelerometer in the IMU from the alignment time point of the reference coordinate system, and integrating them twice in the time dimension to obtain the translation position of the 3DOF handle in the reference coordinate system.
From the mathematical relationship among acceleration, velocity and displacement: starting from the alignment time point of the reference coordinate system, integrating the acceleration measurement values collected by the accelerometer in the IMU once in the time dimension yields the velocity of the 3DOF handle, and integrating the velocity once more (i.e., integrating the acceleration measurement values twice) yields the translation position (i.e., the displacement) of the 3DOF handle in the reference coordinate system.
S4042, acquiring the angular velocity measurement values of the gyroscope in the IMU from the alignment time point of the reference coordinate system, and integrating them once in the time dimension to obtain the rotation pose of the 3DOF handle in the reference coordinate system.
From the mathematical relationship between rotation angle and angular velocity: starting from the alignment time point of the reference coordinate system, integrating the angular velocity measurement values collected by the gyroscope in the IMU once in the time dimension yields the rotation pose (i.e., the three-axis rotation angles) of the 3DOF handle in the reference coordinate system.
S4043, determining the initial 6DOF pose of the 3DOF handle in the reference coordinate system from the translation position and the rotation pose.
The first three dimensions of the 6DOF pose are the translation position, and the last three dimensions are the rotation pose.
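The three steps above can be sketched for a single IMU sample as follows; the gravity compensation and the small-angle rotation update are simplifying assumptions (a production implementation would typically use quaternions):

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # assumed reference-frame gravity (m/s^2)

def integrate_imu_step(pos, vel, R, accel, gyro, dt):
    """Advance translation (double integration) and rotation (single integration).
    pos, vel: 3-vectors; R: 3x3 body-to-reference rotation;
    accel (m/s^2) and gyro (rad/s) are body-frame IMU measurements."""
    a_ref = R @ accel + GRAVITY              # body acceleration in reference frame
    pos = pos + vel * dt + 0.5 * a_ref * dt ** 2
    vel = vel + a_ref * dt
    wx, wy, wz = gyro * dt                   # integrated angular increments
    dR = np.array([[1.0, -wz,  wy],          # first-order (small-angle) rotation
                   [ wz, 1.0, -wx],
                   [-wy,  wx, 1.0]])
    R = R @ dR
    return pos, vel, R
```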
However, because IMU integration drifts, the longer the accumulation time, the larger the offset. The offset has a small influence on the rotation pose but a large influence on the translation position, which makes the translation position inaccurate; therefore, the initial 6DOF pose needs to be corrected.
S405, performing visual positioning on the 3DOF handle, and updating the initial 6DOF pose by using the visual 6DOF pose.
In the embodiment of the application, in order to counteract the accumulated error of IMU integral positioning, the 3DOF handle is positioned visually, and the initial 6DOF pose is updated with the visual 6DOF pose. The specific implementation process, shown in fig. 11, mainly includes the following steps:
S4051, tracking the 3DOF handle by using hand images acquired by the multi-camera from the reference coordinate system alignment time point, and redetermining the relative pose relationship between the virtual display device and the 3DOF handle.
In the embodiment of the application, after a gesture area has been detected in the hand images acquired by the multi-camera, the 3DOF handle is tracked with subsequent hand images from the reference coordinate system alignment time point: the gesture area at the current moment is determined again, the first 3D coordinates of the hand joint points extracted from that gesture area are computed, and, combined with the second 3D coordinates of the four fingertips at the current moment, formula (1) is used to re-determine the relative pose relationship between the virtual display device and the 3DOF handle at the current moment.
In S4051, compared with performing gesture area detection on every frame of hand image, tracking saves computation and improves positioning speed.
S4052, determining the visual 6DOF pose of the 3DOF handle in the reference coordinate system according to the 6DOF pose of the virtual display device in the reference coordinate system and the new relative pose relation.
In S4052, the 6DOF pose of the virtual display device in the reference coordinate system may be read directly from the positioning device in the virtual display device; combined with the new relative pose relationship at the current moment, the visual 6DOF pose of the 3DOF handle in the reference coordinate system can be determined.
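Expressed with 4x4 homogeneous transforms, S4052 reduces to a single composition (a sketch; the matrix names are illustrative):

```python
import numpy as np

def visual_handle_pose(T_ref_device: np.ndarray,
                       T_device_handle: np.ndarray) -> np.ndarray:
    """Visual 6DOF pose of the handle in the reference frame: chain the transforms.
    Both inputs are 4x4 homogeneous matrices."""
    return T_ref_device @ T_device_handle
```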
S4053, updating the initial 6DOF pose with the visual 6DOF pose.
Generally, the measurement frequency of the IMU is higher than the acquisition frame rate of the cameras, so the IMU has already performed a stretch of integration within the time between two adjacent frames of hand images. Therefore, the initial 6DOF pose of the 3DOF handle can be updated with the visual positioning result of each frame of the multi-camera. Because the accuracy of the visually positioned 6DOF pose is higher, the accumulated error that IMU integration produces in the translation position can be corrected, improving the positioning accuracy.
In an alternative embodiment, updating the IMU positioning result with the visual positioning result may be implemented by introducing Kalman filtering. Kalman filtering is an error estimation algorithm that uses a state equation to find the minimum-variance estimate of a linear system from predicted data and observed data. Its core idea is to select a random dynamic variable of the system and build a prediction model, then compute the optimal estimate from the system's real-time observations through the state equation. Kalman filtering is a continuous predict-update process with a small data processing load and strong real-time performance.
In the prediction part of the embodiment of the application, the angular velocity and acceleration measurement values of the IMU in the 3DOF handle are integrated separately, so the initial 6DOF pose of the 3DOF handle in the reference coordinate system can be roughly calculated. In addition, during integration, the covariance matrix of the prediction error is also computed iteratively: $P_{k+1} = F P_k F^T + Q$, where $F$ is the prediction matrix of the pose, $Q$ is a Gaussian white noise matrix, and $P_k$ is the covariance matrix of the previous moment. Because of Gaussian white noise and random walk, the longer the prediction time, the larger the drift error.
In the updating part of the embodiment of the application, the relative pose relationship between the virtual display device and the 3DOF handle is obtained after the visual positioning succeeds, and, combined with the 6DOF pose of the virtual display device in the reference coordinate system, the 6DOF pose of the 3DOF handle in the reference coordinate system can be derived; this higher-precision visual 6DOF pose is then used to update the pose integrated from the handle IMU. During the update, the Kalman filter computes the Kalman gain and applies the measurement to the prediction, suppressing noise interference.
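A minimal linear Kalman sketch of the prediction and update halves described above, assuming a generic state vector x, prediction matrix F, and a visual pose measurement modeled as z = H x + noise (all names are illustrative, not from the application):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """IMU-driven prediction: propagate the state and grow the covariance."""
    return F @ x, F @ P @ F.T + Q            # P_{k+1} = F P_k F^T + Q

def kf_update(x, P, z, H, R):
    """Visual-pose update: weigh the measurement by the Kalman gain."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```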
It should be noted that implementing the fused vision-IMU positioning with linear Kalman filtering in the embodiment of the present application is only an example and is not a limiting requirement. For example, a nonlinear optimization algorithm may also be employed to refine the IMU positioning result with the visual positioning results. In a specific implementation, IMU pre-integration theory is adopted, and the 6DOF pose of the 3DOF handle is optimized by jointly minimizing the reprojection errors of N visually positioned 6DOF poses and the IMU pre-integration residuals.
In some embodiments, because IMU integration can continuously predict the 6DOF pose of the handle, a low-probability tracking anomaly does not prevent outputting the handle pose; it only degrades positioning accuracy. However, if tracking fails for a long time, the handle has moved out of the cameras' field of view; at this point the translation accuracy is severely affected, and the output falls back to an optimized rotation pose. Specifically, when tracking the 3DOF handle with the hand images acquired by the multi-camera fails and the tracking failure duration is longer than a set time threshold, the rotation information in the initial 6DOF pose is optimized to obtain the 3DOF pose of the 3DOF handle in the reference coordinate system. When the 3DOF handle moves back into the cameras' field of view, it is tracked again, the Kalman filtering flow is restarted, and the 6DOF pose of the 3DOF handle continues to be output.
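The fallback behavior can be summarized in a few lines (a sketch with hypothetical names and units):

```python
def select_output(lost_duration_s: float, threshold_s: float,
                  pose_6dof, rot_3dof):
    """Keep outputting 6DOF through brief dropouts; degrade to 3DOF when the
    handle has plainly left the cameras' field of view."""
    if lost_duration_s > threshold_s:
        return "3dof", rot_3dof      # optimized rotation-only pose
    return "6dof", pose_6dof         # IMU prediction bridges short failures
```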
Referring to fig. 12, a flowchart of a complete method for outputting a 6DOF pose using a 3DOF handle according to an embodiment of the present application mainly includes the following steps:
S1201, acquiring hand images acquired by the multi-camera.
S1202, recognizing a gesture in a hand image, and determining first 3D coordinates of four fingertips, except for the thumb, fixed at preset position points of the 3DOF handle under a virtual display device coordinate system.
S1203, determining whether the visual initialization positioning is successful; if not, executing S1204, and if so, executing S1207.
If the first reference coordinate system of the 3DOF handle is aligned with the second reference coordinate system of the virtual display device, the visual initialization positioning is successful, otherwise, the visual initialization positioning is failed.
S1204, acquiring the second 3D coordinates of the four fingertips in the 3DOF handle coordinate system.
S1205, determining the relative pose relationship between the virtual display device and the 3DOF handle according to the first 3D coordinates and the second 3D coordinates, so as to align the reference coordinate systems of the virtual display device and the 3DOF handle.
S1206, starting Kalman filtering to integrate the measurement data of the IMU of the 3DOF handle from the alignment time point of the reference coordinate system, and determining the initial 6DOF pose of the 3DOF handle in the reference coordinate system.
S1207, determining whether the visual tracking positioning is successful, if so, executing S1208, otherwise, executing S1211.
If the 3DOF handle is tracked in the hand images acquired by the multi-camera, the visual tracking and positioning are successful; otherwise, they fail.
S1208, re-determining the relative pose relationship between the virtual display device and the 3DOF handle.
S1209, determining the visual 6DOF pose of the 3DOF handle in the reference coordinate system according to the 6DOF pose of the virtual display device in the reference coordinate system and the new relative pose relation.
S1210, updating the initial 6DOF pose with the visual 6DOF pose by adopting Kalman filtering.
S1211, when the visual tracking failure duration is longer than the set time threshold, optimizing the rotation information in the initial 6DOF pose to obtain the 3DOF pose of the 3DOF handle in the reference coordinate system.
In the method for determining a 6DOF pose based on 3D gesture recognition provided by the embodiments of the present application, 3D gesture recognition is performed on the hand images acquired by the multi-camera of the virtual display device, and the first 3D coordinates, in the virtual display device coordinate system, of the four fingertips fixed at preset position points of the 3DOF handle are determined; the thumb is excluded and remains free to operate the keys, so positioning accuracy is ensured without hindering operation of the 3DOF handle. The second 3D coordinates of the four fingertips in the 3DOF handle coordinate system are acquired according to the structure of the 3DOF handle itself, and the relative pose relationship between the virtual display device and the 3DOF handle is determined from the two sets of coordinates, so that the reference coordinate systems of the virtual display device and the 3DOF handle are aligned. From the alignment time point, the measurement data of the IMU of the 3DOF handle are integrated to determine the initial 6DOF pose of the 3DOF handle in the reference coordinate system. Since visual positioning has a smaller error, the 3DOF handle is also positioned visually, and the initial 6DOF pose is updated with the visual 6DOF pose, which reduces the accumulated error of IMU integration in the translation position and improves the availability of the translation position output with the 3DOF handle. In this way, a 6DOF pose including both an accurate translation position and an accurate rotation pose is obtained with a low-cost 3DOF handle.
Based on the same technical concept, the embodiment of the application provides a virtual display device, which can be a VR device or an AR device, and can realize the method steps of determining the 6DOF pose based on 3D gesture recognition in the embodiment and achieve the same technical effect.
Referring to fig. 13, the virtual display device includes a processor 1301, a memory 1302, a multi-camera 1303, and a communication interface 1304, the multi-camera 1303, the memory 1302 and the processor 1301 are connected through a bus 1305;
the memory 1302 includes a data storage unit and a program storage unit, the program storage unit storing computer program instructions, and the processor 1301 performs the following operations according to the computer program instructions:
the virtual display device is connected with the 3DOF handle through the communication interface 1304, acquires measurement data of the IMU of the 3DOF handle, and stores the measurement data in a data storage unit;
acquiring hand images acquired by the multi-camera 1303 and storing the hand images in a data storage unit;
recognizing the gesture in hand images acquired by the multi-camera, and determining the first 3D coordinates, in the virtual display device coordinate system, of the four fingertips other than the thumb that are fixed at preset position points of the 3DOF handle;
acquiring second 3D coordinates of the four fingertips under a 3DOF handle coordinate system;
determining a relative pose relationship between the virtual display device and the 3DOF handle according to the first 3D coordinate and the second 3D coordinate to align a reference coordinate system of the virtual display device and the 3DOF handle;
integrating the measurement data of the IMU of the 3DOF handle from the reference coordinate system alignment time point to determine the initial 6DOF pose of the 3DOF handle in the reference coordinate system;
and performing visual positioning on the 3DOF handle, and updating the initial 6DOF pose by using the visual 6DOF pose.
Optionally, the processor 1301 performs visual positioning on the 3DOF handle, and updates the initial 6DOF pose with a visual 6DOF pose, specifically including:
Tracking the 3DOF handle by utilizing hand images acquired by the multi-camera from the reference coordinate system alignment time point, and re-determining the relative pose relationship between the virtual display device and the 3DOF handle;
Determining a visual 6DOF pose of the 3DOF handle in the reference coordinate system according to the 6DOF pose of the virtual display device in the reference coordinate system and the new relative pose relation;
Updating the initial 6DOF pose with the visual 6DOF pose.
Optionally, when tracking of the 3DOF handle using the hand images acquired by the multi-camera fails and the tracking failure duration is greater than a set time threshold, the processor 1301 further performs:
optimizing the rotation information in the initial 6DOF pose to obtain the 3DOF pose of the 3DOF handle in the reference coordinate system.
Optionally, before determining the first 3D coordinates of the four fingertips in the virtual display device coordinate system, the processor 1301 further performs:
optimizing the detected gesture with a least squares method according to a pre-established standard gesture reference model, so as to reduce the error in the first 3D coordinates of the four fingertips caused by hand shake or an incorrect grip.
Optionally, the processor 1301 recognizes the gesture in hand images acquired by the multi-camera of the virtual display device and determines the first 3D coordinates, in the virtual display device coordinate system, of the four fingertips other than the thumb fixed at preset position points of the 3DOF handle, specifically operating as follows:
detecting a gesture area for holding the 3DOF handle from the hand image acquired by the multi-view camera;
Performing gesture estimation on the gesture area, and extracting hand joint points;
determining the first 3D coordinates of each hand joint point in the virtual display device coordinate system through a multi-view matching algorithm; and acquiring, from the hand joint points, the first 3D coordinates of the four fingertips fixed at the preset position points of the 3DOF handle.
Optionally, the processor 1301 determines, through a multi-view matching algorithm, the first 3D coordinates of each hand joint point in the virtual display device coordinate system, specifically including:
matching the hand joint points extracted from the gesture area corresponding to the main camera with the hand joint points extracted from the gesture areas corresponding to the other cameras;
determining the depth information of each hand joint point according to each matching result;
determining the first 3D coordinates of each hand joint point in the virtual display device coordinate system according to the pre-calibrated intrinsic parameters of the multi-camera, the depth information of each hand joint point, and the image coordinates of each hand joint point in the corresponding gesture area.
Optionally, the processor 1301 integrates the measurement data of the IMU of the 3DOF handle from the reference coordinate system alignment time point, and determines an initial 6DOF pose of the 3DOF handle in the reference coordinate system, specifically including:
acquiring the acceleration measurement values of the accelerometer in the IMU from the alignment time point of the reference coordinate system, and integrating them twice in the time dimension to obtain the translation position of the 3DOF handle in the reference coordinate system;
acquiring the angular velocity measurement values of the gyroscope in the IMU from the alignment time point of the reference coordinate system, and integrating the angular velocity measurement values once in the time dimension to obtain the rotation pose of the 3DOF handle in the reference coordinate system;
determining the initial 6DOF pose of the 3DOF handle in the reference coordinate system from the translation position and the rotation pose.
Optionally, the formula for determining the relative pose relationship between the virtual display device and the 3DOF handle is:

$$T = \arg\min_{R,\,t} \sum_{i=1}^{4} \left\| P_i - (R\,Q_i + t) \right\|^2 \quad (1)$$

wherein $P_1, P_2, P_3, P_4$ respectively represent the first 3D coordinates of the four fingertips in the virtual display device coordinate system, and $Q_1, Q_2, Q_3, Q_4$ respectively represent the second 3D coordinates of the four fingertips in the 3DOF handle coordinate system.
It should be noted that fig. 13 is only an example and shows only the hardware necessary for the virtual display device to execute the steps of the method for determining a 6DOF pose based on 3D gesture recognition provided by the embodiments of the present application; the virtual display device may further include conventional hardware not shown, such as left and right lenses, speakers, and microphones.
The processor referred to in fig. 13 of the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
Referring to fig. 14, a functional block diagram of a virtual display device capable of implementing the method for determining a 6DOF pose based on 3D gesture recognition according to an embodiment of the present application includes a visual positioning module 1401, an acquisition module 1402, a coordinate system alignment module 1403, an IMU positioning module 1404, and a pose updating module 1405, where:
a visual positioning module 1401, configured to identify a gesture in a hand image acquired by a multi-camera of the virtual display device, and determine first 3D coordinates of four fingertips, except for a thumb, fixed at a preset position point of the 3DOF handle in a coordinate system of the virtual display device;
an acquisition module 1402, configured to acquire second 3D coordinates of the four fingertips in a 3DOF handle coordinate system;
A coordinate system alignment module 1403, configured to determine a relative pose relationship between the virtual display device and the 3DOF handle according to the first 3D coordinate and the second 3D coordinate to align a reference coordinate system of the virtual display device and the 3DOF handle;
An IMU positioning module 1404 configured to integrate measurement data of an IMU of the 3DOF handle from the reference coordinate system alignment time point, and determine an initial 6DOF pose of the 3DOF handle in the reference coordinate system;
A pose update module 1405, configured to perform visual positioning on the 3DOF handle, and update the initial 6DOF pose with a visual 6DOF pose.
The above functional modules cooperate to implement the method steps of determining a 6DOF pose based on 3D gesture recognition and achieve the same technical effect. The specific implementation of each functional module is described in the foregoing embodiments and is not repeated here.
Embodiments of the present application also provide a computer readable storage medium storing instructions that, when executed, perform the method of the previous embodiments.
The embodiment of the application also provides a computer program product for storing a computer program for executing the method of the previous embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.