Combined scene experience generation system and device based on XR technology
Technical Field
The invention relates to the technical field of extended reality, and in particular to a combined scene experience generation system and device based on XR technology.
Background
Extended reality (XR) encompasses forms such as augmented reality (AR), virtual reality (VR) and mixed reality (MR), and its emergence has made interaction in virtual space possible. XR scene technology has made significant progress in many areas, including entertainment, medicine, education, training and simulation. These techniques provide an immersive virtual environment in which a user can interact with digitized characters and objects. However, an effective virtual environment requires high-quality digital images and visual effects as well as user-friendly interactivity; when participants in a multi-person interaction do not share a common language, that interaction becomes difficult, and in practice only users speaking the same language can interact in a multi-person scene, so the adaptability of existing systems is limited.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a combined scene experience generation system and device based on XR technology, which solve the problems identified in the background art.
In order to achieve the above purpose, the invention is realized by the following technical scheme: a combined scene experience generation system and device based on XR technology comprises a user interaction module, a modeling module, a scene data management module, a device adaptation support module, a sound effect simulation module, a network communication module and a content updating module;
The user interaction module is used for acquiring interactive voice data and limb-action images of a target user, identifying the user's intention, and converting the obtained intention into instructions the system can understand, so as to realize interactive operation on the virtual scene; the user interaction module further comprises a feedback module for providing visual and auditory feedback to enhance the user's immersion;
The modeling module is used for scanning the environment of the target scene area to obtain environment depth data and spatial layout data, and performing data cleaning to obtain target three-dimensional data; spatial features are extracted from the target three-dimensional data and the environment depth data, and the depth information is registered through an iterative closest point (ICP) algorithm to obtain an initial spatial model;
The scene data management module is used for storing, loading and editing geometric data in the XR scene; it supports the FBX, OBJ and GLTF data formats, is provided with data compression and caching mechanisms to improve loading speed and system performance, and also includes editing tools that allow developers to quickly create and modify scenes;
The device adaptation support module is used for identifying the type and performance of the device, ensuring that the XR scene experience system can run normally on various devices; the module automatically adjusts rendering parameters and interaction modes to ensure the consistency and fluency of the user experience;
The sound effect simulation module is used for simulating sound effects in the XR scene, including environmental sound, background music and sound source localization; high-quality sound effect processing further enhances the user's immersion, and the module supports various audio formats and coding modes and has real-time processing and mixing capabilities;
The network communication module is used for realizing real-time communication and data synchronization between users, through which a user can interact and cooperate with other users in the virtual scene; the network communication module supports various network protocols and transmission modes to ensure the stability and real-time performance of data transmission;
The content updating module is used for managing and maintaining content resources in the XR scene experience system; by regularly releasing update packages and bug-fix patches it keeps the system in an optimal state and provides users with the latest functions and experience; the content updating module also provides full user support and service to solve problems encountered by users during use;
Preferably, the modeling module further includes a 3D rendering engine module configured to calculate, in real time, the position, lighting and material properties of objects in the virtual space according to the position and viewing angle of the user, and to calculate, in real time, the color and brightness of each pixel based on the simulated light information so as to generate a high-quality image; the 3D rendering engine module further has an efficient performance optimization mechanism to ensure that a smooth frame rate can be maintained in complex scenes.
Preferably, to obtain the simulated light information used by the 3D rendering engine module, a shadow area of the simulated-light virtual object is identified to obtain the shadow area; blurred-edge data of the shadow area are constructed to obtain simulated edge data, and at the same time the light attenuation data of the shadow area are calculated through a light attenuation calculation formula, wherein A(P) is the light attenuation data, I_source is the light intensity data, P denotes a point of the shadow area in the simulated-light virtual object, d is the distance from the point P to the light source, σ(x) is the medium absorption coefficient of the simulated-light virtual object, μ is the central position parameter of the shadow region, σ_shadow is the shadow-area parameter, k is the shape parameter of the shadow region, and α is the scattering coefficient;
The light receiving data, the simulated edge data and the light attenuation data are then input into a Monte Carlo ray tracing algorithm to generate the simulated light information, thereby obtaining the simulated light information of the simulated XR scene.
Preferably, for gesture recognition the user interaction module captures gesture data of the user through a depth-sensing or image-capture device (such as an RGB-D camera or a depth camera), records video or image data while the user performs various gestures, removes noise from the images or video to improve data quality, and segments the gesture region from the background for subsequent processing.
Preferably, the user gesture data are converted into a uniform size and format to facilitate comparison and analysis; shape features of the gesture such as contours, boundaries and corner points are extracted, dynamic features such as motion trajectory, speed and acceleration are analyzed, and the gestures are classified using a trained machine learning model (such as a support vector machine or a neural network); the model is trained with a labeled gesture data set so that different gestures can be recognized; one or more associated instructions or operations are defined for each gesture; when the model recognizes a gesture of the user, the gesture is converted into the corresponding instruction according to the gesture mapping relation, and the generated instruction is sent to the XR system or application program to execute the corresponding operation.
Preferably, the modeling module performs data cleaning on the environment depth data and the spatial layout data: the spatial layout data are sharpened through a depth-image super-resolution algorithm to obtain corrected layout data; edge smoothing is performed on the corrected layout data to obtain smoothed layout data; missing data are filled into the cleaned depth data to obtain filled depth data; and the smoothed layout data and the filled depth data are input into a point cloud conversion algorithm for spatial data conversion to obtain the target three-dimensional data.
Preferably, the user interaction module further comprises a binocular stereo correction module for improving the efficiency and accuracy of stereo matching and eliminating image distortion; the correction formula of the binocular stereo correction module is as follows:
dst(x, y) = src(map_x(x, y), map_y(x, y))
wherein src is the binocular camera image, dst(x, y) is the corrected binocular camera image, and map_x and map_y are the mapping matrices; in dst(x, y), x is the abscissa and y is the ordinate of the corrected binocular camera image, while in map_x(x, y) and map_y(x, y), x is the abscissa and y is the ordinate of the original binocular camera image.
Preferably, the device adaptation support module further comprises a gyroscope unit, an eyeball tracking unit and a display unit, wherein the gyroscope unit is used for collecting the rotation angle of the user's head, the eyeball tracking unit is used for tracking the user's eyeballs and determining the position at which the user's gaze stays, and the display unit comprises a voice recognition unit, a voice conversion unit, a frequency analysis unit, a voice distinguishing unit and a text display unit;
Preferably, the voice recognition unit is used for recognizing the voice signal of the user; the frequency analysis unit analyzes the frequency of the voice signal recognized by the voice recognition unit; the voice distinguishing unit judges, according to the analysis result of the frequency analysis unit, whether the voice signal belongs to the user; the voice conversion unit converts the voice signal belonging to the user into text information; and the text display unit displays the text information converted by the voice conversion unit, in the form of subtitles, on the display units of the other participants in the multi-person interaction scene.
Preferably, the device comprises an XR headset for realizing human-machine interaction; both sides of the XR headset are connected to a frame by which the XR headset is worn on the user's head; headphones are electrically connected inside the XR headset, the headphones on both sides of the XR headset being used to feed back the audio of the virtual scene; a display is embedded in the front face of the XR headset and feeds back real-time pictures of the virtual scene; and a scanning device is mounted on top of the XR headset for scanning the user's gesture instructions.
The invention provides a combined scene experience generation system and device based on XR technology, which have the following beneficial effects:
By providing the subtitle display module, the system can identify the frequency of a user's voice signal and convert the voice signal into text information; during multi-person interaction the text display unit shows this text information on the display units of the other participants, so that what each participant says can be presented as subtitles. Multi-person communication therefore remains possible even when the network is poor or the participants do not share a language, giving the multi-person interaction system wider applicability and stronger adaptability. In addition, the light receiving data, the simulated edge data and the light attenuation data are input into a Monte Carlo ray tracing algorithm to generate simulated light information of the simulated XR scene, so that the sense of light in the virtual scene is more realistic.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present invention;
Fig. 2 is a schematic diagram of the module structure of the present invention.
In the figures: 1-XR headset, 2-frame, 3-scanning device, 4-headphone, 5-display.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The system comprises a user interaction module, a modeling module, a scene data management module, a device adaptation support module, a sound effect simulation module, a network communication module and a content updating module;
The user interaction module is used for acquiring the interactive voice data and limb-action images of the target user, identifying the user's intention to obtain the target user's intention, and converting that intention into instructions the system can understand so as to realize interactive operation of the virtual scene; the user interaction module further comprises a feedback module for providing visual and auditory feedback to enhance the user's immersion;
The modeling module is used for scanning the environment of the target scene area to obtain environment depth data and spatial layout data, and performing data cleaning to obtain target three-dimensional data; spatial features are extracted from the target three-dimensional data and the environment depth data, and the depth information is registered through an iterative closest point (ICP) algorithm to obtain an initial spatial model;
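As an illustrative, non-limiting sketch of the registration step, the following Python code shows one common way to implement iterative-closest-point alignment of the scanned depth points to a reference cloud, using nearest-neighbour correspondences and an SVD-based rigid transform; the array shapes, iteration limit and convergence threshold are assumptions made for the example and are not prescribed by the system.

    # Minimal ICP registration sketch (illustrative; thresholds and shapes are assumptions).
    import numpy as np
    from scipy.spatial import cKDTree

    def best_rigid_transform(src, dst):
        """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        H = (src - src_c).T @ (dst - dst_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                 # fix an improper (reflected) rotation
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = dst_c - R @ src_c
        return R, t

    def icp(source, target, max_iter=50, tol=1e-6):
        """Align source (N,3) to target (M,3); returns the aligned cloud and a 4x4 pose."""
        tree = cKDTree(target)
        pose = np.eye(4)
        current = source.copy()
        prev_err = np.inf
        for _ in range(max_iter):
            dist, idx = tree.query(current)      # nearest-point correspondences
            R, t = best_rigid_transform(current, target[idx])
            current = current @ R.T + t
            step = np.eye(4); step[:3, :3] = R; step[:3, 3] = t
            pose = step @ pose
            err = dist.mean()
            if abs(prev_err - err) < tol:        # stop once the mean error stabilizes
                break
            prev_err = err
        return current, pose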
The scene data management module is used for storing, loading and editing geometric data in the XR scene; it supports the FBX, OBJ and GLTF data formats, is provided with data compression and caching mechanisms to improve loading speed and system performance, and also includes editing tools that allow developers to quickly create and modify scenes;
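As an illustrative sketch of the loading and caching behaviour described above, the example below memoizes parsed geometry per file and dispatches on file extension; only a minimal OBJ vertex reader is implemented, and the FBX and GLTF entries are hypothetical placeholders rather than parsers actually provided by the system.

    # Illustrative cached geometry loader; the FBX/GLTF entries are hypothetical placeholders.
    import functools
    import pathlib

    def load_obj_vertices(path):
        """Read 'v x y z' vertex lines from a Wavefront OBJ file."""
        verts = []
        with open(path, "r", encoding="utf-8") as fh:
            for line in fh:
                if line.startswith("v "):
                    verts.append(tuple(float(v) for v in line.split()[1:4]))
        return verts

    def _not_shown(path):
        raise NotImplementedError(f"parser for {path} is not part of this sketch")

    LOADERS = {".obj": load_obj_vertices, ".fbx": _not_shown, ".gltf": _not_shown}

    @functools.lru_cache(maxsize=128)            # simple in-memory cache for repeated loads
    def load_geometry(path_str):
        suffix = pathlib.Path(path_str).suffix.lower()
        return LOADERS[suffix](path_str)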
The device adaptation support module is used for identifying the type and the performance of the device, ensuring that the XR scene experience system can normally operate on various devices, and automatically adjusting rendering parameters and interaction modes so as to ensure the consistency and the fluency of user experience;
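A minimal sketch of how rendering parameters might be selected from coarse device signals is given below; the quality presets and the detection criteria (CPU core count, processor architecture) are assumptions for illustration and do not limit the adaptation logic of the module.

    # Illustrative device-adaptation sketch; the preset values are assumptions, not the claimed logic.
    import os
    import platform

    PRESETS = {
        "low":    {"render_scale": 0.70, "shadows": False, "target_fps": 60},
        "medium": {"render_scale": 0.85, "shadows": True,  "target_fps": 72},
        "high":   {"render_scale": 1.00, "shadows": True,  "target_fps": 90},
    }

    def pick_preset():
        """Choose rendering parameters from coarse, portable signals (CPU count, architecture)."""
        cores = os.cpu_count() or 2
        mobile = platform.machine().lower().startswith(("arm", "aarch"))
        if mobile and cores <= 4:
            return PRESETS["low"]
        if cores >= 8 and not mobile:
            return PRESETS["high"]
        return PRESETS["medium"]

    print(pick_preset())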
The sound effect simulation module is used for simulating sound effects in the XR scene, including environmental sounds, background music and sound source localization; high-quality sound effect processing further enhances the user's immersion, and the module supports various audio formats and coding modes and has real-time processing and mixing capabilities;
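The following sketch illustrates one simple form of sound source localization: inverse-square distance attenuation combined with a constant-power stereo pan derived from the horizontal angle of the source relative to the listener; the attenuation model and listener geometry are assumptions made for the example.

    # Illustrative sound-source localization sketch: distance attenuation plus constant-power
    # stereo panning; the falloff law and coordinate convention are assumptions.
    import numpy as np

    def spatialize(mono, source_xyz, listener_xyz, max_gain=1.0):
        """Return an (N,2) stereo buffer for a mono signal given source/listener positions."""
        offset = np.asarray(source_xyz, float) - np.asarray(listener_xyz, float)
        dist = max(np.linalg.norm(offset), 1e-3)
        gain = min(max_gain, 1.0 / (dist * dist))          # inverse-square falloff
        azimuth = np.arctan2(offset[0], offset[2])         # horizontal angle of the source
        pan = np.clip(azimuth / (np.pi / 2), -1.0, 1.0)    # -1 = hard left, +1 = hard right
        left = np.cos((pan + 1.0) * np.pi / 4)             # constant-power pan law
        right = np.sin((pan + 1.0) * np.pi / 4)
        return np.stack([mono * gain * left, mono * gain * right], axis=1)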
The network communication module is used for realizing real-time communication and data synchronization between users; through the network communication module, users can interact and cooperate with other users in the virtual scene; the module supports various network protocols and transmission modes to ensure the stability and real-time performance of data transmission;
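As an illustrative sketch of real-time state synchronization, the example below exchanges JSON-encoded pose updates over UDP using only the Python standard library; the message schema (user identifier, position, rotation) and the port number are assumptions for the example, and the module may equally use other protocols and transmission modes as stated above.

    # Illustrative real-time state-sync sketch over UDP (standard library only).
    import json
    import socket

    def send_pose(sock, addr, user_id, position, rotation):
        """Broadcast one user's pose update to a peer or relay at addr."""
        msg = {"user": user_id, "pos": position, "rot": rotation}
        sock.sendto(json.dumps(msg).encode("utf-8"), addr)

    def receive_poses(bind_addr=("0.0.0.0", 9999)):
        """Blocking receive loop that yields decoded pose updates from other users."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(bind_addr)
        while True:
            data, _ = sock.recvfrom(4096)
            yield json.loads(data.decode("utf-8"))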
The content updating module is used for managing and maintaining content resources in the XR scene experience system; by regularly releasing update packages and bug-fix patches it keeps the system in an optimal state and provides users with the latest functions and experience; the content updating module also provides full user support and service to solve problems encountered by users during use;
The modeling module further comprises a 3D rendering engine module, which calculates in real time the position, lighting and material properties of objects in the virtual space according to the position and viewing angle of the user, and calculates in real time the color and brightness of each pixel based on the simulated light information to generate a high-quality image; the 3D rendering engine module also has an efficient performance optimization mechanism to ensure that a smooth frame rate can be maintained in complex scenes.
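A minimal sketch of the per-pixel colour and brightness calculation is shown below, assuming simple Lambertian shading from a single point light; the inputs (per-pixel positions, normals and albedo) are assumed to be available from the rendering pipeline, and the shading model is illustrative only.

    # Illustrative per-pixel shading sketch: Lambertian colour/brightness from a point light.
    import numpy as np

    def shade(points, normals, albedo, light_pos, light_intensity=1.0):
        """points/normals/albedo: (H,W,3) arrays; returns an (H,W,3) image in [0,1]."""
        to_light = light_pos - points
        dist = np.linalg.norm(to_light, axis=-1, keepdims=True)
        to_light = to_light / np.maximum(dist, 1e-6)
        ndotl = np.clip(np.sum(normals * to_light, axis=-1, keepdims=True), 0.0, None)
        radiance = light_intensity * ndotl / np.maximum(dist * dist, 1e-6)
        return np.clip(albedo * radiance, 0.0, 1.0)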
To obtain the simulated light information used by the 3D rendering engine module, a shadow area of the simulated-light virtual object is identified to obtain the shadow area; blurred-edge data of the shadow area are constructed to obtain simulated edge data, and at the same time the light attenuation data of the shadow area are calculated through a light attenuation calculation formula, wherein A(P) is the light attenuation data, I_source is the light intensity data, P denotes a point of the shadow area in the simulated-light virtual object, d is the distance from the point P to the light source, σ(x) is the medium absorption coefficient of the simulated-light virtual object, μ is the central position parameter of the shadow region, σ_shadow is the shadow-area parameter, k is the shape parameter of the shadow region, and α is the scattering coefficient;
The light receiving data, the simulated edge data and the light attenuation data are then input into a Monte Carlo ray tracing algorithm to generate the simulated light information, thereby obtaining the simulated light information of the simulated XR scene.
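Because the attenuation formula is referenced above only through its symbols, the following sketch should be read as one plausible interpretation rather than the claimed formula: it applies Beer-Lambert-style absorption of the source intensity over the distance d, and averages light visibility over random samples on an area light (a basic Monte Carlo estimate) to produce the blurred shadow edges mentioned above; the occluder test is supplied by the caller.

    # One plausible interpretation of the attenuation/soft-shadow step (assumptions throughout).
    import numpy as np

    rng = np.random.default_rng(0)

    def attenuation(i_source, d, sigma, alpha=0.0):
        """A(P) ~ I_source * exp(-(sigma + alpha) * d): absorption plus out-scattering."""
        return i_source * np.exp(-(sigma + alpha) * d)

    def soft_shadow(point, light_center, light_radius, occluder_test, n_samples=64):
        """Fraction of light reaching `point` from an area light; occluder_test(a, b) -> bool."""
        hits = 0
        for _ in range(n_samples):
            jitter = rng.normal(scale=light_radius, size=3)       # sample around the light centre
            if not occluder_test(point, light_center + jitter):   # shadow ray not blocked
                hits += 1
        return hits / n_samples

    # usage: received light = attenuation(...) * soft_shadow(...), fed to the shading step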
For gesture recognition, the user interaction module captures gesture data of the user through a depth-sensing or image-capture device (such as an RGB-D camera or a depth camera), records video or image data while the user performs various gestures, removes noise from the images or video to improve data quality, and segments the gesture region from the background for subsequent processing.
The user gesture data are converted into a uniform size and format to facilitate comparison and analysis; shape features of the gesture such as contours, boundaries and corner points are extracted, dynamic features such as motion trajectory, speed and acceleration are analyzed, and the gestures are classified using a trained machine learning model (such as a support vector machine or a neural network); the model is trained with a labeled gesture data set so that different gestures can be recognized; one or more associated instructions or operations are defined for each gesture; when the model recognizes a gesture of the user, the gesture is converted into the corresponding instruction according to the gesture mapping relation, and the generated instruction is sent to the XR system or application program to execute the corresponding operation.
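An illustrative sketch of the classification step is given below, using Hu-moment shape descriptors computed with OpenCV and a support vector machine from scikit-learn; the gesture labels, the gesture-to-command mapping and the choice of descriptor are assumptions made for the example.

    # Illustrative gesture-classification sketch: shape features (Hu moments) fed to an SVM.
    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def hand_features(mask):
        """mask: binary uint8 image of the segmented hand; returns a 7-D Hu-moment descriptor."""
        hu = cv2.HuMoments(cv2.moments(mask)).flatten()
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)   # log-scale for comparable magnitudes

    def train_classifier(masks, labels):
        X = np.array([hand_features(m) for m in masks])
        clf = SVC(kernel="rbf", gamma="scale")
        clf.fit(X, labels)
        return clf

    GESTURE_TO_COMMAND = {"open_palm": "menu_open", "fist": "grab", "point": "select"}  # assumed mapping

    def gesture_to_instruction(clf, mask):
        """Map a recognized gesture to its associated instruction (or a no-op)."""
        gesture = clf.predict([hand_features(mask)])[0]
        return GESTURE_TO_COMMAND.get(gesture, "noop")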
The modeling module performs data cleaning on the environment depth data and the spatial layout data: the spatial layout data are sharpened through a depth-image super-resolution algorithm to obtain corrected layout data; edge smoothing is performed on the corrected layout data to obtain smoothed layout data; missing data are filled into the cleaned depth data to obtain filled depth data; and the smoothed layout data and the filled depth data are input into a point cloud conversion algorithm for spatial data conversion to obtain the target three-dimensional data.
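One straightforward realization of the point cloud conversion is pinhole back-projection of the filled depth image, sketched below; the camera intrinsics (fx, fy, cx, cy) are assumed to be known from calibration, and treating zero depth as a missing pixel is an assumption of the example.

    # Illustrative depth-to-point-cloud conversion using a pinhole camera model.
    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        """depth: (H,W) array in metres (0 = missing); returns an (N,3) point cloud."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]          # drop pixels marked as missing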
The user interaction module further comprises a binocular stereo correction module for improving the efficiency and accuracy of stereo matching and eliminating image distortion; the correction formula of the binocular stereo correction module is as follows:
dst(x, y) = src(map_x(x, y), map_y(x, y))
wherein src is the binocular camera image, dst(x, y) is the corrected binocular camera image, and map_x and map_y are the mapping matrices; in dst(x, y), x is the abscissa and y is the ordinate of the corrected binocular camera image, while in map_x(x, y) and map_y(x, y), x is the abscissa and y is the ordinate of the original binocular camera image.
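The correction formula above is the standard remapping relation used in image rectification; the sketch below shows how a binocular pair could be rectified with OpenCV, assuming that the calibration results (K1, D1, K2, D2, R, T) have already been obtained from a prior stereo calibration step.

    # Illustrative binocular rectification sketch with OpenCV (calibration inputs assumed given).
    import cv2

    def rectify_pair(img_left, img_right, K1, D1, K2, D2, R, T):
        """Return the rectified left/right images for a calibrated stereo camera pair."""
        size = img_left.shape[1], img_left.shape[0]          # (width, height)
        R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
        map_lx, map_ly = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
        map_rx, map_ry = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
        # dst(x, y) = src(map_x(x, y), map_y(x, y)) -- the correction formula above
        left = cv2.remap(img_left, map_lx, map_ly, cv2.INTER_LINEAR)
        right = cv2.remap(img_right, map_rx, map_ry, cv2.INTER_LINEAR)
        return left, right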
The device adaptation support module further comprises a gyroscope unit, an eyeball tracking unit and a display unit, wherein the gyroscope unit is used for collecting the rotation angle of the user's head, the eyeball tracking unit is used for tracking the user's eyeballs and determining the position at which the user's gaze stays, and the display unit comprises a voice recognition unit, a voice conversion unit, a frequency analysis unit, a voice distinguishing unit and a text display unit;
The voice recognition unit is used for recognizing the voice signal of the user; the frequency analysis unit analyzes the frequency of the voice signal recognized by the voice recognition unit; the voice distinguishing unit judges, according to the analysis result of the frequency analysis unit, whether the voice signal belongs to the user; the voice conversion unit converts the voice signal belonging to the user into text information; and the text display unit displays the text information converted by the voice conversion unit, in the form of subtitles, on the display units of the other participants in the multi-person interaction scene.
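A minimal sketch of the frequency analysis and voice distinguishing steps is given below: the fundamental frequency of a speech frame is estimated by autocorrelation, and the frame is accepted only if its pitch falls within the enrolled user's range; the pitch search band, the user's pitch range and the speech_to_text hook are assumptions, the latter standing in for whatever speech recognition engine the voice conversion unit actually uses.

    # Illustrative frequency-analysis / voice-distinguishing sketch; speech_to_text is a
    # hypothetical hook for the ASR engine used by the voice conversion unit.
    import numpy as np

    def speech_to_text(frame, sample_rate):
        """Hypothetical ASR hook; a real system would call its speech recognition engine here."""
        raise NotImplementedError

    def fundamental_frequency(frame, sample_rate):
        """Rough pitch estimate from the autocorrelation peak in a 50-400 Hz search band."""
        frame = frame - frame.mean()
        corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(sample_rate / 400), int(sample_rate / 50)
        lag = lo + int(np.argmax(corr[lo:hi]))
        return sample_rate / lag

    def subtitle_for(frame, sample_rate, user_f0_range=(85.0, 255.0)):
        """Return subtitle text if the frame's pitch matches the enrolled user, else None."""
        f0 = fundamental_frequency(frame, sample_rate)
        if not (user_f0_range[0] <= f0 <= user_f0_range[1]):
            return None                          # judged not to belong to this user
        return speech_to_text(frame, sample_rate)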
According to the above technical solution, when the user's gaze moves from the center point to another point, the gyroscope unit detects the horizontal angle by which the XR headset has turned from its initial state, and the eyeball tracking unit detects the horizontal angle by which the user's iris has rotated from the central point; the gyroscope and the eyeball tracking unit treat leftward rotation of the XR headset and of the eyeball as a negative angle and rightward rotation as a positive angle, and the movement analysis unit calculates, from these two angles, the distance L by which the user's gaze moves horizontally in the multi-person interaction scene;
The current position at which the user's gaze stays is therefore obtained by moving horizontally by the distance L from the center point of the multi-person interaction scene and then vertically by the distance L'. Through this calculation, the position at which the user's gaze currently stays in the multi-person interaction scene can be determined accurately, and the multi-person interaction scene can then be controlled by means of the user's iris; because the user's position in the multi-person interaction scene changes irregularly, this calculation allows the gaze position to be judged accurately and facilitates determining where the user's gaze rests;
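Since the distance formula itself is not reproduced above, the following is only a plausible reconstruction under the assumption of a flat interaction plane at a known distance D from the user, with leftward rotations negative and rightward rotations positive as described: L = D * tan(theta_head + theta_eye).

    # A plausible reconstruction (not the claimed formula) of the horizontal gaze shift,
    # assuming a flat interaction plane at distance D: L = D * tan(theta_head + theta_eye).
    import math

    def horizontal_gaze_shift(theta_head_deg, theta_eye_deg, plane_distance):
        """Leftward angles are negative, rightward angles positive, as described above."""
        return plane_distance * math.tan(math.radians(theta_head_deg + theta_eye_deg))

    # e.g. head turned 10 degrees right, eyes 5 degrees right, plane 2.0 m away:
    # horizontal_gaze_shift(10, 5, 2.0) ~ 0.54 m to the right of the centre point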
By providing the subtitle display module, the system can identify the frequency of the user's voice signal and convert the voice signal into text information; during multi-person interaction the text display unit shows this text information on the display units of the other participants, so that what each participant says can be presented as subtitles. Multi-person communication therefore remains possible even when the network is poor or the participants do not share a language, giving the multi-person interaction system wider applicability and stronger adaptability. The light receiving data, the simulated edge data and the light attenuation data are input into a Monte Carlo ray tracing algorithm to generate simulated light information of the simulated XR scene, so that the sense of light in the virtual scene is more realistic.
While the fundamental and principal features of the invention and advantages of the invention have been shown and described, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted for clarity only, and the specification should be taken as a whole, the technical solutions in the embodiments being combinable as appropriate to form other implementations that will be apparent to those skilled in the art.