Disclosure of Invention
The 4D Gaussian splatting-based reinforcement learning self-adaptive multi-mode SLAM method provided by the application aims to solve the problems existing in the prior art. It provides an overall solution that further combines 4D GS and reinforcement learning on the basis of an existing multi-sensor fusion SLAM system: the sensor modality is autonomously selected according to real-time environment information through reinforcement learning, while map updating and optimization are carried out using 4D GS and integrated into the SLAM system, so as to improve the working efficiency and real-time performance of the whole real-time multi-mode SLAM system while saving resources.
In order to achieve the above design objective, the reinforcement learning adaptive multi-mode SLAM method based on 4D Gaussian splatting includes the following steps:
step 1, multi-source sensor input and data preprocessing;
Receiving point cloud data from a laser radar, RGB image data from a depth camera and IMU data, and performing data preprocessing;
Step 2, selecting a sensor combination type;
Based on the reinforcement learning sensor selection module, the environment state is learned in real time through a reinforcement learning strategy network, the most reliable sensor combination type is dynamically selected, and the sensor is activated as required;
step 3, multi-source sensor data fusion processing;
based on the data fusion processing module, the sensor combination is used for carrying out multi-source sensor data fusion processing;
step 4, updating and optimizing the map;
Based on the map updating and back-end optimization module, the map data provided by the front-end data fusion processing module is used, in combination with 4D GS, to update and optimize the Gaussian map of the environment, and error correction and map optimization are carried out through loop detection.
Further, the step 1 comprises the following steps,
Step 1.1, data input;
the SLAM system synchronizes the point cloud data received from the laser radar, the RGB image data from the depth camera and the IMU data in time;
step 1.2, incorporating time information;
a time stamp is introduced in the data preprocessing stage, so that each laser radar point cloud and each image frame carries a time stamp;
step 1.3, calibrating multi-sensor external parameters;
calibrating the external parameters between the sensors by using the Kalibr framework;
Step 1.4, initializing a reinforcement learning module;
the reinforcement learning sensor selection module is initialized in the data preprocessing stage; by analyzing historical data and environmental characteristics it learns a strategy for selecting the optimal sensors under different scene conditions, and adopts a random exploration strategy in the initial state.
Further, the step 2 comprises the following steps,
Step 2.1, designing a state space;
defining environmental information which can be observed by the reinforcement learning system at each decision moment;
Step 2.2, defining an action space formula;
defining the actions by which the system switches sensor modes and adjusts sensor parameters in order to optimize data acquisition and system performance;
The SLAM system adopts a mixed discrete-continuous space to realize fine control, and the mode selection part is discrete and comprises three sensor dominant modes, namely a depth camera-IMU mode, a laser radar-depth camera mode and a laser radar-IMU-depth camera mode;
The weight distribution part is continuous and comprises a sensor gain coefficient and a resource limiting parameter, wherein the sensor gain coefficient is the weight of the laser radar, the depth camera and the IMU, and the weight range is between 0 and 1, and the expression is as follows:
(9)
The resource limiting parameter is the laser radar sampling rate, the range is between 5Hz and 40Hz, and the expression is as follows:
(10)
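The expressions for (9) and (10) are not reproduced in the text; a plausible form of the continuous action components, consistent with the stated ranges (the symbols below are chosen for illustration only and do not appear in the application), is:

```latex
% Illustrative form of the continuous action components.
% w_lidar, w_cam, w_imu: sensor gain coefficients; f_lidar: lidar sampling rate.
\mathbf{w} = (w_{\mathrm{lidar}},\, w_{\mathrm{cam}},\, w_{\mathrm{imu}}),
\qquad w_{i} \in [0,\,1]
\qquad\qquad
f_{\mathrm{lidar}} \in [5\,\mathrm{Hz},\, 40\,\mathrm{Hz}]
```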
step 2.3, designing the reward function;
Including primary rewards, auxiliary rewards, and penalty items;
step 2.4, designing a network architecture;
The system comprises a state encoder which is used as the basis of the whole network and is responsible for converting the multi-mode sensor data into a unified characteristic representation so that the subsequent strategy network and the value network can make decisions and evaluations based on the characteristics;
the strategy network is responsible for outputting specific action selection according to the coded state characteristics, wherein the specific action selection comprises discrete action modes and continuous parameter adjustment;
the value network is used for evaluating the advantages and disadvantages of the current state and action combination, providing learning signals for the strategy network and helping the strategy network to optimize the decision process;
Step 2.5, executing a training strategy;
through an efficient data collection mechanism, an intelligent exploration strategy and strict safety learning constraint, a comprehensive training framework is provided for the self-adaptive selection module of the reinforcement learning sensor, which comprises,
The data collection mechanism provides experience data interacted with the environment for the system and is the basis of learning and optimizing strategies;
exploration strategies, deciding how the system effectively explores in an unknown or partially known environment to obtain more information and experience;
Safety learning constraints ensure that the system not only pursues high performance in the learning and decision process, but also meets a series of safety and practicality requirements.
Further, the step 2.1 comprises the steps of,
Geometric dynamic characteristic design comprising laser radar point cloud density gradient and dynamic object coverage rate;
the laser radar point cloud density gradient reflects the density change of the point cloud in space, helps the system to perceive the geometric structure and potential obstacle of the environment, quantifies the observation confidence of the laser radar in the local area, has the expression as follows,
(1)
where the terms denote, respectively, the number of lidar hits within a unit voxel, the voxel volume, and the normalized range of the reflected intensity;
calculating the coverage rate of dynamic objects to represent the proportion and distribution of the dynamic objects in the scene, so that the system can know the dynamic characteristics of the environment in time, the expression is as follows,
(2)
where the terms denote, respectively, the estimated velocity vector of the dynamic object, the projected area of the dynamic object detection frame, and the field-of-view area of the sensor;
Sensory reliability, including IMU confidence and radar-visual depth consistency, the IMU confidence reflects the confidence of the inertial measurement unit data, calculated according to its noise level and bias stability, the higher the result value, the more reliable the IMU data is indicated, the expression is as follows:
(3)
where the terms denote, respectively, the zero-bias estimation error of the gyroscope and the accelerometer noise standard deviation (computed over a sliding window);
The radar-vision depth consistency measures the agreement between the vision sensor depth and the radar depth data; it is calculated by comparing the depth measurements of the two sensors, and the higher the consistency, the better coordinated the two depth sources are; the expression is as follows:
(4)
where the terms denote, respectively, the observed (radar) depth distribution histogram, the depth distribution histogram of the depth camera, and the intersection ratio of their effective areas;
The system resource state monitoring comprises real-time computing power load and energy residual budget, wherein the real-time computing power load reflects the current computing pressure of the system and helps a module to determine whether the sensor data processing amount needs to be reduced or the computing task needs to be simplified, and the expression is as follows:
(5)
where the terms denote, respectively, the currently used GPU memory, the total GPU memory, the single-frame rendering time, and the desired frame period;
the energy residual budget indicates the energy reserve condition of the system, so that the module selects a sensor or adjusts the working mode based on the current energy residual budget, and the expression is as follows:
(6)
where the terms denote, respectively, the current remaining battery capacity, the initial total battery capacity, the elapsed system running time, and a battery attenuation factor;
The space-time context comprises a historical decision memory and a scene category probability, wherein the historical decision memory records the success rate and effect of past sensor selection decisions, helps a module to learn from experience, avoids repeated errors and improves the decision efficiency, and the expression is as follows:
(7)
where the terms denote, respectively, the action taken at a given past step and a long short-term memory (LSTM) network;
The scene category probability calculates the probability of belonging to a low-texture scene, a limited view constraint scene and a dynamic scene according to the current environment characteristics, so that a module can adjust a sensor selection strategy according to scene characteristics, and the expression is as follows:
(8)
where the terms denote, respectively, the image-text joint embedding vector extracted by the CLIP model and a learnable weight matrix; the resulting distribution is normalized so that it sums to 1.
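For readability, the scene-category probability of equation (8) described above amounts to a softmax over a linear projection of the CLIP embedding; a plausible form (symbols illustrative only) is:

```latex
% Softmax over a learnable projection of the CLIP image-text embedding.
p_{\mathrm{scene}} = \operatorname{softmax}\!\left(W\, f_{\mathrm{CLIP}}\right),
\qquad \sum_{c} p_{\mathrm{scene}}(c) = 1
```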
Further, the step 2.3 comprises,
The main rewards comprise positioning accuracy rewards and energy efficiency economic rewards which are respectively used for improving the positioning accuracy of the system and the energy utilization efficiency, and the expression is as follows:
(11)
(12)
where ATE represents the absolute trajectory error, and the remaining terms denote, respectively, the total instantaneous power consumption and a bonus compensation term applied when the remaining battery capacity exceeds a threshold;
Auxiliary rewards, including rendering quality rewards and strategy stability rewards, are aimed at improving visual effect of scene reconstruction and ensuring continuity and reliability of strategy, and are expressed as follows:
(13)
(14)
where SSIM represents the structural similarity index, LPIPS represents the learned perceptual image patch similarity, and the remaining terms denote, respectively, an action-mutation penalty and a policy network parameter smoothness constraint;
the punishment items comprise fatal punishment errors and resource overload punishment and are used for avoiding unreasonable sensor selection and excessive consumption of resources, so that a system is guided to learn an optimal sensor selection strategy to realize comprehensive optimization of system performance, and the expression is as follows:
(15)
(16)
The comprehensive rewards calculation formula is as follows:
(17)
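The expression for (17) is not reproduced above; based on the components listed in step 2.3, a plausible composite reward (the weights λ are illustrative and not taken from the application) is:

```latex
% Weighted combination of main rewards, auxiliary rewards and penalty terms.
R = \lambda_{1} R_{\mathrm{loc}} + \lambda_{2} R_{\mathrm{energy}}
  + \lambda_{3} R_{\mathrm{render}} + \lambda_{4} R_{\mathrm{stab}}
  - \lambda_{5} P_{\mathrm{fatal}} - \lambda_{6} P_{\mathrm{overload}}
```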
Further, the step3 comprises the following steps,
The reinforcement learning sensor selection module adaptively selects the most suitable sensor mode according to the current environmental conditions and system state, taking into account the influence of each parameter on the mode selection; the weight function is expressed as follows:
S = w1·(lidar confidence) + w2·(dynamic object coverage) + w3·(IMU confidence) + w4·(radar-vision consistency) + w5·(computing load) + w6·(energy remaining budget)    (21)
where w1, w2, w3, w4, w5 and w6 respectively represent the weights of the lidar confidence, the dynamic object coverage, the IMU confidence, the radar-vision consistency, the computing load and the energy remaining budget;
step 3.1, a depth camera-IMU mode;
when, according to the current environmental characteristics, the reinforcement learning sensor selection module finds that the weight function S is greater than or equal to the judgment threshold, the SLAM system selects the depth camera and IMU sensors, with the depth camera as the dominant sensor and the IMU as the auxiliary sensor;
Step 3.2 lidar-depth camera mode;
when, according to the current environmental characteristics, the weight function meets the corresponding judgment threshold, the SLAM system selects the laser radar and depth camera sensors, with the depth camera as the dominant sensor and the laser radar as the auxiliary sensor;
Step 3.3, a multi-sensor equalization mode of the laser radar-IMU-depth camera;
when, according to the current environmental characteristics, the weight function satisfies the corresponding judgment condition, the SLAM system selects the sensor mode combining the laser radar, the depth camera and the IMU;
And 3.4, after the reinforcement learning sensor selection module executes actions and interacts with the environment, recording historical decision memory through the rewarding function module, evaluating through the value network, and optimizing updating of the whole strategy network.
Further, the step 3.1 comprises,
Step 3.1.1, data input and key frame selection;
The frames containing new scene structures or salient feature increases are preferentially selected as key frames, so that the integrity and efficiency of the map are improved;
step 3.1.2, performing point cloud registration by adopting generalized ICP tracking;
the RGB image and the depth image of the current frame are selected to generate a point cloud, and the covariance matrix of each point is calculated; the GICP algorithm then estimates the relative pose transformation between the source point cloud of the current frame and the target map point cloud;
Step 3.1.3. IMU pre-integration;
The IMU high-frequency measurement value is fused, the inter-frame motion pre-integral quantity is generated and used as an initial guess of GICP, the tracking efficiency is improved, and the robustness and the accuracy of the system are enhanced by combining with a GICP algorithm;
step 3.1.4, updating the pre-integration;
and adopting GICP optimized results to correct drift of IMU integration caused by noise and deviation along with time.
Further, the step 3.2 comprises the following steps,
Step 3.2.1, data input: an image from the depth camera and a point cloud from the lidar;
integration is performed using the calibrated extrinsic parameters, and the time-aligned LiDAR point cloud is converted into a depth image, as expressed below:
(38)
where the terms denote, respectively, the lidar point cloud, the rotation matrix and translation vector from the lidar to the camera coordinate system, and the camera intrinsic matrix;
Step 3.2.2 using an incremental error minimization function;
ensuring accurate correspondence between planes and points, the formula is as follows:
(39)
where the terms denote, respectively, a point in the lidar point cloud, that point transformed into the world coordinate system using the pose estimate from the previous iteration, the Gaussian center closest to the transformed point, the normal of that Gaussian, the weight of the point, and a regularization term used to enhance the stability and precision of the error function by accounting for the directional error between normal vectors;
a regularization term is introduced to enhance the stability and accuracy of the error function and to account for the error in the normal direction; its expression is as follows:
(40)
where the term denotes the normal of the current Gaussian distribution;
step 3.2.3 weight function calculation;
The weight function is calculated as follows:
a. determining the Gaussian centers within a local sphere: find all nearest Gaussian distribution centers inside a sphere defined by its center and radius;
b. calculating the density function of the Gaussian points, which is computed by the following formula:
(41)
where the term denotes a reconstructed covariance matrix, constructed by selecting the smallest variance along the normal direction and a larger variance in the perpendicular directions;
c. simplifying the density function: to accelerate computation, the density function is simplified during tracking:
(42)
d. consistency calculation: for each point, calculate the consistency between the normal of the current Gaussian distribution and the local average normal;
e. texture complexity calculation: compute the local texture complexity for the image region corresponding to each radar point; the radar point is projected into the camera pixel coordinate system to obtain its position in the image, a 16×16 image block centered at this position is extracted and converted to grayscale, and the variance of the pixel intensities is calculated:
(43)
where the terms denote, respectively, the pixel intensity and the mean value of the image block;
converting the variance to a texture weight of 0-1 by a sigmoid function:
(44)
where the term denotes a scaling factor used to adjust the variance sensitivity;
f. final weight function: the final weight function is defined as the product of the normal consistency, the density function and the texture complexity.
Further, the step 3.3 comprises the following steps,
Step 3.3.1, synchronizing the data input with the hardware;
the laser radar provides sparse but high-precision 3D point cloud, the depth camera captures RGB textures of a scene, and the IMU outputs angular speed and acceleration at high frequency for motion prediction, so that strict alignment of data time stamps of the three are ensured, and time sequence drift is avoided;
Step 3.3.2, key frame selection is carried out through depth camera input;
selecting representative frames from the continuous data stream as key frames, reducing redundancy calculation;
step 3.3.3. IMU data is used for state propagation;
predicting the current state through IMU pre-integration from the input IMU data and the state of the last key frame; the prediction is used for state estimation and for forward prediction in motion de-distortion;
step 3.3.4, de-distorting the laser radar input;
using the continuous pose predicted by the IMU, each laser radar point is converted from the local coordinate system at its scanning moment to the global coordinate system, as expressed below:
(45)
where the terms denote, respectively, the acquisition time stamp of the point and the pose at that moment obtained by IMU interpolation.
Further, the step (4) comprises the following steps,
Step 4.1, maintaining a sliding window;
Step 4.2, 4D Gaussian distribution;
step 4.3, introducing an optical flow to solve the 4D GS overfitting problem;
step 4.4, updating and optimizing the map;
Step 4.5 loop detection;
The module can detect potential loop candidates by extracting laser radar and visual features and generating feature descriptors;
confirming a loop hypothesis by using geometric verification and consistency check;
and feeding the verified loop constraint back to the map optimization process to globally optimize the map structure.
In summary, the reinforcement learning self-adaptive multi-mode SLAM method based on 4D Gaussian splatting has the following advantages and beneficial effects:
1. On the basis of the existing multi-sensor fusion SLAM system, the method and the system select proper sensors in a self-adaptive mode according to specific environmental characteristics through reinforcement learning, have remarkable environmental adaptability and resource optimization capacity, improve positioning accuracy and map construction quality, reduce power consumption and calculation complexity, and improve robustness, efficiency and resource utilization rate of the system.
2. The application integrates the 4D Gaussian splatting technology into the multi-mode SLAM system, expands the scene adaptability of the SLAM system, and has obvious advantages over traditional SLAM techniques in rendering speed, memory efficiency, dynamic adaptability, robustness, resource efficiency, global consistency and other aspects.
3. The 4D GS adopted by the application can efficiently represent and process complex changes in dynamic scenes by adding the time dimension on the basis of the 3D GS. Meanwhile, an optical flow is introduced into the 4D GS to solve the problem of over-fitting, so that the method can capture the motion and deformation of a dynamic object, can remarkably reduce the memory occupation and the calculation complexity through sparse Gaussian distribution representation, and improves the stability and generalization capability of a model.
4. According to the application, more priori information is provided through the optical flow, and the excessive fitting of the deformation field network of the 4D GS to noise in training data is limited, so that the deformation field network learns a Gaussian point deformation mode which is more reasonable and more in line with a physical rule, and the stability and generalization capability of the model are improved.
5. The application is particularly suitable for mobile terminal equipment with limited resources and complex and changeable dynamic scenes, and provides a more efficient and more accurate solution for the application of the SLAM system in the dynamic environment.
Detailed Description
In order to further illustrate the technical means adopted by the present application for achieving the preset design purpose, the following preferred embodiments are presented in conjunction with the accompanying drawings.
In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1, a system applying the adaptive multi-mode SLAM method of the present application includes a reinforcement learning sensor selection module, a data fusion processing module, a map updating and back-end optimization module, and a loop detection and global map optimization module. The SLAM system supports three sensor combinations, namely depth camera-IMU, lidar-depth camera, and lidar-IMU-depth camera.
As shown in fig. 2 to 7, the reinforcement learning adaptive multi-mode SLAM method based on 4D Gaussian splatting includes the following steps:
step 1, multi-source sensor input and data preprocessing;
Receiving point cloud data from a laser radar, RGB image data from a depth camera and IMU data, and performing data preprocessing;
Step 1.1, data input;
The SLAM system synchronizes the time of receiving the point cloud data from the laser radar, the RGB image data from the depth camera and the IMU data so as to ensure that all data under the same time stamp can be accurately corresponding;
step 1.2, incorporating time information;
to handle dynamic scenarios, time stamps are introduced during the data preprocessing phase; each laser radar point cloud and each image frame carries a time stamp so that dynamic changes of the scene can be accurately tracked and compensated in subsequent processing, thereby providing the necessary temporal context for the deformation field network of the 4D GS;
step 1.3, calibrating multi-sensor external parameters;
calibrating the external parameters between the sensors by using the Kalibr framework;
Step 1.4, initializing a reinforcement learning module;
The method comprises the steps of initializing a reinforcement learning sensor selection module in a data preprocessing stage, and learning a strategy for selecting an optimal sensor under different scene conditions by analyzing historical data and environmental characteristics;
Step 2, selecting a sensor combination type;
Based on the reinforcement learning sensor selection module, the environment state is learned in real time through a reinforcement learning strategy network, including but not limited to data such as illumination, dynamic object density, sensor noise and the like, the most reliable sensor type is dynamically selected and the sensor is activated as required so as to adapt to a high-dynamic complex and changeable environment;
Step 2.1, designing a state space;
Defining environmental information that the reinforcement learning system can observe at each decision moment, for guiding the sensor selection strategy, including aspects,
Geometric dynamic characteristic design comprising laser radar point cloud density gradient and dynamic object coverage rate;
the laser radar point cloud density gradient reflects the density change of the point cloud in space, helps the system to perceive the geometric structure and potential obstacle of the environment, quantifies the observation confidence of the laser radar in the local area, has the expression as follows,
(1)
where the terms denote, respectively, the number of lidar hits within a unit voxel, the voxel volume, and the normalized range of the reflected intensity;
calculating the coverage rate of dynamic objects to represent the proportion and distribution of the dynamic objects in the scene, so that the system can know the dynamic characteristics of the environment in time, the expression is as follows,
(2)
where the terms denote, respectively, the estimated velocity vector of the dynamic object, the projected area of the dynamic object detection frame, and the field-of-view area of the sensor;
the two characteristics provide key information for the reinforcement learning module together, so that the reinforcement learning module can optimize a sensor selection strategy according to the static and dynamic characteristics of the environment, and the adaptability and the accuracy of the system in a complex environment are improved;
Sensory reliability, including IMU confidence and radar-visual depth consistency, the IMU confidence reflects the confidence of the inertial measurement unit data, calculated according to its noise level and bias stability, the higher the result value, the more reliable the IMU data is indicated, the expression is as follows:
(3)
where the terms denote, respectively, the zero-bias estimation error of the gyroscope and the accelerometer noise standard deviation (computed over a sliding window);
The radar-vision depth consistency measures the agreement between the vision sensor depth and the radar depth data; it is calculated by comparing the depth measurements of the two sensors, and the higher the consistency, the better coordinated the two depth sources are; the expression is as follows:
(4)
where the terms denote, respectively, the observed (radar) depth distribution histogram, the depth distribution histogram of the depth camera, and the intersection ratio of their effective areas;
The sensor data reliability enhancement method and the sensor data reliability enhancement system provide key information for the reinforcement learning sensor module together, assist the reinforcement learning sensor module to optimize sensor selection and weight distribution, and improve the perceptibility and decision accuracy of the system in a complex environment;
The system resource state monitoring comprises real-time computing power load and energy residual budget, wherein the real-time computing power load reflects the current computing pressure of the system and helps a module to determine whether the sensor data processing amount needs to be reduced or the computing task needs to be simplified, and the expression is as follows:
(5)
where the terms denote, respectively, the currently used GPU memory, the total GPU memory, the single-frame rendering time, and the desired frame period;
The energy residual budget indicates the energy storage condition of the system, so that the module can preferentially select a sensor with lower energy consumption or adjust a working mode when the energy is limited, and the expression is as follows:
(6)
where the terms denote, respectively, the current remaining battery capacity, the initial total battery capacity, the elapsed system running time, and a battery attenuation factor;
The combined action of the two can ensure that the system meets the performance requirement and simultaneously realizes the efficient utilization of resources and the reasonable management of energy consumption;
The space-time context comprises a historical decision memory and a scene category probability, wherein the historical decision memory records the success rate and effect of past sensor selection decisions, helps a module to learn from experience, avoids repeated errors and improves the decision efficiency, and the expression is as follows:
(7)
where the terms denote, respectively, the action taken at a given past step and a long short-term memory (LSTM) network;
The scene category probability calculates the possibility that the scene belongs to different categories, such as long corridor, weak texture area and the like, according to the current environmental characteristics, so that the module can adjust the sensor selection strategy according to the scene characteristics, and the expression is as follows:
(8)
where the terms denote, respectively, the image-text joint embedding vector extracted by the CLIP model and a learnable weight matrix; the resulting distribution is normalized so that it sums to 1;
The combination of the sensor and the sensor can make the sensor selection more intelligent and has strong adaptability;
Step 2.2, defining an action space formula;
including defining all possible actions that the system can perform, i.e., adjusting sensor parameters to optimize data acquisition and system performance;
The SLAM system adopts a mixed discrete-continuous space to realize fine control, and the mode selection part is discrete and comprises three sensor leading modes, namely a mode 1 is a depth camera-IMU mode and is suitable for the conditions of better illumination conditions and relatively stable environment structure, a mode 2 is a laser radar-depth camera mode and is suitable for environments with more dynamic objects, low illumination or no illumination scenes and the like, a mode 3 is a laser radar-IMU-depth camera mode and is suitable for complex and changeable environments, long-time running systems, high-precision positioning requirement scenes and the like;
The weight distribution part is continuous and comprises a sensor gain coefficient and a resource limiting parameter, wherein the sensor gain coefficient is the weight of the laser radar, the depth camera and the IMU, and the weight range is 0 to 1, and the expression is as follows
(9)
The resource limiting parameter is the laser radar sampling rate, the range is between 5Hz and 40Hz, and the expression is as follows:
(10)
The action space formula provides a mathematical framework for adjusting the sensor configuration for the system, so that the system can flexibly decide in a dynamic environment, and balance the data quality and the resource consumption, thereby improving the overall performance of the system;
step 2.3, designing the reward function;
The reward function guides the SLAM system to learn an optimal sensor selection strategy and thereby achieve overall optimization of system performance: it improves positioning accuracy and map construction quality while taking resource utilization efficiency into account and avoiding excessive consumption, and its penalty terms constrain the system behavior to prevent unreasonable sensor selection, ensuring that the whole SLAM system operates stably and efficiently in complex and changeable environments.
Including primary rewards, auxiliary rewards, and penalty items;
The main rewards comprise positioning accuracy rewards and energy efficiency economic rewards which are respectively used for improving the positioning accuracy of the system and the energy utilization efficiency, and the expression is as follows:
(11)
(12)
where ATE represents the absolute trajectory error, and the remaining terms denote, respectively, the total instantaneous power consumption and a bonus compensation term applied when the remaining battery capacity exceeds a threshold;
step 2.3.2 auxiliary rewards;
the method comprises rendering quality rewards and strategy stability rewards, aims at improving visual effects of scene reconstruction and ensuring continuity and reliability of strategies, and has the following expression:
(13)
(14)
where SSIM represents the structural similarity index, LPIPS represents the learned perceptual image patch similarity, and the remaining terms denote, respectively, an action-mutation penalty and a policy network parameter smoothness constraint;
the punishment items comprise fatal punishment errors and resource overload punishment and are used for avoiding unreasonable sensor selection and excessive consumption of resources, so that a system is guided to learn an optimal sensor selection strategy to realize comprehensive optimization of system performance, and the expression is as follows:
(15)
(16)
The structure can realize that the reward function comprehensively guides the system to optimize the selection of the sensor, thereby improving the overall performance;
The comprehensive rewards calculation formula is as follows:
(17)
step 2.4, designing a network architecture;
as the core of the reinforcement learning sensor selection module, it can decide how the system processes multi-modal sensor data and makes intelligent decisions, including,
A state encoder, which is the basis of the whole network, is responsible for converting the multimodal sensor data into a unified representation of features so that subsequent policy and value networks can make decisions and evaluations based on these features, including,
A. Backbone network: the ResNet-18 architecture is used to process visual features. ResNet-18 is a classical deep residual network that can effectively extract high-level semantic information from images while alleviating the degradation problem of deep networks through residual connections, ensuring the training effect and feature extraction capability of the network.
B. Branch network: a point cloud Transformer is used to extract geometric features. The Transformer architecture has advantages in processing sequence data and can capture global dependencies in point cloud data, extracting more representative geometric features and helping the system understand the three-dimensional structure of the environment.
C. Timing network: Bi-LSTM is used to process IMU sequences. Bi-LSTM can simultaneously consider past and future information of the IMU data and capture bidirectional dependencies in the time series, thereby modeling the dynamic characteristics of the IMU data more accurately and providing stable attitude estimation for the system.
And the strategy network is responsible for outputting specific action selection according to the coded state characteristics, wherein the specific action selection comprises discrete action modes and continuous parameter adjustment. Wherein,
A. Mode selection branch: adopts Gumbel-Softmax to output discrete actions. Gumbel-Softmax balances continuous relaxation and discrete sampling, so that the network can effectively learn discrete action choices during training while keeping gradients differentiable, making it suitable for discrete decision tasks such as selecting the dominant sensor mode.
B. Parameter adjustment branch: outputs continuous weights using a Tanh activation function. The Tanh activation limits the output to the range [-1, 1]; with appropriate scaling and offset it can be mapped to the desired continuous parameter space, such as the sensor gain coefficients and the sampling rate, enabling fine adjustment of the sensor parameters.
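As a hedged sketch of the two policy branches described above (the class name, layer sizes and rescaling constants are assumptions for illustration, not taken from the application), a PyTorch-style implementation might look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyHead(nn.Module):
    """Sketch of the policy branches: discrete mode selection via
    Gumbel-Softmax and continuous parameter adjustment via Tanh."""
    def __init__(self, feat_dim: int = 256, n_modes: int = 3, n_params: int = 4):
        super().__init__()
        self.mode_logits = nn.Linear(feat_dim, n_modes)   # 3 sensor-dominant modes
        self.param_head = nn.Linear(feat_dim, n_params)   # 3 gains + lidar rate

    def forward(self, state_feat: torch.Tensor, tau: float = 1.0):
        # Discrete branch: differentiable sampling of the sensor mode.
        mode_onehot = F.gumbel_softmax(self.mode_logits(state_feat), tau=tau, hard=True)
        # Continuous branch: Tanh output in [-1, 1], rescaled afterwards,
        # e.g. gains to [0, 1] and lidar sampling rate to [5, 40] Hz.
        raw = torch.tanh(self.param_head(state_feat))
        gains = 0.5 * (raw[..., :3] + 1.0)             # -> [0, 1]
        lidar_rate = 5.0 + 17.5 * (raw[..., 3] + 1.0)  # -> [5, 40] Hz
        return mode_onehot, gains, lidar_rate
```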
The value network is used for evaluating the advantages and disadvantages of the current state and the action combination, providing a learning signal for the strategy network to help the strategy network optimize the decision process,
A. Dueling DQN structure: separates the state value and the advantage function. The Dueling DQN structure divides the value network into two streams, estimating the state value function (V) and the advantage function (A) separately, and then obtains the final Q value through a specific combination. This separation enables the network to evaluate the relative merit of different actions in the current state more accurately, improving learning efficiency and stability.
B. Multi-head attention dynamically weights multi-modal features. The multi-head attention mechanism can pay attention to different aspects of different modal characteristics at the same time, dynamically adjust the weight of each modal characteristic according to the current task demand, realize effective fusion of multi-modal information and enhance the adaptability of the network to complex environments.
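For reference, the dueling decomposition mentioned above is the published Dueling DQN formulation (not an equation quoted from the application):

```latex
Q(s,a) = V(s) + A(s,a) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a')
```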
Step 2.5, executing a training strategy;
through an efficient data collection mechanism, an intelligent exploration strategy and strict safety learning constraint, a comprehensive training framework is provided for the self-adaptive selection module of the reinforcement learning sensor, which comprises,
The data collection mechanism provides experience data for interaction with the environment for the system and is the basis for learning and optimizing strategies. In particular, the method comprises the steps of,
A. Environment deployment: the Gazebo simulation environment is selected, providing high-fidelity scene simulation and flexible sensor configuration options for testing and training the robot algorithm.
B. Sampling frequency: real-time sampling is set to 120 FPS; the high frame rate captures finer environmental dynamics and robot motion states, providing richer information for subsequent data processing and strategy learning.
The exploration strategy determines how the system can effectively explore in an unknown or partially known environment to obtain more information and experience. In particular, the method comprises the steps of,
A. Adaptive ε-greedy exploration: as training progresses, the exploration rate ε gradually decreases, so the system transitions from broad exploration to exploiting the learned knowledge, balancing exploration and exploitation and improving strategy performance; the expression is as follows:
(18)
b. Adding state prediction error rewards, namely estimating the next state through a prediction model and taking the prediction error as the intrinsic rewards, wherein the expression is as follows:
(19)
where the terms denote, respectively, the intrinsic reward obtained when the system takes an action in a given state, an adjustment factor controlling the intensity of the intrinsic reward, the system's prediction of the next state estimated by a forward model, and the actual next-state feature;
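The expressions for (18) and (19) are not reproduced above; plausible forms that match the described behaviour (symbols illustrative only) are an exponentially decaying exploration rate and a forward-model prediction-error reward:

```latex
% Illustrative epsilon decay and curiosity-style intrinsic reward.
\epsilon_{t} = \epsilon_{\min} + (\epsilon_{\max} - \epsilon_{\min})\, e^{-k t}
\qquad\qquad
r^{\mathrm{int}}_{t} = \eta \,\bigl\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \bigr\rVert^{2}
```

where ε_min, ε_max and k set the decay schedule, η is the adjustment factor, \hat{φ}(s_{t+1}) the forward-model prediction and φ(s_{t+1}) the actual next-state feature.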
Safety learning constraints ensure that the system not only pursues high performance in the learning and decision process, but also meets a series of safety and practicality requirements. In particular, the method comprises the steps of,
Action Masking, which disables the selection of high power actions that exceed the battery capacity, i.e., masking out those actions in the action space that would cause the battery to drain quickly. The system is ensured to consider the energy limitation in the decision process, unreasonable high-power consumption behaviors are avoided, and the practicability and the sustainability of the system are improved.
B. Policy gradient correction, the expression is as follows:
(20)
where the terms denote, respectively, the gradient of the probability distribution of the policy network's output actions with respect to the network parameters, the action value function, and a baseline function.
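Given the terms defined for (20), the corrected policy gradient presumably takes the standard baseline-subtracted form (a reconstruction for illustration, not quoted from the application):

```latex
\nabla_{\theta} J(\theta) =
\mathbb{E}\Bigl[\nabla_{\theta} \log \pi_{\theta}(a \mid s)\,\bigl(Q(s,a) - b(s)\bigr)\Bigr]
```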
The application enables the sensor decision to achieve dynamic balance in time-space-energy triple dimensionality through the hierarchical state representation, the mixed action space and the depth combination of the multi-objective rewarding function, thereby realizing the advantages of achieving dynamic environment adaptation, resource optimization management, improving positioning and mapping precision, enhancing system robustness and supporting long-term stable operation of the SLAM system.
Step 3, multi-source sensor data fusion processing;
based on the data fusion processing module, the selected sensor combination is used to carry out multi-source sensor data fusion processing, providing rich and accurate data for the construction of the Gaussian map;
The reinforcement learning sensor selection module adaptively selects the most suitable sensor mode according to the current environmental conditions and system state, taking into account the influence of each parameter on the mode selection; the weight function is expressed as follows:
S = w1·(lidar confidence) + w2·(dynamic object coverage) + w3·(IMU confidence) + w4·(radar-vision consistency) + w5·(computing load) + w6·(energy remaining budget)    (21)
where w1, w2, w3, w4, w5 and w6 respectively represent the weights of the lidar confidence, the dynamic object coverage, the IMU confidence, the radar-vision consistency, the computing load and the energy remaining budget;
comprising the following steps:
step 3.1, a depth camera-IMU mode;
when, according to the current environmental characteristics, the lidar confidence is low, the dynamic object coverage is low, the IMU confidence is high, the radar-vision consistency is low, the computational load is moderate and the energy remaining budget is high, for example in environments with good illumination, a relatively stable structure, more mirror-like objects and fewer dynamic objects with rich textures, and the weight function S meets the judgment threshold, the SLAM system selects mode 1, i.e. the sensor combination of the depth camera and the IMU, with the depth camera as the dominant sensor and the IMU as the auxiliary sensor;
step 3.1.1, data input and key frame selection;
The frames containing new scene structures or salient feature increases are preferentially selected as key frames, so that the integrity and efficiency of the map are improved;
step 3.1.2, performing point cloud registration by adopting generalized ICP tracking (GICP);
the RGB image and the depth image of the current frame are selected to generate a point cloud, and the covariance matrix of each point is calculated; the GICP algorithm then estimates the relative pose transformation between the source point cloud of the current frame and the target map point cloud, which includes the following steps,
A. Distribution distance calculation: each point is modeled as a Gaussian distribution; after the source point cloud is transformed, its distance to the target point cloud is defined as follows:
(22)
The distribution is as follows:
(23)
where the terms denote, respectively, the coordinates of corresponding points in the target map and in the source point cloud, and the covariance matrices of the target and source point clouds;
b. maximum likelihood estimation by maximizing the log likelihood of the probability density function to solve for the optimal transformation:
(24)
The optimization objective is reduced to minimize the mahalanobis distance, expressed as follows:
(25)
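The expressions for (22) to (25) are missing above; for reference, the standard GICP formulation that matches the surrounding description (symbols illustrative) is:

```latex
% Distribution-to-distribution distance and Mahalanobis-distance objective.
d_i = b_i - T a_i, \qquad
d_i \sim \mathcal{N}\!\bigl(0,\; \Sigma^{b}_{i} + T\,\Sigma^{a}_{i}\,T^{\top}\bigr)
\qquad
T^{*} = \arg\min_{T} \sum_{i}
d_i^{\top}\bigl(\Sigma^{b}_{i} + T\,\Sigma^{a}_{i}\,T^{\top}\bigr)^{-1} d_i
```

where a_i and b_i are corresponding points of the source and target clouds and Σ their covariances.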
step 3.1.3 IMU pre-integration;
The IMU high-frequency measurement value is fused to generate the interframe motion pre-integral quantity as the initial guess of GICP, the tracking efficiency is improved, the system robustness and accuracy are enhanced by combining with GICP algorithm,
A. constructing an IMU measurement model original measured value:
(26)
(27)
where the terms denote, respectively, the raw acceleration and angular velocity measurements, the rotation from the IMU to the world coordinate system, the gravity vector, the sensor biases, and the sensor noise;
B. Motion model recursion: the recurrence formulas for the position, velocity and rotation are as follows:
(28)
(29)
(30)
c. calculating an IMU pre-integral quantity:
to avoid repeated integration, the relative inter-frame motion between two frames is defined, as expressed below:
(31)
(32)
(33)
where the terms denote, respectively, the pre-integrated position, velocity and rotation quantities, and the skew-symmetric matrix of the angular velocity.
D. Computing the relative transformation between successive frames: using the extrinsic parameters between the camera and the IMU, the relative transformation is converted into the camera coordinate system, providing GICP tracking with a good initial guess derived from the IMU pre-integration;
(34)
(35)
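The expressions for (31) to (35) are not reproduced above; the standard IMU pre-integration quantities between frames i and j, which match the surrounding description (notation illustrative), read:

```latex
% Pre-integrated relative rotation, velocity and position (standard form).
\Delta R_{ij} = \prod_{k=i}^{j-1} \operatorname{Exp}\bigl((\hat{\omega}_k - b_g)\,\Delta t\bigr),\qquad
\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik}\,(\hat{a}_k - b_a)\,\Delta t,\qquad
\Delta p_{ij} = \sum_{k=i}^{j-1}\Bigl[\Delta v_{ik}\,\Delta t
      + \tfrac{1}{2}\,\Delta R_{ik}\,(\hat{a}_k - b_a)\,\Delta t^{2}\Bigr]
```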
Step 3.1.4, updating the pre-integration;
The result of GICP optimization is used to correct the IMU integral drift over time due to noise and bias, including,
A. The camera pose optimized by GICP is expressed as follows:
(36)
where the terms denote, respectively, the extrinsic parameters from the camera to the IMU, and the position and rotation results of GICP tracking.
B. Updating the IMU state quantities: (37)
Step 3.2 lidar-depth camera mode;
when, according to the current environmental characteristics, the lidar confidence is high, the dynamic object coverage is high, the IMU confidence is low, the radar-vision consistency is medium, the computational load is large and the energy remaining budget is medium, for example in low-illumination or low-texture environments, and the weight function meets the corresponding judgment threshold, the SLAM system selects mode 2, i.e. the sensor combination of the laser radar and the depth camera, with the depth camera as the dominant sensor and the laser radar as the auxiliary sensor,
Step 3.2.1, data input: an image from the depth camera and a point cloud from the lidar;
integration is performed using the calibrated extrinsic parameters, and the time-aligned LiDAR point cloud is converted into a depth image, as expressed below:
(38)
where the terms denote, respectively, the lidar point cloud, the rotation matrix and translation vector from the lidar to the camera coordinate system, and the camera intrinsic matrix;
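As a hedged sketch of the projection described in step 3.2.1 (all function and variable names are illustrative, not from the application), the time-aligned LiDAR points could be converted into a depth image as follows:

```python
import numpy as np

def lidar_to_depth_image(points_lidar, R_lc, t_lc, K, h, w):
    """Project lidar points into the camera frame and rasterise a depth image.

    points_lidar : (N, 3) lidar points
    R_lc, t_lc   : extrinsic rotation (3x3) and translation (3,) lidar -> camera
    K            : (3, 3) camera intrinsic matrix
    """
    pts_cam = points_lidar @ R_lc.T + t_lc           # transform to camera frame
    depth = np.full((h, w), np.inf, dtype=np.float32)
    valid = pts_cam[:, 2] > 0                        # keep points in front of camera
    pts_cam = pts_cam[valid]
    uvw = pts_cam @ K.T                              # pinhole projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # keep the nearest depth per pixel
    np.minimum.at(depth, (v[inside], u[inside]), pts_cam[inside, 2])
    return depth
```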
Step 3.2.2 using an incremental error minimization function;
ensuring accurate correspondence between planes and points, the formula is as follows:
(39)
where the terms denote, respectively, a point in the lidar point cloud, that point transformed into the world coordinate system using the pose estimate from the previous iteration, the Gaussian center closest to the transformed point, the normal of that Gaussian, the weight of the point, and a regularization term used to enhance the stability and precision of the error function by accounting for the directional error between normal vectors;
a regularization term is introduced to enhance the stability and accuracy of the error function and to account for the error in the normal direction; its expression is as follows:
(40)
where the term denotes the normal of the current Gaussian distribution;
step 3.2.3 weight function calculation;
To distinguish between gaussian points generated solely by color supervision and gaussian points generated simultaneously by lidar depth, the system incorporates a weighting function. The weighting function combines the consistency of the normal vector, the density factor and the texture complexity to evaluate the reliability of different gaussian points. The weight function is calculated as follows:
a. determining the Gaussian centers within a local sphere: find all nearest Gaussian distribution centers inside a sphere defined by its center and radius;
b. calculating the density function of the Gaussian points, which is computed by the following formula:
(41)
where the term denotes a reconstructed covariance matrix, constructed by selecting the smallest variance along the normal direction and a larger variance in the perpendicular directions.
c. simplifying the density function: to accelerate computation, the density function is simplified during tracking:
(42)
d. consistency calculation: for each point, calculate the consistency between the normal of the current Gaussian distribution and the local average normal;
e. texture complexity calculation: compute the local texture complexity for the image region corresponding to each radar point; the radar point is projected into the camera pixel coordinate system to obtain its position in the image, a 16×16 image block centered at this position is extracted and converted to grayscale, and the variance of the pixel intensities is calculated:
(43)
where the terms denote, respectively, the pixel intensity and the mean value of the image block;
converting the variance to a texture weight of 0-1 by a sigmoid function:
(44)
where the term denotes a scaling factor used to adjust the variance sensitivity;
f. final weight function: the final weight function is defined as the product of the normal consistency, the density function and the texture complexity;
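A minimal sketch of the weight computation in step 3.2.3, combining the density, normal consistency and sigmoid texture weight (all names, the block handling and the scaling factor value are assumptions for illustration), could read:

```python
import numpy as np

def gaussian_point_weight(density, normal_gauss, normal_local_avg,
                          patch_gray, alpha=0.05):
    """Weight of a Gaussian point as the product of density, normal
    consistency and texture complexity (sketch, illustrative only)."""
    # Normal consistency: cosine similarity between the Gaussian normal
    # and the local average normal.
    n1 = normal_gauss / np.linalg.norm(normal_gauss)
    n2 = normal_local_avg / np.linalg.norm(normal_local_avg)
    consistency = abs(float(n1 @ n2))
    # Texture weight: pixel-intensity variance of the grayscale patch,
    # mapped to (0, 1) with a sigmoid; alpha adjusts variance sensitivity.
    var = float(np.var(patch_gray))
    texture_w = 1.0 / (1.0 + np.exp(-alpha * var))
    return consistency * density * texture_w
```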
Step 3.3, a multi-sensor equalization mode of the laser radar-IMU-depth camera;
when, according to the current environmental characteristics, the lidar confidence is medium, the dynamic object coverage is high, the IMU confidence is high, the radar-vision consistency is low, the computational load is large and the energy remaining budget is high, for example in complex and changeable dynamic environments or long-running scenarios with high-precision positioning requirements, and the weight function satisfies the corresponding judgment condition, the SLAM system selects mode 3, i.e. the sensor mode combining the laser radar, the depth camera and the IMU, which comprises,
Step 3.3.1, synchronizing the data input with the hardware;
the laser radar provides sparse but high-precision 3D point cloud, the depth camera captures RGB textures of a scene, and the IMU outputs angular speed and acceleration at high frequency for motion prediction, so that strict alignment of data time stamps of the three are ensured, and time sequence drift is avoided;
Step 3.3.2, key frame selection is carried out through depth camera input;
selecting representative frames from the continuous data stream as key frames, reducing redundancy calculation;
step 3.3.3. IMU data is used for state propagation;
predicting the current state through IMU pre-integration from the input IMU data and the state of the last key frame; the prediction is used for state estimation and for forward prediction in motion de-distortion;
step 3.3.4, de-distorting the laser radar input;
using the continuous pose predicted by the IMU, each laser radar point is converted from the local coordinate system at its scanning moment to the global coordinate system, as expressed below:
(45)
where the terms denote, respectively, the acquisition time stamp of the point and the pose at that moment obtained by IMU interpolation.
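The expression for (45) is not reproduced above; the de-distortion presumably applies the IMU-interpolated pose at each point's acquisition time in the standard way (notation illustrative):

```latex
% Map each lidar point from its local frame at acquisition time t_i to the global frame.
p^{\mathrm{global}}_{i} = R(t_{i})\, p^{\mathrm{local}}_{i} + t(t_{i})
```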
And 3.4, after the reinforcement learning sensor selection module executes actions and interacts with the environment, recording historical decision memory through the rewarding function module, evaluating through the value network, and optimizing updating of the whole strategy network.
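To summarise the interaction described in steps 3.1 to 3.4, the following minimal Python-style sketch illustrates one decision-and-update cycle; all module and function names (policy, compute_state, run_fusion, etc.) are placeholders invented here for illustration and are not part of the application.

```python
# Minimal sketch of the sensor-selection loop (illustrative names only).
def sensor_selection_loop(policy, value_net, replay_buffer, env):
    state = env.compute_state()                  # state features of step 2.1
    while env.running():
        mode, gains, lidar_rate = policy.select(state)            # mixed action, step 2.2
        fusion_result = env.run_fusion(mode, gains, lidar_rate)   # step 3.1 / 3.2 / 3.3
        next_state = env.compute_state()
        reward = env.reward(fusion_result)       # composite reward of step 2.3
        replay_buffer.add(state, (mode, gains, lidar_rate), reward, next_state)
        value_net.update(replay_buffer)          # evaluate state-action quality
        policy.update(value_net, replay_buffer)  # optimise the strategy network
        state = next_state
```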
Step 4, updating and optimizing the map;
Based on a map updating and rear-end optimizing module, utilizing rich map data provided by a front-end multi-mode sensor data fusion processing module, carrying out Gaussian map updating and optimizing on an environment map by combining 4D GS, and carrying out error correction and map optimizing through loop detection;
Comprises the steps of,
Step 4.1, maintaining a sliding window;
In the data fusion processing module, the system maintains a sliding window that screens the point clouds of the most recent 10 time frames in the Gaussian map to construct Gaussian points, while masking out the remaining Gaussian points. This selection ensures that the Gaussian points are relevant to the sub-map of current interest;
Step 4.2, 4D Gaussian distribution;
The Gaussian deformation field network is used to model the motion and shape change of the Gaussian distributions of dynamic objects. The network consists of an efficient spatio-temporal structure encoder and a multi-head Gaussian deformation decoder. The goal is to achieve efficient representation and real-time rendering of dynamic scenes by learning a Gaussian deformation field that transforms the canonical 3D Gaussian distribution into a new position and shape; the deformation field network of the 4D GS predicts the deformation of the current frame point cloud, as expressed below:
(46)
where the term denotes the three-dimensional Gaussian function after deformation.
In particular, the goal of the spatio-temporal structure encoder is to efficiently encode the spatial and temporal characteristics of the 3D Gaussian distribution. It consists of a multi-resolution HexPlane module and a small multi-layer perceptron (MLP).
The multi-resolution HexPlane module efficiently encodes the spatial and temporal features of the 3D Gaussian distribution by decomposing the 4D neural voxel grid into multiple 2D planes, which can be sampled and encoded at different resolutions. According to the input Gaussian map and the time stamp t, the center coordinates of the 3D Gaussian distribution G together with the time stamp t are used to query the multi-resolution plane modules, and the voxel features are obtained by bilinear interpolation. The encoder comprises 6 multi-resolution planar modules at different resolution levels.
Each planar module is defined by the hidden dimension of the feature and the basic resolution of the voxel grid.
The voxel characteristics are queried through bilinear interpolation, and the formula is as follows:
(47)
where the term denotes the neural voxel feature.
A small MLP is used to merge all the features:
(48)
where the term denotes the final feature representation.
In particular, a multi-headed gaussian deformation decoder D is used to decode the deformation of each 3D gaussian distribution from the characteristics obtained by the encoder. It consists of three independent MLPs, calculating the position, rotation and scaling deformations, respectively.
The position deformation head is used to calculate the position deformation; the formula is:
(49)
The rotation deformation head is used to calculate the rotation deformation; the formula is:
(50)
The scaling deformation head is used to calculate the scaling deformation; the formula is:
(51)
Applying these deformations to the original 3D Gaussian distribution yields the deformed Gaussian distribution; the formula is:
(52)
where the terms denote, respectively, the new position, the new rotation and the new scaling.
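The expressions for (49) to (52) are missing above; consistent with the decoder description, the deformed Gaussian is presumably obtained by adding the decoded offsets to the canonical parameters (symbols illustrative):

```latex
% Position, rotation and scaling offsets added to the canonical Gaussian.
(x', r', s') = (x + \Delta x,\; r + \Delta r,\; s + \Delta s)
```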
Step 4.3, introducing an optical flow to solve the 4D GS overfitting problem;
During real-time operation of the SLAM system, the 4D GS is prone to overfitting when constructing the environment map, owing to the large amount of newly appearing data and the influence of noise in the data.
Specifically, optical flow is adopted to capture pixel motion information in the time dimension, providing the model with an additional temporal consistency constraint, and the RAFT optical flow algorithm is used to calculate the pixel motion between adjacent time stamps. This comprises the following steps,
Step 4.3.1, inputting an image;
for each pair of images with adjacent time stamps, the optical flow is computed using the pre-trained RAFT network, and the pixel motion predicted by the 4D GS is obtained for comparison;
Step 4.3.2, constraining the deformation of the 3D Gaussian points through optical flow;
Ensuring that the dynamic part of model prediction is consistent with the optical flow, and constructing a loss function of optical flow constraint, wherein the expression is as follows:
(53)
where the terms denote, respectively, the observed optical flow computed by RAFT from the RGB images, and the pixel-level motion predicted by rendering the 4D GS deformation field;
Step 4.3.3, predicting the change of the Gaussian parameters based on the deformation field network of the 4D GS;
the deformation field network of the 4D GS predicts the changes of the Gaussian parameters, including the position change, the rotation change and the scaling change; for the k-th Gaussian, the deformed position is expressed as follows:
(54)
where the terms denote, respectively, the initial Gaussian position and the positional offset predicted by the deformation field;
step 4.3.4 projects the gaussian to the image plane through a differentiable SPLATTING process, calculating the pixel-level motion;
The expression is as follows:
(55)
step 4.3.5 introducing optical flow confidence to filter unreliable optical flow predictions;
to reduce the influence of noise in the optical flow, the confidence map provided by the optical flow algorithm is used, and the optical flow constraint loss is only applied to regions with higher confidence; the expression is as follows:
(56)
where the term denotes the value of the optical flow confidence at the corresponding location;
step 4.4, updating and optimizing the map;
the static 3D Gaussian distribution is initialized using the Structure-from-Motion (SfM) method; only the static 3D Gaussians are optimized for the first 3000 iterations, and image rendering is carried out with the 3D Gaussians instead of the 4D Gaussians, so that a reasonable initial 3D Gaussian distribution is learned, dynamic and static parts are separated, the burden of learning large deformations is reduced, and the numerical instability of directly optimizing the deformation field network is avoided;
For the construction of the loss function, a color loss supervises the training process during reconstruction, and the optical flow constraint loss and a grid-based total variation loss are also applied, as expressed below:
(57)
where the terms denote the weight parameters of the optical flow constraint loss and of the grid-based total variation loss, used to balance the effects of the different losses;
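The expression for (57) is not reproduced above; given the listed components, a plausible total loss (weight symbols illustrative) is:

```latex
% Color loss plus weighted optical-flow constraint and grid-based total variation loss.
\mathcal{L} = \mathcal{L}_{\mathrm{color}}
  + \lambda_{\mathrm{flow}}\,\mathcal{L}_{\mathrm{flow}}
  + \lambda_{\mathrm{tv}}\,\mathcal{L}_{\mathrm{tv}}
```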
Step 4.5 loop detection;
The module can detect potential loop candidates by extracting laser radar and visual features and generating feature descriptors;
confirming a loop hypothesis by using geometric verification and consistency check;
and feeding the verified loop constraint back to the map optimization process to globally optimize the map structure.
Through the loop detection step, the robustness and the efficiency of the SLAM system can be obviously improved, and the map quality and the positioning accuracy in long-time operation are ensured.
Similar technical solutions can be derived from the content presented above in connection with the figures and description. All such solutions that do not depart from the structure of the present application still fall within the scope of the claims of the present application.