CN119960464A - A multi-agent path planning method and system based on improved RND3QN network - Google Patents

A multi-agent path planning method and system based on improved RND3QN network

Info

Publication number
CN119960464A
Authority
CN
China
Prior art keywords
function
agent
rnd3qn
path planning
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202510437370.XA
Other languages
Chinese (zh)
Inventor
姚道金
任宇航
石杰
董文涛
高霄
李珍
余勇祥
薛宪法
陶丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to CN202510437370.XA
Publication of CN119960464A
Legal status: Withdrawn

Abstract

The application discloses a multi-agent path planning method and system based on an improved RND3QN network. The method comprises: data acquisition and preprocessing, with format conversion of the raw image, radar, and self-state data; construction of a convolutional neural network–long short-term memory architecture that extracts multi-source measurement feature vectors as the input of the RND3QN network; an improved RND3QN algorithm framework with a new reward function designed around multi-agent position constraints, trained through an incremental learning strategy to achieve collaborative path planning for multiple agents; an ε-greedy strategy and a Boltzmann probability selection strategy introduced to avoid convergence to local extrema; and an n-step temporal-difference algorithm adopted to reduce bias. The application improves the accuracy of path planning, reduces the collision probability during movement, shortens the tracking time, enhances the ability to avoid dynamic obstacles, and facilitates reasonable scheduling of multiple agents.

Description

Multi-agent path planning method and system based on improved RND3QN network
Technical Field
The invention belongs to the technical field of path planning, and particularly relates to a multi-agent path planning method and system based on an improved RND3QN network.
Background
In complex tasks such as security monitoring, patrol alarming, and home service, mobile agents play a very important role. Path planning is an important component of the mobile-agent field; its aim is to give the agent a travel route that reasonably avoids obstacles, takes little time, and is highly feasible. Conventional path planning algorithms include the artificial potential field algorithm, the DWA algorithm, the A* algorithm, the Dijkstra algorithm, the particle swarm algorithm, the artificial swarm algorithm, and others. Although each of these algorithms performs well in the field of mobile-agent path planning, they depend heavily on the availability of a global map and therefore have significant limitations.
In practical applications, path planning faces many challenges. First, the environment tends to be complex and variable, including static obstacles, dynamic obstacles, and uncertain environmental factors, and the dynamics and handling performance limitations of the agent itself also place higher demands on path planning. The dynamic path planning algorithm is mainly applicable to scenes with unknown or continuously changing environmental information. Common dynamic path planning algorithms include reinforcement learning-based path planning algorithms, deep learning-based path planning algorithms, and the like. The path planning algorithm based on deep learning utilizes the strong learning capability of the deep neural network to extract features from a large amount of data and predict an optimal path, and the method is excellent in processing high-dimensional data and complex environments.
In addition, as the demand for multi-agent or multi-vehicle cooperation grows, the multi-agent path planning problem has become a current research hotspot. Multi-agent path planning suffers from insufficient environmental adaptability, complex handling of cooperative conflicts, high consumption of computing resources, slow response to dynamic environments, and high difficulty of global optimization; these problems limit the efficiency and robustness of path planning algorithms and affect the overall performance and task-execution capacity of a multi-agent system. Multi-agent path planning must consider the interaction and cooperation among agents, improve planning speed and efficiency, develop more intelligent planning strategies, and enhance real-time capability and adaptability so as to achieve optimal performance of the whole system.
Disclosure of Invention
The invention provides a multi-agent path planning method and system based on an improved RND3QN network, which solve the technical problem of difficult interaction and cooperation between agents.
In a first aspect, the present invention provides a multi-agent path planning method based on an improved RND3QN network, including:
acquiring original image data, radar data and state data of an intelligent agent;
Performing feature extraction on the original image data, the radar data and the state data of the intelligent agent according to a preset convolutional neural network-long-short-term memory architecture to obtain a multi-source measurement feature vector;
inputting the multi-source measurement feature vector into an improved RND3QN network and training the improved RND3QN network through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema, and adopting an n-step temporal-difference algorithm to adjust path deviation;
and acquiring a real-time multi-source measurement feature vector, inputting it into the path planning model, and obtaining from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
In a second aspect, the present invention provides a multi-agent path planning system based on an improved RND3QN network, comprising:
the acquisition module is configured to acquire original image data, radar data and state data of the intelligent agent;
The extraction module is configured to perform feature extraction on the original image data, the radar data and the state data of the intelligent agent according to a preset convolutional neural network-long-short-term memory architecture to obtain a multi-source measurement feature vector;
the training module is configured to input the multi-source measurement feature vector into an improved RND3QN network and train the improved RND3QN network through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema, and adopting an n-step temporal-difference algorithm to adjust path deviation;
the output module is configured to acquire a real-time multi-source measurement feature vector, input it into the path planning model, and obtain from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
In a third aspect, an electronic device is provided that includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the improved RND3QN network-based multi-agent path planning method of any of the embodiments of the present invention.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, causes the processor to perform the steps of the multi-agent path planning method based on an improved RND3QN network of any of the embodiments of the present invention.
The application discloses a multi-agent path planning method and system based on an improved RND3QN (ε-greedy with n-step dueling double DQN) network. Data are acquired and preprocessed, with format conversion applied to the raw image, radar, and self-state data; a convolutional neural network–long short-term memory (CNN-LSTM) architecture is constructed to extract multi-source measurement feature vectors as the input of the RND3QN network; the improved RND3QN network framework uses a new reward function based on multi-agent position constraints, and the model is trained through an incremental learning strategy to realize collaborative path planning for a plurality of agents; an ε-greedy strategy and a Boltzmann probability selection strategy are introduced to avoid convergence to local extrema, and an n-step temporal-difference algorithm (n-step bootstrapping) is adopted to reduce bias. This improves path accuracy, reduces the collision probability during movement, shortens tracking time, enhances the avoidance of dynamic obstacles, and facilitates reasonable scheduling of multiple agents.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-agent path planning method based on an improved RND3QN network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a multi-agent path planning system based on an improved RND3QN network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flow chart of a multi-agent path planning method based on an improved RND3QN network according to the present application is shown.
As shown in fig. 1, the multi-agent path planning method based on the improved RND3QN network specifically includes the following steps:
Step S101, acquiring original image data, radar data, and state data of an agent.
In this step, a color image is obtained and resized;
The captured image is a 320×240 RGB three-channel color image and is easily affected by surrounding environmental factors, which introduce noise. The RGB image is acquired and fused with IMU data to obtain a road-surface visual RGB image, which is converted into an HSL image according to a preset format conversion strategy; the HSL feature map provides the RND3QN network with an illumination-invariant topological representation, improving the planning success rate of the path decision module in day–night switching scenes. The format conversion strategy (the standard RGB-to-HSL conversion) has the following expressions:

$$L = \frac{\max + \min}{2}$$

$$S = \begin{cases} \dfrac{\max - \min}{\max + \min}, & L \le 0.5 \\[4pt] \dfrac{\max - \min}{2 - \max - \min}, & L > 0.5 \end{cases}$$

$$H = \begin{cases} 60^{\circ} \times \dfrac{G - B}{\max - \min}, & \max = R \\[4pt] 60^{\circ} \times \left(2 + \dfrac{B - R}{\max - \min}\right), & \max = G \\[4pt] 60^{\circ} \times \left(4 + \dfrac{R - G}{\max - \min}\right), & \max = B \end{cases}$$

where R, G, B are the color channels, max and min are the maximum and minimum of the three channels, and H, S, L are respectively the hue, saturation, and lightness of the HSL image;
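As a concrete illustration of the conversion above, here is a minimal Python sketch; it leans on the standard-library colorsys module, which implements the same max/min formulas (note that colorsys orders its result as H, L, S rather than H, S, L).

```python
import colorsys

import numpy as np


def rgb_to_hsl(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to HSL channels in [0, 1]."""
    rgb = image.astype(np.float64) / 255.0
    hsl = np.empty_like(rgb)
    for i in range(rgb.shape[0]):
        for j in range(rgb.shape[1]):
            h, l, s = colorsys.rgb_to_hls(*rgb[i, j])  # same formulas as above
            hsl[i, j] = (h, s, l)  # reorder to hue, saturation, lightness
    return hsl


# Example: convert the 240 x 320 sensor frame described above.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
hsl_frame = rgb_to_hsl(frame)
```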
In the simulation environment, the laser radar sensor provides an array at each time step representing the distances from the agent to obstacles at 1° intervals over a 180° range, with the agent as the reference point, from which the distance and position information of objects in the scene is extracted. From the output of the simulated inertial measurement unit sensor, the agent's own state is calculated, including the agent's position, current velocity, angular velocity, distance to the target position, and angle to the target; these quantities are finally assembled into a single state vector.
And step S102, extracting features of the original image data, the radar data and the state data of the intelligent agent according to a preset convolutional neural network-long-short-term memory architecture to obtain a multi-source measurement feature vector.
In this step, the convolutional layers of a CNN extract spatial features from the input data: the convolution kernels filter the input, a ReLU function provides the nonlinear mapping, and the output of the convolutional layers then enters a pooling layer, which compresses and reduces dimensionality through downsampling while also mitigating overfitting.
Multiple stacked LSTM units capture long-term dependencies in the time-series data. The agent's four-dimensional state-vector attributes are combined with the radar data as the LSTM network input; the LSTM retains a memory of past patterns while processing the data, and its output is combined with the CNN output to guide navigation decisions.
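A minimal PyTorch sketch of such a CNN-LSTM extractor follows; every layer size, the 181-beam lidar width, and the fusion scheme are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn


class CnnLstmExtractor(nn.Module):
    """Fuses an HSL image (CNN branch) with state + lidar sequences (LSTM)."""

    def __init__(self, state_dim: int = 4, lidar_dim: int = 181,
                 lstm_hidden: int = 128, feat_dim: int = 256):
        super().__init__()
        # CNN branch: spatial features from the 3 x 240 x 320 image.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
        )
        # LSTM branch: stacked units over [state | lidar] vectors per step.
        self.lstm = nn.LSTM(state_dim + lidar_dim, lstm_hidden,
                            num_layers=2, batch_first=True)
        self.fuse = nn.Linear(feat_dim + lstm_hidden, feat_dim)

    def forward(self, image: torch.Tensor, seq: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, 240, 320); seq: (B, T, state_dim + lidar_dim)
        img_feat = self.cnn(image)
        seq_out, _ = self.lstm(seq)
        fused = torch.cat([img_feat, seq_out[:, -1]], dim=-1)
        return torch.relu(self.fuse(fused))  # multi-source feature vector
```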
A multi-source measurement feature vector extraction strategy (MMFS) is added because traditional algorithms handle dynamic-obstacle information poorly; the strategy improves the efficiency of extracting dynamic-obstacle information features. The strategy assigns different importance or influence to data points or regions through a weight function and a covariance function, both defined over the spatial distance P_i and the terrain roughness, together with localization functions of the agent at different spatial distances and under different terrain roughness. It accumulates the phase differences between consecutive time steps t = 0, 1, …, 5, each associated with its own vector basis in addition to the vector basis at the initial moment, and records the angle values at the corresponding moments;
In the multi-source measurement feature vector extraction strategy, a comprehensive environment perception model is built by fusing multiple kinds of sensor data; its core is to extract key information from complex data sources to support the subsequent path planning and decision processes. By mimicking the principle of human binocular vision, a binocular camera provides depth information and accurate two-dimensional pixel coordinates of the scene. This not only enriches the dimensions of the multi-source measurement feature vector but also supplies high-precision visual data for obstacle detection and obstacle-avoidance strategies in path planning. Specifically:
the path energy consumption at each moment is computed from the observation data at the camera pose, a weight function, a curvature characteristic function, two measurement thresholds, and a particular angular frequency, accumulating from the path energy consumption at the initial moment; the deflection angle, rotation angle, and angular frequency at the initial moment and at subsequent moments enter the computation as the pose evolves over time;
The global positioning efficiency is improved by computing the scale of feature points from a single image frame. Feature points are extracted from the left-eye image, tracked and located in the right-eye image by an optical flow method, and the feature points of the left and right images are then matched. After the binocular camera is calibrated, a set of correctly matched feature points is selected; from the homogeneous pixel coordinates (u, v, 1), the coordinates (x, y, z) in the camera coordinate system are calculated and the feature-point scale is obtained, using the pinhole projection:
$$s\,C = P\,X, \qquad P = K\,[R \mid t]$$

where C = (u, v, 1)ᵀ is the pixel coordinate under binocular vision, P is the projection matrix, X is the coordinate matrix of the left or right image, K is the corresponding intrinsic (internal-reference) matrix for the left or right camera, [R | t] is the corresponding extrinsic (external-parameter) matrix, s is the scale, and a distortion parameter corrects the projection.
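For illustration, a short OpenCV sketch of recovering (x, y, z) from matched left/right pixel coordinates, assuming the calibrated 3×4 projection matrices (intrinsics times extrinsics) are available; cv2.triangulatePoints is a standard OpenCV call, while the helper itself is a hypothetical wrapper.

```python
import cv2
import numpy as np


def triangulate(p_left: np.ndarray, p_right: np.ndarray,
                uv_left: np.ndarray, uv_right: np.ndarray) -> np.ndarray:
    """Triangulate matched pixels into camera-frame 3-D points.

    p_left / p_right: 3x4 projection matrices P = K [R | t].
    uv_left / uv_right: 2xN arrays of matched pixel coordinates.
    """
    homog = cv2.triangulatePoints(p_left, p_right,
                                  uv_left.astype(np.float64),
                                  uv_right.astype(np.float64))
    return (homog[:3] / homog[3]).T  # N x 3 points (x, y, z)
```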
The positioning function of the agent under different feature conditions is defined over the set of natural numbers and depends on the map edge information of each line feature point P_k, the distortion of the best position estimate, and the index k of the line feature point;
Non-iterative distortion compensation is introduced to calculate the distortion of the optimal position estimate; it is computed from the undistorted point cloud, the Jacobian matrix obtained by minimizing the weighted point-to-line and point-to-plane distance values, and a position transform coefficient;
The estimated position coordinates are optimized with the Gauss–Newton method, a left perturbation model is introduced, and the Jacobian matrix is constructed from the inter-frame space, the position transformation matrix, the weight of the line feature point P_k, the weight of the face feature point P_s, a 3×3 identity matrix, a 1×3 identity matrix, a 1×3 zero matrix, and a 3×3 zero matrix;
The line feature points are described in three dimensions and their antisymmetric matrix is extracted; the 5 points in the global map closest to each line feature point are selected, and two points are determined from the eigenvectors of those points. Similarly, the 5 points closest to each surface feature point are selected and normalized to form a plane, after which the residual matrix and the Jacobian matrix are constructed again.
When line features and surface features are acquired continuously, a weight function is introduced to balance the extraction effects of the line and surface features. For a line feature point P_k, the weight function is defined in terms of the distance between P_k and the global line feature map, the local roughness of P_k, the local roughness of its edge information, the global map edge information, and the line feature point set; for a surface feature point P_s, it is defined analogously in terms of the local roughness of the edge information of P_s and the face feature point set.
The position of the mobile agent in the global map is estimated by minimizing the distance between the feature points and the global features. The distance to the global line feature map is computed from a distance coefficient, the surface feature points, the line feature points, the face position of the neighboring point cloud, the line position, the point coordinates, the normal vector of the face, and the direction of the candidate line in the agent's location space.
Step S103, inputting the multi-source measurement feature vector into an improved RND3QN network and training the improved RND3QN network through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema, and adopting an n-step temporal-difference algorithm to adjust path deviation.
In this step, the greedy strategy quickly guides the agent to explore effective paths by selecting the currently optimal action, while the Boltzmann probability selection strategy introduces randomness in the later stage to ensure the agent can escape local optima and search for the globally optimal path.
Combining the memory effect of the magnetic memristor with the time-series modeling capability of the GRU (gated recurrent unit), a chaos generator with dynamic weight-adjustment capability is constructed. The GRU unit computes:

$$z_t = \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right)$$

$$r_t = \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right)$$

$$\tilde{h}_t = \tanh\!\left(W_h x_t + U_h\,(r_t \odot h_{t-1}) + b_h\right)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where z_t is the update function and r_t the reset function; W_z, W_r, W_h are the weight matrices of the update function, reset function, and hidden state, and U_z, U_r, U_h are the corresponding weight matrices applied at the previous moment; b_z, b_r, b_h are the bias vectors of the update function, reset function, and hidden state; σ is the sigmoid function, ⊙ denotes element-wise multiplication, h_{t−1} is the hidden state at the previous moment, x_t is the information at the current moment, h̃_t is the candidate hidden state, h_t is the hidden state passed on to the next moment, and tanh is the hyperbolic tangent function;
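To make the recurrence concrete, here is a one-step NumPy sketch implementing the four equations above; the weight names follow the reconstruction and all shapes are left to the caller — illustrative only, not the patent's implementation.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x_t: np.ndarray, h_prev: np.ndarray, p: dict) -> np.ndarray:
    """One GRU step; p holds Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])  # reset gate
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand  # hidden state for next moment
```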
Driven by the GRU–memristor framework, a chaotic oscillator with multistable characteristics is designed based on the nonlinear coupling of an improved Chua's circuit and a magnetically controlled memristor. Its system equation couples the state variables x, y, z, and w, with the memristor nonlinearity f(w) = dw² acting on the state variable w and g(x) = 0.9x acting on the state variable x; the system parameters are set to 9, 10, and 60 respectively;
A step function is selected as the nonlinear kernel of the chaos generation, so that the nonlinear function of the state variable y in the system equation of the chaotic oscillator is replaced by the step kernel. The kernel can be set to either of two forms, both built from the sign function applied to the variable y and parameterized by a system parameter and the number of attractors, with indices ranging over the set of natural numbers;
A chaotic sequence in [0, 1] is generated through the improved Tent map and converted into the individual search space. The expression of the improved Tent map is:

$$x_{i+1} = \begin{cases} 2x_i + \operatorname{rand}(0,1)\cdot\dfrac{1}{Z}, & 0 \le x_i \le \dfrac{1}{2} \\[4pt] 2(1 - x_i) + \operatorname{rand}(0,1)\cdot\dfrac{1}{Z}, & \dfrac{1}{2} < x_i \le 1 \end{cases}$$

where x_{i+1} is the output value at time i+1 and x_i is the input value at time i. After the Bernoulli shift transform, the Tent map is expressed as:

$$x_{i+1} = (2x_i) \bmod 1 + \operatorname{rand}(0,1)\cdot\frac{1}{Z}$$

where rand(0, 1) is a random number in (0, 1) and Z is the number of particles in the chaotic sequence.
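A minimal sketch of the Bernoulli-shift form above; the seed x0 and the use of NumPy's default RNG are assumptions, and the final lines illustrate the conversion of the chaotic values into an individual search space.

```python
import numpy as np


def tent_chaotic_sequence(n: int, z: int, x0: float = 0.37,
                          rng: np.random.Generator | None = None) -> np.ndarray:
    """Improved Tent map (Bernoulli-shift form): n values in [0, 1]."""
    rng = rng or np.random.default_rng()
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = ((2.0 * x) % 1.0 + rng.random() / z) % 1.0
        seq[i] = x
    return seq


# Map the chaotic values into a search interval [lo, hi]:
lo, hi = -5.0, 5.0
positions = lo + (hi - lo) * tent_chaotic_sequence(100, z=30)
```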
The chaotic maximum-step-number random generation strategy guides the agent to transition smoothly from a broad exploration stage to a refined exploitation stage by gradually reducing randomness, ensuring that the model converges efficiently to the global optimum and markedly improving convergence speed and stability. The greedy and Boltzmann strategies form the action-selection base, while the chaotic mechanism supplies the driving force for environment exploration; their temporal coupling balances exploration breadth (chaotic perturbation) against decision accuracy (Q-value guidance), and the strength of the randomness decays automatically as training proceeds, avoiding convergence oscillation caused by over-exploration.
In the later stage of training, the model has fully explored the state space and learned effective strategies, and the action-selection strategy adopts the greedy algorithm to select the best action. The ε value in the greedy algorithm is set low to ensure that the optimal action is chosen in most cases: ε is initialized to 1.0 and gradually decreases with episode_num (the number of training rounds), but never falls below 0.1.
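The interplay of the two strategies can be sketched as follows; the linear decay schedule and the temperature tau are assumptions — the source fixes only the 1.0 start and the 0.1 floor.

```python
import numpy as np


def select_action(q_values: np.ndarray, episode_num: int,
                  total_episodes: int, tau: float = 1.0,
                  rng: np.random.Generator | None = None) -> int:
    """epsilon-greedy action selection with Boltzmann exploration."""
    rng = rng or np.random.default_rng()
    eps = max(0.1, 1.0 - 0.9 * episode_num / total_episodes)  # assumed decay
    if rng.random() >= eps:
        return int(np.argmax(q_values))  # exploit: greedy best action
    logits = q_values / tau  # explore: softmax over Q-values
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))
```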
The deviation is reduced by adopting an n-step temporal-difference algorithm (n-step bootstrapping), as follows:

The n-step return uses the value function to correct the return value at time t+n and then sums all remaining rewards, which ensures that the expected maximum error is no greater than γⁿ times the worst error of the current estimate:

$$G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V_{t+n-1}(S_{t+n})$$

$$\max_s \left| \mathbb{E}_\pi\!\left[G_{t:t+n} \mid S_t = s\right] - v_\pi(s) \right| \;\le\; \gamma^{n} \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|$$

where R_{t+1}, …, R_{t+n} are the rewards from time t to n steps in the future, γⁿ is the n-step discount factor, V is the value function evaluated on the feature expression at time t over the feature domain (subject to zero-mean noise), and v_π is the value function of the probability mapping strategy π.
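A minimal sketch of the n-step return G_{t:t+n} defined above; rewards holds R_{t+1} … R_{t+n} and v_bootstrap is the value estimate V(S_{t+n}).

```python
def n_step_return(rewards: list, v_bootstrap: float,
                  gamma: float, n: int) -> float:
    """Discounted sum of n rewards plus the discounted bootstrapped value."""
    g = sum(gamma ** i * r for i, r in enumerate(rewards[:n]))
    return g + gamma ** n * v_bootstrap


# Example: 3-step return with gamma = 0.99.
print(n_step_return([1.0, 0.0, -0.5], v_bootstrap=2.0, gamma=0.99, n=3))
```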
The invention further provides a dynamic density-guided reward (DDGR). Its specific aim is to solve the path-deviation problem of multiple agents in a dynamic environment; adding the dynamic density-guided reward improves the robustness of multi-agent path planning in dynamic environments.
The density distribution of the density-function transition is generated in advance: when the shape of the obstacle map is known, control points are selected from the formations that require interpolation toward the target point, a dynamic transition function of the intermediate formation is generated, and the interpolation object is taken as a probability density. The Wasserstein distance W₂ between probability density distributions α and β is:

$$W_2(\alpha, \beta) = \left( \inf_{\pi \in \Pi(\alpha, \beta)} \int \lVert x - y \rVert^{2} \, \mathrm{d}\pi(x, y) \right)^{1/2}$$

where Π(α, β) is the set of transport plans from α to β, expressed as joint distribution functions in the two-dimensional Wasserstein distance; π is the density profile of the optimal transport plan from probability distribution α to probability distribution β; dπ(x, y) is the differential measure of the joint distribution, describing how the mass at a point moves from one location to another in the Wasserstein-distance computation; and ‖x − y‖² is the squared distance between x and y.
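As an illustration of the W₂ definition, the sketch below computes the discrete 2-Wasserstein distance between two weighted point clouds using the POT (Python Optimal Transport) library — an assumed third-party dependency, not one the source names.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)


def wasserstein2(pts_a: np.ndarray, w_a: np.ndarray,
                 pts_b: np.ndarray, w_b: np.ndarray) -> float:
    """W2 between weighted point clouds via an exact OT solver."""
    cost = ot.dist(pts_a, pts_b, metric="sqeuclidean")  # |x - y|^2 matrix
    return float(np.sqrt(ot.emd2(w_a, w_b, cost)))


# Example: initial vs. desired formation densities of 4 agents in 2-D.
a = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
b = a + np.array([3.0, 0.0])  # formation shifted 3 units along x
w = np.full(4, 0.25)
print(wasserstein2(a, w, b, w))  # -> 3.0
```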
The transition probability density distribution is generated as follows: given the initial and desired density profiles of the formation, the optimal transport path between the two density distributions is followed; a position along the path is selected, and the transition density distribution is calculated by barycentric interpolation over the point sets on all paths, realizing the transition transformation of the density function.
Centered on the extension of the Bellman equation, its iterative formula can be derived as:

$$G_t = \sum_{i=0}^{n-1} \gamma^{i} R_{t+i} + \gamma^{n} G_{t+n}$$

where G_t is the accumulated reward, R_t is the return value at time t, G_{t+n} is the accumulated reward at time t+n, and R_{t+i} is the return value at time t+i;
The target value in the RND3QN network is calculated as:

$$y_t = \sum_{i=0}^{n-1} \gamma^{i} R_{t+i} + \gamma^{n}\, Q\!\left(s_{t+n},\, \arg\max_{a} Q(s_{t+n}, a;\, \theta);\, \theta^{-}\right)$$

where y_t is the target value for the current value estimate, θ is the online neural-network parameter, and θ⁻ is the target-network parameter.
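A sketch of this n-step bootstrapped target under a double-DQN structure — the action is selected by the online network and evaluated by the target network, a split implied by the "dueling double DQN" naming rather than spelled out in the source.

```python
import torch


@torch.no_grad()
def n_step_ddqn_target(rewards: list, s_next: torch.Tensor, done: torch.Tensor,
                       online_q, target_q, gamma: float, n: int) -> torch.Tensor:
    """Compute y_t for a batch; online_q / target_q map states to Q-values."""
    g = sum(gamma ** i * r for i, r in enumerate(rewards[:n]))
    best_action = online_q(s_next).argmax(dim=-1, keepdim=True)  # select
    bootstrap = target_q(s_next).gather(-1, best_action).squeeze(-1)  # evaluate
    return g + (1.0 - done) * gamma ** n * bootstrap
```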
Step S104, acquiring a real-time multi-source measurement feature vector, inputting it into the path planning model, and obtaining from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
In this step, the new reward function based on the multi-agent position constraint comprises a main-line reward function and an auxiliary reward function;
The main-line reward function provides global guidance for algorithm convergence: negative feedback r⁻ is given when an agent collides with an obstacle or a wall, and positive feedback r⁺ is given when the agent reaches the target point without collision:

$$r_{\text{main}} = \begin{cases} r^{-}, & \text{the agent hits a wall } W \text{ or an obstacle } O \\ r^{+}, & \text{the agent reaches the target point without collision} \end{cases}$$

where r_main is the main-line reward function, W denotes a wall, and O denotes an obstacle;
The auxiliary reward function reflects the sub-targets of the task and corresponds to a state space comprising the angle θ between the agent's orientation and the target and the distance d. The agent's clockwise rotation angle spans the full circle; when the angle is 180° (equivalently −180°), the agent's direction is exactly opposite to the target direction.
A potential-energy-based distance auxiliary reward is constructed: the distance reward r_d is computed from the distance d between the agent's current position and the target position, normalized by the absolute distance d_abs between the agent's initial position and the target position, so that the reward increases as the normalized remaining distance d/d_abs decreases;
The angle serves as the source of the angular auxiliary reward; continuous rewards steadily guide the agent toward the target position. The angle reward is computed from the relation between the agent's heading angle and the angle of the target point, taking the integer part of the floating-point number and applying a remainder (modulo) function: the yaw angle is positive when the agent's head yaws to the right and negative otherwise;
Combining the distance and angle components toward the target yields the auxiliary reward function:

$$r_{\text{aux}} = r_{\theta} \cdot r_{d}$$

where · denotes the dot product, r_θ is the angle reward, and r_d is the distance reward.
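Putting the main-line and auxiliary terms together, a minimal sketch of the composite reward; the magnitudes r_neg and r_pos, the normalized-distance shaping, and the cosine angle shaping are assumptions — the source specifies only the structure (terminal feedback plus a product of distance and angle rewards).

```python
import math


def composite_reward(collided: bool, reached: bool, d: float, d_abs: float,
                     yaw_err_deg: float, r_neg: float = -10.0,
                     r_pos: float = 10.0) -> float:
    """Main-line reward on terminal events, auxiliary shaping otherwise."""
    if collided:
        return r_neg  # hit a wall or obstacle
    if reached:
        return r_pos  # reached the target without collision
    r_dist = 1.0 - d / d_abs  # closer to the goal -> larger reward
    r_angle = math.cos(math.radians(yaw_err_deg))  # facing the goal -> ~1
    return r_angle * r_dist  # auxiliary reward: angle x distance
```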
In summary, the method of the application acquires and preprocesses data, performing format conversion on the raw image, radar, and self-state data; constructs a convolutional neural network–long short-term memory (CNN-LSTM) architecture and extracts multi-source measurement feature vectors as the input of the RND3QN network; adopts the improved RND3QN network framework with a new reward function based on multi-agent position constraints and trains the model through an incremental learning strategy, realizing collaborative path planning for a plurality of agents; introduces an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema; and adopts an n-step temporal-difference algorithm (n-step bootstrapping) to reduce bias. This improves the accuracy of path planning, reduces the probability of collision during movement, shortens tracking time, enhances the avoidance of dynamic obstacles, and facilitates reasonable scheduling of multiple agents.
Referring to fig. 2, a block diagram of a multi-agent path planning system based on an improved RND3QN network according to the present application is shown.
As shown in fig. 2, the multi-agent path planning system 200 includes an acquisition module 210, an extraction module 220, a training module 230, and an output module 240.
The acquisition module 210 is configured to acquire raw image data, radar data, and the agent's state data. The extraction module 220 is configured to perform feature extraction on the raw image data, radar data, and agent state data according to a preset convolutional neural network–long short-term memory architecture to obtain a multi-source measurement feature vector. The training module 230 is configured to input the multi-source measurement feature vector into an improved RND3QN network and train it through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema and adopting an n-step temporal-difference algorithm to adjust path deviation. The output module 240 is configured to acquire a real-time multi-source measurement feature vector, input it into the path planning model, and obtain from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
It should be understood that the modules depicted in fig. 2 correspond to the various steps in the method described with reference to fig. 1. Thus, the operations and features described above for the method and the corresponding technical effects are equally applicable to the modules in fig. 2, and are not described here again.
In other embodiments, embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored; when the program instructions are executed by a processor, they cause the processor to perform the multi-agent path planning method based on the improved RND3QN network in any of the method embodiments described above;
as one embodiment, the computer-readable storage medium of the present invention stores computer-executable instructions configured to:
acquiring original image data, radar data and state data of an intelligent agent;
Performing feature extraction on the original image data, the radar data and the state data of the intelligent agent according to a preset convolutional neural network-long-short-term memory architecture to obtain a multi-source measurement feature vector;
inputting the multi-source measurement feature vector into an improved RND3QN network and training the improved RND3QN network through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema, and adopting an n-step temporal-difference algorithm to adjust path deviation;
and acquiring a real-time multi-source measurement feature vector, inputting it into the path planning model, and obtaining from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
The computer-readable storage medium may include a storage program area that may store an operating system, an application program required for at least one function, and a storage data area that may store data created according to the use of a multi-agent path planning system based on an improved RND3QN network, etc. In addition, the computer-readable storage medium may include high-speed random access memory, and may also include memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the computer readable storage medium optionally includes memory remotely located with respect to the processor, the remote memory being connectable over a network to the improved RND3QN network-based multi-agent path planning system. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 3, the device includes a processor 310 and a memory 320. The electronic device may further comprise input means 330 and output means 340. The processor 310, memory 320, input device 330, and output device 340 may be connected by a bus or other means, for example in fig. 3. Memory 320 is the computer-readable storage medium described above. The processor 310 executes various functional applications of the server and data processing by running non-volatile software programs, instructions and modules stored in the memory 320, i.e. implementing the multi-agent path planning method of the above-described method embodiments based on improving the RND3QN network. The input device 330 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the improved RND3QN network based multi-agent path planning system. The output device 340 may include a display device such as a display screen.
The electronic equipment can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present invention.
As an embodiment, the electronic device is applied to a multi-agent path planning system based on an improved RND3QN network, and is used for a client, and comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
acquiring original image data, radar data and state data of an intelligent agent;
Performing feature extraction on the original image data, the radar data and the state data of the intelligent agent according to a preset convolutional neural network-long-short-term memory architecture to obtain a multi-source measurement feature vector;
inputting the multi-source measurement feature vector into an improved RND3QN network and training the improved RND3QN network through an incremental learning strategy to obtain a path planning model, wherein training the improved RND3QN network comprises introducing an ε-greedy strategy and a Boltzmann probability selection strategy to avoid convergence to local extrema, and adopting an n-step temporal-difference algorithm to adjust path deviation;
and acquiring a real-time multi-source measurement feature vector, inputting it into the path planning model, and obtaining from the model's output a collaborative path plan for a plurality of agents, wherein the path planning model comprises a new reward function based on multi-agent position constraints.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solutions may be embodied essentially or in part in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the various embodiments or methods of some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (9)

CN202510437370.XA, filed 2025-04-09: A multi-agent path planning method and system based on improved RND3QN network — Withdrawn — CN119960464A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510437370.XA | 2025-04-09 | 2025-04-09 | A multi-agent path planning method and system based on improved RND3QN network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510437370.XA | 2025-04-09 | 2025-04-09 | A multi-agent path planning method and system based on improved RND3QN network

Publications (1)

Publication Number | Publication Date
CN119960464A | 2025-05-09

Family

ID=95586456

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510437370.XA | A multi-agent path planning method and system based on improved RND3QN network — CN119960464A (en), Withdrawn | 2025-04-09 | 2025-04-09

Country Status (1)

Country | Link
CN (1) | CN119960464A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116661503A (en)* | 2023-08-02 | 2023-08-29 | 中国人民解放军96901部队 | Cluster track automatic planning method based on multi-agent safety reinforcement learning
CN117218892A (en)* | 2023-08-30 | 2023-12-12 | 松立控股集团股份有限公司 | Autonomous parking system and method based on vision synchronous positioning and mapping
CN118759846A (en)* | 2024-07-04 | 2024-10-11 | 葳迪易(苏州)信息科技有限公司 | Multi-agent deep reinforcement learning path planning method based on improved A* heuristic

Non-Patent Citations (5)

Title
呙力平 et al., "Binocular-vision global localization of mobile robots under lidar SLAM", Laser Journal, 30 April 2024, pp. 103-107 *
殷焱, "Research on autonomous navigation of mobile robots in unknown environments based on deep reinforcement learning", China Master's Theses Full-text Database, 15 March 2025, pp. 21-36 *
王猛 et al., "Path planning of unmanned surface vehicles in dynamic environments based on an improved Q-learning algorithm", Instrument Technique, 30 April 2020, pp. 17-20 *
陈洪, "Research on collaborative decision-making methods based on multi-agent reinforcement learning", China Master's Theses Full-text Database, 4 May 2023, pp. 12-15 *
黄炜昭 et al., "Visual navigation method for substation intelligent inspection robots based on lidar and IMU fusion", Machine Design and Manufacturing Engineering, 31 December 2024, pp. 42-46 *

Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
CN120008616A (en)* | 2025-04-18 | 2025-05-16 | East China Jiaotong University | A method and system for generating a robot navigation path
CN120008616B (en)* | 2025-04-18 | 2025-07-22 | East China Jiaotong University | Method and system for generating robot navigation path

Similar Documents

Publication | Title
CN111061277B (en) — Unmanned vehicle global path planning method and device
Akan et al. — Stretchbev: Stretching future instance prediction spatially and temporally
CN119960464A (en) — A multi-agent path planning method and system based on improved RND3QN network
Shen et al. — Parallel sensing in metaverses: Virtual-real interactive smart systems for "6S" sensing
CN115454096B (en) — Course reinforcement learning-based robot strategy training system and training method
CN119649257A (en) — Drone return detection method and system based on AI recognition
CN117572885B (en) — Night tracking method, system and related device based on thermal infrared camera of unmanned aerial vehicle
Chen et al. — SGSR-Net: structure semantics guided LiDAR super-resolution network for indoor LiDAR SLAM
Gao et al. — Joint optimization of depth and ego-motion for intelligent autonomous vehicles
CN116679710A (en) — Robot obstacle avoidance strategy training and deployment method based on multitask learning
CN117036607A (en) — Automatic driving scene data generation method and system based on implicit neural rendering
CN118244260A (en) — Fuzzy deep learning single target tracking system based on generative adversarial network
Cao et al. — Unsupervised visual odometry and action integration for PointGoal navigation in indoor environment
CN120141518A (en) — A navigation method for unmanned vehicles based on multimodal fusion
Xu et al. — Vision-aided intelligent and adaptive vehicle pose estimation during GNSS outages
CN113807417B (en) — Dense matching method and system based on deep learning visual field self-selection network
Wang et al. — Intelligent path planning algorithm of Autonomous Underwater Vehicle based on vision under ocean current
CN119336045A (en) — Distributed control method for target tracking based on time-space fusion optimization
Wang et al. — A Model Stacking Algorithm for Indoor Positioning System using WiFi Fingerprinting
CN118896610A (en) — UAV route planning method and system based on deep reinforcement learning
CN117880770A (en) — A remote positioning method combining AR camera with Internet of Things system
CN115494859B (en) — An autonomous obstacle avoidance method for drone swarm based on transfer learning of pigeon intelligence
Lai et al. — Deep Learning-Based Multi-Modal Fusion for Robust Robot Perception and Navigation
CN111486847B (en) — Method and system for unmanned aerial vehicle navigation
Wang et al. — Intelligent situation awareness based on the fractal dimension and multidimensional reconstruction

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
WW01 — Invention patent application withdrawn after publication (application publication date: 2025-05-09)

