
Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning

Info

Publication number
CN120440311A
Authority
CN
China
Prior art keywords
spacecraft
population
reinforcement learning
individuals
elite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510942197.9A
Other languages
Chinese (zh)
Inventor
郭鹏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202510942197.9A
Publication of CN120440311A
Legal status: Pending

Abstract

Translated from Chinese


The present invention discloses a spacecraft morphology control method and system based on an evolutionary algorithm and reinforcement learning, specifically: S1. Establish an initial population in which each individual is a spacecraft of a different morphology, i.e., a different combination of functional modules and module quantities; S2. Perform inner-loop initialization learning training on all individuals in the population and calculate each individual's fitness value; S3. Select individuals with higher fitness values to form an elite population; S4. Using the crossover and mutation operations of a genetic algorithm, perform uniform crossover and single-point mutation on the elite population to generate elite offspring; S5. Inner-loop reinforcement learning: train the elite offspring obtained in step S4; S6. Morphology evaluation; S7. Form the optimal individual. The present invention fully combines the characteristics of the space environment and mission requirements and, based on an inner/outer-loop algorithm architecture of deep evolutionary reinforcement learning, realizes autonomous morphology generation for modular spacecraft through the continuous alternation of outer-loop morphological evolution and inner-loop learning and training.

Description

Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning
Technical Field
The invention relates to the field of spacecraft design optimization and autonomous control, in particular to a spacecraft morphology control method and system based on an evolutionary algorithm and reinforcement learning.
Background
Traditional spacecraft adopt a highly customized design approach: their topological structure and task adaptability are essentially fixed, and their functions must be designed in advance for specific mission requirements. The structure, functions, and operating mode remain largely unchanged during on-orbit operation, leading to high development cost, poor flexibility, and long development cycles. Modular spacecraft can effectively address these problems.
A modular spacecraft defines existing stand-alone units or subsystems as standardized modules and designs controllable connection mechanisms, so that different modules can be flexibly combined and the overall form can change. The low development cost of standardized modules enables economical and efficient batch production; upgrading and replacing modules significantly extends the spacecraft's service life, simplifies whole-satellite system design, shortens the production and development cycle, and better meets the demands of complex space environments and diverse missions.
However, existing modular spacecraft are still limited in how they can be reconfigured during actual operation. Reconfiguration mainly relies on manual operation by astronauts or assistance from orbital robots, which restricts reconfiguration opportunities and suffers from poor autonomy, low efficiency, and inflexibility.
In summary, traditional spacecraft have fixed structural designs whose functions must be specified in advance according to mission requirements, while existing modular spacecraft rely on astronauts or orbital robots for reconfiguration; during operation, neither achieves true on-orbit autonomous reconfiguration.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a spacecraft morphology control method based on an evolutionary algorithm and reinforcement learning. It is another object of the present invention to provide a spacecraft morphology control system based on evolutionary algorithm and reinforcement learning that implements the above method.
In order to achieve the above purpose, the spacecraft morphology control method based on the evolutionary algorithm and reinforcement learning of the invention specifically comprises the following steps:
S1. Establish an initial population, where each individual is a spacecraft of a different morphology, i.e., a different combination of functional modules and module quantities;
S2. Perform inner-loop initialization learning training on all individuals in the population and calculate each individual's fitness value;
S3. Select the individuals with higher fitness values to form an elite population;
S4. Apply the crossover and mutation operations of a genetic algorithm, performing uniform crossover and single-point mutation on the elite population to generate elite offspring;
S5. Inner-loop reinforcement learning: train the elite offspring obtained in step S4;
S6. Morphology evaluation: analyze how well the spacecraft completes the current mission scenario, obtain a comprehensive morphology evaluation result, use this result as the fitness value of the spacecraft morphology, and use it as the evaluation basis for outer-loop evolution;
S7. Form the optimal individual: add the elite offspring trained by inner-loop reinforcement learning to the original population, rank the individuals by fitness value, eliminate the lowest-fitness individuals until the population size matches the initial population size, and output the optimal individual when the outer loop reaches the set number of generations.
Further, steps S1, S3, S4, and S7 constitute the outer-loop morphological evolution process, and steps S2, S5, and S6 constitute the inner-loop reinforcement learning process.
Further, the inner loop comprises an agent, a sensor, a controller, and an actuator, where the agent is the object of study, i.e., an individual in the population, and the sensor, controller, and actuator realize the agent's sensing and control actions.
Further, the reward value obtained during inner-loop learning and training is used to calculate the fitness values of the population individuals and is output to the outer loop.
Further, the outer loop completes the generation-by-generation evolution of the population according to the fitness values of the population individuals, realizing the evolution of the spacecraft morphology.
Further, in step S1, an initial population set is defined for the genetic algorithm, containing a fixed number of individuals.
Further, in step S3, individuals are randomly selected from the initial population to hold tournaments, forming several tournament groups; the winner of each group, i.e., the individual with the highest fitness value, serves as a parent, and the winners together form the elite population.
Further, in step S4, the elite population undergoes uniform crossover and single-point mutation based on an elitist genetic algorithm, using the crossover and mutation operators within population iteration, to obtain the elite offspring.
Further, in step S5, the inner-loop reinforcement learning stage employs a nested pair of PPO algorithms to train the elite offspring.
The spacecraft morphology control system based on the evolutionary algorithm and the reinforcement learning is used for implementing the spacecraft morphology control method based on the evolutionary algorithm and the reinforcement learning.
The method fully accounts for the characteristics of the space environment and mission requirements: based on an inner/outer-loop algorithm architecture of deep evolutionary reinforcement learning, it realizes autonomous morphology generation for the modular spacecraft by continuously alternating outer-loop morphological evolution with inner-loop learning and training.
Drawings
FIG. 1 is a frame diagram of the present invention;
FIG. 2 is a flow chart of the present invention;
FIG. 3 is a block diagram of an inner loop reinforcement learning algorithm;
FIG. 4 is a graph of the attitude-orbit control reward function;
FIG. 5 is a block diagram of an outer loop evolutionary algorithm;
Fig. 6 is a graph of the outer loop population fitness value change process.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings, which show some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, unless explicitly stated or limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected, mechanically connected, electrically connected, directly connected, indirectly connected via an intervening medium, or in communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
Specific embodiments of the present invention are described in detail below with reference to fig. 1-6. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
The spacecraft morphology control method and system based on an evolutionary algorithm and reinforcement learning use an inner/outer-loop algorithm framework of deep evolutionary reinforcement learning, combining the evolutionary algorithm with reinforcement learning to optimize the morphology of a modular spacecraft for different mission scenarios; the overall framework is shown in FIG. 1.
An initial population set is defined for the genetic algorithm, whose members are the individuals of the population. Each individual represents a spacecraft of a different morphology, i.e., a spacecraft with a different combination of functional modules and module quantities, and the initial population contains a fixed number of individuals.
Note that the invention defines the spacecraft's morphology as its configuration, i.e., the way different functional modules in different quantities are combined to form the spacecraft.
The outer loop realizes the generation-by-generation evolution of the population:
First, an initial population is established. All individuals in the population then undergo inner-loop initialization learning training, and the fitness value of each individual is calculated. The fitness value is the result of the fitness function, a quantitative index of individual quality that is determined by the morphology evaluation rules described later. Individuals with higher fitness values are then selected to form the elite population, and uniform crossover and single-point mutation from the genetic algorithm are applied to the elite population to generate elite offspring.
The inner loop realizes the lifetime learning process of individuals in the population:
The inner loop comprises an agent, a sensor, a controller, and an actuator. The agent is the object of study, i.e., an individual of the population; the sensor, controller, and actuator realize the agent's sensing and control behaviors.
First, a training environment is set up for the spacecraft morphology given by the outer loop. Population individuals interact continuously with the mission scenario, and an optimal controller is trained according to the specified morphology evaluation rules, outputting configuration and attitude-orbit control strategies; the best configuration of the current generation is formed according to the reward value obtained by completing the task. The reward value obtained during inner-loop learning and training is used to compute the fitness value of each population individual and is output to the outer loop. The outer loop then evolves the population generation by generation according to these fitness values, realizing the evolution of the spacecraft morphology.
The spacecraft morphology control method based on an evolutionary algorithm and reinforcement learning of the invention comprises the following specific steps:
① Initial population generation, ② inner-loop initial learning training, ③ acquisition of the elite population, ④ generation of elite offspring, ⑤ inner-loop reinforcement learning, ⑥ morphology evaluation, ⑦ formation of the optimal individual. Steps ①, ③, ④, and ⑦ constitute the outer-loop morphological evolution algorithm, and steps ②, ⑤, and ⑥ constitute the inner-loop reinforcement learning algorithm.
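As a concrete illustration of how these seven steps interlock, the following minimal Python sketch runs the outer evolutionary loop around a stubbed inner loop. All identifiers and settings (population size, elite size, the stand-in inner_loop_train function, and the simplification of tournament selection to top-k ranking) are illustrative assumptions rather than details taken from the patent.

```python
# Minimal sketch of the outer-loop / inner-loop interplay (steps S1-S7).
import random

N_POP, N_ELITE, N_GEN = 20, 6, 10   # assumed population size, elite size, generations
N_MODULE_TYPES = 10                 # ten functional module types listed in the text

def random_individual():
    # An individual is a vector of module counts (morphology), step S1.
    return [random.randint(0, 3) for _ in range(N_MODULE_TYPES)]

def inner_loop_train(morphology):
    # Placeholder for the nested-PPO inner loop (steps S2/S5): returns a fitness value.
    return sum(morphology) * random.random()

def crossover_and_mutate(parents):
    # Placeholder for step S4 (uniform crossover followed by single-point mutation).
    child = [random.choice(pair) for pair in zip(*random.sample(parents, 2))]
    point = random.randrange(N_MODULE_TYPES)
    child[point] = random.randint(0, 3)
    return child

population = [random_individual() for _ in range(N_POP)]             # S1
fitness = {tuple(ind): inner_loop_train(ind) for ind in population}  # S2

for generation in range(N_GEN):                                      # outer loop
    elite = sorted(population, key=lambda i: fitness[tuple(i)], reverse=True)[:N_ELITE]  # S3 (simplified)
    offspring = [crossover_and_mutate(elite) for _ in range(N_ELITE)]                    # S4
    for child in offspring:
        fitness[tuple(child)] = inner_loop_train(child)               # S5 + S6
    population = sorted(population + offspring,
                        key=lambda i: fitness[tuple(i)], reverse=True)[:N_POP]           # S7

best = population[0]
print("best morphology:", best, "fitness:", round(fitness[tuple(best)], 3))
```

The point of the sketch is only the control flow: evaluation (S2/S5) always happens inside the loop body, while selection, variation, and survivor replacement (S3, S4, S7) operate on the fitness values it returns.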
① Generating an initial population:
In the initial stage, the modular spacecraft is first modeled in terms of its functional modules and their quantities. A vector M is defined to represent the functional modules inside the spacecraft and their numbers, as follows:
M = [m1, m2, m3, m4, m5, m6, m7, m8, m9, m10]  (1);
where m1 through m10 denote the numbers of functional modules in the modular spacecraft: m1 is the number of attitude and orbit control modules, m2 the number of propulsion modules, m3 the number of energy modules, m4 the number of management and control modules, m5 the number of computing modules, m6 the number of communication modules, m7 the number of optical sensing modules, m8 the number of radar sensing modules, m9 the number of electronic positioning modules, and m10 the number of electromagnetic interference modules. Population individuals are randomly generated according to the functional modules in the spacecraft, their quantities, and the connection constraints among them, thereby establishing the initial population of the modular spacecraft.
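A hedged sketch of this modeling step follows: the ten module types mirror the list above, while the symbol names, count ranges, and population size are assumptions introduced only for illustration.

```python
# Illustrative sketch of the module-count vector of equation (1) and random
# initial population generation.
import random

MODULE_TYPES = [
    "attitude_orbit_control", "propulsion", "energy", "management_control",
    "computing", "communication", "optical_sensing", "radar_sensing",
    "electronic_positioning", "electromagnetic_interference",
]

# Assumed per-type count limits standing in for the module-quantity constraints.
COUNT_RANGE = {t: (0, 4) for t in MODULE_TYPES}
COUNT_RANGE["attitude_orbit_control"] = (1, 4)  # assume at least one attitude-orbit module

def random_morphology():
    """Randomly generate one individual: a count of modules per functional type."""
    return {t: random.randint(lo, hi) for t, (lo, hi) in COUNT_RANGE.items()}

def initial_population(size=20):
    return [random_morphology() for _ in range(size)]

pop = initial_population()
print(pop[0])
```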
② Inner-loop initialization learning training:
The individuals of the initial population are sent to the inner loop to interact with the mission scenario, and initial learning training is performed to obtain the fitness value of each individual in the population. During inner-loop training, a multi-process scheme assigns an independent process to the training of each individual, enabling parallel computation and improving efficiency. The individual fitness function is expressed as:
F = Σ_{k=1}^{K} w · R_k / R_k^th  (2);
where K denotes the number of typical mission scenarios, R_k denotes the reward value obtained by the spacecraft through inner-loop reinforcement learning in mission scenario k, R_k^th denotes the upper threshold of the reward value for scenario k, and w denotes the standard weight of the reward function, used to unify the magnitudes of the reward value and the fitness value.
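The following sketch illustrates one plausible reading of this fitness evaluation: each individual is trained once per typical mission scenario in its own process, and the resulting rewards are normalized by the scenario thresholds and scaled by the standard weight. The scenario names, threshold values, weight, and the exact additive combination are assumptions, as is the use of multiprocessing.Pool for the multi-process scheme.

```python
# Sketch of a fitness evaluation in the spirit of equation (2), with one process
# per (individual, scenario) job.
from multiprocessing import Pool
import random

SCENARIOS = ["orbit_keeping", "rendezvous", "earth_observation"]  # assumed K = 3
REWARD_THRESHOLD = {"orbit_keeping": 100.0, "rendezvous": 150.0, "earth_observation": 120.0}
STANDARD_WEIGHT = 10.0  # unifies the magnitudes of reward and fitness values

def inner_loop_reward(args):
    """Stand-in for inner-loop RL training of one individual in one scenario."""
    individual, scenario = args
    return scenario, random.uniform(0, REWARD_THRESHOLD[scenario])

def fitness(individual):
    jobs = [(individual, s) for s in SCENARIOS]
    with Pool(processes=len(jobs)) as pool:          # independent process per job
        rewards = dict(pool.map(inner_loop_reward, jobs))
    return sum(STANDARD_WEIGHT * rewards[s] / REWARD_THRESHOLD[s] for s in SCENARIOS)

if __name__ == "__main__":
    print(round(fitness({"propulsion": 2}), 3))
```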
In the inner-loop reinforcement learning stage, a nested PPO algorithm is used to learn and train all individuals of the initial population. The PPO algorithm adopts an Actor-Critic architecture to realize the spacecraft configuration and attitude-orbit control strategies. The Actor model uses a neural network to fit the spacecraft's control policy function, with the Actor network parameters as the quantities to be optimized. The Actor network takes the environmental situation information of the mission scenario as input and outputs the spacecraft's reconfiguration and maneuver strategies. The Critic model likewise fits an evaluation function with a neural network parameterized by its own network weights, whose output is used to evaluate the Actor's policy. The PPO algorithm must sample a large amount of data to fill an experience buffer for training the neural networks and updating the policy, yet a single Actor network cannot update its parameters while it is performing the sampling task. A dedicated sampling network is therefore designed specifically for experience collection, with its own sampling parameters, so that the Actor network is only responsible for continuously updating its parameters and policy from the experience data.
The PPO algorithm takes as its objective cost function the expectation of the dynamically weighted advantage function, which it seeks to maximize, as follows:
J(θ) = E_t [ (π_θ(a_t | s_t) / π_θold(a_t | s_t)) · A_t ]  (3);
where s_t and a_t respectively denote, at the current time step t of the task, the environmental state of the mission scenario and the action taken by the Actor network; π_θ(a_t | s_t) denotes the probability, under the policy with current parameters θ, that the spacecraft acts on the mission scenario so that it transitions from state s_t to the next state s_{t+1}, and π_θold(a_t | s_t) denotes the same probability under the parameters θold. The objective of the optimization function is to maximize the expectation of the advantage function A_t under the probability weighting π_θ / π_θold, where the advantage function can be expressed as:
A_t = r_t + γ · V(s_{t+1}) - V(s_t)  (4);
where r_t denotes the environmental reward obtained by the spacecraft at time step t through the reinforcement learning reward function, γ denotes the reward discount factor, V(s_{t+1}) denotes the expected cumulative discounted reward obtained from state s_{t+1}, and V(s_t) denotes the expected cumulative discounted reward obtained from state s_t.
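The toy computation below makes equations (3) and (4) concrete on hand-written arrays: the one-step advantage from rewards and Critic value estimates, and the probability-ratio-weighted surrogate objective. The numeric values are invented for illustration, and the neural networks themselves (Actor, Critic, sampling network) as well as any clipping of the ratio are deliberately left out.

```python
# Numeric sketch of equations (3) and (4) on toy arrays.
import numpy as np

gamma = 0.99
rewards = np.array([1.0, 0.5, -0.2, 2.0])        # r_t from the attitude-orbit reward function
values  = np.array([3.0, 2.5, 2.8, 1.0, 0.0])    # V(s_t) for t = 0..T (Critic estimates)

# Equation (4): one-step TD advantage A_t = r_t + gamma * V(s_{t+1}) - V(s_t).
advantages = rewards + gamma * values[1:] - values[:-1]

# Probabilities of the taken actions under the current policy (Actor) and the
# sampling policy that collected the data (old parameters).
pi_new = np.array([0.30, 0.55, 0.20, 0.70])
pi_old = np.array([0.25, 0.50, 0.25, 0.60])

# Equation (3): expectation of the probability-ratio-weighted advantage.
surrogate_objective = np.mean((pi_new / pi_old) * advantages)
print("advantages:", np.round(advantages, 3))
print("surrogate objective:", round(float(surrogate_objective), 4))
```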
The reinforcement learning training framework nested from the two PPO algorithms is shown in FIG. 3. Each individual's module vector is input into the inner-loop configuration reinforcement learning. The elements of the vector are processed in order, and the corresponding functional modules (attitude and orbit control modules, propulsion modules, ..., electromagnetic interference modules) are placed one by one at randomly selected positions, in accordance with the module connection constraints and the prescribed installation order (attitude and orbit control, propulsion, and so on). Specifically, a single attitude and orbit control module is used as the initial unit, the set of attachment positions that satisfy the connection constraints of the current configuration is solved iteratively, and a new module is added at a randomly selected position to update the configuration, until all functional modules have been traversed. This method yields an initial configuration that is connected, reasonable, and random.
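A minimal sketch of this constrained random initialization is given below, assuming a simplified model in which modules occupy cells of an integer grid and the connection constraint reduces to face adjacency; the module names and installation order are illustrative.

```python
# Sketch of constrained random configuration initialization: grow the configuration
# one module at a time from a seed attitude-orbit-control module.
import random

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def attachment_sites(occupied):
    """Free grid cells face-adjacent to the current configuration."""
    sites = set()
    for (x, y, z) in occupied:
        for (dx, dy, dz) in NEIGHBOURS:
            cell = (x + dx, y + dy, z + dz)
            if cell not in occupied:
                sites.add(cell)
    return sites

def build_initial_configuration(module_sequence):
    """module_sequence lists module types in installation order (attitude-orbit first)."""
    origin = (0, 0, 0)
    occupied = {origin: module_sequence[0]}          # single attitude-orbit module as seed
    for module in module_sequence[1:]:
        site = random.choice(sorted(attachment_sites(occupied)))
        occupied[site] = module                      # configuration stays connected by construction
    return occupied

sequence = ["gnc", "gnc", "propulsion", "propulsion", "energy", "communication"]
for position, module in build_initial_configuration(sequence).items():
    print(position, module)
```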
The configuration strategy output by the Actor1 network of the configuration PPO algorithm is a sequence of actions, which then transforms the initial configuration. The action sequence specifies the flip direction of each module. Each action is processed in turn: the motion space of the corresponding module is computed, and if the flip is feasible (lies within the motion space) and the flipped configuration satisfies connectivity and the other constraints, the flip is executed and the configuration is updated. The process iterates on the new configuration until the whole sequence has been processed, yielding the training configuration.
The training configuration is then input into attitude-orbit control reinforcement learning, which learns an attitude-orbit control strategy for the typical mission scenario. The training configuration completes state transitions by executing the attitude-orbit control strategy output by the Actor2 network, obtaining a new attitude-orbit control reinforcement learning environment state and the corresponding reward value. The reward function of attitude-orbit control reinforcement learning consists of action rewards, boundary penalties, success rewards, and failure penalties. The Critic2 network evaluates Actor2's control strategy from the attitude-orbit control reinforcement learning environment state and the reward value and, based on this evaluation, optimizes Actor2's network parameters to improve attitude-orbit control performance. The attitude-orbit control reward function of the invention is shown in FIG. 4. At the same time, the controller's attitude-orbit control performance is assessed via the efficiency evaluation rules, and the assessment result is output to the configuration reinforcement learning.
The Critic1 network then evaluates the configuration-transformation strategy of Actor1 using the current state information and the evaluation information, and adjusts the Actor1 network parameters accordingly. Configuration reinforcement learning evaluates the strategy of Actor1 with a configuration reward function (in numerical form), whose expression is as follows:
R_c = R_rate + R_time + P_task + P_plat + R_accum ;
where R_rate, R_time, P_task, P_plat, and R_accum respectively denote the task completion rate reward, the task completion time reward, the task cost penalty, the platform cost penalty, and the accumulated reinforcement learning reward obtained while the spacecraft interacts with the typical mission scenario in the inner loop; their calculation is given under "⑥ Morphology evaluation".
③ Obtaining elite population:
Individuals are randomly selected from the initial population to hold tournaments, forming several tournament groups. The tournament size refers to the number of individuals randomly selected for comparison in each contest. The winner of each tournament group, i.e., the individual with the highest fitness value, becomes a parent, and the winners together constitute the elite population.
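A short sketch of this tournament selection follows; the tournament size, the number of groups, and the toy fitness values are assumptions.

```python
# Sketch of tournament selection for forming the elite population.
import random

def tournament_select(population, fitness, n_groups, tournament_size):
    elite = []
    for _ in range(n_groups):
        group = random.sample(population, tournament_size)   # individuals compared in one contest
        winner = max(group, key=lambda ind: fitness[ind])    # highest fitness wins the group
        elite.append(winner)
    return elite

population = [f"ind_{i}" for i in range(20)]
fitness = {ind: random.random() for ind in population}
elite_population = tournament_select(population, fitness, n_groups=6, tournament_size=3)
print(elite_population)
```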
④ Producing elite offspring:
Elite offspring are obtained from the elite population by uniform crossover and single-point mutation. Based on an elitist genetic algorithm, applying crossover and mutation operators within population iteration improves population diversity, enlarges the algorithm's search space, and allows the spacecraft morphology to evolve fully. The crossover operator uses uniform crossover. The population is traversed; when an individual triggers the crossover operation via the crossover probability, that individual becomes the first parent, and another individual is then chosen as the second parent for uniform crossover. The individual's chromosome is traversed, i.e., every gene locus of its morphology-variable matrix is visited. If a locus triggers uniform crossover via the uniform-crossover probability, the genes at the corresponding loci of the two parents are exchanged, i.e., the numbers of the corresponding functional modules are adjusted. During this process it is checked whether the module type at that locus is subject to a specific module connection constraint among the model constraints. If such a specific module linkage exists, the genes of the linked modules at the parents' crossover loci are exchanged simultaneously, ensuring that the offspring always satisfy the module constraints. If uniform crossover is not triggered, the locus is left unchanged and the traversal moves to the next position, until all loci of the morphology-variable matrix have been visited and one uniform crossover is complete.
The mutation operator uses single-point mutation. The population is traversed, and when an individual triggers the mutation operation via the mutation probability, its chromosome is selected for operation: a gene locus of the individual's morphology-variable matrix is chosen at random for single-point mutation. During single-point mutation, the gene at that locus, i.e., the number of the corresponding functional module, is adjusted, ensuring random mutation and reconfiguration within the range allowed by the module-quantity constraints. It is also checked whether the module type at that locus is subject to a specific module connection constraint among the model constraints. If such a specific module linkage exists, the module counts at the loci of the linked modules are changed synchronously so that the offspring always satisfy the module constraints. This flow completes one mutation operation on the morphology variables. The complete outer-loop evolutionary algorithm structure is shown in FIG. 5.
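The sketch below illustrates the two operators on a module-count vector standing in for the morphology variable matrix. The probabilities, count bounds, and the example linkage (optical sensing tied to computing modules) are assumptions; in particular, the way a linked locus is kept consistent during mutation is one possible interpretation of the synchronization described above.

```python
# Sketch of uniform crossover and single-point mutation with a linked-module constraint.
import random

CROSSOVER_P, GENE_SWAP_P, MUTATION_P = 0.8, 0.5, 0.2
COUNT_MIN, COUNT_MAX = 0, 4
LINKED = {6: 4}  # assumed constraint: locus 6 (optical sensing) linked to locus 4 (computing)

def uniform_crossover(parent_a, parent_b):
    child_a, child_b = parent_a[:], parent_b[:]
    for locus in range(len(child_a)):
        if random.random() < GENE_SWAP_P:
            child_a[locus], child_b[locus] = child_b[locus], child_a[locus]
            partner = LINKED.get(locus)
            if partner is not None:                   # swap the linked module count as well
                child_a[partner], child_b[partner] = child_b[partner], child_a[partner]
    return child_a, child_b

def single_point_mutation(individual):
    child = individual[:]
    locus = random.randrange(len(child))
    child[locus] = random.randint(COUNT_MIN, COUNT_MAX)
    partner = LINKED.get(locus)
    if partner is not None:                           # keep the linked module count consistent
        child[partner] = child[locus]
    return child

a = [1, 2, 0, 1, 2, 1, 2, 0, 1, 0]
b = [2, 1, 1, 0, 1, 0, 1, 1, 0, 1]
if random.random() < CROSSOVER_P:
    a, b = uniform_crossover(a, b)
if random.random() < MUTATION_P:
    a = single_point_mutation(a)
print(a, b)
```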
⑤ Inner ring reinforcement learning:
The inner-loop reinforcement learning stage invokes the same nested PPO algorithm as the initial learning training of step ②. Unlike step ②, the whole population does not need to be fed into inner-loop reinforcement learning in this step; only the elite offspring obtained in step ④ of the outer loop are learned and trained, and their fitness values are obtained through the fitness function.
⑥ Morphological assessment:
The task completion of the spacecraft in the current mission scenario is analyzed, and a comprehensive morphology evaluation method is provided. The reward function value obtained from training in the current mission scenario and the task execution status serve as the basis of the comprehensive morphology evaluation, and spacecraft morphology evaluation rules are designed covering the indices of task completion rate, task completion time, task reward, cost, and platform cost. The task completion rate assesses the spacecraft's task execution progress, and the task completion time is assessed by the number of training steps of the attitude-orbit control reinforcement learning.
Specifically, the task completion time is the number of time steps executed in the attitude-orbit control task, and the completion flag of the attitude-orbit control task equals 1 if the reinforcement learning training task is completed and 0 otherwise. The task reward refers to the reward function value of the attitude-orbit control reinforcement learning training, and the cost evaluates the energy consumption of the spacecraft during task execution, expressed as follows:
C = Σ_{i=1}^{N} F_i ;
where F_i is the control force output by the spacecraft at the i-th step and N is the number of time steps executed in the attitude-orbit task. The platform cost evaluates the total number of functional modules required by the spacecraft to complete the task, expressed as follows:
C_p = Σ_j m_j ;
where m_j is the total number of modules of each functional type.
The configuration reward function value then serves as the fitness value of the spacecraft morphology and as the evaluation basis for outer-loop evolution, supporting the evolution of the morphology.
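The following sketch assembles the evaluation indices named above (completion flag, completion time, energy cost, platform cost, accumulated RL reward) into a single score. The weights and the additive combination are assumptions; the patent specifies the components but not how they are weighted.

```python
# Sketch of morphology evaluation combining the listed indices into one score.
import numpy as np

def evaluate_morphology(control_forces, completed, accumulated_reward, module_counts,
                        w=(10.0, -0.05, -0.1, -0.5, 1.0)):
    completion = 1.0 if completed else 0.0               # task completion flag
    completion_time = len(control_forces)                # executed attitude-orbit control steps
    energy_cost = float(np.sum(np.abs(control_forces)))  # energy used during task execution
    platform_cost = sum(module_counts)                   # total number of functional modules
    components = (completion, completion_time, energy_cost, platform_cost, accumulated_reward)
    return sum(wi * ci for wi, ci in zip(w, components))

forces = np.random.uniform(-1.0, 1.0, size=200)
score = evaluate_morphology(forces, completed=True, accumulated_reward=85.0,
                            module_counts=[1, 2, 1, 1, 1, 1, 1, 0, 1, 0])
print(round(score, 3))
```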
⑦ Optimal individuals:
After inner-loop reinforcement learning training is completed, the elite offspring with their fitness values are added to the original population containing their parents, enlarging the population. The individuals in the population are then ranked by fitness value, and the lowest-fitness individuals are eliminated until the population size matches the initial population size; when the outer loop reaches the set number of generations, the optimal individual is output. The evolution of the outer-loop population fitness values is shown in FIG. 6.
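A compact sketch of this survivor selection step follows; the names and fitness values are illustrative.

```python
# Sketch of step 7: merge offspring, rank by fitness, truncate to the initial size.
def survivor_selection(population, offspring, fitness, target_size):
    merged = population + offspring
    merged.sort(key=lambda ind: fitness[ind], reverse=True)   # rank by fitness value
    return merged[:target_size]                               # eliminate lowest-fitness individuals

population = ["p1", "p2", "p3", "p4"]
offspring  = ["c1", "c2"]
fitness = {"p1": 0.9, "p2": 0.4, "p3": 0.7, "p4": 0.2, "c1": 0.95, "c2": 0.5}
population = survivor_selection(population, offspring, fitness, target_size=4)
print(population, "best:", population[0])
```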
To address the problems of traditional fixed-structure spacecraft, such as long development cycles, high development cost, fixed functionality, and the difficulty for microsatellites of responding flexibly to complex external environments, the invention first models the spacecraft in a modular way. It then adopts an inner/outer-loop algorithm framework of deep evolutionary reinforcement learning: the outer loop realizes morphological evolution of the spacecraft by topologically recombining the functional modules with an elitist genetic algorithm, while the inner loop takes maximizing the expectation of the dynamically weighted advantage function as its objective cost function and uses reinforcement learning to learn the spacecraft configuration strategy and attitude-orbit control strategy. Data generated during inner-loop learning and training are used to compute the fitness values of the population individuals and are output to the outer loop, and the outer loop evolves the spacecraft morphology according to these fitness values until the population converges to the optimal individual.
By continuously alternating morphological evolution and training, the method realizes morphology optimization and control of the modular spacecraft and achieves the goals of improving the spacecraft's environmental adaptability, rapid response, and mission agility.
Any process or method description in the flowcharts of the invention, or otherwise described herein, may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. These may be implemented in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device; the medium may be any medium that can contain, store, communicate, propagate, or transmit the program for use by the execution system, apparatus, or device, including read-only memory, magnetic disks, optical disks, and the like.
In the description herein, reference to the term "embodiment," "example," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the different embodiments or examples described in this specification and the features therein may be combined or combined by those skilled in the art without creating contradictions.
While embodiments of the present invention have been shown and described, it will be understood that the embodiments are illustrative and not to be construed as limiting the invention, and that various changes, modifications, substitutions and alterations may be made by those skilled in the art without departing from the scope of the invention.

Claims (10)

CN202510942197.9A, priority 2025-07-09, filed 2025-07-09: Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning (Pending; published as CN120440311A)

Priority Applications (1)

Application Number: CN202510942197.9A (published as CN120440311A); Priority Date: 2025-07-09; Filing Date: 2025-07-09; Title: Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning

Applications Claiming Priority (1)

Application Number: CN202510942197.9A (published as CN120440311A); Priority Date: 2025-07-09; Filing Date: 2025-07-09; Title: Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning

Publications (1)

Publication Number: CN120440311A; Publication Date: 2025-08-08

Family

ID=96610029

Family Applications (1)

Application Number: CN202510942197.9A (Pending; CN120440311A (en)); Priority Date: 2025-07-09; Filing Date: 2025-07-09; Title: Spacecraft morphology control method and system based on evolutionary algorithm and reinforcement learning

Country Status (1)

Country: CN; Document: CN120440311A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number; Priority date; Publication date; Assignee; Title
CN107608208A (en)*; 2017-08-24; 2018-01-19; 南京航空航天大学; A kind of in-orbit reconstructing method of spacecraft attitude control system of oriented mission constraint
CN111204476A (en)*; 2019-12-25; 2020-05-29; 上海航天控制技术研究所; Vision-touch fusion fine operation method based on reinforcement learning
US20220227503A1 (en)*; 2021-01-04; 2022-07-21; University Of Southern California; Using genetic algorithms for safe swarm trajectory optimization
US20220363415A1 (en)*; 2021-05-12; 2022-11-17; Orbital AI LLC; Deep reinforcement learning method for controlling orbital trajectories of spacecrafts in multi-spacecraft swarm
CN114036631A (en)*; 2021-10-22; 2022-02-11; 南京航空航天大学; Spacecraft autonomous rendezvous and docking guidance strategy generation method based on reinforcement learning
CN114118000A (en)*; 2021-10-25; 2022-03-01; 浙江工业大学; PCB splicing and blanking method based on deep intelligent genetic optimization algorithm
CN115758981A (en)*; 2022-11-29; 2023-03-07; 东南大学; Layout planning method based on reinforcement learning and genetic algorithm
CN116611505A (en)*; 2023-07-17; 2023-08-18; 中南大学; Satellite cluster observation task scheduling method, system, equipment and storage medium
CN117420841A (en)*; 2023-09-28; 2024-01-19; 江西省军民融合研究院; Unmanned aerial vehicle navigation and obstacle avoidance method based on evolutionary computation and reinforcement learning


Legal Events

Code PB01: Publication
Code SE01: Entry into force of request for substantive examination
