CN112338921A - A fast training method for intelligent control of robotic arm based on deep reinforcement learning - Google Patents

A fast training method for intelligent control of robotic arm based on deep reinforcement learning

Info

Publication number
CN112338921A
Authority
CN
China
Prior art keywords
mechanical arm
training
network
reinforcement learning
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011277634.3A
Other languages
Chinese (zh)
Inventor
冯正勇
赵寅甫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China West Normal University
Original Assignee
China West Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China West Normal University
Priority to CN202011277634.3A
Publication of CN112338921A
Legal status: Pending (current)

Abstract


The invention discloses a fast training method for intelligent control of robotic arms based on deep reinforcement learning, applied to the field of intelligent robot control. Aiming at the long training time and poor control effect of existing training methods, the invention first trains with a deep reinforcement learning algorithm in a 2D robotic arm simulation environment without physical attributes; this greatly reduces the training complexity, greatly shortens the training time, and accelerates the training of the robotic arm's control strategy model. The optimal state vector representation and the optimal reward function form found by training in the 2D robotic arm simulation environment are then used as the optimal state vector representation and optimal reward function form for training the deep reinforcement learning algorithm on the 3D robotic arm, thereby obtaining the control model of the 3D robotic arm. The method of the invention not only greatly shortens the training time, but also makes the effect of the trained control strategy model meet the application requirements.


Description

Mechanical arm intelligent control rapid training method based on deep reinforcement learning
Technical Field
The invention belongs to the field of intelligent control of robots, and particularly relates to an intelligent control technology for a mechanical arm.
Background
Artificial intelligence algorithms are widely applied to robot control; robot control is gradually shifting from equation solving to data-driven approaches, and more and more robot controllers adopt artificial intelligence algorithms. The present design adopts the deep reinforcement learning algorithm DDPG (Deep Deterministic Policy Gradient) to replace the forward/inverse kinematics solution used in traditional mechanical arm control, and a neural network model obtained directly through data-driven training controls the end of the mechanical arm to reach a target position. With this method, the trained model can be rapidly deployed on the mechanical arm control platform, so that the mechanical arm can quickly move to any given target position point. The mechanical arm is trained with the deep reinforcement learning algorithm DDPG in a simulation environment, using a training mode of 2D modeling first and 3D modeling afterwards, which greatly shortens the training time. Finally, the trained algorithm model is implemented and verified on a real mechanical arm, and its control effect meets the application requirements.
In the deep reinforcement learning algorithm, there are the following five major elements: Agent, Environment, Action, State, and Reward. As shown in fig. 1, the agent interacts with the environment in real time: after observing a state, the agent outputs an action according to a policy model, and the action acts on the environment and influences the state; in addition, the environment gives the agent a reward according to the action and the state, and the agent updates the policy model used to select actions according to the state, the action, and the reward. By continuously trying in the environment so as to obtain the maximum reward, the mapping from state to action is learned; this mapping is the strategy model, or simply the model, which is expressed by a parameterized deep neural network.
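As a minimal illustration of this observe-act-reward loop (this is not the patent's environment; `SimpleEnv` and `RandomAgent` are toy placeholders invented for the sketch, and a DDPG actor would replace the random policy):

```python
# Minimal sketch of the agent-environment interaction loop described above.
import numpy as np

class SimpleEnv:
    """Toy environment: the state is a 2-D point, actions nudge it toward the origin."""
    def reset(self):
        self.state = np.random.uniform(-1.0, 1.0, size=2)
        return self.state.copy()

    def step(self, action):
        self.state = self.state + 0.1 * np.asarray(action)
        reward = -np.linalg.norm(self.state)          # closer to the origin -> larger reward
        done = np.linalg.norm(self.state) < 0.05      # terminate near the origin
        return self.state.copy(), reward, done

class RandomAgent:
    """Placeholder policy: outputs a random action; a trained DDPG actor would go here."""
    def act(self, state):
        return np.random.uniform(-1.0, 1.0, size=2)

env, agent = SimpleEnv(), RandomAgent()
state = env.reset()
for t in range(200):
    action = agent.act(state)                         # agent observes the state and outputs an action
    next_state, reward, done = env.step(action)       # environment returns new state and reward
    # a learning agent would update its policy model from (state, action, reward, next_state)
    state = env.reset() if done else next_state
```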
The current DDPG algorithm is already widely used in the intelligent control of the robot arm, but the following difficulties still exist in implementation:
1. The data-driven deep reinforcement learning algorithm must acquire its learning data through interaction between the simulated mechanical arm and the virtual environment, and from this interaction an effective control model has to be obtained.
2. For the training process of the mechanical arm, how to set the state parameters of the mechanical arm and the environment, and how to set the reward function of the training process, so that the control effect of the mechanical arm obtained by training is the best.
Disclosure of Invention
In order to solve the technical problems, the invention provides a mechanical arm intelligent control quick training method based on deep reinforcement learning, the training time of the method is short, and the obtained model has a good control effect.
The technical scheme adopted by the invention is as follows: a mechanical arm intelligent control rapid training method based on deep reinforcement learning comprises the following steps:
s1, training a 2D mechanical arm by adopting a deep reinforcement learning algorithm DDPG in a physical-attribute-free 2D mechanical arm simulation environment, and finding out an optimal state vector representation and an optimal reward function form;
S2, training the 3D mechanical arm by adopting the deep reinforcement learning algorithm DDPG in a 3D mechanical arm simulation environment with physical attributes, where the optimal state vector representation and the optimal reward function form used in the DDPG follow the optimal results obtained in the 2D mechanical arm simulation environment, so as to obtain a control strategy model;
and S3, deploying a control strategy model obtained by training in a 3D mechanical arm simulation environment with physical attributes to the real mechanical arm.
The 2D robotic arm comprises an axis a, an axis b, and a tip c; the lengths of the rod ab and the rod bc are L, the axis a is a fixed rotary joint, the axis b is a movable rotary joint, c is the tail end of the mechanical arm, the included angle between rod ab and the horizontal line is ∠θ, and the included angle between rod bc and the horizontal line is ∠α.
The optimal state vector obtained in step S1 is:

s = (c_x, c_y, |c_x − x|, |c_y − y|, |b_x − x|, |b_y − y|, indicator)

where c_x represents the x-axis coordinate of the robot arm end c, c_y represents the y-axis coordinate of the robot arm end c, |c_x − x| represents the x-axis distance of the robot arm end c from the target point, |c_y − y| represents the y-axis distance of the robot arm end c from the target point, b_x and b_y represent the coordinates of the joint b, indicator is the indicator term of the state vector, and (x, y) represents the target location point on any given 2D plane.
The optimal reward function obtained in step S1 is:

r = −√((c_x − x)² + (c_y − y)²)
the DDPG includes four neural networks, respectively: the target network and the evaluation network of the Actor and the target network and the evaluation network of the Critic are the same in structure, and the target network and the evaluation network of the Critic are the same in structure.
The parameters of the Critic's evaluation network are updated through gradient back propagation of the neural network using the mean square error loss function

J(ω) = (1/m) Σ_{i=1}^{m} (y_i − Q(s_i, a_i, ω))²

where m represents the number of samples of the batch gradient descent, y_i represents the target Q value of the Critic's target network obtained for the i-th sample, ω represents the parameter of the Critic's evaluation network, s_i represents the state in the i-th sample, and a_i represents the action in the i-th sample.
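A minimal sketch of this Critic update (Python/PyTorch; the function and tensor names are illustrative, and `critic_eval` is assumed to be a network like the one sketched above):

```python
# Critic evaluation-network update with J(w) = (1/m) * sum_i (y_i - Q(s_i, a_i, w))^2.
import torch.nn.functional as F

def critic_update(critic_eval, critic_optimizer, states, actions, target_q):
    current_q = critic_eval(states, actions)      # Q(s_i, a_i, w) for the sampled batch
    loss = F.mse_loss(current_q, target_q)        # mean square error over the m samples
    critic_optimizer.zero_grad()
    loss.backward()                               # gradient back propagation
    critic_optimizer.step()                       # update the parameter w
    return loss.item()
```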
The parameters of the Actor's evaluation network are updated through gradient back propagation of the neural network using

J(θ) = −(1/m) Σ_{i=1}^{m} Q(s_i, π_θ(s_i), ω)

as the loss function; m denotes the number of samples of the batch gradient descent, ω denotes the parameter of the Critic's evaluation network, s_i represents the state in the i-th sample, and a_i represents the action in the i-th sample.
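A corresponding sketch of the Actor update (same assumptions as the Critic sketch above):

```python
# Actor evaluation-network update with J(theta) = -(1/m) * sum_i Q(s_i, pi_theta(s_i), w).
def actor_update(actor_eval, critic_eval, actor_optimizer, states):
    actions = actor_eval(states)                       # pi_theta(s_i)
    loss = -critic_eval(states, actions).mean()        # maximize Q by minimizing -Q
    actor_optimizer.zero_grad()
    loss.backward()                                    # gradient back propagation
    actor_optimizer.step()                             # update the parameter theta
    return loss.item()
```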
If T % C = 1 (i.e., every C steps), the parameters of the Actor target network are updated by θ' ← τθ + (1 − τ)θ', and the parameters of the Critic target network are updated by ω' ← τω + (1 − τ)ω';
wherein C represents the update step interval of the target network parameters; T denotes the maximum number of iterations, θ denotes the parameter of the evaluation network of the Actor, θ' denotes the parameter of the target network of the Actor, ω denotes the parameter of the evaluation network of the Critic, ω' denotes the parameter of the target network of the Critic, ← denotes assigning the result of the right-hand expression to the left-hand side, and τ denotes the soft update weight coefficient.
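A minimal sketch of this soft update rule, applied to both target networks every C steps (PyTorch; the names are illustrative):

```python
# Soft (Polyak) update: theta' <- tau*theta + (1 - tau)*theta', and likewise for omega'.
def soft_update(target_net, eval_net, tau):
    for t_param, e_param in zip(target_net.parameters(), eval_net.parameters()):
        t_param.data.copy_(tau * e_param.data + (1.0 - tau) * t_param.data)

# every C steps of the training loop:
#     soft_update(actor_target, actor_eval, tau)
#     soft_update(critic_target, critic_eval, tau)
```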
The invention has the beneficial effects that: according to the method, a deep reinforcement learning algorithm is adopted for training in a physical-attribute-free 2D mechanical arm simulation environment, the training complexity is greatly reduced, the training time is greatly shortened, and the training of a control strategy model of the mechanical arm is accelerated. Meanwhile, the optimal state vector representation and the optimal reward function form are found through training in the 2D mechanical arm simulation environment, so that the convergence speed and the stability of the trained control strategy model are optimal, and the tail end of the mechanical arm can be controlled to quickly reach the target position.
Drawings
FIG. 1 is a flow chart of reinforcement learning;
FIG. 2 is a flow chart of DDPG;
fig. 3 is a schematic diagram of a physical attribute-free 2D robot simulation environment according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a simulation environment of a 3D robot arm with physical properties according to an embodiment of the present invention;
fig. 5 is a schematic view of a real robot arm provided in an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
The deep reinforcement learning algorithm used in the invention is the Deep Deterministic Policy Gradient (DDPG) algorithm. It combines the policy network of the Deterministic Policy Gradient (DPG) algorithm, adopts an Actor-Critic framework, and borrows the experience replay and the separation of the target network from the evaluation network used in the Deep Q-Network (DQN), achieving good results in environments with continuous action spaces. DDPG contains four neural networks, namely the target network and the evaluation network of the Actor and the target network and the evaluation network of the Critic; the two Actor networks have exactly the same structure, and the two Critic networks have exactly the same structure.
Fig. 2 is a flowchart of the DDPG algorithm. The Actor evaluation network outputs the current action according to the current state, and the Actor target network outputs the next action according to the next state. The Critic evaluation network outputs the current Q value according to the current state and the current action, and the Critic target network outputs the target Q value according to the next state and the next action. The Actor evaluation network updates itself according to the current Q value; the Critic evaluation network updates itself according to the current Q value, the target Q value, and the reward. At fixed intervals, the parameters of the Actor and Critic evaluation networks are copied to their respective target networks in a weighted-average (soft update) manner.
The reference values in FIG. 2 are shown in Table 1:
TABLE 1 Meanings of the parameters in FIG. 2

Parameter name    Meaning
S                 Current state
S_                Next state
R                 Reward
Actor             Actor network that outputs an action according to the state
Critic            Critic network that evaluates the action according to the state
Eval_Net          Evaluation network
Target_Net        Target network
Target_Q          Target Q value
TD_Error          TD error
Critic_Train      Operation used to update the Critic network
Policy_Grads      Policy gradient
Actor_Train       Operation used to update the Actor network
The detailed flow of the DDPG algorithm is described as follows:
Input: the Actor evaluation network with parameter θ; the Actor target network with parameter θ'; the Critic evaluation network with parameter ω; the Critic target network with parameter ω'; the attenuation (discount) factor γ; the soft update weight coefficient τ; the number m of samples per batch gradient descent; the update step interval C of the target network parameters; and the maximum number of iterations T.
Output: the optimal Actor evaluation network parameter θ and the optimal Critic evaluation network parameter ω. The Actor evaluation network is the policy model.
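The inputs listed above map naturally onto a small configuration object; a minimal sketch follows (the numeric values are placeholders, not settings specified by the patent):

```python
# Sketch collecting the DDPG inputs listed above into one configuration object.
from dataclasses import dataclass

@dataclass
class DDPGConfig:
    gamma: float = 0.99            # attenuation (discount) factor
    tau: float = 0.01              # soft update weight coefficient
    batch_size: int = 32           # m, number of samples per batch gradient descent
    target_update_every: int = 1   # C, update step interval of the target network parameters
    max_episodes: int = 1000       # T, maximum number of iterations (training rounds)

cfg = DDPGConfig()
```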
1. Randomly initialize θ and ω, set θ' = θ and ω' = ω, and empty the experience replay set D.
2. Iterate from 1 to T (the total number of training rounds):
① Initialize the initial state s;
② the Actor evaluation network obtains the action a = π_θ(s) + N based on the state s, where N is exploration noise;
③ execute action a to obtain the new state s', the reward r, and the flag done indicating whether the termination state has been reached;
④ save {s, a, r, s', done} in the experience replay set D;
⑤ uniformly sample m samples {s_i, a_i, r_i, s'_i, done_i}, i = 1, 2, ..., m, from the experience replay set D; the Actor target network outputs a'_i = π_θ'(s'_i) + N according to s'_i, the Critic evaluation network outputs the current Q value Q(s_i, a_i, ω) according to s_i and a_i, and the Critic target network outputs Q'(s'_i, a'_i, ω') according to s'_i and a'_i; the target Q value y_i is then calculated as

y_i = r_i + γ (1 − done_i) Q'(s'_i, a'_i, ω')

where done_i = 1 if s'_i is a termination state and 0 otherwise;
⑥ use the mean square error loss function

J(ω) = (1/m) Σ_{i=1}^{m} (y_i − Q(s_i, a_i, ω))²

to update the parameter ω of the Critic evaluation network through gradient back propagation of the neural network;
⑦ use

J(θ) = −(1/m) Σ_{i=1}^{m} Q(s_i, π_θ(s_i), ω)

as the loss function to update the parameter θ of the Actor evaluation network through gradient back propagation of the neural network;
⑧ if T % C = 1 (i.e., every C steps), update the parameters θ' and ω' of the Actor target network and the Critic target network by θ' ← τθ + (1 − τ)θ' and ω' ← τω + (1 − τ)ω';
⑨ if s' is the termination state, end the current round of iteration; otherwise set s = s' and return to step ②.
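A minimal sketch of step ⑤ above, i.e. uniformly sampling a batch from the experience replay set D and computing the target Q value with the done flag (Python/PyTorch; the buffer layout as a list of tuples and the function names are illustrative assumptions, not specified by the patent):

```python
# Sample a batch from the replay set D and compute y_i = r_i + gamma*(1 - done_i)*Q'(s'_i, a'_i, w').
import random
import numpy as np
import torch

def sample_batch(replay_buffer, m):
    batch = random.sample(replay_buffer, m)                  # uniform sampling from D
    s, a, r, s_next, done = map(np.array, zip(*batch))
    to_t = lambda x: torch.as_tensor(x, dtype=torch.float32)
    return to_t(s), to_t(a), to_t(r).unsqueeze(1), to_t(s_next), to_t(done).unsqueeze(1)

def target_q_value(critic_target, actor_target, rewards, next_states, dones, gamma):
    with torch.no_grad():                                    # target networks are not trained directly
        next_actions = actor_target(next_states)             # a'_i = pi_theta'(s'_i)
        next_q = critic_target(next_states, next_actions)    # Q'(s'_i, a'_i, w')
        return rewards + gamma * (1.0 - dones) * next_q      # no bootstrap at terminal states
```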
A mechanical arm simulation model is built in a computer virtual environment, a control strategy model of the mechanical arm is obtained through the training process of the DDPG algorithm, and the model is deployed on a real mechanical arm, so that the real mechanical arm can control its end effector to reach any given spatial position in real time, laying the foundation for further completing automation tasks.
The method is based on a deep reinforcement learning algorithm and completes rapid training of the control model of the mechanical arm. An optimized state vector representation and a stable reward function form are found in the training process, and the control model obtained through training can be effectively deployed on the real mechanical arm: given any spatial target point, the mechanical arm automatically moves its end to that position, laying a foundation for control applications of the mechanical arm.
The method specifically comprises the following steps:
s1, training the mechanical arm by adopting a deep reinforcement learning algorithm DDPG in a physical-attribute-free 2D mechanical arm simulation environment, finding out the optimal state vector representation, and finding out the optimal reward function form.
S2, training the mechanical arm by adopting the deep reinforcement learning algorithm DDPG in a 3D mechanical arm simulation environment with physical attributes, where the optimal state vector representation and the optimal reward function form in the DDPG algorithm continue to use the optimal results obtained in the 2D mechanical arm simulation environment.
And S3, directly deploying the control model obtained by training in the 3D mechanical arm simulation environment with physical attributes onto a real mechanical arm, where the physical attributes of the real mechanical arm and of the 3D simulated mechanical arm are required to be nearly identical. The trained control model deployed on the real mechanical arm can control the end of the real mechanical arm to move into a limited region around any given spatial target point.
In step S1, the deep reinforcement learning algorithm DDPG is used to train a 2D mechanical arm in a 2D mechanical arm simulation environment without physical attributes. A schematic diagram of the 2D mechanical arm simulation environment is shown in fig. 3. The side length of the 2D square plane frame is 400 (unit not limited), and the lower left corner is the coordinate origin (0, 0). The parameters of the 2D mechanical arm include the axis a, the axis b, the end c, and the rod length L. The axis a is a fixed rotary joint located at the central point of the square, with coordinates (200, 200); the axis b is a movable rotary joint; and c is the end of the mechanical arm. The included angle between rod ab and the horizontal line is ∠θ, the included angle between rod bc and the horizontal line is ∠α, and the central point of the black region in the lower left corner represents an arbitrarily given target position point (x, y) on the 2D plane.
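A minimal sketch of this 2D arm geometry and of assembling the optimal state vector identified in step S1 (Python/NumPy; the rod length, the goal radius, and the interpretation of the indicator as an "end is on the target" flag are assumptions made for this sketch, not values taken from the patent):

```python
# 2D arm: joint a fixed at (200, 200), joint angles theta and alpha measured from the horizontal.
import numpy as np

def arm_state(theta, alpha, target, L=100.0, a=(200.0, 200.0), goal_radius=20.0):
    a = np.asarray(a, dtype=float)
    b = a + L * np.array([np.cos(theta), np.sin(theta)])        # movable joint b
    c = b + L * np.array([np.cos(alpha), np.sin(alpha)])        # arm end c
    x, y = target
    indicator = 1.0 if np.hypot(c[0] - x, c[1] - y) < goal_radius else 0.0
    state = np.array([c[0], c[1],
                      abs(c[0] - x), abs(c[1] - y),             # end-to-target axis distances
                      abs(b[0] - x), abs(b[1] - y),             # joint-b-to-target axis distances
                      indicator])
    return state, b, c

state, b, c = arm_state(theta=np.pi / 4, alpha=-np.pi / 6, target=(60.0, 60.0))
```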
Training is carried out based on a deep reinforcement learning algorithm DDPG, and the optimal state vector representation is obtained by:
s = (c_x, c_y, |c_x − x|, |c_y − y|, |b_x − x|, |b_y − y|, indicator)
the physical meaning of each parameter entry in the state vector is shown in table 2.
TABLE 2 optimal State vector representation and State vector membership parameters
Variable        Physical meaning
c_x             x-axis coordinate of the arm end c
c_y             y-axis coordinate of the arm end c
|c_x − x|       x-axis distance between the arm end c and the target point
|c_y − y|       y-axis distance between the arm end c and the target point
|b_x − x|       x-axis distance between the joint b and the target point
|b_y − y|       y-axis distance between the joint b and the target point
indicator       indicator term of the state vector
Training is carried out based on a deep reinforcement learning algorithm DDPG, and the optimal reward function form is obtained by:
r = −√((c_x − x)² + (c_y − y)²)
that is, the negative of the straight-line distance between the arm end c and the target point. The reward function parameters are described in Table 3.
TABLE 3 description of the optimal reward function parameters
Reward function variable    Physical meaning
c_x                         x-axis coordinate of the arm end c
c_y                         y-axis coordinate of the arm end c
x                           x-axis coordinate of the target point
y                           y-axis coordinate of the target point
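As a small worked example (the coordinates below are placeholders), the reward above can be computed directly from the end coordinates and the target point:

```python
# Reward = negative straight-line distance between the arm end (c_x, c_y) and the target (x, y).
import numpy as np

def reward(c_x, c_y, x, y):
    return -np.sqrt((c_x - x) ** 2 + (c_y - y) ** 2)

print(reward(210.0, 190.0, 60.0, 60.0))   # farther from the target -> more negative reward
```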
In step S2, the mechanical arm is trained using the DDPG algorithm in a 3D mechanical arm simulation environment with physical attributes; the optimal state vector representation and the optimal reward function form in the DDPG algorithm follow the optimal results obtained in the 2D mechanical arm simulation environment. A schematic diagram of the 3D mechanical arm simulation environment with physical attributes is shown in fig. 4. The physical attributes of the mechanical arm in the simulation environment include the positional relationship, rotation axis, and maximum angular velocity of each joint, as well as the shape, mass, and collision detection of each link of the mechanical arm; these physical attributes of the simulation model are approximately consistent with those of the real mechanical arm. After training with the deep reinforcement learning algorithm DDPG in the 3D simulation environment with physical attributes, the mechanical arm control model takes the optimal state vector as input and outputs the joint angle changes required for the mechanical arm to reach the target position.
Joint_values ← Joint_values + DDPG(state)
Joint_values: the angles of each joint of the mechanical arm
DDPG(state): the joint angle increments output by the model
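A minimal sketch of this control loop (Python; `policy`, `observe_state`, and `send_to_arm` are hypothetical names introduced for illustration, since the patent does not name these interfaces):

```python
# One control step: the trained model outputs joint angle increments added to the current angles.
import numpy as np

def control_step(policy, joint_values, observe_state, send_to_arm):
    state = observe_state(joint_values)            # build the optimal state vector
    delta = policy(state)                          # DDPG(state): joint angle increments
    joint_values = joint_values + np.asarray(delta)
    send_to_arm(joint_values)                      # command the simulated or real arm
    return joint_values
```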
In step S3, the control model is deployed directly onto the real mechanical arm shown in fig. 5. Since the 3D simulated mechanical arm is generated entirely from the real mechanical arm model, their physical attributes are nearly identical. The trained control model deployed on the real mechanical arm can control the end of the real mechanical arm to move into a limited region around any given spatial target point. The joints of the real mechanical arm shown in fig. 5 include J1, J2, J3, and J4.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (8)

1. A mechanical arm intelligent control rapid training method based on deep reinforcement learning is characterized by comprising the following steps:
s1, training a 2D mechanical arm by adopting a deep reinforcement learning algorithm DDPG in a physical-attribute-free 2D mechanical arm simulation environment, and finding out an optimal state vector representation and an optimal reward function form;
S2, training the 3D mechanical arm by adopting the deep reinforcement learning algorithm DDPG in a 3D mechanical arm simulation environment with physical attributes, where the optimal state vector representation and the optimal reward function form used in the DDPG follow the optimal results obtained in the 2D mechanical arm simulation environment, so as to obtain a control strategy model;
and S3, deploying a control strategy model obtained by training in a 3D mechanical arm simulation environment with physical attributes to the real mechanical arm.
2. The intelligent control quick training method for the mechanical arm based on the deep reinforcement learning as claimed in claim 1, wherein the 2D mechanical arm comprises an axis a, an axis b and a terminal c; the lengths of the rod ab and the rod bc are L, the axis a is a fixed rotary joint, the axis b is a movable rotary joint, c is the tail end of the mechanical arm, the included angle between rod ab and the horizontal line is ∠θ, and the included angle between rod bc and the horizontal line is ∠α.
3. The intelligent control rapid training method for the mechanical arm based on the deep reinforcement learning as claimed in claim 2, wherein the optimal state vector obtained in step S1 is:
s = (c_x, c_y, |c_x − x|, |c_y − y|, |b_x − x|, |b_y − y|, indicator);
where c_x represents the x-axis coordinate of the robot arm end c, c_y represents the y-axis coordinate of the robot arm end c, |c_x − x| represents the x-axis distance of the robot arm end c from the target point, |c_y − y| represents the y-axis distance of the robot arm end c from the target point, b_x and b_y represent the coordinates of the joint b, and (x, y) represents the target location point on any given 2D plane.
4. The method for intelligent mechanical arm control quick training based on deep reinforcement learning of claim 3, wherein the optimal reward function obtained in step S1 is as follows:
r = −√((c_x − x)² + (c_y − y)²).
5. The intelligent control rapid training method for the mechanical arm based on deep reinforcement learning as claimed in any one of claims 1 to 4, wherein the DDPG comprises four neural networks, respectively: the target network and the evaluation network of the Actor, and the target network and the evaluation network of the Critic; the two Actor networks have identical structures, and the two Critic networks have identical structures.
6. The intelligent control rapid training method for the mechanical arm based on deep reinforcement learning as claimed in claim 5, wherein the mean square error loss function

J(ω) = (1/m) Σ_{i=1}^{m} (y_i − Q(s_i, a_i, ω))²

is used to update the parameters of the Critic's evaluation network through gradient back propagation of the neural network;
where m denotes the number of samples of the batch gradient descent, y_i denotes the target Q value of the Critic's target network obtained for the i-th sample, ω denotes the parameter of the Critic's evaluation network, s_i denotes the state in the i-th sample, and a_i denotes the action in the i-th sample.
7. The intelligent control rapid training method for the mechanical arm based on deep reinforcement learning as claimed in claim 5, wherein

J(θ) = −(1/m) Σ_{i=1}^{m} Q(s_i, π_θ(s_i), ω)

is used as the loss function to update the parameters of the Actor's evaluation network through gradient back propagation of the neural network;
where m denotes the number of samples of the batch gradient descent, ω denotes the parameter of the Critic's evaluation network, s_i denotes the state in the i-th sample, and a_i denotes the action in the i-th sample.
8. The intelligent control rapid training method for the mechanical arm based on deep reinforcement learning as claimed in claim 5, wherein, if T % C = 1, the parameters of the Actor target network are updated by θ' ← τθ + (1 − τ)θ', and the parameters of the Critic target network are updated by ω' ← τω + (1 − τ)ω';
wherein C represents the update step interval of the target network parameters; T denotes the maximum number of iterations, θ denotes the parameter of the evaluation network of the Actor, θ' denotes the parameter of the target network of the Actor, ω denotes the parameter of the evaluation network of the Critic, ω' denotes the parameter of the target network of the Critic, ← denotes assigning the result of the right-hand expression to the left-hand side, and τ denotes the soft update weight coefficient.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011277634.3A | 2020-11-16 | 2020-11-16 | A fast training method for intelligent control of robotic arm based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011277634.3A | 2020-11-16 | 2020-11-16 | A fast training method for intelligent control of robotic arm based on deep reinforcement learning

Publications (1)

Publication Number | Publication Date
CN112338921A (en) | 2021-02-09

Family

ID=74363994

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202011277634.3A | CN112338921A (en): A fast training method for intelligent control of robotic arm based on deep reinforcement learning | 2020-11-16 | 2020-11-16 | Pending

Country Status (1)

Country | Link
CN (1) | CN112338921A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN112975977A (en)*2021-03-052021-06-18西北大学Efficient mechanical arm grabbing depth reinforcement learning reward training method and system
CN113119132A (en)*2021-04-212021-07-16浙江大学Deep sea fine remote control task implementation method based on simulation learning
CN113478486A (en)*2021-07-122021-10-08上海微电机研究所(中国电子科技集团公司第二十一研究所)Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
CN113524173A (en)*2021-06-172021-10-22北京控制工程研究所 An end-to-end intelligent grasping method for extraterrestrial detection samples
CN113967909A (en)*2021-09-132022-01-25中国人民解放军军事科学院国防科技创新研究院Mechanical arm intelligent control method based on direction reward
CN113977583A (en)*2021-11-162022-01-28山东大学 Robot rapid assembly method and system based on near-end strategy optimization algorithm
CN114454160A (en)*2021-12-312022-05-10中国人民解放军国防科技大学Mechanical arm grabbing control method and system based on kernel least square soft Bellman residual reinforcement learning
CN114932546A (en)*2022-03-232022-08-23燕山大学Deep reinforcement learning vibration suppression system and method based on unknown mechanical arm model
CN114939870A (en)*2022-05-302022-08-26兰州大学Model training method and device, strategy optimization method, equipment and medium
CN115366099A (en)*2022-08-182022-11-22江苏科技大学Mechanical arm depth certainty strategy gradient training method based on forward kinematics
CN115464659A (en)*2022-10-052022-12-13哈尔滨理工大学Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN115674191A (en)*2022-10-082023-02-03广东工业大学 A method and system for controlling a robotic arm based on a digital twin
CN116038691A (en)*2022-12-082023-05-02南京理工大学 A Continuum Manipulator Motion Control Method Based on Deep Reinforcement Learning
CN116061190A (en)*2023-03-142023-05-05浙江大学 A Method of Using Curriculum Learning to Train Robotic Arms to Complete Cloth Folding Tasks
CN117313546A (en)*2023-10-262023-12-29北京大学Trusted smart hand system simulation method and simulation system
CN119165873A (en)*2024-09-132024-12-20中山大学 Two-stage target search and tracking method for UAV based on deep reinforcement learning
WO2025043552A1 (en)*2023-08-302025-03-06西门子股份公司Simulation optimization method and apparatus for automation system

Citations (10)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN108052004A (en)*2017-12-062018-05-18湖北工业大学Industrial machinery arm autocontrol method based on depth enhancing study
US20190126472A1 (en)*2017-10-272019-05-02Deepmind Technologies LimitedReinforcement and imitation learning for a task
CN109906132A (en)*2016-09-152019-06-18谷歌有限责任公司Robotic deep reinforcement learning
CN110370267A (en)*2018-09-102019-10-25北京京东尚科信息技术有限公司Method and apparatus for generating model
CN110450164A (en)*2019-08-202019-11-15中国科学技术大学Robot control method, device, robot and storage medium
CN111390908A (en)*2020-03-262020-07-10哈尔滨工业大学 A web-based virtual dragging method for robotic arms
CN111546349A (en)*2020-06-282020-08-18常州工学院 A new deep reinforcement learning method for humanoid robot gait planning
CN111618847A (en)*2020-04-222020-09-04南通大学Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN111725836A (en)*2020-06-182020-09-29上海电器科学研究所(集团)有限公司 A Demand Response Control Method Based on Deep Reinforcement Learning
US20200306980A1 (en)*2019-03-252020-10-01Dishcraft Robotics, Inc.Automated Manipulation Of Transparent Vessels

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN109906132A (en)*2016-09-152019-06-18谷歌有限责任公司Robotic deep reinforcement learning
US20190126472A1 (en)*2017-10-272019-05-02Deepmind Technologies LimitedReinforcement and imitation learning for a task
CN108052004A (en)*2017-12-062018-05-18湖北工业大学Industrial machinery arm autocontrol method based on depth enhancing study
CN110370267A (en)*2018-09-102019-10-25北京京东尚科信息技术有限公司Method and apparatus for generating model
US20200306980A1 (en)*2019-03-252020-10-01Dishcraft Robotics, Inc.Automated Manipulation Of Transparent Vessels
CN110450164A (en)*2019-08-202019-11-15中国科学技术大学Robot control method, device, robot and storage medium
CN111390908A (en)*2020-03-262020-07-10哈尔滨工业大学 A web-based virtual dragging method for robotic arms
CN111618847A (en)*2020-04-222020-09-04南通大学Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN111725836A (en)*2020-06-182020-09-29上海电器科学研究所(集团)有限公司 A Demand Response Control Method Based on Deep Reinforcement Learning
CN111546349A (en)*2020-06-282020-08-18常州工学院 A new deep reinforcement learning method for humanoid robot gait planning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
叶伟杰 (YE WEIJIE) et al.: "Research on a training mode for improving the development efficiency of robot reinforcement learning" (一种提升机器人强化学习开发效率的训练模式研究), Journal of Guangdong University of Technology (广东工业大学学报)*

Cited By (26)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN112975977A (en)*2021-03-052021-06-18西北大学Efficient mechanical arm grabbing depth reinforcement learning reward training method and system
CN113119132A (en)*2021-04-212021-07-16浙江大学Deep sea fine remote control task implementation method based on simulation learning
CN113524173B (en)*2021-06-172022-12-27北京控制工程研究所End-to-end intelligent capture method for extraterrestrial exploration sample
CN113524173A (en)*2021-06-172021-10-22北京控制工程研究所 An end-to-end intelligent grasping method for extraterrestrial detection samples
CN113478486A (en)*2021-07-122021-10-08上海微电机研究所(中国电子科技集团公司第二十一研究所)Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
CN113478486B (en)*2021-07-122022-05-17上海微电机研究所(中国电子科技集团公司第二十一研究所)Robot motion parameter self-adaptive control method and system based on deep reinforcement learning
CN113967909A (en)*2021-09-132022-01-25中国人民解放军军事科学院国防科技创新研究院Mechanical arm intelligent control method based on direction reward
CN113977583A (en)*2021-11-162022-01-28山东大学 Robot rapid assembly method and system based on near-end strategy optimization algorithm
CN114454160A (en)*2021-12-312022-05-10中国人民解放军国防科技大学Mechanical arm grabbing control method and system based on kernel least square soft Bellman residual reinforcement learning
CN114454160B (en)*2021-12-312024-04-16中国人民解放军国防科技大学Mechanical arm grabbing control method and system based on kernel least square soft Belman residual error reinforcement learning
CN114932546B (en)*2022-03-232023-10-03燕山大学 A deep reinforcement learning vibration suppression system and method based on an unknown manipulator model
CN114932546A (en)*2022-03-232022-08-23燕山大学Deep reinforcement learning vibration suppression system and method based on unknown mechanical arm model
CN114939870A (en)*2022-05-302022-08-26兰州大学Model training method and device, strategy optimization method, equipment and medium
CN115366099A (en)*2022-08-182022-11-22江苏科技大学Mechanical arm depth certainty strategy gradient training method based on forward kinematics
CN115366099B (en)*2022-08-182024-05-28江苏科技大学 Deep deterministic policy gradient training method for robotic arms based on forward kinematics
CN115464659B (en)*2022-10-052023-10-24哈尔滨理工大学 A robotic arm grasping control method based on deep reinforcement learning DDPG algorithm based on visual information
CN115464659A (en)*2022-10-052022-12-13哈尔滨理工大学Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN115674191A (en)*2022-10-082023-02-03广东工业大学 A method and system for controlling a robotic arm based on a digital twin
CN115674191B (en)*2022-10-082024-05-10广东工业大学Mechanical arm control method and system based on digital twin
CN116038691A (en)*2022-12-082023-05-02南京理工大学 A Continuum Manipulator Motion Control Method Based on Deep Reinforcement Learning
CN116038691B (en)*2022-12-082025-03-07南京理工大学 A continuum robotic arm motion control method based on deep reinforcement learning
CN116061190A (en)*2023-03-142023-05-05浙江大学 A Method of Using Curriculum Learning to Train Robotic Arms to Complete Cloth Folding Tasks
CN116061190B (en)*2023-03-142024-11-15浙江大学 A method for training a robotic arm to complete cloth folding tasks using curriculum learning
WO2025043552A1 (en)*2023-08-302025-03-06西门子股份公司Simulation optimization method and apparatus for automation system
CN117313546A (en)*2023-10-262023-12-29北京大学Trusted smart hand system simulation method and simulation system
CN119165873A (en)*2024-09-132024-12-20中山大学 Two-stage target search and tracking method for UAV based on deep reinforcement learning

Similar Documents

PublicationPublication DateTitle
CN112338921A (en) A fast training method for intelligent control of robotic arm based on deep reinforcement learning
CN114603564B (en) Robotic arm navigation obstacle avoidance method, system, computer equipment and storage medium
CN108052004B (en) Automatic control method of industrial robotic arm based on deep reinforcement learning
CN114952828B (en) A robotic arm motion planning method and system based on deep reinforcement learning
CN110238839B (en)Multi-shaft-hole assembly control method for optimizing non-model robot by utilizing environment prediction
CN109948642A (en) A Multi-Agent Cross-Modality Deep Deterministic Policy Gradient Training Method Based on Image Input
CN109483530B (en) A motion control method and system for a footed robot based on deep reinforcement learning
CN110764415B (en)Gait planning method for leg movement of quadruped robot
CN113510704A (en) A Motion Planning Method for Industrial Robot Arm Based on Reinforcement Learning Algorithm
CN102402712B (en) A Neural Network-Based Initialization Method for Robot Reinforcement Learning
CN114326722B (en)Six-foot robot self-adaptive gait planning method, system, device and medium
CN115781685B (en) A high-precision robotic arm control method and system based on reinforcement learning
CN115890670B (en) Method for training motion trajectory of seven-DOF redundant robotic arm based on enhanced deep learning
CN114474004B (en)Error compensation planning control strategy for multi-factor coupling vehicle-mounted building robot
CN114779661B (en) Chemical synthesis robot system based on multi-class generative confrontation imitation learning algorithm
CN114083539B (en) An anti-jamming motion planning method for robotic arm based on multi-agent reinforcement learning
CN114378820A (en)Robot impedance learning method based on safety reinforcement learning
CN110328668A (en)Robotic arm path planing method based on rate smoothing deterministic policy gradient
CN113043278B (en)Mechanical arm track planning method based on improved whale searching method
CN116352715A (en) A collaborative motion control method for dual-arm robots based on deep reinforcement learning
CN110389591A (en) A Path Planning Method Based on DBQ Algorithm
CN114290339A (en)Robot reality migration system and method based on reinforcement learning and residual modeling
CN116587275A (en) Method and system for intelligent impedance control of manipulator based on deep reinforcement learning
Yan et al.Path planning for mobile robot's continuous action space based on deep reinforcement learning
CN117798928A (en) A digital twin robotic arm reinforcement learning training method and system based on Unity

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2021-02-09
