CN107053179B

Info

Publication number: CN107053179B
Application number: CN201710263232.XA
Authority: CN
Inventors: 杨文龙; 王伟; 庞海峰
Original assignee: Suzhou Health Multirobot Co Ltd
Current assignee: Suzhou Health Multirobot Co Ltd
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2019-07-23
Anticipated expiration: 2037-04-21
Also published as: CN107053179A

Abstract

Translated from Chinese

The invention discloses a compliant force control method for a robotic arm based on fuzzy reinforcement learning. A fuzzy reinforcement learning algorithm trains a real-time adjustment policy for the admittance parameters through online learning. Once the policy has converged, the variable admittance controller uses the external torque applied by the operator together with the current joint velocity and acceleration to drive the motor so that it actively complies with the operator's control intention, completing the active following task of the robotic arm. No task or environment model needs to be built, and the method offers faster convergence and stable practical performance. The method can significantly reduce the operator's workload, improve positioning accuracy, and help reduce the structural size and self-weight of the robotic arm. The human-robot force interaction model responds well to the operator's control intention and adapts well, making the force interaction smoother and more natural and closer to the feel of handling real objects in daily life.

Description

A compliant force control method for a robotic arm based on fuzzy reinforcement learning

Technical field:

The invention belongs to the technical field of human-robot interaction control, and in particular relates to a compliant force control method for a robotic arm based on fuzzy reinforcement learning.

Background:

Before robot-assisted minimally invasive surgery, medical staff must draw up a surgical plan based on the patient's individual characteristics, select the incision positions, and set the initial posture of each robotic arm accordingly. In practice, each robotic arm has to be dragged to the incision position and the joint angles of the surgical arm adjusted by hand; that is, the operator applies external force directly to the arm and adjusts the pose of each link according to the intended operation. Robotic arms usually transmit joint power through a reducer, and the large reduction ratio and transmission friction make it difficult to adjust the pose of the active joints.

There are currently two common solutions. The first is to install an electromagnetic brake behind the reducer and to disengage or engage the reducer and the downstream power output by controlling the brake, the so-called passive compliance control approach. When the arm is dragged in this mode, the operator bears the full weight of the arm, which increases the workload and makes the operation accuracy hard to control. In addition, because each joint is disengaged from its drive motor during dragging, an additional encoder is needed to record the joint position changes so that the joint angles after adjustment can be obtained. The electromagnetic brake and auxiliary encoder increase the structural size and weight of the arm, and frequent engagement of the brake also degrades the robot's absolute position accuracy.

The second is to keep the joint motors under control, estimate the operator's control intention from the forces acting on the arm, and drive the arm through the joint motors to help the operator complete the intended pose adjustment, the so-called active compliance control approach. Current active compliance control methods install a force sensor at the end of the arm and control the position of the end effector in Cartesian space. They usually focus on the position trajectory of the end tool rather than on posture adjustment, and the fixed location of force interaction makes it inconvenient to adjust the posture of individual links independently, so they do not meet the active positioning requirements of minimally invasive surgical robots. These methods have a further problem: with a fixed control parameter model it is difficult to balance control accuracy against operating feel, and with a variable control parameter model it is difficult to guarantee compliant and smooth human-robot interaction.

Summary of the invention:

To solve the above problems, the present invention proposes a compliant force control method for a robotic arm based on fuzzy reinforcement learning.

To achieve this object, the technical solution of the present invention is as follows:

A compliant force control method for a robotic arm based on fuzzy reinforcement learning, comprising the following steps:

S1: Establish an admittance control model.

S2: Obtain the motion state of the robotic arm, the external torque applied by the operator, and the environment reward value.

S3: To obtain an admittance model parameter adjustment policy suited to the current environment, train the policy online by fuzzy reinforcement learning, using the information obtained in step S2, until the algorithm converges, so as to obtain a variable admittance control model adapted to the current environment.

S4: Apply the converged admittance parameter adjustment policy from step S3 to the variable admittance control model. With its parameters updated, the admittance control model computes the current joint velocity value from the external torque applied by the operator and the feedback velocity of the joint, and sends it to the joint drive motor.
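As a rough illustration of step S4, the sketch below integrates a first-order admittance law of the form m·dv/dt + c·v = τ_h for a single joint. The patent names only the virtual damping parameter c and the external torque τ_h; the virtual inertia m, the time step dt, the explicit Euler integration and the function names are assumptions made for this example.

```python
def admittance_step(velocity, tau_ext, damping, inertia=0.1, dt=0.001):
    """One explicit-Euler step of  inertia * dv/dt + damping * v = tau_ext.

    velocity: joint velocity fed back from the arm
    tau_ext:  external torque applied by the operator
    damping:  virtual damping c (the parameter the learned policy adjusts)
    inertia, dt: assumed virtual inertia and control period (not from the patent)
    """
    accel = (tau_ext - damping * velocity) / inertia
    return velocity + dt * accel


def simulate(tau_ext, damping, steps=5000, **kwargs):
    """Apply a constant torque from rest and return the final velocity command."""
    v = 0.0
    for _ in range(steps):
        v = admittance_step(v, tau_ext, damping, **kwargs)
    return v


if __name__ == "__main__":
    for c in (0.5, 2.0):
        print(f"damping={c}: velocity command settles near {simulate(1.0, c):.3f}")
```

With a constant torque the commanded velocity settles at τ_h / c, so a lower damping value gives a faster response to the same push and a higher value gives a slower, more precise motion.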

Preferably, the fuzzy reinforcement learning in step S3 specifically comprises the following steps:

S31: Take the motion state of the robotic arm and the external torque applied by the operator as state variables, divide the universe of discourse of each state variable into several fuzzy sets, establish the corresponding fuzzy rules, and define the discrete action set.

S32: From the current state input, compute the membership degree of each state variable, fuzzily partition the state space, and compute the weights of the activated fuzzy rules.

S33: Select the discrete action values according to the current admittance model parameter adjustment policy.

S34: Combine the discrete action values from step S33 into the final action output value U and apply that value to the admittance control model.

S35: Update the current admittance model parameter adjustment policy according to the environment reward value just obtained.

S36: Repeat steps S32 to S35 until the algorithm converges.

Preferably, the method further comprises the following step:

S0: Integrate a torque sensor in each joint of the robotic arm; the torque sensor detects the contact torque between human and robot.

Preferably, in step S2:

The gravity compensation model of the robotic arm is identified offline by linear regression, so that the external torque applied by the operator can be obtained.

The motion state of the robotic arm comprises the velocity and acceleration of each joint of the arm.
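One possible reading of the offline gravity identification, for a single revolute joint: assume the gravity torque has the form a·sin(q) + b·cos(q), fit a and b by least squares from torque readings recorded with no operator contact, then subtract the predicted gravity torque from later readings. The model form, the function names and the sample data are assumptions; the patent only states that linear regression is used.

```python
import math


def fit_gravity_model(angles, torques):
    """Least-squares fit of torque ~ a*sin(q) + b*cos(q) via the 2x2 normal equations."""
    sss = sum(math.sin(q) ** 2 for q in angles)
    scc = sum(math.cos(q) ** 2 for q in angles)
    ssc = sum(math.sin(q) * math.cos(q) for q in angles)
    sst = sum(math.sin(q) * t for q, t in zip(angles, torques))
    sct = sum(math.cos(q) * t for q, t in zip(angles, torques))
    det = sss * scc - ssc * ssc
    a = (sst * scc - sct * ssc) / det
    b = (sct * sss - sst * ssc) / det
    return a, b


def external_torque(angle, measured_torque, model):
    """Measured joint torque minus the gravity torque predicted by the fitted model."""
    a, b = model
    return measured_torque - (a * math.sin(angle) + b * math.cos(angle))


if __name__ == "__main__":
    # Synthetic calibration data (hypothetical): no operator contact, gravity only.
    qs = [0.1 * i for i in range(-15, 16)]
    taus = [3.0 * math.sin(q) + 0.5 * math.cos(q) for q in qs]
    model = fit_gravity_model(qs, taus)
    print("fitted (a, b):", model)
```

A real arm with several links would need one regressor per gravity term of each joint; the two-term model here is only the simplest case.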

The beneficial effects of the present invention are:

Compared with passive compliance, the method can significantly reduce the operator's workload, improve positioning accuracy, and help reduce structural size and self-weight.

Compared with active compliance based on a fixed-parameter model, the method adapts well. When the contact torque increases, the force interaction control model actively lowers the virtual damping parameter of the environment so that the arm's velocity changes faster and the arm quickly follows the movement of the operator's arm, which feels less strenuous. Conversely, when the contact force (magnitude) gradually decreases, the model raises the virtual damping parameter to improve the control accuracy of the interaction, helping the operator position the arm and reducing overshoot.

Compared with active compliance based on a time-varying parameter model, the human-robot force interaction model responds well to the operator's control intention, making the force interaction smoother and more natural and closer to the feel of handling real objects in daily life.

Description of the drawings:

The following drawings are intended only to illustrate and explain the present invention schematically and do not limit its scope. In the drawings:

FIG. 1 is a flowchart of active compliance control in a compliant force control method for a robotic arm based on fuzzy reinforcement learning according to an embodiment of the present invention;

FIG. 2 is a flowchart of fuzzy reinforcement learning according to an embodiment of the present invention.

Detailed description:

As shown in FIG. 1, a compliant force control method for a robotic arm based on fuzzy reinforcement learning according to the present invention comprises the following steps:

S1: Establish an admittance control model.

S2: Obtain the motion state of the robotic arm, the external torque applied by the operator, and the environment reward value. The motion state of the robotic arm comprises the velocity and acceleration of each active rotary joint of the arm.

S3: To obtain an admittance model parameter adjustment policy suited to the current environment, train the policy online by fuzzy reinforcement learning, using the information obtained in step S2, until the algorithm converges, so as to obtain a variable admittance control model adapted to the current environment. The fuzzy reinforcement learning in step S3 specifically comprises the following steps:

S31: Take the motion state of the robotic arm and the external torque applied by the operator as the state variables (I), divide the universe of discourse X_i of each state variable into several fuzzy sets, establish the corresponding fuzzy rules, and define the discrete action set A = {u₁, u₂, …, u_n}, where u_i is the discrete action component corresponding to a currently activated fuzzy rule (determined by the current fuzzy partition).

S32: From the current state input I_i, compute the membership degree μ_i(I_i) of each state variable, fuzzily partition the state space, and compute the weight w_i of each activated fuzzy rule f_i, where f_i denotes the i-th fuzzy rule and w_i is its activation degree, that is, the weight of the discrete action associated with the current fuzzy state component.

S33: Select the discrete action value u_i according to the current admittance model parameter adjustment policy.

S34: Combine the discrete action values from step S33 into the final action output value U and apply that value to the admittance control model.

S4: Apply the converged admittance parameter adjustment policy from step S3 to the variable admittance control model, compute the current joint velocity value from the external torque applied by the operator and the feedback velocity of the joint, and send it to the joint drive motor, thereby realizing the active positioning function of the minimally invasive surgical robotic arm.
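Steps S31 and S32 can be sketched as follows, assuming triangular membership functions that are equally spaced over each state variable's range and a product T-norm for rule activation. The patent does not specify the membership shape or which T-norm is used, and the function names and ranges here are hypothetical.

```python
from itertools import product


def triangular_memberships(x, lo, hi, n_sets=7):
    """Membership of x in n_sets equally spaced triangular fuzzy sets on [lo, hi]."""
    x = min(max(x, lo), hi)            # clamp to the universe of discourse
    step = (hi - lo) / (n_sets - 1)    # spacing between set centres
    return [max(0.0, 1.0 - abs(x - (lo + k * step)) / step) for k in range(n_sets)]


def rule_activations(state, ranges, n_sets=7):
    """Map each activated rule (a tuple of set indices) to its activation degree.

    The activation degree is the product of the per-variable memberships
    (product T-norm, an assumption). Rules with zero activation are omitted.
    """
    per_var = [triangular_memberships(x, lo, hi, n_sets)
               for x, (lo, hi) in zip(state, ranges)]
    weights = {}
    for rule in product(range(n_sets), repeat=len(state)):
        w = 1.0
        for var, k in enumerate(rule):
            w *= per_var[var][k]
        if w > 0.0:
            weights[rule] = w
    return weights


if __name__ == "__main__":
    # Hypothetical ranges for joint velocity, joint acceleration and external torque.
    ranges = [(-1.0, 1.0), (-2.0, 2.0), (-6.0, 6.0)]
    active = rule_activations((0.25, -0.1, 1.0), ranges)
    print(len(active), "rules activated out of", 7 ** 3)
```

With this partition at most two sets per variable have non-zero membership, so at most eight rules fire for a three-variable state, and their activation degrees sum to one.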

The control method of this embodiment requires a torque sensor integrated in each joint of the robotic arm to detect the contact torque between human and robot. In this embodiment the gravity compensation model of the arm is identified offline by linear regression so that the external torque applied by the operator can be obtained. Unlike the teaching of conventional industrial robots, preoperative positioning of a minimally invasive surgical robot adjusts the spatial posture of each link of the arm rather than the position of the end effector in the world (Cartesian) coordinate frame. Force interaction of this kind is usually implemented by mounting a six-axis force sensor at the end of the arm to collect interaction force data, but this restricts where the human can exert force on the robot and hinders independent adjustment of the pose of each link of the surgical arm. To solve this problem, the torque sensors are integrated into the active rotary joints of each arm. The force interaction location between the arm and its surroundings then extends over the whole arm, and torque detection and force interaction control become more direct and reliable.

In joint space, and with the practical application in mind, a variable admittance control architecture is proposed that combines fuzzy theory with a reinforcement learning algorithm. Because the human guides the entire force interaction control loop, the operator's handling characteristics strongly affect the quality of the force interaction. The dynamic characteristics of the arm also change as the control model parameters change, which likewise affects the interaction. To bring these human factors and dynamic changes into the active compliance control model, a reinforcement learning method based on multi-step temporal differences handles them through online learning. The introduction of fuzzy theory helps address the generalization problem over the reinforcement learning state space, so that the compliant force control algorithm can accept continuous state inputs and produce continuous control parameter outputs. To extract the external torque applied by the operator, the gravity compensation model of the arm is identified offline by linear regression. The proposed human-robot force interaction control method does not need a task or environment model and offers faster convergence and stable practical performance.
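For readers unfamiliar with multi-step temporal-difference learning, the sketch below computes an n-step return: the discounted sum of the next n rewards plus a bootstrapped value estimate. This is only one common multi-step variant; the patent does not say which one it uses, and the discount factor and names are assumptions.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    """Discounted n-step return: r_0 + gamma*r_1 + ... + gamma**n * bootstrap_value.

    rewards:         the n rewards observed after the state being evaluated
    bootstrap_value: current value estimate of the state reached after n steps
    gamma:           assumed discount factor
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_error(current_value, rewards, bootstrap_value, gamma=0.9):
    """Multi-step TD error used to correct the current value estimate."""
    return n_step_return(rewards, bootstrap_value, gamma) - current_value
```

Using several observed rewards before bootstrapping tends to propagate feedback faster than a one-step update, at the cost of higher variance.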

The active compliance control process is shown in FIG. 1. Fuzzy reinforcement learning combines the current motion state with the converged discrete action selection policy to obtain the current parameters of the admittance control model. The admittance control model then uses the external torque applied by the operator and the current joint velocity to drive the motor so that it actively complies with the operator's control intention and completes the force interaction task. The single-step training process of fuzzy reinforcement learning is shown in FIG. 2. First, the membership degree of each state variable is computed from the current state input and the state space is fuzzily partitioned. Discrete action values are then selected according to the discrete action weights of the currently triggered fuzzy rules and the exploration policy, and combined into the final admittance model parameter value. The admittance model with the new parameter is applied to the ongoing human-robot interaction to obtain the current environment feedback, and the action weights are corrected according to that feedback so that the desired performance index of the interaction is maximized. This process is iterated until the algorithm converges.
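The single training step described above resembles fuzzy Q-learning. The sketch below is a one-step version under stated assumptions: each activated rule keeps one weight per discrete action, exploration is ε-greedy, and the weights are corrected with a temporal-difference error. The candidate damping values, learning rate, discount factor and all function names are hypothetical, and the patent's own update rule (a multi-step variant) is not given in the text.

```python
import random

ACTIONS = [0.5, 2.0, 8.0]  # hypothetical candidate values of the virtual damping


def select_actions(q_table, weights, epsilon, rng=random):
    """Pick one discrete action index per activated rule (epsilon-greedy)."""
    chosen = {}
    for rule in weights:
        q = q_table.setdefault(rule, [0.0] * len(ACTIONS))
        if rng.random() < epsilon:
            chosen[rule] = rng.randrange(len(ACTIONS))                   # explore
        else:
            chosen[rule] = max(range(len(ACTIONS)), key=q.__getitem__)   # exploit
    return chosen


def blend(weights, chosen):
    """Activation-weighted average of the chosen actions: the damping to apply."""
    total = sum(weights.values())
    return sum(w * ACTIONS[chosen[r]] for r, w in weights.items()) / total


def q_update(q_table, weights, chosen, reward, next_weights, alpha=0.1, gamma=0.9):
    """Correct the chosen action weights with a one-step TD error; return the error."""
    total = sum(weights.values())
    q_sa = sum(w * q_table[r][chosen[r]] for r, w in weights.items()) / total
    next_total = sum(next_weights.values())
    zeros = [0.0] * len(ACTIONS)
    v_next = sum(w * max(q_table.get(r, zeros))
                 for r, w in next_weights.items()) / next_total
    delta = reward + gamma * v_next - q_sa
    for r, w in weights.items():
        q_table[r][chosen[r]] += alpha * delta * w / total
    return delta
```

Here `weights` and `next_weights` map each activated rule to its activation degree before and after the action is applied. Because the final damping is a weighted blend of discrete choices, the output varies continuously with the state even though each rule only picks from a small action set.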

This embodiment takes single-joint force interaction control as an example to further illustrate the above method, as shown in FIG. 1:

The currently measured velocity and acceleration of the rotary joint of the robotic arm and the external torque τ_h acting on it are taken as the state input variables. The universe of discourse of each state variable is divided into 7 equally spaced fuzzy sets, giving 343 (7×7×7) fuzzy rules. The virtual damping parameter of the admittance control model is the action output of reinforcement learning; if the discrete action set has 3 elements, there are 1029 (7×7×7×3) fuzzy rule weights. Human-robot compliant force interaction control consists of two parts: policy training and interactive application. During policy training, the human minimum-jerk model is used as the performance index to be optimized, and the required human-robot interaction task is repeated over and over; the reinforcement learning algorithm keeps modifying the agent's decision policy based on the experience gained from interacting with the operator until it converges. During interactive application, the reinforcement-learning-based force interaction control algorithm fuzzily partitions the current state input to trigger the corresponding fuzzy rules, selects the optimal action value component for each activated rule according to the converged variable admittance policy, and finally combines the action value components using the activation degree of each rule (given by the T-norm of the memberships of the corresponding fuzzy sets) to produce the parameter value c used by the admittance control model at the current moment. With its parameter updated, the admittance control model computes the current joint velocity value from the external torque τ_h applied by the operator and the feedback velocity of the joint, and sends it to the joint drive motor.
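The counts quoted above follow directly from the partition sizes, and a reward in the spirit of the minimum-jerk criterion could penalize the squared jerk of the joint. The reward form below, with jerk estimated by a finite difference of acceleration, is an assumption; the patent names the criterion but gives no formula.

```python
N_SETS = 7         # fuzzy sets per state variable
N_STATE_VARS = 3   # joint velocity, joint acceleration, external torque
N_ACTIONS = 3      # size of the discrete action set

n_rules = N_SETS ** N_STATE_VARS   # one rule per combination of fuzzy sets
n_weights = n_rules * N_ACTIONS    # one weight per (rule, action) pair


def jerk_reward(acc_now, acc_prev, dt):
    """Negative squared jerk; smoother motion gives a reward closer to zero."""
    jerk = (acc_now - acc_prev) / dt
    return -jerk * jerk


if __name__ == "__main__":
    print(n_rules, n_weights)
```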

The compliant force control method for a robotic arm based on fuzzy reinforcement learning described in this embodiment uses a fuzzy reinforcement learning algorithm to train a real-time adjustment policy for the admittance parameters through online learning. Once the policy has converged, the variable admittance controller uses the external torque applied by the operator together with the current joint velocity and acceleration to drive the motor so that it actively complies with the operator's control intention, completing the active following task of the robotic arm. No task or environment model needs to be built, and the method offers faster convergence and stable practical performance. The method can significantly reduce the operator's workload, improve positioning accuracy, and help reduce the structural size and self-weight of the robotic arm. The human-robot force interaction model responds well to the operator's control intention and adapts well, making the force interaction smoother and more natural and closer to the feel of handling real objects in daily life.

Obviously, the above embodiment is only an example given for clarity of description and does not limit the implementation. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or modifications derived from this description remain within the scope of protection of the present invention.

Claims

1. A compliant force control method for a robotic arm based on fuzzy reinforcement learning, characterized by comprising the following steps:

S1: Establish an admittance control model;

S2: Obtain the motion state of the robotic arm, the external torque applied by the operator, and the environment reward value;

S3: To obtain an admittance model parameter adjustment policy suited to the current environment, train the policy online by fuzzy reinforcement learning, using the information obtained in step S2, until the algorithm converges, so as to obtain a variable admittance control model adapted to the current environment;

2. The compliant force control method for a robotic arm based on fuzzy reinforcement learning according to claim 1, characterized in that the fuzzy reinforcement learning in step S3 specifically comprises the following steps:

S31: Take the motion state of the robotic arm and the external torque applied by the operator as state variables, divide the universe of discourse of each state variable into several fuzzy sets, establish the corresponding fuzzy rules, and define the discrete action set;

S32: From the current state input, compute the membership degree of each state variable, fuzzily partition the state space, and compute the weights of the activated fuzzy rules;

S33: Select the discrete action values according to the current admittance model parameter adjustment policy;

S34: Combine the discrete action values from step S33 into the final action output value U and apply that value to the admittance control model;

S35: Update the current admittance model parameter adjustment policy according to the environment reward value just obtained;

3. The compliant force control method for a robotic arm based on fuzzy reinforcement learning according to claim 1, characterized by further comprising the following steps:

4. The compliant force control method for a robotic arm based on fuzzy reinforcement learning according to claim 3, characterized in that, in step S2:

5. The compliant force control method for a robotic arm based on fuzzy reinforcement learning according to claim 1, characterized in that, in step S2: