CN114193458A

Info

Publication number: CN114193458A
Application number: CN202210088894.9A
Authority: CN
Inventors: 潘永平; 李威
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-01-25
Filing date: 2022-01-25
Publication date: 2022-03-18
Anticipated expiration: 2042-01-25
Also published as: CN114193458B

Abstract

The invention discloses a robot control method based on Gaussian process online learning. The method comprises: obtaining initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data, the initial model being used for preliminary control of the robot; updating the Gaussian process online learning models in round-robin fashion in each control cycle; and taking the desired position, velocity and acceleration as input, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of the robot control. The invention can improve tracking accuracy while reducing the model update frequency, and is broadly applicable in the technical field of robot control.

Description

Translated from Chinese

A robot control method based on Gaussian process online learning

Technical Field

The invention relates to the technical field of robot control, and in particular to a robot control method based on Gaussian process online learning.

Background

Manipulators with many degrees of freedom are widely used in industry, medicine, logistics and other fields, and are often required to offer precise control, flexible perception and human-robot interaction. They therefore need to be modeled and controlled accurately. A high-degree-of-freedom manipulator is a nonlinear, highly coupled system with unmodeled effects such as friction and motor dynamics, which makes an accurate dynamic model hard to obtain in practice. Dynamic model information is very important for control tasks. In trajectory tracking, for example, plain PID control cannot guarantee accurate tracking at high speed or under heavy load.

Gaussian Process Online Learning (GPOL) is a data-driven, non-parametric learning method that updates a model in real time from continuously arriving data (also called streaming data). Compared with neural networks, it is interpretable and provides uncertainty estimates, both of which matter greatly in trajectory tracking for robotic arms. The main idea of GPOL is to maintain a basis vector set (BVs), drawn from the streaming data, for continuous prediction.

For online learning in robot control, the prior art has the following disadvantages:

1. Most current methods (such as neural networks and Gaussian process regression) learn a model offline and then apply it to trajectory tracking. They require large amounts of training data and training time, and the trained model may still be affected by real-time factors such as temperature or an unknown load, all of which greatly reduce the model's practicality.

2. When current Gaussian process online learning is applied to a manipulator, the key challenge is that the control frequency is too high for predictions to be made in time. If the manipulator requires torque commands at 1 kHz, the model has only 1 ms per prediction. Current Gaussian process online learning methods focus on modeling the whole system (global optimality), so under high-frequency control commands their modeling capability falls short and they cannot be applied well to trajectory tracking.

3. Some current online learning methods learn data that may not be worth learning or may even be wrong, such as positions that are visited occasionally and then not again for a long time, or streaming data that arrives under sudden, non-stationary disturbances.

Summary of the Invention

In view of this, embodiments of the present invention provide a robot control method based on Gaussian process online learning with high tracking accuracy.

One aspect of the present invention provides a robot control method based on Gaussian process online learning, including:

obtaining initial data through low-gain proportional-derivative control, and constructing an initial Gaussian process online learning model from the initial data, the initial Gaussian process online learning model being used for preliminary control of the robot;

updating the Gaussian process online learning models in round-robin fashion in each control cycle;

taking the desired position, velocity and acceleration as input, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of the robot control, so as to control the robot.

Optionally, obtaining initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data includes:

configuring initialization hyperparameters, which include the proportional and derivative gains, the basis vector set size, the kernel function, the model noise, the variance threshold and the forgetting-speed parameter;

performing low-gain proportional-derivative control according to the initialization hyperparameters;

obtaining initial data from the proportional-derivative control;

constructing an initial Gaussian process online learning model from the initial data, and using the predicted torque output by the Gaussian process online learning model as a feedforward term.

Optionally, updating the Gaussian process online learning models in round-robin fashion in each control cycle includes:

using the reproducing kernel Hilbert space (RKHS) norm as the criterion for a new data point, to measure the distance of the new data point from the original space;

when the distance computed for the new data point is greater than a preset threshold, adding the data point to the basis vector set and updating the corresponding auxiliary variables;

when the basis vector set is larger than a preset size, deleting useless points from the basis vector set.

Optionally, in the step of adding the data point to the basis vector set and updating the corresponding auxiliary variables when the distance computed for the new data point is greater than the preset threshold,

when a data point cannot enter the basis vector set, the basis vector set is adjusted according to that data point, so that the data point can enter the adjusted basis vector set without increasing its size.

Optionally, deleting useless points from the basis vector set when the basis vector set is larger than the preset size includes:

configuring a counter and a forgetting condition, the counter being incremented by 1 whenever a new data point is added to the basis vector set;

when the counter reaches a preset value, deleting the oldest data point and resetting the counter to zero.

Optionally, deleting useless points from the basis vector set when the basis vector set is larger than the preset size further includes:

when the counter has not reached the preset value, deleting, as the useless point, the point in the current BVs that is closest to the other points.

Optionally, taking the desired position, velocity and acceleration as input, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of the robot control includes:

for a new data point, computing the predicted mean and predicted variance of the data point;

using the predicted mean as the feedforward term of the control torque;

combining the feedforward term with the corresponding feedback term to obtain a control command;

inputting the control command to the robot to control the robot.

Another aspect of the embodiments of the present invention provides a robot control device based on Gaussian process online learning, including:

a first module, configured to obtain initial data through low-gain proportional-derivative control and to construct an initial Gaussian process online learning model from the initial data, the initial Gaussian process online learning model being used for preliminary control of the robot;

a second module, configured to update the Gaussian process online learning models in round-robin fashion in each control cycle;

a third module, configured to take the desired position, velocity and acceleration as input, predict a plurality of torques with the latest Gaussian process online learning model, and use the torques as the feedforward input of the robot control, so as to control the robot.

Another aspect of the embodiments of the present invention provides an electronic device, including a processor and a memory;

the memory is used to store a program;

the processor executes the program to implement the method described above.

Another aspect of the embodiments of the present invention provides a computer-readable storage medium that stores a program, the program being executed by a processor to implement the method described above.

The embodiments of the present invention also disclose a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device can read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the method described above.

Embodiments of the present invention obtain initial data through low-gain proportional-derivative control and construct an initial Gaussian process online learning model from the initial data, the initial model being used for preliminary control of the robot; update the Gaussian process online learning models in round-robin fashion in each control cycle; and take the desired position, velocity and acceleration as input, predict a plurality of torques with the latest Gaussian process online learning model, and use the torques as the feedforward input of the robot control, so as to control the robot. The invention can improve tracking accuracy and reduce the model update frequency.

Description of Drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a robot control flowchart provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the data point deletion strategy provided by an embodiment of the present invention;

FIG. 3 is a flowchart of the overall steps provided by an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain the application and do not limit it.

In view of the problems in the prior art, one aspect of the present invention provides a robot control method based on Gaussian process online learning, as shown in FIG. 3, including:

when the counter has not reached the preset value, deleting, as the useless point, the point in the current basis vector set that is closest to the other points.

The specific implementation of the present invention is described in detail below with reference to the drawings:

During Gaussian Process Online Learning (GPOL), the present invention uses round-robin updating to reduce how often each model is actually updated, and adds a forgetting condition so that the learned model performs better in complex trajectory tracking tasks. For a manipulator with n joints, n independent GPOL models are used. Each model takes as input a vector composed of position, velocity and acceleration (3n elements in total) and outputs the control torque, that is,

[equation omitted]

where [symbol omitted] is the actual joint position. For convenience, the time argument t is not written explicitly for some variables below. The specific steps are as follows:

Step 1: Using only low-gain proportional-derivative control, collect a small amount of data in advance to initialize the Gaussian process online learning model, then switch to model-based control. The control law is

[equation omitted]

where the omitted symbols denote, in order, the desired joint position, the actual joint position, and the proportional and derivative gains;
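The low-gain stage of Step 1 can be sketched as follows. Because the patent's own control-law equation is not reproduced in this text, the sketch assumes the textbook proportional-derivative form, torque = kp * (q_des - q) + kd * (dq_des - dq), applied per joint. The function name `pd_torque` and all gain values are hypothetical.

```python
def pd_torque(q_des, q, dq_des, dq, kp, kd):
    """Per-joint PD torque (assumed textbook form, not taken from the patent).

    q_des, q   : desired and measured joint positions
    dq_des, dq : desired and measured joint velocities
    kp, kd     : proportional and derivative gains, one per joint
    """
    return [
        kp_i * (qd_i - q_i) + kd_i * (dqd_i - dq_i)
        for qd_i, q_i, dqd_i, dq_i, kp_i, kd_i in zip(q_des, q, dq_des, dq, kp, kd)
    ]


if __name__ == "__main__":
    # Two joints, illustrative low gains.
    print(pd_torque([1.0, 0.0], [0.5, 0.0], [0.0, 0.0], [0.0, 1.0],
                    [10.0, 10.0], [2.0, 2.0]))
```

Keeping the gains low during this stage matches the text: the PD loop only has to move the arm safely enough to collect initialization data, not to track accurately.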

Step 2: Within one control cycle, update the Gaussian process online learning models in round-robin fashion (that is, only one Gaussian process online learning model is updated each time);

Step 3: Using the model learned so far, take the desired position, velocity and acceleration as input and predict n torques as the feedforward input of the control (the control flowchart is shown in FIG. 1).
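The round-robin rule of Step 2 can be illustrated with a minimal sketch: out of n per-joint models, exactly one receives an update in each control cycle, in fixed cyclic order. The class names and the `update(x, y)` method are hypothetical stand-ins, not names from the patent.

```python
class RoundRobinUpdater:
    """Updates one of the n per-joint models per control cycle, in cyclic order."""

    def __init__(self, models):
        self.models = models
        self.next_joint = 0

    def step(self, x, torques):
        """Feed the newest sample to a single model and return its joint index."""
        j = self.next_joint
        self.models[j].update(x, torques[j])
        self.next_joint = (j + 1) % len(self.models)
        return j


class CountingModel:
    """Stand-in model that only records how many updates it has received."""

    def __init__(self):
        self.updates = 0

    def update(self, x, y):
        self.updates += 1


if __name__ == "__main__":
    models = [CountingModel() for _ in range(3)]
    updater = RoundRobinUpdater(models)
    order = [updater.step([0.0], [1.0, 2.0, 3.0]) for _ in range(7)]
    print(order, [m.updates for m in models])
```

Prediction can still query all n models every cycle; only the more expensive update step is spread out over cycles.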

The overall control law of this embodiment is $u = u_{ff} + u_{fb}$. The first term is the feedforward term

[equation omitted]

which is the predicted output of the Gaussian process online learning model; the second term is the feedback term

[equation omitted]

In Step 1, only low-gain proportional-derivative control is used to collect a small amount of data (for example, [symbols omitted], which denote the position, velocity and acceleration information and the control torque of the corresponding previous instant; combined, these form one data pair). The data are used to normalize the inputs and outputs, after which the torque predicted by the model is added as the feedforward term. The hyperparameters that need to be set are: the size N of the BVs (basis vector set), which directly affects prediction speed; the value of Λ in the kernel function $k(x, x') = \exp(-0.5\,(x - x')^T \Lambda (x - x'))$, whose dimension matches that of the input, and whose entry for a noisy input component can be set smaller; the model noise $\sigma_n$, set according to the noise of the output, which must not be too small (for example, below 0.0001) or zero, otherwise the matrix inversion fails; the variance threshold $\epsilon_{Tol}$, which decides whether a data point should be added to the BVs and can start at 0.01 before tuning; and h, which sets the forgetting speed and can be 10% or 100% of N. When the desired trajectory keeps changing, a reasonable h lets the model quickly learn the dynamics of the new trajectory while keeping the predicted torque relatively smooth.
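As a rough illustration of these hyperparameters, the sketch below bundles them in a small configuration object and implements the kernel. It assumes that Λ is diagonal (one weight per input dimension), which the text suggests but does not state outright. The names `GpolConfig`, `forget_ratio` and `se_kernel` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GpolConfig:
    """Hyperparameters named in the text; field names are illustrative."""
    max_basis: int              # N, size of the basis vector set
    lam: List[float]            # diagonal of Lambda, one weight per input dimension
    noise: float                # sigma_n, model noise
    tol: float = 0.01           # variance threshold (suggested starting value)
    forget_ratio: float = 1.0   # h as a fraction of N (text suggests 0.1 or 1.0)

    def __post_init__(self):
        # The text warns that zero or very small noise (below about 1e-4)
        # makes the matrix inversion fail.
        if not self.noise >= 1e-4:
            raise ValueError("noise must be at least 1e-4")

    @property
    def forget_every(self) -> int:
        """h: insertions between forced deletions of the oldest point."""
        return max(1, round(self.forget_ratio * self.max_basis))


def se_kernel(x, y, lam):
    """k(x, y) = exp(-0.5 * (x - y)^T Lambda (x - y)) for diagonal Lambda."""
    return math.exp(-0.5 * sum(l * (a - b) ** 2 for a, b, l in zip(x, y, lam)))


if __name__ == "__main__":
    cfg = GpolConfig(max_basis=50, lam=[1.0, 1.0, 1.0], noise=0.01)
    print(cfg.forget_every, se_kernel([0.0], [2.0], [0.5]))
```

A smaller weight in `lam` makes the kernel less sensitive to that input dimension, which is why the text recommends lowering it for noisy inputs.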

In Step 2, the reproducing kernel Hilbert space (RKHS) norm is used as the criterion for a new data point, that is, it measures the distance of the new data point $x_*$ from the original space:

[equation omitted]

This embodiment defines $K_{XX}$ as the Gram matrix formed from the matrix X, namely:

[equation omitted]

where X is the matrix formed by the input vectors of the N data points. $K_*$ and $K_{**}$ are defined similarly.
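The patent's distance formula is not reproduced in this text. A common choice in sparse online Gaussian process methods, assumed here, is the residual of projecting the new point onto the span of the basis vectors in the RKHS: gamma = k(x*, x*) - k*^T K_XX^{-1} k*. The sketch below computes that quantity in plain Python; the helper names, the diagonal Λ and the small `jitter` term are all illustrative assumptions.

```python
import math


def se_kernel(x, y, lam):
    """Squared-exponential kernel with diagonal Lambda (assumed)."""
    return math.exp(-0.5 * sum(l * (a - b) ** 2 for a, b, l in zip(x, y, lam)))


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def novelty(x_new, basis, lam, jitter=1e-8):
    """gamma = k(x*, x*) - k*^T K_XX^{-1} k* (assumed form of the distance)."""
    k_star = [se_kernel(x_new, b, lam) for b in basis]
    gram = [
        [se_kernel(a, b, lam) + (jitter if i == j else 0.0)
         for j, b in enumerate(basis)]
        for i, a in enumerate(basis)
    ]
    e_hat = solve(gram, k_star)  # weights expressing x* through the basis points
    return se_kernel(x_new, x_new, lam) - sum(k * e for k, e in zip(k_star, e_hat))


if __name__ == "__main__":
    basis = [[0.0], [1.0]]
    print(novelty([0.0], basis, [1.0]), novelty([10.0], basis, [1.0]))
```

Under this assumption, a point that already lies in the basis gives a value near zero, while a point far from every basis vector gives a value near k(x*, x*), so comparing the result with the variance threshold separates "redundant" from "novel" samples.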

(1) Define a variance threshold $\epsilon_{Tol}$ (for example, 0.01) according to the actual situation. If the γ computed for the new data point $x_*$ is larger than the threshold, add the point to the BVs and update the corresponding auxiliary variables:

$\alpha_{m+1} = T_{m+1}(\alpha_m) + q_{m+1} s_{m+1}$

$s_{m+1} = T_{m+1}(C_m K_{X*}) + e_{m+1}$

where m is the current size of the BVs, [symbol omitted] can be understood as the information weight vector, [symbol omitted] can be understood as the auxiliary kernel matrix, and the variables s, q, r, [symbol omitted] and [symbol omitted] are introduced for convenience of notation. $T_{m+1}$ extends a vector to m+1 dimensions by appending a 0, and $U_{m+1}$ extends a matrix to (m+1)×(m+1) by appending a row and a column of zeros. $e_{m+1}$ denotes the vector whose (m+1)-th element is 1 and whose other elements are 0, and $\sigma_n$ is the preset model noise. If the new data point cannot enter the BVs, the BVs are only fine-tuned according to this point without increasing their size, which keeps the BVs small enough that prediction does not become too slow. The fine-tuning is:

$\alpha_{m+1} = \alpha_m + q_{m+1} s_{m+1}$

where [symbol omitted] is the weight vector that expresses the new data point in terms of the data points of the original space.
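The two branches for the weight vector α can be sketched together. The expanding branch follows the equations given in the text: extend by a zero, then add q times s, where s is the extended C·k vector plus the unit vector. For the fine-tuning branch the definition of s is not reproduced in this text; the sketch assumes s = C·k + ê, as in standard sparse online Gaussian process updates. The scalar q is taken as an input because its formula is also not reproduced. All names are hypothetical.

```python
def extend(v):
    """T_{m+1}: append a zero to a vector."""
    return v + [0.0]


def update_alpha(alpha, c_k, q, gamma, tol, e_hat):
    """Return the updated weight vector.

    alpha : current weights (length m)
    c_k   : the product C_m k for the new point (length m)
    q     : scalar update coefficient (formula not reproduced in the text)
    gamma : novelty of the new point
    tol   : variance threshold
    e_hat : weights expressing the new point through the basis points
    """
    if gamma > tol:
        # Expanding update: s = T(C k) + e_{m+1}, alpha grows by one entry.
        s = extend(c_k)
        s[-1] += 1.0
        return [a + q * s_i for a, s_i in zip(extend(alpha), s)]
    # Fine-tuning update (assumed form): s = C k + e_hat, size unchanged.
    s = [ck + e for ck, e in zip(c_k, e_hat)]
    return [a + q * s_i for a, s_i in zip(alpha, s)]


if __name__ == "__main__":
    print(update_alpha([1.0, 2.0], [0.25, 0.5], 2.0, 0.5, 0.01, [0.0, 0.0]))
    print(update_alpha([1.0, 2.0], [0.25, 0.5], 2.0, 0.001, 0.01, [0.5, 0.25]))
```

The fine-tuning branch is what keeps prediction cost bounded: information from the new sample is folded into the existing weights instead of adding a basis vector.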

(2) If the BVs exceed the preset size N (that is, they hold N+1 points), the most useless point is selected from the BVs and deleted. A strategy with forgetting is used to choose the point to delete (as shown in FIG. 2). This embodiment maintains a counter c and a preset forgetting condition h. Each time a new point enters the BVs, c is incremented by 1; when c reaches h, the oldest point is deleted directly and c is reset to 0. Otherwise, the point in the current BVs that is closest to the other points is deleted; the distance can be computed as $\rho_i = \alpha_i / Q_{ii}$, where $\alpha_i$ is the i-th element of α and $Q_{ii}$ is the element in row i, column i of Q. For the selected index i, the corresponding elements of α, C and Q are deleted (the dimensions then shrink to N and N×N) and the remaining entries are corrected as follows:

$\alpha = \alpha - \rho_i Q_i$

where $Q_i$ and $C_i$ denote the i-th columns of Q and C, respectively. Finally, the corresponding data point is deleted from the BVs.
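The deletion rule with forgetting can be sketched as a small policy object plus the correction of α. Two details are assumptions rather than statements from the text: the score uses the absolute value of α_i / Q_ii, as is usual in sparse online Gaussian process scoring, and the correction is applied to the full-length vector before entry i is dropped. The class, function and argument names are hypothetical.

```python
class ForgettingPolicy:
    """Chooses which basis vector to delete when the set is over capacity."""

    def __init__(self, h):
        self.h = h          # forgetting condition
        self.counter = 0    # c in the text

    def on_insert(self):
        """Call whenever a new point enters the basis vector set."""
        self.counter += 1

    def pick(self, inserted_at, alpha, q_diag):
        """Return the index to delete.

        inserted_at : insertion time of each basis vector (smaller is older)
        alpha       : weight vector
        q_diag      : diagonal entries Q_ii
        """
        if self.counter >= self.h:
            self.counter = 0
            return min(range(len(inserted_at)), key=lambda i: inserted_at[i])
        # Otherwise delete the point with the smallest |alpha_i / Q_ii|.
        return min(range(len(alpha)), key=lambda i: abs(alpha[i] / q_diag[i]))


def drop_from_alpha(alpha, q, i):
    """Apply alpha - rho_i * Q_i, then remove entry i (which becomes zero)."""
    rho = alpha[i] / q[i][i]
    corrected = [a - rho * q[r][i] for r, a in enumerate(alpha)]
    return corrected[:i] + corrected[i + 1:]


if __name__ == "__main__":
    policy = ForgettingPolicy(h=2)
    policy.on_insert()
    print(policy.pick([5, 3, 9], [0.5, -2.0, 0.1], [1.0, 1.0, 2.0]))
    policy.on_insert()
    print(policy.pick([5, 3, 9], [0.5, -2.0, 0.1], [1.0, 1.0, 2.0]))
```

Entry i of the corrected vector is exactly zero before it is dropped, which is a convenient sanity check on an implementation.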

In Step 3, when predicting for a new point $x_*$, the predicted mean is

[equation omitted]

and the predicted variance is

[equation omitted]

The predicted mean is used directly as the feedforward term $u_{ff}$ of the control torque; a suitable feedback term $u_{fb}$ is added and the sum is sent to the robot as the control command. The new [symbols omitted] and the control command u of the previous step are then used as a new training point to update the Gaussian process online learning model.
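The prediction formulas are not reproduced in this text. The sketch below assumes the form used in standard sparse online Gaussian process regression, mean = α^T k* and variance = k** + k*^T C k*, and then adds a feedback term to the mean to form the command. The function names are hypothetical, and the patent's exact expressions may differ (for example, by a noise term).

```python
def predict(k_star, k_ss, alpha, c):
    """Return (mean, variance) for one joint model.

    k_star : kernel values between the query and each basis vector
    k_ss   : kernel value of the query with itself
    alpha  : weight vector
    c      : auxiliary matrix C (list of rows)
    """
    mean = sum(a * k for a, k in zip(alpha, k_star))
    c_k = [sum(c_ij * k for c_ij, k in zip(row, k_star)) for row in c]
    variance = k_ss + sum(k * v for k, v in zip(k_star, c_k))
    return mean, variance


def joint_command(u_ff, u_fb):
    """Control command for one joint: feedforward plus feedback."""
    return u_ff + u_fb


if __name__ == "__main__":
    mean, var = predict([1.0, 0.5], 1.0, [2.0, -1.0], [[-0.5, 0.0], [0.0, -0.5]])
    print(mean, var, joint_command(mean, 0.25))
```

Only the mean enters the torque command; the variance is available as an uncertainty estimate, which the Background names as one of the advantages of Gaussian process models.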

Note that in this embodiment the first 0.6 seconds use proportional-derivative control only. After that, the Gaussian process online learning model is continuously updated from the input torque and the measured position, velocity and acceleration (Step 2 above), and it predicts the torque used for control from the desired position, velocity and acceleration, so that the model is learned and applied at the same time.
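A short worked calculation shows what this timing implies. It assumes that the 1 kHz command rate mentioned in the Background applies to the embodiment, and it uses a 7-joint arm purely as an example; neither is stated for this embodiment. The function name is hypothetical.

```python
def schedule_summary(control_rate_hz, n_joints, warmup_s=0.6):
    """Return (number of PD-only cycles, update rate of each per-joint model in Hz)."""
    warmup_cycles = round(warmup_s * control_rate_hz)
    # With round-robin updating, each model is updated once every n_joints cycles.
    per_model_update_hz = control_rate_hz / n_joints
    return warmup_cycles, per_model_update_hz


if __name__ == "__main__":
    print(schedule_summary(1000, 7))
```

Under these assumptions the arm runs 600 PD-only cycles before the learned feedforward term is switched in, and each joint model is then updated roughly 143 times per second instead of 1000, which is the sense in which round-robin updating lowers the model update frequency.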

In summary, the present invention uses Gaussian process online learning to learn the inverse dynamics of the manipulator and proposes a way to apply it to trajectory tracking at high control frequencies: lower the update frequency of each model by updating the models in round-robin fashion. It also adds a forgetting mechanism to Gaussian process online learning to improve short-term tracking accuracy, so that unknown dynamics are learned quickly when the trajectory tracking task switches.

Compared with the prior art, the present invention has the following advantages:

1. The prior art is either offline or constrained by the manipulator's high control-frequency requirement, which makes it hard to apply in practice.

2. Existing learning methods either focus on overall performance at the cost of local performance (they cannot predict recent dynamics well) or cannot exclude erroneous data that has already been learned.

The invention solves the difficulty of applying Gaussian process learning to manipulators with a high control frequency. Gaussian process online learning rests on probability theory and also provides uncertainty estimates. In addition, the invention introduces a forgetting-speed strategy which, compared with the prior art, gives better local performance and higher tracking accuracy when tracking complex trajectories.

In some alternative implementations, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example, to give a more complete understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of the various operations is altered and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the invention is described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. A detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions and internal relationships of the various functional modules in the device disclosed herein, the actual implementation of the modules is within the routine skill of an engineer. Accordingly, those skilled in the art, using ordinary skill, can implement the invention as set forth in the claims without undue experimentation. The specific concepts disclosed are illustrative only and are not intended to limit the scope of the invention, which is determined by the appended claims and the full scope of their equivalents.

If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport the program for use by, or in connection with, an instruction execution system, apparatus or device.

More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it as necessary, and then stored in computer memory.

应当理解，本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中，多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。例如，如果用硬件来实现，和在另一实施方式中一样，可用本领域公知的下列技术中的任一项或他们的组合来实现：具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路，具有合适的组合逻辑门电路的专用集成电路，可编程门阵列(PGA)，现场可编程门阵列(FPGA)等。It should be understood that various parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above-described embodiments, various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented by any one or a combination of the following techniques known in the art: Discrete logic circuits, application specific integrated circuits with suitable combinational logic gates, Programmable Gate Arrays (PGA), Field Programmable Gate Arrays (FPGA), etc.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the invention is defined by the claims and their equivalents.

The above is a specific description of preferred implementations of the present invention, but the present invention is not limited to the described embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims

1. A robot control method based on Gaussian process online learning, characterized by comprising:

2. The robot control method based on Gaussian process online learning according to claim 1, characterized in that obtaining initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data comprises:

3. The robot control method based on Gaussian process online learning according to claim 1, characterized in that updating the Gaussian process online learning model in rotation within each control cycle comprises:

4. The robot control method based on Gaussian process online learning according to claim 3, characterized in that, in the step of adding a new data point to the basis vector set and updating the corresponding auxiliary variables when the distance computed for that data point is greater than a preset threshold,

5. The robot control method based on Gaussian process online learning according to claim 3, characterized in that deleting useless points from the basis vector set when the basis vector set is larger than a preset size comprises:

6. The robot control method based on Gaussian process online learning according to claim 5, characterized in that deleting useless points from the basis vector set when the basis vector set is larger than a preset size further comprises:

7. The robot control method based on Gaussian process online learning according to claim 1, characterized in that the desired position, velocity, and acceleration are taken as input, a plurality of torques are predicted by the latest Gaussian process online learning model, and the torques are used as the feedforward input for robot control;

8. A robot control device based on Gaussian process online learning, characterized by comprising:

9. An electronic device, characterized by comprising a processor and a memory;

the memory is configured to store a program;

the processor executes the program to implement the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the storage medium stores a program, and the program, when executed by a processor, implements the method according to any one of claims 1 to 7.