CN104091035B


Info

Publication number: CN104091035B
Application number: CN201410370848.3A
Authority: CN
Inventors: 王功; 施建明; 李永祥; 刘亦飞
Original assignee: Technology and Engineering Center for Space Utilization of CAS
Current assignee: Technology and Engineering Center for Space Utilization of CAS
Priority date: 2014-07-30
Filing date: 2014-07-30
Publication date: 2017-02-08
Anticipated expiration: 2034-07-30
Also published as: CN104091035A

Abstract

The invention provides a health monitoring method for space station payloads based on a data-driven algorithm. In the design stage, historical payload data is turned into training samples by constructing state vectors, standardizing the parameters, and applying weights; the training samples are then clustered to classify the data by operating condition. In the operation stage, real-time downlink test data from the payload is processed in the same way and monitored against the operating conditions learned by clustering. Abnormal data indicates that the payload has entered a new operating condition and that a fault may have occurred or may be about to occur. The abnormal data is then examined with a fault diagnosis tree method to locate the fault. Machine learning on historical data builds a system health knowledge base, and abnormal payload states are found by computing outlier distance values. This enables real-time monitoring of payload health and supports fault detection and localization, as well as a degree of prediction.

Description

Translated from Chinese

A health monitoring method for space station payloads based on a data-driven algorithm

Technical field

The invention belongs to the technical field of payload fault diagnosis and health management for space station application systems, and in particular relates to a health monitoring method for space station payloads based on a data-driven algorithm.

Background

Space missions are politically significant, high-risk, capital-intensive, and long in duration, so ensuring their smooth implementation is an important national goal.

To ensure the smooth implementation of space missions, one common practice in the prior art is to design space station payloads using high-reliability design methods. However, because of the complexity of the space environment and the limitations of ground test conditions, space station payloads can still fail during operation. Fast and effective fault prediction and fault diagnosis for space station payloads is therefore of great practical significance.

Summary of the invention

To address the defects of the prior art, the present invention provides a health monitoring method for space station payloads based on a data-driven algorithm.

The technical solution adopted by the present invention is as follows:

The invention provides a health monitoring method for space station payloads based on a data-driven algorithm, comprising the following steps:

S1. The space station payload under health monitoring has H test points; that is, each piece of downlink test data of the payload contains H test parameters.

A historical test database is established for the payload. The database stores a number of pieces of historical downlink test data of the payload, all of which are fault-free.

S2. When health monitoring of the payload is required, the historical test database is read to obtain multiple pieces of historical downlink test data.

The historical downlink test data obtained is then preprocessed, yielding n pieces of historical test data that meet the requirements; these n pieces are test data from normal operation of the payload.

S3. Based on a preset selection principle, m of the H test parameters are selected as key monitoring factors, giving an m-dimensional state vector, where m ≤ H.

S4. The n m-dimensional state vectors are assembled into an n*m matrix, where n is the number of rows and m is the number of columns. The m entries in each row are the values of the m key monitoring factors contained in one piece of historical test data obtained in S2; each column holds the values of one key monitoring factor across different tests.

S5. The n*m matrix is normalized column by column, so that the value ranges of all key monitoring factors are mapped to the same interval.

S6. The weight of each key monitoring factor is determined, and the normalized matrix is weighted to obtain a weighted matrix.

S7. Each row vector of the weighted matrix is taken as a training sample, giving n training samples in total. The n training samples are clustered to obtain multiple clusters, each corresponding to a different normal operating condition of the payload.

S8. During health monitoring, real-time downlink test data sent by the payload is received; the same m test parameters as in S3 are selected as key monitoring factors to form a state vector.

S9. The state vector is normalized by the same method as in S5, and the normalized state vector is weighted by the same method as in S6, giving a weighted state vector.

S10. The weighted state vector is compared with the training samples in each cluster obtained in S7 to determine whether any training sample is identical to it. If such a sample exists, the payload is concluded to be operating normally; the operating condition corresponding to the cluster containing that sample is the current operating condition of the payload, which is output, and the process ends. If no such sample exists, S11 is executed.

S11. The distance from the weighted state vector to each cluster obtained in S7 is computed, the cluster nearest to the weighted state vector is identified, and the shortest distance is denoted D. D is then compared with a preset distance threshold R. If D ≤ R, the payload is concluded to be operating normally; the operating condition corresponding to the nearest cluster is the current operating condition of the payload, which is output, and the process ends. If D > R, the weighted state vector does not belong to any known historical cluster and is an abnormal state vector. It follows that the current operating condition of the payload is not any normal operating condition and that the payload may have failed or may be about to fail. Finally, the abnormal state vector is saved and the process ends.
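The decision in S10 and S11 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that the "distance to a cluster" is the Euclidean distance to the cluster center (the text does not say whether the center or the boundary is meant), and the names `monitor`, `centers`, and `threshold_r` are hypothetical. An exact match with a training sample (S10) is not handled separately here; it simply falls through to the distance test.

```python
import numpy as np

def monitor(weighted_state, centers, threshold_r):
    """Return (is_normal, nearest_cluster_index, distance) for one state vector.

    Assumption: distance to a cluster = Euclidean distance to its center.
    """
    x = np.asarray(weighted_state, dtype=float)
    c = np.asarray(centers, dtype=float)
    # Distance from the state vector to every cluster center.
    distances = np.linalg.norm(c - x, axis=1)
    nearest = int(np.argmin(distances))
    d = float(distances[nearest])
    # D <= R: known operating condition; D > R: abnormal state vector.
    return d <= threshold_r, nearest, d
```

A caller would store the state vector as abnormal whenever the first returned value is `False`, and otherwise report the operating condition attached to the returned cluster index.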

Preferably, in S2, the historical downlink test data is preprocessed as follows:

Incomplete data, abnormal data, and incorrectly formatted data are collectively called bad data. The bad data is removed from the historical downlink test data, leaving preprocessed historical test data that meets the requirements.

Preferably, in S7, the n training samples are clustered into multiple clusters, each corresponding to a different normal operating condition of the payload, as follows:

S7.1. Determine whether the classification rules are known. If they are completely known, execute S7.2; if they are completely unknown, execute S7.3; if they are partially known, execute S7.4.

S7.2. The classification rules being completely known means the following. For the m-dimensional state vector composed of m key monitoring factors, every key monitoring factor is bound to several state levels, and the data interval of every state level is known; each combination of state levels across the key monitoring factors forms a different normal operating condition of the payload.

The clustering method is: determine whether a training sample belongs to an operating condition with known classification rules. If it does, assign the sample to that operating condition; the training samples belonging to the same operating condition form one cluster. If it does not, store the sample in the database as a new cluster, which corresponds to a new operating condition.

S7.3. The classification rules being completely unknown means the following. For the m-dimensional state vector composed of m key monitoring factors, none of the key monitoring factors is bound to state levels.

The clustering method is:

Step 1: determine whether base clusters exist. If they do, go to Step 2. If they do not, specify the base clusters and determine the center and radius of each cluster in the base set by the following formulas:

Cluster center: x_center = (x_max + x_min)/2; (3)

Cluster radius: d_min = D(x_max, x_min)/2; (4)

where x_max and x_min are the upper and lower limits of the training samples in the same cluster, and D is the Euclidean distance between the vectors x_max and x_min.

Step 2: keep extending the base clusters until the resulting clusters contain all training samples.

Step 3: optimize all the clusters obtained, and store the final clusters in the database as the operating conditions of the payload.

S7.4. The classification rules being partially known means the following. For the m-dimensional state vector composed of m key monitoring factors, M1 key monitoring factors are bound to several state levels, with the data interval of every state level known, and M2 key monitoring factors are not bound to state levels, where M1 + M2 = m, M1 ≥ 1, M2 ≥ 1, and M1 and M2 are natural numbers. The M1 factors are the factors with known classification rules, and the M2 factors are the factors with unknown classification rules.

The clustering method is: for the n training samples, group the factors contained in each sample by their physical meaning. Using the factors with known classification rules as the basis, cluster the n training samples a first time by the method in S7.2 to obtain several original clusters.

Each original cluster is then clustered a second time by the method in S7.3 to obtain several sub-clusters.

Determine whether the resulting sub-clusters meet the requirements for dividing operating conditions. If they do, store the sub-clusters in the database as the final clustering result. If they do not, replace the M1 key monitoring factors selected for the first clustering and repeat the first and second clustering, looping until the sub-clusters from the second clustering meet the requirements.
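Formulas (3) and (4) in S7.3 translate directly into code. The sketch below assumes that the "upper and lower limits" of a cluster are the element-wise maximum and minimum of its training samples; the text does not define them more precisely, so this reading, along with the function name `cluster_center_and_radius`, is an assumption.

```python
import numpy as np

def cluster_center_and_radius(samples):
    """Compute the center (formula 3) and radius (formula 4) of one cluster.

    Assumption: x_max and x_min are the element-wise max and min of the samples.
    """
    s = np.asarray(samples, dtype=float)
    x_max = s.max(axis=0)
    x_min = s.min(axis=0)
    center = (x_max + x_min) / 2.0                  # formula (3)
    radius = np.linalg.norm(x_max - x_min) / 2.0    # formula (4)
    return center, radius
```

For the two samples (0, 0) and (3, 4), this gives the center (1.5, 2.0) and the radius 2.5.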

Preferably, in Step 3 of S7.3, all the clusters obtained are optimized as follows:

For the base clusters C = {C₁, C₂, ..., C_q}, let the smallest cluster be C_i (1 ≤ i ≤ q), with upper and lower limit elements x_max = (a₁, a₂, ..., a_m) and x_min = (b₁, b₂, ..., b_m). Denote the Euclidean distance between these limits by d_min, that is, d_min = D(x_max, x_min). For any extended cluster C'_j, denote the Euclidean distance between its upper and lower limit elements by d'_j.

If [condition on d'_j relative to d_min; the inequality is missing from the source text], the extended cluster is cancelled and the state vectors it contains are assigned to neighboring clusters;

If [the complementary condition, likewise missing from the source text], the extended cluster is kept.

Preferably, in S7.2, the training sample is stored in the database as a new cluster as follows. Let the training sample be x₀ = {a₁, a₂, ..., a_m}.

Compute the distance d between x₀ and the nearest operating condition boundary. Taking x₀ as the center of the new cluster and d as its radius, the new cluster is expressed as:

C_new = {x, D(x, x₀) ≤ d} (2)

where C_new denotes the new cluster, x denotes any state vector that can be assigned to the new cluster, and D denotes the Euclidean distance between the vectors x and x₀.
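Formula (2) describes the new cluster as a ball around x₀. A minimal sketch of the membership test follows; the class name `BallCluster` is hypothetical, and the radius d is taken as an input because computing the distance to the nearest operating condition boundary depends on how those boundaries are stored, which the text does not specify.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class BallCluster:
    """A cluster C_new = {x : D(x, x0) <= d}, as in formula (2)."""
    center: np.ndarray  # x0, the training sample that seeded the cluster
    radius: float       # d, assumed to be computed elsewhere

    def contains(self, x):
        # Euclidean distance between x and the cluster center x0.
        dist = np.linalg.norm(np.asarray(x, dtype=float) - self.center)
        return bool(dist <= self.radius)
```

For example, a cluster built as `BallCluster(np.array([0.5, 0.5]), 0.1)` contains the vector (0.55, 0.5) but not (0.7, 0.5).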

Preferably, after S11, the method further comprises:

S12. The real-time downlink test data of the payload is continuously stored in the historical test database, so the database is continuously updated. Based on the updated database, self-learning is then performed on the latest training sample set at fixed intervals to form updated clusters, as follows:

1) Re-cluster the existing clusters and redetermine their centers by formula (5):

$c_i = \frac{1}{|A_i|} \sum_{x \in A_i} x, \quad i = 1, 2, \ldots \qquad (5)$

where c_i denotes the cluster center after the i-th update, A_i denotes the cluster after the i-th update, x denotes an element of A_i, and |A_i| denotes the number of elements in cluster A_i.

2) For a newly formed cluster, treat it as a known operating condition, redetermine its radius and center, and thereby complete the update of the new cluster.
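Formula (5) is the arithmetic mean of the members of a cluster. The short sketch below shows the update for one cluster; the function name `update_center` is hypothetical, and how members are assigned to clusters before the update is outside its scope.

```python
import numpy as np

def update_center(members):
    """Recompute a cluster center as the mean of its members (formula 5)."""
    a = np.asarray(members, dtype=float)
    if a.shape[0] == 0:
        raise ValueError("a cluster must contain at least one state vector")
    return a.mean(axis=0)
```

Running this at each learning cycle on the current membership of a cluster gives the sequence of centers c₁, c₂, and so on.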

Preferably, after S11, the method further comprises:

S12. After the weighted state vector is concluded to be an abnormal state vector, first determine whether it is a false alarm. If it is, end the process. If it is not, determine whether the abnormal state vector contains an abnormal monitoring factor value. If it does not, the current operating condition of the payload is abnormal and the payload is likely to be at the critical point of a fault. If it does, locate the payload test point corresponding to the abnormal monitoring factor, detect and isolate the payload fault using the fault diagnosis tree method, and finally output the faulty component and its failure mode.

Preferably, in S12, whether the result is a false alarm is determined through multiple tests.

The beneficial effects of the present invention are as follows:

The health monitoring method for space station payloads based on a data-driven algorithm builds a system health knowledge base through machine learning on historical data and finds abnormal payload states by computing outlier distance values. This enables real-time monitoring of payload health and supports fault detection and localization, as well as a degree of prediction.

Description of drawings

Fig. 1 is a schematic diagram of the overall flow of the health monitoring method for space station payloads based on a data-driven algorithm provided by the present invention;

Fig. 2 is a schematic flow chart of the payload data preprocessing stage;

Fig. 3 is a schematic flow chart of the training sample clustering method when the classification rules are known;

Fig. 4 is a flow chart of the training sample clustering method when the classification rules are completely unknown;

Fig. 5 is a schematic flow chart of extending the base clusters;

Fig. 6 is a flow chart of the training sample clustering method when the classification rules are partially known;

Fig. 7 is a flow chart of the real-time monitoring of real-time downlink test data;

Fig. 8 is a schematic flow chart of the fault detection and isolation process;

Fig. 9 is a schematic diagram of the distribution of measuring points in a cooling system of the space station;

Fig. 10 shows the result of normalizing the historical test data in an embodiment of the present invention;

Fig. 11 is a flow chart of operating condition classification in an embodiment of the present invention;

Fig. 12 shows the test data before 1300 when the clustering method monitors the vector data under test;

Fig. 13 shows the test data after 1000 when the clustering method monitors the vector data under test.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings:

Research on health monitoring technology for domestic aerospace products is still in its infancy. In past projects, payload downlink data received only simple processing on the ground, and the health information it contained was not fully exploited. Drawing on data-driven methods, the present invention improves and integrates existing methods in payload data preprocessing, data classification, and clustering, and combines them with the failure modes and failure mechanisms of the payload to form a health monitoring method for space station payloads based on a data-driven algorithm. The general idea is as follows. The input comprises historical payload test data and real-time payload downlink test data, which serve as the inputs to the design stage and the operation stage respectively.

(1) The design stage mainly comprises preprocessing of the historical payload test data and classification of the payload data by operating condition using a clustering algorithm. After initial processing of the historical test data, the parameters to be monitored are selected to construct a state vector. Once the state vector is generated, its elements are standardized and weighted and bad data is removed, giving the training samples. The training samples are then clustered to classify the data by operating condition.

(2) The operation stage mainly comprises preprocessing of the real-time payload downlink test data, data monitoring, and fault detection and isolation. After bad data removal, standardization, and weighting, the real-time downlink test data enters the monitoring stage. Normal data is stored in the database. If an abnormality occurs, the abnormal data is output, an alarm is raised for the possible fault, fault detection and isolation are carried out, and the diagnosis result is stored in the database.

As shown in Fig. 1, the present invention is divided into five parts: payload data preprocessing, classification of operating condition data based on a clustering algorithm, monitoring of real-time payload downlink data, fault detection and isolation based on a fault tree algorithm, and the payload health knowledge learning process. These five parts are described in detail below:

(1) Payload data preprocessing stage

The payload data preprocessing stage covers preprocessing of the historical payload downlink test data and preprocessing of the real-time payload downlink test data. Preprocessing yields data suitable for clustering.

Fig. 2 is a schematic flow chart of the payload data preprocessing stage.

(1) For the historical payload downlink test data, processing consists of three parts: data preprocessing, data normalization, and parameter weighting:

(1.1) Data preprocessing:

The space station payload under health monitoring has H test points; that is, each piece of downlink test data of the payload contains H test parameters. A server establishes a historical test database for the payload, which stores a number of pieces of historical downlink test data. For example, suppose H is 6: two of the test points measure flow, and the other four measure current, voltage, temperature, and pressure respectively. Each piece of downlink test data then contains 6 test parameters.

When health monitoring of the payload is required, the historical test database is read to obtain multiple pieces of historical downlink test data. This data is then preprocessed, yielding n pieces of historical test data that meet the requirements; these n pieces are test data from normal operation of the payload.

The preprocessing in this step is as follows: each piece of historical test data is checked for integrity, and data that fails the check is defined as bad data and removed. Bad data mainly comprises incomplete data, abnormal data, and incorrectly formatted data.
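The bad-data filter can be sketched as follows. The text names three kinds of bad data but does not give concrete tests for them, so the checks here are assumptions: a record is treated as incomplete if it does not have one value per test parameter, as incorrectly formatted if a value is not numeric, and as abnormal if a value is NaN or falls outside a per-parameter valid range. The names `is_bad`, `remove_bad_data`, and `valid_ranges` are hypothetical.

```python
import math

def is_bad(record, valid_ranges):
    """Return True if a record is incomplete, badly formatted, or abnormal."""
    # Incomplete: the record does not carry one value per test parameter.
    if len(record) != len(valid_ranges):
        return True
    for value, (low, high) in zip(record, valid_ranges):
        # Format error: the value is not a number (bools are rejected too).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return True
        # Abnormal: NaN or outside the assumed valid range.
        if math.isnan(value) or not low <= value <= high:
            return True
    return False

def remove_bad_data(records, valid_ranges):
    """Keep only the records that pass every check."""
    return [r for r in records if not is_bad(r, valid_ranges)]
```

With `valid_ranges = [(0, 10), (0, 100)]`, the record `[1, 50]` is kept, while `[1]`, `[1, "x"]`, and `[1, 500]` are removed.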

Each piece of payload downlink test data involves many test parameters, such as current, voltage, temperature, flow, and pressure. Too many elements in the test vector would complicate the computation. Moreover, in the subsequent clustering, when the classification rules are known, the operating condition must be judged from the interval of each test parameter, so using the test parameters directly as the objects of analysis is simpler and more reliable. For these reasons, a subset of the test parameters is usually selected for clustering.

In the present invention, m of the H test parameters are selected as key monitoring factors, giving an m-dimensional state vector, where m ≤ H.

(1.2) Data normalization

The state vector under test contains several kinds of parameters with different units, so their value ranges also differ. For example, let a state vector be A = {a₁, a₂, ..., a_m}, where a₁ ∈ [0,1] and a₂ ∈ [0,100]. Take two instances of A, A₁ = [0.01, 1, ..., 20] and A₂ = [0.9, 10, ..., 20], which are identical except for the first two elements. The Euclidean distance between the two values of a₁ (0.89) is far smaller than that between the two values of a₂ (9). Relative to the span of its value range, however, a₁ changes by $\frac{0.9 - 0.01}{1} \times 100\% = 89\%$, while a₂ changes by $\frac{10 - 1}{100} \times 100\% = 9\%$. Normalizing the vector elements is therefore essential: the value ranges of all parameters are mapped to the same interval, for example [0,1].

When there are n m-dimensional state vectors, they can be assembled into an n*m matrix, where n is the number of rows and m is the number of columns. The m entries in each row are the values of the m key monitoring factors contained in one piece of historical test data obtained in S2; each column holds the values of one key monitoring factor across different tests. The n*m matrix is normalized column by column, so that the value ranges of all key monitoring factors are mapped to the same interval.
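The column-wise normalization can be illustrated with min-max scaling. The text only requires that every column be mapped to a common interval such as [0,1]; the choice of min-max scaling and the names `normalize_columns`, `lower`, and `upper` are assumptions made for this sketch. Passing explicit bounds makes it possible to reuse the same scaling for real-time data, as S9 requires.

```python
import numpy as np

def normalize_columns(matrix, lower=None, upper=None):
    """Map each column of an n*m matrix to [0, 1] by min-max scaling.

    If lower/upper are omitted, the column minima and maxima are used.
    """
    m = np.asarray(matrix, dtype=float)
    lo = m.min(axis=0) if lower is None else np.asarray(lower, dtype=float)
    hi = m.max(axis=0) if upper is None else np.asarray(upper, dtype=float)
    # Guard against constant columns, which would divide by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (m - lo) / span

# The first two elements of A1 and A2, with ranges a1 in [0, 1], a2 in [0, 100].
scaled = normalize_columns([[0.01, 1], [0.9, 10]], lower=[0, 0], upper=[1, 100])
change = scaled[1] - scaled[0]  # relative change of a1 and a2
```

After scaling, the change in a₁ is 0.89 and the change in a₂ is 0.09, matching the 89% and 9% computed in the example above.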

(1.3) Parameter weighting

Parameter weighting assigns each parameter a weight according to its importance. Normalizing all parameters of the vector under test only puts them on a common scale; it does not account for differences in importance between parameters. Again take the state vector A = {a₁, a₂, ..., a_m}, and suppose a₁ and a₂ both represent flow, with a₁ being the water flow of the main line and a₂ the water flow of one branch. After normalization, two test vectors of A are A_i = [0.3, 0.2, ..., 0.2] and A_j = [0.7, 0.9, ..., 0.2], identical except for the first two elements. The Euclidean distance between the two values of a₁ is 0.4, and that between the two values of a₂ is 0.7. Although the change in a₂ is larger than the change in a₁, a₁ is the main-line flow and the change in a₂ is only part of the change in a₁: in terms of flow, the 0.4 change in a₁ already includes the 0.7 change in a₂. Assigning appropriate weights to the parameters is therefore essential for accurate clustering. For computational convenience, and when the elements of the vector are largely independent of one another, the parameters can be assumed to have equal weights after normalization.

Specifically, the weight of each key monitoring factor is determined, and the normalized matrix is then weighted to obtain the weighted matrix.
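One simple way to apply the weights is to multiply each column of the normalized matrix by its weight, so that Euclidean distances computed afterwards reflect parameter importance. The text does not say how the weights are chosen or applied, so both the multiplication and the weight values below are assumptions for illustration only; `apply_weights` is a hypothetical name.

```python
import numpy as np

def apply_weights(normalized, weights):
    """Scale each column of a normalized matrix by its weight."""
    return np.asarray(normalized, dtype=float) * np.asarray(weights, dtype=float)

# First two elements of A_i and A_j; the weights 1.0 and 0.5 are hypothetical.
weighted = apply_weights([[0.3, 0.2], [0.7, 0.9]], [1.0, 0.5])
difference = weighted[1] - weighted[0]
```

With these illustrative weights, the branch-flow difference shrinks from 0.7 to 0.35, while the main-line difference stays at 0.4.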

Once the vectors under test have been normalized and weighted, the data preprocessing stage ends. The preprocessed data enters the next step, operating condition classification, as training samples. The processed state vectors are also stored in the database for later reuse.

(2) The real-time payload downlink test data must be preprocessed as well. As shown in Fig. 2, the process is essentially the same as for the historical downlink test data: the same m test parameters are selected as key monitoring factors to form the state vector; the state vector is normalized by the same normalization method; and the normalized state vector is weighted by the same weighting method, giving the weighted state vector.

The only difference is that, after bad-data screening, the real-time downlink test data is first stored in the database, and the data needed to construct the parameters is then read back from the database.

(II) Working-condition data classification stage based on a clustering algorithm

Suppose data preprocessing yields n state vectors. These n state vectors serve as n training samples, which are clustered to obtain several clusters, each corresponding to a different normal working condition of the payload. As shown in Figure 1, three cases are distinguished according to whether the classification rules are known:

(1) Classification rules fully known

The classification rules are fully known when the data thresholds of the payload's different working conditions have already been determined. Specifically, for an m-dimensional state vector composed of m key monitoring factors, the classification rule is: each key monitoring factor is bound to several state levels, and the data interval of every state level is known. Each combination of the state levels of the key monitoring factors forms a distinct normal working condition of the payload.

For example, the state vector A = {a₁, a₂, a₃} has three key monitoring factors, representing water temperature, air temperature, and water flow. Suppose each of the three factors has three state levels: high, medium, and low. There are then 3³ = 27 normal working conditions. If the state levels of water temperature, air temperature, and water flow all have known data intervals (for water temperature, say, 40 to 50 for the high level, 30 to 40 for the medium level, and 20 to 30 for the low level), the classification rules are fully known.

When the classification rules are known, the training samples are clustered as shown in Figure 3. Each training sample is checked against the working conditions defined by the known rules. If it belongs to one of them, it is assigned to that working condition, and the training samples belonging to the same working condition form one cluster. If it does not, the training sample is stored in the database as a new cluster, which corresponds to a new working condition.
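The rule-based assignment described above can be sketched as follows. Only the water-temperature intervals come from the text; the air-temperature and water-flow intervals, the factor names, and the half-open interval convention [lo, hi) used to resolve the shared boundaries at 30 and 40 are assumptions made for illustration.

```python
# Level intervals per key monitoring factor. Only "water_temp" is from the
# text; the other two factors use hypothetical intervals.
RULES = {
    "water_temp": {"low": (20, 30), "medium": (30, 40), "high": (40, 50)},
    "air_temp": {"low": (10, 20), "medium": (20, 30), "high": (30, 40)},
    "water_flow": {"low": (0, 1), "medium": (1, 2), "high": (2, 3)},
}


def level_of(value, intervals):
    """Return the level whose interval [lo, hi) contains value, else None."""
    for name, (lo, hi) in intervals.items():
        if lo <= value < hi:
            return name
    return None


def classify(sample, rules):
    """Map a sample to its tuple of levels, or None if any factor is out of range."""
    levels = []
    for factor, intervals in rules.items():
        level = level_of(sample[factor], intervals)
        if level is None:
            return None
        levels.append(level)
    return tuple(levels)


def cluster_by_rules(samples, rules):
    """Group samples by working condition; collect samples that match none."""
    clusters = {}   # working condition -> list of samples
    unmatched = []  # each of these would seed a new cluster
    for sample in samples:
        condition = classify(sample, rules)
        if condition is None:
            unmatched.append(sample)
        else:
            clusters.setdefault(condition, []).append(sample)
    return clusters, unmatched


samples = [
    {"water_temp": 45, "air_temp": 25, "water_flow": 1.5},
    {"water_temp": 42, "air_temp": 22, "water_flow": 1.2},
    {"water_temp": 55, "air_temp": 25, "water_flow": 1.5},  # outside all intervals
]
clusters, unmatched = cluster_by_rules(samples, RULES)
```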

When a training sample does not belong to any existing working condition, it is treated as a new working condition, given a reasonable threshold, and stored in the database. The threshold is determined as follows: suppose a training sample x₀ = {a₁, a₂, ..., a_m} does not belong to any known working condition. Compute the distance d from x₀ to the boundary of the nearest working condition, and take x₀ as the center and d as the radius of the new cluster. The new cluster is then expressed as:

C_new = {x, D(x, x₀) ≤ d}    (2)

Here C_new denotes the new cluster and x denotes any state vector that can be assigned to it. D(x, x₀) is the Euclidean distance between the vectors x and x₀.

For vectors X = {x₁, x₂, ..., x_m} and Y = {y₁, y₂, ..., y_m}, the Euclidean distance between X and Y is computed as:

$D(X, Y) = \sqrt{\sum_{i=1}^{m} (y_i - x_i)^2} \qquad (1)$
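Formulas (1) and (2) can be sketched directly in code. The function names are hypothetical, and the radius d is taken as a given input here because the source does not spell out how the distance to a working-condition boundary is computed.

```python
import math


def euclidean(x, y):
    """Formula (1): Euclidean distance between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))


def in_new_cluster(x, x0, d):
    """Formula (2): x belongs to the new cluster if it lies within d of x0."""
    return euclidean(x, x0) <= d


x0 = [0.0, 0.0]                      # center of the new cluster
distance = euclidean(x0, [3.0, 4.0])
inside = in_new_cluster([0.3, 0.4], x0, d=1.0)
outside = in_new_cluster([3.0, 4.0], x0, d=1.0)
```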

(2) Classification rules completely unknown

The classification rules are completely unknown when, for an m-dimensional state vector composed of m key monitoring factors, no key monitoring factor is bound to any state levels.

Figure 4 shows the flow chart of the training-sample clustering method when the classification rules are completely unknown. It includes the following steps:

Step 1: determine whether base classes exist. If they do, go to step 2. If they do not, create them: using the linkage() function in Matlab, a portion of the training samples is preliminarily grouped according to the distances between samples, and the cluster center and cluster radius of each group are determined.

Here x_max and x_min are the upper and lower bounds of the vectors in a cluster. The upper bound is the vector in the cluster with the largest norm, and the lower bound is the vector in the cluster with the smallest norm. The norm is computed as in the following example: for the vector (3, 3, 3), $|(3,3,3)| = \sqrt{3^{2} + 3^{2} + 3^{2}} = 3\sqrt{3}$

Step 2: keep expanding the base classes until the resulting clusters contain all training samples. The expansion method is similar to the method of adding a new class when the classification rules are known. Figure 5 shows the flow of expanding the base classes.

For a training sample A_i, compute the distance between A_i and each current cluster and take the shortest. If the shortest distance exceeds the limit value, make A_i a new cluster; otherwise, assign A_i to the nearest cluster. Then process the next training sample.
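The expansion loop can be sketched as below. Several details are assumptions rather than statements from the source: the distance from a sample to a cluster is taken as the distance to the cluster center, the limit value is taken as that cluster's radius, and the radius of a new cluster is taken as the gap between the sample and the boundary of its nearest cluster. The dictionary layout and function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))


def expand(clusters, sample):
    """Assign sample to its nearest cluster, or open a new cluster for it.

    clusters: non-empty list of dicts with "center", "radius", "members".
    Returns the index of the cluster that received the sample.
    """
    if not clusters:
        raise ValueError("at least one base class is required")
    dists = [euclidean(c["center"], sample) for c in clusters]
    k = min(range(len(clusters)), key=dists.__getitem__)
    if dists[k] <= clusters[k]["radius"]:
        clusters[k]["members"].append(sample)
        return k
    # Assumed radius rule: gap between the sample and the nearest boundary.
    new_radius = dists[k] - clusters[k]["radius"]
    clusters.append(
        {"center": list(sample), "radius": new_radius, "members": [sample]}
    )
    return len(clusters) - 1


clusters = [{"center": [0.0, 0.0], "radius": 1.0, "members": []}]
first = expand(clusters, [0.5, 0.0])   # within the base class
second = expand(clusters, [3.0, 4.0])  # too far away: becomes a new cluster
```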

Optimization methods include, but are not limited to, the following:

Merge classes with very small thresholds into their neighbors, according to the following rule:

If [...], the extended class is cancelled and the state vectors it contains are assigned to a neighboring class (generally the one with the smaller threshold); if [...], the extended class is kept.

(3) Classification rules partially known

In most cases, the classification rules for payload data are neither simply known nor simply unknown. For example, suppose a cooling system has two temperature sensors whose data classification rules are known, and a flow sensor is then added. If there is no historical reference for the flow sensor, the classification rules of the vector are partially known.

Specifically, the classification rules are partially known when, for an m-dimensional state vector composed of m key monitoring factors, M1 key monitoring factors are bound to several state levels whose data intervals are all known, and M2 key monitoring factors are not bound to any state levels, where M1 + M2 = m, M1 ≥ 1, M2 ≥ 1, and M1 and M2 are natural numbers. The M1 factors are those whose classification rules are known; the M2 factors are those whose classification rules are unknown.

The invention uses a two-pass classification. The first pass classifies the training samples by the parameters whose classification rules are known. The second pass builds on the first and further subdivides the training samples by clustering.

Training samples generally contain several parameters whose units and physical meanings differ. Figure 6 shows the flow chart of the training-sample clustering method when the classification rules are partially known. It includes the following steps:

For the n training samples, group the factors in each sample by physical meaning. Using the factors whose classification rules are known as the basis, perform the first clustering of the n training samples with the clustering method of S7.2 to obtain several primary clusters;

For each primary cluster, perform the second clustering with the clustering method of S7.3 to obtain several more detailed sub-clusters;

The second-pass classification is similar to the classification method for unknown rules. The difference is that it operates on the data of each class produced by the first pass, not on all the data.

For example, suppose vector A contains the parameters {a₁, a₂, ..., a_n}, each of which belongs to one of four categories {water temperature, air temperature, water flow, air flow}. Classify by the water flow category. If the water flow is split into two branches, that is

Water flow = {branch a_p, branch a_q}, 1 ≤ p, q ≤ n

and the normal working conditions of each branch's flow fall into three classes (high, medium, and low), then the two branch flows together represent 9 states, as shown in the following table:

Thus, if the water-flow division rule is used as the basis for dividing vector A, A is initially divided into 9 working conditions. On that basis, each working condition is divided further by clustering. Continuing the example, the working condition {high, high} is further divided into K sub-conditions: {high, high, sub-condition 1}, {high, high, sub-condition 2}, ..., {high, high, sub-condition K}. This completes the division of the working conditions.

Once the working-condition classification rules have been established, the working-condition categories are first divided according to the rules, and clustering then yields a finer data classification. If the classification after clustering does not meet the requirements, for example if clusters overlap, new vector parameters are selected and the classification rules are re-established until the requirements are met.

(III) Monitoring of payload real-time downlink data

The previous two steps yield the clusters corresponding to the payload's normal working conditions. These clusters serve as base classes for monitoring the real-time downlink test data. If data appear that do not belong to any cluster, the payload has entered a new working state (if it is not a false alarm, it is a new failure mode).

The real-time downlink test data after preprocessing, normalization, and weighting are called the weighted state vector, which is the input to the real-time monitoring process. The detection flow is shown in Figure 7 and includes the following steps:

(1) Compare the weighted state vector with the training samples in each cluster obtained in the preceding steps, and determine whether there is a specific training sample identical to the weighted state vector. If there is, conclude that the payload is operating normally; the working condition corresponding to the cluster that contains the specific training sample is the payload's current working condition. Output the current working condition and end the flow, which completes the monitoring of this real-time downlink test datum. If there is not, go to (2);

(2) Compute the distance between the weighted state vector and each cluster, find the specific cluster nearest to the weighted state vector, and let the shortest distance be D. Then compare D with the preset distance threshold R. If D ≤ R, conclude that the payload is operating normally; the working condition corresponding to the specific cluster is the payload's current working condition. Output the current working condition and end the flow. If D > R, the weighted state vector does not belong to any known historical cluster and is an abnormal state vector. It is then further concluded that the payload's current working condition is not any normal working condition, and that the payload may have failed or may be about to fail. Finally, save the abnormal state vector; the payload monitoring process ends.
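Steps (1) and (2) above can be sketched as a single function. The source does not define the distance between a vector and a cluster, so this sketch assumes it is the distance to the cluster center; the dictionary layout, function names, and return format are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))


def monitor(weighted_vector, clusters, R):
    """Return (status, index of matched or nearest cluster, distance D).

    clusters: non-empty list of dicts with "center" and "members".
    R: preset distance threshold.
    """
    vector = list(weighted_vector)
    # Step (1): look for an identical training sample in any cluster.
    for k, cluster in enumerate(clusters):
        if any(list(member) == vector for member in cluster["members"]):
            return "normal", k, 0.0
    # Step (2): nearest cluster (assumed: by distance to its center).
    dists = [euclidean(c["center"], vector) for c in clusters]
    k = min(range(len(dists)), key=dists.__getitem__)
    status = "normal" if dists[k] <= R else "abnormal"
    return status, k, dists[k]


clusters = [
    {"center": [0.0, 0.0], "members": [[0.1, 0.1]]},
    {"center": [1.0, 1.0], "members": []},
]
exact = monitor([0.1, 0.1], clusters, R=0.2)
near = monitor([1.0, 1.1], clusters, R=0.2)
far = monitor([3.0, 3.0], clusters, R=0.2)
```

A vector reported as "abnormal" is the one that would be saved and passed on to fault detection and isolation.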

(IV) Fault detection and isolation based on the fault tree algorithm

Once the weighted state vector has been found to be an abnormal state vector, it becomes the input to the fault detection and isolation process, whose flow is shown in Figure 8. First determine whether the anomaly is a false alarm. If it is, end the flow. If it is not, determine whether the abnormal state vector contains any abnormal monitoring-factor value. If it contains none, the payload's current working condition is abnormal and may be at the critical point of failure, that is, the payload is likely to fail soon. If it contains one, locate the payload test point corresponding to the abnormal monitoring factor and apply the Fault Diagnosis Tree method to detect and isolate the payload fault, finally outputting the failed component and its failure-mode information.
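The decision flow of Figure 8, as described above, can be sketched as follows. The fault diagnosis tree itself is not given in this section, so a plain lookup table from factor index to component stands in for it; that table, the per-factor limits, the component name, and the function name are all hypothetical.

```python
def diagnose(abnormal_vector, limits, false_alarm, fault_lookup):
    """Return (outcome, findings) for one abnormal state vector.

    limits: list of (lo, hi) normal ranges, one per monitoring factor.
    false_alarm: result of the false-alarm check, decided elsewhere.
    fault_lookup: stand-in for the fault diagnosis tree (index -> component).
    """
    if false_alarm:
        return "false_alarm", []
    out_of_range = [
        i
        for i, (value, (lo, hi)) in enumerate(zip(abnormal_vector, limits))
        if not (lo <= value <= hi)
    ]
    if not out_of_range:
        # No single factor is abnormal: the payload may be close to failure.
        return "near_fault", []
    return "fault", [fault_lookup.get(i, "unknown component") for i in out_of_range]


limits = [(0.0, 1.0), (0.0, 1.0)]
fault_lookup = {1: "branch 2 flow valve"}  # hypothetical component name

fault = diagnose([0.5, 1.4], limits, False, fault_lookup)
near_fault = diagnose([0.5, 0.9], limits, False, fault_lookup)
dismissed = diagnose([0.5, 1.4], limits, True, fault_lookup)
```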

(V) Payload health knowledge learning process

After step five, the monitoring and diagnosis of abnormal data is complete. The purpose of this step is to refine the monitoring algorithm.

The growth in the amount of data affects the clustering results in two ways:

1) New clusters are generated;

2) The amount of data contained in the original clusters increases.

Therefore, to improve the accuracy of the clustering results, they must be updated periodically. Specifically:

1) For an existing cluster, the cluster center inevitably shifts as data are added. The new cluster center is updated according to formula (5):

$c_i = \frac{1}{|A_i|} \sum_{x \in A_i} x, \quad i = 1, 2, \ldots \qquad (5)$

where c_i denotes the cluster center after the i-th update, A_i denotes the cluster after the i-th update, x denotes an element of A_i, and |A_i| denotes the number of elements in cluster A_i.

2) For a newly formed class, the new class center and cluster radius are determined by the method given in formula (2).
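The center update of formula (5), which is the component-wise mean of all vectors currently in the cluster, can be sketched as follows. The function name is hypothetical.

```python
def cluster_center(members):
    """Formula (5): component-wise mean of the vectors in one cluster."""
    if not members:
        raise ValueError("a cluster needs at least one member")
    count = len(members)
    # zip(*members) walks the vectors column by column.
    return [sum(column) / count for column in zip(*members)]


cluster = [[0.0, 0.0], [2.0, 4.0]]
center_before = cluster_center(cluster)

cluster.append([4.0, 2.0])  # new data arrive during monitoring
center_after = cluster_center(cluster)
```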

Embodiment

This embodiment is based on a modified case of a cooling system on the space station. The cooling system has three water branches. Each branch carries a payload, with temperature measuring points before and after the payload and a water-flow measuring point after the branch's flow valve. From past empirical data, the working-condition classification rules for the flow data are already determined: the flow data of each branch are divided by magnitude into three classes {high, medium, low}. The classification rules for the temperature data are unknown.

Figure 9 shows the distribution of measuring points in the cooling system. The temperature sensors are T₁ to T₆ and the flow sensors are T₇ to T₉. The data-driven space station payload health monitoring method of the invention is applied below: the downlink test data of sensors T₁ to T₉ are processed and analyzed to monitor the health state of the cooling system. The steps are as follows:

Step 1: Preprocessing of payload historical test data

As required for data processing, bad data are removed from the historical test data and the data are normalized; the elements are assumed to be equally important. Since the payload data are positive, the normalization is:

$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$

where x is the original data and x' is the normalized data. Taking the outlet water temperature of branch 1 (T4) as an example, the normalization result is shown in Figure 10.
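The min-max normalization above can be sketched as follows. The function name and the sample temperatures are hypothetical; raising an error for a constant series is a design choice of this sketch, since the formula is undefined when x_max equals x_min.

```python
def min_max_normalize(values):
    """Rescale a series to [0, 1] using x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical outlet water temperatures for one measuring point.
temperatures = [20, 30, 40]
normalized = min_max_normalize(temperatures)
```

Because the real-time data must be normalized with the same method as the historical data, one option is to store the historical x_min and x_max and reuse them at run time; the source does not say whether this is done.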

The vector T is then:

T = {T₁, T₂, T₃, T₄, T₅, T₆, T₇, T₈, T₉}

where T₁ to T₆ are water-temperature measuring points and T₇ to T₉ are water-flow measuring points. The distribution of measuring points is shown in the following table:

| Test point | Variable | Sensor channels | Remark |
|---|---|---|---|
| T1 | Payload 1 inlet water temperature | 1 | |
| T2 | Payload 2 inlet water temperature | 1 | |
| T3 | Payload 3 inlet water temperature | 1 | |
| T4 | Payload 1 outlet water temperature | 1 | |
| T5 | Payload 2 outlet water temperature | 1 | |
| T6 | Payload 3 outlet water temperature | 1 | |
| T7 | Payload 1 water branch flow | 1 | |
| T8 | Payload 2 water branch flow | 1 | |
| T9 | Payload 3 water branch flow | 1 | |

The data under test are then divided into two sets. The first set is used for method learning, that is, to complete the classification of working conditions, and serves as the learning data for step 2 of the method. The second set is used for validation and serves as the real-time downlink data for step 4 of the method.

Step 2: Classification of payload working conditions

In this case the classification rules for the flow data T₇ to T₉ are known and those for the temperature data T₁ to T₆ are unknown, so the classification rules are partially known. Following the method given above, the flow parameters are chosen for the first-pass working-condition classification. The flow regulating valve sets the flow to one of three states (high, medium, or low), so the three water branches combined give 3³ = 27 first-pass working-condition classes.
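The 27 first-pass classes are simply all combinations of the three flow levels over the three branches, which can be enumerated as a quick check. The ordering and numbering of the classes below are assumptions; the source does not say which combination is "working condition 1".

```python
from itertools import product

LEVELS = ("high", "medium", "low")


def first_pass_conditions(n_branches):
    """All combinations of flow levels over the given number of branches."""
    return list(product(LEVELS, repeat=n_branches))


conditions = first_pass_conditions(3)

# Hypothetical numbering: 1-based index in enumeration order.
condition_number = {combo: i + 1 for i, combo in enumerate(conditions)}
```

Each first-pass class then receives its own second-pass clustering over the temperature data.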

On the basis of the first-pass classification, each class of data is divided more precisely by the method of the invention, that is, by second-pass clustering. Working condition 1 from the first pass is used as the example here; the working-condition classification flow chart is shown in Figure 11.

Using the distance function pdist() and the clustering function linkage() in Matlab, part of the data of the first class is processed, yielding 4 base classes. The cluster center and cluster radius of each class are determined with reference to formulas (3) and (4).

The remaining data of the first class undergo second-pass clustering by the method of Figure 6. For example, for a test datum t = {t₁, t₂, ..., t₉}, the Euclidean distance to every cluster center is computed:

$d_k = d(x_k, t) = \sqrt{\sum_{i=1}^{9} (x_{k,i} - t_i)^2}, \quad k = 1, 2, 3, 4$

where x_k, k = 1, 2, 3, 4, are the center coordinates of the initial classes;

Take the smallest distance d_m = min{d₁, d₂, d₃, d₄} and compare d_m with the corresponding cluster radius R_m. If d_m ≤ R_m, the datum belongs to that base class. If d_m > R_m, the datum t does not belong to any existing class, and a new class is created according to formula (2) and stored in the database. The working-condition classification flow is shown in Figure 11.

Step 3: Preprocessing of payload real-time downlink data

For the preprocessing of payload real-time downlink data, refer to step 1 and Figure 2.

Step 4: Monitoring of payload real-time downlink data

Step 2 yields the working-condition classification for the payload's normal mode. These working conditions are used to monitor the downlink data. If data appear that do not belong to any working condition, the payload has entered a new working state, and the data are stored in the database as abnormal data. The clusters generated in step 2 are used to monitor the second data set, as shown in Figures 12 and 13. Figure 12 shows the test data before index 1300, and Figure 13 shows the test data after index 1000, as monitored by the clustering method;

The figures show that, starting from the 1300th vector, classes that deviate from normal appear (classes with a value above 1 are abnormal classes), which indicates that the payload is failing or is about to fail.

Step 5: Payload fault detection and isolation

For the abnormal data output by step 4, first determine whether they are a false alarm (by repeated testing). If the data are confirmed to be abnormal, follow the fault detection flow of Figure 8: first check whether the value of each element of the abnormal data is within its normal range. If an element value is out of bounds, the sensor corresponding to that element has delivered abnormal data, and the fault diagnosis tree method is used to determine where the fault occurred (which payload).

Step 6: Payload health knowledge learning process

At fixed intervals, the payload health knowledge is updated through self-learning to improve the accuracy of the algorithm. This involves two things:

1) Existing classes are re-clustered, and their cluster centers are re-determined according to formula (5).

2) Newly formed classes are treated as known working conditions as in step 2, and their cluster radii and cluster centers are re-determined, which completes the update of the new classes.

In summary, the data-driven space station payload health monitoring method provided by the invention analyzes the real-time downlink test data of the space station payload to monitor the payload's health state in real time. It supports fault detection and localization for the payload, as well as a degree of prediction. In addition, the payload's working modes and failure modes are fully analyzed during the design and development stage, and testability design is carried out on that basis, so that the data collected and transmitted are suited to fault diagnosis and health management. The method departs from the past practice of using payload downlink test data only for display and limit checking: it builds a system health knowledge base through machine learning on historical data, detects abnormal payload states by computing the distance values of outliers, and can give a warning before a real fault occurs.

The above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make various improvements and refinements without departing from the principle of the invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A space station payload health monitoring method based on a data-driven algorithm, characterized by comprising the following steps:

S11: compute the distance between the weighted state vector and each cluster obtained in S7, find the specific cluster nearest to the weighted state vector, and let the shortest distance be D; then compare D with the preset distance threshold R; if D ≤ R, conclude that the payload is operating normally, the working condition corresponding to the specific cluster being the payload's current working condition, output the payload's current working condition, and end the flow; if D > R, the weighted state vector does not belong to any known historical cluster and is an abnormal state vector; it is then further concluded that the payload's current working condition is not any normal working condition and that the payload may have failed or may be about to fail; finally, save the abnormal state vector and end the flow;

wherein, in S7, the n training samples are clustered to obtain a plurality of clusters respectively corresponding to different normal working conditions of the payload, specifically:

the clustering method is:

2. The space station payload health monitoring method based on a data-driven algorithm according to claim 1, characterized in that, in S2, the historical downlink test data are preprocessed, specifically:

3. The space station payload health monitoring method based on a data-driven algorithm according to claim 1, characterized in that, in the third step of S7.3, all the obtained clusters are optimized, specifically:

if [...], the extended class is kept.

4. The space station payload health monitoring method based on a data-driven algorithm according to claim 3, characterized in that, in S7.2, the training sample is stored in the database as a new cluster, specifically: let the training sample be x₀ = {a₁, a₂, ..., a_m};

C_new = {x, D(x, x₀) ≤ d}    (2)

5. The space station payload health monitoring method based on a data-driven algorithm according to claim 1, characterized in that, after S11, the method further comprises:

$c_i = \frac{1}{|A_i|} \sum_{x \in A_i} x, \quad i = 1, 2, \ldots \qquad (5)$

6. The space station payload health monitoring method based on a data-driven algorithm according to claim 1, characterized in that, after S11, the method further comprises:

7. The space station payload health monitoring method based on a data-driven algorithm according to claim 6, characterized in that, in S12, whether the anomaly is a false alarm is determined through multiple tests.