WO2019104618A1

Movatterモバイル変換

Info

Publication number: WO2019104618A1
Application number: PCT/CN2017/113877
Authority: WO
Inventors: 阳光; 吴峰
Original assignee: Shenzhen A&E Intelligent Technology Institute Co Ltd
Current assignee: Shenzhen A&E Intelligent Technology Institute Co Ltd
Priority date: 2017-11-30
Filing date: 2017-11-30
Publication date: 2019-06-06
Anticipated expiration: 2020-05-30
Also published as: CN109416748B; CN109416748A

Abstract

An SVM-based sample data updating method. The method comprises: obtaining sample data and dividing, in a feature space, the sample data into classification hyperplanes of at least two classification sets (S101); separately determining the largest convex hull sample in each classification set, wherein the largest convex hull sample is used for defining a first convex hull space in the feature space, and the first convex hull space is capable of accommodating sample data other than the largest convex hull sample in the classification set (S102); selectively updating the largest convex hull sample according to the position relationship between data points to be added or deleted and the largest convex hull sample (S103); and selectively updating the classification hyperplanes according to the update status of the largest convex hull sample (S104). By selectively updating the largest convex hull sample according to the position relationship between the data points to be added or deleted and the largest convex hull sample and selectively updating the classification hyperplanes, the computation efficiency for sample data classification can be improved.

Description

Translated fromChinese

基于SVM的样本数据更新方法、分类系统和存储装置Sample data updating method, classification system and storage device based on SVM

【技术领域】[Technical Field]

本发明主要涉及计算机技术领域，特别是涉及一种基于SVM的样本数据更新方法、分类系统和存储装置。The present invention mainly relates to the field of computer technology, and in particular, to a sample data updating method, a classification system, and a storage device based on an SVM.

【背景技术】【Background technique】

SVM（Support Vector Machine）是指支持向量机，通常用来进行模式识别、分类以及回归分析。运用SVM算法对样本做分类时，若每个样本数据有n个特征，则可认为样本处于n维特征空间，从而利用n维超平面（hyperplane）将样本划分为两类。其中，n维超平面距离两类样本中最近的数据是等距的。当向一类样本中新增数据或者将一类样本中已有的部分数据删除，即当样本数据变更时，根据SVM算法，通常需要根据更新后的所有数据重新计算超平面。SVM通常用于图像识别和机器人深度学习等领域。SVM (Support VectorMachine) refers to support vector machines, which are commonly used for pattern recognition, classification, and regression analysis. When using the SVM algorithm to classify samples, if each sample data has n features, the sample can be considered to be in the n-dimensional feature space, and the samples are divided into two categories by using the n-dimensional hyperplane. Among them, the n-dimensional hyperplane distance is the closest of the two types of samples. When adding data to a class of samples or deleting some of the data already in a class of samples, that is, when the sample data is changed, according to the SVM algorithm, it is usually necessary to recalculate the hyperplane based on all the updated data. SVM is commonly used in areas such as image recognition and robot deep learning.

本发明的发明人在对现有技术的研究过程中发现，对于现有的SVM样本数据更新方法，每次样本更新时都需要基于更新后的所有数据重新计算超平面，导致计算效率低下，难以满足图像识别或者机器人深度学习对计算速度的要求。The inventor of the present invention found in the research process of the prior art that for the existing SVM sample data update method, each time the sample is updated, it is necessary to recalculate the hyperplane based on all the updated data, resulting in low computational efficiency and difficulty in calculation. Meet the requirements of image recognition or robot deep learning for calculation speed.

【发明内容】 [Summary of the Invention]

本发明的提供一种基于SVM的样本数据更新方法、系统和存储装置，用于解决现有技术中SVM样本数据分类计算效率低下的问题。The invention provides an SVM-based sample data updating method, system and storage device for solving the problem of low efficiency of SVM sample data classification calculation in the prior art.

为解决上述技术问题，本发明采用的一技术方案是提供一种基于SVM的样本数据更新方法，该方法包括：获取样本数据以及在特征空间内将所述样本数据划分成至少两个分类集合的分类超平面；分别确定每一所述分类集合中的最大凸包样本，其中所述最大凸包样本用于在所述特征空间内定义第一凸包空间，所述第一凸包空间能够容纳所述分类集合中除所述最大凸包样本以外的其他样本数据；根据待增加或待删除的数据点与所述最大凸包样本的位置关系，对所述最大凸包样本进行选择性更新；以及根据所述最大凸包样本的更新情况，对所述分类超平面进行选择性更新。In order to solve the above technical problem, a technical solution adopted by the present invention is to provide an SVM-based sample data updating method, the method comprising: acquiring sample data and dividing the sample data into at least two classification sets in a feature space. Classifying a hyperplane; respectively determining a maximum convex hull sample in each of the classification sets, wherein the maximum convex hull sample is used to define a first convex hull space within the feature space, the first convex hull space being capable of accommodating And other sample data in the classification set except the maximum convex hull sample; and selectively updating the maximum convex hull sample according to a positional relationship between the data point to be added or to be deleted and the maximum convex hull sample; And selectively updating the classified hyperplane according to the update condition of the maximum convex hull sample.

为解决上述技术问题，本发明采用的另一技术方案是提供一种基于SVM的样本数据分类系统，该系统包括电耦合的处理器和存储器，其中所述存储器用于存储程序指令，所述处理器可加载所述程序指令并执行上述样本数据更新方法。In order to solve the above technical problem, another technical solution adopted by the present invention is to provide an SVM-based sample data classification system, which includes an electrically coupled processor and a memory, wherein the memory is used to store program instructions, and the processing The program can load the program instructions and execute the sample data update method described above.

为解决上述技术问题，本发明采用的又一技术方案是提供一种具有存储功能的装置，用于存储程序指令，所述程序指令可被执行以实现上述样本数据更新方法。In order to solve the above technical problem, another technical solution adopted by the present invention is to provide a device having a storage function for storing program instructions, which can be executed to implement the sample data update method described above.

本发明实施例的有益效果是：通过根据待增加或待删除的数据点与最大凸包样本的位置关系对最大凸包样本进行选择性更新，并根据最大凸包样本的更新情况对分类超平面进行选择性更新，可以提高样本数据分类的运算效率。The beneficial effects of the embodiment of the present invention are: selectively updating the maximum convex hull sample according to the positional relationship between the data point to be added or to be deleted and the maximum convex hull sample, and classifying the hyperplane according to the update condition of the maximum convex hull sample. Selective updates can improve the efficiency of sample data classification.

【附图说明】 [Description of the Drawings]

为了更清楚地说明本发明实施例中的技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图：In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained according to these drawings without any creative work:

图1是本发明基于SVM的样本数据更新方法一实施例的流程示意图。1 is a schematic flow chart of an embodiment of a method for updating sample data based on SVM according to the present invention.

图2是本发明基于SVM的样本数据更新方法另一实施例的流程示意图。2 is a schematic flow chart of another embodiment of a method for updating sample data based on SVM according to the present invention.

图3是本发明最大凸包更新方法一实施例的流程示意图。FIG. 3 is a schematic flow chart of an embodiment of a method for updating a maximum convex hull according to the present invention.

图4示出了在二维特征空间上利用超平面对样本数据进行分类的示意图。FIG. 4 shows a schematic diagram of classifying sample data using a hyperplane on a two-dimensional feature space.

图5示出了对如图4所述的样本数据新增数据点的示意图。Figure 5 shows a schematic diagram of the addition of data points to the sample data as described in Figure 4.

图6示出了图5中的样本数据及分类超平面经过更新后的示意图。FIG. 6 shows a schematic diagram of the sample data and the classification hyperplane in FIG. 5 after being updated.

图7示出了对如图4所述的样本数据删除数据点的示意图。Figure 7 shows a schematic diagram of the deletion of data points for the sample data as described in Figure 4.

图8示出了图7中的样本数据及分类超平面经过更新后的示意图。FIG. 8 is a schematic diagram showing the sample data and the classification hyperplane in FIG. 7 after being updated.

图9示出了对如图4所述的样本数据删除数据点的另一例子的示意图。FIG. 9 shows a schematic diagram of another example of deleting sample data points as described in FIG.

图10示出了图9中的样本数据及分类超平面经过更新后的示意图。FIG. 10 is a schematic diagram showing the sample data and the classification hyperplane in FIG. 9 updated.

图11示出了本发明基于SVM的样本数据分类系统一实施例的结构示意图。FIG. 11 is a block diagram showing an embodiment of an SVM-based sample data classification system according to the present invention.

【具体实施方式】【Detailed ways】

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅是本发明的一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

请参阅图1，图1是本发明基于SVM的样本数据更新方法一实施例的流程示意图。如图1所示，本发明基于SVM的样本数据更新方法可包括以下步骤：Please refer to FIG. 1. FIG. 1 is a schematic flowchart diagram of an embodiment of an SVM-based sample data updating method according to the present invention. As shown in FIG. 1, the SVM-based sample data update method of the present invention may include the following steps:

S101：获取样本数据以及在特征空间内将该样本数据划分成至少两个分类集合的分类超平面。S101: Acquire sample data and divide the sample data into a classification hyperplane of at least two classification sets in a feature space.

样本数据即待分类的数据的总和，在不同的应用场景中，样本数据对应不同类型的数据，例如，在图像识别技术中，摄像头摄取的图像数据化得到的样本数据，或者在自动驾驶技术中，外部车辆或者控制系统发来的数据信息等。在后续的处理中，需要将获取的样本数据分类（例如，前景和后景，人和车辆等），基于SVM算法，通常可以利用特征空间内的分类超平面对这些获取的样本数据进行处理。特征空间和分类超平面均为数学上的概念，而不是实际存在的空间和平面。例如，一个数据点如果可以用n个特征描述，则该数据点可以作为一个n维向量x=(x1,x2,……,xn)，n维特征空间可以被定义以容纳这些数据点，而在数据可以分类的情况下，可以在n维特征空间中找到一个超平面（例如，在二维特征空间中，可以是一条直线、曲线、圆形等，在三维特征空间中，可以是平面、曲面等），使所有数据点在超平面两侧分为两类，且两类数据中离超平面最近的数据点到超平面的距离相同。可以理解，若可以同时定义多个分类超平面，则可以将样本数据划分为超过两个的分类集合。为方便解释说明，下文均以二维特征空间中存在一个超平面的情况为例进行说明。The sample data is the sum of the data to be classified. In different application scenarios, the sample data corresponds to different types of data, for example, in image recognition technology, the sample data obtained by image acquisition by the camera, or in the automatic driving technology. , data information from external vehicles or control systems, etc. In the subsequent processing, the acquired sample data needs to be classified (for example, foreground and background, people and vehicles, etc.), and based on the SVM algorithm, the acquired sample data can usually be processed by using the classification hyperplane in the feature space. Both the feature space and the classification hyperplane are mathematical concepts, not actual spaces and planes. For example, if a data point can be described by n features, the data point can be used as an n-dimensional vector x=(x1, x2, ..., xn), and the n-dimensional feature space can be defined to accommodate these data points. In the case that the data can be classified, a hyperplane can be found in the n-dimensional feature space (for example, in a two-dimensional feature space, it can be a straight line, a curve, a circle, etc., in a three-dimensional feature space, it can be a plane, Surfaces, etc., so that all data points are divided into two categories on both sides of the hyperplane, and the distance between the data points closest to the hyperplane in the two types of data to the hyperplane is the same. It can be understood that if multiple classification hyperplanes can be defined at the same time, the sample data can be divided into more than two classification sets. For convenience of explanation, the following describes an example in which a hyperplane exists in a two-dimensional feature space.

在步骤S101中，获取样本数据，以及在对应维度的特征空间内将该样本数据划分为至少两个分类集合的分类超平面。例如，如图4所示，首先获取图4中所有的数据点（三角形或圆形），并获取将这些数据点划分为A类和B类集合的分类超平面HP。在一些实施例中，这些数据点以及分类超平面HP可以是已计算、待更新的，在另一些实施例中，这些数据点也可以是在获取数据的最初阶段获取的，而分类超平面HP可以根据现有的SVM算法计算得到。In step S101, sample data is acquired, and the sample data is divided into classification hyperplanes of at least two classification sets in a feature space of a corresponding dimension. For example, as shown in FIG. 4, all of the data points (triangles or circles) in FIG. 4 are first acquired, and a classification hyperplane HP that divides these data points into a class A and a class B set is obtained. In some embodiments, the data points and the classification hyperplane HP may be calculated and to be updated, and in other embodiments, the data points may also be acquired during the initial phase of acquiring data, and the classification hyperplane HP It can be calculated according to the existing SVM algorithm.

S102：分别确定每一类集合中的最大凸包样本，其中最大凸包样本用于在特征空间内定义第一凸包空间，第一凸包空间能够容纳分类集合中除最大凸包样本以外的其他样本数据。S102: Determine a maximum convex hull sample in each class set, where a maximum convex hull sample is used to define a first convex hull space in the feature space, and the first convex hull space can accommodate the classification convex set except the largest convex hull sample. Other sample data.

仍然以图4为例，在步骤S102中，对A类集合和B类集合分别确定最大凸包样本CA和CB，CA和CB分别在特征空间内定义第一凸包空间，可以从图中直观的看出，最大凸包样本即该类集合中“最外侧”的数据点的集合，将这些数据点相连后其内部的空间即为第一凸包空间。第一凸包空间能够容纳分类集合中除最大凸包样本（最大凸包的边界）以外的其他数据点。Still taking FIG. 4 as an example, in step S102, the maximum convex hull samples CA and CB are respectively determined for the class A set and the class B set, and the CA and CB respectively define the first convex hull space in the feature space, which can be intuitively seen from the figure. It can be seen that the largest convex hull sample is the set of "outermost" data points in the set, and the internal space of the data points is the first convex hull space. The first convex hull space can accommodate other data points in the classification set except the largest convex hull sample (the boundary of the largest convex hull).

S103：根据待增加或待删除的数据点与最大凸包样本的位置关系，对最大凸包样本进行选择性更新。S103: Selectively update the maximum convex hull sample according to the positional relationship between the data point to be added or to be deleted and the maximum convex hull sample.

在步骤S103中，当需要向样本中新增数据点或者删除数据点时，判断待增加或待删除的数据点与最大凸包样本的位置关系。例如，若对A类集合（根据超平面HP判断）新增或删除数据点，则比较待增加或待删除的数据点与最大凸包样本CA进行比较。根据比较的结果，对最大凸包样本CA进行选择性更新。例如，若待增加或待删除的数据点在CA的内部，则不会对最大凸包样本CA产生影响，无需对最大凸包样本CA进行更新；反之，若待增加的数据点在CA的外侧，或待删除的数据点属于最大凸包样本CA，则需要对最大凸包样本CA进行更新。In step S103, when it is necessary to add a data point to the sample or delete the data point, the positional relationship between the data point to be added or to be deleted and the maximum convex hull sample is determined. For example, if a data point is added or deleted for the class A set (determined according to the hyperplane HP), the data points to be added or to be deleted are compared with the largest convex hull sample CA. According to the result of the comparison, the maximum convex hull sample CA is selectively updated. For example, if the data point to be added or to be deleted is inside the CA, the maximum convex hull sample CA is not affected, and the maximum convex hull sample CA is not updated; otherwise, if the data point to be added is outside the CA If the data point to be deleted belongs to the largest convex hull sample CA, the largest convex hull sample CA needs to be updated.

S104：根据最大凸包样本的更新情况，对分类超平面进行选择性更新。S104: Selectively update the classified hyperplane according to the update condition of the maximum convex hull sample.

在步骤S104中，若步骤S103中未对最大凸包样本进行更新，则同样无需更新分类超平面。反之，若步骤S103中对最大凸包样本进行了更新，则可以重新根据更新后的最大凸包样本计算分类超平面。例如，若步骤S103中对最大凸包样本CA进行了更新，则可以根据A类集合更新后的最大凸包样本CA’（图未示）以及B类集合未变化的最大凸包样本CB重新计算分类超平面。In step S104, if the maximum convex hull sample is not updated in step S103, it is also unnecessary to update the classification hyperplane. On the other hand, if the maximum convex hull sample is updated in step S103, the classified hyperplane can be calculated again according to the updated maximum convex hull sample. For example, if the maximum convex hull sample CA is updated in step S103, it may be recalculated according to the maximum convex hull sample CA' (not shown) updated by the class A set and the largest convex hull sample CB whose class B set is unchanged. Classification hyperplane.

在一些实施例中，还可以进一步检测更新后的最大凸包样本中离分类超平面最接近的样本数据是否发生变化，若发生变化，则重新计算并更新分类超平面，否则，不对分类超平面进行更新。In some embodiments, it is further possible to further detect whether the sample data closest to the classified hyperplane in the updated maximum convex hull sample changes, and if the change occurs, recalculate and update the classified hyperplane; otherwise, the classification hyperplane is not Update.

本发明通过根据待增加或待删除的数据点与最大凸包样本的位置关系对最大凸包样本进行选择性更新，并根据最大凸包样本的更新情况对分类超平面进行选择性更新，避免了每次样本数据更新时均需要根据所有数据重新计算分类超平面，从而可以提高样本数据分类的运算效率。The invention selectively updates the maximum convex hull sample according to the positional relationship between the data point to be added or to be deleted and the maximum convex hull sample, and selectively updates the classified hyperplane according to the update condition of the maximum convex hull sample, thereby avoiding Each time the sample data is updated, the classification hyperplane needs to be recalculated based on all the data, so that the operation efficiency of the sample data classification can be improved.

请参阅图2，图2是本发明基于SVM的样本数据更新方法另一实施例的流程示意图。在本实施例中，基于SVM的样本数据更新方法可包括以下步骤：Please refer to FIG. 2. FIG. 2 is a schematic flowchart diagram of another embodiment of a method for updating sample data based on SVM according to the present invention. In this embodiment, the SVM-based sample data update method may include the following steps:

S201：获取样本数据以及在特征空间将该样本数据划分成至少两个分类集合的分类超平面。S201: Acquire sample data and divide the sample data into a classification hyperplane of at least two classification sets in a feature space.

S202：分别确定每一分类集合中的最大凸包样本。S202: Determine a maximum convex hull sample in each classification set separately.

步骤S201和S202与步骤S101和步骤S102类似，在此不再赘述。Steps S201 and S202 are similar to steps S101 and S102, and are not described herein again.

S203：判断待增加的数据点是否落在第一凸包空间内。S203: Determine whether the data point to be added falls within the first convex hull space.

步骤S203中，当需要向样本中新增数据点时，判断待增加的数据点是否落在最大凸包样本所定义的第一凸包空间内。若待增加的数据点落在第一凸包空间内，则执行步骤S205；反之，若未落在第一凸包空间内，则执行步骤S204。以图5所示的例子为例，数据点P11和P12分别为两种情况下待增加的数据点。首先将P11或P12与分类超平面HP比较，可知其属于A类集合，进而与A类集合的最大凸包样本CA进行比较。可以发现，数据点P11落在第一凸包空间内，因此，新增数据点为P11时，执行步骤S205；而数据点P12未落在第一凸包空间内，因此，新增数据点为P12时，执行步骤S204。In step S203, when it is necessary to add a data point to the sample, it is determined whether the data point to be added falls within the first convex hull space defined by the maximum convex hull sample. If the data point to be added falls within the first convex hull space, step S205 is performed; otherwise, if it does not fall within the first convex hull space, step S204 is performed. Taking the example shown in FIG. 5 as an example, the data points P11 and P12 are respectively data points to be added in two cases. First, comparing P11 or P12 with the classified hyperplane HP, it is known that it belongs to the class A set, and then compared with the largest convex hull sample CA of the class A set. It can be found that the data point P11 falls within the first convex hull space. Therefore, when the new data point is P11, step S205 is performed; and the data point P12 does not fall within the first convex hull space. Therefore, the new data point is At P12, step S204 is performed.

S204：将待增加的数据点添加至最大凸包样本。S204: Add the data point to be added to the maximum convex hull sample.

当待增加的数据点未落在第一凸包空间内时，需要更新最大凸包样本。这种情况下，将待增加的数据点添加至最大凸包样本。例如，在图5的例子中，需要将新增数据点P12添加至最大凸包样本CA，得到图6中所示更新后分类集合A1的最大凸包样本CA1。这样一来，就可以利用更新后的最大凸包样本CA1重新计算分类超平面，将原分类超平面HP更新为HP1。需要注意，待增加的数据点也有可能正好落在最大凸包样本所形成的边界线上，在这种情况下，同样可以对最大凸包样本进行更新，以免对后续计算造成影响。When the data point to be added does not fall within the first convex hull space, the largest convex hull sample needs to be updated. In this case, add the data points to be added to the largest convex hull sample. For example, in the example of FIG. 5, the newly added data point P12 needs to be added to the maximum convex hull sample CA, and the largest convex hull sample CA1 of the updated classification set A1 shown in FIG. 6 is obtained. In this way, the classified hyperplane can be recalculated using the updated maximum convex hull sample CA1, and the original classified hyperplane HP is updated to HP1. It should be noted that the data points to be added may also fall on the boundary line formed by the largest convex hull samples. In this case, the maximum convex hull samples may also be updated to avoid affecting subsequent calculations.

S205：不对最大凸包样本进行更新。S205: The maximum convex hull sample is not updated.

当待增加或待删除的数据点落在第一凸包空间内时，无需对最大凸包样本进行更新，后续也不会对分类超平面进行更新。需要注意，在一些实施例中，为提高后续的计算效率，当待增加或待删除的数据点落在第一凸包空间内时，还可以进一步判断是否需要对次大凸包样本进行更新。有关如何利用次大凸包进行计算的详细信息将在下文中介绍。When the data point to be added or to be deleted falls within the first convex hull space, the maximum convex hull sample need not be updated, and the classification hyperplane is not updated subsequently. It should be noted that, in some embodiments, to improve the subsequent calculation efficiency, when the data point to be added or to be deleted falls within the first convex hull space, it may further determine whether the second largest convex hull sample needs to be updated. Detailed information on how to use the second largest convex hull for calculations is provided below.

S206：判断待删除的数据点是否落在第一凸包空间内。S206: Determine whether the data point to be deleted falls within the first convex hull space.

步骤S206中，当需要从样本中删除数据点时，判断待删除的数据点是否落在最大凸包样本内侧的第一凸包空间内。若待删除的数据点落在第一凸包空间内，则执行步骤S205；反之，若未落在第一凸包空间内，则执行步骤S207。例如，在图7所示的例子中，数据点P21、P22和P23分别为三种情况下待删除的数据点。其中，数据点P21和P22均落在第一凸包空间内，因此，需要删除数据点P21或P22时，执行步骤S205。而数据点P23未落在第一凸包空间内，那么在需要删除数据点P23时，执行步骤S207。In step S206, when it is necessary to delete the data point from the sample, it is determined whether the data point to be deleted falls within the first convex hull space inside the largest convex hull sample. If the data point to be deleted falls within the first convex hull space, step S205 is performed; otherwise, if it is not within the first convex hull space, step S207 is performed. For example, in the example shown in FIG. 7, the data points P21, P22, and P23 are data points to be deleted in three cases, respectively. The data points P21 and P22 all fall within the first convex hull space. Therefore, when the data point P21 or P22 needs to be deleted, step S205 is performed. The data point P23 does not fall within the first convex hull space. Then, when the data point P23 needs to be deleted, step S207 is performed.

S207：从最大凸包样本中删除待删除的数据点，以得到待定样本。S207: Delete the data point to be deleted from the maximum convex hull sample to obtain a pending sample.

当待删除的数据点未落在第一凸包空间内时，需要更新最大凸包样本。可以理解的是，在需要删除数据点的情况下，未落在第一凸包空间内意味着该数据点属于最大凸包样本。这种情况下，将待删除的数据点从最大凸包样本中删除。例如，在图7的例子中，需要将待删除的数据点P23从最大凸包样本中删除，得到图8中所示待定样本CB2，待定样本内部为特征空间内的一个待定空间。待定样本是最大凸包样本更新过程中的一个中间量。When the data point to be deleted does not fall within the first convex hull space, the largest convex hull sample needs to be updated. It can be understood that, in the case that the data point needs to be deleted, not falling within the first convex hull space means that the data point belongs to the largest convex hull sample. In this case, the data points to be deleted are deleted from the largest convex hull sample. For example, in the example of FIG. 7, the data point P23 to be deleted needs to be deleted from the maximum convex hull sample, and the pending sample CB2 shown in FIG. 8 is obtained. The interior of the pending sample is a pending space in the feature space. The pending sample is an intermediate quantity in the process of updating the largest convex hull sample.

S208：判断除最大凸包样本以外的其他样本数据是否全部落入待定空间。S208: Determine whether all the sample data except the largest convex hull sample falls into the to-be-determined space.

在步骤S208中，利用步骤S207中得到的待定样本和待定空间，帮助判断将待删除的数据点从最大凸包样本中删除后，得到的空间是否能够将其余数据点包括在内。若全部落入，则执行步骤S210，否则，则执行步骤S209。例如，在图8中，待定样本CB2形成的待定空间可以将除原最大凸包样本以外的其他样本数据包含在其中，因此，执行步骤S210。In step S208, the undetermined sample and the to-be-determined space obtained in step S207 are used to help determine whether the obtained data point can be deleted from the largest convex hull sample after the data point to be deleted is deleted. If all of them fall, step S210 is performed; otherwise, step S209 is performed. For example, in FIG. 8, the pending space formed by the pending sample CB2 may include other sample data other than the original largest convex hull sample therein, and therefore, step S210 is performed.

S209：将除最大凸包样本以外的其他样本数据中至少部分未落入待定空间的样本数据添加到待定样本。S209: Add at least part of the sample data other than the largest convex hull sample to the pending sample.

当除最大凸包样本以外的其他样本数据未全部落入待定空间时，意味着此时最大凸包样本的更新还不能结束。此时，可以通过待定样本以及除原最大凸包样本以外的其他样本数据重新计算待定样本。在一实施例中，可以根据除原最大凸包样本以外的其他样本数据以及待定样本直接重新计算最大凸包样本，计算结果作为更新后的待定样本（实际上，这种情况下得到的待定样本即为更新后的最大凸包样本）。在另一些实施例中，可以采用逐一或择一添加的方式，将一个或若干个数据点添加到待定样本中，对待定样本进行更新。在又一些实施例中，还可以使用次大凸包样本对待定样本进行辅助计算，详细信息将在下文介绍。When all the sample data except the largest convex hull sample does not fall into the pending space, it means that the update of the largest convex hull sample cannot be ended at this time. At this point, the pending sample can be recalculated from the pending sample and other sample data than the original largest convex hull sample. In an embodiment, the maximum convex hull sample may be directly recalculated according to other sample data than the original maximum convex hull sample and the pending sample, and the calculated result is used as the updated pending sample (actually, the pending sample obtained in this case) This is the updated maximum convex hull sample). In other embodiments, one or several data points may be added to the pending samples one by one or one by one, and the samples are updated. In still other embodiments, sub-large convex hull samples may also be used to perform sample calculations for the sample to be determined, the details of which are described below.

在执行步骤S209得到更新后的待定样本后，重新返回步骤S208，判断除最大凸包样本以外的其他样本数据是否全部落入待定空间，重复执行上述步骤，直到最后得到的待定样本所定义的待定空间可以将其余样本数据全部容纳其中。After the updated sample is obtained in step S209, the process returns to step S208, and it is determined whether all the sample data except the largest convex hull sample falls into the undetermined space, and the above steps are repeatedly performed until the last determined sample to be determined is determined. Space can hold all the remaining sample data.

S210：最大凸包样本更新为待定样本。S210: The maximum convex hull sample is updated to be a pending sample.

最后，将待定样本作为新的最大凸包样本，得到更新后的最大凸包样本。该更新后的最大凸包样本即可以用于更新分类超平面。例如，在图8的例子中，在步骤S208中得到的待定样本CB2已能容纳除原最大凸包样本以外的全部样本数据，因此，将更新后的分类集合B2的最大凸包即为CB2。Finally, the pending sample is taken as the new largest convex hull sample, and the updated maximum convex hull sample is obtained. The updated maximum convex hull sample can be used to update the classification hyperplane. For example, in the example of FIG. 8, the pending sample CB2 obtained in step S208 can already accommodate all the sample data except the original maximum convex hull sample, and therefore, the largest convex hull of the updated classification set B2 is CB2.

请参阅图3，图3是本发明最大凸包更新方法一实施例的流程示意图。如前述，在一些实施例中，可以通过定义并计算次大凸包来辅助最大凸包更新的计算。图3所示的实施例主要可紧跟图2中步骤S206使用，当需要删除数据点，且待删除的数据点未落入该分类集合的第一凸包空间内时，使用图3实施例所示的方法更新最大凸包，该方法包括：Please refer to FIG. 3. FIG. 3 is a schematic flowchart diagram of an embodiment of a method for updating a maximum convex hull according to the present invention. As mentioned above, in some embodiments, the calculation of the maximum convex hull update can be assisted by defining and calculating the second largest convex hull. The embodiment shown in FIG. 3 can be used mainly in step S206 of FIG. 2. When the data point needs to be deleted, and the data point to be deleted does not fall within the first convex hull space of the classification set, the embodiment of FIG. 3 is used. The method shown updates the largest convex hull, and the method includes:

S301：分别确定每一分类集合中的次大凸包样本。S301: Determine a second largest convex hull sample in each classification set separately.

在本实施例中，除了确定每一分类集合的最大凸包样本以外，还确定每一分类集合（或者有数据更新的集合）的次大凸包样本，次大凸包样本即位于除了最大凸包样本以外的其他数据点的最大凸包上的数据点。例如，如图9所示，对分类集合B，除了确定其最大凸包样本CB以外，还确定其次大凸包样本ICB。次大凸包样本在特征空间内定义第二凸包空间，第二凸包空间能够容纳除了最大凸包样本和次大凸包样本以外的其他数据点。应当理解，确定次大凸包样本的步骤可以与确定最大凸包样本的步骤同时执行，或者在有数据更新请求时执行，又或者在需要利用次大凸包进行计算的步骤之前执行。In this embodiment, in addition to determining the largest convex hull sample of each classification set, the second largest convex hull sample of each classification set (or a set of data updates) is determined, and the second largest convex hull sample is located except the largest convex The data point on the largest convex hull of other data points other than the packet sample. For example, as shown in FIG. 9, for the classification set B, in addition to determining its largest convex hull sample CB, the next largest convex hull sample ICB is also determined. The second largest convex hull sample defines a second convex hull space in the feature space, and the second convex hull space can accommodate other data points except the largest convex hull sample and the second largest convex hull sample. It should be understood that the step of determining the second largest convex hull sample may be performed concurrently with the step of determining the largest convex hull sample, or when there is a data update request, or before the step of performing the calculation using the second largest convex hull.

S302：从最大凸包样本中删除待删除的数据点，以得到待定样本。S302: Delete the data point to be deleted from the maximum convex hull sample to obtain a pending sample.

S303：将次大凸包样本中最接近待删除的数据点的样本数据作为待定数据点。S303: The sample data of the second largest convex hull sample closest to the data point to be deleted is used as the data point to be determined.

在步骤S303中，可以通过计算得到次大凸包样本中最接近待删除的数据点的样本数据作为待定数据点。例如，图9中，分类集合B的次大凸包样本ICB包括5个样本数据，通过计算（以图形表示时可通过直接观察或者测量）可知，次大凸包样本ICB中最接近待删除的数据点P31的样本数据P34，将其作为待定数据点，用于后续的比较和判断。In step S303, the sample data closest to the data point to be deleted among the second largest convex hull samples can be obtained as a pending data point by calculation. For example, in FIG. 9, the sub-large convex hull sample ICB of the classification set B includes 5 sample data, and it can be known by calculation (directly by observation or measurement when graphically represented) that the sub-large convex hull sample ICB is the closest to be deleted. The sample data P34 of the data point P31 is used as a data point to be determined for subsequent comparison and judgment.

S304：判断待定数据点是否落入待定空间。S304: Determine whether the pending data point falls into the to-be-determined space.

在步骤S304中，判断待定数据点是否落入待定空间，待定空间由步骤S302中得到的待定样本在特征空间内定义。若待定数据点落入待定空间，说明待定空间可以容纳除原最大凸包样本以外的所有样本数据，此时，执行步骤S307。否则，说明还需要对待定样本和待定空间进一步处理，执行步骤S305。例如，图9中，待定空间为原第一凸包空间除去待删除点P31及其邻近点P32、P33围成的三角形以外的区域。由图可知，待定数据点P34在待删除点P31及其邻近点P32、P33围成的三角形，而不在待定空间中，因此需要对待定样本和待定空间进一步处理。In step S304, it is determined whether the data point to be determined falls within the to-be-determined space, and the to-be-determined space is defined in the feature space by the pending sample obtained in step S302. If the data point to be determined falls within the space to be determined, it indicates that the pending space can accommodate all the sample data except the original maximum convex hull sample. In this case, step S307 is performed. Otherwise, the description also needs to process the sample and the pending space for further processing, and step S305 is performed. For example, in FIG. 9, the to-be-determined space is an area other than the triangle surrounded by the point to be deleted P31 and its neighboring points P32, P33. As can be seen from the figure, the data point P34 to be deleted is a triangle enclosed by the point P31 to be deleted and its neighboring points P32, P33, and is not in the space to be determined. Therefore, the sample to be fixed and the space to be determined need to be further processed.

S305：将待定数据点添加到待定样本。S305: Add a pending data point to the pending sample.

在步骤S305中，对待定样本进行更新，将待定数据点添加到其中。In step S305, the sample to be processed is updated, and the data point to be determined is added thereto.

S306：将待定数据点更新为次大凸包样本中待定数据点的邻近数据点，并将原待定数据点从次大凸包样本中删除。S306: Update the to-be-determined data point to the adjacent data point of the to-be-determined data point in the second largest convex hull sample, and delete the original pending data point from the second largest convex hull sample.

在步骤S306中，需要将次大凸包样本中的其他数据点作为下一步比较的参考，因此，将待定数据点的邻近数据点作为新的待定数据点，并将原待定数据点从次大凸包样本中删除。邻近数据点可以是原待定数据点两侧的数据点，也可以按照左-右-左或者右-左-右的顺序每次验证其单侧的邻近数据点。例如，在图9中，可以将原待定数据点P34添加到待定样本中，并将其从次大凸包样本ICB中删除，并将P34一侧的数据点P35作为新的待定数据点。In step S306, other data points in the second largest convex hull sample need to be used as a reference for the next comparison. Therefore, the adjacent data point of the data point to be determined is used as a new pending data point, and the original pending data point is from the next largest. The convex hull sample is removed. The adjacent data points may be data points on both sides of the original data point to be determined, and the neighboring data points on one side may be verified each time in the order of left-right-left or right-left-right. For example, in FIG. 9, the original pending data point P34 can be added to the pending sample and deleted from the next largest convex hull sample ICB, and the data point P35 on the P34 side is taken as the new pending data point.

在执行步骤S306得到更新后的待定数据点和待定样本后，重新返回步骤S304，判断更新后的待定数据点是否落入更新后的待定空间内，重复执行上述步骤，直到最后得到的待定样本所定义的待定空间可以将待定数据点容纳其中。在图9的例子中，更新后的待定空间仍然无法将待定数据点容纳其中（P35位于P34和P33连线的右侧），因此，继续重复上述步骤，最终将P35添加至待定样本后，无论采用P36或者P38作为新的待定数据点，待定空间均能将其容纳其中，此时认为待定样本已更新结束。After the updated data point and the pending sample are obtained in step S306, the process returns to step S304 to determine whether the updated data point to be determined falls within the updated pending space, and the above steps are repeatedly performed until the last obtained sample is to be determined. The defined pending space can accommodate the pending data points. In the example of FIG. 9, the updated pending space still cannot accommodate the pending data points (P35 is located on the right side of the P34 and P33 connections), therefore, the above steps are continued, and finally P35 is added to the pending samples, regardless of P36 or P38 is used as the new data point to be determined, and the pending space can accommodate it. At this time, it is considered that the pending sample has been updated.

S307：最大凸包样本更新为待定样本。S307: The maximum convex hull sample is updated to be a pending sample.

最后，将更新后的待定样本作为新的最大凸包样本，得到如图10所示的更新后的最大凸包样本CB3。该更新后的最大凸包样本CB3即可以用于更新分类超平面。同时，还得到更新后的次大凸包样本ICB3，可以用于下一次样本数据更新时的计算。Finally, the updated pending sample is taken as the new maximum convex hull sample, and the updated maximum convex hull sample CB3 as shown in FIG. 10 is obtained. The updated maximum convex hull sample CB3 can be used to update the classification hyperplane. At the same time, the updated sub-large convex hull sample ICB3 is also obtained, which can be used for the calculation of the next sample data update.

在此实施例中，通过使用次大凸包样本，可以进一步减少后续步骤中比较、计算的时间，提高样本数据分类的效率。In this embodiment, by using the second largest convex hull sample, the time of comparison and calculation in the subsequent steps can be further reduced, and the efficiency of classifying the sample data is improved.

请参阅图11，本发明基于SVM的样本数据分类系统的一个实施例，其中基于SVM的样本数据分类系统400包括通信总线401、处理器402、存储器403。处理器402和存储器403通过通信总线401耦接。Referring to FIG. 11, an embodiment of the SVM-based sample data classification system of the present invention, wherein the SVM-based sample data classification system 400 includes a communication bus 401, a processor 402, and a memory 403. Processor 402 and memory 403 are coupled by communication bus 401.

其中，存储器403保存有程序数据，程序数据可被处理器402加载并执行上述任意实施例的基于SVM的样本数据更新方法。可以理解地，在其它一些实施例中，存储器403可以不同处理器402设置于同一实体装置中，而是通过将系统400结合网络来执行上述任一实施例的方法。The memory 403 holds program data, and the program data can be loaded by the processor 402 and execute the SVM-based sample data update method of any of the above embodiments. It will be appreciated that in other embodiments, the memory 403 may be disposed in the same physical device by different processors 402, but the method of any of the above embodiments may be performed by incorporating the system 400 into a network.

上述实施例所述功能如果以软件形式实现并作为独立的产品销售或使用时，可存储在一个具有存储功能的装置中，即，本发明还提供一种存储有程序的存储装置。存储装置中程序数据能够被执行以实现上述实施例中操作系统框架的翻译方法，该存储装置包括但不限于U盘、光盘、服务器或者硬盘等。The functions described in the above embodiments can be stored in a device having a storage function if implemented in software and sold or used as a stand-alone product, that is, the present invention also provides a storage device storing the program. The program data in the storage device can be executed to implement the translation method of the operating system framework in the above embodiments, including but not limited to a USB flash drive, an optical disk, a server, a hard disk, and the like.

以上所述仅为本发明的实施方式，并非因此限制本发明的专利范围，凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本发明的专利保护范围内。The above is only the embodiment of the present invention, and is not intended to limit the scope of the invention, and the equivalent structure or equivalent process transformations made by the description of the invention and the drawings are directly or indirectly applied to other related technologies. The fields are all included in the scope of patent protection of the present invention.