CN110033862B


Info

Publication number: CN110033862B
Application number: CN201910295314.1A
Authority: CN
Inventors: 孙鑫亮; 杨涛; 章颖; 李鑫欣; 汪叶群; 苏璐萍; 高佳奕; 于婧
Original assignee: Nanjing University of Chinese Medicine
Current assignee: Nanjing University of Chinese Medicine
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2022-05-17
Anticipated expiration: 2039-04-12
Also published as: CN110033862A

Abstract

The invention discloses a traditional Chinese medicine (TCM) quantitative diagnosis system based on a weighted directed graph, together with a storage medium, and belongs to the field of TCM information processing. The system comprises: a weight calculation module, which calculates the weights of feature data according to a predetermined strategy; a directed graph construction module, which constructs a weighted directed graph from the relationship between the weights and the feature data; and a reasoning diagnosis module, which runs the case to be diagnosed through the directed graph construction module to obtain the result corresponding to that case. The system expresses the feature data of a complex pathogenesis intuitively as a directed graph and builds the pathogenesis dynamically from the weight relationships between symptoms and syndrome types. Compared with the approach of existing intelligent syndrome differentiation systems, which exhaustively enumerate standard syndrome names, it presents the evaluation results of the constructed TCM model more clearly, has broad diagnostic adaptability and good accuracy, and can effectively improve the efficiency of TCM diagnosis.

Description


A TCM Quantitative Diagnosis System and Storage Medium Based on a Weighted Directed Graph

Technical Field

The invention belongs to the field of TCM information processing, and in particular relates to a TCM quantitative diagnosis system and a storage medium based on a weighted directed graph.

Background

Treatment based on syndrome differentiation is the characteristic and essence of TCM, and syndrome differentiation is the prerequisite for establishing treatment principles, prescribing, and medicating. TCM syndrome differentiation is the process of collecting clinical information (symptoms and signs) through the four diagnostic methods (inspection, listening and smelling, inquiry, and palpation), abstracting and generalizing it with TCM theory, and finally arriving at a TCM syndrome type. Because it relies on the experience of TCM experts, syndrome differentiation is subjective, complex, and fuzzy, which makes it difficult to quantify and reproduce and hinders the modernization of TCM.

With the development of information technology, more and more new methods and technologies have been introduced into TCM research. Artificial intelligence techniques such as knowledge engineering, machine learning, and pattern recognition have gradually been applied to TCM syndrome differentiation, with some progress. However, most of these studies focus on identifying a single disease or a few syndrome types and cannot cope effectively with complex changes in clinical conditions. In clinical practice, a patient's condition is relatively complex: syndrome types rarely appear alone, and several often intertwine and overlap, so traditional artificial intelligence techniques cannot model and analyze them effectively. In addition, traditional analysis mostly represents clinical information in binary form ("1" means a symptom is present, "0" means it is absent). Modeling on these binary values ignores the weight of the clinical information itself and makes satisfactory results hard to achieve.

Chinese Patent Publication No. CN102298663A discloses a detection method for automatically identifying TCM syndrome types, comprising the following steps: establishing a standardized, objective TCM case database; for this standardized sample database, applying a correlation-based attribute selection method that computes the mutual information and symmetric uncertainty between attributes and, based on heuristic rules, selects the set of symptom attributes that contribute most to syndrome detection; building a classification training set from the selected key attributes and the sample information in the case database, determining the decision attribute by computing the information gain ratio of each attribute while controlling the minimum number of samples per node and recording the classification error, reading all training samples and quasi-training samples by incremental learning, and finally obtaining classification rules; and using the resulting classification rules to identify the syndrome type of new samples. However, that scheme was studied only for the automatic syndrome differentiation of liver cirrhosis and offers no specific solution for extending automatic differentiation to other TCM syndrome types.

Chinese Patent Publication No. CN104615894B discloses a TCM diagnosis method and system based on k-nearest-neighbor label-specific weighted features. The method includes the following steps: obtaining the feature data weight information of cases under different categories according to a preset weight determination strategy; based on that weight information, computing the weighted Euclidean distance between any two cases and selecting a preset number of cases with the smallest weighted Euclidean distance; and processing the selected cases with the k-nearest-neighbor label-specific weighted-feature multi-label learning method, ML-LSWAKNN, to obtain the evaluation indices corresponding to the cases. The method takes full account of the effect of feature weighting on classification and greatly improves classification accuracy.

The above scheme uses the nearest-neighbor algorithm. The k-nearest neighbor (KNN) classification algorithm is one of the simplest methods in data mining classification. "K nearest neighbors" means the k closest neighbors: each sample can be represented by its k nearest neighbors. The idea is that if most of the k most similar samples in feature space (that is, the nearest neighbors in feature space) belong to a certain category, the sample also belongs to that category. In KNN, the selected neighbors are all objects that have already been correctly classified, and the classification decision depends only on the categories of the nearest one or few samples. K is the number of neighbors, that is, how many nearby points are used to predict the target point. Choosing K is therefore very important. If K is too small, any noise will strongly affect the prediction; for example, with K = 1, the prediction is biased whenever the single nearest point is noise. A smaller K means a more complex overall model that is prone to overfitting. If K is too large, the prediction is made from training instances in a larger neighborhood, and the approximation error of learning increases; instances far from the input point then also influence the prediction and cause errors. Therefore, although weights can be assigned, the optimal value of K is hard to determine, which makes the final results insufficiently accurate.

The main task of similarity calculation is to measure how similar objects are; it is a basic computation in information retrieval, recommender systems, data mining, and so on. The Euclidean metric (also called Euclidean distance) used in the KNN classification algorithm is a commonly used definition of distance: the true distance between two points in m-dimensional space, or the natural length of a vector (the distance from the point to the origin). In two- and three-dimensional space the Euclidean distance is the actual distance between two points, which is intuitive and easy to understand, but in high-dimensional space it often performs poorly. The TCM symptom space is a typical high-dimensional space with hundreds of symptoms, each represented by 0 (symptom absent) or 1 (symptom present). Suppose syndrome type 1 corresponds to the symptom vector [0,0,0,0,0,0,0,0,1,1] and syndrome type 2 to [1,1,0,0,0,0,0,0,0,0]; the Euclidean distance between them is S12 = 2. Syndrome type 3 corresponds to [1,1,1,1,1,1,1,1,0,0] and syndrome type 4 to [0,0,0,1,1,1,1,1,1,1]; the Euclidean distance between them is S34 = √5 ≈ 2.24.

S34 is clearly greater than S12, so by this measure syndrome types 1 and 2 appear more similar. In fact, syndrome types 1 and 2 share no symptoms at all, whereas syndrome types 3 and 4 overlap in several symptoms and are more similar. The cause is that samples in the symptom space have few positive symptoms (marked 1) and many negative symptoms (marked 0), so the similarity calculation is dominated by negative symptoms, which degrades the model.
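The distances in this example can be checked with a short Python sketch. The helper names `euclidean` and `shared_positives` are illustrative and do not come from the patent; the four vectors are the ones given above.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def shared_positives(u, v):
    """Number of symptoms that are present (value 1) in both vectors."""
    return sum(1 for a, b in zip(u, v) if a == 1 and b == 1)

s1 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
s2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
s3 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
s4 = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

S12 = euclidean(s1, s2)  # four differing positions, so sqrt(4)
S34 = euclidean(s3, s4)  # five differing positions, so sqrt(5)

overlap_12 = shared_positives(s1, s2)  # no symptom in common
overlap_34 = shared_positives(s3, s4)  # five symptoms in common
```

The pair with the smaller distance (types 1 and 2) shares no positive symptom, while the pair with the larger distance (types 3 and 4) shares five, which is the mismatch the text describes.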

Summary of the Invention

1. The problem to be solved

In TCM clinical practice, syndromes rarely appear alone and are often intertwined, and in the prior art traditional data mining techniques cannot model and analyze them simultaneously. To address this problem, the present invention provides a TCM quantitative diagnosis system based on a weighted directed graph. It expresses the feature data of a complex pathogenesis intuitively as a directed graph and builds the pathogenesis dynamically from the weight relationships between symptoms and syndrome types. Compared with the approach of current intelligent syndrome differentiation systems, which exhaustively enumerate standard syndrome names, it presents the evaluation results of the constructed TCM model more clearly, has broad diagnostic adaptability and good accuracy, and can effectively improve the efficiency of TCM diagnosis.

2. Technical solutions

In a first aspect, the present invention provides a TCM quantitative diagnosis system based on a weighted directed graph, comprising: a weight calculation module, which calculates the weights of feature data according to a predetermined strategy; a directed graph construction module, which constructs a weighted directed graph from the relationship between the weights and the feature data; and a reasoning diagnosis module, which runs the case to be diagnosed through the directed graph construction module to obtain the result corresponding to the case.

Further, the predetermined strategy includes a mutual information calculation method, a confidence calculation method, and an information entropy calculation method.

Further, the weight calculation module includes a feature data matrix construction submodule, a feature data correlation determination submodule, and a feature data weight acquisition submodule, wherein:

The feature data matrix construction submodule converts the feature data into sparse matrices;

The feature data correlation determination submodule calculates the correlation of the feature data from the sparse matrices according to the predetermined strategy;

The feature data weight acquisition submodule standardizes the correlation of the feature data to obtain the weights of the feature data.

Further, the data processing of the feature data matrix construction submodule is as follows:

(1) Construct sparse matrix A and sparse matrix B from the different categories of feature data;

(2) Take a single column of elements from sparse matrix A and a single column of elements from sparse matrix B and perform an element-wise AND operation to obtain matrix C_i;

where m is the number of columns in the matrix and n is the number of rows in the matrix.

Further, the feature data correlation determination submodule calculates the correlation of the feature data from sparse matrix A, sparse matrix B, and sparse matrix C_i using the mutual information calculation method, specifically:

p(x, y) = c_{i=x, n=y}    (3)

where x denotes a symptom and y denotes a syndrome type; p(x) is the probability that the entry a_mn of sparse matrix A occurs in its column, a_mn being an element of A with value 0 or 1; p(y) is the probability that the entry b_mn of sparse matrix B occurs in its column, b_mn being an element of B with value 0 or 1; p(x, y) is the probability that c_mn occurs in matrix C; PMI(x, y) is the pointwise mutual information, which measures how strongly the corresponding elements of sparse matrix A and sparse matrix B occur together; m is the number of columns in the matrix, and n is the number of rows in the matrix.
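A minimal sketch of this correlation for one symptom column and one syndrome column follows. It assumes the standard pointwise mutual information definition, PMI(x, y) = log(p(x, y) / (p(x) p(y))), and assumes each probability is the fraction of case records in which the entry equals 1. The function name `pmi` and the sample columns are hypothetical.

```python
import math

def pmi(col_x, col_y):
    """Pointwise mutual information of two binary columns (one entry per case).

    Assumes the standard definition log(p(x, y) / (p(x) * p(y))). Returns
    None when the pair never co-occurs, because the logarithm is undefined.
    """
    n = len(col_x)
    p_x = sum(col_x) / n
    p_y = sum(col_y) / n
    # Element-wise AND gives the co-occurrence column (the role played by C_i).
    p_xy = sum(a & b for a, b in zip(col_x, col_y)) / n
    if p_xy == 0:
        return None
    return math.log(p_xy / (p_x * p_y))

symptom = [1, 1, 0, 0]   # hypothetical symptom column over four cases
syndrome = [1, 1, 0, 0]  # hypothetical syndrome column over the same cases
value = pmi(symptom, syndrome)
```

Under these assumptions a positive value means the symptom and the syndrome type appear together more often than independence would predict, a value of zero means they look independent, and pairs that never co-occur get no score.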

Further, the processing of the feature data weight acquisition submodule is as follows:

Obtain the correlation of the feature data;

Calculate the weights of the feature data: WF = (wf1, wf2, ..., wfk, ..., wfn), where each wfk is obtained by standardizing the corresponding correlation.

Further, a directed graph is formed from triples built from sparse matrix A, sparse matrix B, and the weights of the feature data.
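One way to picture the triple representation is as (symptom, syndrome type, weight) entries collected into an adjacency map. The sketch below assumes edges point from symptom to syndrome type; the function name `build_graph`, the symptom and syndrome names, and the weights are all hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an adjacency map {symptom: {syndrome: weight}} from triples."""
    graph = defaultdict(dict)
    for symptom, syndrome, weight in triples:
        # Each triple is one weighted edge: symptom -> syndrome.
        graph[symptom][syndrome] = weight
    return dict(graph)

# Hypothetical (symptom, syndrome, weight) triples.
triples = [
    ("fever", "wind-heat", 0.8),
    ("cough", "wind-heat", 0.5),
    ("cough", "phlegm-damp", 0.7),
]
graph = build_graph(triples)
```

A symptom that points at several syndrome types, such as "cough" here, is how this structure can represent overlapping syndromes without enumerating every combination.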

Further, the case to be diagnosed is reasoned over through the directed graph construction module as follows:

Match the case to be diagnosed against the feature data to obtain the weights corresponding to the feature data;

Compute a weighted sum from those weights to obtain the reference results and their corresponding weight sums;

Sort the weight sums in descending order, discard reference results according to a threshold, and obtain the optimal result for the case.
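The three reasoning steps can be sketched as below. This is an illustration under assumptions, not the patented implementation: the graph is taken to be a symptom-to-syndrome adjacency map, results whose weight sum falls below the threshold are discarded, and the function name `diagnose` and all sample data are hypothetical.

```python
def diagnose(case_symptoms, graph, threshold):
    """Rank syndrome types by the summed weights of the case's symptoms."""
    totals = {}
    for symptom in case_symptoms:
        # Step 1: look up the weights attached to each observed symptom.
        for syndrome, weight in graph.get(symptom, {}).items():
            # Step 2: accumulate the weight sum per candidate syndrome type.
            totals[syndrome] = totals.get(syndrome, 0.0) + weight
    # Step 3: sort in descending order and drop results below the threshold.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(syndrome, total) for syndrome, total in ranked if total >= threshold]

# Hypothetical weighted directed graph: symptom -> {syndrome type: weight}.
graph = {
    "fever": {"wind-heat": 0.8},
    "cough": {"wind-heat": 0.5, "phlegm-damp": 0.7},
    "fatigue": {"qi-deficiency": 0.3},
}
result = diagnose(["fever", "cough", "fatigue"], graph, threshold=0.5)
```

With these sample weights, "wind-heat" ranks first, "phlegm-damp" second, and "qi-deficiency" is discarded because its weight sum is below the threshold.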

In a second aspect, the present invention provides a computer-readable storage medium storing the TCM quantitative diagnosis system described in any one of the above.

3. Beneficial effects

Compared with the prior art, the beneficial effects of the present invention are:

(1) The present invention provides a TCM quantitative diagnosis system based on a weighted directed graph. It expresses the feature data of a complex pathogenesis intuitively as a directed graph and builds the pathogenesis dynamically from the weight relationships between symptoms and syndrome types. Compared with the approach of current intelligent syndrome differentiation systems, which exhaustively enumerate standard syndrome names, it presents the evaluation results of the constructed TCM model more clearly, has broad diagnostic adaptability and good accuracy, and can effectively improve the efficiency of TCM diagnosis;

(2) The present invention normalizes the feature data weights, converting the raw data into dimensionless evaluation values so that all index values are on the same order of magnitude and every feature contributes to the result on an equal footing; this allows comprehensive evaluation and analysis and improves computational accuracy;

(3) The present invention matches the case to be diagnosed against the feature data to obtain the corresponding weights; computes a weighted sum from those weights to obtain the reference results and their weight sums; and sorts the weight sums in descending order, discards reference results according to a threshold, and obtains the optimal result for the case. The results are output in ranked order, which makes it easier for the patient or doctor to base differentiation and treatment on the more probable syndrome types;

(4) The weighted directed graph is built from weight information computed by the mutual information method and follows TCM diagnostic reasoning, so it better supports extracting the experience of TCM experts and constructing the model;

(5) The system provided by the present invention has a simple structure and a sound design and is easy to use.

Description of the Drawings

Fig. 1 is a structural diagram of the TCM quantitative diagnosis system of the present invention;

Fig. 2 is a directed graph constructed by the TCM quantitative diagnosis system of the present invention.

Detailed Description

Embodiments of the technical solutions of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments serve only to illustrate the technical solutions more clearly; they are examples and do not limit the protection scope of the present invention. Unless otherwise specified, the technical and scientific terms used in this application have the ordinary meanings understood by those skilled in the art to which the present invention belongs.

In this application, the terms "first", "second", and the like are used only for description and should not be construed as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. In the description of the present invention, "a plurality of" means two or more, unless otherwise expressly and specifically defined.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly dictates otherwise.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on context, as "when", "once", "in response to determining", or "in response to detecting".

In specific implementations, the terminals described in the embodiments of the present invention include, but are not limited to, portable devices such as mobile phones, laptop computers, or tablet computers with touch-sensitive surfaces (e.g., touchscreen displays and/or touchpads). It should also be understood that in some embodiments the device is not a portable communication device but a desktop computer with a touch-sensitive surface (e.g., a touchscreen display and/or a touchpad).

In the discussion that follows, a terminal including a display and a touch-sensitive surface is described. It should be understood, however, that the terminal may include one or more other physical user interface devices such as a physical keyboard, a mouse, and/or a joystick.

The terminal supports various applications, such as one or more of the following: drawing applications, presentation applications, word processing applications, website creation applications, disc burning applications, spreadsheet applications, gaming applications, telephony applications, video conferencing applications, email applications, instant messaging applications, exercise support applications, photo management applications, digital camera applications, digital video camera applications, web browsing applications, digital music player applications, and/or digital video player applications.

The various applications that can be executed on the terminal can use at least one common physical user interface device, such as the touch-sensitive surface. One or more functions of the touch-sensitive surface and the corresponding information displayed on the terminal may be adjusted and/or changed between applications and/or within a given application. In this way, the common physical architecture of the terminal (e.g., the touch-sensitive surface) can support various applications with user interfaces that are intuitive and transparent to the user.

Embodiment 1

This embodiment provides a TCM quantitative diagnosis system based on a weighted directed graph. As shown in Fig. 1, it includes: a weight calculation module, which calculates the weights of feature data according to a predetermined strategy; a directed graph construction module, which constructs a weighted directed graph from the relationship between the weights and the feature data; and a reasoning diagnosis module, which runs the case to be diagnosed through the directed graph construction module to obtain the result corresponding to the case. The system expresses the feature data of a complex pathogenesis intuitively as a directed graph and builds the pathogenesis dynamically from the weight relationships between symptoms and syndrome types. Compared with the approach of current intelligent syndrome differentiation systems, which exhaustively enumerate standard syndrome names, it presents the evaluation results of the constructed TCM model more clearly, has broad diagnostic adaptability and good accuracy, and can effectively improve the efficiency of TCM diagnosis.

The weight calculation module calculates the feature data weights according to a predetermined strategy.

The feature data here are TCM syndrome types and symptoms. A symptom is a subjective abnormal sensation or an objective pathological change in the patient, caused by abnormal changes in function, metabolism, and morphological structure in the body during the course of a disease. Important symptoms commonly seen in the clinic include fever, pain, weight change, edema, dyspnea, cough, expectoration, hemoptysis, loss of appetite, indigestion, dysphagia, nausea and vomiting, hematemesis, blood in the stool, jaundice, abnormal urination, anemia, and shock. The syndrome type is a concept unique to TCM. A syndrome is a summary of the pathological attributes of a certain stage in the development of a disease. TCM describes the human body in terms of yin, yang, qi, and blood, and classifies the causes of disease as wind, cold, summer-heat, dampness, dryness, heat, phlegm, deficiency, excess, and so on. A syndrome type is a disease state of the body that arises when different causes produce different changes in yin, yang, qi, and blood. The feature data can be stored in formats such as Excel, CSV, and TXT. The predetermined strategies include a mutual information calculation method, a confidence calculation method, and an information entropy calculation method. This embodiment uses the mutual information calculation method, which does not limit the present invention. Specifically, the weight calculation module includes a feature data matrix construction submodule, a feature data correlation determination submodule, and a feature data weight acquisition submodule. The feature data matrix construction submodule converts the feature data into sparse matrices; the feature data correlation determination submodule calculates the correlation of the feature data from the sparse matrices according to the predetermined strategy; and the feature data weight acquisition submodule standardizes the correlation of the feature data to obtain the weights of the feature data.

(1) Construct sparse matrix A and sparse matrix B from the different categories of feature data. Specifically, the feature data are classified into symptoms and syndrome types and then processed into matrix form. The elements of sparse matrix A are a_mn and the elements of sparse matrix B are b_mn, where m is the number of columns in the matrix and n is the number of rows. Each value in the matrices is 0 or 1, and each column is named after a symptom or a syndrome type: 1 means that the case record has that symptom or syndrome type, and 0 means that it does not.

(2) Take a single column of elements from sparse matrix A and a single column of elements from sparse matrix B and perform an element-wise AND operation to obtain matrix C_i;

The elements of the newly constructed matrix C_i are c_mn, which are likewise 0 or 1. The matrix results from operating on sparse matrices A and B; its column names are not constrained, because it is purely a computed matrix.
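Steps (1) and (2) can be sketched as follows. The sketch assumes one row per case record and one column per symptom or syndrome name, and it uses plain nested lists rather than a dedicated sparse-matrix format. The helper names, the record layout, and the sample cases are all hypothetical.

```python
def to_binary_matrix(records, names):
    """One row per case record, one column per name; 1 if present, else 0."""
    return [[1 if name in record else 0 for name in names] for record in records]

def column(matrix, j):
    """Extract column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]

def and_columns(col_a, col_b):
    """Element-wise AND of two binary columns of equal length."""
    return [a & b for a, b in zip(col_a, col_b)]

# Hypothetical case records, each listing observed symptoms and syndrome types.
cases = [
    {"symptoms": {"fever", "cough"}, "syndromes": {"wind-heat"}},
    {"symptoms": {"cough"}, "syndromes": {"phlegm-damp"}},
    {"symptoms": {"fever"}, "syndromes": {"wind-heat"}},
]
symptom_names = ["fever", "cough"]
syndrome_names = ["wind-heat", "phlegm-damp"]

A = to_binary_matrix([c["symptoms"] for c in cases], symptom_names)
B = to_binary_matrix([c["syndromes"] for c in cases], syndrome_names)

# Co-occurrence of the "fever" column of A with the "wind-heat" column of B.
C_fever_wind_heat = and_columns(column(A, 0), column(B, 0))
```

Each entry of the resulting column is 1 only for case records in which both the symptom and the syndrome type are present, which is the co-occurrence information the correlation step needs.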

所述特征数据相关度确定子模块用于根据所述稀疏矩阵A、稀疏矩阵B和稀疏矩阵C_i，按照互信息计算方法计算特征数据的相关度，具体为：The feature data correlation determination submodule is used to calculate the correlation of the feature data according to the mutual information calculation method according to the sparse matrix A, the sparse matrix B and the sparse matrix C_i , specifically:

p(x, y) = c_{i=x, n=y}  (3)

Here x denotes a symptom and y denotes a syndrome type. p(x) is the probability that the a_mn entry of sparse matrix A occurs in its column, where a_mn is an element of sparse matrix A and takes the value 0 or 1; p(y) is the probability that the b_mn entry of sparse matrix B occurs in its column, where b_mn is an element of sparse matrix B and takes the value 0 or 1; p(x, y) is the probability that c_mn occurs in matrix C; and PMI(x, y) is the pointwise mutual information of x and y, which reflects how often the corresponding elements of sparse matrix A and sparse matrix B occur together. m denotes the number of columns in the matrix and n the number of rows. Note that mutual information is a useful measure in information theory: it can be regarded as the amount of information that one random variable contains about another, or equivalently as the reduction in the uncertainty of one random variable once the other is known.
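To make the quantities p(x), p(y), p(x, y) and PMI(x, y) concrete, the following sketch estimates them from two 0/1 columns. It assumes the standard definition PMI(x, y) = log(p(x, y) / (p(x) p(y))) with the natural logarithm and with probabilities estimated as relative frequencies; the function name `pmi` and the example columns are hypothetical.

```python
import math


def pmi(symptom_col, syndrome_col):
    """Pointwise mutual information of two equal-length 0/1 columns."""
    n = len(symptom_col)
    p_x = sum(symptom_col) / n    # share of cases showing the symptom
    p_y = sum(syndrome_col) / n   # share of cases with the syndrome type
    p_xy = sum(a & b for a, b in zip(symptom_col, syndrome_col)) / n
    if p_xy == 0:
        return float("-inf")      # the two never occur together
    return math.log(p_xy / (p_x * p_y))


always_together = pmi([1, 1, 0, 0], [1, 1, 0, 0])  # positive association
independent = pmi([1, 1, 0, 0], [1, 0, 1, 0])      # no association
never_together = pmi([1, 0], [0, 1])               # mutually exclusive
```

A positive value means the symptom and the syndrome type co-occur more often than chance would predict, a value near zero means they look independent, and a strongly negative value means they tend to exclude each other.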

The feature data weight acquisition submodule is specifically used to obtain the correlation of the feature data:

Calculate the weight of the feature data WF = (wf1, wf2, ..., wfk, ..., wfn), where

Note that in a multi-index evaluation system the evaluation indices differ in nature and therefore usually have different dimensions and orders of magnitude. When the levels of the indices differ greatly, analysing the raw index values directly exaggerates the role of the indices with larger values in the overall analysis and relatively weakens the role of those with smaller values. This embodiment therefore normalizes the feature data weights, converting the raw data into dimensionless evaluation values so that all index values lie on the same order of magnitude. Each feature can then contribute to the result on an equal footing, a comprehensive evaluation can be carried out, and the accuracy of the calculation improves.
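The normalization expression itself is not reproduced in this text, so the sketch below only illustrates one plausible scheme: negative scores are clipped to zero and the remainder is divided by its sum, so that the weights of one syndrome type add up to 1. Both the scheme and the name `normalize_weights` are assumptions made for illustration.

```python
def normalize_weights(scores):
    """Scale scores to dimensionless weights that sum to 1 (assumed scheme)."""
    clipped = {name: max(value, 0.0) for name, value in scores.items()}
    total = sum(clipped.values())
    if total == 0:
        return {name: 0.0 for name in clipped}
    return {name: value / total for name, value in clipped.items()}


# Hypothetical raw correlation scores for the symptoms of one syndrome type.
raw = {"palpitations": 1.2, "chest tightness": 0.6, "dizziness": -0.3, "headache": 0.2}
weights = normalize_weights(raw)
```

After this step every weight lies between 0 and 1 regardless of the scale of the raw scores, which is what allows weights from different syndrome types to be compared.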

The directed graph construction module constructs a weighted directed graph according to the relationship between the weights and the feature data.

Note also that a directed graph is a graph-theoretic model whose core problem is how to determine the direction and weight of each edge. In the prior art the edge directions are mostly set manually and the weights are determined by statistical learning methods. In the field of TCM diagnosis the edge directions are likewise set manually, and the edge weights are mostly frequencies or conditional probabilities; this embodiment instead uses pointwise mutual information to calculate the weights. The directed graph is constructed as follows: from the weight information between the different feature data, namely sparse matrix A, sparse matrix B and the weights of their elements, triples are constructed to form the weighted directed graph.

Specifically, if each symptom ZZ is represented as a vector of the form ZZ = [ZX, WF], the ternary directed graph is represented as {ZZ, ZX, WF}, where ZZ denotes the symptom, ZX denotes the syndrome type and WF denotes the weight.

The constructed directed graph is shown in Figure 2. ZX(1), ZX(2) and ZX(3) denote three different syndrome types, and ZZ(1), ZZ(2), ..., ZZ(k) denote the k symptoms of ZX(1). ZZ(1) is a symptom shared by ZX(1), ZX(2) and ZX(3); ZZ(2) is a symptom shared by ZX(1) and ZX(2); and ZZ(3), ZZ(4), ..., ZZ(k) are symptoms unique to ZX(1). Figure 2 shows two things. The more syndrome types a symptom belongs to, the smaller its weight for a given syndrome type should be relative to the other symptoms of that syndrome type. And the fewer symptoms a syndrome type has in total, the larger the weight of a given symptom in that syndrome type should be relative to the weight of the same symptom in other syndrome types.
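The triple representation can be illustrated with a small adjacency structure. The sketch below is an assumption-laden example: the function `build_graph`, the edge direction (symptom to syndrome type) and the weight values are hypothetical, chosen only to mirror the sharing pattern described for Figure 2.

```python
from collections import defaultdict


def build_graph(triples):
    """Group (symptom, syndrome type, weight) triples by syndrome type."""
    graph = defaultdict(dict)
    for symptom, syndrome, weight in triples:
        graph[syndrome][symptom] = weight  # one weighted edge symptom -> syndrome
    return dict(graph)


# ZZ(1) is shared by three syndrome types, ZZ(2) by two, ZZ(3) is unique to ZX(1).
triples = [
    ("ZZ(1)", "ZX(1)", 0.10), ("ZZ(1)", "ZX(2)", 0.20), ("ZZ(1)", "ZX(3)", 0.25),
    ("ZZ(2)", "ZX(1)", 0.15), ("ZZ(2)", "ZX(2)", 0.30),
    ("ZZ(3)", "ZX(1)", 0.40),
]
graph = build_graph(triples)
```

In this toy graph the widely shared symptom ZZ(1) carries the smallest weight within ZX(1), while the unique symptom ZZ(3) carries the largest, which matches the two tendencies stated above.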

The inference diagnosis module matches the case to be examined against the feature data to obtain the weights corresponding to the feature data; computes a weighted sum of those weights to obtain the reference results and their corresponding weight sums; sorts the weight sums in descending order; and discards reference results according to a threshold to obtain the optimal result for the case.

Specifically, as shown in Figure 2, suppose the symptom information extracted from a patient's medical record comprises the symptoms ZZ(1), ZZ(2), ZZ(3), ZZ(4), ..., ZZ(k). A binary search determines the category of each symptom and yields the weight corresponding to that category; binary search is a common technique in the field and is not described further here. The probability that the patient has syndrome type ZX is then:

P(ZX) = Sum(WF(xk)) = WF(11) + WF(12) + WF(13) + WF(14) + WF(15) + ... + WF(1k). The syndrome types are sorted by weight in descending order and the reference results are discarded according to a threshold. Specifically, the threshold can be set at the median, so that the syndrome types ranked ahead of the median are kept and those ranked after it are discarded, giving the optimal result for the case. Here WF(xk) (k = 1, 2, 3, ..., n) is the weight of the feature data; the larger the weight, the greater its contribution to the result.
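The inference step (look up each symptom, sum the weights per syndrome type, sort in descending order, cut at the median) can be sketched as follows. The names `lookup` and `diagnose`, the toy graph and the decision to keep the median entry itself when the number of syndrome types is odd are assumptions; the source does not fix these details.

```python
from bisect import bisect_left


def lookup(sorted_symptoms, weights, symptom):
    """Binary search for a symptom in a sorted list; return its weight or 0.0."""
    i = bisect_left(sorted_symptoms, symptom)
    if i < len(sorted_symptoms) and sorted_symptoms[i] == symptom:
        return weights[i]
    return 0.0


def diagnose(graph, patient_symptoms):
    """Sum the weights per syndrome type, sort descending, keep the upper half."""
    scores = {}
    for syndrome, edges in graph.items():
        names = sorted(edges)
        values = [edges[name] for name in names]
        scores[syndrome] = sum(lookup(names, values, s) for s in patient_symptoms)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    keep = (len(ranked) + 1) // 2  # assumed reading of the median threshold
    return ranked[:keep]


# Hypothetical weights in the layout {syndrome type: {symptom: weight}}.
graph = {
    "ZX(1)": {"ZZ(1)": 0.10, "ZZ(2)": 0.15, "ZZ(3)": 0.40},
    "ZX(2)": {"ZZ(1)": 0.20, "ZZ(2)": 0.30},
    "ZX(3)": {"ZZ(1)": 0.25},
}
result = diagnose(graph, ["ZZ(1)", "ZZ(2)", "ZZ(3)"])
```

For this patient ZX(1) receives the largest weight sum and ZX(3) the smallest, so ZX(3) falls below the median cut and is discarded.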

This embodiment represents the feature data of a complex pathogenesis intuitively as a directed graph and, on the basis of the directed-graph definition of the pathogenesis, constructs the pathogenesis dynamically through the weight relationship between symptoms and syndrome types. The case to be examined is matched against the feature data to obtain the corresponding weights; the weights are summed to obtain the reference results and their weight sums; the weight sums are sorted in descending order; and reference results are discarded according to a threshold to obtain the optimal result for the case. Compared with the approach of exhaustively enumerating standard syndrome name patterns used in current intelligent syndrome differentiation systems, this effectively avoids the bottleneck common to the field of intelligent TCM syndrome differentiation, namely syndrome type explosion, and offers the field a new direction.

Syndrome type explosion can be explained as follows. The medical design of a TCM syndrome differentiation computer system determined 48 basic items of syndrome differentiation (syndrome differentiation elements) and stored 1,500 standard syndrome name patterns (compound syndrome types). That research adopted measures such as "threshold adjustment and compatibility" and applied fuzzy mathematics, for example spatial measurement and dimension-reducing (or dimension-increasing) transformations, to perform fuzzy cluster analysis on the 48 pattern elements (standard syndrome patterns), which yielded more than 500 derived syndrome name patterns. An intelligent syndrome differentiation model based on pattern matching must fix its standard syndrome patterns in advance. In theory, the full set of combinations of the 48 basic elements is certain to cover clinical practice. On the one hand, however, the number of combinations (2^48) is astronomical, and cataloguing them all is impossible. On the other hand, not every combination of elements is clinically admissible: combinations such as {excess fire, yang deficiency} or {external wind, excess cold, excess fire} cannot constitute a syndrome type under TCM theory. This is the syndrome type explosion problem common to the field of intelligent TCM syndrome differentiation.
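The scale of the enumeration problem is easy to check with a short calculation. The sketch assumes that each of the 48 elements is simply present or absent, so that the candidate patterns are the subsets of 48 items; the function name `count_combinations` is hypothetical.

```python
def count_combinations(n_elements):
    """Number of present/absent combinations of n syndrome differentiation elements."""
    return 2 ** n_elements


total = count_combinations(48)     # 281474976710656 candidate combinations
stored_patterns = 1500             # standard syndrome name patterns cited in the text
coverage_share = stored_patterns / total
```

Even the 1,500 stored patterns cover only a vanishing share of the roughly 2.8 x 10^14 possible combinations, which is why enumeration-based pattern matching does not scale.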

As shown in Figure 1, this embodiment also provides a TCM quantitative diagnosis method based on a directed graph. The detailed steps are as follows:

Step 1: Split the original data into training data train_data and test data test_data by ten-fold cross-validation;

Step 2: For each label l in the label vector L, perform Steps 3 to 5;

Step 3: Following the weight determination method, use train_data to calculate the pointwise mutual information of each feature, then normalize it to obtain the weight of each feature;

Step 4: For all of test_data, calculate according to formula (1) the weight between each unknown case taken from test_data and the train_data cases, and select the K cases with the largest weights, N(K);

Step 5: End for.
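Step 1 above can be sketched with the standard library alone. This is an illustrative split, not the authors' code: the generator name `ten_fold_splits`, the fixed random seed and the use of integer record identifiers are assumptions.

```python
import random


def ten_fold_splits(records, seed=0):
    """Yield (train_data, test_data) pairs for ten-fold cross-validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)          # reproducible shuffle
    folds = [shuffled[i::10] for i in range(10)]   # ten disjoint folds
    for k in range(10):
        test_data = folds[k]
        train_data = [r for j, fold in enumerate(folds) if j != k for r in fold]
        yield train_data, test_data


# 418 record identifiers, matching the number of consultations in the experiment.
splits = list(ten_fold_splits(range(418)))
```

Each record lands in exactly one test fold, so Steps 2 to 5 can be run once per split with weights learned only from the corresponding train_data.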

To demonstrate the effectiveness of the method, we carried out the following experiment. Data from 418 coronary heart disease consultations were selected as the research object, and the model was trained on these samples. The system automatically computes the (symptom, syndrome type, weight) triples. For ease of loading and lookup, the back end converts them to JSON for storage, in the form {syndrome type: {symptom 1: weight 1, symptom 2: weight 2, ...}}. The syndrome types "heart qi deficiency" and "lung yin deficiency" serve as examples: {'heart qi deficiency': {'palpitations': 0.17970033096330984, 'chest tightness': 0.1426162709006604, 'dizziness': 0.09675323697280372, 'headache': 0.06133919561179082, 'sore throat': 0.54120158784469415, 'lumbar soreness': 0.049497041840412495, 'pitting edema of both lower limbs': 0.04621506196625556, 'fatigue': 0.04621232129119378, 'poor appetite': 0.040777500785163054, 'restless sleep': 0.030477793526566727, 'frequent urination': 0.027262711151846412, 'constipation': 0.024742634905585082, 'normal stools': 0.023804154276122487}}.

{'lung yin deficiency': {'palpitations': 0.14956935140308092, 'chest tightness': 0.12866125182845342, 'dizziness': 0.08022981081402662, 'headache': 0.05275585322018142, 'sore throat': 0.052129023680618475, 'lumbar soreness': 0.04847449413050493, 'pitting edema of both lower limbs': 0.04621506196625556, 'fatigue': 0.041422491758973445, 'poor appetite': 0.032739345289794476, 'restless sleep': 0.028341493055174725, 'frequent urination': 0.024742634905585082, 'constipation': 0.023804154276122487, 'normal stools': 0.023528103602116983}}.
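Storing and reloading the nested {syndrome type: {symptom: weight}} structure might look like the sketch below, which uses Python's standard `json` module. The helper `weight_of` and the abbreviated two-symptom model are hypothetical; the weights are taken from the example dictionaries above, with English keys used for readability.

```python
import json

# Abbreviated model in the layout {syndrome type: {symptom: weight}}.
model = {
    "heart qi deficiency": {
        "palpitations": 0.17970033096330984,
        "chest tightness": 0.1426162709006604,
    },
    "lung yin deficiency": {
        "palpitations": 0.14956935140308092,
        "chest tightness": 0.12866125182845342,
    },
}

# ensure_ascii=False keeps any non-ASCII symptom names readable in the file.
text = json.dumps(model, ensure_ascii=False, indent=2)
restored = json.loads(text)


def weight_of(loaded, syndrome, symptom):
    """Return the stored weight, or 0.0 when the pair is absent."""
    return loaded.get(syndrome, {}).get(symptom, 0.0)


w = weight_of(restored, "lung yin deficiency", "chest tightness")
```

Keying the outer dictionary by syndrome type means that the inference step can fetch all symptom weights of one syndrome type with a single lookup.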

Model validity test:

In each of 10 rounds, 10% of the samples were drawn at random for testing. The computed results were compared with the originally annotated diagnoses, scoring 1 for agreement and 0 otherwise, and the model's one-error, coverage, ranking loss, average precision and Hamming loss were calculated to evaluate its performance.

The relevant indices are defined as follows:

One-error (One Error, OE↓) examines how often the top-ranked label in a sample's ranked sequence of concept labels does not belong to the sample's label set. It is expressed as:

where H(x_i) = 1 if the label ranked first does not belong to the label set of sample x_i, and H(x_i) = 0 otherwise.

Coverage (Coverage↓) examines the search depth needed, in a sample's ranked sequence of concept labels, to cover all concept labels belonging to the sample. It is expressed as:

where C(x_i) = {l | f(x_i, l) ≥ f(x_i, l_i'), l ∈ y}, and

Ranking loss (Ranking Loss, RL↓) examines how often ranking errors occur in a sample's ranked sequence of concept labels. It is expressed as:

where

Average precision (Average Precision, AVP↑) examines, in a sample's ranked sequence of concept labels, how often the labels ranked ahead of a concept label belonging to the sample also belong to the sample's label set.

where

Hamming loss (Hamming Loss, HL↓) examines the misclassification of a sample on individual labels, that is, cases where a concept label belonging to the sample is missing from the predicted label set or a concept label not belonging to the sample appears in it.

where Q is the total number of labels and h(x_i) is the classification result.

Note: ↑ means that larger values are better, ↓ means that smaller values are better, and m is the number of records.
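The five indices can be computed as in the sketch below. Because the index expressions are not reproduced in this text, the code follows the definitions commonly used in the multi-label learning literature, which may differ in notation from the patent's own; the function names, the score dictionaries and the two toy samples are hypothetical. Each sample is assumed to have at least one true label and at least one label that is not true.

```python
def ranks(scores):
    """Rank labels by descending score; the best label gets rank 1."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {label: r for r, label in enumerate(order, start=1)}


def one_error(all_scores, all_true):
    misses = sum(max(s, key=s.get) not in y for s, y in zip(all_scores, all_true))
    return misses / len(all_true)


def coverage(all_scores, all_true):
    depth = sum(max(ranks(s)[l] for l in y) - 1 for s, y in zip(all_scores, all_true))
    return depth / len(all_true)


def ranking_loss(all_scores, all_true):
    total = 0.0
    for s, y in zip(all_scores, all_true):
        others = [l for l in s if l not in y]
        # count (true, false) label pairs that are ordered the wrong way round
        bad = sum(s[a] <= s[b] for a in y for b in others)
        total += bad / (len(y) * len(others))
    return total / len(all_true)


def average_precision(all_scores, all_true):
    total = 0.0
    for s, y in zip(all_scores, all_true):
        r = ranks(s)
        total += sum(sum(r[k] <= r[l] for k in y) / r[l] for l in y) / len(y)
    return total / len(all_true)


def hamming_loss(all_pred, all_true, q):
    wrong = sum(len(set(p) ^ set(y)) for p, y in zip(all_pred, all_true))
    return wrong / (q * len(all_true))


# Two hypothetical samples over Q = 3 labels.
scores = [{"a": 0.6, "b": 0.3, "c": 0.1}, {"a": 0.5, "b": 0.2, "c": 0.3}]
truth = [{"a"}, {"b", "c"}]
predicted = [{"a"}, {"a", "c"}]

oe = one_error(scores, truth)
cov = coverage(scores, truth)
rl = ranking_loss(scores, truth)
avp = average_precision(scores, truth)
hl = hamming_loss(predicted, truth, q=3)
```

The first four indices work on the ranked scores, whereas Hamming loss compares the final predicted label set h(x_i) with the true set, which is why it takes a different input.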

The results for each index of the model are shown in the following table:

Table 1 Model evaluation

Note that the results are evaluated with several indices commonly used for multi-label problems: Hamming_Loss, Average_Precision, One_Error, Ranking_Loss and Coverage. The model constructed by the invention presents the evaluation results more accurately. One_Error reflects the misjudgment rate of the model's diagnosis relative to the true result; Average_Precision reflects the similarity between the model's diagnosis and the true diagnosis; Ranking_Loss reflects the error rate of the ranking of the individual items in the model's diagnosis relative to the true diagnosis; Hamming_Loss reflects the misjudgment rate of the individual items in the model's diagnosis relative to the true diagnosis; and Coverage reflects the redundancy of the model's diagnosis relative to the true diagnosis. The model test of the invention therefore considers not only the numerical meaning of each index but also its interpretation from the perspective of traditional Chinese medicine.

Embodiment 2

This embodiment provides a computer-readable storage medium that stores the system described in Embodiment 1.

The computer-readable storage medium may include an internal storage unit of a terminal (computer), such as the terminal's hard disk or memory. It may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card fitted to the terminal. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium stores the computer program and the other programs and data required by the terminal, and may also temporarily store data that has been output or is about to be output.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the invention.

Those skilled in the art will clearly understand that, for convenience and brevity, the specific working processes of the terminal and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the embodiments provided in this application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, several units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over several network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the invention.

In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more to a unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention, and all of them fall within the scope of the claims and description of the invention.

Claims

1. A TCM quantitative diagnosis system based on a weighted directed graph, characterized by comprising:

a weight calculation module, which calculates the weight of feature data according to a predetermined strategy;

a directed graph construction module, which constructs a weighted directed graph according to the relationship between the weight and the feature data;

an inference diagnosis module, which performs inference on a case to be examined by means of the directed graph construction module to obtain the result corresponding to the case;

the weight calculation module includes a feature data matrix construction submodule, a feature data correlation determination submodule and a feature data weight acquisition submodule; wherein

specifically: the feature data are classified into symptoms and syndrome types and then processed into matrix form, so that sparse matrix A and sparse matrix B are constructed respectively;

the feature data weight acquisition submodule standardizes the correlation of the feature data to obtain the weight of the feature data;

triples are constructed from sparse matrix A, sparse matrix B and the weights of the elements of sparse matrix A and sparse matrix B to form the weighted directed graph; the more syndrome types a symptom belongs to, the smaller its weight for a given syndrome type should be relative to the other symptoms of that syndrome type; the fewer symptoms a syndrome type has in total, the larger the weight of a given symptom in that syndrome type should be relative to the weight of the same symptom in other syndrome types;

the method by which the case to be examined is inferred by means of the directed graph construction module is specifically:

the corresponding weight sums are sorted in descending order and reference results are discarded according to a threshold to obtain the optimal result corresponding to the case.

2. The TCM quantitative diagnosis system based on a weighted directed graph according to claim 1, characterized in that the predetermined strategy includes a mutual information calculation method, a confidence calculation method and an information entropy calculation method.

3. The TCM quantitative diagnosis system based on a weighted directed graph according to claim 1, characterized in that the data processing of the feature data matrix construction submodule is:

wherein the elements of sparse matrix A are a_mn, the elements of sparse matrix B are b_mn and the elements of matrix C_i are c_mn; m denotes the number of columns in the matrix, n denotes the number of rows, and i denotes the subscript of matrix C_i.

4. The TCM quantitative diagnosis system based on a weighted directed graph according to claim 2 or 3, characterized in that the feature data correlation determination submodule calculates the correlation of the feature data from sparse matrix A, sparse matrix B and sparse matrix C_i according to the mutual information calculation method, specifically:

p(x, y) = c_{i=x, n=y}  (3)

wherein x denotes a symptom and y denotes a syndrome type; p(x) is the probability that the a_mn entry of sparse matrix A occurs in its column, a_mn being an element of sparse matrix A with the value 0 or 1; p(y) is the probability that the b_mn entry of sparse matrix B occurs in its column, b_mn being an element of sparse matrix B with the value 0 or 1; p(x, y) is the probability that c_mn occurs in matrix C; PMI(x, y) is the pointwise mutual information of x and y, which reflects how often the corresponding elements of sparse matrix A and sparse matrix B occur together; m denotes the number of columns in the matrix and n the number of rows.

5. The TCM quantitative diagnosis system based on a weighted directed graph according to claim 3, characterized in that the processing of the feature data weight acquisition submodule is:

obtaining the correlation of the feature data

calculating the weight of the feature data: WF = (wf1, wf2, ..., wfk, ..., wfn), where

6. A computer-readable storage medium, characterized in that the computer storage medium stores the TCM quantitative diagnosis system according to any one of claims 1 to 5.