A method for extracting liver disease data classification rules based on random forest

Technical field
The invention belongs to the field of data processing for disease and diagnostic information, and more particularly relates to a method for extracting classification rules from liver disease data based on random forests.
Background technique
Liver cancer is the second leading cause of cancer death worldwide, and primary hepatitis can progress to fibrosis, cirrhosis, and even liver cancer. At present, most diagnostic methods for liver diseases are black-box models that concentrate on the classification problem alone; the accuracy and interpretability of the resulting classification rules are difficult to guarantee, and the information hidden in the data cannot be fully revealed. In practical medical applications, although some black-box models achieve very high precision, they cannot explain the reason for a classification, which is extremely important for doctors. Knowledge rules extracted from data are easier to understand than other representations, so a classification can be explained by a few concise and effective rules. Concise and effective rule extraction can provide a detailed explanation of the underlying decision and is becoming increasingly popular in the medical environment, which demands not only high precision but also intelligibility. Rule extraction has always been a hot research topic in artificial intelligence: many experimental studies combine data from multiple sources to understand a potential problem, and it is important to find and explain the most significant information in those sources. Therefore, an effective algorithm is needed that can extract decision rules while retaining predictive performance, select the relationships among key features to explain the risk factors influencing liver disease, and supply the resulting relational expressions to diagnosis.
At present, many classification algorithms have been successfully applied to hepatitis data sets: clustering based on attribute weighting; extreme learning machines; support vector machines; neural networks; fuzzy rule extraction based on support vector machines; classification and regression trees; support vector recognition; and principal component analysis. Hsieh et al. proposed a particle swarm optimization-based fuzzy hyper-rectangular composite neural network, in which the rules generated by the particle swarm optimization algorithm are pruned during training without reducing (and even improving) recognition performance. Barakat, N. and A. P. Bradley et al. proposed rule extraction from the output vector of an SVM model using a decision tree algorithm. In related work, a naive Bayes tree was used to extract rules from the prediction model of an SVM, together with TREPAN, RIPPER and CART. Another work performs rule extraction from support vector models using ANFIS and DENFIS. Recently, T. Marthi Padmaja et al. proposed a new hybrid algorithm in which support vector data description is combined with RIPPER to improve the interpretability of one-class SVM classification. Most of this work concentrates on SVM classifiers. To improve the interpretability of the generated rules, Sheng Liu, Ronak Y. Patel et al. proposed a systematic model of rule extraction and feature selection based on random forests: rules are extracted from the data through a random forest, the features present in the selected rules are fed back into the random forest for classification verification, and classification with the generated rules can reach the precision of classification on the original data. The feature search algorithm may be one of the most important parts of a feature selection method. Several search strategies for feature selection have been proposed: branch and bound, divide and conquer, greedy methods, evolutionary algorithms, annealing algorithms, and so on. Among these, greedy search strategies, such as forward selection (incremental search) or backward elimination, are among the most popular techniques.
From the foregoing, SVMs, neural networks, decision trees and random forests are the basic models for studying rule extraction. Limiting and extracting the rules mainly relies on L1 and L2 norm regularization to achieve sparsity of rules and features, i.e., feature selection and interpretability.
As described above, in the practical diagnosis of liver diseases, an interpretable model with high predictive performance is very important and also allows the underlying problem to be better understood. State-of-the-art algorithms such as support vector machines (SVM), artificial neural networks and random forests (RF) usually achieve high prediction accuracy, but beyond accuracy it is difficult to explain how these models are built, because they are all "black-box models" or contain many decision rules that cannot be clearly interpreted. On the other hand, some algorithms, particularly those based on decision trees, are easy to explain; however, their predictive performance is usually lower than that of SVM, ANN or RF.
Second, in the diagnosis of liver disease, generating an excessive number of diagnostic rules has no practical meaning for doctors. Rule extraction based on a basic decision tree model can generate many rule sets that are meaningless for the user's intuitive interpretation. Moreover, although L1 norm regularization can achieve rule and feature extraction, it directly sets the weights of rules with very little relevance to 0, which easily causes over-fitting; meanwhile, the L2 norm sets the weights of rules with very little relevance to a very small value, which easily causes under-fitting of the data.
Summary of the invention
In order to solve the problems of the prior art, the present invention selects a model that balances classification performance with model interpretability, and applies an elastic-norm iterative implementation during rule extraction.
For liver disease data, the present invention proposes a new elastic norm convergence algorithm combining L1 and L2 to select a small number of effective rules, through a hybrid method of rule extraction and feature selection: the result of rule extraction is used for feature selection, and the features selected from the generated rules are fed back into the random forest and the elastic norm coding step to extract the important rules. This alternating process is iterated continuously until the selected features and rules no longer change. Finally, and most importantly, so that doctors or users can trust the validity and accuracy of the generated rules, the present invention quantifies performance using coverage and precision, reaching an optimal balance between accuracy and classification precision.
The present invention proposes a binary-coded forest generated by a random forest (RF) on liver data, which maps each sample point into the space defined by the entire set of leaf nodes (rules). Representative rules are then extracted by a coding method using binary coding and the elastic norm. The features that reappear in the selected rules serve as the feature subset for the next cycle: they are used to construct a new RF and generate a new group of rules, and this process is repeated until the stop condition is met, i.e., the number of features remains stable and the number of rules converges.
Specifically, the following technical scheme is adopted:
A method for extracting liver disease data classification rules based on random forest, characterized by comprising the following steps:
Step 1: pre-process the imbalanced or irregular liver disease data, obtaining a liver disease data set through SMOTE (Synthetic Minority Oversampling Technique);
Step 2: perform binary sparse coding on the liver disease data set using a random forest model, obtaining a liver disease rule set;
Step 3: perform elastic norm sparse coding rule extraction on the liver disease rule set, obtaining a coded liver disease rule set;
Step 4: perform feature extraction and deletion on the coded liver disease rule set;
Step 5: perform original-data verification and generate the final rule set.
Imbalance in the original liver data set causes many problems in pattern recognition. For example, if the data set is imbalanced, a classifier tends to "learn" the samples of the largest proportion and cluster them with the highest precision. In practical applications this bias is unacceptable. The present invention handles this through SMOTE (Synthetic Minority Oversampling Technique), which can create "synthetic" examples with the few available samples for each minority class.
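The SMOTE interpolation idea described above can be sketched as follows. This is a minimal illustration of the core mechanism only (interpolating between a minority sample and one of its nearest minority neighbours); the function name and the plain Euclidean neighbour search are assumptions for illustration, not the full SMOTE algorithm used by the invention:

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbours, and interpolate."""
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    k = min(k, n - 1)
    # pairwise Euclidean distances within the minority class
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per sample
    synth = np.empty((n_new, d))
    for j in range(n_new):
        i = rng.integers(n)                   # random minority sample
        neigh = X_min[rng.choice(nn[i])]      # one of its neighbours
        gap = rng.random()                    # interpolation factor in [0, 1)
        synth[j] = X_min[i] + gap * (neigh - X_min[i])
    return synth
```

Because every synthetic point lies on a line segment between two minority samples, the class distribution is balanced without duplicating existing records.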
Further, in step 2, the method of performing binary sparse coding on the liver disease data set using the random forest model comprises the following steps:
Step 2A: train the liver disease data set to obtain a random forest comprising multiple decision trees; in each decision tree, the path from the root node to a leaf node is interpreted as a decision rule, so the random forest is equivalent to a set of decision rules;
Step 2B: in each decision tree, each sample of the liver disease data set follows the path from the root node to exactly one leaf node;
Step 2C: define a binary feature vector that captures the leaf-node structure of the random forest. For sample Xi, the corresponding binary vector encoding the leaf nodes is defined as:
Xi = [x1, ..., xq]^T, where q is the total number of leaf nodes;
The space of Xi is then the leaf-node space. In this space each sample is mapped to a vertex of a hypercube, and each dimension of the rule space is defined by one decision rule. Such a mapping therefore essentially determines, for a sample, which rules are effective and which are not.
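The leaf-node encoding of step 2 can be sketched with scikit-learn's random forest. The breast-cancer data set here is only a convenient stand-in for a liver disease data set, which the source does not provide:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)   # stand-in for liver data
rf = RandomForestClassifier(n_estimators=10, max_depth=3,
                            random_state=0).fit(X, y)

# rf.apply returns, per sample, the index of the leaf reached in each tree
leaves = rf.apply(X)                          # shape (n_samples, n_trees)

# one-hot encode the leaves of every tree into one binary vector per sample,
# i.e. Xi = [x1, ..., xq]^T with q = total number of leaves in the forest
cols = []
for t in range(leaves.shape[1]):
    ids = np.unique(leaves[:, t])
    cols.append((leaves[:, t][:, None] == ids[None, :]).astype(np.uint8))
B = np.hstack(cols)
```

Each row of B has exactly one active bit per tree, so a sample is mapped to a hypercube vertex indicating which rules (root-to-leaf paths) fire for it.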
Further, in step 3, the method of performing elastic norm sparse coding rule extraction on the liver disease rule set comprises the following steps:
Step 3A: according to the mapping result of step 2C, construct new training samples:
{(X1, y1), (X2, y2), ..., (Xp, yp)};
where Xi is a binary attribute vector and y ∈ {1, 2, ..., K} is the associated category. The classifier is defined by the formula:
f(X) = argmax_k (Wk^T X + bk) (1)
where the weight vector Wk and the scalar bk define the linear discriminant function of the k-th class;
Since each binary attribute represents one decision rule, the weight Wk in formula (1) measures the importance of the rule: the magnitude of the weight indicates the rule's degree of significance. Clearly, in the classifier above, a rule whose weight is 0 for all classes can be safely removed. Rule extraction therefore becomes the problem of learning the weight vectors.
Step 3B: carry out elastic norm regularized learning, with the following objective function:
min over W, b, ξ: Σ_k [P||Wk||_1 + (1−P)||Wk||_2^2] + λ Σ_i ξi
s.t. W_yi^T Xi + b_yi ≥ Wk^T Xi + bk + 1 − ξi for all k ≠ yi, and
ξi ≥ 0, i = 1, ..., p (2)
The objective function consists of two parts. The first term is the elastic norm expression combining the L1 and L2 norms, P||Wk||_1 + (1−P)||Wk||_2^2, which controls the number of non-zero weights and of extracted rules; P is the probability factor selecting between the L1 and L2 norms. The second term is the sum of the slack variables ξi, and λ is the regularization parameter. Because a non-zero slack variable represents a misclassified sample, the second term relates to the empirical error. The resulting sparsity and empirical error depend on the regularization parameter, and L1 and L2 norm sparse coding is widely used in statistics and machine learning: the L1 norm can delete unimportant features, while the L2 norm can prevent over-fitting of the data. After step 3B, the present invention chooses the P value with the highest model cross-validation precision and substitutes this P value into formula (2).
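The elastic-norm selection of step 3B can be approximated with an off-the-shelf elastic-net-penalised linear model. This is a sketch under the assumption that scikit-learn's `l1_ratio` plays the role of the probability factor P and `C` the role of 1/λ; the toy binary matrix stands in for the rule encoding of step 2:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
B = rng.integers(0, 2, size=(200, 12)).astype(float)  # 200 samples x 12 "rules"
y = ((B[:, 0] + B[:, 3]) > 0).astype(int)             # only rules 0 and 3 matter

# elastic-net penalty = l1_ratio * L1 + (1 - l1_ratio) * L2, cf. the factor P
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.7, C=0.5, max_iter=5000).fit(B, y)

w = clf.coef_.ravel()
kept = np.flatnonzero(np.abs(w) > 1e-3)   # rules whose weights survive
```

The indices in `kept` are the extracted rules; in practice P (`l1_ratio`) and λ (`1/C`) would be chosen by cross-validation as described in step 3B.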
The importance of any sample feature in the random forest is calculated as follows:
Step 3C: for each decision tree in the random forest, calculate its out-of-bag data error using the corresponding OOB (out-of-bag) data, denoted errOOB1;
Step 3D: randomly add noise interference to the feature in all samples of the out-of-bag data, and calculate the out-of-bag data error again, denoted errOOB2;
Step 3E: supposing there are Ntree decision trees in the random forest, define the importance of the feature as:
Σ(errOOB2 − errOOB1)/Ntree (3)
and calculate the importance of all features.
Formula (3) serves as the metric of a feature's importance because, if the out-of-bag accuracy drops greatly after random noise is added to a feature, that feature has a large influence on the classification results of the samples, that is, its degree of importance is relatively high.
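Steps 3C–3E can be sketched as a permutation-importance computation. This is illustrative: a synthetic data set in which only feature 0 is informative stands in for liver data, and a single held-out split stands in for the per-tree out-of-bag sets of formula (3):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0).astype(int)            # only feature 0 determines the label

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[:200], y[:200])
Xte, yte = X[200:], y[200:]

base_err = 1.0 - rf.score(Xte, yte)      # analogue of errOOB1
imp = np.empty(X.shape[1])
for f in range(X.shape[1]):
    Xp = Xte.copy()
    Xp[:, f] = rng.permutation(Xp[:, f])  # "add noise" by permuting feature f
    imp[f] = (1.0 - rf.score(Xp, yte)) - base_err   # error increase (errOOB2 - errOOB1)
```

Feature 0 should receive by far the largest importance, since permuting it destroys the only signal in this toy data.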
Further, in step 4, the method of extracting and deleting features from the coded liver disease rule set comprises the following steps:
The feature distribution in the random forest is determined by its learning process, and it usually differs from the feature distribution generated by the rule extraction in the previous formula. Since assumptions about important features are made in the extracted decision rules, feature dissimilarity can be used to select features. If a feature does not appear in the rules extracted by formula (2), it is deleted, because it has no influence on the classifier defined by formula (1). Under this idea, rules and features can be selected simultaneously.
The regularization parameter λ can be selected by cross-validation on the training set. Reconfiguring the random forest with the selected features can further select rules, to obtain an overall compact rule set. This forms an iterative process: the features selected in the previous iteration are used to construct a new random forest, which generates new rules, and the iteration continues until the selected features no longer change.
Thus:
Step 4A: if a feature does not appear in the rules extracted by formula (2), delete it;
Step 4B: select the regularization parameter λ by cross-validation on the training set, and return to step 2A to reconfigure and retrain the random forest;
Step 4C: repeat the iterative process of steps 2A to 4B until the selected features no longer change.
Further, in step 5, performing original-data verification and generating the final rule set comprises the following steps:
Step 5A: given the liver disease data set D with class labels, let ncovers be the number of data items covered and ncorrect be the number of data items accurately classified by rule set R. The coverage and accuracy of rule set R are respectively defined as:
coverage(R) = ncovers/|D|, accuracy(R) = ncorrect/ncovers
The higher the coverage and accuracy of a rule, the greater the credibility of the rule for auxiliary diagnosis; the rules with relatively high coverage and accuracy are taken to generate the final rule set.
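The step-5 metrics can be computed as in the following sketch. The representation of a rule as a (predicate, predicted label) pair, the "alt" measurement name, and the first-match policy are illustrative assumptions only:

```python
def rule_metrics(rules, data):
    """coverage = covered samples / all samples;
    accuracy = correctly classified covered samples / covered samples."""
    covered = correct = 0
    for x, label in data:
        hits = [pred for cond, pred in rules if cond(x)]
        if hits:
            covered += 1
            if hits[0] == label:        # first matching rule decides
                correct += 1
    n = len(data)
    return covered / n, (correct / covered) if covered else 0.0

# two toy diagnostic rules on a single hypothetical "alt" measurement
rules = [(lambda x: x["alt"] > 40, 1),
         (lambda x: x["alt"] <= 20, 0)]
data = [({"alt": 50}, 1), ({"alt": 10}, 0),
        ({"alt": 30}, 1), ({"alt": 45}, 0)]
cov, acc = rule_metrics(rules, data)    # covers 3 of 4 samples, 2 of 3 correct
```

Rules whose coverage and accuracy both exceed chosen thresholds would then be retained in the final rule set.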
The elastic norm rule extraction and feature selection method combining the L1 and L2 norms, proposed by the present invention and its preferred embodiments, enables the method not only to choose relatively few features but also to improve generalization ability and classification precision.
The secondary extraction-and-verification method proposed by the present invention (i.e., performing original-data verification and generating the final rule set) greatly improves the credibility of the generated rules.
The present invention can realize multi-class rule extraction and verification, overcoming the limitation in prior work that rule extraction could only be performed for two-class classification.
An imbalanced training data set causes many problems in pattern recognition. For example, if the data set is imbalanced, a classifier tends to "learn" the samples of the largest proportion and cluster them with the highest precision; in practical applications this bias is unacceptable. In order to achieve a uniform distribution of the sample data, the present invention solves this problem using the Synthetic Minority Oversampling Technique, whose algorithm creates "synthetic" examples for each minority class that has few samples.
The application advantages of the present invention over the prior art in concrete examples are as follows:
1. Existing rule extraction algorithms for liver disease data are mainly based on SVMs or decision trees, and the feature search algorithm may be the most important part of a feature selection method. Several search strategies for feature selection have been proposed: branch and bound, divide and conquer, greedy methods, evolutionary algorithms, annealing algorithms, and so on. Greedy search strategies, such as forward selection (incremental search) or backward elimination, are among the most popular techniques, but their computational efficiency and robustness are poor and they are prone to over-fitting or under-fitting.
The present invention uses the random forest as the basic model for liver disease data, overcoming the disadvantage that an SVM is highly precise but its rules cannot be interpreted. At the same time, it innovatively uses the elastic norm convergence combining L1 and L2, which not only solves the above problems, but also solves the problem that the L1 norm deletes too many rules or features and causes over-fitting, and the problem that the L2 norm keeps too many rules or features and causes under-fitting.
2. At present there is no effective verification algorithm for the rule sets produced by liver disease rule extraction: the generated rules are simply taken as the final rules, and the credibility of such a strategy is insufficient.
The present invention uses a rule verification algorithm as a secondary verification step for the generated rule set. This solves two problems: (1) when the number of rules is small, the credibility of each rule on the original samples can be verified; (2) when the number of rules is large, it can serve both as a means of simplifying the rules and as an algorithm for verifying rule credibility again.
3. In medical data, and especially when the original liver disease data contain noise or missing values, data anomalies bias the precision of the model and the generated rules toward the more normal part of the data.
The present invention uses the missing-value processing in the Synthetic Minority Oversampling Technique: missing values are first filled with the median to keep the data continuous; secondly, resampling keeps the amounts of the different classes of samples consistent, and cross-validation guarantees sufficient training samples.
4. In existing rule extraction algorithms for liver data, most perform rule extraction separately on one class at a time, which obviously increases the computation time and is not practical enough for real application.
Before running sample rule extraction, the random forest model used by the present invention first stores the data by class and reconfirms that the sample sizes of the different classes are consistent, and then performs rule extraction and feature selection simultaneously on the samples whose liver disease classification is complete. Such processing improves the overall computational efficiency.
Description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments:
Fig. 1 is a schematic diagram of the overall procedure of the method of the embodiment of the present invention;
Fig. 2 is a schematic diagram of the binary coding for the random forest in the embodiment of the present invention;
Fig. 3 is a schematic diagram of rule rejection in the embodiment of the present invention;
Fig. 4 is a schematic diagram of the combination of the L1 and L2 norms in the embodiment of the present invention;
Fig. 5 is a flow diagram of the main algorithm of the method of the embodiment of the present invention.
Specific embodiment
So that the features and advantages of this patent can be more clearly understood, specific embodiments are described in detail below:
As shown in Fig. 1, the embodiment of the present invention comprises the following steps:
Step 1: pre-process the imbalanced or irregular liver disease data, obtaining a liver disease data set through SMOTE (Synthetic Minority Oversampling Technique);
Step 2: perform binary sparse coding on the liver disease data set using a random forest model, obtaining a liver disease rule set;
Step 3: perform elastic norm sparse coding rule extraction on the liver disease rule set, obtaining a coded liver disease rule set;
Step 4: perform feature extraction and deletion on the coded liver disease rule set;
Step 5: perform original-data verification and generate the final rule set.
Imbalance in the original liver data set causes many problems in pattern recognition. For example, if the data set is imbalanced, a classifier tends to "learn" the samples of the largest proportion and cluster them with the highest precision. In practical applications this bias is unacceptable. The present invention handles this through SMOTE (Synthetic Minority Oversampling Technique, mainly comprising data balancing processing and missing-data processing), which can create "synthetic" examples with the few available samples for each minority class.
In step 2, the method of performing binary sparse coding on the liver disease data set using the random forest model comprises the following steps:
Step 2A: train the liver disease data set to obtain a random forest comprising multiple decision trees; in each decision tree, the path from the root node to a leaf node is interpreted as a decision rule, so the random forest is equivalent to a set of decision rules;
Step 2B: in each decision tree, each sample of the liver disease data set follows the path from the root node to exactly one leaf node;
Step 2C: define a binary feature vector that captures the leaf-node structure of the random forest. For sample Xi, the corresponding binary vector encoding the leaf nodes is defined as:
Xi = [x1, ..., xq]^T, where q is the total number of leaf nodes;
As shown in Fig. 2, an example applied by the present invention illustrates the mapping relations of the nodes.
The space of Xi is then the leaf-node space. In this space each sample is mapped to a vertex of a hypercube, and each dimension of the rule space is defined by one decision rule. Such a mapping therefore essentially determines, for a sample, which rules are effective and which are not.
In step 3, the method of performing elastic norm sparse coding rule extraction on the liver disease rule set comprises the following steps:
Step 3A: according to the mapping result of step 2C, construct new training samples:
{(X1, y1), (X2, y2), ..., (Xp, yp)};
where Xi is a binary attribute vector and y ∈ {1, 2, ..., K} is the associated category. The classifier is defined by the formula:
f(X) = argmax_k (Wk^T X + bk) (1)
where the weight vector Wk and the scalar bk define the linear discriminant function of the k-th class;
Since each binary attribute represents one decision rule, the weight Wk in formula (1) measures the importance of the rule: the magnitude of the weight indicates the rule's degree of significance. Clearly, in the classifier above, a rule whose weight is 0 for all classes can be safely removed. Rule extraction therefore becomes the problem of learning the weight vectors.
Step 3B: carry out elastic norm regularized learning, with the following objective function:
min over W, b, ξ: Σ_k [P||Wk||_1 + (1−P)||Wk||_2^2] + λ Σ_i ξi
s.t. W_yi^T Xi + b_yi ≥ Wk^T Xi + bk + 1 − ξi for all k ≠ yi, and
ξi ≥ 0, i = 1, ..., p (2)
As shown in Fig. 4, the objective function consists of two parts. The first term is the elastic norm expression combining the L1 and L2 norms, P||Wk||_1 + (1−P)||Wk||_2^2, which controls the number of non-zero weights and of extracted rules; P is the probability factor selecting between the L1 and L2 norms. The second term is the sum of the slack variables ξi, and λ is the regularization parameter. Because a non-zero slack variable represents a misclassified sample, the second term relates to the empirical error. The resulting sparsity and empirical error depend on the regularization parameter, and L1 and L2 norm sparse coding is widely used in statistics and machine learning: the L1 norm can delete unimportant features, while the L2 norm can prevent over-fitting of the data. After step 3B, the present invention chooses the P value with the highest model cross-validation precision and substitutes this P value into formula (2).
The importance of any sample feature in the random forest is calculated as follows:
Step 3C: for each decision tree in the random forest, calculate its out-of-bag data error using the corresponding OOB (out-of-bag) data, denoted errOOB1;
Step 3D: randomly add noise interference to the feature in all samples of the out-of-bag data, and calculate the out-of-bag data error again, denoted errOOB2;
Step 3E: supposing there are Ntree decision trees in the random forest, define the importance of the feature as:
Σ(errOOB2 − errOOB1)/Ntree (3)
and calculate the importance of all features.
Formula (3) serves as the metric of a feature's importance because, if the out-of-bag accuracy drops greatly after random noise is added to a feature, that feature has a large influence on the classification results of the samples, that is, its degree of importance is relatively high.
In step 4, the method of extracting and deleting features from the coded liver disease rule set comprises the following steps:
The feature distribution in the random forest is determined by its learning process, and it usually differs from the feature distribution generated by the rule extraction in the previous formula. Since assumptions about important features are made in the extracted decision rules, feature dissimilarity can be used to select features. If a feature does not appear in the rules extracted by formula (2), it is deleted, because it has no influence on the classifier defined by formula (1). Under this idea, rules and features can be selected simultaneously.
The regularization parameter λ can be selected by cross-validation on the training set. Reconfiguring the random forest with the selected features can further select rules, to obtain an overall compact rule set. This forms an iterative process: the features selected in the previous iteration are used to construct a new random forest, which generates new rules, and the iteration continues until the selected features no longer change.
As shown in Fig. 3, which presents the final result of processing the random forest of Fig. 2 in the present invention, there are thus the following steps:
Step 4A: if a feature does not appear in the rules extracted by formula (2), delete it;
Step 4B: select the regularization parameter λ by cross-validation on the training set, and return to step 2A to reconfigure and retrain the random forest;
Step 4C: repeat the iterative process of steps 2A to 4B until the selected features no longer change.
In step 5, performing original-data verification and generating the final rule set comprises the following steps:
Step 5A: given the liver disease data set D with class labels, let ncovers be the number of data items covered and ncorrect be the number of data items accurately classified by rule set R. The coverage and accuracy of rule set R are respectively defined as:
coverage(R) = ncovers/|D|, accuracy(R) = ncorrect/ncovers
The higher the coverage and accuracy of a rule, the greater the credibility of the rule for auxiliary diagnosis; the rules with relatively high coverage and accuracy are taken to generate the final rule set.
As shown in Fig. 5, the operational flow of the pseudocode of the embodiment of the present invention is as follows:
Preconditions:
1: initialize the feature variable F.
2: randomly select training samples X from the total samples D.
Output:
1: selected features Ff;
2: selected rules Rf.
Main steps of the algorithm:
1: features Fi, i = 1;
2: while the selected features still change (Fi ≠ Fi−1), execute:
3: run the random forest model, the data set being the training samples X with features Fi;
4: the random forest generates a series of rule sets Rr;
5: encode the training samples X using rule set Rr;
6: obtain the cross-validation precision Ci and the weight values W with the linear formula (2);
7: keep the weights whose magnitude exceeds a threshold (a preset sufficiently small value), and record the indices of all such parameters;
8: transmit the Rr indices to Ri;
9: transmit the features in Ri to Fi+1, i = i + 1;
10: end loop;
11: transmit the i with the maximum cross-validation precision Ci to i*;
12: transmit Fi* to Ff;
13: transmit Ri* to Rf;
14: return Ff, Rf.
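The main loop of Fig. 5 can be sketched end-to-end as below. This is a simplified, hypothetical rendering: synthetic data stands in for liver data, scikit-learn's elastic-net logistic regression stands in for formula (2), and the mapping from surviving rules back to features is approximated by the features the forest actually splits on, rather than tracing each kept root-to-leaf path:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def encode_leaves(rf, X):
    """Binary rule-activation matrix: one column per leaf of the forest."""
    leaves = rf.apply(X)
    cols = []
    for t in range(leaves.shape[1]):
        ids = np.unique(leaves[:, t])
        cols.append((leaves[:, t][:, None] == ids[None, :]).astype(float))
    return np.hstack(cols)

def rf_rule_iteration(X, y, max_iter=5, l1_ratio=0.7, w_tol=1e-3, seed=0):
    feats = np.arange(X.shape[1])            # F1 = all features
    for _ in range(max_iter):
        rf = RandomForestClassifier(n_estimators=20, max_depth=3,
                                    random_state=seed).fit(X[:, feats], y)
        B = encode_leaves(rf, X[:, feats])   # rules Rr encode the samples X
        lin = LogisticRegression(penalty="elasticnet", solver="saga",
                                 l1_ratio=l1_ratio, C=0.5,
                                 max_iter=5000).fit(B, y)
        # rules whose weight magnitude exceeds the threshold survive (step 7)
        kept_rules = np.flatnonzero(np.abs(lin.coef_).max(axis=0) > w_tol)
        # features feeding the next iteration (simplification: every split
        # feature of the current forest)
        used = set()
        for tree in rf.estimators_:
            used |= {int(feats[f]) for f in tree.tree_.feature if f >= 0}
        new_feats = np.array(sorted(used))
        if np.array_equal(new_feats, feats): # stop: features stable
            break
        feats = new_feats
    return feats, kept_rules
```

The returned pair corresponds to (Ff, Rf) in the pseudocode; a full implementation would additionally track the cross-validation precision Ci of each iteration and return the best one, as in steps 11 to 13.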
This patent is not limited to the above preferred embodiment. Under the inspiration of this patent, anyone can derive other various forms of the method for extracting liver disease data classification rules based on random forest, and effective rule extraction can be carried out under the design scheme of the present invention for other types of sample data with obviously irregular or imbalanced characteristics. All equivalent changes and modifications within the scope of the patent application of the present invention shall fall within the coverage scope of this patent.