CN117216150A


Info

Publication number: CN117216150A
Application number: CN202311143074.6A
Authority: CN
Inventors: 陆海涛; 黎跃鸣; 杜泓江; 朱晓霞; 邰创业; 刘海龙; 季晓松
Original assignee: Shanghai Dragon New Media Co ltd
Current assignee: Shanghai Dragon New Media Co ltd
Priority date: 2023-09-06
Filing date: 2023-09-06
Publication date: 2023-12-12

Abstract

The invention provides a data mining system based on a data warehouse and belongs to the technical field of data management. It addresses the technical problem that existing data mining systems offer a narrow set of functions and lack prediction, security protection, and training and learning. The system comprises a user interface module, a mining topic and task module, a data preprocessing module, a mining module, a method slot module, a pattern evaluation module, a manual selection module, a training and learning module, a knowledge base module, a data cleaning module, and a data security module that monitors each of the other modules. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module, and a predictive data mining sub-module. A knowledge representation module collects and analyzes valuable knowledge that cannot be predicted in advance. The invention supports modeling and data mining on top of a data warehouse and provides prediction, protection, and training and learning functions.

Description

Translated from Chinese

A data mining system based on a data warehouse

Technical field

The invention belongs to the technical field of data management and relates to a data mining system based on a data warehouse.

Background

With the widespread adoption of database technology and the growing demand for higher-level information, database technology that is centered on transaction processing and supports business operating environments and platforms can no longer meet needs at the analysis and decision-making level. To provide useful information for the management and decision-making processes of enterprises and governments, relevant data from inside and outside the enterprise must be collected according to decision-making needs and organized appropriately, forming an integrated, decision-oriented environment.

Data mining based on a data warehouse is the process of deeply processing the data held in the warehouse, and it is also the method and tool by which the decision-making value of the warehouse is realized. The ultimate goal of building a data warehouse is to extract, from large volumes of varied data, regularities that offer important guidance for decision-making and management. Because the data is scattered across several business databases and other data sources, obtaining knowledge useful for decision analysis requires tools that can extract valuable information from massive data.

Existing data mining systems, however, are limited and conservative in capability. They lack prediction, security protection, and learning and training functions, and cannot effectively support personalized selection learning or multi-mode visual presentation.

A search of the prior art found, for example, a Chinese patent document disclosing a data mining method and device (application number 201510598360.0; publication number CN105404637B). The method comprises: obtaining a data mining model, where the data mining model corresponds to a data table in a data warehouse and the data table records the data mining rules on which mining is based; and mining the fact data in the data warehouse according to those rules. The disclosed method and device achieve automatic data mining within a data warehouse system: based on the data mining rules corresponding to the data list model and the indicator model, they obtain the data column names of the fact table and filter, calculate, aggregate, and classify the indicators in the indicator model. However, the mining function is too narrow. It lacks prediction, security protection, and learning and training functions, and cannot collect and analyze valuable knowledge that cannot be predicted in advance.

On this basis, we propose a data mining system based on a data warehouse that models the data warehouse and mines it in depth while also providing prediction, security protection, and training and learning functions.

Summary of the invention

In view of the above problems in the prior art, the purpose of the present invention is to provide a data mining system based on a data warehouse. The technical problem to be solved is how to model a data warehouse and mine it in depth while ensuring that the system provides prediction, security protection, and training and learning functions.

The purpose of the present invention can be achieved through the following technical solution:

A data mining system based on a data warehouse comprises a user interface module, a mining topic and task module, a data preprocessing module, a mining module, a method slot module, a pattern evaluation module, a manual selection module, a training and learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each module. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module, and a predictive data mining sub-module. Knowledge data is converted and passed between the pattern evaluation module and the knowledge base module through a knowledge representation module, which can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module.

Working principle of the invention: The user interface module uses description and visualization to turn raw data into the quantitative and relational characteristics of specific data. The mining topic and task module determines the topic type. The data preprocessing module normalizes, integrates, transforms, reduces, and converts the data. The mining module builds the data mining model and is responsible for the model's accuracy, understandability, and performance. The method slot module supplies the data mining methods: the descriptive data mining sub-module performs mining with descriptive methods, the anomaly detection data mining sub-module performs mining with anomaly detection methods, and the predictive data mining sub-module performs mining with predictive methods. The pattern evaluation module verifies and evaluates the results, gives feedback on data mining problems, and saves the representation of the resulting knowledge to the knowledge base module. The manual selection module makes personalized selections according to customer needs. The training and learning module learns the ways and methods of these personalized selections, saves them to the knowledge base module, and imports them into the data mining algorithm library of the mining module, so that the next time the mining synthesizer runs, it also mines with the personalized ways and methods. The knowledge representation module can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module. The data security module monitors every module to keep each one running safely and to protect the system from external virus attacks.

The topic types of the mining topic and task module include but are not limited to retention control, risk prediction, profitability analysis, data trend analysis, employee analysis, regional analysis, classification and clustering, and visualization research.

The data preprocessing module comprises, in processing order, a data source sub-module, a data normalization sub-module, a data integration sub-module, a data transformation sub-module, a data reduction sub-module, and a Microsoft Data Transformation Services sub-module. After data from the data source sub-module enters the data preprocessing module, the data cleaning module cleans it, and the cleaned data passes to the data normalization sub-module. The data cleaning module can also clean the data of the data source, data normalization, data integration, data transformation, data reduction, and Microsoft Data Transformation Services sub-modules. The data source sub-module reads the topic data and task data of the mining topic and task module.

The key task of the data integration sub-module is acquiring data, including but not limited to accessing the data warehouse. Data access methods include but are not limited to: access through a transaction-based relational database or a PC-based database, access through data conversion tools, access through query tools, and access from flat files.

The data reduction sub-module compresses data through aggregation, removal of redundant features, or clustering. Its methods include but are not limited to data cube aggregation, dimensionality reduction, data compression, numerosity reduction, discretization, and concept hierarchy generation.

The problems the data cleaning module addresses include but are not limited to data quality, redundant data, outdated data, and changes in term definitions. The problems that can arise in a data set include but are not limited to consistency problems, cleaning of invalid data, cleaning of typographical errors, missing values, and data export.

The mining module builds the data mining model and is responsible for, among other things, the model's accuracy, understandability, and performance. Accuracy is assessed by testing over time how accurate the model remains. Understandability covers whether the model lets us see how the inputs affect the results, whether it lets us see why a prediction succeeds or fails, whether it can produce predictions for complex data sets, and whether the results it produces can be checked. Performance concerns how quickly the model must be built and how quickly predictions must be obtained from it.

The method slot module comprises a descriptive data mining sub-module, an anomaly detection sub-module, and a predictive data mining sub-module. The analysis methods of the descriptive data mining sub-module include but are not limited to association analysis, cluster analysis, and sequence analysis. The analysis methods of the anomaly detection sub-module include but are not limited to anomaly detection analysis. The analysis methods of the predictive data mining sub-module include but are not limited to evolution analysis, classification analysis, unstructured data analysis, and statistical regression analysis.

The pattern evaluation module includes but is not limited to a verification and evaluation sub-module and a data mining problem feedback sub-module. Verification takes the following into account: evaluating a model on the same data set used to build it gives better results than evaluating it on a different data set; some of a model's predictions are more accurate than others; and because a model is built from sample data, it should yield good results. Evaluation takes into account that the methods gathered under the heading of data mining algorithms differ considerably from one another; data mining borrows heavily from artificial intelligence, and the wide variety of artificial intelligence techniques is the reason so many different data mining methods exist. The problems handled by the data mining problem feedback sub-module include but are not limited to questions raised by business users, technical problems, data mining application problems, problems considered when implementing a data mining project, and privacy problems related to the impact of data mining on society.

The valuable knowledge that cannot be predicted in advance includes but is not limited to other candidate results, the selection margin, and the prediction. Other candidate results: besides wanting to know which result the model will predict, the user may also be interested in the other candidate predictions. Selection margin: a point of great interest is how large the gap is between the final prediction and the other candidate results. Prediction: another thing the user may want to know about the prediction process is why the model arrived at this prediction.

The user interface module uses description and visualization to present raw data in forms including but not limited to rules, tables, charts, images, decision trees, and data cubes, in order to show the quantitative and relational characteristics of the data.

Compared with the prior art, this data mining system based on a data warehouse has the following advantages:

The mining topic and task module works with the data preprocessing module to confirm the data topic and to normalize, integrate, transform, reduce, and convert the data;

The mining module works with the method slot module to build the data mining model and supply the data mining methods, where the descriptive data mining sub-module performs mining with descriptive methods, the anomaly detection data mining sub-module performs mining with anomaly detection methods, and the predictive data mining sub-module performs mining with predictive methods;

The pattern evaluation module works with the knowledge base module to verify and evaluate results and give feedback on data mining problems, saves the representation of the resulting knowledge to the knowledge base module, and, through the knowledge representation module, collects and analyzes valuable knowledge that cannot be predicted in advance;

The manual selection module works with the training and learning module to make personalized selections according to customer needs; the training and learning module learns the ways and methods of these selections and saves them to the knowledge base module;

The data cleaning module cleans the data of the data preprocessing module, and the data security module monitors every module to keep each one running safely and to protect the system from external virus attacks.

Brief description of the drawings

Figure 1 is a diagram of the relationship between data mining and the data warehouse in the present invention.

Figure 2 is a diagram of the relationship among data mining, the data warehouse, and the knowledge base in the present invention.

Figure 3 is an analysis flow diagram of data mining in the present invention.

Figure 4 is a flow chart of data mining in the present invention.

Figure 5 is a structural diagram of data integration in the present invention.

Figure 6 is a structural diagram of data reduction in the present invention.

Figure 7 is a structural diagram of data cleaning in the present invention.

Figure 8 is an analysis diagram of the data mining model in the present invention.

Figure 9 is a structural diagram of the valuable knowledge that cannot be predicted in advance in the present invention.

Figure 10 is a structural diagram of the user interface in the present invention.

Figure 11 is a structural diagram of the supermarket data classification model of Embodiment 1 of the present invention.

Figure 12 is a structural diagram of the supermarket data star model of Embodiment 1 of the present invention.

Detailed description

The technical solution of the present invention is further described below through specific embodiments and with reference to the accompanying drawings, but the present invention is not limited to these embodiments.

As shown in Figures 1 to 10, this data mining system based on a data warehouse comprises a user interface module, a mining topic and task module, a data preprocessing module, a mining module, a method slot module, a pattern evaluation module, a manual selection module, a training and learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each module. The mining module consists of a mining synthesizer and a data mining algorithm library. The method slot module comprises a descriptive data mining sub-module, an anomaly detection data mining sub-module, and a predictive data mining sub-module. Knowledge data is converted and passed between the pattern evaluation module and the knowledge base module through a knowledge representation module, which can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module.

The user interface module uses description and visualization to turn raw data into the quantitative and relational characteristics of specific data. The mining topic and task module determines the topic type. The data preprocessing module normalizes, integrates, transforms, reduces, and converts the data. The mining module builds the data mining model and is responsible for the model's accuracy, understandability, and performance. The method slot module supplies the data mining methods: the descriptive data mining sub-module performs mining with descriptive methods, the anomaly detection data mining sub-module performs mining with anomaly detection methods, and the predictive data mining sub-module performs mining with predictive methods. The pattern evaluation module verifies and evaluates the results, gives feedback on data mining problems, and saves the representation of the resulting knowledge to the knowledge base module. The manual selection module makes personalized selections according to customer needs. The training and learning module learns the ways and methods of these personalized selections, saves them to the knowledge base module, and imports them into the data mining algorithm library of the mining module, so that the next time the mining synthesizer runs, it also mines with the personalized ways and methods. The knowledge representation module can collect and analyze valuable knowledge that cannot be predicted in advance. The data cleaning module cleans the data of the data preprocessing module. The data security module monitors every module to keep each one running safely and to protect the system from external virus attacks.
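The loop in which a personalized selection is learned, saved to the knowledge base, and then imported into the algorithm library might be sketched as follows. This is only an illustration under assumptions: the class names `KnowledgeBase` and `MiningModule`, their methods, and the toy "mean" and "peak" methods are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List

# A mining "method" here is just a function from numeric data to one number.
Method = Callable[[List[float]], float]


class KnowledgeBase:
    """Hypothetical store for methods learned from personalized selections."""

    def __init__(self) -> None:
        self.learned: Dict[str, Method] = {}

    def save(self, name: str, method: Method) -> None:
        self.learned[name] = method


class MiningModule:
    """Hypothetical mining module: an algorithm library plus a synthesizer."""

    def __init__(self, algorithm_library: Dict[str, Method]) -> None:
        self.algorithm_library = dict(algorithm_library)

    def import_learned(self, kb: KnowledgeBase) -> None:
        # Bring learned methods into the algorithm library.
        self.algorithm_library.update(kb.learned)

    def mine(self, data: List[float]) -> Dict[str, float]:
        # The "synthesizer" runs every method currently in the library.
        return {name: m(data) for name, m in self.algorithm_library.items()}


kb = KnowledgeBase()
miner = MiningModule({"mean": lambda xs: sum(xs) / len(xs)})
first_run = miner.mine([1.0, 2.0, 6.0])

kb.save("peak", max)          # a personalized selection that has been learned
miner.import_learned(kb)
second_run = miner.mine([1.0, 2.0, 6.0])   # now also uses the learned method
```

The second run applies the learned method alongside the built-in one, which mirrors the statement that the next mining pass also uses the personalized ways and methods.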

The topic types of the mining topic and task module include but are not limited to retention control, risk prediction, profitability analysis, data trend analysis, employee analysis, regional analysis, classification and clustering, and visualization research.

The data preprocessing module comprises, in processing order, a data source sub-module, a data normalization sub-module, a data integration sub-module, a data transformation sub-module, a data reduction sub-module, and a Microsoft Data Transformation Services sub-module. After data from the data source sub-module enters the data preprocessing module, the data cleaning module cleans it, and the cleaned data passes to the data normalization sub-module. The data cleaning module can also clean the data of the data source, data normalization, data integration, data transformation, data reduction, and Microsoft Data Transformation Services sub-modules. The data source sub-module reads the topic data and task data of the mining topic and task module.
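The ordered hand-off between preprocessing stages can be pictured as a chain of functions. The sketch below is a minimal illustration, not the patented implementation: it shows only a cleaning stage followed by a normalization stage, and the record fields (`store`, `amount`) and function names are assumptions made for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def clean(records: List[Record]) -> List[Record]:
    """Drop records with any missing value (stand-in for the cleaning module)."""
    return [r for r in records if all(v is not None for v in r.values())]


def normalize(records: List[Record]) -> List[Record]:
    """Min-max scale the hypothetical 'amount' field to the range [0, 1]."""
    if not records:
        return []
    amounts = [float(r["amount"]) for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, "amount": (float(r["amount"]) - lo) / span} for r in records]


def run_pipeline(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        records = stage(records)
    return records


raw = [
    {"store": "A", "amount": 10.0},
    {"store": "B", "amount": None},   # removed by the cleaning stage
    {"store": "C", "amount": 30.0},
]
prepared = run_pipeline(raw, [clean, normalize])
```

Further stages (integration, transformation, reduction) would be appended to the list passed to `run_pipeline` in the same order the text gives.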

The key task of the data integration sub-module is acquiring data, including but not limited to accessing the data warehouse. Data access methods include but are not limited to: access through a transaction-based relational database or a PC-based database, access through data conversion tools, access through query tools, and access from flat files.
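Two of the access paths named above, a flat file and a relational database, can be combined as in the following sketch. It uses Python's standard `csv` and `sqlite3` modules with an in-memory database; the table name `sales`, the column names, and the sample rows are assumptions made purely for illustration.

```python
import csv
import io
import sqlite3
from typing import Dict, List


def rows_from_flat_file(text: str) -> List[Dict[str, str]]:
    """Read delimited text (a flat file) into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_database(conn: sqlite3.Connection, query: str) -> List[Dict[str, object]]:
    """Run a query against a relational database and return row dictionaries."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]


# Hypothetical flat-file source.
flat = "sku,qty\nmilk,3\nbread,2\n"

# Hypothetical relational source, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO sales VALUES ('milk', 5)")

# Integration step: bring both sources to one schema with consistent types.
integrated = [
    {"sku": str(r["sku"]), "qty": int(r["qty"])}
    for r in rows_from_flat_file(flat)
    + rows_from_database(conn, "SELECT sku, qty FROM sales")
]
```

The explicit `int(...)` conversion matters because the flat file yields strings while the database yields typed values.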

The data reduction sub-module compresses data through aggregation, removal of redundant features, or clustering. Its methods include but are not limited to data cube aggregation, dimensionality reduction, data compression, numerosity reduction, discretization, and concept hierarchy generation.
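Two of the listed reduction methods, data cube aggregation and discretization, are simple enough to sketch directly. The example below is an illustration under assumptions: the field names, bin edges, and labels are hypothetical, and the aggregation computes a single cuboid rather than a full cube.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def aggregate(records: List[dict], dims: Sequence[str], measure: str) -> Dict[Tuple, float]:
    """Sum `measure` for each combination of `dims` (one cuboid of a data cube)."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def discretize(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the first bin whose upper edge exceeds `value`.

    `labels` must have exactly one more entry than `edges`.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


sales = [
    {"region": "east", "month": "01", "amount": 10.0},
    {"region": "east", "month": "02", "amount": 5.0},
    {"region": "west", "month": "01", "amount": 7.0},
]
by_region = aggregate(sales, ["region"], "amount")        # months rolled up
band = discretize(12.0, [10.0, 100.0], ["low", "mid", "high"])
```

Rolling the month dimension up into regional totals replaces three rows with two, and the discretization replaces a continuous amount with one of three labels.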

The problems the data cleaning module addresses include but are not limited to data quality, redundant data, outdated data, and changes in term definitions. The problems that can arise in a data set include but are not limited to consistency problems, cleaning of invalid data, cleaning of typographical errors, missing values, and data export.
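Several of these cleaning concerns can be shown in one small pass over a record list. The following sketch is an assumption-laden illustration: the alias table, the field names, the cutoff date, and the choice to fill a missing value with a caller-supplied default are all hypothetical, and a real system might prefer to flag or impute missing values differently.

```python
from datetime import date
from typing import Dict, List

# Hypothetical mapping that reconciles term definitions that changed over time.
TERM_ALIASES: Dict[str, str] = {"NYC": "New York", "N.Y.": "New York"}


def clean_records(records: List[dict], cutoff: date, default_spend: float = 0.0) -> List[dict]:
    """Drop outdated and redundant rows, unify terms, and fill missing values."""
    seen = set()
    cleaned = []
    for r in records:
        if r["updated"] < cutoff:                 # outdated data
            continue
        city = TERM_ALIASES.get(r["city"], r["city"])   # consistent terminology
        key = (r["customer"], city)
        if key in seen:                           # redundant (duplicate) data
            continue
        seen.add(key)
        spend = r["spend"] if r["spend"] is not None else default_spend
        cleaned.append({**r, "city": city, "spend": spend})
    return cleaned


records = [
    {"customer": "c1", "city": "NYC", "spend": 20.0, "updated": date(2023, 5, 1)},
    {"customer": "c1", "city": "New York", "spend": 20.0, "updated": date(2023, 5, 2)},
    {"customer": "c2", "city": "Boston", "spend": None, "updated": date(2023, 6, 1)},
    {"customer": "c3", "city": "Boston", "spend": 5.0, "updated": date(2019, 1, 1)},
]
cleaned = clean_records(records, cutoff=date(2022, 1, 1))
```

Here the second row is recognized as a duplicate only after the alias is resolved, which is why term reconciliation runs before deduplication.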

The mining module builds the data mining model and is responsible for, among other things, the model's accuracy, understandability, and performance. Accuracy is assessed by testing over time how accurate the model remains. Understandability covers whether the model lets us see how the inputs affect the results, whether it lets us see why a prediction succeeds or fails, whether it can produce predictions for complex data sets, and whether the results it produces can be checked. Performance concerns how quickly the model must be built and how quickly predictions must be obtained from it.
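The two performance questions, how fast the model is built and how fast predictions come back, can be measured separately. The sketch below assumes a deliberately trivial "model" (the historical mean) so that only the timing pattern is on display; the helper name `timed` and the toy functions are hypothetical.

```python
import time
from statistics import mean
from typing import Callable, List, Tuple


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def build_model(values: List[float]) -> float:
    """Toy 'model': it simply remembers the mean of the training values."""
    return mean(values)


def predict(model: float, n: int) -> List[float]:
    """Toy prediction: the model's mean, repeated for n future points."""
    return [model] * n


model, build_seconds = timed(build_model, [3.0, 5.0, 7.0])
predictions, predict_seconds = timed(predict, model, 4)
```

Keeping the two timings apart matters because a model that is slow to build may still be acceptable if predictions are fast, and vice versa.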

The method slot module comprises a descriptive data mining sub-module, an anomaly detection sub-module, and a predictive data mining sub-module. The analysis methods of the descriptive data mining sub-module include but are not limited to association analysis, cluster analysis, and sequence analysis. The analysis methods of the anomaly detection sub-module include but are not limited to anomaly detection analysis. The analysis methods of the predictive data mining sub-module include but are not limited to evolution analysis, classification analysis, unstructured data analysis, and statistical regression analysis.
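One way to read "method slot" is as a registry into which analysis methods are plugged by category. The sketch below assumes that reading; the `SLOTS` dictionary, the `register` decorator, and the three simple stand-in methods (item frequency counting, a z-score test, and a moving average) are illustrative choices and are not the methods prescribed by the patent.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Hypothetical registry: one slot per sub-module category.
SLOTS: Dict[str, Dict[str, Callable]] = {"descriptive": {}, "anomaly": {}, "predictive": {}}


def register(slot: str, name: str) -> Callable:
    """Decorator that plugs a method into the named slot."""
    def decorator(fn: Callable) -> Callable:
        SLOTS[slot][name] = fn
        return fn
    return decorator


@register("descriptive", "frequent_items")
def frequent_items(baskets: List[List[str]], min_count: int = 2) -> List[str]:
    """Items that appear in at least `min_count` baskets."""
    counts = Counter(item for basket in baskets for item in set(basket))
    return sorted(item for item, c in counts.items() if c >= min_count)


@register("anomaly", "z_score")
def z_score_outliers(values: List[float], threshold: float = 2.0) -> List[float]:
    """Values more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


@register("predictive", "moving_average")
def moving_average_forecast(values: List[float], window: int = 3) -> float:
    """Forecast the next value as the mean of the last `window` values."""
    return mean(values[-window:])


outliers = SLOTS["anomaly"]["z_score"]([10, 10, 10, 10, 10, 10, 10, 10, 10, 100])
forecast = SLOTS["predictive"]["moving_average"]([1.0, 2.0, 3.0, 4.0])
common = SLOTS["descriptive"]["frequent_items"]([["milk", "bread"], ["milk"], ["eggs"]])
```

With a registry like this, adding a new method to a sub-module is a matter of decorating one more function, without touching the code that looks methods up.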

The pattern evaluation module includes but is not limited to a verification and evaluation sub-module and a data mining problem feedback sub-module. Verification takes the following into account: evaluating a model on the same data set used to build it gives better results than evaluating it on a different data set; some of a model's predictions are more accurate than others; and because a model is built from sample data, it should yield good results. Evaluation takes into account that the methods gathered under the heading of data mining algorithms differ considerably from one another; data mining borrows heavily from artificial intelligence, and the wide variety of artificial intelligence techniques is the reason so many different data mining methods exist. The problems handled by the data mining problem feedback sub-module include but are not limited to questions raised by business users, technical problems, data mining application problems, problems considered when implementing a data mining project, and privacy problems related to the impact of data mining on society.
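The first verification point, that a model scores better on its own building data than on different data, is easy to reproduce with a toy classifier. The sketch below uses a one-feature nearest-neighbour rule chosen only because it makes the effect obvious; the classifier, the data points, and the labels are assumptions for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, str]


def nearest_label(train: List[Sample], x: float) -> str:
    """1-nearest-neighbour on a single numeric feature."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train: List[Sample], data: List[Sample]) -> float:
    """Fraction of `data` whose label the model (built on `train`) gets right."""
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)


# Hypothetical building data; the point (5.2, "low") is a noisy sample.
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high"), (5.2, "low")]
# Hypothetical data the model has not seen.
holdout = [(1.5, "low"), (5.0, "high"), (8.5, "high"), (4.9, "high")]

train_accuracy = accuracy(train, train)       # every point matches itself
holdout_accuracy = accuracy(train, holdout)   # the noisy point misleads two cases
```

The gap between the two figures is why an evaluation sub-module would normally report results on data kept apart from model building.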

The valuable knowledge that cannot be predicted in advance includes but is not limited to other candidate results, the selection margin, and the prediction. Other candidate results: besides wanting to know which result the model will predict, the user may also be interested in the other candidate predictions. Selection margin: a point of great interest is how large the gap is between the final prediction and the other candidate results. Prediction: another thing the user may want to know about the prediction process is why the model arrived at this prediction.
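If a model exposes a score for each possible outcome, the other candidates and the selection margin fall out of a simple ranking. The sketch below assumes such per-outcome scores are available; the function name, the outcome labels, and the score values are hypothetical.

```python
from typing import Dict, List


def rank_candidates(scores: Dict[str, float]) -> Dict[str, object]:
    """Return the top prediction, the remaining candidates, and the margin.

    The margin is the score gap between the best and second-best candidate,
    so `scores` must contain at least two outcomes.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    others: List[str] = [name for name, _ in ranked[1:]]
    return {
        "prediction": best,
        "other_candidates": others,
        "margin": best_score - second_score,
    }


# Hypothetical outcome scores for one customer.
report = rank_candidates({"buys": 0.62, "browses": 0.30, "leaves": 0.08})
```

A small margin signals that the runner-up was nearly chosen, which is exactly the kind of information a single predicted label hides.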

The user interface module uses description and visualization to present raw data in forms including but not limited to rules, tables, charts, images, decision trees, and data cubes, in order to show the quantitative and relational characteristics of the data.

Embodiment 1: Data mining based on a data warehouse for supermarket sales management

1. Requirements analysis

The data in a data warehouse is subject-oriented, and the most important subjects in a supermarket data warehouse system are products and customers. Senior supermarket managers care most about sales volume, sales revenue, and profit, and they also care about customers' purchasing behavior and habits. A snowflake model can be used, which comprises a fact table and dimension tables. The fact table stores the measures of the facts and the key values of each dimension; each dimension table stores descriptive information about a dimension, including its hierarchy levels, member categories, and key values.
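The split between a fact table (measures plus dimension keys) and dimension tables (descriptive attributes) can be made concrete with a small in-memory SQLite schema. This is a sketch under assumptions: the table names, columns, and sample rows are invented for illustration, and the layout shown is the simpler star form rather than a fully normalized snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive information and key values.
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);

-- Fact table: measures plus the key of each dimension.
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER,
    revenue     REAL,
    profit      REAL
);

INSERT INTO dim_product  VALUES (1, 'milk', 'dairy'), (2, 'cheese', 'dairy'), (3, 'bread', 'bakery');
INSERT INTO dim_customer VALUES (1, 'member'), (2, 'walk-in');
INSERT INTO fact_sales   VALUES (1, 1, 2, 6.0, 1.0), (2, 1, 1, 8.0, 2.5), (3, 2, 3, 9.0, 3.0);
""")

# Roll revenue up the product dimension from individual product to category.
revenue_by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p USING (product_id)
    GROUP BY p.category
"""))
```

In a snowflake variant, `category` would move out of `dim_product` into its own table referenced by key, at the cost of one more join per query.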

Given the massive volume of supermarket data, the system focuses on product sales, inventory, purchasing information, and customer relationship information. (1) For product sales, the question is how to maximize profit across the purchasing, storage, and sale of goods. This requires tighter management of each product, lower purchasing costs and overheads, and attracting more customers. The most important element is promotion: an appropriate promotional strategy must target the right customer group in order to increase sales profit. (2) Inventory has a large effect on supermarket profit. Just-in-time (JIT) techniques should be used to deliver the right goods to the right customers at the right time and place, keeping inventory as low as possible without running out of stock in order to reduce costs. In addition, hot-selling products are often what drives faster cash flow, so purchasing should analyze which products sell well and favor them as far as possible. (3) The main customer groups should be segmented effectively in order to understand their status, their demand for the company's sales services, and the profit each group brings, so that different marketing strategies can be applied to different groups and their consumption can be guided sensibly.

2. Model construction

The mining topic and task module performs the conceptual model design. Data in the data warehouse is organized by subject, so the subjects must be determined first. A subject is a criterion for classifying data at a relatively high level.

The structure of the supermarket data classification model is shown in Figure 11.

The system identifies two basic subjects: supply and sales. Their attributes are as follows: product (product number, product name, model, category, supplier number, unit price, supply quantity, supply date); customer (customer number, name, gender, group); supplier (supplier number, address, contact information, importance); supermarket (chain store number, address, contact information); sales (sales serial number, product number, customer number, purchase price, sales date, sales volume, sales unit price); supply (supply number, supplier number, date, supply quantity, chain store number).

3. Fact table design

The data preprocessing module builds a fact table to store the core content of a subject. It holds business sales data such as cash register transactions and merchandise transactions. Most supermarkets now use point-of-sale (POS) systems. Each POS receipt is a concrete record of one transaction and contains all the information about one purchase by a consumer. The data is rich, and linking many POS receipts together can reveal a great deal of latent information; the best-known example is the beer-and-diapers case in the United States. This system uses the processed POS transaction receipts as the main fact table, which contains the sales serial number, product number, supplier number, product unit price, product purchase number, product profit, purchase quantity, cumulative sales, cumulative profit, transaction time, and so on. With the fact table at the center, each dimension is linked to the central fact table in a star schema.

4. Dimension table design

Dimension table design mainly consists of placing the values of a dimension's attributes in a separate table. The dimensions designed for this system are as follows:

Product (product number, product name, category, supplier number, unit price, quantity); customer (customer number, name, gender, group); employee (employee number, employee name, employee level); supplier (supplier number, supplier name, address, contact information, importance); supermarket (chain store number, address, contact information, manager number); promotion (promotion number, promotion name, promotion category, discount category, start date, end date); time dimension (year, month, day); product category (product category number, product category name); sales list (sales serial number, product number, supplier number, customer number, order time, product unit price, product quantity, discount, sales volume, inventory quantity).

The system uses a "star model" to represent the multidimensional data set. The structure of the supermarket data star model is shown in Figure 12.
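The star layout described above can be sketched with an in-memory SQLite database: one central fact table whose foreign keys point at the surrounding dimension tables. This is only an illustration under stated assumptions. The table names (`dim_product`, `dim_customer`, `dim_time`, `fact_sales`), the reduced column set, and the sample rows are hypothetical, and the embodiment itself uses Microsoft SQL Server 2005 rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, customer_group TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
-- Central fact table: measures plus one key per dimension.
CREATE TABLE fact_sales (
    sales_id    INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    quantity    INTEGER,
    unit_price  REAL,
    profit      REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "beer", "drinks"), (2, "diapers", "baby"), (3, "cola", "drinks")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Customer A", "families"), (2, "Customer B", "students")])
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, 2023, 9, 6), (2, 2023, 9, 7)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 6, 5.0, 6.0),
                  (2, 2, 1, 1, 1, 20.0, 4.0),
                  (3, 3, 2, 2, 2, 3.0, 1.0)])

# Join the fact table to one dimension and aggregate a measure.
rows = conn.execute("""
    SELECT p.category, SUM(f.quantity * f.unit_price) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

Every analytical query follows the same shape: start from the fact table, join only the dimensions needed, and aggregate the measures.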

5. Data mining by the mining module

Online analytical processing (OLAP): an OLAP analysis model is built, data is retrieved on that model, OLAP analysis operations are performed, and the OLAP analysis results are displayed. OLAP is powerful: it can present multidimensional data to the user intuitively along any dimension path. Using Microsoft SQL Server 2005 and operations such as roll-up, drill-down, pivoting (rotation), and slicing, the system implements the following functions:
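To make the OLAP operations named above concrete, here is a small pure-Python sketch over a handful of fact rows. It is not how SQL Server implements them; the row layout, the dimension names in `DIMS`, and the helper names `aggregate` and `slice_cube` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, category, store, sales_amount)
FACTS = [
    (2023, 1, "drinks", "store_A", 100.0),
    (2023, 1, "snacks", "store_A", 40.0),
    (2023, 2, "drinks", "store_B", 60.0),
    (2023, 2, "snacks", "store_B", 50.0),
]
DIMS = ("year", "month", "category", "store")


def aggregate(facts, group_by):
    """Sum sales over the named dimensions.

    Grouping by fewer dimensions is a roll-up; adding a finer one is a drill-down.
    """
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Keep only the facts where one dimension equals a fixed value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


by_year = aggregate(FACTS, ["year"])                      # roll-up
by_month = aggregate(FACTS, ["year", "month"])            # drill-down
drinks_by_store = aggregate(
    slice_cube(FACTS, "category", "drinks"), ["store"]    # slice
)
pivoted = aggregate(FACTS, ["category", "month"])         # pivot: change the axes
print(by_year, by_month, drinks_by_store, pivoted, sep="\n")
```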

Sales analysis: managers can query and analyze product sales and examine the data from multiple dimensions. Sales analysis gives managers an intuitive view of how the supermarket is performing, so that they can make decisions efficiently.

Product analysis: the life-cycle stage of a product is determined by tracking the product. For products in the growth period, extensive promotion can be used to open up sales channels; for products in the development period, appropriate marketing can maintain their market share and a steady growth rate; for products in the stable period, every effort should be made to attract consumer interest and open up the market so as to prolong the stable period; for products in decline, substitutes should be sought as circumstances require.

Supplier analysis: the sales volume, cost, and profit of the same product from different suppliers are compared in order to select the best supplier.

The descriptive data mining sub-module uses association analysis to uncover relationships hidden in the data. Support, confidence, and rule constraints make it possible to discover relationships between two or more data items. Used as thresholds for mining association rules, they filter out meaningless rules. In a supermarket setting, association analysis can reveal hidden customer purchasing behavior.
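The role of support and confidence as thresholds can be sketched as follows. Support of an itemset is the fraction of baskets that contain it, and the confidence of a rule A -> B is support(A and B) divided by support(A). The sketch only considers rules between single items and scans every pair directly; it is not a full Apriori implementation, and the function name and sample baskets are hypothetical.

```python
from itertools import combinations


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for single-item rules."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # filtered out by the support threshold
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical POS baskets.
baskets = [
    ["beer", "diapers", "milk"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "bread"],
]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.8)
print(rules)
```

With these thresholds only the rule diapers -> beer survives: the pair appears in half of the baskets, and every basket with diapers also contains beer, whereas beer -> diapers has a confidence of only two thirds.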

The predictive data mining sub-module can divide all products in the supermarket into four classes: high-volume, high-profit products; high-volume, low-profit products; low-volume, high-profit products; and low-volume, low-profit products.
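A minimal sketch of the four-way labeling is shown below. The patent does not say how "high" and "low" are separated, so the two thresholds are passed in by the caller as an assumption; the function name and sample figures are likewise hypothetical.

```python
def classify_product(sales_volume, profit, volume_threshold, profit_threshold):
    """Assign one of the four volume/profit classes using caller-supplied thresholds."""
    volume_label = "high-volume" if sales_volume >= volume_threshold else "low-volume"
    profit_label = "high-profit" if profit >= profit_threshold else "low-profit"
    return f"{volume_label}, {profit_label}"


# Hypothetical products: name -> (units sold, total profit)
products = {"beer": (900, 1200.0), "salt": (800, 50.0), "wine": (40, 900.0), "candles": (10, 5.0)}
labels = {
    name: classify_product(volume, profit, volume_threshold=500, profit_threshold=500.0)
    for name, (volume, profit) in products.items()
}
print(labels)
```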

The characteristics of each class of products in the sample are analyzed, and classification rules are established so that new products can be classified by their characteristics. This module uses the information-based ID3 decision tree classification method. Prediction finds patterns of change in historical data, builds a model, and uses the model to predict how future data will change. In supermarket management, mining sales patterns makes it possible to predict sales over a coming period and to plan marketing measures accordingly. For example, sequence analysis of historical sales data shows that customers who buy a camera later buy follow-on products such as film and batteries. Once cameras begin to sell in large numbers, the purchase of those follow-on products should therefore be considered, and marketing measures such as advertising, exhibitions, and product trials can be used to stimulate customers' interest, increasing sales and improving returns.
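ID3 chooses the attribute to split on by information gain, that is, the reduction in the entropy of the class label after splitting on that attribute. The sketch below shows only that selection step, not a full tree builder, and the attribute names and sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 split criterion: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training sample of products.
sample = [
    {"price": "low", "promoted": "yes", "label": "high_sales"},
    {"price": "high", "promoted": "yes", "label": "high_sales"},
    {"price": "low", "promoted": "no", "label": "low_sales"},
    {"price": "high", "promoted": "no", "label": "low_sales"},
]
print(best_attribute(sample, ["price", "promoted"], "label"))
```

In this sample the `promoted` attribute separates the two classes perfectly, so it carries the full one bit of information, while `price` carries none.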

The anomaly detection data mining sub-module performs anomaly detection, which is used to find unusual patterns, that is, objects in the data set that differ markedly from the rest of the data. A supermarket can use this technique to identify outstanding and underperforming sales staff in order to improve service quality. In the same way, it can identify products that sell especially well or especially poorly and take appropriate measures.
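One simple way to flag such objects is a z-score test: a value is treated as unusual when it lies more than a chosen number of standard deviations from the mean. The patent does not name a specific detection method, so the z-score rule, the threshold of 2.0, and the sample figures below are all assumptions for illustration.

```python
import statistics


def find_outliers(values_by_key, threshold=2.0):
    """Return {key: z_score} for entries whose |z-score| exceeds the threshold."""
    values = list(values_by_key.values())
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return {}  # all values identical, nothing stands out
    return {
        key: (value - mean) / spread
        for key, value in values_by_key.items()
        if abs(value - mean) / spread > threshold
    }


# Hypothetical monthly sales per salesperson.
monthly_sales = {"A": 100, "B": 105, "C": 98, "D": 102, "E": 30, "F": 101}
outliers = find_outliers(monthly_sales)
print(outliers)
```

A negative z-score marks an unusually weak result and a positive one an unusually strong result, which matches the two directions of interest mentioned above.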

The pattern evaluation module is used for verification and evaluation and for feedback on data mining problems, and it saves the representation of the resulting knowledge to the knowledge base module.

The knowledge representation module can collect and analyze valuable knowledge that cannot be predicted in advance. For other candidate results, in addition to wanting to know what the model will predict, the user may also be interested in the other candidate predictions. For the selection margin, a point of great interest about a prediction is how large the gap is between the final prediction and the other candidate results. For predictions, another thing the user may want to know about the prediction process is why the model arrived at that prediction.

The manual selection module is used to make personalized selections according to other needs of supermarket sales. The training and learning module learns the way and method of each personalized selection, which is then saved to the knowledge base module and imported into the data mining algorithm library of the mining module. The next time the mining synthesizer of the mining module runs, data mining is also performed with that personalized selection method.

The data cleaning module cleans the data handled by the data preprocessing module, addressing data quality, redundant data, outdated data, and similar issues. The data security module monitors every module to ensure that each runs safely and to prevent external viruses (for example, viruses created by competitors) from attacking the system.
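The three cleaning concerns named above (data quality, redundant data, outdated data) can be sketched as a single pass over the records. The field names, the cutoff date, and the rule that a record with any missing required field is dropped are assumptions for illustration; the patent does not prescribe specific cleaning rules.

```python
from datetime import date


def clean_records(records, required_fields, date_field, cutoff_date):
    """Drop records with missing values, outdated records, and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Data quality: skip records with a missing required value.
        if any(rec.get(f) in (None, "") for f in (*required_fields, date_field)):
            continue
        # Outdated data: skip records older than the cutoff.
        if rec[date_field] < cutoff_date:
            continue
        # Redundant data: skip exact duplicates of a record already kept.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical POS records.
raw = [
    {"sales_id": 1, "product_id": 10, "sale_date": date(2023, 9, 6)},
    {"sales_id": 1, "product_id": 10, "sale_date": date(2023, 9, 6)},  # duplicate
    {"sales_id": 2, "product_id": None, "sale_date": date(2023, 9, 6)},  # missing value
    {"sales_id": 3, "product_id": 11, "sale_date": date(2015, 1, 1)},  # outdated
    {"sales_id": 4, "product_id": 12, "sale_date": date(2023, 9, 7)},
]
cleaned = clean_records(raw, ["sales_id", "product_id"], "sale_date", date(2020, 1, 1))
print([rec["sales_id"] for rec in cleaned])
```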

The models and data above are described and visualized by the user interface module, which converts the raw data into a display of the quantity and relational features of the specific data.

In summary, the mining topic and task module works together with the data preprocessing module to confirm the data topics and to normalize, integrate, transform, reduce, and convert the data.

The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A data mining system based on a data warehouse, comprising a user interface module, a mining topic and task module, a data preprocessing module, a mining module, a method slot module, a pattern evaluation module, a manual selection module, a training and learning module, a knowledge base module, a data cleaning module, and a data security module for monitoring each of the modules.

2. The data warehouse-based data mining system of claim 1, wherein the topic types of the mining topic and task module include, but are not limited to, retention control, risk prediction, yield analysis, data trend analysis, employee analysis, regional analysis, classification and clustering, and visualization studies.

3. The data warehouse-based data mining system of claim 1, wherein the data preprocessing module includes a data source sub-module, a data normalization sub-module, a data integration sub-module, a data conversion sub-module, a data reduction sub-module, and a Microsoft data conversion service sub-module; the data of the data source sub-module enters the data preprocessing module; the data cleaning module cleans the data after it enters the data normalization sub-module, and the data cleaning module can also clean the data of the data source sub-module, the data normalization sub-module, the data integration sub-module, the data conversion sub-module, the data reduction sub-module, and the Microsoft data conversion service sub-module at the same time; and the data source sub-module reads the topic data and the task data of the mining topic and task module.

4. The data warehouse-based data mining system of claim 3, wherein the key task of the data integration sub-module is to obtain data, including but not limited to accessing the data warehouse, the methods of accessing data including but not limited to: accessing data through a transaction-based relational database or a PC-based database, accessing data through a data conversion tool, accessing data with a query tool, and accessing data from a flat file; and the data reduction sub-module compresses data by aggregation, deletion of redundant features, clustering, and the like, including but not limited to data cube aggregation, dimension reduction, data compression, numerosity reduction, discretization, and concept hierarchy generation.

5. The data warehouse-based data mining system of claim 1 or 4, wherein the problems to be solved by the data cleaning module include, but are not limited to, data quality, redundant data, obsolete data, and changes in term definitions; and problems that may arise in a data set include, but are not limited to, consistency problems, cleanup of invalid data, cleanup of typographical errors, missing values, and data export.

6. The data warehouse-based data mining system of claim 1, wherein the mining module can build a data mining model, which includes, but is not limited to, ensuring the accuracy, the understandability, and the performance of the model; the accuracy of the model can be tested over time to determine how accurate it is; the understandability of the model concerns whether the model lets us know what effect the inputs have on the result, whether it lets us know why a prediction succeeds or fails, whether it lets us generate predictions for complex data sets, and whether the results it generates can be checked; and the performance of the model concerns how quickly the model can be constructed and how quickly results can be obtained from it.

7. The data warehouse-based data mining system of claim 1, wherein the method slot module includes a descriptive data mining sub-module, an anomaly detection sub-module, and a predictive data mining sub-module, wherein the analysis method of the descriptive data mining sub-module includes, but is not limited to, association analysis, cluster analysis, and sequence analysis, the analysis method of the anomaly detection sub-module includes, but is not limited to, anomaly detection analysis, and the analysis method of the predictive data mining sub-module includes, but is not limited to, evolution analysis, classification analysis, unstructured data analysis, and statistical regression analysis.

8. The data warehouse-based data mining system of claim 1, wherein the pattern evaluation module includes, but is not limited to, a verification and evaluation sub-module and a data mining problem feedback sub-module; the verification method of the verification and evaluation sub-module includes evaluating a pattern with the same data set that was used to build it, which yields better results than evaluating the pattern with a different data set; some predictions of the pattern are more accurate than others, and good results are to be expected because the pattern is built on the sample data; the evaluation methods of the verification and evaluation sub-module differ considerably, because different data mining methods are all gathered under the heading of data mining algorithms; data mining draws on the field of artificial intelligence, and the variety of artificial intelligence techniques is one reason why many different data mining methods exist; and the problems handled by the data mining problem feedback sub-module include, but are not limited to, problems raised by business users, technical problems, data mining application problems, considerations in implementing a data mining project, and privacy-related problems concerning the impact of data mining on society.

9. The data warehouse-based data mining system of claim 1, wherein the valuable knowledge that cannot be predicted in advance includes, but is not limited to, other candidate results, the selection margin, and predictions; for other candidate results, in addition to wanting to know what the model will predict, a user may also be interested in the other candidate predictions; for the selection margin, a point of great interest about a prediction is how large the gap is between the final prediction and the other candidate results; and for predictions, another thing a user may want to know about the prediction process is why the model arrived at that prediction.

10. The data warehouse-based data mining system of claim 1, wherein the user interface module converts raw data, through description and visualization, into forms including but not limited to rules, tables, charts, images, decision trees, and data cubes, in order to show the quantity and relational features of the data.