CN103198138A - Large-scale hot continuous rolling data scheme customizing system based on cloud computing - Google Patents

Large-scale hot continuous rolling data scheme customizing system based on cloud computing

Info

Publication number
CN103198138A
Authority
CN
China
Prior art keywords
data
theme
module
item
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101304423A
Other languages
Chinese (zh)
Inventor
邹丽晖
张德政
华镇
阿孜古丽
孙义
谢永红
刘宏岚
杜鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN2013101304423A
Publication of CN103198138A
Legal status: Pending

Abstract

Translated from Chinese

The invention provides a large-scale hot continuous rolling data theme customization system based on cloud computing, comprising an ETL module, a data persistence layer module and a theme customization module. The ETL module parses the data of the original hot rolling system, builds the data tables of the cloud data warehouse and extracts data. The data persistence layer module uses the cloud data warehouse to organize and store the structured data extracted by the ETL module. To address the analysis of data themes in the cloud data warehouse, the theme customization module offers users reasonable theme customization schemes through a theme library and an experience library, and provides general data analysis functions through a public data mining method library and a MapReduce template container, improving the efficiency of data managers. The theme customization system is flexibly extensible, can be integrated into any original hot continuous rolling data system, handles the theme classification of complex data when users' actual requirements are unclear, and provides reliable data sets for data mining and analysis of hot-rolled steel.

Figure 201310130442

Description

A large-scale hot continuous rolling data theme customization system based on cloud computing
Technical field
The present invention relates to large-scale data processing in the iron and steel metallurgical industry, and in particular to data mining preprocessing for hot continuous rolling.
Background technology
In daily operation, a hot strip rolling production line generates massive amounts of real-time data that hold considerable research value. For a long time, however, too little attention was paid to these data, and poor management left them scattered and unused, which from a data mining perspective is a great waste. To some extent this has also held back the development of hot continuous rolling technology.
With the development of computer technology, virtually all hot strip mills now manage their information electronically. But this amounts only to storing, tallying and displaying existing data, such as the direct display of temperature, thickness, strip shape and certain parameters. As process requirements grow ever more demanding, such direct display can hardly improve strip quality. Exploring the hot rolling data further and mining their inherent relationships and patterns is therefore increasingly important.
The conventional data mining preprocessing pattern fixes the themes first: the themes dictate which data tables are needed to build a cloud data warehouse for each theme, and the warehouse extracts the required data from the relevant tables selected from the corresponding database. The original hot rolling system, however, has a complex production process and a great variety of data types, lacks the sound design structure of present-day technology, and is old. The build-table-then-extract pattern of traditional databases, in which the structure of the data persistence layer is defined before information is extracted, cannot accommodate a design under unknown requirements, and the storage, expansion and analysis capabilities of a database are also very limited when faced with massive data sets. Moreover, because the real-time data types of the hot rolling system are complex and even domain experts cannot know the system and the field exhaustively, it is hard to formulate definite requirements for reworking the system. This makes the traditional "collaborative application development model", in which information technology staff and business departments work together to sort out the content and then identify the theme areas of the different data, very difficult to apply.
Summary of the invention
The technical problem to be solved by the invention is to build, for an original hot rolling system, a cloud data warehouse that can be used for analysis and mining, and to provide an extensible theme customization function with which complex data sets can be flexibly organized into themes under unknown requirements, so that the data can be further mined and analyzed.
A first object of the invention is to propose a hot continuous rolling data theme customization system based on cloud computing, characterized in that the system comprises an ETL (information extraction) module, a data persistence layer module and a theme customization module;
The ETL (information extraction) module parses the data structure of the hot continuous rolling system, generates a data dictionary file and a table header file, sends both files to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system;
The data persistence layer module builds the data dictionary and data tables of the cloud data warehouse from the data dictionary file and table header file received from the ETL module, and periodically merges the formatted collected text data into the cloud data warehouse;
The theme customization module performs theme customization on the basis of the cloud data warehouse.
Preferably, the ETL module comprises:
A data structure parsing unit, which parses the data structure of the hot continuous rolling system and generates the data dictionary file and the table header file;
A structured template library generation unit, which formats the table header file generated by the data structure parsing unit into template files of a structured template library;
A text data formatting unit, which periodically loads the template files of the structured template library into a data parsing template library, formats the text data collected from the hot continuous rolling system, and sends it to the data persistence layer module.
Preferably, the theme customization module comprises:
A theme library query unit, which queries the theme library by keyword to determine whether the theme item required by the user exists in the theme library;
An experience library recommendation unit, which, when the required theme item does not exist in the theme library, offers the attributes in the data dictionary of the data tables for selection, takes the attributes selected by the user as the attributes of the required theme item, and obtains recommended theme items from the experience library on the basis of those attributes;
A theme library registration unit, which registers the required theme item in the theme library when it is among the recommended theme items, and otherwise accepts a new user-defined theme item and registers it in the theme library;
A communication unit, which sends data requests for theme items to the cloud data warehouse when data are operated on.
Another object of the invention is to propose a hot continuous rolling data theme customization method based on cloud computing, characterized in that the method comprises the following steps:
Step 1: the ETL module parses the data structure of the hot continuous rolling system, generates a data dictionary file and a table header file, sends both files to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system;
Step 2: the data persistence layer module creates the data dictionary and data tables of the cloud data warehouse from the received data dictionary file and table header file, and periodically merges the collected text data formatted by the ETL module;
Step 3: the theme customization module performs theme customization on the basis of the cloud data warehouse.
Preferably, step 1 comprises the following steps:
Step 1.1: the ETL module parses the data structure of the hot continuous rolling system and generates the data dictionary file and the table header file;
Step 1.2: the ETL module formats the table header file into template files of the structured template library;
Step 1.3: the ETL module periodically loads the template files of the structured template library into the data parsing template library, formats the text data collected from the hot continuous rolling system, and sends it to the data persistence layer module.
Preferably, step 3 comprises the following steps:
Step 3.1: the theme customization module queries the theme library by keyword to determine whether the theme item required by the user exists in the theme library;
Step 3.2: when the required theme item does not exist in the theme library, the theme customization module offers the attributes in the data dictionary of the data tables for selection, receives the user's selection, and obtains recommended theme items from the experience library on the basis of the selected attributes;
Step 3.3: when the required theme item is among the theme items recommended by the experience library, it is registered in the theme library; otherwise the experience library accepts a new user-defined theme item, which is registered in the theme library;
Step 3.4: when data are operated on, the theme library sends a data request for the theme item to the cloud data warehouse.
Preferably, in step 3.3 the user defines a new theme item by obtaining the best-matching theme item from those recommended by the experience library and modifying its attributes to form the new theme item.
The advantage of the invention is that it departs from the usual data mining preprocessing procedure: it uses the data structure of the original system to generate the cloud data warehouse dynamically during data extraction, and then exploits the large-scale parallelism of the cloud data warehouse to generate data theme areas dynamically, thereby carrying out data preprocessing in the reverse direction. This makes the system highly extensible and flexible. The system also lets users define and extend theme items freely, which makes it much easier to define theme items according to actual conditions under unknown requirements, and data mining and analysis of other professional themes can be built on top of it. The approach is a process of turning an incomplete set into a complete one; the freely extensible cloud storage, with its nearly unlimited capacity, is a great advantage for such a growing theme set and helps users make better use of the system to discover more of the patterns hidden in the data.
The invention can parse a complex hot rolling system automatically and build the cloud data warehouse from the entire data set as its data resource, which greatly reduces the friction between domain experts and developers when working out requirements. For hot rolling systems with unknown requirements in particular, it provides free theme customization, which makes the system more flexible and versatile. Convenient theme customization also gives the hot rolling field more room to control its data and makes it easier to find patterns in the data that can guide production.
Description of drawings
Fig. 1 is a structural diagram of the hot continuous rolling data theme customization system of the invention.
Fig. 2 is the data processing flow chart of the ETL module in the system.
Fig. 3 is a partial data structure tree of the system.
Fig. 4 shows the dictionary file and table header file used in building the data persistence layer module of the system.
Fig. 5 is the cloud data warehouse model of the data persistence layer module of the system.
Fig. 6 is a flow chart of the interaction between the extensible theme customization module and the other modules of the system.
Fig. 7 is an example of theme item customization in the system.
Fig. 8 is an example of theme item customization with an incomplete match.
Embodiment
The invention provides a system built on cloud computing that is intended to handle theme customization of complex hot continuous rolling data sets under unknown requirements.
As shown in Fig. 1, the hot continuous rolling data set theme customization system comprises an ETL (information extraction) module, a data persistence layer module and an extensible theme customization module, which together carry out data preprocessing from data collection through data parsing and data loading to theme customization. The system parses the original hot continuous rolling system through the ETL module and dynamically builds data tables in the data persistence layer module according to the parsed data structure. Real-time data collected on the hot rolling production line, as well as historical data, are extracted into a temporary folder of the ETL module. Once a day, at a fixed time, the data collected by the ETL module are merged into the tables of the cloud data warehouse in the data persistence layer module. The extensible theme customization module uses the experience library to customize themes for the data set that has been built, and its MapReduce template container supports the public data mining method library, so that users can draw on their experience to customize analysis and mining themes for the existing data set.
The ETL module has three functions: automatically parsing the system data structure, building the table structure of the cloud data warehouse, and periodically structuring the collected text data.
The ETL module works from the structure of the system itself: it parses the header files of the original hot rolling system and generates a structure tree of the system data, in which each node holds a field name and a field comment. For example, when parsing the header files (files with the suffix .h) of a hot rolling system developed in C, the content of every structure declared with the keyword struct is extracted from the .h file; the structure name becomes the top-level node and the structure's members become the nodes of the next level. Iterating in this way builds the structure tree of the system data.
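The header parsing described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the header text, the struct names `pri_data` and `slab_info`, and the helper names `parse_structs` and `build_tree` are hypothetical, and a regular expression stands in for whatever parser the original system uses. It handles only simple member declarations with an optional trailing comment.

```python
import re

# Hypothetical header content; the real hot rolling header files are not shown in the source.
HEADER = """
struct pri_data {
    char MatId[12];         /* steel coil number */
    char SteelGrade[8];     /* steel grade */
    struct slab_info slab;  /* slab data */
};
struct slab_info {
    char SlabNo[16];        /* slab number */
    int  Thickness;         /* slab thickness */
};
"""

# One struct definition: name and body between the braces.
STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}\s*;", re.S)
# One member: type, name, optional array size, optional trailing comment on the same line.
FIELD_RE = re.compile(
    r"^[ \t]*(?P<type>(?:struct\s+)?\w+)\s+(?P<name>\w+)(?:\[\d+\])?[ \t]*;"
    r"[ \t]*(?:/\*\s*(?P<note>.*?)\s*\*/)?",
    re.M,
)


def parse_structs(text):
    """Map each struct name to its list of (type, field name, comment) members."""
    structs = {}
    for name, body in STRUCT_RE.findall(text):
        structs[name] = [
            (m.group("type"), m.group("name"), m.group("note") or "")
            for m in FIELD_RE.finditer(body)
        ]
    return structs


def build_tree(structs, root):
    """Build the structure tree: struct name on top, members as child nodes."""
    node = {"name": root, "children": []}
    for ftype, fname, note in structs[root]:
        if ftype.startswith("struct"):
            # A nested struct member becomes a subtree of its own.
            child = build_tree(structs, ftype.split()[1])
            child["name"] = fname
        else:
            child = {"name": fname, "children": []}
        child["note"] = note
        node["children"].append(child)
    return node


tree = build_tree(parse_structs(HEADER), "pri_data")
```

Each node keeps the field name and the field comment, matching the two pieces of information the text says a structure tree node holds.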
Once the structure tree has been obtained, the ETL module traverses the whole data structure tree recursively, splits out the node items in the tree, and generates the data dictionary file used to build the cloud data warehouse and the table header file used to store data. The ETL module sends the data dictionary file and the table header file to the data persistence layer module, which uses these two files to generate the data dictionary and data tables of the cloud data warehouse.
The ETL module formats the table header file into template files of the structured template library, which makes it easy to store collected data in the cloud data warehouse of the data persistence layer. The ETL module periodically loads the template files of the structured template library into the data parsing template library and, using these template files, formats the text data from the production line and from historical data so that the formatted text data can be extracted into the cloud data warehouse of the data persistence layer module.
The data persistence layer module consists mainly of a cloud data warehouse based on cloud storage. The cloud data warehouse stores the data of the original rolling system in structured form; its table structure and data dictionary are generated by the ETL module while it parses the system, and its main task is to periodically merge the data parsed by the ETL module into the warehouse and store them. The cloud data warehouse of this system is built on HDFS (Hadoop Distributed File System), the distributed file system of the Hadoop cloud computing model, and uses the multi-node distributed nature of HDFS to store data resources, which solves both the parallelization of data processing and the dynamic expansion of storage capacity. In the cloud data warehouse, the formatted collected text data from the ETL module are loaded directly into the warehouse tables by warehouse statements (similar to ordinary SQL statements). The data dictionary of a data table and the data items are mapped one to one by position, and when data are operated on, the positions in the data dictionary are mapped to the positions of the data. This is known as "schema on read". It gives users of the cloud data warehouse a mode of operation similar to that of a traditional database, which makes development much more convenient for them. The data dictionary in the cloud data warehouse is the main basis for customizing themes. It is the principal link between theme item attributes and the data tables: theme item attributes reach the data in the tables through the data dictionary, so that the relevant statements can be executed on the data.
The extensible theme customization module comprises a theme library, an experience library, a MapReduce template container and a public data mining method library. Working on the cloud data warehouse that has been built, and guided by the user's own experience and knowledge, the explanations in the data dictionary and the experience library, the module lets the user designate the data items needed to build a theme. Because the cloud data warehouse stores the data of the entire system, building a theme is in effect a dynamic partitioning of the system's table structure.
The theme library is the archived set of customized themes (for example a quality theme or a parameter theme). Each theme comprises several theme items, which carry related data such as table name, column names and theme item name. A theme is in essence a table region (regions may overlap) delimited for parallel analysis in the cloud, so theme customization is also a process of dividing regions, and a region represents the amount of data that the theme governs. Theme areas are also responsible for managing the parallelization of the data set, and the MapReduce template container provides the parallelization support for the public data mining method library. It can mine and analyze several themes in a single parallel run.
The experience library is likewise an archived set of themes, stored in a database in the form of data tables. It inherits the relatively complete archived theme libraries already customized by existing data mining systems in related fields and uses them to guide the unknown requirements of this system; a maximum matching degree algorithm is applied to the experience library so as to help the user as much as possible. The experience library must be configured with a synonym table for the relevant field in order to reconcile semantic differences between systems and keep the discrepancy between them to a minimum.
The extensible theme customization flow is as follows. The user queries the theme library to find out whether the desired theme item already exists there. If it does not, the user asks to view the data dictionary of the data tables in the cloud data warehouse; when the user selects a data item of a data table, the system automatically recommends the related theme items in the experience library that contain this attribute, so that the user does not have to choose blindly, and the user then decides, according to actual need, whether to use a recommended theme item. When the customized requirement does not exist in the experience library, the user-defined theme item can be registered in the experience library as a form of self-learning. The experience library periodically tallies the attribute lists of the theme library and treats the attributes that occur most often as sensitive attributes, which are recommended to the user as key attributes when the user starts to customize a theme. The new theme item is registered in the theme library, and the region and scope it requires are marked out in the dictionary. Once the theme area has been delimited, the MapReduce template container provides MapReduce parallel algorithm support for the public data mining and analysis methods. The container is a parallelization integrator: when the same mining algorithm is applied to several theme items of the same data set, it can process them in a single pass, which greatly improves the speed and efficiency of data analysis and mining.
The public data mining method library contains common data mining methods implemented with MapReduce, such as association rules, neural networks, genetic algorithms and decision trees. They are loaded dynamically into the MapReduce template container when used, which greatly reduces the workload of data mining staff. In addition, data managers can use the API provided by the MapReduce template container to write their own theme analysis programs, which fit into the system more easily and benefit from the efficiency of parallel mining.
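The customization flow described above (query the theme library, obtain recommendations from the experience library, register the result) can be sketched as follows. The source names a maximum matching degree algorithm without specifying it, so the Jaccard overlap used here is an assumption, as are the function names `match_degree`, `recommend` and `customize` and the use of plain dictionaries for the two libraries.

```python
def match_degree(selected, candidate):
    """Share of attributes common to both sets (Jaccard index, an assumed measure)."""
    selected, candidate = set(selected), set(candidate)
    union = selected | candidate
    return len(selected & candidate) / len(union) if union else 0.0


def recommend(selected, experience_lib):
    """Names of experience-library theme items sharing attributes, best match first."""
    scored = [(match_degree(selected, attrs), name)
              for name, attrs in experience_lib.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


def customize(keyword, selected, theme_lib, experience_lib):
    """Return (attributes, best-matching experience item or None) for a theme item."""
    if keyword in theme_lib:                  # the theme item already exists
        return theme_lib[keyword], None
    ranked = recommend(selected, experience_lib)
    base = ranked[0] if ranked else None      # maximum-match item, if any
    if base is not None and set(experience_lib[base]) == set(selected):
        attrs = list(experience_lib[base])    # recommendation used unchanged
    else:
        attrs = list(selected)                # user-defined theme item
        experience_lib[keyword] = attrs       # self-learning: remember it
    theme_lib[keyword] = attrs                # register in the theme library
    return attrs, base
```

In this sketch a user-defined item is modeled simply as the attribute set the user ends up selecting; the best-matching item is returned alongside it so that a caller can show which recommendation the new item was derived from.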
Based on the above theme customization system, the complex data sets of the hot continuous rolling industry are collected, classified and stored, and effective theme customization is then achieved. The method proceeds as follows:
Step 1: the ETL module parses the data structure of the hot rolling system, generates a data dictionary file and a table header file, sends both files to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system.
The flow of the ETL module is shown in Fig. 2:
Step 1.1: the ETL module parses the data structure of the hot rolling system and generates the data dictionary file and the table header file.
This is the initialization step. The ETL module analyzes the header files of the original hot rolling system and generates the structure tree of its data (each node of the tree holds a field name and a field comment). It then traverses the whole data structure tree recursively, splits out the node items, and generates the data dictionary file used to build the cloud data warehouse and the table header file used to store data. Finally, the ETL module sends the data dictionary file and the table header file to the data persistence layer module, which uses these two files to generate the data dictionary and data tables of the cloud data warehouse.
The structure tree is a multi-level tree made up of several layers of nodes. Because the full tree is complex, the subtree of one branch, "rolling line data", is taken as an example and shown in Fig. 3. The top-level node is the name of the rolling line data structure, which is also the name of the rolling line data table built in the cloud data warehouse. The second-level nodes are the attributes that the rolling line data comprise, and they serve as the fields of the rolling line data table. The third-level nodes relate to the second-level nodes as the second-level nodes relate to the first: they describe the second-level nodes and are likewise the fields of the tables built for the second-level nodes.
After the structure tree has been generated, the ETL module traverses the whole data structure tree breadth-first, splits out the node items in the tree, and generates the data dictionary file and the table header file used to build the table headers of the cloud data warehouse. Taking the "rolling line data" subtree of Fig. 3 as an example, the node "rolling line data" becomes the enclosing tag "<rolling line data>". Its child nodes, "steel coil attributes", "roughing preset result data" and so on, are then visited and written as sub-tags of "<rolling line data>" in the XML: "<1>steel coil attributes</1>", "<2>roughing preset result data</2>", and so on. The subsequent nodes are handled in the same way until the data dictionary file and the table header file are complete.
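The breadth-first split of the structure tree into dictionary entries and header entries can be sketched as below. The node layout (a `label` for the description and a `field` for the identifier), the sample subtree and the function name `split_tree` are assumptions made for illustration; only the numbered-tag output format follows the samples in the text.

```python
from collections import deque

# Hypothetical subtree: "label" feeds the data dictionary, "field" feeds the table header.
TREE = {
    "label": "rolling line parameter", "field": None,
    "children": [
        {"label": "steel coil number", "field": "p.mill.pri.MatId", "children": []},
        {"label": "steel grade", "field": "p.mill.pri.SteelGrade", "children": []},
        {"label": "slab number", "field": "p.mill.pri.SlabNo", "children": []},
    ],
}


def split_tree(root):
    """Walk the tree breadth-first and emit dictionary lines and header lines.

    Every node with children becomes an enclosing tag; its children become
    sub-tags numbered by position, so line i of one file matches line i of the other.
    """
    dictionary, header = [], []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if not node["children"]:
            continue  # leaves were already written as sub-tags of their parent
        dictionary.append(f"<{node['label']}>")
        header.append(f"<{node['label']}>")
        for pos, child in enumerate(node["children"], start=1):
            dictionary.append(f"<{pos}>{child['label']}</{pos}>")
            header.append(f"<{pos}>{child['field']}</{pos}>")
            queue.append(child)
        dictionary.append(f"</{node['label']}>")
        header.append(f"</{node['label']}>")
    return dictionary, header


dictionary_lines, header_lines = split_tree(TREE)
```

Because both lists are produced in the same pass, the positional correspondence between a dictionary entry and its header entry holds by construction.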
Samples of the data dictionary file and the table header file are given below as XML files.
Data dictionary file sample:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<rolling line parameter>
<1>steel coil number</1>
<2>steel grade</2>
<3>slab number</3>
<4>material code</4>
</rolling line parameter>
<roughing parameter>
</roughing parameter>

Table header file sample:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<rolling line parameter>
<1>p.mill.pri.MatId</1>
<2>p.mill.pri.SteelGrade</2>
<3>p.mill.pri.SlabNo</3>
</rolling line parameter>
<roughing parameter>
</roughing parameter>
…
 
Once the original rolling system has been analyzed and the production line hot continuous rolling data set theme customization system has been initialized, the text data collected on the production line must be written into the cloud data warehouse. To make the collected data easy to write, the collected text data must first be formatted.
Step 1.2: the ETL module formats the table header file into template files of the structured template library.
This makes it easy to store collected data in the cloud data warehouse of the data persistence layer. The ETL module arranges the dictionary items of the data dictionary and the header items of the table header file described above and generates template files of the structured template library in the corresponding format. The template library uses the XML format shown above and uses numeric positions to extract data from unstructured files or collected binary files.
A template file of the structured template library is the encoding format used when extracting file data. It is used to format collected text data (including text data from the production line and from historical data) into files in the format expected by the extraction commands of the cloud data warehouse, so that the warehouse stores the data in the proper positions.
Step 1.3: the ETL module periodically loads the template files of the structured template library into the data parsing template library, formats the collected text data (including text data from the production line and from historical data), and sends it to the data persistence layer module.
Once the structured template library has been formed, the data parsing template library in the ETL module formats the collected data periodically. The steps are as follows: first, the data parsing template library obtains the text data of the collected data and of the data sample set, either in real time or periodically; second, at the specified time, the structured template library is loaded into the data parsing template library; finally, the data parsing template library formats the text data of the collected data sample set according to the loaded structured template library.
Step 2: the data persistence layer module builds the cloud data warehouse and creates the data dictionary and data tables from the received data dictionary file and table header file; the cloud data warehouse in the data persistence layer module periodically merges the collected text data formatted by the ETL module.
The data persistence layer module obtains the data dictionary file and the table header file from the ETL module and creates the data dictionary and data tables of the cloud data warehouse. Taking the "rolling line data" of step 1.1 as an example: the data dictionary table of the rolling line data takes "<1>steel coil number</1>" as its first field, "<2>steel grade</2>" as its second field, "<3>slab number</3>" as its third field, and so on. The data table associated with this data dictionary correspondingly takes "<1>p.mill.pri.MatId</1>" as its first field, "<2>p.mill.pri.SteelGrade</2>" as its second field and "<3>p.mill.pri.SlabNo</3>" as its third field, so that the tables generated from the two files in the cloud data warehouse have a one-to-one correspondence of field positions, as shown in Fig. 4.
The cloud data warehouse model is shown in Fig. 5. It is built on HDFS (Hadoop Distributed File System), the distributed file system of the Hadoop cloud computing model, and the operations of the cloud data warehouse are implemented with MapReduce.
In the cloud data warehouse, the formatted collected text data from the ETL module are loaded directly into the warehouse tables by warehouse statements (similar to ordinary SQL statements). Within a data table, the data dictionary and the data items are mapped one to one by position; when data are operated on, the positions in the dictionary are mapped to the positions of the data. This is known as "schema on read".
An example of a data table follows. Table 1 shows the data obtained from the ETL module as stored in the cloud data warehouse. They are unstructured, stream-like data, on which neither data analysis nor theme customization can operate directly. The cloud data warehouse is a data model established through data mapping: following the table model of a database, it defines a positional read mapping that ties the data files, the table header file and the data dictionary file into a whole by position, after which data can be accessed in parallel with statements similar to those of a traditional database (non-standard SQL). Table 2 shows the parsed data, namely some of the rows returned from the cloud data warehouse by the query (select * from product), ready for subsequent data analysis and theme customization.
The data stored in the cloud data warehouse:
Table 1
File 1: H101007780 Q345B 01A95921D228 P02 0 190 1500 11000 ....
File 2: H101000020 Q235B 02B00201E011 P01 1 190 1500 8111
File 3: H101000030 Q235B 01A93722D011 P01 1 190 1500 8383
…
The data returned by the query:
Table 2
Steel coil number | Steel grade | Slab number | Material code | Cold/hot charging flag | Slab thickness | Slab width | Slab length
H101007780 | Q345B | 01A95921D228 | P02 | 0 | 190 | 1500 | 11000
H101000020 | Q235B | 02B00201E011 | P01 | 1 | 190 | 1500 | 8111
H101000030 | Q235B | 01A93722D011 | P01 | 1 | 190 | 1500 | 8383
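The relationship between Table 1 and Table 2 can be sketched as a small "schema on read" routine: the stored lines stay untouched, and field names are attached by position only when a query reads them. This is an illustrative sketch, not the warehouse's actual implementation; it assumes whitespace-separated values, and the names `FIELD_NAMES`, `read_with_schema` and `select_all` are hypothetical.

```python
# Field names in dictionary order (assumed to match the Table 2 header).
FIELD_NAMES = [
    "steel coil number",
    "steel grade",
    "slab number",
    "material code",
    "cold/hot charging flag",
    "slab thickness",
    "slab width",
    "slab length",
]

# Raw lines as stored; whitespace-separated values are an assumption.
RAW_LINES = [
    "H101007780 Q345B 01A95921D228 P02 0 190 1500 11000",
    "H101000020 Q235B 02B00201E011 P01 1 190 1500 8111",
    "H101000030 Q235B 01A93722D011 P01 1 190 1500 8383",
]


def read_with_schema(line, field_names):
    """Attach field names to one raw line by position, at read time."""
    values = line.split()
    if len(values) != len(field_names):
        raise ValueError("line does not match the dictionary length")
    return dict(zip(field_names, values))


def select_all(lines, field_names):
    """Rough equivalent of 'select * from product' over the raw lines."""
    return [read_with_schema(line, field_names) for line in lines]


rows = select_all(RAW_LINES, FIELD_NAMES)
for row in rows:
    print(row["steel coil number"], row["steel grade"], row["slab length"])
```

Because the schema is applied only when reading, the stored files never need to be rewritten if the dictionary changes.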
Step 3: the theme customization module performs theme customization based on the cloud data warehouse that has been built.
The theme library is an archived collection of themes that have already been customized. Each theme contains several theme items, and each theme item includes related data such as a table name, column names and a theme key name. A theme is in essence a region of a table (regions may overlap). The region is delimited for parallel analysis in the cloud, so theme customization is also a process of dividing regions, and a region represents the amount of data that the theme governs. The table name identifies the physical mapping table in the cloud data warehouse of a theme item in the theme set; that table contains the attribute fields and data needed when the theme item is analyzed. The column names are the names of the data attributes involved in analyzing the theme item, that is, the field names stored in the table; they can also be called theme item attributes. In Fig. 7, the field names "slab chemical composition and entry thickness", "tapping temperature", "rolling force of each mill stand" and "rolling speed and roll gap" in the theme item table "factors affecting plate thickness" are all column names. The theme key name is the logical name given to the analysis theme item. It is used for interaction with external users, while internally it corresponds to the physical mapping table; "factors affecting plate thickness" in Fig. 7 and "association between plate thickness and chemical composition" in Fig. 8 are both theme key names.
Fig. 6 is a flow chart showing in detail how the extensible theme customization module interacts with the other modules.
Step 3.1: the theme customization module queries the theme library by keyword to determine whether the theme item required by the user exists in the theme library.
The theme customization module receives the keywords entered by the user for the theme item being sought, searches the theme library for related theme items according to the keywords, and generates a list of related theme items. The theme items in the list can be ordered either by their degree of relevance to the keywords or by their access frequency. The user can select the required theme item from the list of related theme items.
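The keyword lookup and the two orderings mentioned above can be sketched as follows. The patent does not specify how relevance is computed, so the scoring here (two points for a match in the theme key name, one point per matching column name) is purely an assumption, as are the `ThemeItem` class, the table names and the access counts.

```python
from dataclasses import dataclass


@dataclass
class ThemeItem:
    key_name: str          # logical name shown to the user
    table: str             # physical mapping table (hypothetical names)
    columns: list          # theme item attributes
    access_count: int = 0  # how often the item has been used


def relevance(item, keyword):
    """Assumed scoring: 2 for a key-name hit, 1 per matching column name."""
    needle = keyword.lower()
    score = 2 if needle in item.key_name.lower() else 0
    score += sum(1 for column in item.columns if needle in column.lower())
    return score


def search_theme_library(library, keyword, order_by="relevance"):
    """Return matching theme items ordered by relevance or access frequency."""
    hits = [(relevance(item, keyword), item) for item in library]
    hits = [(score, item) for score, item in hits if score > 0]
    if order_by == "relevance":
        hits.sort(key=lambda hit: hit[0], reverse=True)
    elif order_by == "frequency":
        hits.sort(key=lambda hit: hit[1].access_count, reverse=True)
    else:
        raise ValueError("order_by must be 'relevance' or 'frequency'")
    return [item for _, item in hits]


LIBRARY = [
    ThemeItem(
        "factors affecting plate thickness",
        "t_thickness_factors",
        ["slab chemical composition and entry thickness", "tapping temperature"],
        access_count=3,
    ),
    ThemeItem(
        "association between plate thickness and chemical composition",
        "t_thickness_chemistry",
        ["slab entry thickness", "Fe"],
        access_count=9,
    ),
    ThemeItem("coiling", "t_coiling", ["coiling temperature"], access_count=20),
]

for item in search_theme_library(LIBRARY, "thickness", order_by="frequency"):
    print(item.key_name, item.access_count)
```

An empty result corresponds to the case handled in step 3.2, where the theme library holds no suitable theme item.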
Step 3.2: when the required theme item does not exist in the theme library, the theme customization module offers a selection of attributes from the data dictionary of the data table, receives the user's selection of attributes in the data dictionary, and obtains recommended theme items from the experience library based on the selected attributes.
When the theme item required by the user does not appear in the list of related theme items, the theme customization module displays, at the user's request, the data dictionary corresponding to the data table. When the user selects the attributes of one or more fields of the data dictionary (data dictionary attribute items), the system takes them as the attributes of the required theme item and automatically offers the theme items in the experience library that contain these attributes as recommendations, so that the user does not have to choose blindly. The user then decides, according to actual needs, whether or not to use a recommended theme item.
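A minimal sketch of this recommendation step is given below. It assumes that an experience record is a theme key name plus a list of attributes, and that records are ranked by how many of the user's selected attributes they contain; the ranking rule, the record layout and the sample records are assumptions made for illustration.

```python
# Hypothetical experience records: a theme key name plus its attributes.
EXPERIENCE_LIBRARY = [
    {
        "key_name": "factors affecting plate thickness",
        "attributes": [
            "slab chemical composition",
            "slab entry thickness",
            "tapping temperature",
            "rolling speed",
        ],
    },
    {
        "key_name": "association between plate thickness and chemical composition",
        "attributes": ["slab entry thickness", "Fe", "Mg"],
    },
    {"key_name": "coiling quality", "attributes": ["coiling temperature"]},
]


def recommend(experience_library, selected_attributes):
    """Return records sharing at least one selected attribute, best overlap first."""
    selected = set(selected_attributes)
    scored = []
    for record in experience_library:
        overlap = selected & set(record["attributes"])
        if overlap:
            scored.append((len(overlap), record))
    # Stable sort: records with equal overlap keep their library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


recommended = recommend(
    EXPERIENCE_LIBRARY, ["slab entry thickness", "tapping temperature"]
)
for record in recommended:
    print(record["key_name"])
```

The user remains free to ignore the list; the sketch only narrows the choice.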
Step 3.3: when the required theme item exists in the experience library, the required theme item is registered into the theme library; when it does not exist in the experience library, a user-defined new theme item is accepted and registered into the theme library.
When the customization demand does not exist in the experience library, the user-defined theme item can be registered into the experience library, which serves as a self-learning process. The experience library also periodically gathers statistics on the attribute lists of the theme library; attributes that occur with high frequency are treated as sensitive attributes and are recommended to the user as key attributes when the user begins to customize a theme. The new theme item is registered into the theme library, and the region and range required by the theme item are marked out in the dictionary. Once the region of the theme item has been delimited, the MapReduce template container can provide MapReduce parallel algorithm support for common data mining analysis methods. The container is a parallelization integrator: when several theme items of the same data set are analyzed with the same mining algorithm, it can process them in a single pass, which improves the speed and efficiency of data mining. The common data mining method library holds general data mining methods implemented with MapReduce, such as association rules, neural networks, genetic algorithms and decision trees. These methods are loaded dynamically into the MapReduce template container when they are used, which reduces the workload of data mining personnel. In addition, data management personnel can use the API provided by the MapReduce template container to write their own theme analysis programs, which integrate readily with the system and take advantage of parallel mining.
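The periodic attribute statistics can be sketched as a simple frequency count. The patent does not say what counts as a "higher" frequency, so the `min_count` threshold, the record layout and the sample theme items below are assumptions.

```python
from collections import Counter

# Hypothetical theme library entries; only the attribute lists matter here.
THEME_LIBRARY = [
    {"key_name": "theme A", "attributes": ["slab entry thickness", "tapping temperature"]},
    {"key_name": "theme B", "attributes": ["slab entry thickness", "Fe"]},
    {"key_name": "theme C", "attributes": ["tapping temperature", "slab entry thickness"]},
]


def sensitive_attributes(theme_library, min_count=2):
    """Return attributes used by at least min_count theme items, most frequent first."""
    counts = Counter(
        attribute
        for item in theme_library
        for attribute in set(item["attributes"])  # count each item at most once
    )
    frequent = [(name, count) for name, count in counts.items() if count >= min_count]
    # Sort by descending count, then by name so that the order is deterministic.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in frequent]


print(sensitive_attributes(THEME_LIBRARY))
```

Running such a count on a schedule keeps the recommended key attributes in step with how the theme library is actually used.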
Step 3.4: when data are operated on, the theme library sends a data request to the cloud data warehouse, and the requested data items are extracted from the cloud data warehouse for data mining and analysis.
When data mining and analysis are to be carried out on a theme item, the theme library sends a data request to the cloud data warehouse in order to obtain the attribute data of this theme item stored in the cloud data warehouse; the data request contains the selected theme item attributes. The cloud data warehouse matches the theme item attributes against the attributes of the fields of the data dictionary and, according to the mapping between the matched data dictionary attributes and the data items, obtains the data items of the required theme item stored in the data tables of the cloud data warehouse. The cloud data warehouse then sends the obtained data items to an analysis module (not part of this system) for data analysis.
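The matching of theme item attributes against the data dictionary can be sketched as a positional projection. The sketch assumes an in-memory list of rows in place of the cloud data warehouse; `DATA_DICTIONARY`, `DATA_ROWS` and `fetch_theme_data` are hypothetical names.

```python
# Data dictionary in field order (assumed names) and rows in the same order.
DATA_DICTIONARY = [
    "steel coil number",
    "steel grade",
    "slab number",
    "material code",
    "cold/hot charging flag",
    "slab thickness",
    "slab width",
    "slab length",
]

DATA_ROWS = [
    ["H101007780", "Q345B", "01A95921D228", "P02", "0", "190", "1500", "11000"],
    ["H101000020", "Q235B", "02B00201E011", "P01", "1", "190", "1500", "8111"],
    ["H101000030", "Q235B", "01A93722D011", "P01", "1", "190", "1500", "8383"],
]


def fetch_theme_data(theme_attributes, dictionary, rows):
    """Project each row onto the positions of the requested attributes."""
    positions = {name: index for index, name in enumerate(dictionary)}
    missing = [name for name in theme_attributes if name not in positions]
    if missing:
        raise KeyError(f"attributes not in data dictionary: {missing}")
    indexes = [positions[name] for name in theme_attributes]
    return [[row[index] for index in indexes] for row in rows]


theme_data = fetch_theme_data(
    ["steel grade", "slab thickness", "slab length"], DATA_DICTIONARY, DATA_ROWS
)
for values in theme_data:
    print(values)
```

Only the requested columns leave the warehouse, which matches the idea of a theme as a region of a table.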
Fig. 7 shows an example in which theme item customization yields an effective recommendation. The user wants to find data analysis theme items related to plate thickness. The user first selects a theme item attribute related to plate thickness, such as slab chemical composition and entry thickness. Using the field name, the theme library returns all of its theme items related to this attribute. At the same time, the theme library can request theme items related to this attribute from the experience library. The experience library scans its experience records and may find many records related to the attribute, for example one consisting of slab chemical composition and entry thickness, tapping temperature, rolling force of each mill stand, rolling speed and roll gap, the set value and self-learning value of each parameter, and so on, with the theme key name "factors affecting plate thickness". If this is exactly the theme item the user needs, the user can register it into the theme library. When data are operated on, the theme library sends a request to the cloud data warehouse, and the cloud data warehouse sends the requested data items to the analysis module, where they can be used for data mining and analysis.
Fig. 8 shows an example of an incomplete match, in which the user does not find the required theme item in the experience library. The extensible theme customization module first lets the user select the maximum matching item. For example, the user wants to know what relationship exists between slab thickness and the chemical composition of each element, and this theme item is incomplete or absent in the existing experience library. The user can then conveniently start from the maximum matching theme item (slab entry thickness, Fe, Gu, Mg, Ag, etc.) and add the user-defined attribute item Pd (an element symbol), an attribute that does not exist in the other themes. The new theme item built in this way is registered into the theme library, and when the theme library is updated it is loaded into the experience library as one of the ways in which the experience library learns, which makes later reuse more convenient and more intelligent.
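The incomplete-match case can be sketched as two small steps: pick the experience record that overlaps most with the wanted attributes, then extend a copy of it with the user's own attribute. The overlap measure, the record layout and the sample records are assumptions; the attribute names are illustrative only.

```python
# Hypothetical experience records used to find the closest existing theme item.
EXPERIENCE_LIBRARY = [
    {
        "key_name": "factors affecting plate thickness",
        "attributes": ["slab entry thickness", "tapping temperature", "rolling speed"],
    },
    {
        "key_name": "thickness and composition (partial)",
        "attributes": ["slab entry thickness", "Fe", "Mg", "Ag"],
    },
]


def max_match(experience_library, wanted_attributes):
    """Return the record sharing the most attributes with the request, or None."""
    wanted = set(wanted_attributes)
    return max(
        experience_library,
        key=lambda record: len(wanted & set(record["attributes"])),
        default=None,
    )


def derive_theme_item(base, key_name, extra_attributes):
    """Build a new theme item from a copy of the base plus user-defined attributes."""
    attributes = list(base["attributes"])  # copy, so the base record is unchanged
    for attribute in extra_attributes:
        if attribute not in attributes:
            attributes.append(attribute)
    return {"key_name": key_name, "attributes": attributes}


best = max_match(EXPERIENCE_LIBRARY, ["slab entry thickness", "Fe", "Mg", "Pd"])
new_item = derive_theme_item(
    best, "association between plate thickness and chemical composition", ["Pd"]
)
print(new_item)
```

In the flow described above, `new_item` would then be registered into the theme library and later loaded into the experience library.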
This embodiment essentially completes the system integration of data mining preprocessing, from data extraction and warehouse construction to theme customization, so that data mining developers can readily use existing data resources to mine valuable rules in a targeted way.
The above is only a preferred embodiment of the present invention, and the invention is not limited to this embodiment; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A cloud-computing-based hot continuous rolling data theme customization system, characterized in that the system comprises an ETL (information extraction) module, a data persistence layer module and a theme customization module;
the ETL (information extraction) module is configured to parse the data structure of the hot continuous rolling system, generate a data dictionary file and a header file, send the data dictionary file and the header file to the data persistence layer module, and periodically format the text data collected from the hot continuous rolling system;
the data persistence layer module is configured to build a data dictionary and data tables for the cloud data warehouse according to the data dictionary file and header file received from the ETL module, and to periodically merge the formatted collected text data into the cloud data warehouse;
the theme customization module performs theme customization based on the cloud data warehouse.

2. The hot continuous rolling data theme customization system according to claim 1, characterized in that the ETL module comprises:
a data structure parsing unit, configured to parse the data structure of the hot continuous rolling system and generate the data dictionary file and the header file;
a structured template library generation unit, configured to format the header file generated by the data structure parsing unit to generate the template files of the structured template library;
a text data formatting unit, configured to periodically load the template files in the structured template library into the data parsing template library, format the text data collected from the hot continuous rolling system, and send the result to the data persistence layer module.

3. The hot continuous rolling data theme customization system according to claim 1, characterized in that the theme customization module comprises:
a theme library query unit, configured to query the theme library by keyword and determine whether the theme item required by the user exists in the theme library;
an experience library recommendation unit, configured to, when the required theme item does not exist in the theme library, offer a selection of attributes from the data dictionary of the data table, take the attributes selected by the user as the attributes of the required theme item, and obtain recommended theme items from the experience library based on the attributes selected by the user;
a theme library registration unit, configured to register the required theme item into the theme library when the required theme item exists among the recommended theme items, and, when the required theme item does not exist among the recommended theme items, to accept a user-defined new theme item and register the new theme item into the theme library;
a communication unit, configured to send a data request for a theme item to the cloud data warehouse when data are operated on.

4. A cloud-computing-based hot continuous rolling data theme customization method, characterized in that the method comprises the following steps:
step 1: the ETL module parses the data structure of the hot continuous rolling system, generates a data dictionary file and a header file, sends the data dictionary file and the header file to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system;
step 2: the data persistence layer module creates a data dictionary and data tables for the cloud data warehouse according to the received data dictionary file and header file, and periodically merges the collected text data formatted by the ETL module;
step 3: the theme customization module performs theme customization based on the cloud data warehouse.

5. The hot continuous rolling data theme customization method according to claim 4, characterized in that step 1 specifically comprises the following steps:
step 1.1: the ETL module parses the data structure of the hot continuous rolling system and generates the data dictionary file and the header file;
step 1.2: the ETL module formats the header file to generate the template files of the structured template library;
step 1.3: the ETL module periodically loads the template files in the structured template library into the data parsing template library, formats the text data collected from the hot continuous rolling system, and sends the result to the data persistence layer module.

6. The hot continuous rolling data theme customization method according to claim 4, characterized in that step 3 specifically comprises the following steps:
step 3.1: the theme customization module queries the theme library by keyword and determines whether the theme item required by the user exists in the theme library;
step 3.2: when the required theme item does not exist in the theme library, the theme customization module offers a selection of attributes from the data dictionary of the data table, receives the user's selection of attributes in the data dictionary, and obtains recommended theme items from the experience library based on the attributes selected by the user;
step 3.3: when the required theme item exists among the theme items recommended by the experience library, the required theme item is registered into the theme library; when the required theme item does not exist among the theme items recommended by the experience library, the experience library accepts a user-defined new theme item, and the new theme item is registered into the theme library;
step 3.4: when data are operated on, the theme library sends a data request for the theme item to the cloud data warehouse.

7. The hot continuous rolling data theme customization method according to claim 6, characterized in that in step 3.3 the user defines the new theme item as follows: the maximum matching theme item is obtained from the theme items recommended by the experience library, and the attributes of the maximum matching theme item are modified to form the new theme item.
CN2013101304423A | 2013-04-16 | 2013-04-16 | Large-scale hot continuous rolling data scheme customizing system based on cloud computing | Pending | CN103198138A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN2013101304423A (CN103198138A (en)) | 2013-04-16 | 2013-04-16 | Large-scale hot continuous rolling data scheme customizing system based on cloud computing

Publications (1)

Publication Number | Publication Date
CN103198138A (en) | 2013-07-10

Family

ID=48720695

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN2013101304423A (Pending; CN103198138A (en)) | Large-scale hot continuous rolling data scheme customizing system based on cloud computing | 2013-04-16 | 2013-04-16

Country Status (1)

Country | Link
CN (1) | CN103198138A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102012912A (en) * | 2010-11-19 | 2011-04-13 | 清华大学 | Management method for unstructured data based on cloud computing environment
CN102567391A (en) * | 2010-12-20 | 2012-07-11 | 中国移动通信集团广东有限公司 | Method and device for building classification forecasting mixed model
US20120297017A1 (en) * | 2011-05-20 | 2012-11-22 | Microsoft Corporation | Privacy-conscious personalization
US20120303559A1 (en) * | 2011-05-27 | 2012-11-29 | Ctc Tech Corp. | Creation, use and training of computer-based discovery avatars
CN102254024A (en) * | 2011-07-27 | 2011-11-23 | 国网信息通信有限公司 | Mass data processing system and method
CN102394923A (en) * | 2011-10-27 | 2012-03-28 | 周诗琦 | Cloud system platform based on n*n display structure
CN102521246A (en) * | 2011-11-11 | 2012-06-27 | 国网信息通信有限公司 | Cloud data warehouse system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107042234A (en) * | 2017-03-15 | 2017-08-15 | 中冶华天工程技术有限公司 | The intelligent production line and production method gathered based on bar whole process big data
CN108171640A (en) * | 2017-12-21 | 2018-06-15 | 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) | Marine communication system data storage system and statistical method
CN108171640B (en) * | 2017-12-21 | 2021-01-12 | 武汉船舶通信研究所(中国船舶重工集团公司第七二二研究所) | Data storage system and statistical method for ship communication system
CN110134685A (en) * | 2019-05-06 | 2019-08-16 | 武汉中岩测控技术有限公司 | A kind of monitoring method and system based on big data field automatic Mosaic algorithm
CN110355214A (en) * | 2019-06-24 | 2019-10-22 | 科芃智能科技(苏州)有限公司 | A kind of quality stream inlet thickness storage calculation method based on most rickle
CN110348954A (en) * | 2019-06-25 | 2019-10-18 | 河南科技大学 | A kind of complicated technology module partition method of mass customization
CN110348954B (en) * | 2019-06-25 | 2022-02-25 | 河南科技大学 | A Mass Customization-Oriented Partitioning Method for Complex Process Modules
CN112507098A (en) * | 2020-12-18 | 2021-03-16 | 北京百度网讯科技有限公司 | Question processing method, question processing device, electronic equipment, storage medium and program product
CN112818048A (en) * | 2021-01-28 | 2021-05-18 | 北京软通智慧城市科技有限公司 | Hierarchical construction method and device of data warehouse, electronic equipment and storage medium

Legal Events

Date | Code | Title | Description
- | C06 | Publication | -
- | PB01 | Publication | -
- | C10 | Entry into substantive examination | -
- | SE01 | Entry into force of request for substantive examination | -
- | C02 | Deemed withdrawal of patent application after publication (patent law 2001) | -
- | WD01 | Invention patent application deemed withdrawn after publication | -

Application publication date: 2013-07-10

