Summary of the invention
In order to solve the above technical problems, an object of the present invention is to provide a data cleaning method, device and system, based on a data standard specification, that can improve efficiency.
The first technical solution adopted by the present invention is:
A data cleaning method based on a data standard specification, comprising the following steps:
Obtaining data standard specification information and a data source;
Performing quality detection on the data source according to the data standard specification information, generating a problem report work order, and sending the problem report work order to a first processing account;
After the problem report work order has been processed, storing the processed problem report work order in a knowledge base.
Further, the step of performing quality detection on the data source according to the data standard specification information, generating a problem report work order and sending the problem report work order to the first processing account specifically includes:
Configuring the data standard specification of each field in the data source according to the data standard specification information;
Adding a data quality detection task, configuring the first processing account and executing task scheduling, to obtain a quality detection result for each field in the data source;
Generating the problem report work order according to the quality detection result of each field in the data source, and sending the problem report work order to the first processing account.
Further, the method also comprises the following step:
Querying the knowledge base, according to the data standard specification information, for processed problem report work orders that use the same data standard specification.
Further, the method also comprises the following step:
Obtaining first information input by a user, and searching the knowledge base, according to the first information, for processed problem report work orders that contain the first information.
The second technical solution adopted by the present invention is:
A data cleaning device based on a data standard specification, comprising:
A memory, for storing a program;
A processor, for loading the program to execute the data cleaning method based on a data standard specification.
The third technical solution adopted by the present invention is:
A data cleaning system based on a data standard specification, comprising:
An acquisition module, for obtaining a data source;
A data standard specification information management module, for adding, modifying and deleting data standard specification information;
A quality detection module, for performing quality detection on the data source according to the data standard specification information, generating a problem report work order, and sending the problem report work order to a first processing account;
A problem report work order processing module, for processing the problem report work order;
A knowledge base, for querying and storing processed problem report work orders.
Further, the quality detection module includes:
A mapping configuration unit, for configuring the data standard specification of each field in the data source according to the data standard specification information;
A task execution scheduling unit, for adding a data quality detection task, configuring the first processing account and executing task scheduling, to obtain a quality detection result for each field in the data source;
A work order management unit, for generating the problem report work order according to the quality detection result of each field in the data source, and sending the problem report work order to the first processing account.
Further, the system also includes:
A query module, for querying the knowledge base, according to the data standard specification information, for processed problem report work orders that use the same data standard specification.
Further, the system also includes:
A search module, for obtaining first information input by a user, and searching the knowledge base, according to the first information, for processed problem report work orders that contain the first information.
Further, the work order management unit is also used for:
Obtaining second information input by a user, and reassigning the problem report work order from the first processing account to a second processing account;
Or
Obtaining third information input by a user, and sending the problem report work order to a configured external system.
The beneficial effects of the present invention are as follows: the present invention performs quality detection on the data source to be cleaned based on the data standard specification information, generates a problem report work order, and sends it to the relevant processing account. After the handler finishes processing the problem report work order, the work order is stored in the knowledge base, so that in subsequent data cleaning handlers can refer to the solutions recorded in processed problem report work orders, thereby improving the efficiency of data cleaning.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, this embodiment provides a data cleaning method based on a data standard specification; the method can be implemented by a computer.
The method comprises the following steps:
S1, obtaining data standard specification information and a data source. The data standard specification information may include a plurality of rules, and a handler can add, delete and modify rules in the data standard specification information according to actual needs.
S2, performing quality detection on the data source according to the data standard specification information, generating a problem report work order, and sending the problem report work order to a first processing account. During quality detection, problems in the data source may be found, that is, cases where the data source does not conform to the rules in the data standard specification information. The problem report work order records these problems, for example that the M-th data item of the N-th field is problematic. The problem report work order recording the data problems is then transferred to the handler's account, that is, the first processing account. The first processing account may be fixed, or it may be set for each data cleaning run.
S3, after the problem report work order has been processed, storing the processed problem report work order in a knowledge base. The processed problem report work order records the handler's solution. For example, if the M-th data item of the N-th field has a problem, the solution may be to delete, merge or replace that data item, or to perform some other operation. In this way, if a handler encounters a similar problem in a subsequent data cleaning run, the earlier solution can be looked up, which helps improve the efficiency of data cleaning.
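Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the names Rule, WorkOrder, detect and close, the in-memory list used as the knowledge base, and the sample rule and data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names here (Rule, WorkOrder, detect, close) are hypothetical.

@dataclass
class Rule:
    field_name: str                   # field the rule applies to
    description: str                  # human-readable rule text
    check: Callable[[object], bool]   # returns True when a value conforms

@dataclass
class WorkOrder:
    assignee: str                     # the first processing account
    problems: List[dict]              # one entry per non-conforming value
    solution: str = ""                # filled in by the handler
    processed: bool = False

def detect(rules: List[Rule], rows: List[dict], assignee: str) -> WorkOrder:
    """S2: check every row against every rule and collect the violations."""
    problems = []
    for row_index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.field_name)):
                problems.append({"field": rule.field_name,
                                 "row": row_index,
                                 "rule": rule.description})
    return WorkOrder(assignee=assignee, problems=problems)

def close(order: WorkOrder, solution: str, knowledge_base: List[WorkOrder]) -> None:
    """S3: record the handler's solution and archive the work order."""
    order.solution = solution
    order.processed = True
    knowledge_base.append(order)

# S1: obtain the specification (rules) and the data source (rows).
rules = [Rule("age", "age must be a non-negative integer",
              lambda v: isinstance(v, int) and v >= 0)]
rows = [{"age": 30}, {"age": -1}, {"age": None}]

knowledge_base: List[WorkOrder] = []
order = detect(rules, rows, assignee="account_a")
close(order, "deleted rows with an invalid age", knowledge_base)
```

In this sketch each problem entry names the field and the row index, mirroring the "M-th data item of the N-th field" record described in S2.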
As a preferred embodiment, step S2 specifically includes:
S21, configuring the data standard specification of each field in the data source according to the data standard specification information; that is, each field in the data source is associated with its corresponding data standard specification by means of a mapping.
S22, adding a data quality detection task, configuring the first processing account and executing task scheduling, to obtain the quality detection result of each field in the data source. The method of this embodiment can execute multiple data cleaning tasks at the same time, so a task scheduling function is needed.
S23, generating the problem report work order according to the quality detection result of each field in the data source, and sending the problem report work order to the first processing account. In this embodiment, the problem report work order includes the data problems found in each field.
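Sub-steps S21 to S23 can be sketched as follows. This is an illustrative assumption rather than the scheduler of the invention: a standard-library thread pool stands in for task scheduling, and the specification ids ("phone_cn", "non_empty"), the field names and the account names are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical standard specifications, keyed by a specification id.
SPECS = {
    "phone_cn": lambda v: isinstance(v, str) and re.fullmatch(r"1\d{10}", v) is not None,
    "non_empty": lambda v: isinstance(v, str) and v.strip() != "",
}

# S21: map each field of the data source to its specification.
field_mapping = {"phone": "phone_cn", "name": "non_empty"}

def run_task(task):
    """S22: one quality detection task; returns bad row indexes per field."""
    results = {}
    for field_name, spec_id in task["mapping"].items():
        check = SPECS[spec_id]
        results[field_name] = [i for i, row in enumerate(task["rows"])
                               if not check(row.get(field_name))]
    return results

def build_work_order(task, results):
    """S23: keep the fields that have problems and address the order to the assignee."""
    problems = {f: idx for f, idx in results.items() if idx}
    return {"assignee": task["assignee"], "problems": problems}

tasks = [
    {"assignee": "account_a", "mapping": field_mapping,
     "rows": [{"phone": "13800000000", "name": "Li"},
              {"phone": "123", "name": ""}]},
    {"assignee": "account_b", "mapping": field_mapping,
     "rows": [{"phone": "13900000000", "name": "Wang"}]},
]

# Task scheduling: the thread pool runs several detection tasks concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    all_results = list(pool.map(run_task, tasks))

work_orders = [build_work_order(t, r) for t, r in zip(tasks, all_results)]
```

Keeping the field-to-specification mapping as data, separate from the checks themselves, means a new field can be covered by editing the mapping only.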
As a preferred embodiment, to make it easier for handlers to draw on the solutions of past problem report work orders, this embodiment further comprises the following step:
S4, querying the knowledge base, according to the data standard specification information, for processed problem report work orders that use the same data standard specification. This embodiment can automatically match, from the knowledge base, cases that use the same data standard specification as the one selected by the handler, and present them to the user. The user can thus easily find a solution to refer to, which improves the efficiency of data cleaning.
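The matching in S4 could look like the sketch below. The dictionary layout of a stored work order, the "spec_ids" key and the sample entries are assumptions for illustration; the source does not specify how the knowledge base is stored.

```python
# Hypothetical knowledge base: each work order records the specification
# ids it was checked against and the handler's solution.
knowledge_base = [
    {"id": 1, "spec_ids": {"phone_cn"}, "processed": True,
     "solution": "deleted malformed phone numbers"},
    {"id": 2, "spec_ids": {"date_iso"}, "processed": True,
     "solution": "reformatted dates to YYYY-MM-DD"},
    # An unprocessed entry, included only to show that it is filtered out.
    {"id": 3, "spec_ids": {"phone_cn"}, "processed": False, "solution": ""},
]

def find_by_spec(knowledge_base, selected_spec_ids):
    """Return processed work orders sharing at least one selected specification."""
    wanted = set(selected_spec_ids)
    return [order for order in knowledge_base
            if order["processed"] and wanted & order["spec_ids"]]

matches = find_by_spec(knowledge_base, ["phone_cn"])
```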
As a preferred embodiment, the method further comprises the following step:
S5, obtaining first information input by a user, and searching the knowledge base, according to the first information, for processed problem report work orders that contain the first information. In this embodiment, the user can search by entering the first information, which may be, for example, the name of a relevant field or the format of the data being handled. When no past data cleaning case uses the same data standard specification, this embodiment can use keywords to search processed problem report work orders for a similar data cleaning scheme, so that the handler can draw on the solutions of past data cleaning cases, thereby improving the efficiency of data cleaning.
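The keyword search in S5 can be sketched as a case-insensitive substring match. This is a deliberately simple stand-in: the source does not say how matching is performed, and the "fields", "data_format" and "solution" keys are hypothetical.

```python
def search(knowledge_base, first_information):
    """Case-insensitive substring search over field names, data format and solution."""
    needle = first_information.strip().casefold()
    if not needle:
        return []
    hits = []
    for order in knowledge_base:
        if not order["processed"]:
            continue
        haystack = " ".join([*order["fields"], order["data_format"],
                             order["solution"]]).casefold()
        if needle in haystack:
            hits.append(order)
    return hits

# Hypothetical processed work orders.
knowledge_base = [
    {"id": 1, "fields": ["birth_date"], "data_format": "YYYY-MM-DD",
     "solution": "replaced two-digit years", "processed": True},
    {"id": 2, "fields": ["phone"], "data_format": "11 digits",
     "solution": "deleted duplicates", "processed": True},
]

by_field = search(knowledge_base, "Birth")
by_format = search(knowledge_base, "digits")
```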
A data cleaning device based on a data standard specification, comprising:
A memory, for storing a program; the memory may be a storage device such as a USB flash drive, a hard disk or an optical disc.
A processor, for loading the program to execute the data cleaning method based on a data standard specification of any of the above embodiments.
This embodiment discloses a data cleaning system based on a data standard specification, comprising:
An acquisition module, for obtaining a data source. The data source may come from a data interface of an external system, a local database or a storage medium.
A data standard specification information management module, for adding, modifying and deleting data standard specification information. The data standard specification information may include a plurality of rules, and a handler can add, delete and modify rules in the data standard specification information according to actual needs.
A quality detection module, for performing quality detection on the data source according to the data standard specification information, generating a problem report work order, and sending the problem report work order to a first processing account. During quality detection, problems in the data source may be found, that is, cases where the data source does not conform to the rules in the data standard specification information. The problem report work order records these problems, for example that the M-th data item of the N-th field is problematic. The problem report work order recording the data problems is then transferred to the handler's account, that is, the first processing account. The first processing account may be fixed, or it may be set for each data cleaning run.
A problem report work order processing module, for processing problem report work orders. In this module, a handler can log in to his or her own account and process a problem report work order, for example by deleting, adding or modifying data to address the problems pointed out in the work order. The final solution is stored in the knowledge base together with the problem report work order.
A knowledge base, for querying and storing processed problem report work orders. A handler can search the knowledge base for the solutions of past problem report work orders that dealt with similar situations, thereby improving the efficiency of data cleaning.
This system makes it convenient for handlers to manage the data standard specification information, which improves the flexibility of data cleaning, and it makes full use of existing problem report work orders as reference cases, which improves the efficiency of data cleaning.
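The add, modify and delete operations of the data standard specification information management module described above can be sketched as a small in-memory store. The class name SpecificationStore, its method names and the sample rule ids are hypothetical; a real system would presumably persist the rules.

```python
class SpecificationStore:
    """Hypothetical in-memory store for data standard specification rules."""

    def __init__(self):
        self._rules = {}   # rule id -> rule description

    def add(self, rule_id, description):
        if rule_id in self._rules:
            raise KeyError(f"rule {rule_id!r} already exists")
        self._rules[rule_id] = description

    def modify(self, rule_id, description):
        if rule_id not in self._rules:
            raise KeyError(f"rule {rule_id!r} does not exist")
        self._rules[rule_id] = description

    def delete(self, rule_id):
        # Raises KeyError if the rule does not exist.
        del self._rules[rule_id]

    def rules(self):
        """Return a copy so callers cannot mutate the store directly."""
        return dict(self._rules)

store = SpecificationStore()
store.add("phone_cn", "phone must be 11 digits")
store.add("non_empty", "value must not be empty")
store.modify("phone_cn", "phone must be 11 digits starting with 1")
store.delete("non_empty")
```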
As a preferred embodiment, the quality detection module includes:
A mapping configuration unit, for configuring the data standard specification of each field in the data source according to the data standard specification information. The mapping configuration unit associates each field in the data source with its corresponding data standard specification by means of a mapping.
A task execution scheduling unit, for adding a data quality detection task, configuring the first processing account and executing task scheduling, to obtain the quality detection result of each field in the data source. The system of this embodiment can execute multiple data cleaning tasks at the same time, so a task scheduling function is needed.
A work order management unit, for generating the problem report work order according to the quality detection result of each field in the data source, and sending the problem report work order to the first processing account. In this embodiment, the problem report work order includes the data problems found in each field.
As a preferred embodiment, to make it easier for handlers to draw on the solutions of past problem report work orders, this embodiment further includes:
A query module, for querying the knowledge base, according to the data standard specification information, for processed problem report work orders that use the same data standard specification. This embodiment can automatically match, from the knowledge base, cases that use the same data standard specification as the one selected by the handler, and present them to the user. The user can thus easily find a solution to refer to, which improves the efficiency of data cleaning.
As a preferred embodiment, the system further includes:
A search module, for obtaining first information input by a user, and searching the knowledge base, according to the first information, for processed problem report work orders that contain the first information. In this embodiment, the user can search by entering the first information, which may be, for example, the name of a relevant field or the format of the data being handled. When no past data cleaning case uses the same data standard specification, this embodiment can use keywords to search processed problem report work orders for a similar data cleaning scheme, so that the handler can draw on the solutions of past data cleaning cases, thereby improving the efficiency of data cleaning.
As a preferred embodiment, to make it easier to transfer a problem report work order for processing, the work order management unit is also used for:
Obtaining second information input by a user, and reassigning the problem report work order from the first processing account to a second processing account;
Or
Obtaining third information input by a user, and sending the problem report work order to a configured external system.
This embodiment can flexibly assign problem report work orders to different handlers for processing, and can also send problem report work orders to an external system.
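The two transfer options can be sketched as follows. The functions reassign and forward, the "history" key and the JSON payload are assumptions for illustration; in particular, the source does not describe the interface of the external system, so a plain callable stands in for it here.

```python
import json

def reassign(order, from_account, to_account):
    """Move a work order from the first processing account to a second one."""
    if order["assignee"] != from_account:
        raise ValueError("work order is not held by " + from_account)
    order["assignee"] = to_account
    order["history"].append([from_account, to_account])
    return order

def forward(order, send):
    """Serialize the work order and hand it to an external-system sender.

    `send` is any callable accepting a string; it is a placeholder for
    whatever transport the configured external system actually uses.
    """
    payload = json.dumps(order, sort_keys=True)
    send(payload)
    return payload

order = {"id": 7, "assignee": "account_a", "history": [],
         "problems": {"phone": [1]}}

# Second information: the user chooses a new processing account.
reassign(order, "account_a", "account_b")

# Third information: the user chooses to send the order to an external system.
outbox = []                      # stands in for the external system
forward(order, outbox.append)
```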
The step numbers in the above method embodiments are set only for ease of explanation and do not limit the order of the steps; the execution order of the steps in the embodiments can be adapted according to the understanding of those skilled in the art.
The above is a description of preferred implementations of the present invention, but the present invention is not limited to the above embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the present invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.