System and method for automatic database switching of ETL jobs
Technical field
The present invention relates to the field of computer technology, and in particular to a system and method for automatically switching the database used by ETL jobs.
Background art
An ETL job uses an ETL tool to extract data (such as relational data and flat data files) from distributed, heterogeneous data sources into a temporary intermediate layer, where the data are cleaned, transformed and integrated, and finally loaded into a data warehouse or data mart to serve as the basis for online analytical processing and data mining.
At present, every important system has a production database and a backup database (BCV database); the backup database stores a snapshot of the production data at a given moment for emergency use. Most ETL tools obtain current snapshot data from the production database while avoiding peak periods of data operations. In today's highly information-driven era, however, data operations on the production database are very frequent, so ETL jobs with long extraction times may fail with dirty-read errors and be unable to extract data; the only remedy is to switch manually to the backup database for extraction. This approach consumes a great deal of manpower, is inefficient, and cannot guarantee the stability and reliability of the jobs.
Summary of the invention
The technical problem to be solved by the present invention is to address the deficiencies of the prior art by providing a system and method for automatically switching the database used by ETL jobs, so that the database is switched automatically during an ETL job and the job completes smoothly.
The technical solution by which the present invention solves the above technical problem is as follows: a system for automatic database switching of ETL jobs, comprising a production database, a backup database, an ETL job execution module, an ETL job scanning module, a backup database monitoring module and an ETL switching processing module;
the production database is configured to provide the data source for ETL jobs;
the backup database is configured to back up the data in the production database on a schedule and, when an ETL job fails while operating on the production database, to provide the data source for the failed ETL job;
the ETL job execution module is configured to extract the required data from the production database for ETL processing; when an operation on the production database fails, to switch the operation object of the failed ETL job from the production database to the backup database according to the switch trigger instruction sent by the ETL switching processing module; and then to modify the job configuration information according to the manual intervention information sent by the ETL switching processing module and execute the failed ETL job on the backup database;
the ETL job scanning module is configured to periodically scan the status of each ETL job executed by the ETL job execution module and, when a failed ETL job is found, to generate manual intervention information from the job information of the failed ETL job and send it to the ETL switching processing module;
the backup database monitoring module is configured to periodically check the synchronization status of the backup database and send the status information of the backup database to the ETL switching processing module;
the ETL switching processing module is configured, after receiving the manual intervention information sent by the ETL job scanning module, to determine the state of the backup database from the backup database status information; when the backup database is in the data-synchronization-complete state, to generate a switch trigger instruction and send it to the ETL job execution module; and, after the ETL job execution module has completed the switching operation, to send the manual intervention information to the ETL job execution module.
The beneficial effects of the present invention are as follows: after a job fails, the operation object of the job is automatically switched from the production database to the backup database so that extraction proceeds normally. This ensures the stability of data extraction, supports timely delivery of data to downstream systems, saves labor costs, significantly improves work efficiency, and makes system maintenance more user-friendly and intelligent.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, the ETL switching processing module is also configured, after the ETL job execution module has performed the switching operation and before the manual intervention information for the current run is inserted, to detect, according to the execution status of the failed ETL job obtained by the ETL job scanning module, whether the failed ETL job has a success record. If it does, the operation object of the failed ETL job is switched back from the backup database to the production database. If it does not, the module further detects whether the failed ETL job is running; if it is running, the module waits until the run ends and then determines whether it succeeded. If the run succeeded, the operation object of the failed ETL job is switched back from the backup database to the production database; otherwise, the module controls the ETL job execution module to remove the existing temporary files and manual intervention information and inserts the manual intervention information for the current run, whereupon the ETL job execution module executes the failed ETL job according to that manual intervention information.
The beneficial effects of this further solution are as follows: temporary files are removed before the manual intervention information for the current run is inserted, which ensures the accuracy of the extracted data and avoids duplicate data caused by counting the temporary file of the previous run when a failed job is run again; the earlier manual intervention information is cleared to ensure that the new manual intervention information is inserted correctly and to avoid database unique-constraint errors caused by inserting identical manual intervention information more than once.
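The unique-constraint point above can be illustrated with a minimal sketch. It assumes, purely for illustration, that manual intervention information is kept in a table named manual_intervention keyed by job number; the table name, column names and the use of SQLite are assumptions and are not taken from the present disclosure. Removal of temporary files is not covered here.

```python
import sqlite3


def insert_intervention(conn, job_id, run_at):
    """Remove any stale intervention record for the job, then insert a fresh one."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM manual_intervention WHERE job_id = ?", (job_id,))
        conn.execute(
            "INSERT INTO manual_intervention (job_id, run_at) VALUES (?, ?)",
            (job_id, run_at),
        )


# Hypothetical table: one intervention record per job number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE manual_intervention (job_id TEXT PRIMARY KEY, run_at TEXT NOT NULL)"
)

insert_intervention(conn, "J001", "2024-01-01T02:00")
# Without the DELETE, this second insert would violate the primary key.
insert_intervention(conn, "J001", "2024-01-01T03:00")

rows = conn.execute("SELECT job_id, run_at FROM manual_intervention").fetchall()
```

Clearing and inserting inside one transaction means a failure between the two statements leaves the earlier record in place rather than no record at all.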
Further, the ETL job scanning module is also configured, when a scan finds that the failed ETL job has completed on the backup database, to send the ETL switching processing module a notification that processing of the failed ETL job is complete; the ETL switching processing module generates a switch trigger instruction according to the notification sent by the ETL job scanning module and sends it to the ETL job execution module, and the ETL job execution module switches the operation object of the ETL job from the backup database back to the production database.
Further, the ETL job execution module stores a configuration table; the corresponding entries of the configuration table are modified according to the manual intervention information, and the job is executed on the backup database.
Further, when the ETL job scanning module detects more than one failed ETL job at the same time, a job number list is generated for each production database on which the failed ETL jobs operate, and each failed ETL job is processed separately.
Another technical solution by which the present invention solves the above technical problem is as follows: a method for automatic database switching of ETL jobs, comprising the following steps:
Step 1: periodically scan the execution status of each ETL job;
Step 2: determine whether there is a failed ETL job; if so, perform step 3; otherwise return to step 1;
Step 3: assemble manual intervention information from the job information of the failed ETL job;
Step 4: periodically check the data synchronization status of the backup database and determine its working state; if data synchronization is complete, perform step 5; if it is not complete, wait until it completes and then perform step 5;
Step 5: generate a switch trigger instruction and switch the operation object of the failed ETL job from the production database to the backup database; once the switch is complete, modify the job configuration information according to the manual intervention information and execute the ETL job on the backup database.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, before the failed ETL job is executed on the backup database in step 5, the following operations are also performed:
Step 51: obtain the execution status of the failed ETL job and detect whether the failed ETL job has a success record; if so, perform step 55; if not, perform step 52;
Step 52: detect whether the failed ETL job is running; if it is, wait until the run ends and then determine whether it succeeded; if it succeeded, perform step 55; otherwise perform step 53;
Step 53: remove the existing temporary files and manual intervention information, insert the manual intervention information for the current run, then modify the job configuration information according to the manual intervention information and execute the failed ETL job on the backup database;
Step 54: determine whether the failed ETL job executed successfully on the backup database; if so, perform step 55; otherwise return to step 53;
Step 55: switch the operation object of the failed ETL job from the backup database back to the production database, and end the process.
Further, the above technical solution also comprises: when a scan finds that the failed ETL job has completed on the backup database, generating a switch trigger instruction to switch the operation object of the failed ETL job from the backup database back to the production database.
Further, the corresponding entries of the configuration table are modified according to the manual intervention information, which triggers the failed ETL job to execute on the backup database.
Further, the above technical solution also comprises: when more than one failed ETL job is detected at the same time, generating a job number list for each production database on which the failed ETL jobs operate, and processing each failed ETL job separately.
Brief description of the drawings
Fig. 1 is a block diagram of a system for automatic database switching of ETL jobs according to the present invention;
Fig. 2 is a flowchart of a method for automatic database switching of ETL jobs according to the present invention.
In the drawings, the components denoted by the reference numerals are as follows:
100, production database; 200, backup database; 300, ETL job execution module; 400, ETL job scanning module; 500, backup database monitoring module; 600, ETL switching processing module.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.
As shown in Fig. 1, a system for automatic database switching of ETL jobs comprises a production database 100, a backup database 200, an ETL job execution module 300, an ETL job scanning module 400, a backup database monitoring module 500 and an ETL switching processing module 600. The production database 100 is configured to provide the data source for ETL jobs. The backup database 200 is configured to back up the data in the production database 100 on a schedule and, when an ETL job fails while operating on the production database 100, to provide the data source for the failed ETL job. The ETL job execution module 300 is configured to extract the required data from the production database 100 for ETL processing; when an operation on the production database 100 fails, to switch the operation object of the failed ETL job from the production database 100 to the backup database 200 according to the switch trigger instruction sent by the ETL switching processing module 600; and then to modify the job configuration information according to the manual intervention information sent by the ETL switching processing module 600 and execute the failed ETL job on the backup database 200. The ETL job scanning module 400 is configured to periodically scan the status of each ETL job executed by the ETL job execution module 300 and, when a failed ETL job is found, to generate manual intervention information from the job information of the failed ETL job and send it to the ETL switching processing module 600. The backup database monitoring module 500 is configured to periodically check the synchronization status of the backup database 200 and send the status information of the backup database 200 to the ETL switching processing module 600. The ETL switching processing module 600 is configured, after receiving the manual intervention information sent by the ETL job scanning module 400, to determine the state of the backup database from the status information of the backup database 200; when the backup database is in the data-synchronization-complete state, to generate a switch trigger instruction and send it to the ETL job execution module 300; and, after the ETL job execution module 300 has completed the switching operation, to send the manual intervention information to the ETL job execution module 300.
The ETL switching processing module 600 is also configured, after the ETL job execution module 300 has performed the switching operation and before the manual intervention information for the current run is inserted, to detect, according to the execution status of the failed ETL job obtained by the ETL job scanning module 400, whether the failed ETL job has a success record. If it does, the operation object of the failed ETL job is switched back from the backup database to the production database. If it does not, the module further detects whether the failed ETL job is running; if it is running, the module waits until the run ends and then determines whether it succeeded. If the run succeeded, the operation object of the failed ETL job is switched back from the backup database to the production database; otherwise, the module controls the ETL job execution module 300 to remove the existing temporary files and manual intervention information and inserts the manual intervention information for the current run, whereupon the ETL job execution module 300 executes the failed ETL job according to that manual intervention information.
The ETL job scanning module 400 is also configured, when a scan finds that the failed ETL job has completed on the backup database 200, to send the ETL switching processing module 600 a notification that processing of the failed ETL job is complete; the ETL switching processing module 600 generates a switch trigger instruction according to the notification sent by the ETL job scanning module 400 and sends it to the ETL job execution module 300, and the ETL job execution module 300 switches the operation object of the ETL job from the backup database 200 back to the production database 100.
The ETL job execution module 300 stores a job information configuration table; the corresponding entries of the configuration table are modified according to the manual intervention information, and the job is executed on the backup database. The manual intervention information can customize the time, frequency and number of runs of the failed ETL job.
When the ETL job scanning module 400 detects more than one failed ETL job at the same time, a job number list is generated for each production database on which the failed ETL jobs operate, and each failed ETL job is processed separately.
As shown in Fig. 2, a method for automatic database switching of ETL jobs comprises the following steps:
Step 1: periodically scan the execution status of each ETL job;
Step 2: determine whether there is a failed ETL job; if so, perform step 3; otherwise return to step 1;
Step 3: assemble manual intervention information from the job information of the failed ETL job;
Step 4: periodically check the data synchronization status of the backup database and determine its working state; if data synchronization is complete, perform step 5; if it is not complete, wait until it completes and then perform step 5;
Step 5: generate a switch trigger instruction and switch the operation object of the failed ETL job from the production database to the backup database; once the switch is complete, modify the job configuration information according to the manual intervention information and execute the ETL job on the backup database.
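Steps 1 to 5 can be sketched as a single scan pass. The sketch below is illustrative only: the Job record, the status strings, the callables is_synced and run_job, and the database names A and A_BCV are all hypothetical and stand in for whatever the ETL tool actually provides.

```python
import time
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    status: str    # "success", "running" or "failed" (assumed status values)
    database: str  # database the job currently extracts from


def build_intervention(job):
    """Step 3: assemble manual intervention information from the job information."""
    return {"job_id": job.job_id, "action": "rerun"}


def handle_failed_jobs(jobs, backup_of, is_synced, run_job, poll_seconds=0.0):
    """One scan pass over the jobs; returns the ids of the jobs that were rerun."""
    rerun = []
    for job in jobs:                              # step 1: scan each job
        if job.status != "failed":                # step 2: only failed jobs proceed
            continue
        intervention = build_intervention(job)    # step 3
        backup = backup_of[job.database]
        while not is_synced(backup):              # step 4: wait for synchronization
            time.sleep(poll_seconds)
        job.database = backup                     # step 5: switch to the backup database
        run_job(job, intervention)                #         and execute the job there
        rerun.append(job.job_id)
    return rerun


jobs = [Job("J001", "success", "A"), Job("J002", "failed", "A")]
executed = []
result = handle_failed_jobs(
    jobs,
    backup_of={"A": "A_BCV"},
    is_synced=lambda db: True,
    run_job=lambda job, iv: executed.append((job.job_id, job.database, iv["action"])),
)
```

In a deployment the pass would itself be repeated on a timer; the timer is omitted here so the sketch terminates.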
Before the failed ETL job is executed on the backup database in step 5, the following operations are also performed:
Step 51: obtain the execution status of the failed ETL job and detect whether the failed ETL job has a success record; if so, perform step 55; if not, perform step 52;
Step 52: detect whether the failed ETL job is running; if it is, wait until the run ends and then determine whether it succeeded; if it succeeded, perform step 55; otherwise perform step 53;
Step 53: remove the existing temporary files and manual intervention information, insert the manual intervention information for the current run, then modify the job configuration information according to the manual intervention information and execute the failed ETL job on the backup database;
Step 54: determine whether the failed ETL job executed successfully on the backup database; if so, perform step 55; otherwise return to step 53;
Step 55: switch the operation object of the failed ETL job from the backup database back to the production database, and end the process.
The success record is checked again after the switch to make sure that the job status did not change while the switch was in progress. Someone may have handled the failed ETL job manually so that its status has already become successful; in that case the job does not need to be executed again on the backup database, and the operation object is simply switched back to the production database.
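The decision flow of steps 51 to 55 can be sketched as follows. All callables and status strings are hypothetical placeholders. The max_attempts cap is an added assumption: the steps as described return to step 53 on every failure without stating a limit.

```python
import time


def rerun_on_backup(job_id, get_status, clear_stale, insert_intervention,
                    run_on_backup, switch_back, poll_seconds=0.0, max_attempts=3):
    """Steps 51 to 55 for one failed job already pointed at the backup database.

    Returns "already_succeeded", "rerun_succeeded" or "gave_up".
    """
    status = get_status(job_id)              # step 51: look for a success record
    while status == "running":               # step 52: wait for a run in progress
        time.sleep(poll_seconds)
        status = get_status(job_id)
    if status == "success":
        switch_back(job_id)                  # step 55
        return "already_succeeded"
    for _ in range(max_attempts):            # cap is an assumption, see lead-in
        clear_stale(job_id)                  # step 53: remove temp files and old records
        insert_intervention(job_id)          #          insert this run's intervention
        if run_on_backup(job_id):            # step 54: did the rerun succeed?
            switch_back(job_id)              # step 55
            return "rerun_succeeded"
    return "gave_up"


log = []
statuses = iter(["running", "failed"])
outcomes = iter([False, True])
outcome = rerun_on_backup(
    "J002",
    get_status=lambda j: next(statuses),
    clear_stale=lambda j: log.append("clear"),
    insert_intervention=lambda j: log.append("insert"),
    run_on_backup=lambda j: next(outcomes),
    switch_back=lambda j: log.append("switch_back"),
)
```

In this usage the job is first seen running, then failed; the first rerun fails and the second succeeds, so the stale state is cleared twice before the switch back.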
The above technical solution also comprises: when a scan finds that the failed ETL job has completed on the backup database, generating a switch trigger instruction to switch the operation object of the failed ETL job from the backup database back to the production database.
The job configuration information in the configuration table is modified according to the manual intervention information, which triggers the failed ETL job to execute on the backup database. A temporary file is generated while the job runs: during extraction, the data extracted from the database are continuously appended to the temporary file, and when the job finishes successfully the temporary file stops growing and is automatically converted into the formal file. The job configuration information comprises information about the job run, such as the job number, database, name of the table to extract, name of the generated file, start time and end time. The manual intervention information can customize the time, frequency and number of runs of an ETL job.
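A minimal sketch of the job configuration record and the temporary-file handling described above follows. The field names mirror the listed items, but the JobConfig class, the ".tmp" suffix and the example values are assumptions made for illustration.

```python
import os
import tempfile
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobConfig:
    job_id: str
    database: str
    table_name: str
    output_file: str
    start_time: str
    end_time: str


def apply_intervention(config, intervention):
    """Return a copy of the configuration with the intervention's fields overridden."""
    return replace(config, **intervention)


def extract_to_file(rows, output_path):
    """Append rows to a temporary file, then promote it to the formal file."""
    temp_path = output_path + ".tmp"         # assumed naming convention
    with open(temp_path, "a", encoding="utf-8") as handle:
        for row in rows:
            handle.write(row + "\n")
    os.replace(temp_path, output_path)       # reached only if every write succeeded
    return output_path


config = JobConfig("J002", "A", "orders", "orders.dat", "02:00", "03:00")
switched = apply_intervention(config, {"database": "A_BCV", "start_time": "04:00"})

with tempfile.TemporaryDirectory() as workdir:
    target = extract_to_file(["1,apple", "2,pear"],
                             os.path.join(workdir, switched.output_file))
    with open(target, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    temp_left = os.path.exists(target + ".tmp")
```

Because the temporary file is opened in append mode, a file left behind by a failed run would be extended by the next run, which is the duplicate-data risk that clearing temporary files before a rerun is meant to avoid.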
The above technical solution also comprises: when more than one failed ETL job is detected at the same time, generating a job number list for each production database on which the failed ETL jobs operate, and processing each failed ETL job separately.
The present invention can automatically detect a database that meets the needs of the application, connect to it automatically, and extract from it the data the application requires.
The core of this technical solution comprises three parts: 1) scanning for failed ETL job information; 2) checking the synchronization status of the backup database (BCV database); 3) automatically switching databases and inserting intervention information.
1) Scanning for failed ETL job information
A critical time point is determined according to the timeliness requirement of the job, and the completion status of the ETL jobs is scanned at regular intervals. If there is no failed ETL job, scanning continues; if there is a failed ETL job, manual intervention information is assembled from the job information of the failed ETL job and used to rerun it.
The manual intervention information simulates a person forcing the job to run: it is a trigger switch for ETL job execution and can force an ETL job to run. Simply by inserting intervention information, the time, frequency and number of runs of a job can be customized.
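The scan and the resulting intervention record can be sketched as below. The sketch assumes one reading of the critical time point, namely that failed jobs are acted on only once that point has passed; the ManualIntervention fields, the retry interval and the status strings are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ManualIntervention:
    job_id: str
    run_at: datetime      # time of the first forced run
    interval: timedelta   # spacing between forced runs (frequency)
    max_runs: int         # number of forced runs

    def scheduled_times(self):
        """List the times at which the job would be forced to run."""
        return [self.run_at + i * self.interval for i in range(self.max_runs)]


def scan_for_failures(job_statuses, now, cutoff,
                      retry_interval=timedelta(minutes=10), max_runs=3):
    """Return intervention records for failed jobs once the cutoff has passed."""
    if now < cutoff:
        return []  # assumed: before the critical time point nothing is acted on
    return [
        ManualIntervention(job_id, now, retry_interval, max_runs)
        for job_id, status in sorted(job_statuses.items())
        if status == "failed"
    ]


cutoff = datetime(2024, 1, 1, 2, 0)
statuses = {"J001": "success", "J002": "failed"}
early = scan_for_failures(statuses, datetime(2024, 1, 1, 1, 30), cutoff)
late = scan_for_failures(statuses, datetime(2024, 1, 1, 2, 5), cutoff)
```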
2) Checking the synchronization status of the backup database (BCV database)
The backup database is synchronized every day, but the synchronization may finish earlier or later than usual. The synchronization status of the backup database is therefore confirmed before manual intervention information is inserted, so that the database switch does not fall short of the expected result. This is a periodic check, performed once every fixed interval, in preparation for the switch.
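The periodic synchronization check can be sketched as a bounded polling loop. The criterion used here, that the backup counts as synchronized when its last completed synchronization falls on the current day, is an assumption based on the daily synchronization described above; get_last_sync and max_checks are hypothetical.

```python
import time
from datetime import date, datetime


def backup_is_synced(last_sync, today):
    """Treat the backup as synchronized if its last completed sync happened today."""
    return last_sync is not None and last_sync.date() == today


def wait_for_backup(get_last_sync, today, poll_seconds=0.0, max_checks=5):
    """Check at a fixed interval; return the number of checks used, or None."""
    for check in range(1, max_checks + 1):
        if backup_is_synced(get_last_sync(), today):
            return check
        time.sleep(poll_seconds)
    return None  # synchronization was not observed within max_checks


today = date(2024, 1, 2)
readings = iter([
    datetime(2024, 1, 1, 23, 0),  # still yesterday's synchronization
    datetime(2024, 1, 1, 23, 0),
    datetime(2024, 1, 2, 1, 40),  # today's synchronization has completed
])
checks_needed = wait_for_backup(lambda: next(readings), today)
```

Bounding the number of checks keeps a backup that never synchronizes from blocking the switch indefinitely; whether to bound the wait at all is a design choice not addressed in the text.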
3) Automatically switching databases and inserting manual intervention information
After the indication that the backup database has finished synchronizing is obtained, the production database from which the relevant jobs extract is switched to the backup database. Once it is confirmed that the job truly has no success record and is not running (someone may have handled the failed ETL job manually so that its status has already become successful, in which case it need not be executed again on the backup database), manual intervention information is inserted so that the job is scheduled to extract the data again. After the job starts, its running status is tracked at regular intervals to make sure that it ends successfully. After it ends, the operation object of the ETL job is switched back to the production database: because the backup database is synchronized later, its data are slightly less complete than those of the production database, so extraction from the production database is preferred.
The configuration table records the correspondence between production databases and backup databases; for example, database A corresponds to the A BCV database and database B corresponds to the B BCV database.
Job number lists are generated per database because different jobs may extract from different production databases, and different production databases correspond to different backup databases; this step distinguishes the jobs so that each is switched to the correct BCV database.
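The correspondence table and the per-database job number lists can be sketched as follows. The mapping mirrors the A and B example above; the identifiers A_BCV and B_BCV and the job numbers are hypothetical.

```python
from collections import defaultdict

# Assumed contents of the configuration table: production database -> backup database.
BACKUP_OF = {"A": "A_BCV", "B": "B_BCV"}


def group_failed_jobs(failed_jobs):
    """Group job numbers by the production database each job extracts from."""
    groups = defaultdict(list)
    for job_id, production_db in failed_jobs:
        groups[production_db].append(job_id)
    return dict(groups)


def switch_targets(failed_jobs, backup_of=BACKUP_OF):
    """Map each backup database to the list of job numbers to be switched to it."""
    return {
        backup_of[production_db]: job_ids
        for production_db, job_ids in group_failed_jobs(failed_jobs).items()
    }


failed = [("J001", "A"), ("J002", "B"), ("J003", "A")]
targets = switch_targets(failed)
```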
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.